Nala is a voice-first conversational AI web app. Users speak into their mic, a local LLM running in the browser via WebLLM (WebGPU) generates responses, and the browser speaks them aloud via SpeechSynthesis. All conversations persist in PostgreSQL. No external API dependencies.
```
Browser (React + Vite + TS)
├── WebLLM (MLCEngine) — local LLM inference via WebGPU
├── Web Speech API (SpeechRecognition for input, SpeechSynthesis for output)
├── Voice waveform visualization (canvas-based)
└── REST client
      │
      ▼
Express.js API Server (TypeScript) — persistence only
├── POST   /api/conversations               — create conversation
├── GET    /api/conversations               — list conversations
├── GET    /api/conversations/:id           — get conversation with messages
├── PATCH  /api/conversations/:id           — update title
├── DELETE /api/conversations/:id           — delete conversation
└── POST   /api/conversations/:id/messages  — persist a single message
      │
      └── PostgreSQL — conversation and message persistence
```
- users: id (uuid), name, created_at
- conversations: id (uuid), user_id (fk), title, created_at, updated_at
- messages: id (uuid), conversation_id (fk), role (user|assistant), content (text), created_at
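
These tables map directly onto shared TypeScript types. A minimal sketch of what `shared/types.ts` might hold (field names, casing, and the timestamp representation are assumptions, not the actual file):

```ts
// shared/types.ts: illustrative shapes mirroring the schema above.
export type Role = "user" | "assistant";

export interface User {
  id: string;         // uuid
  name: string;
  createdAt: string;  // ISO-8601 timestamp
}

export interface Conversation {
  id: string;         // uuid
  userId: string;     // FK -> users.id
  title: string;
  createdAt: string;
  updatedAt: string;
}

export interface Message {
  id: string;              // uuid
  conversationId: string;  // FK -> conversations.id
  role: Role;
  content: string;
  createdAt: string;
}
```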
- WebLLM over Claude API: no external API coupling, no API keys, no per-request costs, fully self-contained. Requires WebGPU (Chrome 113+).
- Web Speech API over third-party STT/TTS: zero cost, no API keys, works in modern browsers, good enough quality for MVP.
- PostgreSQL over SQLite/file storage: proper relational model for conversations + messages, scales if needed.
- REST over WebSocket: simpler to implement and debug. Voice and AI run entirely in the browser, so the server only handles persistence.
- Vite over CRA/Next: fast dev server, simple config, no SSR needed for a SPA.
- Users table but no auth: schema is multi-user ready (conversations have user_id FK), hardcoded default user for now.
- Backend as pure persistence layer: all AI inference happens client-side. Server stores and retrieves data only.
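
Since WebLLM depends on WebGPU, the client should fail fast when it is missing. A minimal detection sketch (the function name and error copy are illustrative; typed loosely here to avoid a `@webgpu/types` dependency):

```ts
// Throws early if the browser cannot run WebLLM. navigator.gpu is the
// WebGPU entry point; requestAdapter() resolves to null when no usable
// GPU adapter exists.
export async function ensureWebGPU(): Promise<void> {
  const gpu = (navigator as { gpu?: { requestAdapter(): Promise<unknown> } }).gpu;
  if (!gpu) {
    throw new Error("WebGPU is not supported in this browser (Chrome 113+ required).");
  }
  if ((await gpu.requestAdapter()) == null) {
    throw new Error("WebGPU is supported but no GPU adapter was found.");
  }
}
```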
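And because the server is persistence-only, its routes stay thin. A sketch of one handler (Express-style; `insertMessage` is a hypothetical query helper from `server/src/db/`):

```ts
import { Router } from "express";
import { insertMessage } from "../db/queries"; // hypothetical helper

export const conversationsRouter = Router();

// POST /api/conversations/:id/messages: persist a single message.
conversationsRouter.post("/:id/messages", async (req, res) => {
  try {
    const { role, content } = req.body as {
      role: "user" | "assistant";
      content: string;
    };
    const message = await insertMessage(req.params.id, role, content);
    res.status(201).json(message);
  } catch (err) {
    res.status(500).json({ error: (err as Error).message });
  }
});
```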
- TypeScript strict mode everywhere (`"strict": true`)
- ESLint + Prettier for formatting
- Functional React components with hooks only — no class components
- Named exports over default exports
- Use `async/await` over `.then()` chains
- Error handling: try/catch in async functions, error boundaries in React
- Files: `kebab-case.ts` / `kebab-case.tsx`
- Components: `PascalCase` (e.g., `Waveform`, `ConversationList`)
- Functions/variables: `camelCase`
- Constants: `UPPER_SNAKE_CASE`
- Types/interfaces: `PascalCase`, no `I` prefix
- Database columns: `snake_case`
- API routes: `kebab-case` (e.g., `/api/conversations`)
```
nala/
├── client/                 # React frontend
│   ├── src/
│   │   ├── components/     # React components
│   │   ├── hooks/          # Custom hooks (useVoiceInput, useWebLLM, etc.)
│   │   ├── services/       # API client functions
│   │   ├── main.tsx
│   │   └── index.css
│   ├── index.html
│   ├── vite.config.ts
│   └── tsconfig.json
├── server/                 # Express backend (persistence only)
│   ├── src/
│   │   ├── routes/         # Express route handlers
│   │   ├── db/             # Database connection, queries, migrations
│   │   ├── types/          # Server-specific types
│   │   └── index.ts        # Server entry point
│   └── tsconfig.json
├── shared/                 # Shared TypeScript types
│   ├── types.ts
│   └── package.json
├── CLAUDE.md
└── package.json            # Root package.json with workspaces
```
- Conversation context: The client fetches the last N messages from the server and builds the prompt locally for WebLLM (see the exchange sketch after this list).
- Voice flow: Browser captures speech → WebLLM generates the response locally → browser speaks it via SpeechSynthesis → both messages are persisted to the server. All AI and voice processing stays in the browser.
- Waveform visualization: Use an `AnalyserNode` from the Web Audio API connected to the mic stream. Render frequency data on a `<canvas>` element with `requestAnimationFrame`. Switches to a pulse animation while the LLM is generating.
- Message persistence: The client sends two separate POST requests per exchange, persisting the user message before generation and the assistant message after. Crash-safe: user input is never lost.
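
To make these notes concrete, a few client-side sketches follow. First, speech capture: Chromium still ships `SpeechRecognition` behind a `webkit` prefix, hence the loose casts, and `listenOnce` is a made-up name:

```ts
// One-shot speech capture: resolves with the final transcript.
export function listenOnce(lang = "en-US"): Promise<string> {
  const Recognition =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  return new Promise((resolve, reject) => {
    const rec = new Recognition();
    rec.lang = lang;
    rec.interimResults = false;
    rec.onresult = (e: any) => resolve(e.results[0][0].transcript);
    rec.onerror = (e: any) => reject(new Error(e.error));
    rec.start();
  });
}
```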
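Next, the full exchange: persist the user message, build context from recent history, generate locally, speak, persist the reply. This sketch assumes WebLLM's OpenAI-style chat API; the model id, context window size, import paths, and helper names are all assumptions:

```ts
import { CreateMLCEngine, type MLCEngine } from "@mlc-ai/web-llm";
import type { Message } from "../../shared/types";

const CONTEXT_WINDOW = 10; // last N messages used as context (assumed value)

// The engine is expensive to create; in the app it would live in a
// useWebLLM hook and be initialized once.
let engine: MLCEngine | undefined;
async function getEngine(): Promise<MLCEngine> {
  engine ??= await CreateMLCEngine("Llama-3-8B-Instruct-q4f16_1-MLC");
  return engine;
}

export async function runExchange(conversationId: string, userText: string) {
  // 1. Persist the user message first, so input survives a crash mid-generation.
  await post(`/api/conversations/${conversationId}/messages`, {
    role: "user",
    content: userText,
  });

  // 2. Fetch recent history and build the prompt locally.
  const { messages } = await get<{ messages: Message[] }>(
    `/api/conversations/${conversationId}`
  );
  const context = messages.slice(-CONTEXT_WINDOW).map((m) =>
    m.role === "user"
      ? { role: "user" as const, content: m.content }
      : { role: "assistant" as const, content: m.content }
  );

  // 3. Generate entirely in the browser via WebLLM.
  const completion = await (await getEngine()).chat.completions.create({
    messages: context,
  });
  const reply = completion.choices[0]?.message?.content ?? "";

  // 4. Speak the reply, then persist it.
  speechSynthesis.speak(new SpeechSynthesisUtterance(reply));
  await post(`/api/conversations/${conversationId}/messages`, {
    role: "assistant",
    content: reply,
  });
}

async function get<T>(url: string): Promise<T> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`GET ${url} failed: ${res.status}`);
  return res.json() as Promise<T>;
}

async function post(url: string, body: unknown): Promise<void> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`POST ${url} failed: ${res.status}`);
}
```

Wired together: `listenOnce().then((text) => runExchange(conversationId, text))`.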
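Finally, the waveform: an `AnalyserNode` sampled on every animation frame and drawn as frequency bars (a sketch; the pulse animation shown while the LLM is generating is omitted):

```ts
// Draws live mic frequency data onto a canvas; returns a stop function.
export function startWaveform(stream: MediaStream, canvas: HTMLCanvasElement) {
  const audioCtx = new AudioContext();
  const analyser = audioCtx.createAnalyser();
  analyser.fftSize = 256; // 128 frequency bins, plenty for a small canvas
  audioCtx.createMediaStreamSource(stream).connect(analyser);

  const bins = new Uint8Array(analyser.frequencyBinCount);
  const ctx = canvas.getContext("2d")!;
  let rafId = 0;

  const draw = () => {
    analyser.getByteFrequencyData(bins); // 0..255 magnitude per bin
    ctx.clearRect(0, 0, canvas.width, canvas.height);
    const barWidth = canvas.width / bins.length;
    bins.forEach((v, i) => {
      const h = (v / 255) * canvas.height;
      ctx.fillRect(i * barWidth, canvas.height - h, barWidth - 1, h);
    });
    rafId = requestAnimationFrame(draw);
  };
  rafId = requestAnimationFrame(draw);

  return () => {
    cancelAnimationFrame(rafId);
    void audioCtx.close();
  };
}
```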