# Vela Architecture

## High-Level Architecture

```text
[ Browser (PWA UI) ]
        |
    WebSocket
        |
[ Vela Gateway (NanoPi R6S) ]
        |
        +--> STT (local or NAS)
        +--> Ollama (NAS GPU)
        +--> Kokoro TTS (NAS or NanoPi)
        +--> Home Assistant
        +--> SearXNG
```

## Core Components

## Repository Structure

```text
apps/
  vela-ui/
  vela-gateway/
```

The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.

### Frontend — `vela-ui`

#### Tech

- SvelteKit
- PWA enabled
- WebSocket client

The current implementation is a minimal SvelteKit app with a single voice-session shell page. The shipped UI can open and close a browser WebSocket connection to the gateway `/ws` endpoint, show explicit connection status (`not connected`, `connecting`, `connected`, `disconnected`, `error`), and surface session metadata for developers. Microphone capture, transcript rendering, interrupt controls, streamed assistant response display, and audio playback are not part of the current shell and remain future work.

#### Responsibilities

Current shell responsibilities:

- connection state rendering
- developer-oriented session metadata rendering
- browser session connect/disconnect controls

Future UI responsibilities:

- audio capture from microphone
- audio playback for TTS
- broader voice-session UI state rendering
- interrupt handling

#### Main Screen

Current shell:

- developer-focused voice-session panel
- connect button
- disconnect button
- connection status indicator
- session metadata display

Future interactive voice screen:

- large mic button
- live transcript
- streamed assistant response text
- state indicator:
  - idle
  - listening
  - thinking
  - speaking
- interrupt button during speaking

### Backend — `vela-gateway`

#### Tech

- Fastify (Node)
- WebSocket-based session layer

The current implementation is a minimal Fastify service with `/`, `/health`, and a documented `/ws` WebSocket session endpoint.
The gateway keeps one ephemeral in-memory session record per live socket connection and removes it on disconnect.

#### Responsibilities

- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming

#### Current WebSocket skeleton

- `GET /ws` documents the route for plain HTTP clients and returns `426 Upgrade Required`
- WebSocket upgrades on `/ws` create an ephemeral session immediately
- the gateway sends `session.ready` followed by `session.state` (`idle`) when the socket is established
- valid minimal client events can move the session between `idle` and `listening`
- invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up

### Current UI shell behavior

- renders a minimal developer-focused voice-session panel
- exposes connect and disconnect controls only
- does not request microphone permission
- does not send or process audio data
- reads `session.ready`, `session.state`, and `error` messages from the shared protocol contract

## Voice Pipeline

```text
Mic → Gateway → STT → Transcript → LLM
→ Tool Calls → Results → LLM → Final Response
→ TTS → Audio Stream → UI
```

## Gateway Internal Flow

```text
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
   → call LLM
5. LLM decides:
   → direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
```

## LLM Layer

### Location

- NAS with RTX 3050 8GB

### Role

- intent parsing
- tool selection
- response generation

### Constraints

- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits

## Naming

- system: **Vela**
- gateway: `vela-gateway`
- UI: `vela-ui`
- voice profile: `vela-neutral`
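
## Example: WebSocket Session Sketch

The session behavior listed under "Current WebSocket skeleton" can be illustrated with a small, dependency-free sketch. It is not the gateway's actual code: the envelope shape, the `sessionId` and `message` fields, the client event names `listen.start` and `listen.stop`, and the helper functions `openSession`, `closeSession`, and `handleFrame` are all assumptions made for illustration. Only the server message types (`session.ready`, `session.state`, `error`) and the `idle`/`listening` states come from this document.

```typescript
// Sketch only. Envelope fields and the client event names
// "listen.start" / "listen.stop" are hypothetical, not the real protocol.
type SessionState = "idle" | "listening";

interface ServerMessage {
  type: "session.ready" | "session.state" | "error";
  sessionId: string;
  state?: SessionState;
  message?: string;
}

interface Session {
  id: string;
  state: SessionState;
}

// One ephemeral in-memory record per live socket connection.
const sessions = new Map<string, Session>();

// Called when a socket is established: register the session and
// return the two messages the gateway sends first.
function openSession(id: string): ServerMessage[] {
  sessions.set(id, { id, state: "idle" });
  return [
    { type: "session.ready", sessionId: id },
    { type: "session.state", sessionId: id, state: "idle" },
  ];
}

// Called on disconnect: drop the record.
function closeSession(id: string): void {
  sessions.delete(id);
}

// Handle one raw text frame. Bad input yields an error message
// instead of throwing, so the process stays up.
function handleFrame(id: string, raw: string): ServerMessage {
  const session = sessions.get(id);
  if (!session) {
    return { type: "error", sessionId: id, message: "unknown session" };
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { type: "error", sessionId: id, message: "invalid JSON" };
  }

  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { type?: unknown }).type !== "string"
  ) {
    return { type: "error", sessionId: id, message: "invalid envelope" };
  }

  const eventType = (parsed as { type: string }).type;
  if (eventType === "listen.start") {
    session.state = "listening";
  } else if (eventType === "listen.stop") {
    session.state = "idle";
  } else {
    return { type: "error", sessionId: id, message: "unsupported event" };
  }

  return { type: "session.state", sessionId: id, state: session.state };
}
```

In a real gateway these functions would be wired to the socket's open, message, and close handlers. Keeping them as plain functions over strings makes the state transitions easy to test without a running server.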