# Vela Architecture

## High-Level Architecture

```text
[ Browser (PWA UI) ]
        |
    WebSocket
        |
[ Vela Gateway (NanoPi R6S) ]
        |
        +--> STT (local or NAS)
        +--> Ollama (NAS GPU)
        +--> Kokoro TTS (NAS or NanoPi)
        +--> Home Assistant
        +--> SearXNG
```

## Repository Structure

```text
apps/
  vela-ui/
  vela-gateway/
```

The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.

## Core Components

### Frontend: `vela-ui`

#### Tech

- SvelteKit
- PWA enabled
- WebSocket client

The current implementation is a minimal SvelteKit app with a single starter page. PWA behavior, microphone capture, and the WebSocket client are later increments.

#### Responsibilities

- audio capture from microphone
- audio playback for TTS
- UI state rendering
- session management
- interrupt handling

#### Main Screen

- large mic button
- live transcript
- streamed assistant response text
- state indicator:
  - idle
  - listening
  - thinking
  - speaking
- interrupt button during speaking

### Backend: `vela-gateway`

#### Tech

- Fastify (Node)
- WebSocket-based session layer

The current implementation is a minimal Fastify service with `/`, `/health`, and a documented `/ws` WebSocket session endpoint. The gateway keeps one ephemeral in-memory session record per live socket connection and removes it on disconnect.
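
The per-connection session record described above could be kept in a small in-memory registry. The following is a minimal sketch, not the gateway's actual code: `SessionRegistry`, `SessionRecord`, and the `s1`, `s2`, ... id format are hypothetical names chosen for illustration, and the state names are borrowed from the UI state indicator.

```typescript
// Hypothetical sketch: none of these names come from the vela-gateway source.
type SessionState = "idle" | "listening" | "thinking" | "speaking";

interface SessionRecord {
  id: string;
  state: SessionState;
  connectedAt: number; // epoch milliseconds
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionRecord>();
  private nextId = 1;

  // Would be called once a WebSocket upgrade on /ws succeeds.
  open(now: number = Date.now()): SessionRecord {
    const record: SessionRecord = {
      id: `s${this.nextId++}`,
      state: "idle",
      connectedAt: now,
    };
    this.sessions.set(record.id, record);
    return record;
  }

  get(id: string): SessionRecord | undefined {
    return this.sessions.get(id);
  }

  // Would be called from the socket's close handler; nothing is persisted.
  close(id: string): boolean {
    return this.sessions.delete(id);
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Because the registry is a plain `Map`, a gateway restart drops every session, which matches the "ephemeral" wording. Calling `close` a second time for the same id returns `false`, so a duplicate disconnect event is harmless.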
#### Responsibilities

- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming

#### Current WebSocket skeleton

- `GET /ws` documents the route for plain HTTP clients and returns `426 Upgrade Required`
- WebSocket upgrades on `/ws` create an ephemeral session immediately
- the gateway sends `session.ready` followed by `session.state` (`idle`) when the socket is established
- valid minimal client events can move the session between `idle` and `listening`
- invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up

## Voice Pipeline

```text
Mic → Gateway → STT → Transcript → LLM
→ Tool Calls → Results → LLM → Final Response
→ TTS → Audio Stream → UI
```

## Gateway Internal Flow

```text
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
   → call LLM
5. LLM decides:
   → direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
```

## LLM Layer

### Location

- NAS with RTX 3050 8GB

### Role

- intent parsing
- tool selection
- response generation

### Constraints

- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits

## Naming

- system: **Vela**
- gateway: `vela-gateway`
- UI: `vela-ui`
- voice profile: `vela-neutral`
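
As an illustration of the WebSocket skeleton described under `vela-gateway`, the sketch below shows one way a client frame could be validated and applied to the `idle`/`listening` state without ever throwing. It rests on several assumptions: the client event names `listen.start` and `listen.stop`, the `{ "type": ... }` envelope shape, and the `error` reply are hypothetical, since the source does not name them. Only `session.state` and the two state names come from the document.

```typescript
// Hypothetical sketch: client event names and the error reply are assumptions.
type SocketState = "idle" | "listening";

type ServerEvent =
  | { type: "session.state"; state: SocketState }
  | { type: "error"; reason: string };

interface HandleResult {
  state: SocketState; // session state after handling the frame
  reply: ServerEvent; // event to send back to the client
}

function handleClientFrame(raw: string, state: SocketState): HandleResult {
  // Invalid JSON must not crash the process, so parsing is guarded.
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { state, reply: { type: "error", reason: "invalid_json" } };
  }

  // An envelope is assumed to be an object with a string "type" field.
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { type?: unknown }).type !== "string"
  ) {
    return { state, reply: { type: "error", reason: "invalid_envelope" } };
  }

  const eventType = (parsed as { type: string }).type;
  let next: SocketState;
  if (eventType === "listen.start" && state === "idle") {
    next = "listening";
  } else if (eventType === "listen.stop" && state === "listening") {
    next = "idle";
  } else {
    // Unknown events and out-of-order events leave the state unchanged.
    return { state, reply: { type: "error", reason: "unexpected_event" } };
  }
  return { state: next, reply: { type: "session.state", state: next } };
}
```

Keeping the handler a pure function of the raw frame and the current state makes the defensive behavior easy to unit-test without opening a socket. The socket handler would then only need to store the returned state and serialize the reply.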