Add a minimal UI shell that connects to the gateway WebSocket and exposes developer-visible session state. Align the architecture, protocol, setup, integration, and backlog docs with the current UI increment.
Vela Architecture
High-Level Architecture
[ Browser (PWA UI) ]
        |
    WebSocket
        |
[ Vela Gateway (NanoPi R6S) ]
        |
        +--> STT (local or NAS)
        +--> Ollama (NAS GPU)
        +--> Kokoro TTS (NAS or NanoPi)
        +--> Home Assistant
        +--> SearXNG
Core Components
Repository Structure
apps/
  vela-ui/
  vela-gateway/
The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.
Frontend — vela-ui
Tech
- SvelteKit
- PWA enabled
- WebSocket client
The current implementation is a minimal SvelteKit app with a single voice-session shell page. The shipped UI can open and close a browser WebSocket connection to the gateway /ws endpoint, show explicit connection status (not connected, connecting, connected, disconnected, error), and surface session metadata for developers. Microphone capture, transcript rendering, interrupt controls, streamed assistant response display, and audio playback are not part of the current shell and remain future work.
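As an illustration of the five connection statuses listed above, the following is a minimal sketch of a pure transition function. It is not taken from the shipped UI: the event names (connect, open, close, error) and the function name nextStatus are assumptions chosen to mirror the browser WebSocket lifecycle.

```typescript
// The five statuses come from the text above; the event names are hypothetical.
type ConnectionStatus =
  | "not connected"
  | "connecting"
  | "connected"
  | "disconnected"
  | "error";

type ConnectionEvent = "connect" | "open" | "close" | "error";

// Pure transition function: given the current status and a socket-level
// event, return the status the shell should display next.
function nextStatus(
  current: ConnectionStatus,
  event: ConnectionEvent
): ConnectionStatus {
  switch (event) {
    case "connect":
      // Ignore a connect request while a socket is already opening or open.
      return current === "connecting" || current === "connected"
        ? current
        : "connecting";
    case "open":
      return current === "connecting" ? "connected" : current;
    case "close":
      // Browsers fire "error" and then "close"; keep the error visible
      // instead of overwriting it with "disconnected".
      return current === "error" || current === "not connected"
        ? current
        : "disconnected";
    case "error":
      return "error";
  }
}
```

Keeping the transition logic free of any WebSocket or Svelte dependency means it could be unit tested without a browser, with the component only forwarding socket callbacks into it.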
Responsibilities
Current shell responsibilities:
- connection state rendering
- developer-oriented session metadata rendering
- browser session connect/disconnect controls
Future UI responsibilities:
- audio capture from microphone
- audio playback for TTS
- broader voice-session UI state rendering
- interrupt handling
Main Screen
Current shell:
- developer-focused voice-session panel
- connect button
- disconnect button
- connection status indicator
- session metadata display
Future interactive voice screen:
- large mic button
- live transcript
- streamed assistant response text
- state indicator:
  - idle
  - listening
  - thinking
  - speaking
- interrupt button during speaking
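The four planned indicator states and the interrupt rule can be sketched as a small table-driven state machine. This is only one possible shape for future work: the event names (mic.press, utterance.end, response.start, response.end, interrupt) are hypothetical and do not come from the protocol docs.

```typescript
// States are the four listed above; event names are hypothetical.
type VoiceState = "idle" | "listening" | "thinking" | "speaking";
type VoiceEvent =
  | "mic.press"
  | "utterance.end"
  | "response.start"
  | "response.end"
  | "interrupt";

// Allowed transitions per state. Any event not listed leaves the state unchanged.
const TRANSITIONS: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { "mic.press": "listening" },
  listening: { "utterance.end": "thinking" },
  thinking: { "response.start": "speaking" },
  speaking: { "response.end": "idle", interrupt: "idle" },
};

function step(state: VoiceState, event: VoiceEvent): VoiceState {
  return TRANSITIONS[state][event] ?? state;
}

// The interrupt button is shown only where an interrupt transition exists,
// which in this table is the speaking state.
function showInterruptButton(state: VoiceState): boolean {
  return TRANSITIONS[state].interrupt !== undefined;
}
```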
Backend — vela-gateway
Tech
- Fastify (Node)
- WebSocket-based session layer
The current implementation is a minimal Fastify service with /, /health, and a documented /ws WebSocket session endpoint. The gateway keeps one ephemeral in-memory session record per live socket connection and removes it on disconnect.
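The "one ephemeral in-memory session record per live socket" rule could look roughly like the registry below. This is a sketch under assumptions, not the gateway's actual code: the SessionRegistry and SessionRecord names, the record fields, and the counter-based ids are all hypothetical.

```typescript
type SessionState = "idle" | "listening";

// Hypothetical record shape; the real gateway may track different fields.
interface SessionRecord {
  id: string;
  state: SessionState;
  connectedAt: number;
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionRecord>();
  private nextId = 0;

  // Called when a socket connects: create and store a fresh record.
  create(now: number = Date.now()): SessionRecord {
    this.nextId += 1;
    const record: SessionRecord = {
      id: `session-${this.nextId}`, // illustrative id scheme only
      state: "idle",
      connectedAt: now,
    };
    this.sessions.set(record.id, record);
    return record;
  }

  // Called from the socket's close handler; returns false if already removed.
  remove(id: string): boolean {
    return this.sessions.delete(id);
  }

  get(id: string): SessionRecord | undefined {
    return this.sessions.get(id);
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Because nothing is persisted, a gateway restart or a dropped socket simply discards the record, which matches the ephemeral behavior described above.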
Responsibilities
- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming
Current WebSocket skeleton
- GET /ws documents the route for plain HTTP clients and returns 426 Upgrade Required
- WebSocket upgrades on /ws create an ephemeral session immediately
- the gateway sends session.ready followed by session.state (idle) when the socket is established
- valid minimal client events can move the session between idle and listening
- invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up
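The defensive handling described in the list above might be sketched as follows. The envelope shape (a JSON object with a non-empty string type field) and the client event names listening.start and listening.stop are assumptions made for illustration; the protocol doc remains the source of truth for the real names.

```typescript
type SessionState = "idle" | "listening";

type ParseResult =
  | { ok: true; type: string }
  | { ok: false; reason: "invalid_json" | "invalid_envelope" };

// Never throws: every malformed frame becomes a structured failure that the
// caller can answer with an error message instead of crashing the process.
function parseClientFrame(raw: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, reason: "invalid_envelope" };
  }
  const type = (value as Record<string, unknown>).type;
  if (typeof type !== "string" || type.length === 0) {
    return { ok: false, reason: "invalid_envelope" };
  }
  return { ok: true, type };
}

// Hypothetical client events; unknown or out-of-order events leave the
// session state unchanged.
function applyClientEvent(state: SessionState, type: string): SessionState {
  if (type === "listening.start" && state === "idle") return "listening";
  if (type === "listening.stop" && state === "listening") return "idle";
  return state;
}
```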
Current UI shell behavior
- renders a minimal developer-focused voice-session panel
- exposes connect and disconnect controls only
- does not request microphone permission
- does not send or process audio data
- reads session.ready, session.state, and error messages from the shared protocol contract
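On the UI side, reading those three message types could be reduced to a single function that folds each incoming frame into the metadata the shell displays. The message type names come from the list above; the payload field names (sessionId, state, message) and the SessionView shape are assumptions for this sketch.

```typescript
// Hypothetical view model for the developer-facing metadata panel.
interface SessionView {
  sessionId: string | null;
  state: string | null;
  lastError: string | null;
}

const emptyView: SessionView = { sessionId: null, state: null, lastError: null };

// Fold one raw server frame into the view. Unknown or malformed messages
// never throw; they either record an error or leave the view unchanged.
function reduceServerMessage(view: SessionView, raw: string): SessionView {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return { ...view, lastError: "unparseable message" };
  }
  if (typeof msg !== "object" || msg === null) return view;

  // Field names below are assumptions, not the documented contract.
  const { type, sessionId, state, message } = msg as Record<string, unknown>;
  if (type === "session.ready" && typeof sessionId === "string") {
    return { ...view, sessionId };
  }
  if (type === "session.state" && typeof state === "string") {
    return { ...view, state };
  }
  if (type === "error") {
    return {
      ...view,
      lastError: typeof message === "string" ? message : "unknown error",
    };
  }
  return view;
}
```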
Voice Pipeline
Mic → Gateway → STT → Transcript
→ LLM → Tool Calls → Results
→ LLM → Final Response
→ TTS → Audio Stream → UI
Gateway Internal Flow
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final transcript: call LLM
5. LLM decides: direct response OR tool call
6. If a tool call was chosen, execute the tool
7. Feed the tool result back to the LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
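The ten steps above can be sketched as one orchestration function with the STT, LLM, tool, and TTS stages injected as dependencies. Everything named here (the Pipeline interface, the event names, handleUtterance) is hypothetical, and the sketch simplifies streaming to single calls and allows only one tool round.

```typescript
type LlmDecision =
  | { kind: "response"; text: string }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> };

// Hypothetical stage interfaces; real STT/LLM/TTS clients would stream.
interface Pipeline {
  transcribe(audio: Uint8Array, onPartial: (text: string) => void): Promise<string>;
  askLlm(transcript: string, toolResult?: string): Promise<LlmDecision>;
  runTool(tool: string, args: Record<string, unknown>): Promise<string>;
  synthesize(text: string): Promise<Uint8Array>;
}

// Hypothetical event names, not the documented protocol.
type GatewayEvent =
  | { type: "transcript.partial"; text: string }
  | { type: "transcript.final"; text: string }
  | { type: "response.text"; text: string }
  | { type: "response.audio"; bytes: number };

async function handleUtterance(
  audio: Uint8Array,
  deps: Pipeline,
  emit: (event: GatewayEvent) => void
): Promise<void> {
  // Steps 1-3: receive audio, run STT, emit partial transcripts.
  const transcript = await deps.transcribe(audio, (text) =>
    emit({ type: "transcript.partial", text })
  );
  emit({ type: "transcript.final", text: transcript });

  // Steps 4-5: call the LLM, which returns a response or a tool call.
  let decision = await deps.askLlm(transcript);

  // Steps 6-7: the gateway, not the LLM, executes the tool and feeds back the result.
  if (decision.kind === "tool_call") {
    const result = await deps.runTool(decision.tool, decision.args);
    decision = await deps.askLlm(transcript, result);
  }
  if (decision.kind !== "response") {
    throw new Error("LLM requested a second tool call; not supported in this sketch");
  }

  // Steps 8-10: final response text first, then synthesized audio.
  emit({ type: "response.text", text: decision.text });
  const speech = await deps.synthesize(decision.text);
  emit({ type: "response.audio", bytes: speech.length });
}
```

Injecting the stages keeps the ordering logic testable with stubs, independent of where STT, Ollama, or Kokoro actually run.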
LLM Layer
Location
- NAS with RTX 3050 8GB
Role
- intent parsing
- tool selection
- response generation
Constraints
- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits
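One way to enforce the first two constraints is to have the gateway validate every LLM-proposed tool call against an allowlist before anything is executed. The sketch below assumes a simple call shape ({ name, arguments }) and two illustrative tool names; neither is taken from the project's protocol docs or from Ollama's wire format.

```typescript
interface ToolSpec {
  name: string;
  requiredArgs: string[];
}

// Illustrative allowlist only; real tool names and arguments may differ.
const TOOLS: ToolSpec[] = [
  { name: "home_assistant.call_service", requiredArgs: ["domain", "service"] },
  { name: "searxng.search", requiredArgs: ["query"] },
];

type Validation = { ok: true } | { ok: false; reason: string };

// The LLM only proposes a call. The gateway checks it here and is the only
// component that would then talk to Home Assistant or SearXNG.
function validateToolCall(call: unknown, tools: ToolSpec[] = TOOLS): Validation {
  if (typeof call !== "object" || call === null) {
    return { ok: false, reason: "tool call is not an object" };
  }
  const { name, arguments: args } = call as Record<string, unknown>;
  if (typeof name !== "string") {
    return { ok: false, reason: "missing tool name" };
  }
  const spec = tools.find((t) => t.name === name);
  if (spec === undefined) {
    return { ok: false, reason: `unknown tool: ${name}` };
  }
  if (typeof args !== "object" || args === null) {
    return { ok: false, reason: "missing arguments object" };
  }
  const argObject: object = args;
  const missing = spec.requiredArgs.filter((key) => !(key in argObject));
  if (missing.length > 0) {
    return { ok: false, reason: `missing arguments: ${missing.join(", ")}` };
  }
  return { ok: true };
}
```

Rejecting unknown tools by default means a small model that hallucinates a tool name produces a recoverable validation error rather than an unintended action.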
Naming
- system: Vela
- gateway: vela-gateway
- UI: vela-ui
- voice profile: vela-neutral