assistant/docs/architecture.md
Johannes Kresner 4b11703c93 feat(vela-ui): add voice session shell
Add a minimal UI shell that connects to the gateway WebSocket and exposes developer-visible session state. Align the architecture, protocol, setup, integration, and backlog docs with the current UI increment.
2026-04-08 18:40:45 +02:00


# Vela Architecture
## High-Level Architecture
```text
[ Browser (PWA UI) ]
          |
      WebSocket
          |
[ Vela Gateway (NanoPi R6S) ]
          |
          +--> STT (local or NAS)
          +--> Ollama (NAS GPU)
          +--> Kokoro TTS (NAS or NanoPi)
          +--> Home Assistant
          +--> SearXNG
```
## Core Components
## Repository Structure
```text
apps/
  vela-ui/
  vela-gateway/
```
The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.
### Frontend — `vela-ui`
#### Tech
- SvelteKit
- PWA enabled
- WebSocket client
The current implementation is a minimal SvelteKit app with a single voice-session shell page. The shipped UI can open and close a browser WebSocket connection to the gateway `/ws` endpoint, show explicit connection status (`not connected`, `connecting`, `connected`, `disconnected`, `error`), and surface session metadata for developers. Microphone capture, transcript rendering, interrupt controls, streamed assistant response display, and audio playback are not part of the current shell and remain future work.
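The five connection statuses can be modeled as a small state machine. The following is a minimal sketch, not the shipped code: the status strings come from the paragraph above, while the `ConnectionEvent` names and the `nextStatus` function are hypothetical and only loosely mirror the browser WebSocket callbacks.

```typescript
// The five connection statuses listed for the current shell.
type ConnectionStatus =
  | "not connected"
  | "connecting"
  | "connected"
  | "disconnected"
  | "error";

// Hypothetical event names (assumption): a user action plus socket callbacks.
type ConnectionEvent = "connect" | "open" | "close" | "error";

// Pure transition function: returns the status to render after an event.
function nextStatus(
  current: ConnectionStatus,
  event: ConnectionEvent,
): ConnectionStatus {
  switch (event) {
    case "connect":
      // Only start a new attempt when no socket is pending or open.
      return current === "connecting" || current === "connected"
        ? current
        : "connecting";
    case "open":
      return current === "connecting" ? "connected" : current;
    case "close":
      // Keep "error" visible and never report a disconnect before a connect.
      return current === "error" || current === "not connected"
        ? current
        : "disconnected";
    case "error":
      return "error";
  }
}
```

Keeping the transition logic pure means a Svelte store or component can call it from the socket callbacks, and it can be unit-tested without opening a real connection.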
#### Responsibilities
Current shell responsibilities:
- connection state rendering
- developer-oriented session metadata rendering
- browser session connect/disconnect controls
Future UI responsibilities:
- audio capture from microphone
- audio playback for TTS
- broader voice-session UI state rendering
- interrupt handling
#### Main Screen
Current shell:
- developer-focused voice-session panel
- connect button
- disconnect button
- connection status indicator
- session metadata display
Future interactive voice screen:
- large mic button
- live transcript
- streamed assistant response text
- state indicator:
  - idle
  - listening
  - thinking
  - speaking
- interrupt button during speaking
### Backend — `vela-gateway`
#### Tech
- Fastify (Node)
- WebSocket-based session layer
The current implementation is a minimal Fastify service with `/`, `/health`, and a documented `/ws` WebSocket session endpoint. The gateway keeps one ephemeral in-memory session record per live socket connection and removes it on disconnect.
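The "one ephemeral record per live socket" rule can be illustrated with a small in-memory registry. This is a sketch under assumptions: the `SessionRecord` fields, the `SessionRegistry` class, and the counter-based ids are hypothetical and not taken from the gateway source.

```typescript
// Hypothetical shape of one in-memory session record (assumption).
interface SessionRecord {
  id: string;
  state: "idle" | "listening";
  connectedAt: number;
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionRecord>();
  private nextId = 1;

  // Called when a socket is established: one record per live connection.
  create(now: number = Date.now()): SessionRecord {
    const record: SessionRecord = {
      id: `session-${this.nextId++}`,
      state: "idle",
      connectedAt: now,
    };
    this.sessions.set(record.id, record);
    return record;
  }

  get(id: string): SessionRecord | undefined {
    return this.sessions.get(id);
  }

  // Called on disconnect; returns true if a record was actually removed.
  remove(id: string): boolean {
    return this.sessions.delete(id);
  }

  count(): number {
    return this.sessions.size;
  }
}
```

In a socket handler, `create()` would run when the connection opens and `remove(record.id)` in the close callback, so nothing outlives the connection.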
#### Responsibilities
- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming
#### Current WebSocket skeleton
- a plain HTTP `GET /ws` (without a WebSocket upgrade) returns `426 Upgrade Required`, which documents the route for non-WebSocket clients
- WebSocket upgrades on `/ws` create an ephemeral session immediately
- the gateway sends `session.ready` followed by `session.state` (`idle`) when the socket is established
- valid minimal client events can move the session between `idle` and `listening`
- invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up
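The defensive handling described above can be sketched as a pure function that never throws. The `session.state` and `error` message names come from this document; the envelope shape (`{ type: string }`), the client event names `listening.start` and `listening.stop`, and the error codes are hypothetical placeholders, since the real names live in the protocol docs.

```typescript
type SessionState = "idle" | "listening";

// Message names are from the doc; payload fields are assumptions.
type ServerMessage =
  | { type: "session.state"; state: SessionState }
  | { type: "error"; code: string; message: string };

interface FrameResult {
  state: SessionState; // session state after handling the frame
  reply: ServerMessage; // message to send back to the client
}

function handleClientFrame(state: SessionState, raw: string): FrameResult {
  const fail = (code: string, message: string): FrameResult => ({
    state, // errors never change the session state
    reply: { type: "error", code, message },
  });

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fail("invalid_json", "Frame is not valid JSON");
  }

  // Envelope check: must be an object with a string "type" field.
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { type?: unknown }).type !== "string"
  ) {
    return fail("invalid_envelope", "Frame has no string 'type' field");
  }

  const type = (parsed as { type: string }).type;
  // Hypothetical client events that toggle between idle and listening.
  if (type === "listening.start" && state === "idle") {
    return { state: "listening", reply: { type: "session.state", state: "listening" } };
  }
  if (type === "listening.stop" && state === "listening") {
    return { state: "idle", reply: { type: "session.state", state: "idle" } };
  }
  return fail("unsupported_event", `Event '${type}' is not valid in state '${state}'`);
}
```

Because every failure path returns an `error` reply instead of throwing, a bad frame cannot take the process down.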
### Current UI shell behavior
- renders a minimal developer-focused voice-session panel
- exposes connect and disconnect controls only
- does not request microphone permission
- does not send or process audio data
- reads `session.ready`, `session.state`, and `error` messages from the shared protocol contract
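On the UI side, reading the three message types can be expressed as a reducer over the shell's view state. This is an illustrative sketch only: the `ShellView` shape and the payload field names (`sessionId`, `state`, `message`) are assumptions, and the authoritative fields are defined by the shared protocol contract.

```typescript
// Hypothetical view state for the developer-focused panel (assumption).
interface ShellView {
  sessionId: string | null;
  sessionState: string | null;
  lastError: string | null;
}

// Applies one raw server message to the view and returns the new view.
function applyServerMessage(view: ShellView, raw: string): ShellView {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ...view, lastError: "unparseable message" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ...view, lastError: "unexpected message shape" };
  }

  const msg = parsed as Record<string, unknown>;
  switch (msg["type"]) {
    case "session.ready": {
      const id = msg["sessionId"];
      return { ...view, sessionId: typeof id === "string" ? id : null, lastError: null };
    }
    case "session.state": {
      const state = msg["state"];
      return { ...view, sessionState: typeof state === "string" ? state : view.sessionState };
    }
    case "error": {
      const message = msg["message"];
      return { ...view, lastError: typeof message === "string" ? message : "unknown error" };
    }
    default:
      // Ignore message types the current shell does not understand.
      return view;
  }
}
```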
## Voice Pipeline
```text
Mic → Gateway → STT → Transcript
→ LLM → Tool Calls → Results
→ LLM → Final Response
→ TTS → Audio Stream → UI
```
## Gateway Internal Flow
```text
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
   → call LLM
5. LLM decides:
   → direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
```
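Steps 4 through 8 of the flow above form a loop between the LLM and tool execution. The sketch below shows that loop with injected dependencies. It is synchronous for brevity, whereas a real gateway would be asynchronous and streaming. The `LlmDecision` and `TurnDeps` types, the `maxToolCalls` cap, and the fallback text are all assumptions for illustration.

```typescript
// Hypothetical LLM decision: a direct response or a tool call (assumption).
type LlmDecision =
  | { kind: "response"; text: string }
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> };

// Injected dependencies so the loop can be exercised with stubs.
interface TurnDeps {
  llm: (transcript: string, toolResults: string[]) => LlmDecision;
  runTool: (tool: string, args: Record<string, unknown>) => string;
}

// Runs one turn: call the LLM, execute any requested tool, feed the result
// back, and repeat until the LLM produces a final response.
function runTurn(
  finalTranscript: string,
  deps: TurnDeps,
  maxToolCalls = 3, // assumed safety cap to avoid endless tool loops
): string {
  const toolResults: string[] = [];
  for (let i = 0; i <= maxToolCalls; i++) {
    const decision = deps.llm(finalTranscript, toolResults);
    if (decision.kind === "response") {
      return decision.text; // step 8: final response
    }
    if (i === maxToolCalls) {
      break; // cap reached: do not execute another tool
    }
    // Steps 6-7: execute the tool and make its result visible to the LLM.
    toolResults.push(deps.runTool(decision.tool, decision.args));
  }
  return "Sorry, I could not complete that request.";
}
```

Passing the LLM and tool runner as parameters keeps the orchestration independent of Ollama, Home Assistant, or SearXNG specifics.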
## LLM Layer
### Location
- NAS with RTX 3050 8GB
### Role
- intent parsing
- tool selection
- response generation
### Constraints
- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits
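One way to satisfy the first two constraints is to treat LLM output as an untrusted request that the gateway validates before executing anything. The sketch below assumes a tool-call shape of `{ name, arguments }` and uses hypothetical tool names (`home_assistant.call_service`, `searxng.search`) and argument lists. None of these are confirmed by the source.

```typescript
interface ToolSpec {
  name: string;
  requiredArgs: string[]; // argument keys that must be present as strings
}

// Hypothetical allowlist (assumption); real tool names may differ.
const TOOLS: ToolSpec[] = [
  { name: "home_assistant.call_service", requiredArgs: ["domain", "service"] },
  { name: "searxng.search", requiredArgs: ["query"] },
];

type Validation =
  | { ok: true; tool: ToolSpec }
  | { ok: false; reason: string };

// Checks an LLM-produced tool call against the allowlist. Only a call that
// passes validation would be executed, and only by the gateway itself.
function validateToolCall(call: unknown): Validation {
  if (typeof call !== "object" || call === null) {
    return { ok: false, reason: "tool call must be an object" };
  }
  const candidate = call as Record<string, unknown>;
  const name = candidate["name"];
  const tool = TOOLS.find((t) => t.name === name);
  if (!tool) {
    return { ok: false, reason: `unknown tool: ${String(name)}` };
  }
  const args = candidate["arguments"];
  if (typeof args !== "object" || args === null) {
    return { ok: false, reason: "arguments must be an object" };
  }
  const argMap = args as Record<string, unknown>;
  const missing = tool.requiredArgs.filter((key) => typeof argMap[key] !== "string");
  if (missing.length > 0) {
    return { ok: false, reason: `missing arguments: ${missing.join(", ")}` };
  }
  return { ok: true, tool };
}
```

With this split, the model can only name a tool and its arguments; whether and how the call reaches Home Assistant or SearXNG stays under gateway control.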
## Naming
- system: **Vela**
- gateway: `vela-gateway`
- UI: `vela-ui`
- voice profile: `vela-neutral`