assistant/docs/architecture.md

Vela Architecture

High-Level Architecture

[ Browser (PWA UI) ]
        |
   WebSocket
        |
[ Vela Gateway (NanoPi R6S) ]
        |
        +--> STT (local or NAS)
        +--> Ollama (NAS GPU)
        +--> Kokoro TTS (NAS or NanoPi)
        +--> Home Assistant
        +--> SearXNG

Core Components

Repository Structure

apps/
  vela-ui/
  vela-gateway/

The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.

Frontend — vela-ui

Tech

  • SvelteKit
  • PWA enabled
  • WebSocket client

The current implementation is a minimal SvelteKit app with a single starter page. PWA behavior, microphone capture, and the WebSocket client will arrive in later increments.

Responsibilities

  • audio capture from microphone
  • audio playback for TTS
  • UI state rendering
  • session management
  • interrupt handling

Main Screen

  • large mic button
  • live transcript
  • streamed assistant response text
  • state indicator:
    • idle
    • listening
    • thinking
    • speaking
  • interrupt button during speaking
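The four indicator states above form a small state machine. A minimal TypeScript sketch of the allowed transitions (the type and function names are illustrative, not the actual vela-ui code):

```typescript
// Illustrative UI state model; not the actual vela-ui implementation.
type UiState = "idle" | "listening" | "thinking" | "speaking";

// Mic press starts listening, a final transcript moves to thinking,
// TTS playback moves to speaking, and the interrupt button (or end
// of playback) returns to idle.
const transitions: Record<UiState, UiState[]> = {
  idle: ["listening"],
  listening: ["thinking", "idle"],
  thinking: ["speaking", "idle"],
  speaking: ["idle"], // interrupt or playback finished
};

function canTransition(from: UiState, to: UiState): boolean {
  return transitions[from].includes(to);
}

console.log(canTransition("idle", "listening")); // true
console.log(canTransition("idle", "speaking")); // false
```

Encoding transitions in one table keeps the interrupt button trivially correct: it is just a `speaking → idle` edge.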

Backend — vela-gateway

Tech

  • Fastify (Node)
  • WebSocket-based session layer

The current implementation is a minimal Fastify service with /, /health, and a documented /ws WebSocket session endpoint. The gateway keeps one ephemeral in-memory session record per live socket connection and removes it on disconnect.
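The ephemeral per-socket session record could be modeled as below; this is a hedged sketch, and the field names are assumptions rather than the actual vela-gateway code:

```typescript
// Illustrative per-connection session record; field names are
// assumptions, not the actual vela-gateway implementation.
interface Session {
  id: string;
  state: "idle" | "listening";
  createdAt: number;
}

// One ephemeral in-memory record per live socket connection.
const sessions = new Map<string, Session>();

function openSession(id: string): Session {
  const session: Session = { id, state: "idle", createdAt: Date.now() };
  sessions.set(id, session);
  return session;
}

// Called on socket disconnect: the record is simply dropped.
function closeSession(id: string): void {
  sessions.delete(id);
}
```

Because nothing is persisted, a dropped connection costs only the in-flight session state, which matches the "ephemeral" design described above.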

Responsibilities

  • session lifecycle
  • audio ingestion
  • STT orchestration
  • LLM orchestration
  • tool execution
  • TTS orchestration
  • event streaming

Current WebSocket skeleton

  • GET /ws documents the route for plain HTTP clients and returns 426 Upgrade Required
  • WebSocket upgrades on /ws create an ephemeral session immediately
  • the gateway sends session.ready followed by session.state (idle) when the socket is established
  • valid minimal client events can move the session between idle and listening
  • invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up
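The defensive handling of invalid JSON and malformed envelopes could look like the following sketch; the event envelope shape is an assumption based on the `type`-tagged events (`session.ready`, `session.state`) mentioned above:

```typescript
// Illustrative defensive frame parsing; the envelope shape is an
// assumption, not the actual vela-gateway code.
type ClientEvent = { type: string };

function parseClientEvent(raw: string): ClientEvent | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw); // invalid JSON must not crash the process
  } catch {
    return null;
  }
  // Reject non-object frames and envelopes without a string `type`.
  if (
    typeof parsed !== "object" ||
    parsed === null ||
    typeof (parsed as { type?: unknown }).type !== "string"
  ) {
    return null;
  }
  return parsed as ClientEvent;
}

console.log(parseClientEvent("not json")); // null
```

Returning `null` instead of throwing lets the socket handler log and drop a bad frame while the process stays up.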

Voice Pipeline

Mic → Gateway → STT → Transcript
→ LLM → Tool Calls → Results
→ LLM → Final Response
→ TTS → Audio Stream → UI

Gateway Internal Flow

1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
   → call LLM
5. LLM decides:
   → direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
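Steps 4 through 8 above can be sketched as an async loop; the STT, LLM, and tool calls are stubbed placeholders passed in as parameters, not the actual service APIs:

```typescript
// Stubbed sketch of the LLM/tool loop (steps 4-8); function names and
// signatures are illustrative, not the actual vela-gateway code.
type LlmDecision =
  | { kind: "response"; text: string }
  | { kind: "tool"; name: string; args: unknown };

async function handleUtterance(
  transcript: string,
  llm: (prompt: string) => Promise<LlmDecision>,
  runTool: (name: string, args: unknown) => Promise<string>,
): Promise<string> {
  let decision = await llm(transcript);
  // Step 5-7: while the model asks for tools, execute them and feed
  // each result back for another decision.
  while (decision.kind === "tool") {
    const result = await runTool(decision.name, decision.args);
    decision = await llm(`${transcript}\n[tool:${decision.name}] ${result}`);
  }
  // Step 8: the final text is then streamed to the UI and to TTS.
  return decision.text;
}
```

The loop shape matters: a single utterance may trigger zero, one, or several tool calls before the model produces its final response.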

LLM Layer

Location

  • NAS with RTX 3050 8GB

Role

  • intent parsing
  • tool selection
  • response generation

Constraints

  • must use a tool-calling schema
  • must not directly control systems
  • target roughly 7B-class models, given the 8 GB VRAM limit
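A tool-calling envelope satisfying these constraints might look like the following hypothetical example; the field names, tool name, and entity ID are illustrative, not a schema the project has defined:

```json
{
  "tool": "home_assistant.call_service",
  "arguments": {
    "domain": "light",
    "service": "turn_on",
    "entity_id": "light.living_room"
  }
}
```

Keeping the model's output to structured envelopes like this is what enforces the second constraint: the gateway, not the LLM, executes the call against Home Assistant.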

Naming

  • system: Vela
  • gateway: vela-gateway
  • UI: vela-ui
  • voice profile: vela-neutral