feat: bootstrap vela UI and gateway workspace

Establish the monorepo, tooling, and starter apps so UI and gateway development can begin from a documented, runnable baseline.
2026-04-08 17:49:46 +02:00
commit bba0095bc0
23 changed files with 2023 additions and 0 deletions

docs/architecture.md

@@ -0,0 +1,129 @@
# Vela Architecture
## High-Level Architecture
```text
[ Browser (PWA UI) ]
|
WebSocket
|
[ Vela Gateway (NanoPi R6S) ]
|
+--> STT (local or NAS)
+--> Ollama (NAS GPU)
+--> Kokoro TTS (NAS or NanoPi)
+--> Home Assistant
+--> SearXNG
```
## Core Components
- `vela-ui` — browser PWA for audio capture, playback, and state display
- `vela-gateway` — session and orchestration service on the NanoPi R6S
- external providers — STT, Ollama (LLM), Kokoro (TTS), Home Assistant, SearXNG
## Repository Structure
```text
apps/
vela-ui/
vela-gateway/
```
The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.
### Frontend — `vela-ui`
#### Tech
- SvelteKit
- PWA enabled
- WebSocket client
The current implementation is a minimal SvelteKit app with a single starter page. PWA behavior, microphone capture, and the WebSocket client are later increments.
#### Responsibilities
- audio capture from microphone
- audio playback for TTS
- UI state rendering
- session management
- interrupt handling
#### Main Screen
- large mic button
- live transcript
- streamed assistant response text
- state indicator:
- idle
- listening
- thinking
- speaking
- interrupt button during speaking
### Backend — `vela-gateway`
#### Tech
- Fastify (Node)
- WebSocket-based session layer
The current implementation is a minimal Fastify service with `/` and `/health` HTTP endpoints. The WebSocket session layer is a later increment.
#### Responsibilities
- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming
## Voice Pipeline
```text
Mic → Gateway → STT → Transcript
→ LLM → Tool Calls → Results
→ LLM → Final Response
→ TTS → Audio Stream → UI
```
## Gateway Internal Flow
```text
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
→ call LLM
5. LLM decides:
→ direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
```
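The numbered flow above can be sketched as a single turn loop over mocked providers. This is an illustrative sketch only — the `Providers` interface and function names are placeholders, not the real adapter contracts, and the real STT/LLM/TTS calls would be streaming and async:

```typescript
// Sketch of the gateway turn loop with mocked, synchronous providers.
// Event types match the ServerEvent contract in docs/protocol.md.
type Emit = (event: { type: string; [key: string]: unknown }) => void;

interface Providers {
  transcribe(audio: Uint8Array[]): string; // mocked STT (steps 1–4)
  respond(transcript: string): string[];   // mocked LLM text deltas (steps 5–9)
  speak(text: string): Uint8Array[];       // mocked TTS chunks (step 10)
}

function runTurn(audio: Uint8Array[], providers: Providers, emit: Emit): void {
  const transcript = providers.transcribe(audio);
  emit({ type: "final_transcript", text: transcript });
  for (const delta of providers.respond(transcript)) {
    emit({ type: "assistant_text_delta", text: delta });
  }
  for (const chunk of providers.speak(transcript)) {
    emit({ type: "tts_audio_chunk", data: Buffer.from(chunk).toString("base64") });
  }
  emit({ type: "assistant_done" });
}
```

This mirrors the Phase 2 vertical slice: the event ordering is real even though every provider is stubbed.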
## LLM Layer
### Location
- NAS with RTX 3050 8GB
### Role
- intent parsing
- tool selection
- response generation
### Constraints
- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits
## Naming
- system: **Vela**
- gateway: `vela-gateway`
- UI: `vela-ui`
- voice profile: `vela-neutral`

docs/backlog.md

@@ -0,0 +1,183 @@
# Vela Phased Backlog
This backlog is the implementation plan translated into phased, actionable work. It should be updated whenever implementation changes scope, ordering, or done criteria.
## Phase 1 — Foundation and Contracts
### Goal
Establish the boundaries, protocol, and state model for the system before integrating providers.
### Backlog Items
- [x] define repository structure for `vela-ui` and `vela-gateway`
- define the WebSocket event contract used by the UI and gateway
- define the session state machine and interrupt semantics
- define provider adapter interfaces for STT, LLM, TTS, and tools
- document error handling and cancellation behavior
### Exit Criteria
- protocol and state machine are documented
- UI and gateway responsibilities are explicit
- interrupt behavior is defined for every active phase
- provider boundaries are clear enough to implement mocks first
## Phase 2 — Vertical Slice Skeleton
### Goal
Prove the end-to-end interaction model with mocked or stubbed providers.
### Backlog Items
- [x] bootstrap `vela-ui` as a runnable SvelteKit app in the Yarn workspace
- [x] bootstrap `vela-gateway` as a runnable Fastify app in the Yarn workspace
- create a minimal UI with mic control, state indicator, transcript, and response text
- create a gateway WebSocket session skeleton
- implement mocked STT flow for partial and final transcript events
- implement mocked LLM response streaming
- implement stubbed audio playback or placeholder TTS output
- implement interrupt handling across the mocked pipeline
### Exit Criteria
- one client can complete a voice turn through the real UI↔gateway contract
- transcript appears in the UI
- assistant text appears progressively or in structured steps
- audio playback or stubbed playback is visible to the user
- interrupt stops the active response and resets state cleanly
## Phase 3 — Real STT Integration
### Goal
Replace the mocked transcription layer with a real streaming STT provider.
### Backlog Items
- integrate `whisper.cpp` behind the STT adapter
- support partial and final transcript delivery
- handle audio format conversion if browser capture format differs
- handle late transcript events after cancellation
- expose recoverable error handling for STT failures
### Exit Criteria
- live mic audio produces usable transcripts
- partial and final results reach the UI
- cancellation prevents late transcript results from corrupting session state
- STT failure paths are visible and recoverable
## Phase 4 — Ollama Streaming and Tool Calling
### Goal
Replace the mocked reasoning layer with real LLM orchestration.
### Backlog Items
- integrate Ollama behind the LLM adapter
- stream assistant text deltas to the UI
- define and validate tool-calling schema
- reject invalid or unsafe tool calls
- support interrupt during active thinking
### Exit Criteria
- assistant responses stream from Ollama
- invalid tool requests fail safely
- cancellation stops active model work
- the LLM cannot directly execute external actions
## Phase 5 — Tool Layer
### Goal
Introduce useful tools in increasing order of operational risk.
### Backlog Items
- implement SearXNG search adapter
- normalize search results for LLM consumption
- implement Home Assistant read actions
- implement Home Assistant write actions gated by confirmation
- implement clarification flow for ambiguous tool requests
### Exit Criteria
- web search works end-to-end
- Home Assistant read queries work for approved entities
- Home Assistant write actions require explicit confirmation
- ambiguous actions do not execute automatically
## Phase 6 — Kokoro TTS
### Goal
Convert assistant text responses into spoken output.
### Backlog Items
- integrate Kokoro behind the TTS adapter
- support streamed audio when practical
- add a temporary fallback for full-response playback if streaming is not ready
- stop or suppress playback correctly on interrupt
### Exit Criteria
- spoken output plays in the UI
- interrupt stops or suppresses playback reliably
- any non-streaming fallback is explicitly documented as temporary
## Phase 7 — Resilience and Performance
### Goal
Make the system robust enough for routine use on the target hardware.
### Backlog Items
- handle disconnect and reconnect cleanly
- add bounded timeouts for STT, LLM, tool, and TTS calls
- measure latency by pipeline stage
- improve buffering and recovery paths for flaky network dependencies
- validate behavior under cancellation and partial failure
### Exit Criteria
- common network and provider failures do not leave sessions stuck
- latency is measurable at each major stage
- user-visible recovery paths exist for expected failure modes
## Phase 8 — Productization and Secondary Surfaces
### Goal
Polish the system after the core voice loop is reliable.
### Backlog Items
- add PWA installability and UX polish
- implement `/history`
- implement `/settings`
- implement `/admin`
- document operational settings and maintenance guidance
### Exit Criteria
- the app is installable as a PWA
- secondary screens exist without degrading the core voice loop
- supporting docs reflect the implemented behavior
## Ongoing Documentation Tasks
- update docs whenever implementation changes the protocol, architecture, integrations, deployment, or backlog order
- mark completed backlog items or split phases into smaller slices as work progresses
- keep root `README.md` as the entrypoint and keep detailed technical docs in `docs/`
## Current Progress Notes
- `apps/vela-ui` now boots as a minimal SvelteKit app with a starter page
- `apps/vela-gateway` now boots as a minimal Fastify app with `/` and `/health` endpoints
- backend framework choice is now concrete: Fastify

docs/deployment.md

@@ -0,0 +1,67 @@
# Vela Deployment and Operations
## Deployment Layout
### NanoPi R6S
```yaml
services:
ui:
build: ./apps/vela-ui
gateway:
build: ./apps/vela-gateway
environment:
OLLAMA_URL: http://nas:11434
KOKORO_URL: http://nas:8880
HASS_URL: http://homeassistant:8123
SEARXNG_URL: http://searxng:8080/search
```
### NAS
```yaml
services:
ollama:
image: ollama/ollama
kokoro:
image: kokoro-tts
```
## Networking
- all services should be reachable on the internal network
- expected reverse proxy routes:
- `/` → UI
- `/api` or `/health` → gateway HTTP routes
- `/ws` → WebSocket
## Security
- Home Assistant token stored server-side only
- no secrets in the frontend
- internal network isolation preferred
- optional gateway auth can be added later if deployment needs it
## Performance Targets
- wake to response start: under 1.5s
- STT latency: under 800ms
- TTS start latency: under 500ms
- full roundtrip: under 3s
## Key Risks
| Risk | Mitigation |
| --- | --- |
| STT latency on NanoPi | move STT to NAS |
| TTS performance | run TTS on NAS |
| LLM hallucinating actions | enforce strict tool schema |
| WebSocket instability | add heartbeat and reconnect handling |
| Audio sync issues | use chunked streaming and buffering |
## Documentation Maintenance
- update this document when deployment topology, networking, or service placement changes
- keep performance targets and risk mitigations aligned with the current implementation state

docs/integrations.md

@@ -0,0 +1,97 @@
# Vela Integrations and Tool Safety
## Current Runtime Baseline
- `vela-ui` is implemented as a SvelteKit application
- `vela-gateway` is implemented as a Fastify service
- integration work beyond the gateway's HTTP baseline has not yet been implemented
## STT (Speech-to-Text)
### Primary Option
- `whisper.cpp`
### Deployment
- start on NanoPi
- move to the NAS if NanoPi latency is too high
### Requirements
- streaming transcription
- partial and final output
- low latency, with sub-second response preferred
## TTS (Text-to-Speech)
### Engine
- Kokoro TTS
### Deployment
- prefer NAS for more compute headroom
### API Contract
```http
POST /speak
{
"text": "...",
"voice": "vela",
"format": "wav"
}
```
### Requirements
- streaming audio preferred
- low startup latency
- interrupt support
## Tool System
### Home Assistant Tool
#### Functions
```ts
turn_on(entity_id);
turn_off(entity_id);
set_temperature(entity_id, value);
get_state(entity_id);
```
#### Backend
- REST API
- optional Conversation API
#### Safety
- require confirmation for destructive actions
- require confirmation for irreversible or significant state changes
- keep secrets server-side only
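The confirmation rule can be enforced as a small gate in front of the Home Assistant adapter. A minimal sketch, assuming the function names from the tool list above; the `gate` helper and its return values are illustrative, not a defined API:

```typescript
// Illustrative confirmation gate for Home Assistant tool calls.
type HaCall = {
  fn: "turn_on" | "turn_off" | "set_temperature" | "get_state";
  entityId: string;
};

// State-changing functions require explicit user confirmation.
const WRITE_FUNCTIONS = new Set<string>(["turn_on", "turn_off", "set_temperature"]);

// Reads execute immediately; writes execute only once confirmed.
function gate(call: HaCall, confirmed: boolean): "execute" | "confirm" {
  if (!WRITE_FUNCTIONS.has(call.fn)) return "execute";
  return confirmed ? "execute" : "confirm";
}
```

The point of the design is that the LLM can only *request* a write; the gateway owns the decision to execute.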
### SearXNG Tool
#### Endpoint
```http
GET /search?q=...&format=json
```
#### Flow
- query SearXNG
- return top results
- let the LLM summarize the result set
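The normalization step above might look like the following sketch. The `results` array with `title`, `url`, and `content` fields matches SearXNG's JSON output; the field limits chosen here are illustrative assumptions, tuned to keep the LLM prompt small:

```typescript
// Minimal normalization of a SearXNG JSON response for LLM consumption.
interface SearxResult {
  title: string;
  url: string;
  content?: string; // SearXNG's snippet field; may be absent
}

function normalizeResults(raw: { results: SearxResult[] }, limit = 5) {
  return raw.results.slice(0, limit).map((r) => ({
    title: r.title,
    url: r.url,
    snippet: (r.content ?? "").slice(0, 300), // truncate to keep prompts small
  }));
}
```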
## Safety Rules
- the LLM does not directly control systems
- all external actions go through explicit tool adapters
- Home Assistant write actions require confirmation
- frontend must not contain Home Assistant tokens or other secrets
- ambiguous tool intents should be clarified instead of guessed

docs/overview.md

@@ -0,0 +1,92 @@
# Vela Overview
## Objective
Vela is a fully local, voice-first assistant system with:
- local-first architecture and no mandatory cloud dependencies
- natural TTS output via Kokoro
- voice-driven interaction as the primary interface
- integrations with Home Assistant and SearXNG
- a lightweight SvelteKit PWA
- remote LLM inference via Ollama on a NAS
## Core Design Principles
### Voice-first
- UI optimized for speaking instead of typing
- minimal visual clutter
- real-time feedback through partial transcripts and streaming responses
### Local-first
- no required cloud APIs
- all services self-hosted
- browser used for capture and playback only
### Tool-driven intelligence
- the LLM does not directly control external systems
- all external actions route through explicit tools
### Low-latency interaction
- streaming STT partial results
- streaming LLM token output
- streaming TTS audio chunks
- interruptible responses
## Product Scope
### Primary Interface
- browser-based PWA
- push-to-talk interaction
- transcript and response display
- playback of streamed or returned audio
### Secondary Screens
- `/history`
- `/settings`
- `/admin`
These screens are lower priority than the main voice loop and should be implemented after the core interaction path is stable.
## Repository Layout
- `apps/vela-ui` — minimal SvelteKit browser UI
- `apps/vela-gateway` — minimal Fastify gateway service
- `docs/` — technical documentation and phased backlog
Use Yarn workspaces from the repository root to manage these packages.
## Primary User Flow
```text
User presses mic
→ audio streaming starts
→ transcript appears
→ final transcript sent
→ assistant processes
→ response streams as text and audio
→ user can interrupt anytime
```
## Non-Goals for v1
- full conversational memory system
- emotion simulation or personality modeling
- multi-user identity separation
- offline LLM on the NanoPi
- wake word and other future extensions listed in architecture docs
## Documentation Map
- [Architecture](architecture.md)
- [Protocol](protocol.md)
- [Integrations](integrations.md)
- [Deployment](deployment.md)
- [Setup](setup.md)
- [Backlog](backlog.md)

docs/protocol.md

@@ -0,0 +1,65 @@
# Vela Protocol and State Machine
## Event Protocol
### Client → Server
```ts
type ClientEvent =
| { type: "start_listening" }
| { type: "stop_listening" }
| { type: "audio_chunk"; data: string } // PCM16 base64
| { type: "interrupt" };
```
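The `audio_chunk` payload is base64-encoded PCM16. A sketch of the encode/decode pair, written with Node's `Buffer` for brevity (in the browser the UI would encode the `Int16Array` bytes with `btoa` instead); explicit little-endian reads/writes are an assumption to keep both ends unambiguous:

```typescript
// Encode PCM16 samples as the base64 `audio_chunk` payload (explicit LE).
function encodeAudioChunk(samples: Int16Array): string {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) buf.writeInt16LE(samples[i], i * 2);
  return buf.toString("base64");
}

// Decode a received `audio_chunk` payload back into PCM16 samples.
function decodeAudioChunk(data: string): Int16Array {
  const bytes = Buffer.from(data, "base64");
  const out = new Int16Array(bytes.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = bytes.readInt16LE(i * 2);
  return out;
}
```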
### Server → Client
```ts
type ServerEvent =
| { type: "state"; value: "idle" | "listening" | "thinking" | "speaking" }
| { type: "partial_transcript"; text: string }
| { type: "final_transcript"; text: string }
| { type: "assistant_text_delta"; text: string }
| { type: "tool_call_started"; tool: string }
| { type: "tool_call_finished"; tool: string; result: unknown }
| { type: "tts_audio_chunk"; data: string }
| { type: "assistant_done" }
| { type: "error"; message: string };
```
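Because `ServerEvent` is a discriminated union, the UI can dispatch on `type` with an exhaustive `switch`, which makes additive protocol changes surface as compile errors in handlers. A sketch (the union is repeated so the snippet is self-contained; the `describe` handler bodies are placeholders):

```typescript
// UI-side dispatch over the ServerEvent union from the contract above.
type ServerEvent =
  | { type: "state"; value: "idle" | "listening" | "thinking" | "speaking" }
  | { type: "partial_transcript"; text: string }
  | { type: "final_transcript"; text: string }
  | { type: "assistant_text_delta"; text: string }
  | { type: "tool_call_started"; tool: string }
  | { type: "tool_call_finished"; tool: string; result: unknown }
  | { type: "tts_audio_chunk"; data: string }
  | { type: "assistant_done" }
  | { type: "error"; message: string };

function describe(event: ServerEvent): string {
  switch (event.type) {
    case "state": return `state: ${event.value}`;
    case "partial_transcript":
    case "final_transcript":
    case "assistant_text_delta": return event.text;
    case "tool_call_started": return `tool ${event.tool} started`;
    case "tool_call_finished": return `tool ${event.tool} finished`;
    case "tts_audio_chunk": return `${event.data.length} base64 chars of audio`;
    case "assistant_done": return "done";
    case "error": return `error: ${event.message}`;
  }
}
```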
## State Machine
```text
idle
→ listening
→ thinking
→ speaking
→ idle
```
Interrupt can occur at:
- listening → restart
- thinking → cancel
- speaking → stop immediately
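The interrupt rules above can be written down as a small transition table. Note the post-interrupt targets for `thinking` and `speaking` are an assumption here (the requirements below leave listening-vs-idle as a UX decision); this sketch picks `idle`:

```typescript
// Session states from the state machine above.
type State = "idle" | "listening" | "thinking" | "speaking";

// Where `interrupt` lands from each state, per the rules above.
// thinking/speaking → idle is one UX option, not a settled decision.
const INTERRUPT_TARGET: Record<State, State> = {
  idle: "idle",           // no-op
  listening: "listening", // restart listening
  thinking: "idle",       // cancel active model work
  speaking: "idle",       // stop playback immediately
};

function onInterrupt(current: State): State {
  return INTERRUPT_TARGET[current];
}
```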
## Interrupt Handling Requirements
- immediate stop of TTS playback
- immediate stop of LLM streaming
- reset session state to listening or idle, depending on UX decision
### Mechanism
The `interrupt` event cancels:
- TTS process
- current LLM request
- tool execution when possible
## Protocol Notes for Implementation
- keep the protocol backward compatible when possible
- prefer additive event changes over breaking renames
- document protocol updates in this file whenever implementation changes behavior
- when implementation diverges from the initial contract, update this document in the same change


@@ -0,0 +1,31 @@
# README Migration Map
This file maps the original README sections to their new documentation locations after the restructure.
| Old README section | Status | New location | Notes |
| --- | --- | --- | --- |
| Objective | Migrated | `docs/overview.md` | Covered in the Objective section. |
| System overview | Migrated | `docs/overview.md`, `docs/architecture.md` | Split between product scope and high-level architecture. |
| Components | Migrated | `docs/architecture.md` | Covered in Core Components. |
| Voice pipeline | Migrated | `docs/architecture.md` | Covered in Voice Pipeline. |
| Protocol | Migrated | `docs/protocol.md` | Covered in Event Protocol. |
| STT | Migrated | `docs/integrations.md` | Covered in STT (Speech-to-Text). |
| TTS | Migrated | `docs/integrations.md` | Covered in TTS (Text-to-Speech). |
| LLM layer | Migrated | `docs/architecture.md` | Covered in LLM Layer. |
| Tool system | Migrated | `docs/integrations.md` | Covered in Tool System and Safety Rules. |
| Gateway flow | Migrated | `docs/architecture.md` | Covered in Gateway Internal Flow. |
| Interrupt handling | Migrated | `docs/protocol.md` | Covered in Interrupt Handling Requirements and Mechanism. |
| State machine | Migrated | `docs/protocol.md` | Covered in State Machine. |
| Deployment | Migrated | `docs/deployment.md` | Covered in Deployment Layout. |
| Networking | Migrated | `docs/deployment.md` | Covered in Networking. |
| Security | Migrated | `docs/deployment.md`, `docs/integrations.md` | Deployment covers hosting/security posture; integrations covers tool safety. |
| Performance targets | Migrated | `docs/deployment.md` | Covered in Performance Targets. |
| Future extensions | Partially migrated | `docs/backlog.md`, `docs/overview.md` | Future work is tracked in the phased backlog; v1 exclusions are noted in Non-Goals for v1. |
| Non-goals | Migrated | `docs/overview.md` | Covered in Non-Goals for v1. |
| Naming | Migrated | `docs/architecture.md` | Covered in Naming. |
| Implementation order | Migrated | `docs/backlog.md` | Reframed as phased implementation backlog. |
| Key risks | Migrated | `docs/deployment.md` | Covered in Key Risks. |
## Intentionally not migrated as standalone sections
- `Future extensions` was not kept as its own top-level document section. It was intentionally folded into `docs/backlog.md` and `docs/overview.md` to keep future work and v1 exclusions close to planning and scope.

docs/setup.md

@@ -0,0 +1,72 @@
# Vela Setup and Workspace Layout
## Tooling and Package Management
- Use **mise** to provision repo tools.
- Use **Yarn** for dependency management and workspace commands in this repository.
The repo-level tool configuration lives in `mise.toml`.
## Workspace Layout
```text
apps/
vela-ui/
vela-gateway/
docs/
AGENTS.md
README.md
mise.toml
package.json
```
## Workspace Purpose
### `apps/vela-ui`
- minimal SvelteKit browser application
- current starter page confirms the workspace boots correctly
- intended to grow into the SvelteKit PWA implementation
### `apps/vela-gateway`
- minimal Fastify gateway service
- current HTTP endpoints provide a runnable baseline at `/` and `/health`
- intended to grow into the WebSocket session and orchestration layer
## Initial Commands
Install repo tools:
```bash
mise install
```
Install dependencies:
```bash
mise exec -- yarn install
```
Run the current workspaces:
```bash
mise exec -- yarn dev:ui
mise exec -- yarn dev:gateway
```
Additional verification commands:
```bash
mise exec -- yarn check:ui
mise exec -- yarn build:ui
mise exec -- yarn build:gateway
```
## Notes
- the concrete framework choices are now SvelteKit for `vela-ui` and Fastify for `vela-gateway`
- the UI is intentionally minimal and does not yet include mic capture, transcript rendering, or WebSocket session state
- the gateway is intentionally minimal and does not yet expose the planned WebSocket contract
- if your shell is configured for mise activation, plain `yarn` commands can be used after `mise install`
- update this document when the repo layout or package manager workflow changes