feat: bootstrap vela UI and gateway workspace

Establish the monorepo, tooling, and starter apps so UI and gateway development can begin from a documented, runnable baseline.
2026-04-08 17:49:46 +02:00
commit bba0095bc0
23 changed files with 2023 additions and 0 deletions

docs/architecture.md

@@ -0,0 +1,129 @@
# Vela Architecture
## High-Level Architecture
```text
[ Browser (PWA UI) ]
|
WebSocket
|
[ Vela Gateway (NanoPi R6S) ]
|
+--> STT (local or NAS)
+--> Ollama (NAS GPU)
+--> Kokoro TTS (NAS or NanoPi)
+--> Home Assistant
+--> SearXNG
```
## Core Components
- `vela-ui` — browser PWA for audio capture, playback, and state display
- `vela-gateway` — session and orchestration service on the NanoPi R6S
- external providers — STT, Ollama (LLM), Kokoro (TTS), Home Assistant, SearXNG
## Repository Structure
```text
apps/
vela-ui/
vela-gateway/
```
The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.
### Frontend — `vela-ui`
#### Tech
- SvelteKit
- PWA enabled
- WebSocket client
The current implementation is a minimal SvelteKit app with a single starter page. PWA behavior, microphone capture, and the WebSocket client are later increments.
#### Responsibilities
- audio capture from microphone
- audio playback for TTS
- UI state rendering
- session management
- interrupt handling
#### Main Screen
- large mic button
- live transcript
- streamed assistant response text
- state indicator:
- idle
- listening
- thinking
- speaking
- interrupt button during speaking
### Backend — `vela-gateway`
#### Tech
- Fastify (Node)
- WebSocket-based session layer
The current implementation is a minimal Fastify service with `/` and `/health` HTTP endpoints. The WebSocket session layer is a later increment.
#### Responsibilities
- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming
## Voice Pipeline
```text
Mic → Gateway → STT → Transcript
→ LLM → Tool Calls → Results
→ LLM → Final Response
→ TTS → Audio Stream → UI
```
## Gateway Internal Flow
```text
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
→ call LLM
5. LLM decides:
→ direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
```
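The numbered flow above can be sketched as a single turn loop over mocked providers. This is an illustrative sketch only — the `Providers` interface and function names are placeholders, not the real adapter contracts, and the real STT/LLM/TTS calls would be streaming and async:

```typescript
// Sketch of the gateway turn loop with mocked, synchronous providers.
// Event types match the ServerEvent contract in docs/protocol.md.
type Emit = (event: { type: string; [key: string]: unknown }) => void;

interface Providers {
  transcribe(audio: Uint8Array[]): string; // mocked STT (steps 1–4)
  respond(transcript: string): string[];   // mocked LLM text deltas (steps 5–9)
  speak(text: string): Uint8Array[];       // mocked TTS chunks (step 10)
}

function runTurn(audio: Uint8Array[], providers: Providers, emit: Emit): void {
  const transcript = providers.transcribe(audio);
  emit({ type: "final_transcript", text: transcript });
  for (const delta of providers.respond(transcript)) {
    emit({ type: "assistant_text_delta", text: delta });
  }
  for (const chunk of providers.speak(transcript)) {
    emit({ type: "tts_audio_chunk", data: Buffer.from(chunk).toString("base64") });
  }
  emit({ type: "assistant_done" });
}
```

This mirrors the Phase 2 vertical slice: the event ordering is real even though every provider is stubbed.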
## LLM Layer
### Location
- NAS with RTX 3050 8GB
### Role
- intent parsing
- tool selection
- response generation
### Constraints
- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits
## Naming
- system: **Vela**
- gateway: `vela-gateway`
- UI: `vela-ui`
- voice profile: `vela-neutral`

docs/backlog.md

@@ -0,0 +1,183 @@
# Vela Phased Backlog
This backlog is the implementation plan translated into phased, actionable work. It should be updated whenever implementation changes scope, ordering, or done criteria.
## Phase 1 — Foundation and Contracts
### Goal
Establish the boundaries, protocol, and state model for the system before integrating providers.
### Backlog Items
- [x] define repository structure for `vela-ui` and `vela-gateway`
- define the WebSocket event contract used by the UI and gateway
- define the session state machine and interrupt semantics
- define provider adapter interfaces for STT, LLM, TTS, and tools
- document error handling and cancellation behavior
### Exit Criteria
- protocol and state machine are documented
- UI and gateway responsibilities are explicit
- interrupt behavior is defined for every active phase
- provider boundaries are clear enough to implement mocks first
## Phase 2 — Vertical Slice Skeleton
### Goal
Prove the end-to-end interaction model with mocked or stubbed providers.
### Backlog Items
- [x] bootstrap `vela-ui` as a runnable SvelteKit app in the Yarn workspace
- [x] bootstrap `vela-gateway` as a runnable Fastify app in the Yarn workspace
- create a minimal UI with mic control, state indicator, transcript, and response text
- create a gateway WebSocket session skeleton
- implement mocked STT flow for partial and final transcript events
- implement mocked LLM response streaming
- implement stubbed audio playback or placeholder TTS output
- implement interrupt handling across the mocked pipeline
### Exit Criteria
- one client can complete a voice turn through the real UI↔gateway contract
- transcript appears in the UI
- assistant text appears progressively or in structured steps
- audio playback or stubbed playback is visible to the user
- interrupt stops the active response and resets state cleanly
## Phase 3 — Real STT Integration
### Goal
Replace the mocked transcription layer with a real streaming STT provider.
### Backlog Items
- integrate `whisper.cpp` behind the STT adapter
- support partial and final transcript delivery
- handle audio format conversion if browser capture format differs
- handle late transcript events after cancellation
- expose recoverable error handling for STT failures
### Exit Criteria
- live mic audio produces usable transcripts
- partial and final results reach the UI
- cancellation prevents late transcript results from corrupting session state
- STT failure paths are visible and recoverable
## Phase 4 — Ollama Streaming and Tool Calling
### Goal
Replace the mocked reasoning layer with real LLM orchestration.
### Backlog Items
- integrate Ollama behind the LLM adapter
- stream assistant text deltas to the UI
- define and validate tool-calling schema
- reject invalid or unsafe tool calls
- support interrupt during active thinking
### Exit Criteria
- assistant responses stream from Ollama
- invalid tool requests fail safely
- cancellation stops active model work
- the LLM cannot directly execute external actions
## Phase 5 — Tool Layer
### Goal
Introduce useful tools in increasing order of operational risk.
### Backlog Items
- implement SearXNG search adapter
- normalize search results for LLM consumption
- implement Home Assistant read actions
- implement Home Assistant write actions gated by confirmation
- implement clarification flow for ambiguous tool requests
### Exit Criteria
- web search works end-to-end
- Home Assistant read queries work for approved entities
- Home Assistant write actions require explicit confirmation
- ambiguous actions do not execute automatically
## Phase 6 — Kokoro TTS
### Goal
Convert assistant text responses into spoken output.
### Backlog Items
- integrate Kokoro behind the TTS adapter
- support streamed audio when practical
- add a temporary fallback for full-response playback if streaming is not ready
- stop or suppress playback correctly on interrupt
### Exit Criteria
- spoken output plays in the UI
- interrupt stops or suppresses playback reliably
- any non-streaming fallback is explicitly documented as temporary
## Phase 7 — Resilience and Performance
### Goal
Make the system robust enough for routine use on the target hardware.
### Backlog Items
- handle disconnect and reconnect cleanly
- add bounded timeouts for STT, LLM, tool, and TTS calls
- measure latency by pipeline stage
- improve buffering and recovery paths for flaky network dependencies
- validate behavior under cancellation and partial failure
### Exit Criteria
- common network and provider failures do not leave sessions stuck
- latency is measurable at each major stage
- user-visible recovery paths exist for expected failure modes
## Phase 8 — Productization and Secondary Surfaces
### Goal
Polish the system after the core voice loop is reliable.
### Backlog Items
- add PWA installability and UX polish
- implement `/history`
- implement `/settings`
- implement `/admin`
- document operational settings and maintenance guidance
### Exit Criteria
- the app is installable as a PWA
- secondary screens exist without degrading the core voice loop
- supporting docs reflect the implemented behavior
## Ongoing Documentation Tasks
- update docs whenever implementation changes the protocol, architecture, integrations, deployment, or backlog order
- mark completed backlog items or split phases into smaller slices as work progresses
- keep root `README.md` as the entrypoint and keep detailed technical docs in `docs/`
## Current Progress Notes
- `apps/vela-ui` now boots as a minimal SvelteKit app with a starter page
- `apps/vela-gateway` now boots as a minimal Fastify app with `/` and `/health` endpoints
- backend framework choice is now concrete: Fastify

docs/deployment.md

@@ -0,0 +1,67 @@
# Vela Deployment and Operations
## Deployment Layout
### NanoPi R6S
```yaml
services:
ui:
build: ./apps/vela-ui
gateway:
build: ./apps/vela-gateway
environment:
OLLAMA_URL: http://nas:11434
KOKORO_URL: http://nas:8880
HASS_URL: http://homeassistant:8123
SEARXNG_URL: http://searxng:8080/search
```
### NAS
```yaml
services:
ollama:
image: ollama/ollama
kokoro:
image: kokoro-tts
```
## Networking
- all services should be reachable on the internal network
- expected reverse proxy routes:
- `/` → UI
- `/api` or `/health` → gateway HTTP routes
- `/ws` → WebSocket
## Security
- Home Assistant token stored server-side only
- no secrets in the frontend
- internal network isolation preferred
- optional gateway auth can be added later if deployment needs it
## Performance Targets
- wake to response start: under 1.5s
- STT latency: under 800ms
- TTS start latency: under 500ms
- full roundtrip: under 3s
## Key Risks
| Risk | Mitigation |
| --- | --- |
| STT latency on NanoPi | move STT to NAS |
| TTS performance | run TTS on NAS |
| LLM hallucinating actions | enforce strict tool schema |
| WebSocket instability | add heartbeat and reconnect handling |
| Audio sync issues | use chunked streaming and buffering |
## Documentation Maintenance
- update this document when deployment topology, networking, or service placement changes
- keep performance targets and risk mitigations aligned with the current implementation state

docs/integrations.md

@@ -0,0 +1,97 @@
# Vela Integrations and Tool Safety
## Current Runtime Baseline
- `vela-ui` is implemented as a SvelteKit application
- `vela-gateway` is implemented as a Fastify service
- integration work beyond the gateway's HTTP baseline has not yet been implemented
## STT (Speech-to-Text)
### Primary Option
- `whisper.cpp`
### Deployment
- start on NanoPi
- move to the NAS if NanoPi latency is too high
### Requirements
- streaming transcription
- partial and final output
- low latency, with sub-second response preferred
## TTS (Text-to-Speech)
### Engine
- Kokoro TTS
### Deployment
- prefer NAS for more compute headroom
### API Contract
```http
POST /speak
{
"text": "...",
"voice": "vela",
"format": "wav"
}
```
### Requirements
- streaming audio preferred
- low startup latency
- interrupt support
## Tool System
### Home Assistant Tool
#### Functions
```ts
turn_on(entity_id);
turn_off(entity_id);
set_temperature(entity_id, value);
get_state(entity_id);
```
#### Backend
- REST API
- optional Conversation API
#### Safety
- require confirmation for destructive actions
- require confirmation for irreversible or significant state changes
- keep secrets server-side only
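The confirmation rule can be enforced as a small gate in front of the Home Assistant adapter. A minimal sketch, assuming the function names from the tool list above; the `gate` helper and its return values are illustrative, not a defined API:

```typescript
// Illustrative confirmation gate for Home Assistant tool calls.
type HaCall = {
  fn: "turn_on" | "turn_off" | "set_temperature" | "get_state";
  entityId: string;
};

// State-changing functions require explicit user confirmation.
const WRITE_FUNCTIONS = new Set<string>(["turn_on", "turn_off", "set_temperature"]);

// Reads execute immediately; writes execute only once confirmed.
function gate(call: HaCall, confirmed: boolean): "execute" | "confirm" {
  if (!WRITE_FUNCTIONS.has(call.fn)) return "execute";
  return confirmed ? "execute" : "confirm";
}
```

The point of the design is that the LLM can only *request* a write; the gateway owns the decision to execute.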
### SearXNG Tool
#### Endpoint
```http
GET /search?q=...&format=json
```
#### Flow
- query SearXNG
- return top results
- let the LLM summarize the result set
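The normalization step above might look like the following sketch. The `results` array with `title`, `url`, and `content` fields matches SearXNG's JSON output; the field limits chosen here are illustrative assumptions, tuned to keep the LLM prompt small:

```typescript
// Minimal normalization of a SearXNG JSON response for LLM consumption.
interface SearxResult {
  title: string;
  url: string;
  content?: string; // SearXNG's snippet field; may be absent
}

function normalizeResults(raw: { results: SearxResult[] }, limit = 5) {
  return raw.results.slice(0, limit).map((r) => ({
    title: r.title,
    url: r.url,
    snippet: (r.content ?? "").slice(0, 300), // truncate to keep prompts small
  }));
}
```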
## Safety Rules
- the LLM does not directly control systems
- all external actions go through explicit tool adapters
- Home Assistant write actions require confirmation
- frontend must not contain Home Assistant tokens or other secrets
- ambiguous tool intents should be clarified instead of guessed

docs/overview.md

@@ -0,0 +1,92 @@
# Vela Overview
## Objective
Vela is a fully local, voice-first assistant system with:
- local-first architecture and no mandatory cloud dependencies
- natural TTS output via Kokoro
- voice-driven interaction as the primary interface
- integrations with Home Assistant and SearXNG
- a lightweight SvelteKit PWA
- remote LLM inference via Ollama on a NAS
## Core Design Principles
### Voice-first
- UI optimized for speaking instead of typing
- minimal visual clutter
- real-time feedback through partial transcripts and streaming responses
### Local-first
- no required cloud APIs
- all services self-hosted
- browser used for capture and playback only
### Tool-driven intelligence
- the LLM does not directly control external systems
- all external actions route through explicit tools
### Low-latency interaction
- streaming STT partial results
- streaming LLM token output
- streaming TTS audio chunks
- interruptible responses
## Product Scope
### Primary Interface
- browser-based PWA
- push-to-talk interaction
- transcript and response display
- playback of streamed or returned audio
### Secondary Screens
- `/history`
- `/settings`
- `/admin`
These screens are lower priority than the main voice loop and should be implemented after the core interaction path is stable.
## Repository Layout
- `apps/vela-ui` — minimal SvelteKit browser UI
- `apps/vela-gateway` — minimal Fastify gateway service
- `docs/` — technical documentation and phased backlog
Use Yarn workspaces from the repository root to manage these packages.
## Primary User Flow
```text
User presses mic
→ audio streaming starts
→ transcript appears
→ final transcript sent
→ assistant processes
→ response streams as text and audio
→ user can interrupt anytime
```
## Non-Goals for v1
- full conversational memory system
- emotion simulation or personality modeling
- multi-user identity separation
- offline LLM on the NanoPi
- wake word and other future extensions listed in architecture docs
## Documentation Map
- [Architecture](architecture.md)
- [Protocol](protocol.md)
- [Integrations](integrations.md)
- [Deployment](deployment.md)
- [Setup](setup.md)
- [Backlog](backlog.md)

docs/protocol.md

@@ -0,0 +1,65 @@
# Vela Protocol and State Machine
## Event Protocol
### Client → Server
```ts
type ClientEvent =
| { type: "start_listening" }
| { type: "stop_listening" }
| { type: "audio_chunk"; data: string } // PCM16 base64
| { type: "interrupt" };
```
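The `audio_chunk` payload is base64-encoded PCM16. A sketch of the encode/decode pair, written with Node's `Buffer` for brevity (in the browser the UI would encode the `Int16Array` bytes with `btoa` instead); explicit little-endian reads/writes are an assumption to keep both ends unambiguous:

```typescript
// Encode PCM16 samples as the base64 `audio_chunk` payload (explicit LE).
function encodeAudioChunk(samples: Int16Array): string {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) buf.writeInt16LE(samples[i], i * 2);
  return buf.toString("base64");
}

// Decode a received `audio_chunk` payload back into PCM16 samples.
function decodeAudioChunk(data: string): Int16Array {
  const bytes = Buffer.from(data, "base64");
  const out = new Int16Array(bytes.length / 2);
  for (let i = 0; i < out.length; i++) out[i] = bytes.readInt16LE(i * 2);
  return out;
}
```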
### Server → Client
```ts
type ServerEvent =
| { type: "state"; value: "idle" | "listening" | "thinking" | "speaking" }
| { type: "partial_transcript"; text: string }
| { type: "final_transcript"; text: string }
| { type: "assistant_text_delta"; text: string }
| { type: "tool_call_started"; tool: string }
| { type: "tool_call_finished"; tool: string; result: unknown }
| { type: "tts_audio_chunk"; data: string }
| { type: "assistant_done" }
| { type: "error"; message: string };
```
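Because `ServerEvent` is a discriminated union, the UI can dispatch on `type` with an exhaustive `switch`, which makes additive protocol changes surface as compile errors in handlers. A sketch (the union is repeated so the snippet is self-contained; the `describe` handler bodies are placeholders):

```typescript
// UI-side dispatch over the ServerEvent union from the contract above.
type ServerEvent =
  | { type: "state"; value: "idle" | "listening" | "thinking" | "speaking" }
  | { type: "partial_transcript"; text: string }
  | { type: "final_transcript"; text: string }
  | { type: "assistant_text_delta"; text: string }
  | { type: "tool_call_started"; tool: string }
  | { type: "tool_call_finished"; tool: string; result: unknown }
  | { type: "tts_audio_chunk"; data: string }
  | { type: "assistant_done" }
  | { type: "error"; message: string };

function describe(event: ServerEvent): string {
  switch (event.type) {
    case "state": return `state: ${event.value}`;
    case "partial_transcript":
    case "final_transcript":
    case "assistant_text_delta": return event.text;
    case "tool_call_started": return `tool ${event.tool} started`;
    case "tool_call_finished": return `tool ${event.tool} finished`;
    case "tts_audio_chunk": return `${event.data.length} base64 chars of audio`;
    case "assistant_done": return "done";
    case "error": return `error: ${event.message}`;
  }
}
```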
## State Machine
```text
idle
→ listening
→ thinking
→ speaking
→ idle
```
Interrupt can occur at:
- listening → restart
- thinking → cancel
- speaking → stop immediately
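The interrupt rules above can be written down as a small transition table. Note the post-interrupt targets for `thinking` and `speaking` are an assumption here (the requirements below leave listening-vs-idle as a UX decision); this sketch picks `idle`:

```typescript
// Session states from the state machine above.
type State = "idle" | "listening" | "thinking" | "speaking";

// Where `interrupt` lands from each state, per the rules above.
// thinking/speaking → idle is one UX option, not a settled decision.
const INTERRUPT_TARGET: Record<State, State> = {
  idle: "idle",           // no-op
  listening: "listening", // restart listening
  thinking: "idle",       // cancel active model work
  speaking: "idle",       // stop playback immediately
};

function onInterrupt(current: State): State {
  return INTERRUPT_TARGET[current];
}
```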
## Interrupt Handling Requirements
- immediate stop of TTS playback
- immediate stop of LLM streaming
- reset session state to listening or idle, depending on UX decision
### Mechanism
The `interrupt` event cancels:
- TTS process
- current LLM request
- tool execution when possible
## Protocol Notes for Implementation
- keep the protocol backward compatible when possible
- prefer additive event changes over breaking renames
- document protocol updates in this file whenever implementation changes behavior
- when implementation diverges from the initial contract, update this document in the same change


@@ -0,0 +1,31 @@
# README Migration Map
This file maps the original README sections to their new documentation locations after the restructure.
| Old README section | Status | New location | Notes |
| --- | --- | --- | --- |
| Objective | Migrated | `docs/overview.md` | Covered in the Objective section. |
| System overview | Migrated | `docs/overview.md`, `docs/architecture.md` | Split between product scope and high-level architecture. |
| Components | Migrated | `docs/architecture.md` | Covered in Core Components. |
| Voice pipeline | Migrated | `docs/architecture.md` | Covered in Voice Pipeline. |
| Protocol | Migrated | `docs/protocol.md` | Covered in Event Protocol. |
| STT | Migrated | `docs/integrations.md` | Covered in STT (Speech-to-Text). |
| TTS | Migrated | `docs/integrations.md` | Covered in TTS (Text-to-Speech). |
| LLM layer | Migrated | `docs/architecture.md` | Covered in LLM Layer. |
| Tool system | Migrated | `docs/integrations.md` | Covered in Tool System and Safety Rules. |
| Gateway flow | Migrated | `docs/architecture.md` | Covered in Gateway Internal Flow. |
| Interrupt handling | Migrated | `docs/protocol.md` | Covered in Interrupt Handling Requirements and Mechanism. |
| State machine | Migrated | `docs/protocol.md` | Covered in State Machine. |
| Deployment | Migrated | `docs/deployment.md` | Covered in Deployment Layout. |
| Networking | Migrated | `docs/deployment.md` | Covered in Networking. |
| Security | Migrated | `docs/deployment.md`, `docs/integrations.md` | Deployment covers hosting/security posture; integrations covers tool safety. |
| Performance targets | Migrated | `docs/deployment.md` | Covered in Performance Targets. |
| Future extensions | Partially migrated | `docs/backlog.md`, `docs/overview.md` | Future work is tracked in the phased backlog; v1 exclusions are noted in Non-Goals for v1. |
| Non-goals | Migrated | `docs/overview.md` | Covered in Non-Goals for v1. |
| Naming | Migrated | `docs/architecture.md` | Covered in Naming. |
| Implementation order | Migrated | `docs/backlog.md` | Reframed as phased implementation backlog. |
| Key risks | Migrated | `docs/deployment.md` | Covered in Key Risks. |
## Intentionally not migrated as standalone sections
- `Future extensions` was not kept as its own top-level document section. It was intentionally folded into `docs/backlog.md` and `docs/overview.md` to keep future work and v1 exclusions close to planning and scope.

docs/setup.md

@@ -0,0 +1,72 @@
# Vela Setup and Workspace Layout
## Tooling and Package Management
- Use **mise** to provision repo tools.
- Use **Yarn** for dependency management and workspace commands in this repository.
The repo-level tool configuration lives in `mise.toml`.
## Workspace Layout
```text
apps/
vela-ui/
vela-gateway/
docs/
AGENTS.md
README.md
mise.toml
package.json
```
## Workspace Purpose
### `apps/vela-ui`
- minimal SvelteKit browser application
- current starter page confirms the workspace boots correctly
- intended to grow into the SvelteKit PWA implementation
### `apps/vela-gateway`
- minimal Fastify gateway service
- current HTTP endpoints provide a runnable baseline at `/` and `/health`
- intended to grow into the WebSocket session and orchestration layer
## Initial Commands
Install repo tools:
```bash
mise install
```
Install dependencies:
```bash
mise exec -- yarn install
```
Run the current workspaces:
```bash
mise exec -- yarn dev:ui
mise exec -- yarn dev:gateway
```
Additional verification commands:
```bash
mise exec -- yarn check:ui
mise exec -- yarn build:ui
mise exec -- yarn build:gateway
```
## Notes
- the concrete framework choices are now SvelteKit for `vela-ui` and Fastify for `vela-gateway`
- the UI is intentionally minimal and does not yet include mic capture, transcript rendering, or WebSocket session state
- the gateway is intentionally minimal and does not yet expose the planned WebSocket contract
- if your shell is configured for mise activation, plain `yarn` commands can be used after `mise install`
- update this document when the repo layout or package manager workflow changes