Add a minimal UI shell that connects to the gateway WebSocket and exposes developer-visible session state. Align the architecture, protocol, setup, integration, and backlog docs with the current UI increment.
Vela Phased Backlog
This backlog is the implementation plan translated into phased, actionable work. It should be updated whenever implementation changes scope, ordering, or done criteria.
Phase 1 — Foundation and Contracts
Goal
Establish the boundaries, protocol, and state model for the system before integrating providers.
Backlog Items
- define repository structure for vela-ui and vela-gateway
- define the WebSocket event contract used by the UI and gateway via shared package
- define the session state machine and interrupt semantics
- define provider adapter interfaces for STT, LLM, TTS, and tools
- document error handling and cancellation behavior
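
As an illustration of the event-contract item above, the following is a minimal sketch of a typed client-to-gateway event union with a defensive parser. The event names (session.start, audio.chunk, interrupt), the seq field, and the parseClientEvent helper are assumptions made for this example; the actual contract is whatever the shared protocol package defines.

```typescript
// Hypothetical client-to-gateway events; names and fields are illustrative only.
type ClientEvent =
  | { type: "session.start" }
  | { type: "audio.chunk"; seq: number }
  | { type: "interrupt" };

// Parses a raw WebSocket text frame. Returns null instead of throwing,
// so a malformed message cannot crash the session handler.
function parseClientEvent(raw: string): ClientEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const msg = data as Record<string, unknown>;
  switch (msg.type) {
    case "session.start":
      return { type: "session.start" };
    case "interrupt":
      return { type: "interrupt" };
    case "audio.chunk":
      // seq must be a non-negative integer
      return typeof msg.seq === "number" && msg.seq >= 0 && Math.floor(msg.seq) === msg.seq
        ? { type: "audio.chunk", seq: msg.seq }
        : null;
    default:
      return null; // unknown or missing event type
  }
}
```

Returning null for anything unrecognized keeps the error-handling decision (ignore, log, or close the socket) in one place in the caller.
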
Exit Criteria
- protocol and state machine are documented
- UI and gateway responsibilities are explicit
- interrupt behavior is defined for every active phase
- provider boundaries are clear enough to implement mocks first
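
The criterion that interrupt behavior is defined for every active phase can be made checkable with a transition table. The sketch below assumes four states (idle, listening, thinking, speaking) and a small set of event names; these are placeholders for illustration, not the documented state machine.

```typescript
// Hypothetical session states and events; the real names belong to the protocol docs.
type SessionState = "idle" | "listening" | "thinking" | "speaking";
type SessionEvent =
  | "start_listening"
  | "final_transcript"
  | "response_ready"
  | "playback_done"
  | "interrupt";

// Transition table: each state lists the events it reacts to.
const transitions: Record<SessionState, Partial<Record<SessionEvent, SessionState>>> = {
  idle: { start_listening: "listening" },
  listening: { final_transcript: "thinking", interrupt: "idle" },
  thinking: { response_ready: "speaking", interrupt: "idle" },
  speaking: { playback_done: "idle", interrupt: "idle" },
};

// Returns the next state, or the current state if the event is not valid here.
function nextState(state: SessionState, event: SessionEvent): SessionState {
  return transitions[state][event] ?? state;
}

// Active phases are every state other than idle; each must handle interrupt.
const activeStates: SessionState[] = ["listening", "thinking", "speaking"];
const interruptCovered = activeStates.every(
  (s) => transitions[s].interrupt !== undefined
);
```

A check like interruptCovered can run as a unit test so that adding a new active state without an interrupt transition fails fast.
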
Phase 2 — Vertical Slice Skeleton
Goal
Prove the end-to-end interaction model with mocked or stubbed providers.
Backlog Items
- bootstrap vela-ui as a runnable SvelteKit app in the Yarn workspace
- bootstrap vela-gateway as a runnable Fastify app in the Yarn workspace
- add the first UI voice-session shell with connect/disconnect controls and explicit WebSocket status
- create a minimal UI with mic control, transcript, and response text
- create a gateway WebSocket session skeleton
- implement mocked STT flow for partial and final transcript events
- implement mocked LLM response streaming
- implement stubbed audio playback or placeholder TTS output
- implement interrupt handling across the mocked pipeline
Exit Criteria
- one client can complete a voice turn through the real UI↔gateway contract
- transcript appears in the UI
- assistant text appears progressively or in structured steps
- audio playback or stubbed playback is visible to the user
- interrupt stops the active response and resets state cleanly
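
A mocked provider only needs to respect the same cancellation signal that a real one would. The following sketch shows one possible shape for a mocked LLM adapter that streams text deltas and stops when interrupted. The CancelToken and MockLlmAdapter names are hypothetical, and the synchronous callback style is a simplification; a real adapter would likely be asynchronous.

```typescript
// Hypothetical cancellation flag shared between the session and its providers.
interface CancelToken {
  cancelled: boolean;
}

// Hypothetical adapter shape for a mocked LLM; not the project's real interface.
interface MockLlmAdapter {
  // Emits text deltas through onDelta until done or the token is cancelled.
  streamResponse(
    prompt: string,
    token: CancelToken,
    onDelta: (delta: string) => void
  ): void;
}

const mockLlm: MockLlmAdapter = {
  streamResponse(prompt, token, onDelta) {
    const words = ("You said: " + prompt).split(" ");
    for (let i = 0; i < words.length; i++) {
      if (token.cancelled) return; // interrupt: stop emitting immediately
      onDelta(i === 0 ? words[i] : " " + words[i]);
    }
  },
};

// Simulate a turn that the user interrupts after two deltas have arrived.
const turnToken: CancelToken = { cancelled: false };
const receivedDeltas: string[] = [];
mockLlm.streamResponse("turn on the lights", turnToken, (delta) => {
  receivedDeltas.push(delta);
  if (receivedDeltas.length === 2) turnToken.cancelled = true;
});
```

After the simulated interrupt, receivedDeltas holds only the two deltas emitted before cancellation, which is the behavior the last exit criterion asks for.
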
Phase 3 — Real STT Integration
Goal
Replace the mocked transcription layer with a real streaming STT provider.
Backlog Items
- integrate whisper.cpp behind the STT adapter
- support partial and final transcript delivery
- handle audio format conversion if browser capture format differs
- handle late transcript events after cancellation
- expose recoverable error handling for STT failures
Exit Criteria
- live mic audio produces usable transcripts
- partial and final results reach the UI
- cancellation prevents late transcript results from corrupting session state
- STT failure paths are visible and recoverable
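
One common way to keep late transcript results from corrupting session state is to tag each turn with an id and drop any result whose id is stale. The sketch below assumes a hypothetical TranscriptEvent shape and TranscriptGate class; it is not tied to whisper.cpp or to the real STT adapter.

```typescript
// Hypothetical transcript event; turnId identifies the turn that produced it.
interface TranscriptEvent {
  turnId: number;
  text: string;
  final: boolean;
}

class TranscriptGate {
  private currentTurn = 0;
  private transcript = "";

  // Starts a new turn and returns its id for tagging STT requests.
  beginTurn(): number {
    this.currentTurn += 1;
    this.transcript = "";
    return this.currentTurn;
  }

  // Cancelling advances the turn id, so in-flight results become stale.
  cancel(): void {
    this.currentTurn += 1;
    this.transcript = "";
  }

  // Applies an event only if it belongs to the current turn.
  accept(event: TranscriptEvent): boolean {
    if (event.turnId !== this.currentTurn) return false; // late result, dropped
    this.transcript = event.text;
    return true;
  }

  getText(): string {
    return this.transcript;
  }
}
```

Because stale events are rejected by comparison rather than by unsubscribing, the guard still holds if the provider cannot cancel in-flight work promptly.
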
Phase 4 — Ollama Streaming and Tool Calling
Goal
Replace the mocked reasoning layer with real LLM orchestration.
Backlog Items
- integrate Ollama behind the LLM adapter
- stream assistant text deltas to the UI
- define and validate tool-calling schema
- reject invalid or unsafe tool calls
- support interrupt during active thinking
Exit Criteria
- assistant responses stream from Ollama
- invalid tool requests fail safely
- cancellation stops active model work
- the LLM cannot directly execute external actions
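
To make "invalid tool requests fail safely" concrete, here is a minimal sketch of an allowlist-based validator. The tool names (web_search, ha_get_state), the flat argument schema, and the validateToolCall function are all assumptions for illustration. The validator only returns a checked request; executing it remains the gateway's decision, which matches the criterion that the LLM cannot directly execute external actions.

```typescript
// Hypothetical tool registry; tool names and argument shapes are illustrative.
type ArgType = "string" | "number" | "boolean";
const toolSchemas: Record<string, Record<string, ArgType>> = {
  web_search: { query: "string" },
  ha_get_state: { entity_id: "string" },
};

type ValidationResult =
  | { ok: true; name: string; args: Record<string, unknown> }
  | { ok: false; reason: string };

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Validates a model-proposed tool call against the allowlist and schema.
function validateToolCall(name: string, args: unknown): ValidationResult {
  if (!hasOwn(toolSchemas, name)) {
    return { ok: false, reason: "unknown tool: " + name };
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, reason: "arguments must be an object" };
  }
  const schema = toolSchemas[name];
  const given = args as Record<string, unknown>;
  for (const key of Object.keys(given)) {
    if (!hasOwn(schema, key)) {
      return { ok: false, reason: "unexpected argument: " + key };
    }
  }
  for (const key of Object.keys(schema)) {
    if (typeof given[key] !== schema[key]) {
      return { ok: false, reason: "invalid or missing argument: " + key };
    }
  }
  return { ok: true, name, args: given };
}
```

Rejecting unexpected arguments as well as missing ones keeps the model from smuggling extra parameters into a tool that might later start honoring them.
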
Phase 5 — Tool Layer
Goal
Introduce useful tools in increasing order of operational risk.
Backlog Items
- implement SearXNG search adapter
- normalize search results for LLM consumption
- implement Home Assistant read actions
- implement Home Assistant write actions gated by confirmation
- implement clarification flow for ambiguous tool requests
Exit Criteria
- web search works end-to-end
- Home Assistant read queries work for approved entities
- Home Assistant write actions require explicit confirmation
- ambiguous actions do not execute automatically
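
The ordering by operational risk can be expressed as a small decision function: reads on approved entities run directly, writes wait for confirmation, and anything outside the allowlist is refused. The ToolAction shape, the entity ids, and the decide function below are hypothetical and do not reflect the Home Assistant API.

```typescript
// Hypothetical action model; kinds and field names are illustrative.
type ToolAction =
  | { kind: "read"; entityId: string }
  | { kind: "write"; entityId: string; service: string };

type Decision = "execute" | "needs_confirmation" | "rejected";

// Hypothetical allowlist of entities the assistant may touch.
const approvedEntities = ["light.kitchen", "sensor.outdoor_temperature"];

// Decides what the gateway should do with a validated tool action.
function decide(action: ToolAction, confirmedByUser: boolean): Decision {
  if (approvedEntities.indexOf(action.entityId) === -1) return "rejected";
  if (action.kind === "read") return "execute";
  // Writes never run without an explicit confirmation from the user.
  return confirmedByUser ? "execute" : "needs_confirmation";
}
```

In this sketch, a needs_confirmation result would trigger the clarification flow, and the same action would be re-submitted with confirmedByUser set to true once the user agrees.
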
Phase 6 — Kokoro TTS
Goal
Convert assistant text responses into spoken output.
Backlog Items
- integrate Kokoro behind the TTS adapter
- support streamed audio when practical
- add a temporary fallback for full-response playback if streaming is not ready
- stop or suppress playback correctly on interrupt
Exit Criteria
- spoken output plays in the UI
- interrupt stops or suppresses playback reliably
- any non-streaming fallback is explicitly documented as temporary
Phase 7 — Resilience and Performance
Goal
Make the system robust enough for routine use on the target hardware.
Backlog Items
- handle disconnect and reconnect cleanly
- add bounded timeouts for STT, LLM, tool, and TTS calls
- measure latency by pipeline stage
- improve buffering and recovery paths for flaky network dependencies
- validate behavior under cancellation and partial failure
Exit Criteria
- common network and provider failures do not leave sessions stuck
- latency is measurable at each major stage
- user-visible recovery paths exist for expected failure modes
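
For the latency criterion, a per-stage timer with an injectable clock is one simple option, since it can be tested without real delays. The StageTimer class and the stage names below are assumptions for this sketch.

```typescript
// Hypothetical stage names matching the pipeline described in this backlog.
type Stage = "stt" | "llm" | "tool" | "tts";

class StageTimer {
  private started: Partial<Record<Stage, number>> = {};
  private durations: Partial<Record<Stage, number>> = {};

  // The clock is injected so tests can use a fake; defaults to wall-clock ms.
  constructor(private now: () => number = () => Date.now()) {}

  start(stage: Stage): void {
    this.started[stage] = this.now();
  }

  // Records elapsed milliseconds; ignores end() without a matching start().
  end(stage: Stage): void {
    const t0 = this.started[stage];
    if (t0 === undefined) return;
    this.durations[stage] = this.now() - t0;
    delete this.started[stage];
  }

  // Returns a copy so callers cannot mutate the recorded durations.
  report(): Partial<Record<Stage, number>> {
    return { ...this.durations };
  }
}
```

A usage example with a fake clock: start "stt" at 0 and end it at 120, then start "llm" at 120 and end it at 470; report() then contains stt: 120 and llm: 350.
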
Phase 8 — Productization and Secondary Surfaces
Goal
Polish the system after the core voice loop is reliable.
Backlog Items
- add PWA installability and UX polish
- implement /history
- implement /settings
- implement /admin
- document operational settings and maintenance guidance
Exit Criteria
- the app is installable as a PWA
- secondary screens exist without degrading the core voice loop
- supporting docs reflect the implemented behavior
Ongoing Documentation Tasks
- update docs whenever implementation changes the protocol, architecture, integrations, deployment, or backlog order
- mark completed backlog items or split phases into smaller slices as work progresses
- keep root README.md as the entrypoint and keep detailed technical docs in docs/
Current Progress Notes
- apps/vela-ui now boots as a minimal SvelteKit app with a starter page
- apps/vela-ui now includes a minimal voice-session shell that can connect to the gateway /ws endpoint and display developer-visible session status
- apps/vela-gateway now boots as a minimal Fastify app with / and /health endpoints
- apps/vela-gateway now exposes a minimal /ws WebSocket session skeleton with ephemeral in-memory sessions and defensive message handling
- apps/vela-protocol now provides the shared WebSocket event contract for the UI and gateway
- backend framework choice is now concrete: Fastify