Vela Phased Backlog
This backlog is the implementation plan translated into phased, actionable work. It should be updated whenever the implementation changes scope, ordering, or exit criteria.
Phase 1 — Foundation and Contracts
Goal
Establish the boundaries, protocol, and state model for the system before integrating providers.
Backlog Items
- define the repository structure for `vela-ui` and `vela-gateway`
- define the WebSocket event contract used by the UI and gateway via a shared package
- define the session state machine and interrupt semantics
- define provider adapter interfaces for STT, LLM, TTS, and tools
- document error handling and cancellation behavior
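A minimal sketch of how the shared event contract could be expressed as a TypeScript discriminated union, with defensive parsing on the receiving side. The client event names `input_audio.append`, `input_audio.commit`, `mocked.turn.trigger`, and `response.cancel` come from the progress notes in this document. The payload fields, every server event name, and the `parseClientEvent` helper are assumptions for illustration, not the actual contents of the shared package.

```typescript
// Hypothetical sketch of a shared UI <-> gateway event contract.
// Client -> gateway. The `type` strings come from the progress notes;
// the payload fields are assumptions.
export type ClientEvent =
  | { type: "input_audio.append"; audio: string } // assumed: base64-encoded chunk
  | { type: "input_audio.commit" }
  | { type: "mocked.turn.trigger" }
  | { type: "response.cancel" };

// Gateway -> client. All of these names and fields are hypothetical.
export type ServerEvent =
  | { type: "session.state"; state: "idle" | "listening" | "thinking" | "speaking" }
  | { type: "transcript.partial"; turnId: number; text: string }
  | { type: "transcript.final"; turnId: number; text: string }
  | { type: "response.text.delta"; turnId: number; delta: string }
  | { type: "error"; code: string; message: string; recoverable: boolean };

const CLIENT_EVENT_TYPES = new Set<string>([
  "input_audio.append",
  "input_audio.commit",
  "mocked.turn.trigger",
  "response.cancel",
]);

// Defensive parsing: anything that is not JSON with a known `type` (and the
// fields that type requires) is rejected before it reaches session logic.
export function parseClientEvent(raw: string): ClientEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string" || !CLIENT_EVENT_TYPES.has(type)) return null;
  if (type === "input_audio.append" && typeof (value as { audio?: unknown }).audio !== "string") {
    return null;
  }
  return value as ClientEvent;
}
```

Keeping the union in one shared module means the UI and the gateway fail to compile together when an event changes shape, instead of drifting apart at runtime.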
Exit Criteria
- protocol and state machine are documented
- UI and gateway responsibilities are explicit
- interrupt behavior is defined for every active phase
- provider boundaries are clear enough to implement mocks first
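One way to make "interrupt behavior is defined for every active phase" checkable is a transition table plus a single interrupt rule. This is a minimal sketch assuming a strictly sequential pipeline; only the `idle` phase is named elsewhere in this document, and the other phase and input names are hypothetical.

```typescript
// Hypothetical session phases; only `idle` appears in the progress notes.
export type SessionPhase = "idle" | "listening" | "transcribing" | "thinking" | "speaking";

// Hypothetical inputs that drive the session forward, plus a user interrupt.
export type SessionInput =
  | "audio.start"
  | "audio.commit"
  | "transcript.final"
  | "response.done"
  | "playback.done"
  | "interrupt";

// Normal forward transitions. Any pair not listed here is rejected.
const TRANSITIONS: Record<SessionPhase, Partial<Record<SessionInput, SessionPhase>>> = {
  idle: { "audio.start": "listening" },
  listening: { "audio.commit": "transcribing" },
  transcribing: { "transcript.final": "thinking" },
  thinking: { "response.done": "speaking" },
  speaking: { "playback.done": "idle" },
};

// Interrupt semantics: from every active phase an interrupt returns to `idle`;
// from `idle` it is a no-op, so duplicate cancels are harmless.
export function transition(phase: SessionPhase, input: SessionInput): SessionPhase {
  if (input === "interrupt") return "idle";
  const next = TRANSITIONS[phase][input];
  if (next === undefined) {
    throw new Error(`invalid transition: ${phase} --${input}-->`);
  }
  return next;
}
```

Treating interrupt as a rule outside the table guarantees that every phase has a defined cancel path, including phases added later.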
Phase 2 — Vertical Slice Skeleton
Goal
Prove the end-to-end interaction model with mocked or stubbed providers.
Backlog Items
- bootstrap `vela-ui` as a runnable SvelteKit app in the Yarn workspace
- bootstrap `vela-gateway` as a runnable Fastify app in the Yarn workspace
- add the first UI voice-session shell with connect/disconnect controls and explicit WebSocket status
- create a minimal mocked-turn UI with transcript and response text over the shared WebSocket session
- create a minimal UI with mic control
- create a gateway WebSocket session skeleton
- implement a mocked transcript/response vertical slice over the existing WebSocket session
- implement mocked STT flow for partial transcript events
- implement mocked LLM response streaming beyond the fixed deterministic slice
- implement stubbed audio playback or placeholder TTS output
- implement interrupt handling across the mocked pipeline
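A common way to thread interrupt handling through the mocked pipeline is one `AbortSignal` per turn that every stage checks. This sketch assumes that approach; `mockedResponse` and `runMockedTurn` are hypothetical names, and the word-by-word stream stands in for mocked LLM streaming.

```typescript
// Hypothetical mocked LLM stream: yields one word at a time with a short delay.
async function* mockedResponse(
  text: string,
  signal: AbortSignal,
  delayMs: number,
): AsyncGenerator<string> {
  for (const word of text.split(" ")) {
    if (signal.aborted) return; // turn cancelled: produce nothing more
    await new Promise<void>((resolve) => setTimeout(() => resolve(), delayMs));
    if (signal.aborted) return; // re-check after waiting so no delta leaks out post-cancel
    yield `${word} `;
  }
}

// Drives one mocked turn, forwarding deltas to `emit` until it completes or is cancelled.
export async function runMockedTurn(
  text: string,
  signal: AbortSignal,
  emit: (delta: string) => void,
  delayMs = 5,
): Promise<"completed" | "cancelled"> {
  for await (const delta of mockedResponse(text, signal, delayMs)) {
    emit(delta);
  }
  return signal.aborted ? "cancelled" : "completed";
}
```

In the gateway, a `response.cancel` event would call `abort()` on the controller owned by the session's active turn, and the session would then return to `idle`.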
Exit Criteria
- one client can complete a voice turn through the real UI↔gateway contract
- transcript appears in the UI
- assistant text appears progressively or in structured steps
- audio playback or stubbed playback is visible to the user
- interrupt stops the active response and resets state cleanly
Phase 3 — Real STT Integration
Goal
Replace the mocked transcription layer with a real streaming STT provider.
Backlog Items
- integrate `whisper.cpp` behind the STT adapter
- support partial and final transcript delivery
- handle audio format conversion if the browser capture format differs from what the STT provider expects
- handle late transcript events after cancellation
- expose recoverable error handling for STT failures
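whisper.cpp works on 16 kHz mono audio, while browser capture through the Web Audio API typically yields float32 samples at the device rate (often 44.1 or 48 kHz). A minimal conversion sketch, assuming mono input and assuming the STT adapter forwards 16-bit signed PCM; the function names are hypothetical. Linear interpolation keeps the sketch short but does not low-pass filter, so a production path should use a proper resampler to avoid aliasing.

```typescript
// Assumed target format for the STT adapter: 16 kHz mono, 16-bit signed PCM.
export const TARGET_SAMPLE_RATE = 16_000;

// Linear-interpolation resampler for mono float32 audio (no anti-aliasing filter).
export function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input.slice();
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Weighted average of the two neighbouring source samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Clamp to [-1, 1] and scale into the signed 16-bit range.
export function floatToInt16(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}
```

Where the conversion runs (in the browser before `input_audio.append`, or in the gateway before the STT adapter) is a design choice; converting in the browser reduces upstream bandwidth by roughly a factor of six at 48 kHz.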
Exit Criteria
- live mic audio produces usable transcripts
- partial and final results reach the UI
- cancellation prevents late transcript results from corrupting session state
- STT failure paths are visible and recoverable
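A minimal sketch of one way to keep late transcript events from corrupting session state: tag every STT event with the id of the turn that produced it and drop events whose turn is no longer active. `TurnGuard`, `TranscriptEvent`, and `applyTranscript` are hypothetical names, and the sketch assumes the STT adapter can echo a turn id back with each result.

```typescript
// Tracks which turn is currently allowed to update session state.
export class TurnGuard {
  private currentTurnId = 0;
  private active = false;

  // Begin a new turn; ids from earlier turns immediately become stale.
  startTurn(): number {
    this.currentTurnId += 1;
    this.active = true;
    return this.currentTurnId;
  }

  // Cancel (or finish) the active turn; its id stops being accepted.
  endTurn(): void {
    this.active = false;
  }

  accepts(turnId: number): boolean {
    return this.active && turnId === this.currentTurnId;
  }
}

// Hypothetical shape of a transcript result coming back from the STT adapter.
export interface TranscriptEvent {
  turnId: number;
  text: string;
  final: boolean;
}

// Apply a transcript only if it belongs to the live turn; stale results are dropped.
export function applyTranscript(
  guard: TurnGuard,
  event: TranscriptEvent,
  apply: (e: TranscriptEvent) => void,
): boolean {
  if (!guard.accepts(event.turnId)) return false;
  apply(event);
  return true;
}
```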
Phase 4 — Ollama Streaming and Tool Calling
Goal
Replace the mocked reasoning layer with real LLM orchestration.
Backlog Items
- integrate Ollama behind the LLM adapter
- stream assistant text deltas to the UI
- define and validate tool-calling schema
- reject invalid or unsafe tool calls
- support interrupt during active thinking
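When streaming is enabled, Ollama's `/api/chat` endpoint returns newline-delimited JSON objects, each carrying a `message.content` fragment, with `done: true` on the last one. Network chunks do not line up with those lines, so the gateway has to buffer. A minimal parser sketch; `OllamaStreamParser` is a hypothetical name and only the fields used here are typed.

```typescript
// Subset of one streamed line from Ollama's /api/chat endpoint.
interface OllamaChatChunk {
  message?: { role: string; content: string };
  done: boolean;
}

// Buffers partial lines and returns text deltas only from complete lines.
export class OllamaStreamParser {
  private buffer = "";

  push(chunk: string): { deltas: string[]; done: boolean } {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is either "" or an incomplete line; keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    const deltas: string[] = [];
    let done = false;
    for (const line of lines) {
      if (line.trim() === "") continue;
      const parsed = JSON.parse(line) as OllamaChatChunk;
      if (parsed.message?.content) deltas.push(parsed.message.content);
      if (parsed.done) done = true;
    }
    return { deltas, done };
  }
}
```

In the gateway, `push` would be fed from the body of a streaming `fetch` request, decoded with `TextDecoder` using `{ stream: true }`, and each delta forwarded to the UI. Passing the turn's `AbortSignal` to `fetch` lets an interrupt during active thinking close the request.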
Exit Criteria
- assistant responses stream from Ollama
- invalid tool requests fail safely
- cancellation stops active model work
- the LLM cannot directly execute external actions
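Ollama reports requested tool calls on the assistant message (in `message.tool_calls`, each with a function name and arguments). To keep the model from executing anything directly, the gateway can treat each call as an untrusted proposal and validate it against an allowlist first. A minimal sketch; the tool names, argument rules, and `validateToolCall` are hypothetical.

```typescript
// Hypothetical tool registry: allowed tools and their string arguments.
type ArgSpec = Record<string, { type: "string"; maxLength: number }>;

const TOOL_SCHEMAS: Record<string, ArgSpec> = {
  web_search: { query: { type: "string", maxLength: 200 } },
  ha_get_state: { entity_id: { type: "string", maxLength: 100 } },
};

export interface ToolCallRequest {
  name: string;
  arguments: unknown;
}

export type ValidationResult =
  | { ok: true; name: string; args: Record<string, string> }
  | { ok: false; reason: string };

// Own-property check, so names like "constructor" never match prototype keys.
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// The model only proposes a call; nothing runs unless this returns ok.
export function validateToolCall(call: ToolCallRequest): ValidationResult {
  if (!hasOwn(TOOL_SCHEMAS, call.name)) {
    return { ok: false, reason: `unknown tool: ${call.name}` };
  }
  const spec = TOOL_SCHEMAS[call.name];
  const raw = call.arguments;
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, reason: "arguments must be an object" };
  }
  const input = raw as Record<string, unknown>;
  for (const key of Object.keys(input)) {
    if (!hasOwn(spec, key)) return { ok: false, reason: `unexpected argument: ${key}` };
  }
  const args: Record<string, string> = {};
  for (const [key, rule] of Object.entries(spec)) {
    const value = input[key];
    if (typeof value !== "string" || value.length === 0 || value.length > rule.maxLength) {
      return { ok: false, reason: `invalid argument: ${key}` };
    }
    args[key] = value;
  }
  return { ok: true, name: call.name, args };
}
```

A failed validation can be reported back to the model or the user as a recoverable error rather than crashing the turn.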
Phase 5 — Tool Layer
Goal
Introduce useful tools in increasing order of operational risk.
Backlog Items
- implement SearXNG search adapter
- normalize search results for LLM consumption
- implement Home Assistant read actions
- implement Home Assistant write actions gated by confirmation
- implement clarification flow for ambiguous tool requests
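SearXNG's JSON output (`format=json`, which has to be enabled in the instance settings) returns a `results` array whose entries include `title`, `url`, and `content`. A minimal normalization sketch for LLM consumption; the limits, the rendering format, and the function names are assumptions.

```typescript
// Subset of a SearXNG JSON result (only the fields used here).
export interface SearxngResult {
  title?: string;
  url?: string;
  content?: string;
}

export interface NormalizedResult {
  title: string;
  url: string;
  snippet: string;
}

// Keep the top results, skip entries without a URL, de-duplicate by URL,
// collapse whitespace, and cap snippet length so the prompt stays small.
export function normalizeResults(results: SearxngResult[], limit = 5, maxSnippet = 300): NormalizedResult[] {
  const seen = new Set<string>();
  const out: NormalizedResult[] = [];
  for (const r of results) {
    if (out.length >= limit) break;
    if (!r.url || seen.has(r.url)) continue;
    seen.add(r.url);
    const snippet = (r.content ?? "").replace(/\s+/g, " ").trim();
    out.push({
      title: (r.title ?? "").trim() || r.url,
      url: r.url,
      snippet: snippet.length > maxSnippet ? snippet.slice(0, maxSnippet - 1) + "…" : snippet,
    });
  }
  return out;
}

// Render as a compact numbered block the model can cite by index.
export function toPromptText(results: NormalizedResult[]): string {
  return results.map((r, i) => `[${i + 1}] ${r.title}\n${r.url}\n${r.snippet}`).join("\n\n");
}
```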
Exit Criteria
- web search works end-to-end
- Home Assistant read queries work for approved entities
- Home Assistant write actions require explicit confirmation
- ambiguous actions do not execute automatically
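A minimal sketch of gating Home Assistant writes behind explicit confirmation: the gateway records a proposed action and returns it to the user instead of executing it, and only a confirmation of that exact, unexpired proposal releases it. `ConfirmationGate` and its fields are hypothetical; `light.turn_on` is a standard Home Assistant service name, while the entity ids are illustrative.

```typescript
// A write action waiting for the user's explicit confirmation.
export interface PendingAction {
  id: string;
  service: string; // e.g. "light.turn_on"
  entityId: string; // e.g. "light.kitchen" (illustrative)
  expiresAt: number; // ms timestamp
}

export class ConfirmationGate {
  private readonly pending = new Map<string, PendingAction>();
  private nextId = 1;
  private readonly ttlMs: number;

  constructor(ttlMs = 30_000) {
    this.ttlMs = ttlMs;
  }

  // Record the proposed write; the caller asks the user instead of executing.
  propose(service: string, entityId: string, now: number): PendingAction {
    const action: PendingAction = {
      id: String(this.nextId++),
      service,
      entityId,
      expiresAt: now + this.ttlMs,
    };
    this.pending.set(action.id, action);
    return action;
  }

  // Releases the action only for a known, unexpired proposal. Each proposal is
  // single-use: it is removed whether or not the confirmation succeeds.
  confirm(id: string, now: number): PendingAction | null {
    const action = this.pending.get(id);
    this.pending.delete(id);
    if (!action || now > action.expiresAt) return null;
    return action;
  }

  reject(id: string): void {
    this.pending.delete(id);
  }
}
```

Ambiguous requests (for example, several matching entities) would go through the clarification flow before `propose` is ever called, so a confirmation always refers to one concrete action.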
Phase 6 — Kokoro TTS
Goal
Convert assistant text responses into spoken output.
Backlog Items
- integrate Kokoro behind the TTS adapter
- support streamed audio when practical
- add a temporary fallback for full-response playback if streaming is not ready
- stop or suppress playback correctly on interrupt
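One way to "stop or suppress playback" is to key every TTS audio chunk by the response that produced it, flush queued chunks on interrupt, and refuse chunks that arrive later for the same response. This minimal sketch assumes monotonically increasing response ids; `PlaybackQueue` and `AudioChunk` are hypothetical names.

```typescript
// Hypothetical TTS audio chunk tagged with the response that produced it.
export interface AudioChunk {
  responseId: number;
  samples: Float32Array;
}

export class PlaybackQueue {
  private queue: AudioChunk[] = [];
  private cancelledUpTo = 0; // responses with id <= this are suppressed

  // Returns false when the chunk belongs to an interrupted response.
  enqueue(chunk: AudioChunk): boolean {
    if (chunk.responseId <= this.cancelledUpTo) return false;
    this.queue.push(chunk);
    return true;
  }

  // Interrupt: drop queued chunks for this and earlier responses and refuse
  // any that arrive later. Returns how many queued chunks were dropped.
  interrupt(responseId: number): number {
    this.cancelledUpTo = Math.max(this.cancelledUpTo, responseId);
    const before = this.queue.length;
    this.queue = this.queue.filter((c) => c.responseId > this.cancelledUpTo);
    return before - this.queue.length;
  }

  next(): AudioChunk | undefined {
    return this.queue.shift();
  }
}
```

In the browser, the chunk that is already playing also has to be stopped (for example with `AudioBufferSourceNode.stop()`); the queue only guarantees that nothing else from the interrupted response starts.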
Exit Criteria
- spoken output plays in the UI
- interrupt stops or suppresses playback reliably
- any non-streaming fallback is explicitly documented as temporary
Phase 7 — Resilience and Performance
Goal
Make the system robust enough for routine use on the target hardware.
Backlog Items
- handle disconnect and reconnect cleanly
- add bounded timeouts for STT, LLM, tool, and TTS calls
- measure latency by pipeline stage
- improve buffering and recovery paths for flaky network dependencies
- validate behavior under cancellation and partial failure
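A minimal sketch of a bounded-timeout wrapper for provider calls: the call receives an `AbortSignal` that fires at the deadline, and the wrapper rejects with a stage-tagged error either way, so a stuck provider cannot leave the session hanging. `withTimeout` and `StageTimeoutError` are hypothetical names; `fetch` honors an `AbortSignal` passed as its `signal` option.

```typescript
// Error that records which pipeline stage exceeded its budget.
export class StageTimeoutError extends Error {
  readonly stage: string;

  constructor(stage: string, ms: number) {
    super(`${stage} timed out after ${ms} ms`);
    this.name = "StageTimeoutError";
    this.stage = stage;
  }
}

// Runs `work` with a deadline. On timeout the signal is aborted (so cooperative
// providers stop working) and the returned promise rejects immediately.
export function withTimeout<T>(
  stage: string,
  ms: number,
  work: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new StageTimeoutError(stage, ms));
    }, ms);
    work(controller.signal).then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Tagging the error with its stage lets the gateway map each timeout to a specific, user-visible recovery path instead of a generic failure.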
Exit Criteria
- common network and provider failures do not leave sessions stuck
- latency is measurable at each major stage
- user-visible recovery paths exist for expected failure modes
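A minimal sketch for measuring latency at each major stage: a per-turn timer with an injectable clock (the gateway could pass `() => performance.now()`). The stage names, including the split between first-token and total LLM time, are assumptions.

```typescript
// Hypothetical stage names for one voice turn.
export type Stage = "stt" | "llm_first_token" | "llm_total" | "tool" | "tts_first_audio";

// Records per-turn stage durations; the clock is injectable for tests.
export class StageTimer {
  private readonly starts = new Map<Stage, number>();
  private readonly durations = new Map<Stage, number>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  start(stage: Stage): void {
    this.starts.set(stage, this.now());
  }

  // Returns the elapsed time, or undefined if the stage was never started.
  end(stage: Stage): number | undefined {
    const startedAt = this.starts.get(stage);
    if (startedAt === undefined) return undefined;
    const elapsed = this.now() - startedAt;
    this.durations.set(stage, elapsed);
    this.starts.delete(stage);
    return elapsed;
  }

  report(): Partial<Record<Stage, number>> {
    return Object.fromEntries(this.durations) as Partial<Record<Stage, number>>;
  }
}
```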
Phase 8 — Productization and Secondary Surfaces
Goal
Polish the system after the core voice loop is reliable.
Backlog Items
- add PWA installability and UX polish
- implement `/history`
- implement `/settings`
- implement `/admin`
- document operational settings and maintenance guidance
Exit Criteria
- the app is installable as a PWA
- secondary screens exist without degrading the core voice loop
- supporting docs reflect the implemented behavior
Ongoing Documentation Tasks
- update docs whenever the implementation changes the protocol, architecture, integrations, deployment, or backlog order
- mark completed backlog items or split phases into smaller slices as work progresses
- keep the root `README.md` as the entrypoint and keep detailed technical docs in `docs/`
Current Progress Notes
- `apps/vela-ui` now boots as a minimal SvelteKit app with a starter page
- `apps/vela-ui` now includes a minimal voice-session shell that can connect to the gateway `/ws` endpoint and display developer-visible session status
- `apps/vela-ui` can now trigger one deterministic mocked turn while connected and render the mocked transcript plus assistant response for the active session
- `apps/vela-ui` now exposes a visible push-to-talk mic control shell that sends placeholder `input_audio.append`/`input_audio.commit` events without requesting browser mic permission or capturing real audio
- `apps/vela-ui` now includes browser-level coverage for the mocked transcript/response slice, including connect, disconnect, and disconnected-state trigger guarding
- `apps/vela-gateway` now boots as a minimal Fastify app with `/` and `/health` endpoints
- `apps/vela-gateway` now exposes a minimal `/ws` WebSocket session skeleton with ephemeral in-memory sessions and defensive message handling
- `apps/vela-gateway` now accepts `mocked.turn.trigger` and emits protocol-valid mocked transcript/response events with one in-flight mocked turn per session
- `apps/vela-gateway` now supports placeholder input-audio append/commit cycles before running another mocked turn on the same socket
- `apps/vela-ui` now exposes a cancel control for active mocked turns and keeps already-rendered transcript/response text visible after cancellation
- `apps/vela-gateway` now honors `response.cancel` during mocked turns by stopping pending mocked response events, returning the session to `idle`, and allowing a new mocked turn on the same socket
- `apps/vela-protocol` now provides the shared WebSocket event contract for the UI and gateway
- backend framework choice is now concrete: Fastify