# Vela Phased Backlog

This backlog is the implementation plan translated into phased, actionable work. It should be updated whenever implementation changes scope, ordering, or done criteria.

## Phase 1 — Foundation and Contracts

### Goal

Establish the boundaries, protocol, and state model for the system before integrating providers.

### Backlog Items

- [x] define repository structure for `vela-ui` and `vela-gateway`
- [x] define the WebSocket event contract used by the UI and gateway via shared package
- define the session state machine and interrupt semantics
- define provider adapter interfaces for STT, LLM, TTS, and tools
- document error handling and cancellation behavior

### Exit Criteria

- protocol and state machine are documented
- UI and gateway responsibilities are explicit
- interrupt behavior is defined for every active phase
- provider boundaries are clear enough to implement mocks first

## Phase 2 — Vertical Slice Skeleton

### Goal

Prove the end-to-end interaction model with mocked or stubbed providers.
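
As a rough illustration of what a mocked provider behind an adapter boundary could look like, the sketch below shows a deterministic STT mock that emits partial and final transcript events and stops when the turn is interrupted. The names `SttAdapter`, `MockSttAdapter`, `CancelToken`, and `runTurn` are hypothetical and are not taken from the Vela codebase, and the adapter is kept synchronous for brevity where a real one would most likely be asynchronous.

```typescript
// Hypothetical names throughout: the real adapter interface is defined in Phase 1.
type SttEvent = { kind: "partial" | "final"; text: string };

// Minimal cancellation token; a real gateway might use AbortSignal instead.
type CancelToken = { cancelled: boolean };

interface SttAdapter {
  // Yields transcript events for one turn and stops once the token is cancelled.
  transcribe(chunks: string[], token: CancelToken): Generator<SttEvent>;
}

// Deterministic mock: one growing partial per chunk, then a single final.
class MockSttAdapter implements SttAdapter {
  *transcribe(chunks: string[], token: CancelToken): Generator<SttEvent> {
    const words: string[] = [];
    for (const chunk of chunks) {
      if (token.cancelled) return; // interrupted: emit nothing further
      words.push(chunk);
      yield { kind: "partial", text: words.join(" ") };
    }
    if (!token.cancelled) yield { kind: "final", text: words.join(" ") };
  }
}

// Drives the adapter, optionally cancelling after `cancelAfter` events.
function runTurn(adapter: SttAdapter, chunks: string[], cancelAfter?: number): SttEvent[] {
  const token: CancelToken = { cancelled: false };
  const events: SttEvent[] = [];
  for (const event of adapter.transcribe(chunks, token)) {
    events.push(event);
    if (cancelAfter !== undefined && events.length >= cancelAfter) token.cancelled = true;
  }
  return events;
}
```

Because the mock and a later real provider would share one interface, the gateway code that consumes `SttEvent` values should not need to change when the mock is swapped out.
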

### Backlog Items

- [x] bootstrap `vela-ui` as a runnable SvelteKit app in the Yarn workspace
- [x] bootstrap `vela-gateway` as a runnable Fastify app in the Yarn workspace
- [x] add the first UI voice-session shell with connect/disconnect controls and explicit WebSocket status
- [x] create a minimal mocked-turn UI with transcript and response text over the shared WebSocket session
- [x] create a minimal UI with mic control
- [x] create a gateway WebSocket session skeleton
- [x] implement a mocked transcript/response vertical slice over the existing WebSocket session
- [x] implement mocked STT flow for partial transcript events
- implement mocked LLM response streaming beyond the fixed deterministic slice
- implement stubbed audio playback or placeholder TTS output
- [x] implement interrupt handling across the mocked pipeline

### Exit Criteria

- one client can complete a voice turn through the real UI↔gateway contract
- transcript appears in the UI
- assistant text appears progressively or in structured steps
- audio playback or stubbed playback is visible to the user
- interrupt stops the active response and resets state cleanly

## Phase 3 — Real STT Integration

### Goal

Replace the mocked transcription layer with a real streaming STT provider.

### Backlog Items

- integrate `whisper.cpp` behind the STT adapter
- support partial and final transcript delivery
- handle audio format conversion if browser capture format differs
- handle late transcript events after cancellation
- expose recoverable error handling for STT failures

### Exit Criteria

- live mic audio produces usable transcripts
- partial and final results reach the UI
- cancellation prevents late transcript results from corrupting session state
- STT failure paths are visible and recoverable

## Phase 4 — Ollama Streaming and Tool Calling

### Goal

Replace the mocked reasoning layer with real LLM orchestration.
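
For the tool-calling half of this phase, one possible shape for gateway-side validation is sketched below: the model only proposes a call, and the gateway checks it against an allowlist before anything runs. The tool names, the `ToolSpec` fields, and the `validateToolCall` function are all assumptions made for illustration, not the schema this project has defined.

```typescript
// Hypothetical tool registry; tool names and argument shapes are illustrative only.
type ToolSpec = { requiredArgs: string[]; requiresConfirmation: boolean };

const TOOLS: Record<string, ToolSpec> = {
  "search.web": { requiredArgs: ["query"], requiresConfirmation: false },
  "home.read_state": { requiredArgs: ["entityId"], requiresConfirmation: false },
  "home.call_service": { requiredArgs: ["entityId", "service"], requiresConfirmation: true },
};

type Verdict =
  | { ok: true; tool: string; args: Record<string, string>; needsConfirmation: boolean }
  | { ok: false; reason: string };

// Validates a raw model-proposed call. It never executes anything itself.
function validateToolCall(raw: unknown): Verdict {
  if (typeof raw !== "object" || raw === null) return { ok: false, reason: "not an object" };
  const { name, arguments: args } = raw as { name?: unknown; arguments?: unknown };
  // Own-property check so inherited keys such as "constructor" are not accepted.
  if (typeof name !== "string" || !Object.prototype.hasOwnProperty.call(TOOLS, name)) {
    return { ok: false, reason: "unknown tool" };
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, reason: "arguments must be an object" };
  }
  const spec = TOOLS[name];
  const clean: Record<string, string> = {};
  for (const key of spec.requiredArgs) {
    const value = (args as Record<string, unknown>)[key];
    if (typeof value !== "string" || value.trim() === "") {
      return { ok: false, reason: `missing or invalid argument: ${key}` };
    }
    clean[key] = value; // only allowlisted keys are copied; extras are dropped
  }
  return { ok: true, tool: name, args: clean, needsConfirmation: spec.requiresConfirmation };
}
```

Returning a verdict instead of executing keeps the decision to run a tool, or to ask the user for confirmation first, in gateway code rather than in the model.
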

### Backlog Items

- integrate Ollama behind the LLM adapter
- stream assistant text deltas to the UI
- define and validate tool-calling schema
- reject invalid or unsafe tool calls
- support interrupt during active thinking

### Exit Criteria

- assistant responses stream from Ollama
- invalid tool requests fail safely
- cancellation stops active model work
- the LLM cannot directly execute external actions

## Phase 5 — Tool Layer

### Goal

Introduce useful tools in increasing order of operational risk.

### Backlog Items

- implement SearXNG search adapter
- normalize search results for LLM consumption
- implement Home Assistant read actions
- implement Home Assistant write actions gated by confirmation
- implement clarification flow for ambiguous tool requests

### Exit Criteria

- web search works end-to-end
- Home Assistant read queries work for approved entities
- Home Assistant write actions require explicit confirmation
- ambiguous actions do not execute automatically

## Phase 6 — Kokoro TTS

### Goal

Convert assistant text responses into spoken output.

### Backlog Items

- integrate Kokoro behind the TTS adapter
- support streamed audio when practical
- add a temporary fallback for full-response playback if streaming is not ready
- stop or suppress playback correctly on interrupt

### Exit Criteria

- spoken output plays in the UI
- interrupt stops or suppresses playback reliably
- any non-streaming fallback is explicitly documented as temporary

## Phase 7 — Resilience and Performance

### Goal

Make the system robust enough for routine use on the target hardware.
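
For the performance half of this phase, per-stage latency could be captured with something as small as the sketch below. The `StageTimer` class and the stage names are hypothetical, and the clock is injected only so the behavior can be checked deterministically.

```typescript
// Hypothetical stage names mirroring the pipeline described in this backlog.
type Stage = "stt" | "llm" | "tool" | "tts";

// Records elapsed time per stage; the clock is injected so tests can be deterministic.
class StageTimer {
  private started = new Map<Stage, number>();
  private samples = new Map<Stage, number[]>();

  constructor(private now: () => number = () => Date.now()) {}

  start(stage: Stage): void {
    this.started.set(stage, this.now());
  }

  // Returns the elapsed milliseconds, or undefined if the stage was never started.
  stop(stage: Stage): number | undefined {
    const begin = this.started.get(stage);
    if (begin === undefined) return undefined;
    this.started.delete(stage);
    const elapsed = this.now() - begin;
    const list = this.samples.get(stage) ?? [];
    list.push(elapsed);
    this.samples.set(stage, list);
    return elapsed;
  }

  // Mean latency per stage, for stages with at least one sample.
  summary(): Partial<Record<Stage, number>> {
    const out: Partial<Record<Stage, number>> = {};
    this.samples.forEach((list, stage) => {
      out[stage] = list.reduce((a, b) => a + b, 0) / list.length;
    });
    return out;
  }
}
```

A gateway might call `start("stt")` when audio is committed and `stop("stt")` when the final transcript arrives, then log `summary()` periodically. Where these calls belong in the real pipeline is left open here.
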

### Backlog Items

- handle disconnect and reconnect cleanly
- add bounded timeouts for STT, LLM, tool, and TTS calls
- measure latency by pipeline stage
- improve buffering and recovery paths for flaky network dependencies
- validate behavior under cancellation and partial failure

### Exit Criteria

- common network and provider failures do not leave sessions stuck
- latency is measurable at each major stage
- user-visible recovery paths exist for expected failure modes

## Phase 8 — Productization and Secondary Surfaces

### Goal

Polish the system after the core voice loop is reliable.

### Backlog Items

- add PWA installability and UX polish
- implement `/history`
- implement `/settings`
- implement `/admin`
- document operational settings and maintenance guidance

### Exit Criteria

- the app is installable as a PWA
- secondary screens exist without degrading the core voice loop
- supporting docs reflect the implemented behavior

## Ongoing Documentation Tasks

- update docs whenever implementation changes the protocol, architecture, integrations, deployment, or backlog order
- mark completed backlog items or split phases into smaller slices as work progresses
- keep root `README.md` as the entrypoint and keep detailed technical docs in `docs/`

## Current Progress Notes

- `apps/vela-ui` now boots as a minimal SvelteKit app with a starter page
- `apps/vela-ui` now includes a minimal voice-session shell that can connect to the gateway `/ws` endpoint and display developer-visible session status
- `apps/vela-ui` now exposes a visible push-to-talk mic control shell that sends placeholder `input_audio.append` / `input_audio.commit` events without requesting browser mic permission or capturing real audio
- placeholder push-to-talk via `input_audio.append` + `input_audio.commit` is now the only supported mocked turn entry path in the shared client protocol contract
- `apps/vela-ui` now includes browser-level coverage for the placeholder push-to-talk mocked transcript/response slice, including connect, disconnect, and cancel behavior
- `apps/vela-gateway` now boots as a minimal Fastify app with `/` and `/health` endpoints
- `apps/vela-gateway` now exposes a minimal `/ws` WebSocket session skeleton with ephemeral in-memory sessions and defensive message handling
- `apps/vela-gateway` now rejects retired `mocked.turn.trigger` requests with a deterministic recoverable error instead of starting a mocked turn
- `apps/vela-gateway` now supports repeated placeholder input-audio append/commit cycles on the same socket
- `apps/vela-gateway` now emits deterministic `transcript.partial` events for placeholder `input_audio.append` messages and, after each accepted `input_audio.commit`, reuses the mocked response engine to stream a deterministic assistant reply for that push-to-talk turn
- `apps/vela-ui` now renders the latest placeholder partial transcript during the push-to-talk shell turn, replaces it with the deterministic final transcript on commit, and shows streamed assistant text for that same push-to-talk flow
- `apps/vela-ui` now exposes a cancel control for active push-to-talk-triggered mocked responses, and keeps already-rendered transcript/response text visible after cancellation
- `apps/vela-gateway` now honors `response.cancel` during push-to-talk-triggered mocked responses by stopping pending mocked response events, returning the session to `idle`, and allowing a new turn on the same socket
- `apps/vela-protocol` now provides the shared WebSocket event contract for the UI and gateway
- backend framework choice is now concrete: Fastify
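
The mocked push-to-talk behavior described in these notes can be summarized as a small in-memory session. The sketch below is not the gateway implementation. The client event names, `transcript.partial`, and the `idle` state come from the notes above, while `transcript.final`, `response.text.delta`, `response.done`, `error`, the other state names, the payload fields, and the `MockSession` class itself are assumptions made for illustration.

```typescript
// Assumed shapes: only the client event names, `transcript.partial`, and `idle`
// appear in the progress notes. Everything else here is hypothetical.
type ClientEvent = {
  type: "input_audio.append" | "input_audio.commit" | "response.cancel" | "mocked.turn.trigger";
};
type ServerEvent = { type: string; text?: string; recoverable?: boolean };
type State = "idle" | "listening" | "responding";

class MockSession {
  state: State = "idle";
  private appended = 0;
  private pending: string[] = []; // reply chunks not yet sent

  handle(event: ClientEvent): ServerEvent[] {
    switch (event.type) {
      case "input_audio.append":
        if (this.state === "responding") return [this.error("response in progress")];
        this.state = "listening";
        this.appended += 1;
        return [{ type: "transcript.partial", text: `partial ${this.appended}` }];
      case "input_audio.commit":
        if (this.state !== "listening") return [this.error("nothing to commit")];
        this.state = "responding";
        this.appended = 0;
        this.pending = ["Mocked", " assistant", " reply."];
        return [{ type: "transcript.final", text: "mocked final transcript" }];
      case "response.cancel":
        this.pending = []; // drop reply chunks that were not sent yet
        this.appended = 0;
        this.state = "idle";
        return [];
      case "mocked.turn.trigger":
      default:
        return [this.error("mocked.turn.trigger is retired")];
    }
  }

  // Emits the next reply chunk, standing in for a timer-driven stream.
  tick(): ServerEvent[] {
    if (this.state !== "responding") return [];
    const text = this.pending.shift();
    if (text !== undefined) return [{ type: "response.text.delta", text }];
    this.state = "idle";
    return [{ type: "response.done" }];
  }

  private error(text: string): ServerEvent {
    return { type: "error", text, recoverable: true };
  }
}
```

In this sketch, cancelling mid-response empties the pending queue and returns the session to `idle`, so a following append/commit cycle starts a fresh turn on the same session object.
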