feat(vela): retire legacy mocked turn trigger

2026-04-08 21:50:18 +02:00
parent 28712443cc
commit 8e14eaeed0
10 changed files with 78 additions and 378 deletions
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -36,14 +36,13 @@ The repository now includes separate runnable workspaces for the UI and gateway
 - PWA enabled
 - WebSocket client

-The current implementation is a minimal SvelteKit app with a single voice-session shell page. The shipped UI can open and close a browser WebSocket connection to the gateway `/ws` endpoint, show explicit connection status (`not connected`, `connecting`, `connected`, `disconnected`, `error`), expose mic control shell interactions that emit placeholder `input_audio.append` / `input_audio.commit` events, trigger one deterministic mocked turn while connected, render deterministic placeholder partial/final transcripts for the push-to-talk shell, and stream the mocked assistant response both for `mocked.turn.trigger` and for push-to-talk commits. This remains a shell only: there is no real microphone capture, real provider integration, or audio playback yet.
+The current implementation is a minimal SvelteKit app with a single voice-session shell page. The shipped UI can open and close a browser WebSocket connection to the gateway `/ws` endpoint, show explicit connection status (`not connected`, `connecting`, `connected`, `disconnected`, `error`), expose mic control shell interactions that emit placeholder `input_audio.append` / `input_audio.commit` events, render deterministic placeholder partial/final transcripts for the push-to-talk shell, and stream the mocked assistant response after push-to-talk commit. This remains a shell only: there is no real microphone capture, real provider integration, or audio playback yet.

 #### Responsibilities

 Current shell responsibilities:

 - connection state rendering
- mocked-turn trigger rendering with disconnected/in-flight guards
 - mocked transcript and mocked assistant response rendering
 - developer-oriented session metadata rendering
 - browser session connect/disconnect controls
@@ -62,7 +61,6 @@ Current shell:
 - developer-focused voice-session panel
 - connect button
 - disconnect button
- mocked-turn button
 - connection status indicator
 - mocked transcript display
 - mocked assistant response display
@@ -106,14 +104,14 @@ The current implementation is a minimal Fastify service with `/`, `/health`, and
 - the gateway sends `session.ready` followed by `session.state` (`idle`) when the socket is established
 - valid minimal client events, including placeholder `input_audio.append` / `input_audio.commit`, can move the session through the mocked turn states on one socket
 - placeholder `input_audio.append` emits deterministic mocked `transcript.partial` events and `input_audio.commit` emits one deterministic mocked `transcript.final` before starting the existing mocked assistant response stream
- `mocked.turn.trigger` drives a fixed transcript/response event sequence over the existing shared protocol
 - only one mocked turn is allowed in flight per session at a time
 - invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up
+- retired `mocked.turn.trigger` messages are rejected with a deterministic recoverable error

 ### Current UI shell behavior

 - renders a minimal developer-focused voice-session panel
- exposes connect, disconnect, mic-control shell interactions, and mocked-turn controls
+- exposes connect, disconnect, and mic-control shell interactions
 - does not request microphone permission or capture real microphone audio
 - only emits placeholder `input_audio.append` / `input_audio.commit` events; it does not send real audio data or play back audio
 - renders the latest placeholder partial transcript during a push-to-talk shell turn, replaces it with the final deterministic transcript on commit, and appends streamed mocked assistant text for that same push-to-talk turn
@@ -122,7 +120,7 @@ The current implementation is a minimal Fastify service with `/`, `/health`, and
 ## Voice Pipeline

 ```text
-Mic control shell / mocked turn button → Placeholder `input_audio.append` / `input_audio.commit` or mocked session flow → Deterministic transcript events → Shared mocked response engine → Mocked response text events → UI
+Mic control shell → Placeholder `input_audio.append` / `input_audio.commit` → Deterministic transcript events → Shared mocked response engine → Mocked response text events → UI
 ```

 This mocked vertical slice intentionally stands in for the future real pipeline:
--- a/docs/backlog.md
+++ b/docs/backlog.md
@@ -183,16 +183,15 @@ Polish the system after the core voice loop is reliable.

 - `apps/vela-ui` now boots as a minimal SvelteKit app with a starter page
 - `apps/vela-ui` now includes a minimal voice-session shell that can connect to the gateway `/ws` endpoint and display developer-visible session status
- `apps/vela-ui` can now trigger one deterministic mocked turn while connected and render the mocked transcript plus assistant response for the active session
 - `apps/vela-ui` now exposes a visible push-to-talk mic control shell that sends placeholder `input_audio.append` / `input_audio.commit` events without requesting browser mic permission or capturing real audio
- `apps/vela-ui` now includes browser-level coverage for the mocked transcript/response slice, including connect, disconnect, and disconnected-state trigger guarding
+- `apps/vela-ui` now includes browser-level coverage for the placeholder push-to-talk mocked transcript/response slice, including connect, disconnect, and cancel behavior
 - `apps/vela-gateway` now boots as a minimal Fastify app with `/` and `/health` endpoints
 - `apps/vela-gateway` now exposes a minimal `/ws` WebSocket session skeleton with ephemeral in-memory sessions and defensive message handling
- `apps/vela-gateway` now accepts `mocked.turn.trigger` and emits protocol-valid mocked transcript/response events with one in-flight mocked turn per session
- `apps/vela-gateway` now supports placeholder input-audio append/commit cycles before running another mocked turn on the same socket
+- `apps/vela-gateway` now rejects retired `mocked.turn.trigger` requests with a deterministic recoverable error instead of starting a mocked turn
+- `apps/vela-gateway` now supports repeated placeholder input-audio append/commit cycles on the same socket
 - `apps/vela-gateway` now emits deterministic `transcript.partial` events for placeholder `input_audio.append` messages and, after each accepted `input_audio.commit`, reuses the mocked response engine to stream a deterministic assistant reply for that push-to-talk turn
- `apps/vela-ui` now renders the latest placeholder partial transcript during the push-to-talk shell turn, replaces it with the deterministic final transcript on commit, and shows streamed assistant text for the same push-to-talk flow
- `apps/vela-ui` now exposes a cancel control for active mocked turns and mocked push-to-talk responses, and keeps already-rendered transcript/response text visible after cancellation
- `apps/vela-gateway` now honors `response.cancel` during mocked turns and push-to-talk-triggered mocked responses by stopping pending mocked response events, returning the session to `idle`, and allowing a new turn on the same socket
+- `apps/vela-ui` now renders the latest placeholder partial transcript during the push-to-talk shell turn, replaces it with the deterministic final transcript on commit, and shows streamed assistant text for that same push-to-talk flow
+- `apps/vela-ui` now exposes a cancel control for active push-to-talk-triggered mocked responses, and keeps already-rendered transcript/response text visible after cancellation
+- `apps/vela-gateway` now honors `response.cancel` during push-to-talk-triggered mocked responses by stopping pending mocked response events, returning the session to `idle`, and allowing a new turn on the same socket
 - `apps/vela-protocol` now provides the shared WebSocket event contract for the UI and gateway
 - backend framework choice is now concrete: Fastify
--- a/docs/overview.md
+++ b/docs/overview.md
@@ -43,6 +43,7 @@ Vela is a fully local, voice-first assistant system with:

 - browser-based PWA
 - push-to-talk interaction
+- current mocked vertical slice enters turns only through the placeholder push-to-talk shell
 - transcript and response display
 - playback of streamed or returned audio

--- a/docs/protocol.md
+++ b/docs/protocol.md
@@ -15,8 +15,8 @@ Current UI baseline:

 - the browser opens a WebSocket directly to `/ws`
 - the UI tracks connection status separately from gateway session status
- the UI can send `mocked.turn.trigger` after `session.ready` while connected to request one deterministic mocked turn for the active session
 - the UI exposes a push-to-talk mic control shell that sends placeholder `input_audio.append` on press and `input_audio.commit` on release without capturing real audio
+- the push-to-talk shell is the only supported mocked turn entry path from the shipped UI

 ## WebSocket Message Envelope

@@ -50,7 +50,7 @@ type ClientEvent =
 #### Client event intent

 - `session.start` initializes a voice session without locking in transport or auth details yet
- `mocked.turn.trigger` asks the gateway to run one obviously mocked, deterministic transcript/response turn
+- `mocked.turn.trigger` is a retired legacy event name that the gateway now rejects with a deterministic recoverable error
 - `input_audio.append` carries a chunk of captured input audio as an encoded string
 - `input_audio.commit` marks the current buffered user turn as ready for downstream processing
 - `response.cancel` interrupts the active listen/think/speak flow
@@ -59,15 +59,13 @@ type ClientEvent =

 - on connect, the gateway creates an ephemeral in-memory session and emits `session.ready` plus `session.state`
 - `session.start` is accepted as an idempotent session acknowledgment and re-sends readiness/state
- `mocked.turn.trigger` is accepted only when no other mocked turn is already in flight for that session
- a mocked turn emits deterministic `transcript.final`, `response.text.delta`, `response.completed`, and `session.state` events in protocol-valid order
+- `mocked.turn.trigger` is rejected deterministically with `error.code = unsupported_mocked_turn_trigger`
 - `input_audio.append` updates the ephemeral session record and moves the session to `listening`
 - each accepted `input_audio.append` emits one deterministic `transcript.partial` for the current placeholder turn
- `input_audio.commit` emits exactly one deterministic `transcript.final` and then starts the same deterministic mocked assistant response stream used by `mocked.turn.trigger`
- after a completed placeholder input cycle, the same socket can still send `mocked.turn.trigger`
+- `input_audio.commit` emits exactly one deterministic `transcript.final` and then starts the deterministic mocked assistant response stream for that push-to-talk turn
+- after a completed placeholder input cycle, the same socket can start another placeholder push-to-talk turn without reconnecting
 - `response.cancel` is safe to send even when no mocked turn is active
 - `response.cancel` stops any still-pending mocked turn events for the active turn and resets the minimal session state back to `idle`
- a second mocked-turn trigger during an active mocked turn produces `error` with code `mocked_turn_in_flight`
 - malformed JSON produces `error` with code `invalid_json`
 - invalid envelopes or unsupported client event names produce `error` with code `invalid_message`
 - malformed WebSocket frames are rejected without crashing the gateway process
@@ -88,7 +86,6 @@ Notes:

 - this UI state is transport-oriented and is separate from the shared gateway `session.state` payload
 - `session.state` currently reflects the gateway session phase (`idle`, `listening`, `thinking`, `speaking`)
- the UI disables the mocked-turn control until `session.ready` arrives, while disconnected, or while a mocked turn is already in flight
 - the UI disables the mic control while disconnected, before `session.ready`, or while a mocked turn is already in flight
 - pressing the mic control sends one placeholder `input_audio.append` chunk and releasing it sends `input_audio.commit`
 - while a placeholder push-to-talk turn is in progress, the UI renders the latest `transcript.partial`
@@ -126,26 +123,19 @@ type ServerEvent =
 - `response.completed` marks the current assistant turn as done
 - `error` is the minimal recoverable failure shape for both UI and gateway work

-### Deterministic mocked turn sequence
+### Legacy mocked turn trigger rejection

-For this increment, `mocked.turn.trigger` produces one fixed interaction for the active session:
+For this increment, direct `mocked.turn.trigger` requests no longer start a mocked turn:

 ```text
-session.state(listening)
-→ transcript.final("[mocked user] What is the current mocked vertical slice?")
-→ session.state(thinking)
-→ session.state(speaking)
-→ response.text.delta("[mocked assistant] ")
-→ response.text.delta("This is a deterministic mocked response from the gateway vertical slice.")
-→ response.completed
-→ session.state(idle)
+mocked.turn.trigger
+→ error(code="unsupported_mocked_turn_trigger", message="mocked.turn.trigger is no longer supported; use input_audio.append and input_audio.commit instead.")
 ```

 Notes:

- the content is intentionally fixed and obviously mocked
- no audio, STT, LLM, TTS, or external providers participate in this flow
- `response.cancel` can stop the mocked turn early, suppress any later mocked response events for that turn, and return the session to `idle`
+- this rejection is deterministic and recoverable
+- the session remains available for the supported push-to-talk flow on the same socket

 ### Deterministic placeholder push-to-talk transcript and mocked response sequence

@@ -173,8 +163,8 @@ Safe deterministic edge cases for this mocked placeholder flow:

 - commit without any prior append is accepted and emits `transcript.final("[mocked final] Placeholder push-to-talk transcript completed without appended audio.")`
 - repeated appends during one placeholder turn are accepted and each append replaces the latest partial transcript with a chunk-count-based deterministic value
- after the final transcript, placeholder commit follows the same mocked `thinking → speaking → response.text.delta* → response.completed → idle` path as `mocked.turn.trigger`
- `response.cancel` can interrupt this mocked post-commit response path the same way it interrupts `mocked.turn.trigger`; already-rendered transcript or assistant text is not retracted
+- after the final transcript, placeholder commit follows the deterministic mocked `thinking → speaking → response.text.delta* → response.completed → idle` path
+- `response.cancel` can interrupt this mocked post-commit response path; already-rendered transcript or assistant text is not retracted

 ## Contract Scope for This Increment

@@ -207,7 +197,7 @@ Current mocked-pipeline behavior:
 - during an active mocked turn, `response.cancel` returns the session to `idle` immediately
 - any mocked turn timers that have not fired yet are dropped, so no later `response.text.delta` or `response.completed` events are emitted for the cancelled turn
 - the same cancellation behavior applies when a mocked turn was started by `input_audio.commit`
- once `idle` is restored, the same WebSocket session can start another mocked turn without reconnecting
+- once `idle` is restored, the same WebSocket session can start another placeholder push-to-talk turn without reconnecting

 More general future-state expectations: