feat(vela): start mocked response flow after push-to-talk commit
@@ -36,7 +36,7 @@ The repository now includes separate runnable workspaces for the UI and gateway
- PWA enabled
- WebSocket client
The current implementation is a minimal SvelteKit app with a single voice-session shell page. The shipped UI can open and close a browser WebSocket connection to the gateway `/ws` endpoint, show explicit connection status (`not connected`, `connecting`, `connected`, `disconnected`, `error`), expose mic control shell interactions that emit placeholder `input_audio.append` / `input_audio.commit` events, trigger one deterministic mocked turn while connected, render deterministic placeholder partial/final transcripts for the push-to-talk shell, and render the mocked user transcript plus mocked assistant response for the existing mocked-turn path. This remains a shell only: there is no real microphone capture, real provider integration, or audio playback yet.
The current implementation is a minimal SvelteKit app with a single voice-session shell page. The shipped UI can open and close a browser WebSocket connection to the gateway `/ws` endpoint, show explicit connection status (`not connected`, `connecting`, `connected`, `disconnected`, `error`), expose mic control shell interactions that emit placeholder `input_audio.append` / `input_audio.commit` events, trigger one deterministic mocked turn while connected, render deterministic placeholder partial/final transcripts for the push-to-talk shell, and stream the mocked assistant response both for `mocked.turn.trigger` and for push-to-talk commits. This remains a shell only: there is no real microphone capture, real provider integration, or audio playback yet.
#### Responsibilities
@@ -104,8 +104,8 @@ The current implementation is a minimal Fastify service with `/`, `/health`, and
- `GET /ws` documents the route for plain HTTP clients and returns `426 Upgrade Required`
- WebSocket upgrades on `/ws` create an ephemeral session immediately
- the gateway sends `session.ready` followed by `session.state` (`idle`) when the socket is established
- valid minimal client events, including placeholder `input_audio.append` / `input_audio.commit`, can move the session between `idle` and `listening`
- placeholder `input_audio.append` emits deterministic mocked `transcript.partial` events and `input_audio.commit` emits one deterministic mocked `transcript.final`
- valid minimal client events, including placeholder `input_audio.append` / `input_audio.commit`, can move the session through the mocked turn states on one socket
- placeholder `input_audio.append` emits deterministic mocked `transcript.partial` events and `input_audio.commit` emits one deterministic mocked `transcript.final` before starting the existing mocked assistant response stream
- `mocked.turn.trigger` drives a fixed transcript/response event sequence over the existing shared protocol
- only one mocked turn is allowed in flight per session at a time
- invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up
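The defensive handling in the last bullet can be sketched as a parser that never throws. This is a minimal illustration, not the gateway's actual code: the event type names come from this document, while `parseClientFrame`, `ParseResult`, and the assumption that an envelope is a JSON object with a string `type` field are hypothetical.

```typescript
// Event type names are taken from the document; the envelope shape
// ({ type: string }) is an assumption made for this sketch.
const KNOWN_CLIENT_EVENT_TYPES = [
  "input_audio.append",
  "input_audio.commit",
  "mocked.turn.trigger",
  "response.cancel",
] as const;

type ClientEventType = (typeof KNOWN_CLIENT_EVENT_TYPES)[number];

type ParseResult =
  | { ok: true; type: ClientEventType }
  | { ok: false; reason: "invalid_json" | "invalid_envelope" | "unknown_type" };

// Hypothetical helper: classifies a raw text frame without ever throwing,
// so a bad frame cannot take the process down.
function parseClientFrame(raw: string): ParseResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "invalid_json" };
  }
  // Reject primitives, null, and arrays: only plain objects can be envelopes.
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, reason: "invalid_envelope" };
  }
  const type = (value as { type?: unknown }).type;
  if (typeof type !== "string") {
    return { ok: false, reason: "invalid_envelope" };
  }
  const known = KNOWN_CLIENT_EVENT_TYPES.find((t) => t === type);
  return known ? { ok: true, type: known } : { ok: false, reason: "unknown_type" };
}
```

Returning a tagged result instead of throwing keeps the error path explicit at the call site; how the gateway reports such errors back to the client is not covered here.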
@@ -116,13 +116,13 @@ The current implementation is a minimal Fastify service with `/`, `/health`, and
- exposes connect, disconnect, mic-control shell interactions, and mocked-turn controls
- does not request microphone permission or capture real microphone audio
- only emits placeholder `input_audio.append` / `input_audio.commit` events; it does not send real audio data or play back audio
- renders the latest placeholder partial transcript during a push-to-talk shell turn and replaces it with the final deterministic transcript on commit
- renders the latest placeholder partial transcript during a push-to-talk shell turn, replaces it with the final deterministic transcript on commit, and appends streamed mocked assistant text for that same push-to-talk turn
- reads mocked transcript and mocked response events from the shared protocol contract
## Voice Pipeline
```text
Mic control shell / mocked turn button → Placeholder `input_audio.append` / `input_audio.commit` or mocked session flow → Deterministic transcript events → Mocked response text events when using mocked.turn.trigger → UI
Mic control shell / mocked turn button → Placeholder `input_audio.append` / `input_audio.commit` or mocked session flow → Deterministic transcript events → Shared mocked response engine → Mocked response text events → UI
```
This mocked vertical slice intentionally stands in for the future real pipeline:
@@ -190,9 +190,9 @@ Polish the system after the core voice loop is reliable.
- `apps/vela-gateway` now exposes a minimal `/ws` WebSocket session skeleton with ephemeral in-memory sessions and defensive message handling
- `apps/vela-gateway` now accepts `mocked.turn.trigger` and emits protocol-valid mocked transcript/response events with one in-flight mocked turn per session
- `apps/vela-gateway` now supports placeholder input-audio append/commit cycles before running another mocked turn on the same socket
- `apps/vela-gateway` now emits deterministic `transcript.partial` events for placeholder `input_audio.append` messages and exactly one deterministic `transcript.final` for each placeholder `input_audio.commit`
- `apps/vela-ui` now renders the latest placeholder partial transcript during the push-to-talk shell turn and replaces it with the deterministic final transcript on commit
- `apps/vela-ui` now exposes a cancel control for active mocked turns and keeps already-rendered transcript/response text visible after cancellation
- `apps/vela-gateway` now honors `response.cancel` during mocked turns by stopping pending mocked response events, returning the session to `idle`, and allowing a new mocked turn on the same socket
- `apps/vela-gateway` now emits deterministic `transcript.partial` events for placeholder `input_audio.append` messages and, after each accepted `input_audio.commit`, reuses the mocked response engine to stream a deterministic assistant reply for that push-to-talk turn
- `apps/vela-ui` now renders the latest placeholder partial transcript during the push-to-talk shell turn, replaces it with the deterministic final transcript on commit, and shows streamed assistant text for the same push-to-talk flow
- `apps/vela-ui` now exposes a cancel control for active mocked turns and mocked push-to-talk responses, and keeps already-rendered transcript/response text visible after cancellation
- `apps/vela-gateway` now honors `response.cancel` during mocked turns and push-to-talk-triggered mocked responses by stopping pending mocked response events, returning the session to `idle`, and allowing a new turn on the same socket
- `apps/vela-protocol` now provides the shared WebSocket event contract for the UI and gateway
- backend framework choice is now concrete: Fastify
@@ -63,7 +63,7 @@ type ClientEvent =
- a mocked turn emits deterministic `transcript.final`, `response.text.delta`, `response.completed`, and `session.state` events in protocol-valid order
- `input_audio.append` updates the ephemeral session record and moves the session to `listening`
- each accepted `input_audio.append` emits one deterministic `transcript.partial` for the current placeholder turn
- `input_audio.commit` emits exactly one deterministic `transcript.final`, resets the minimal buffered state, and returns the session to `idle`
- `input_audio.commit` emits exactly one deterministic `transcript.final` and then starts the same deterministic mocked assistant response stream used by `mocked.turn.trigger`
- after a completed placeholder input cycle, the same socket can still send `mocked.turn.trigger`
- `response.cancel` is safe to send even when no mocked turn is active
- `response.cancel` stops any still-pending mocked turn events for the active turn and resets the minimal session state back to `idle`
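The session rules in these notes can be read as a small state machine. The sketch below is one possible reading, not the gateway's implementation: the state and event names come from this document, while `Session`, `StepResult`, and `step` are hypothetical. Two behaviors are assumptions: that appends and commits are rejected while a mocked response is in flight, and that a cancel of an active turn is announced with `session.state`.

```typescript
// State names appear in the document; the Session shape is hypothetical.
type SessionState = "idle" | "listening" | "thinking" | "speaking";

type ClientEventType =
  | "input_audio.append"
  | "input_audio.commit"
  | "mocked.turn.trigger"
  | "response.cancel";

interface Session {
  state: SessionState;
  appendedChunks: number; // chunks buffered in the current placeholder turn
  turnInFlight: boolean; // true while a mocked response stream is pending
}

interface StepResult {
  session: Session;
  accepted: boolean;
  firstEmit: string | null; // first server event this step would emit, if any
}

function step(session: Session, event: ClientEventType): StepResult {
  switch (event) {
    case "input_audio.append":
      // Assumption: appends are rejected while a mocked response is in flight.
      if (session.turnInFlight) return { session, accepted: false, firstEmit: null };
      return {
        session: { ...session, state: "listening", appendedChunks: session.appendedChunks + 1 },
        accepted: true,
        firstEmit: "transcript.partial",
      };
    case "input_audio.commit":
    case "mocked.turn.trigger":
      // Only one mocked turn may be in flight per session.
      if (session.turnInFlight) return { session, accepted: false, firstEmit: null };
      // Both paths emit transcript.final, then hand off to the shared response stream.
      return {
        session: { state: "thinking", appendedChunks: 0, turnInFlight: true },
        accepted: true,
        firstEmit: "transcript.final",
      };
    case "response.cancel":
      // Safe even when nothing is active: the session simply ends up idle.
      return {
        session: { state: "idle", appendedChunks: 0, turnInFlight: false },
        accepted: true,
        // Assumption: an interrupted turn is announced via session.state.
        firstEmit: session.turnInFlight ? "session.state" : null,
      };
  }
}
```

Keeping `step` pure (no timers, no socket) makes the acceptance rules easy to unit-test separately from the timed response stream.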
@@ -92,7 +92,7 @@ Notes:
- the UI disables the mic control while disconnected, before `session.ready`, or while a mocked turn is already in flight
- pressing the mic control sends one placeholder `input_audio.append` chunk and releasing it sends `input_audio.commit`
- while a placeholder push-to-talk turn is in progress, the UI renders the latest `transcript.partial`
- after placeholder commit, the UI renders the `transcript.final` and clears the partial-only display
- after placeholder commit, the UI renders the `transcript.final`, clears the partial-only display, and streams the mocked assistant text from the downstream response events
- the UI copy explicitly labels the mic button as a control shell and not real microphone capture
- the UI shows a cancel control and enables it only while a mocked turn is active
- after cancel returns the gateway to `idle`, the UI clears the active-turn indicator but keeps any transcript or response text that was already rendered
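The UI notes above amount to a view-state reducer plus two derived control flags. The following is a rough sketch under stated assumptions, not the shipped SvelteKit code: `ViewState`, `applyServerEvent`, and `controls` are hypothetical names, the `text` and `state` payload fields are assumed, and clearing the previous assistant text when a new `transcript.final` arrives is a guess about turn boundaries.

```typescript
// Event names come from the document; payload field names are assumptions.
type ServerEvent =
  | { type: "session.ready" }
  | { type: "session.state"; state: "idle" | "listening" | "thinking" | "speaking" }
  | { type: "transcript.partial"; text: string }
  | { type: "transcript.final"; text: string }
  | { type: "response.text.delta"; text: string }
  | { type: "response.completed" };

interface ViewState {
  connected: boolean;
  sessionReady: boolean;
  turnActive: boolean; // a mocked response is in flight
  partial: string | null;
  finalTranscript: string | null;
  assistantText: string;
}

function applyServerEvent(view: ViewState, event: ServerEvent): ViewState {
  switch (event.type) {
    case "session.ready":
      return { ...view, sessionReady: true };
    case "session.state":
      // Returning to idle only clears the indicator; rendered text is kept.
      return { ...view, turnActive: event.state === "thinking" || event.state === "speaking" };
    case "transcript.partial":
      return { ...view, partial: event.text };
    case "transcript.final":
      // Assumption: a new final transcript starts a fresh assistant reply.
      return { ...view, partial: null, finalTranscript: event.text, assistantText: "" };
    case "response.text.delta":
      return { ...view, assistantText: view.assistantText + event.text };
    case "response.completed":
      return view;
  }
}

// Mic is disabled while disconnected, before session.ready, or during a turn;
// cancel is enabled only while a turn is active.
function controls(view: ViewState): { micEnabled: boolean; cancelEnabled: boolean } {
  return {
    micEnabled: view.connected && view.sessionReady && !view.turnActive,
    cancelEnabled: view.turnActive,
  };
}
```

Because cancellation is modeled only as a `session.state` change back to `idle`, already-rendered transcript and assistant text survive it without any special casing.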
@@ -147,9 +147,9 @@ Notes:
- no audio, STT, LLM, TTS, or external providers participate in this flow
- `response.cancel` can stop the mocked turn early, suppress any later mocked response events for that turn, and return the session to `idle`
### Deterministic placeholder push-to-talk transcript sequence
### Deterministic placeholder push-to-talk transcript and mocked response sequence
For this increment, the existing mic-control shell still sends placeholder `input_audio.append` on press and `input_audio.commit` on release. The gateway now translates that shell flow into deterministic mocked transcript events only:
For this increment, the existing mic-control shell still sends placeholder `input_audio.append` on press and `input_audio.commit` on release. The gateway now translates that shell flow into deterministic mocked transcript events and then reuses the existing mocked response stream:
```text
input_audio.append #1
@@ -161,6 +161,11 @@ input_audio.append #N (N > 1)
input_audio.commit after N appends
→ transcript.final("[mocked final] Placeholder push-to-talk transcript completed from N appended chunk(s).")
→ session.state(thinking)
→ session.state(speaking)
→ response.text.delta("[mocked assistant] ")
→ response.text.delta("This is a deterministic mocked response from the gateway vertical slice.")
→ response.completed
→ session.state(idle)
```
@@ -168,7 +173,8 @@ Safe deterministic edge cases for this mocked placeholder flow:
- commit without any prior append is accepted and emits `transcript.final("[mocked final] Placeholder push-to-talk transcript completed without appended audio.")`
- repeated appends during one placeholder turn are accepted and each append replaces the latest partial transcript with a chunk-count-based deterministic value
- placeholder commit does not automatically start assistant thinking, response streaming, or audio playback
- after the final transcript, placeholder commit follows the same mocked `thinking → speaking → response.text.delta* → response.completed → idle` path as `mocked.turn.trigger`
- `response.cancel` can interrupt this mocked post-commit response path the same way it interrupts `mocked.turn.trigger`; already-rendered transcript or assistant text is not retracted
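The post-commit sequence and the zero-append edge case can be written out as data. This is a sketch only: the event order and the literal strings are copied from this document, while `finalTranscriptText`, `commitSequence`, and the `text` / `state` payload field names are hypothetical.

```typescript
// Payload field names (text, state) are assumptions for illustration.
type ServerEvent =
  | { type: "transcript.final"; text: string }
  | { type: "session.state"; state: "idle" | "thinking" | "speaking" }
  | { type: "response.text.delta"; text: string }
  | { type: "response.completed" };

// Deterministic final transcript, including the commit-without-append case.
function finalTranscriptText(appendCount: number): string {
  return appendCount === 0
    ? "[mocked final] Placeholder push-to-talk transcript completed without appended audio."
    : `[mocked final] Placeholder push-to-talk transcript completed from ${appendCount} appended chunk(s).`;
}

// Full ordered event list for one accepted input_audio.commit.
function commitSequence(appendCount: number): ServerEvent[] {
  return [
    { type: "transcript.final", text: finalTranscriptText(appendCount) },
    { type: "session.state", state: "thinking" },
    { type: "session.state", state: "speaking" },
    { type: "response.text.delta", text: "[mocked assistant] " },
    {
      type: "response.text.delta",
      text: "This is a deterministic mocked response from the gateway vertical slice.",
    },
    { type: "response.completed" },
    { type: "session.state", state: "idle" },
  ];
}
```

Concatenating the two deltas yields the assistant text the UI ends up showing: `[mocked assistant] This is a deterministic mocked response from the gateway vertical slice.`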
## Contract Scope for This Increment
@@ -200,6 +206,7 @@ Current mocked-pipeline behavior:
- during an active mocked turn, `response.cancel` returns the session to `idle` immediately
- any mocked turn timers that have not fired yet are dropped, so no later `response.text.delta` or `response.completed` events are emitted for the cancelled turn
- the same cancellation behavior applies when a mocked turn was started by `input_audio.commit`
- once `idle` is restored, the same WebSocket session can start another mocked turn without reconnecting
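The timer-dropping behavior described above can be sketched with an injectable timer interface. This is an illustration under assumptions, not the gateway's code: `Timers`, `realTimers`, `TurnStep`, and `startMockedTurn` are hypothetical names, and the per-step delays are made up.

```typescript
// Hypothetical timer abstraction so the logic can run against fake timers.
interface Timers<H> {
  set(fn: () => void, delayMs: number): H;
  clear(handle: H): void;
}

// Default wiring to the platform timers.
const realTimers: Timers<ReturnType<typeof setTimeout>> = {
  set: (fn, delayMs) => setTimeout(fn, delayMs),
  clear: (handle) => clearTimeout(handle),
};

interface TurnStep {
  delayMs: number; // assumed delay; the document does not specify timings
  eventType: string;
}

// Schedules every step of a mocked turn and returns a cancel function.
// Cancelling clears all timers that have not fired yet, so no later
// events are emitted for the cancelled turn.
function startMockedTurn<H>(
  timers: Timers<H>,
  steps: TurnStep[],
  emit: (eventType: string) => void,
): () => void {
  const pending = new Set<H>();
  steps.forEach((step) => {
    const handle: H = timers.set(() => {
      pending.delete(handle); // this timer has fired and is no longer pending
      emit(step.eventType);
    }, step.delayMs);
    pending.add(handle);
  });
  return function cancel(): void {
    pending.forEach((handle) => timers.clear(handle));
    pending.clear();
  };
}
```

A caller would typically keep the returned `cancel` function on the session record and invoke it when `response.cancel` arrives, whether the turn was started by `mocked.turn.trigger` or by `input_audio.commit`.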
More general future-state expectations: