feat(protocol): add shared WebSocket contract package
@@ -11,7 +11,7 @@ Establish the boundaries, protocol, and state model for the system before integr
### Backlog Items

- [x] define repository structure for `vela-ui` and `vela-gateway`
- [x] define the WebSocket event contract used by the UI and gateway via shared package
- [ ] define the session state machine and interrupt semantics
- [ ] define provider adapter interfaces for STT, LLM, TTS, and tools
- [ ] document error handling and cancellation behavior
@@ -180,4 +180,5 @@ Polish the system after the core voice loop is reliable.
- `apps/vela-ui` now boots as a minimal SvelteKit app with a starter page
- `apps/vela-gateway` now boots as a minimal Fastify app with `/` and `/health` endpoints
- `apps/vela-protocol` now provides the shared WebSocket event contract for the UI and gateway
- the backend framework choice is now concrete: Fastify
@@ -2,31 +2,87 @@
## Event Protocol

The shared code-level contract lives in the Yarn workspace package `@vela/protocol`, so both the
gateway and UI import the same event names and envelope shape.

## WebSocket Message Envelope

Every WebSocket message uses one envelope format:

```ts
type MessageEnvelope<TType extends string, TPayload> = {
  type: TType;
  payload: TPayload;
};
```

This increment intentionally keeps the envelope minimal:

- `type` identifies the event
- `payload` carries the event body
- no sequence numbers, timestamps, or protocol version fields yet
- future changes should be additive when possible
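As an illustration, a runtime guard for this envelope could look like the sketch below; `isEnvelope` is a hypothetical helper for this example, not an export of `@vela/protocol`:

```typescript
// Shape from the contract above, restated so this sketch is self-contained.
type MessageEnvelope<TType extends string, TPayload> = {
  type: TType;
  payload: TPayload;
};

// Structural runtime check: `type` must be a string and `payload` must be present.
function isEnvelope(value: unknown): value is MessageEnvelope<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { type?: unknown };
  return typeof candidate.type === "string" && "payload" in value;
}
```

Either side can run this check on a parsed message before dispatching on `type`.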
### Client → Server

```ts
type ClientEvent =
  | { type: "session.start"; payload: {} }
  | { type: "input_audio.append"; payload: { chunk: string } }
  | { type: "input_audio.commit"; payload: {} }
  | { type: "response.cancel"; payload: {} };
```

#### Client event intent

- `session.start` initializes a voice session without locking in transport or auth details yet
- `input_audio.append` carries a chunk of captured input audio as an encoded string
- `input_audio.commit` marks the current buffered user turn as ready for downstream processing
- `response.cancel` interrupts the active listen/think/speak flow
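The intent above can be sketched as one user turn; `send` and `outbox` here are stand-ins for whatever WebSocket wrapper the UI ends up using, not part of the contract:

```typescript
type ClientEvent =
  | { type: "session.start"; payload: {} }
  | { type: "input_audio.append"; payload: { chunk: string } }
  | { type: "input_audio.commit"; payload: {} }
  | { type: "response.cancel"; payload: {} };

// Stand-in transport: collects outgoing events so the flow can be inspected.
const outbox: ClientEvent[] = [];
function send(event: ClientEvent): void {
  outbox.push(event);
}

// One user turn: start the session, stream audio chunks, then commit.
send({ type: "session.start", payload: {} });
send({ type: "input_audio.append", payload: { chunk: "AAAA" } }); // base64-encoded audio
send({ type: "input_audio.append", payload: { chunk: "BBBB" } });
send({ type: "input_audio.commit", payload: {} });
```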
### Server → Client

```ts
type ServerEvent =
  | { type: "session.ready"; payload: { sessionId: string } }
  | {
      type: "session.state";
      payload: { value: "idle" | "listening" | "thinking" | "speaking" };
    }
  | { type: "transcript.partial"; payload: { text: string } }
  | { type: "transcript.final"; payload: { text: string } }
  | { type: "response.text.delta"; payload: { text: string } }
  | { type: "response.completed"; payload: {} }
  | {
      type: "error";
      payload: { code: string; message: string; retryable?: boolean };
    };
```

#### Server event intent

- `session.ready` confirms that the gateway created a session identity
- `session.state` exposes the coarse session phase needed by the later UI shell
- `transcript.partial` and `transcript.final` support incremental and completed user text display
- `response.text.delta` supports streamed assistant text without committing to audio output details yet
- `response.completed` marks the current assistant turn as done
- `error` is the minimal recoverable failure shape for both UI and gateway work
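On the receiving side, handling usually reduces to a switch on `type`. This sketch (with a hypothetical `describe` helper) adds a `never`-based exhaustiveness check so a future event addition fails the build instead of being silently dropped:

```typescript
type SessionPhase = "idle" | "listening" | "thinking" | "speaking";

type ServerEvent =
  | { type: "session.ready"; payload: { sessionId: string } }
  | { type: "session.state"; payload: { value: SessionPhase } }
  | { type: "transcript.partial"; payload: { text: string } }
  | { type: "transcript.final"; payload: { text: string } }
  | { type: "response.text.delta"; payload: { text: string } }
  | { type: "response.completed"; payload: {} }
  | { type: "error"; payload: { code: string; message: string; retryable?: boolean } };

function describe(event: ServerEvent): string {
  switch (event.type) {
    case "session.ready":
      return `session ${event.payload.sessionId} ready`;
    case "session.state":
      return `phase: ${event.payload.value}`;
    case "transcript.partial":
    case "transcript.final":
      return `transcript: ${event.payload.text}`;
    case "response.text.delta":
      return `assistant: ${event.payload.text}`;
    case "response.completed":
      return "turn complete";
    case "error":
      return `error ${event.payload.code}: ${event.payload.message}`;
    default: {
      // Exhaustiveness guard: a new ServerEvent member without a case here
      // becomes a compile-time error rather than a silently ignored event.
      const unhandled: never = event;
      return unhandled;
    }
  }
}
```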
## Contract Scope for This Increment

This contract is intentionally limited to the smallest event set needed to unblock:

- the later gateway WebSocket session skeleton
- the later UI voice-session shell

Explicitly deferred for later increments:

- tool-calling events
- streamed TTS/output-audio events
- reconnect/resume semantics
- protocol version negotiation
- provider-specific metadata fields
## State Machine

@@ -37,13 +93,13 @@

```text
idle
→ idle
```

`response.cancel` can occur at:

- listening → restart
- thinking → cancel
- speaking → stop immediately
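A minimal sketch of those per-phase reactions, assuming `listening` restarts in place and the other active phases fall back to `idle` (the `onCancel` reducer is illustrative, not part of the contract):

```typescript
type SessionPhase = "idle" | "listening" | "thinking" | "speaking";

// Phase transition applied when a `response.cancel` event arrives.
function onCancel(phase: SessionPhase): SessionPhase {
  switch (phase) {
    case "listening":
      return "listening"; // restart capture for a fresh user turn
    case "thinking":
      return "idle"; // abandon the in-flight LLM request
    case "speaking":
      return "idle"; // stop playback immediately
    case "idle":
      return "idle"; // nothing active to cancel
  }
}
```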
## `response.cancel` Handling Requirements

- immediate stop of TTS playback
- immediate stop of LLM streaming
@@ -51,12 +107,14 @@ Interrupt can occur at:
### Mechanism

The `response.cancel` event cancels:

- the TTS process
- the current LLM request
- tool execution when possible

This shared contract uses `response.cancel` consistently for that cancellation signal.
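One conventional way to fan that cancellation out is a shared `AbortSignal`; the `TurnCancellation` wrapper below is an assumption of this sketch, not the gateway's actual mechanism:

```typescript
// A turn-scoped controller whose signal is shared by the TTS process,
// the LLM request, and any running tool execution.
class TurnCancellation {
  private controller = new AbortController();

  get signal(): AbortSignal {
    return this.controller.signal;
  }

  // Called when a `response.cancel` event arrives: aborts every task
  // subscribed to this turn's signal.
  cancel(reason: string): void {
    this.controller.abort(reason);
  }
}

const turn = new TurnCancellation();
const cancelled: string[] = [];

// Each downstream task listens on the shared signal.
for (const task of ["tts", "llm", "tool"]) {
  turn.signal.addEventListener("abort", () => cancelled.push(task));
}

turn.cancel("response.cancel received");
```

Passing the same signal into `fetch`-style provider calls lets one `response.cancel` tear down the whole turn.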
## Protocol Notes for Implementation

- keep the protocol backward compatible when possible