Vela Protocol and State Machine
Event Protocol
The shared code-level contract lives in the Yarn workspace package @vela/protocol so both the
gateway and UI import the same event names and envelope shape.
WebSocket Message Envelope
Every WebSocket message uses one envelope format:
type MessageEnvelope<TType extends string, TPayload> = {
  type: TType;
  payload: TPayload;
};
This increment intentionally keeps the envelope minimal:
- type identifies the event
- payload carries the event body
- no sequence numbers, timestamps, or protocol version fields yet
- future changes should be additive when possible
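As an illustration of the envelope rules above, the following is a minimal sketch of a receive-side shape check. It assumes messages travel as JSON text frames, which this contract does not specify, and the helper names isEnvelope and parseEnvelope are hypothetical rather than exports of @vela/protocol.

```typescript
// Envelope shape copied from the contract above.
type MessageEnvelope<TType extends string, TPayload> = {
  type: TType;
  payload: TPayload;
};

// Hypothetical helper: checks only the envelope shape, not whether
// `type` names a known event.
function isEnvelope(value: unknown): value is MessageEnvelope<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.type === "string" && "payload" in candidate;
}

// Hypothetical helper: parse a raw frame (assumed to be JSON text),
// returning null for anything that is not a well-formed envelope.
function parseEnvelope(raw: string): MessageEnvelope<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isEnvelope(parsed) ? parsed : null;
  } catch {
    return null; // not valid JSON
  }
}
```

Extra fields on the parsed object are ignored by this check, so additive envelope changes would not break it.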
Client → Server
type ClientEvent =
  | { type: "session.start"; payload: {} }
  | { type: "input_audio.append"; payload: { chunk: string } }
  | { type: "input_audio.commit"; payload: {} }
  | { type: "response.cancel"; payload: {} };
Client event intent
- session.start initializes a voice session without locking in transport or auth details yet
- input_audio.append carries a chunk of captured input audio as an encoded string
- input_audio.commit marks the current buffered user turn as ready for downstream processing
- response.cancel interrupts the active listen/think/speak flow
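To make the append/commit pairing concrete, here is a small sketch of how a client might assemble the events for one user turn. The helper buildTurn is hypothetical, and it takes chunks that are already encoded as strings because the contract leaves the encoding open.

```typescript
// Client event union copied from the contract above.
type ClientEvent =
  | { type: "session.start"; payload: {} }
  | { type: "input_audio.append"; payload: { chunk: string } }
  | { type: "input_audio.commit"; payload: {} }
  | { type: "response.cancel"; payload: {} };

// Hypothetical helper: the ordered events for one user turn.
// Each chunk becomes one append event, followed by a single commit.
function buildTurn(encodedChunks: string[]): ClientEvent[] {
  const appends = encodedChunks.map(
    (chunk): ClientEvent => ({ type: "input_audio.append", payload: { chunk } }),
  );
  return [...appends, { type: "input_audio.commit", payload: {} }];
}
```

In a live session the append events would more likely be sent as audio is captured rather than batched; the array form here only shows the ordering.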
Server → Client
type ServerEvent =
  | { type: "session.ready"; payload: { sessionId: string } }
  | {
      type: "session.state";
      payload: { value: "idle" | "listening" | "thinking" | "speaking" };
    }
  | { type: "transcript.partial"; payload: { text: string } }
  | { type: "transcript.final"; payload: { text: string } }
  | { type: "response.text.delta"; payload: { text: string } }
  | { type: "response.completed"; payload: {} }
  | {
      type: "error";
      payload: { code: string; message: string; retryable?: boolean };
    };
Server event intent
- session.ready confirms that the gateway created a session identity
- session.state exposes the coarse session phase needed by the later UI shell
- transcript.partial and transcript.final support incremental and completed user text display
- response.text.delta supports streamed assistant text without committing to audio output details yet
- response.completed marks the current assistant turn as done
- error is the minimal recoverable failure shape for both UI and gateway work
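One way the UI could consume these events is a pure reducer over the ServerEvent union, sketched below. The ViewState type, its field names, and applyServerEvent are hypothetical. Two behaviors are assumptions not stated in the contract: each transcript event replaces the displayed user text, and each delta is appended to the assistant text.

```typescript
type SessionPhase = "idle" | "listening" | "thinking" | "speaking";

// Server event union copied from the contract above.
type ServerEvent =
  | { type: "session.ready"; payload: { sessionId: string } }
  | { type: "session.state"; payload: { value: SessionPhase } }
  | { type: "transcript.partial"; payload: { text: string } }
  | { type: "transcript.final"; payload: { text: string } }
  | { type: "response.text.delta"; payload: { text: string } }
  | { type: "response.completed"; payload: {} }
  | { type: "error"; payload: { code: string; message: string; retryable?: boolean } };

// Hypothetical UI-side view model; field names are illustrative only.
type ViewState = {
  sessionId: string | null;
  phase: SessionPhase;
  userText: string;
  assistantText: string;
  lastError: string | null;
};

const initialView: ViewState = {
  sessionId: null,
  phase: "idle",
  userText: "",
  assistantText: "",
  lastError: null,
};

function applyServerEvent(view: ViewState, event: ServerEvent): ViewState {
  switch (event.type) {
    case "session.ready":
      return { ...view, sessionId: event.payload.sessionId };
    case "session.state":
      return { ...view, phase: event.payload.value };
    case "transcript.partial":
    case "transcript.final":
      // Assumption: each transcript event carries the full text to display.
      return { ...view, userText: event.payload.text };
    case "response.text.delta":
      // Assumption: deltas are concatenated in arrival order.
      return { ...view, assistantText: view.assistantText + event.payload.text };
    case "response.completed":
      return view; // nothing to update in this sketch
    case "error":
      return { ...view, lastError: `${event.payload.code}: ${event.payload.message}` };
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new
      // event is added to ServerEvent without a case here.
      const unreachable: never = event;
      return unreachable;
    }
  }
}
```

The never-typed default branch makes a newly added event a compile error in the UI until it is handled.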
Contract Scope for This Increment
This contract is intentionally limited to the smallest event set needed to unblock:
- the later gateway WebSocket session skeleton
- the later UI voice-session shell
Explicitly deferred for later increments:
- tool-calling events
- streamed TTS/output-audio events
- reconnect/resume semantics
- protocol version negotiation
- provider-specific metadata fields
State Machine
idle
→ listening
→ thinking
→ speaking
→ idle
response.cancel can occur at:
- listening → restart
- thinking → cancel
- speaking → stop immediately
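The phase flow and cancel points above can be sketched as a small transition table. The names NEXT_PHASE, advance, and phaseAfterCancel are hypothetical. Because this document leaves the post-cancel phase as a UX decision, the sketch takes it as a parameter. Treating a cancel while idle as a no-op is also an assumption.

```typescript
type SessionPhase = "idle" | "listening" | "thinking" | "speaking";

// Forward path from the diagram: idle → listening → thinking → speaking → idle.
const NEXT_PHASE: Record<SessionPhase, SessionPhase> = {
  idle: "listening",
  listening: "thinking",
  thinking: "speaking",
  speaking: "idle",
};

// Move one step along the forward path.
function advance(phase: SessionPhase): SessionPhase {
  return NEXT_PHASE[phase];
}

// Phase after response.cancel. The reset target is configurable because the
// document leaves "listening or idle" open; cancelling while idle is assumed
// to change nothing.
function phaseAfterCancel(
  phase: SessionPhase,
  resetTo: "listening" | "idle" = "idle",
): SessionPhase {
  return phase === "idle" ? "idle" : resetTo;
}
```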
response.cancel Handling Requirements
- immediate stop of TTS playback
- immediate stop of LLM streaming
- reset session state to listening or idle, depending on UX decision
Mechanism
The response.cancel event cancels:
- TTS process
- current LLM request
- tool execution when possible
This shared contract uses response.cancel consistently for that cancellation signal.
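A minimal sketch of how the gateway might fan one response.cancel out to every piece of in-flight work follows. TurnCanceller and its methods are hypothetical names. The sketch assumes each subsystem (TTS, the LLM request, tool execution) can register a stop callback, and it treats every stop as best effort, matching the "when possible" wording above.

```typescript
type StopFn = () => void;

// Hypothetical per-session helper: collects stop callbacks for the current
// turn and runs them all when response.cancel arrives.
class TurnCanceller {
  private stops: StopFn[] = [];

  // Register a callback that halts one piece of in-flight work.
  register(stop: StopFn): void {
    this.stops.push(stop);
  }

  // Run every registered stop callback once and clear the list.
  // Returns how many callbacks were invoked.
  cancel(): number {
    const pending = this.stops;
    this.stops = [];
    for (const stop of pending) {
      try {
        stop();
      } catch {
        // Best effort: one failing stop must not block the others.
      }
    }
    return pending.length;
  }
}
```

Clearing the list before running the callbacks keeps a second response.cancel from stopping the same work twice.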
Protocol Notes for Implementation
- keep the protocol backward compatible when possible
- prefer additive event changes over breaking renames
- update this document in the same change whenever implementation changes protocol behavior or diverges from the initial contract
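One consequence of preferring additive changes is that receivers benefit from tolerating event types they do not know yet. The sketch below shows one way to do that. The names KNOWN_SERVER_EVENTS and isKnownServerEventType are hypothetical, and skipping unknown events (rather than treating them as errors) is an assumption, not a rule stated in this contract.

```typescript
// Event names taken from the ServerEvent union in this document.
const KNOWN_SERVER_EVENTS = [
  "session.ready",
  "session.state",
  "transcript.partial",
  "transcript.final",
  "response.text.delta",
  "response.completed",
  "error",
] as const;

type KnownServerEventType = (typeof KNOWN_SERVER_EVENTS)[number];

// Hypothetical guard: lets an older client skip events added by a newer
// gateway instead of failing on them.
function isKnownServerEventType(type: string): type is KnownServerEventType {
  return (KNOWN_SERVER_EVENTS as readonly string[]).indexOf(type) !== -1;
}
```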