feat(vela-gateway): add websocket session skeleton

2026-04-08 18:30:21 +02:00
parent 4fd27db11e
commit fa5a458003
8 changed files with 655 additions and 7 deletions

@@ -65,7 +65,7 @@ The current implementation is a minimal SvelteKit app with a single starter page
- Fastify (Node)
- WebSocket-based session layer
The current implementation is a minimal Fastify service with `/` and `/health` HTTP endpoints. The WebSocket session layer is a later increment.
The current implementation is a minimal Fastify service with `/`, `/health`, and a documented `/ws` WebSocket session endpoint. The gateway keeps one ephemeral in-memory session record per live socket connection and removes it on disconnect.
#### Responsibilities
@@ -77,6 +77,14 @@ The current implementation is a minimal Fastify service with `/` and `/health` H
- TTS orchestration
- event streaming
#### Current WebSocket skeleton
- `GET /ws` documents the route for plain HTTP clients and returns `426 Upgrade Required`
- WebSocket upgrades on `/ws` create an ephemeral session immediately
- the gateway sends `session.ready` followed by `session.state` (`idle`) when the socket is established
- valid minimal client events can move the session between `idle` and `listening`
- invalid JSON, invalid envelopes, and malformed frames are handled defensively so the process stays up
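The plain-HTTP branch of `/ws` described above could be sketched as a pure function. This is only an illustration: `describeWsRoute`, `PlainResponse`, and the body fields are hypothetical names, not the gateway's actual code. The one firm detail is that a `426 Upgrade Required` response carries an `Upgrade` header naming the expected protocol.

```ts
// Hypothetical response shape used only for this sketch.
interface PlainResponse {
  status: number;
  headers: Record<string, string>;
  body: { error: string; message: string };
}

// Returns a 426 response for plain HTTP requests, or null when the request
// asks for a WebSocket upgrade and should be handed to the socket layer.
// Header names are expected in lowercase, as Node exposes them.
function describeWsRoute(
  requestHeaders: Record<string, string | undefined>
): PlainResponse | null {
  const upgrade = (requestHeaders["upgrade"] ?? "").toLowerCase();
  if (upgrade === "websocket") {
    return null;
  }
  return {
    status: 426,
    headers: { Connection: "Upgrade", Upgrade: "websocket" },
    body: {
      error: "Upgrade Required",
      message: "Connect to /ws with a WebSocket client.", // assumed wording
    },
  };
}
```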
## Voice Pipeline
```text

@@ -34,7 +34,7 @@ Prove the end-to-end interaction model with mocked or stubbed providers.
- [x] bootstrap `vela-ui` as a runnable SvelteKit app in the Yarn workspace
- [x] bootstrap `vela-gateway` as a runnable Fastify app in the Yarn workspace
- create a minimal UI with mic control, state indicator, transcript, and response text
- create a gateway WebSocket session skeleton
- [x] create a gateway WebSocket session skeleton
- implement mocked STT flow for partial and final transcript events
- implement mocked LLM response streaming
- implement stubbed audio playback or placeholder TTS output
@@ -180,5 +180,6 @@ Polish the system after the core voice loop is reliable.
- `apps/vela-ui` now boots as a minimal SvelteKit app with a starter page
- `apps/vela-gateway` now boots as a minimal Fastify app with `/` and `/health` endpoints
- `apps/vela-gateway` now exposes a minimal `/ws` WebSocket session skeleton with ephemeral in-memory sessions and defensive message handling
- `apps/vela-protocol` now provides the shared WebSocket event contract for the UI and gateway
- backend framework choice is now concrete: Fastify

@@ -4,7 +4,16 @@
- `vela-ui` is implemented as a SvelteKit application
- `vela-gateway` is implemented as a Fastify service
- current integration work beyond the gateway HTTP baseline remains future implementation
- `vela-gateway` now exposes `/ws` as the minimal WebSocket session entrypoint using the shared `@vela/protocol` contract
- current integration work beyond the gateway WebSocket/session baseline remains future implementation
## Gateway Session Contract
- transport: WebSocket on `/ws`
- session storage: in-memory only, one ephemeral record per live connection
- message format: `@vela/protocol` `MessageEnvelope<{ type, payload }>`
- current server behavior: acknowledge connect with `session.ready` and `session.state`
- safety baseline: invalid JSON, invalid envelopes, and malformed frames return protocol errors or close that socket without taking down the service
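The "one ephemeral record per live connection" rule might look roughly like the following in-memory store. It is a minimal sketch under assumptions: `SessionStore`, `SessionRecord`, and the `bufferedChunks` field are illustrative names, and the counter-based ids stand in for whatever id scheme the gateway really uses.

```ts
type SessionState = "idle" | "listening";

// Hypothetical record shape; the real gateway may track different fields.
interface SessionRecord {
  sessionId: string;
  state: SessionState;
  bufferedChunks: number;
}

class SessionStore {
  private readonly sessions = new Map<string, SessionRecord>();
  private nextId = 1;

  // Intended to be called once per successful WebSocket upgrade.
  create(): SessionRecord {
    const record: SessionRecord = {
      sessionId: `session-${this.nextId++}`,
      state: "idle",
      bufferedChunks: 0,
    };
    this.sessions.set(record.sessionId, record);
    return record;
  }

  get(sessionId: string): SessionRecord | undefined {
    return this.sessions.get(sessionId);
  }

  // Intended to be called from the socket's close handler.
  // Returns false if the session was already gone.
  remove(sessionId: string): boolean {
    return this.sessions.delete(sessionId);
  }

  get size(): number {
    return this.sessions.size;
  }
}
```

Because nothing is persisted, a process restart or a reconnect always starts from a fresh `idle` record.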
## STT (Speech-to-Text)

@@ -5,6 +5,12 @@
The shared code-level contract lives in the Yarn workspace package `@vela/protocol` so both the
gateway and UI import the same event names and envelope shape.
Current gateway baseline:
- WebSocket endpoint: `/ws`
- the gateway sends `session.ready` and `session.state` immediately after a successful socket upgrade
- the gateway accepts JSON text messages only in the shared envelope shape
## WebSocket Message Envelope
Every WebSocket message uses one envelope format:
@@ -40,6 +46,17 @@ type ClientEvent =
- `input_audio.commit` marks the current buffered user turn as ready for downstream processing
- `response.cancel` interrupts the active listen/think/speak flow
### Current skeleton behavior
- on connect, the gateway creates an ephemeral in-memory session and emits `session.ready` plus `session.state`
- `session.start` is accepted as an idempotent session acknowledgment and re-sends readiness/state
- `input_audio.append` updates the ephemeral session record and moves the session to `listening`
- `input_audio.commit` resets the minimal buffered state and returns the session to `idle`
- `response.cancel` resets the minimal session state back to `idle`
- malformed JSON produces `error` with code `invalid_json`
- invalid envelopes or unsupported client event names produce `error` with code `invalid_message`
- malformed WebSocket frames are rejected without crashing the gateway process
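The behavior listed above can be approximated by a single handler that takes the raw text frame and returns the events to send back. This is a sketch, not the gateway's implementation: the `Envelope` type is a simplified stand-in for the `@vela/protocol` envelope, the payload shapes are assumptions, and emitting `session.state` after every transition is also an assumption.

```ts
type SessionState = "idle" | "listening";

interface Session {
  state: SessionState;
  bufferedChunks: number; // hypothetical stand-in for "minimal buffered state"
}

// Simplified, assumed envelope; the shared contract may carry more fields.
interface Envelope {
  type: string;
  payload: unknown;
}

function isEnvelope(value: unknown): value is Envelope {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.type === "string" && "payload" in candidate;
}

const errorEvent = (code: "invalid_json" | "invalid_message"): Envelope => ({
  type: "error",
  payload: { code },
});

const stateEvent = (session: Session): Envelope => ({
  type: "session.state",
  payload: { state: session.state },
});

// Never throws: every failure path becomes a protocol error event.
function handleClientText(session: Session, raw: string): Envelope[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [errorEvent("invalid_json")];
  }
  if (!isEnvelope(parsed)) {
    return [errorEvent("invalid_message")];
  }

  switch (parsed.type) {
    case "session.start":
      // Idempotent acknowledgment: re-send readiness and current state.
      return [{ type: "session.ready", payload: {} }, stateEvent(session)];
    case "input_audio.append":
      session.bufferedChunks += 1;
      session.state = "listening";
      return [stateEvent(session)];
    case "input_audio.commit":
    case "response.cancel":
      session.bufferedChunks = 0;
      session.state = "idle";
      return [stateEvent(session)];
    default:
      // Well-formed envelope, but not a supported client event name.
      return [errorEvent("invalid_message")];
  }
}
```

Keeping the handler free of socket calls means the transitions can be unit-tested without opening a connection; the caller only serializes and sends the returned events.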
### Server → Client
```ts