# Vela Integrations and Tool Safety
## Current Runtime Baseline

- `vela-ui` is implemented as a SvelteKit application
- `vela-gateway` is implemented as a Fastify service
- `vela-gateway` now exposes `/ws` as the minimal WebSocket session entry point, using the shared `@vela/protocol` contract
- `vela-ui` now opens a minimal browser WebSocket client against that `/ws` entry point and surfaces connection and session status for developers
- integrations beyond the gateway WebSocket/session baseline are not yet implemented
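A minimal sketch of how the UI could track the connection status it surfaces to developers. The status names, `nextStatus`, `trackStatus`, and the `SocketLike` type are hypothetical, not the actual `vela-ui` code; keeping the transition logic pure makes it testable without a browser, and a browser `WebSocket` fits the structural `SocketLike` type.

```typescript
// Hypothetical connection states shown in the developer status panel.
type ConnectionStatus = "idle" | "connecting" | "open" | "error" | "closed";

// User actions plus the three browser WebSocket events the UI listens to.
type ConnectionEvent = "connect" | "disconnect" | "open" | "error" | "close";

// Pure transition function, so the status logic is testable without a browser.
function nextStatus(current: ConnectionStatus, event: ConnectionEvent): ConnectionStatus {
  switch (event) {
    case "connect":
      // Ignore repeated connect clicks while a socket is pending or open.
      return current === "connecting" || current === "open" ? current : "connecting";
    case "open":
      return current === "connecting" ? "open" : current;
    case "error":
      return "error";
    case "close":
      // Browsers fire `close` after `error`; keep the error visible.
      return current === "error" ? "error" : "closed";
    default:
      // "disconnect": user-initiated teardown.
      return current === "idle" ? "idle" : "closed";
  }
}

// Minimal structural type, so a browser WebSocket or a test double both fit.
interface SocketLike {
  addEventListener(type: "open" | "error" | "close", listener: () => void): void;
}

// Wire socket events to the status panel; starts in "connecting" because the socket already exists.
function trackStatus(socket: SocketLike, onChange: (status: ConnectionStatus) => void): void {
  let status: ConnectionStatus = "connecting";
  onChange(status);
  const update = (event: ConnectionEvent) => {
    status = nextStatus(status, event);
    onChange(status);
  };
  socket.addEventListener("open", () => update("open"));
  socket.addEventListener("error", () => update("error"));
  socket.addEventListener("close", () => update("close"));
}
```

Preserving `error` across the following `close` event is a design choice: it keeps the reason for a failed connection on screen instead of collapsing it into a generic "closed" state.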
## Gateway Session Contract

- transport: WebSocket on `/ws`
- session storage: in-memory only, one ephemeral record per live connection
- message format: `@vela/protocol` `MessageEnvelope<{ type, payload }>`
- current server behavior: acknowledge each new connection with `session.ready` and `session.state`
- safety baseline: invalid JSON, invalid envelopes, and malformed frames either produce a protocol error or close the offending socket, without taking down the service
- current UI behavior: connect/disconnect only, no microphone access, no audio payloads, and safe error-state handling for the `open`, `error`, and `close` socket events
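The safety baseline can be illustrated with a pure frame classifier that a `ws`-style `message` handler (one that receives the raw data and an `isBinary` flag) could call. This is a sketch under assumptions: the `Envelope` type only mirrors the documented `{ type, payload }` shape, and the `protocol.error` message type, the session payload fields, the decision to close on binary frames, and all helper names are hypothetical rather than the actual gateway code.

```typescript
// Hypothetical envelope shape; the real @vela/protocol MessageEnvelope may carry more fields.
interface Envelope {
  type: string;
  payload: unknown;
}

// Hypothetical in-memory session record, one per live connection.
interface Session {
  id: string;
  connectedAt: number;
}

type FrameResult =
  | { kind: "ok"; message: Envelope } // hand off to a message handler
  | { kind: "reply"; message: Envelope } // send a protocol error, keep the socket open
  | { kind: "close"; reason: string }; // close this socket only; the service keeps running

function protocolError(reason: string): Envelope {
  return { type: "protocol.error", payload: { reason } };
}

function isEnvelope(value: unknown): value is Envelope {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string" &&
    "payload" in value
  );
}

// Classify one inbound frame without throwing, so a bad client cannot crash the process.
function handleFrame(raw: string, isBinary: boolean): FrameResult {
  if (isBinary) {
    // Assumption: no binary (e.g. audio) frames are accepted in the current increment.
    return { kind: "close", reason: "binary frames not supported" };
  }
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "reply", message: protocolError("invalid_json") };
  }
  if (!isEnvelope(data)) {
    return { kind: "reply", message: protocolError("invalid_envelope") };
  }
  return { kind: "ok", message: data };
}

// Messages sent when a socket connects; payload contents are illustrative.
function onConnect(session: Session): Envelope[] {
  return [
    { type: "session.ready", payload: { sessionId: session.id } },
    { type: "session.state", payload: { sessionId: session.id, connectedAt: session.connectedAt } },
  ];
}
```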
## STT (Speech-to-Text)

### Primary Option

- `whisper.cpp`
### Deployment

- start on NanoPi
- move to NAS if latency is insufficient
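To turn "latency is insufficient" into a measurable decision, one option is to time transcription round trips on the NanoPi and compare a high percentile against a budget. This is a sketch under assumptions: the nearest-rank p95, the 1000 ms default budget (chosen to match the sub-second preference for STT), and the helper names are illustrative, not documented thresholds.

```typescript
// Nearest-rank percentile over measured latencies in milliseconds.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("no latency samples");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical decision helper: true when the NanoPi misses the latency budget.
function shouldMoveToNas(samplesMs: number[], budgetMs = 1000, p = 95): boolean {
  return percentile(samplesMs, p) > budgetMs;
}
```

Using a high percentile rather than the mean keeps occasional slow transcriptions from being averaged away.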
### Requirements

- streaming transcription
- partial and final output
- low latency, with sub-second response preferred
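To make "partial and final output" concrete, here is a sketch of how a consumer might fold streaming transcript events into display text. The `stt.partial`/`stt.final` event names, their fields, and the reducer are hypothetical; they are not defined by whisper.cpp or by the current `@vela/protocol` contract.

```typescript
// Hypothetical STT events: partials may still change, finals are committed.
type SttEvent =
  | { type: "stt.partial"; text: string }
  | { type: "stt.final"; text: string };

interface TranscriptState {
  committed: string[]; // final segments, in order
  pending: string; // latest partial for the segment in progress
}

function applySttEvent(state: TranscriptState, event: SttEvent): TranscriptState {
  if (event.type === "stt.partial") {
    // Each partial replaces the previous one for the same segment.
    return { committed: state.committed, pending: event.text };
  }
  // A final commits the segment and clears the pending partial.
  return { committed: [...state.committed, event.text], pending: "" };
}

// Text to display: committed segments plus the current partial, if any.
function displayText(state: TranscriptState): string {
  return [...state.committed, state.pending].filter((s) => s.length > 0).join(" ");
}
```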
## TTS (Text-to-Speech)

### Engine

- Kokoro TTS
### Deployment

- prefer NAS for more compute headroom
### API Contract

```http
POST /speak
Content-Type: application/json

{
  "text": "...",
  "voice": "vela",
  "format": "wav"
}
```
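A sketch of a gateway-side client for the `/speak` contract above. The base URL, the type names, and the empty-text check are assumptions; the contract only fixes the three JSON fields, and `wav` is the only format it shows. Request construction is kept separate from I/O so it can be tested without a running TTS service.

```typescript
// Request body matching the documented /speak contract.
interface SpeakRequest {
  text: string;
  voice: string;
  format: "wav"; // only format shown in the contract
}

// Hypothetical container for the arguments passed to fetch.
interface SpeakHttpRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildSpeakRequest(baseUrl: string, text: string, voice = "vela"): SpeakHttpRequest {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    throw new Error("refusing to synthesize empty text");
  }
  const body: SpeakRequest = { text: trimmed, voice, format: "wav" };
  return {
    url: new URL("/speak", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: returns the WAV bytes produced by the TTS service.
async function speak(baseUrl: string, text: string): Promise<ArrayBuffer> {
  const { url, init } = buildSpeakRequest(baseUrl, text);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`TTS request failed: ${res.status}`);
  }
  return res.arrayBuffer();
}
```

This buffers the whole response; the streaming preference below would instead read `res.body` incrementally.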
### Requirements

- streaming audio preferred
- low startup latency
- interrupt support
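For interrupt support, one common approach is to cancel in-flight synthesis when new speech preempts it. This sketch uses the standard `AbortController`; the `InterruptibleSpeaker` class and the injected `synthesize` function are hypothetical. In practice `synthesize` could pass the signal to `fetch` as `{ signal }` when calling `POST /speak`.

```typescript
// Any synthesis function that honors an AbortSignal, e.g. a fetch to POST /speak.
type Synthesize = (text: string, signal: AbortSignal) => Promise<ArrayBuffer>;

class InterruptibleSpeaker {
  private current: AbortController | null = null;
  private readonly synthesize: Synthesize;

  constructor(synthesize: Synthesize) {
    this.synthesize = synthesize;
  }

  // Start a new utterance, cancelling any synthesis still in flight (barge-in).
  async say(text: string): Promise<ArrayBuffer> {
    this.interrupt();
    const controller = new AbortController();
    this.current = controller;
    try {
      return await this.synthesize(text, controller.signal);
    } finally {
      // Clear only if a newer utterance has not already replaced this one.
      if (this.current === controller) this.current = null;
    }
  }

  // Abort the in-flight request, if any.
  interrupt(): void {
    this.current?.abort();
    this.current = null;
  }

  get busy(): boolean {
    return this.current !== null;
  }
}
```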
## Tool System

### Home Assistant Tool

#### Functions
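```ts
// Home Assistant tool surface exposed to the LLM through the gateway.
// Return types are illustrative.
interface HomeAssistantTool {
  turn_on(entity_id: string): Promise<void>;
  turn_off(entity_id: string): Promise<void>;
  set_temperature(entity_id: string, value: number): Promise<void>;
  get_state(entity_id: string): Promise<unknown>;
}
```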
#### Backend

- REST API
- optional Conversation API
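A sketch of how the four tool functions could map onto the Home Assistant REST API, which exposes `POST /api/services/<domain>/<service>` for service calls and `GET /api/states/<entity_id>` for reads, authenticated with a long-lived access token sent as a Bearer header. The helper names, the `HaCall` shape, and the choice of the domain-agnostic `homeassistant.turn_on`/`homeassistant.turn_off` services are assumptions; domain-specific services such as `light.turn_on` would also work.

```typescript
type HaFunction = "turn_on" | "turn_off" | "set_temperature" | "get_state";

interface HaCall {
  method: "GET" | "POST";
  path: string;
  body?: Record<string, unknown>;
}

// Map a tool function to the Home Assistant REST request it would issue.
function toHaCall(fn: HaFunction, entityId: string, value?: number): HaCall {
  if (fn === "get_state") {
    return { method: "GET", path: `/api/states/${encodeURIComponent(entityId)}` };
  }
  if (fn === "set_temperature") {
    if (value === undefined || !Number.isFinite(value)) {
      throw new Error("set_temperature needs a numeric value");
    }
    return {
      method: "POST",
      path: "/api/services/climate/set_temperature",
      body: { entity_id: entityId, temperature: value },
    };
  }
  // turn_on / turn_off via the domain-agnostic homeassistant.* services.
  return { method: "POST", path: `/api/services/homeassistant/${fn}`, body: { entity_id: entityId } };
}

// Server-side only: the long-lived access token must never be shipped to the browser.
async function callHomeAssistant(baseUrl: string, token: string, call: HaCall): Promise<unknown> {
  const res = await fetch(new URL(call.path, baseUrl), {
    method: call.method,
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: call.body === undefined ? undefined : JSON.stringify(call.body),
  });
  if (!res.ok) {
    throw new Error(`Home Assistant request failed: ${res.status}`);
  }
  return res.json();
}
```

In the gateway, `token` would come from server-side configuration such as an environment variable, never from the UI.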
#### Safety

- require confirmation for destructive actions
- require confirmation for irreversible or significant state changes
- keep secrets server-side only
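A sketch of the confirmation rule: write actions produce a pending confirmation instead of executing, and only an explicit approval of that exact request releases them. This takes the strictest reading (every Home Assistant write needs confirmation, reads do not); the `ToolAction` shape, the `ConfirmationGate` class, and its in-memory pending map are hypothetical.

```typescript
interface ToolAction {
  tool: "home_assistant";
  fn: "turn_on" | "turn_off" | "set_temperature" | "get_state";
  entityId: string;
  value?: number;
}

type Decision =
  | { status: "execute"; action: ToolAction }
  | { status: "needs_confirmation"; confirmationId: string; prompt: string };

// Strictest reading of the rules: every write needs confirmation, reads do not.
function requiresConfirmation(action: ToolAction): boolean {
  return action.fn !== "get_state";
}

class ConfirmationGate {
  private pending = new Map<string, ToolAction>();
  private nextId = 1;

  propose(action: ToolAction): Decision {
    if (!requiresConfirmation(action)) {
      return { status: "execute", action };
    }
    const confirmationId = `confirm-${this.nextId++}`;
    this.pending.set(confirmationId, action);
    const target = action.value === undefined ? action.entityId : `${action.entityId} to ${action.value}`;
    return { status: "needs_confirmation", confirmationId, prompt: `Confirm ${action.fn} on ${target}?` };
  }

  // Returns the action only if the user approved this exact pending request.
  confirm(confirmationId: string, approved: boolean): ToolAction | null {
    const action = this.pending.get(confirmationId);
    this.pending.delete(confirmationId); // one-shot: a confirmation cannot be replayed
    return approved && action !== undefined ? action : null;
  }
}
```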
### SearXNG Tool

#### Endpoint

```http
GET /search?q=...&format=json
```
#### Flow

- query SearXNG
- return top results
- let the LLM summarize the result set
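A sketch of the search flow: build the documented `GET /search?q=...&format=json` request, then keep only the top well-formed results before handing them to the LLM. It assumes the JSON response carries a `results` array whose items have `title`, `url`, and `content` fields, as SearXNG instances typically return, and that the JSON output format is enabled in the instance settings. The base URL, the `topN` default, and the helper names are placeholders.

```typescript
// Fields this sketch relies on; real SearXNG results carry additional fields.
interface SearchResult {
  title: string;
  url: string;
  content: string;
}

function buildSearchUrl(baseUrl: string, query: string): string {
  const url = new URL("/search", baseUrl);
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json");
  return url.toString();
}

// Keep well-formed results only and cap the list so the LLM prompt stays small.
function topResults(response: unknown, topN = 5): SearchResult[] {
  if (typeof response !== "object" || response === null) return [];
  const results = (response as { results?: unknown }).results;
  if (!Array.isArray(results)) return [];
  const picked: SearchResult[] = [];
  for (const r of results) {
    if (picked.length >= topN) break;
    if (typeof r === "object" && r !== null && typeof r.title === "string" && typeof r.url === "string") {
      picked.push({ title: r.title, url: r.url, content: typeof r.content === "string" ? r.content : "" });
    }
  }
  return picked;
}

async function search(baseUrl: string, query: string, topN = 5): Promise<SearchResult[]> {
  const res = await fetch(buildSearchUrl(baseUrl, query));
  if (!res.ok) {
    throw new Error(`SearXNG request failed: ${res.status}`);
  }
  return topResults(await res.json(), topN);
}
```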
## Safety Rules

- the LLM does not directly control systems
- all external actions go through explicit tool adapters
- Home Assistant write actions require confirmation
- frontend must not contain Home Assistant tokens or other secrets
- ambiguous tool intents should be clarified instead of guessed
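The adapter and ambiguity rules can be sketched together: the gateway only accepts tool names from an explicit allowlist, and it resolves a spoken device name against known entity IDs, proceeding only on exactly one match and asking a clarifying question otherwise. The tool names, the word-matching rule in `resolveEntity`, and the question wording are hypothetical.

```typescript
type Resolution =
  | { kind: "resolved"; entityId: string }
  | { kind: "clarify"; question: string };

// Match a spoken name like "kitchen light" against known entity IDs; never guess between candidates.
function resolveEntity(spoken: string, knownEntityIds: string[]): Resolution {
  const words = spoken.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const matches = knownEntityIds.filter((id) => words.every((w) => id.toLowerCase().includes(w)));
  if (matches.length === 1) {
    return { kind: "resolved", entityId: matches[0] };
  }
  if (matches.length === 0) {
    return { kind: "clarify", question: `I couldn't find "${spoken}". Which device did you mean?` };
  }
  return { kind: "clarify", question: `Did you mean ${matches.join(" or ")}?` };
}

// Explicit allowlist of adapters; any other tool the LLM proposes is rejected.
const ADAPTERS: ReadonlySet<string> = new Set(["home_assistant", "searxng"]);

function isAllowedTool(toolName: string): boolean {
  return ADAPTERS.has(toolName);
}
```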