Establish the monorepo, tooling, and starter apps so UI and gateway development can begin from a documented, runnable baseline.
# Vela Architecture

## High-Level Architecture

```text
[ Browser (PWA UI) ]
        |
    WebSocket
        |
[ Vela Gateway (NanoPi R6S) ]
        |
        +--> STT (local or NAS)
        +--> Ollama (NAS GPU)
        +--> Kokoro TTS (NAS or NanoPi)
        +--> Home Assistant
        +--> SearXNG
```

## Core Components

## Repository Structure

```text
apps/
  vela-ui/
  vela-gateway/
```

The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.

### Frontend — `vela-ui`

#### Tech

- SvelteKit
- PWA enabled
- WebSocket client

The current implementation is a minimal SvelteKit app with a single starter page. PWA behavior, microphone capture, and the WebSocket client are later increments.

#### Responsibilities

- audio capture from microphone
- audio playback for TTS
- UI state rendering
- session management
- interrupt handling

#### Main Screen

- large mic button
- live transcript
- streamed assistant response text
- state indicator:
  - idle
  - listening
  - thinking
  - speaking
- interrupt button during speaking

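The state indicator and the interrupt button listed above suggest a small state machine. The TypeScript sketch below shows one way the UI could model it. The four state names come from the list above; the event names (`micPressed`, `finalTranscript`, `ttsStarted`, `ttsFinished`, `interrupt`) and the functions `nextState` and `canInterrupt` are assumptions made for illustration and are not part of the current `vela-ui` code.

```typescript
// Hypothetical sketch: state names are from the document, event names are assumed.
type UiState = "idle" | "listening" | "thinking" | "speaking";

type UiEvent =
  | { type: "micPressed" }
  | { type: "finalTranscript" }
  | { type: "ttsStarted" }
  | { type: "ttsFinished" }
  | { type: "interrupt" };

// Pure transition function: events that do not apply to the current state are ignored.
function nextState(state: UiState, event: UiEvent): UiState {
  switch (state) {
    case "idle":
      return event.type === "micPressed" ? "listening" : state;
    case "listening":
      return event.type === "finalTranscript" ? "thinking" : state;
    case "thinking":
      return event.type === "ttsStarted" ? "speaking" : state;
    case "speaking":
      // Both normal completion and a user interrupt return to idle.
      return event.type === "ttsFinished" || event.type === "interrupt"
        ? "idle"
        : state;
  }
}

// The interrupt button is only meaningful while audio is playing.
function canInterrupt(state: UiState): boolean {
  return state === "speaking";
}
```

Keeping the transitions in a pure function would make them easy to unit test, and a Svelte store could hold the current state and feed the indicator.
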
### Backend — `vela-gateway`

#### Tech

- Fastify (Node)
- WebSocket-based session layer

The current implementation is a minimal Fastify service with `/` and `/health` HTTP endpoints. The WebSocket session layer is a later increment.

#### Responsibilities

- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming

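Because the WebSocket session layer is not implemented yet, the wire format for event streaming is still open. As a rough sketch, the gateway could send JSON text frames typed as a discriminated union. Every event name and field below (`transcript.partial`, `sessionId`, and so on) is an assumption for illustration, and binary audio frames are deliberately left out.

```typescript
// Hypothetical event shapes; none of these names are defined by the project yet.
type GatewayEvent =
  | { type: "transcript.partial"; sessionId: string; text: string }
  | { type: "transcript.final"; sessionId: string; text: string }
  | { type: "response.delta"; sessionId: string; text: string }
  | { type: "error"; sessionId: string; message: string };

// Serialize an event into a JSON text frame.
function encodeEvent(event: GatewayEvent): string {
  return JSON.stringify(event);
}

// Parse and validate an incoming frame; returns null for anything malformed.
function decodeEvent(raw: string): GatewayEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;

  const { type, sessionId, text, message } = value as Record<string, unknown>;
  if (typeof type !== "string" || typeof sessionId !== "string") return null;

  if (
    type === "transcript.partial" ||
    type === "transcript.final" ||
    type === "response.delta"
  ) {
    return typeof text === "string" ? { type, sessionId, text } : null;
  }
  if (type === "error") {
    return typeof message === "string" ? { type, sessionId, message } : null;
  }
  return null; // unknown event type
}
```

Validating on decode means the UI and the gateway can both reject unexpected frames instead of trusting the peer.
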
## Voice Pipeline

```text
Mic → Gateway → STT → Transcript
              → LLM → Tool Calls → Results
              → LLM → Final Response
              → TTS → Audio Stream → UI
```

## Gateway Internal Flow

```text
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
   → call LLM
5. LLM decides:
   → direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
```

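The ten steps above can be sketched as a single orchestration function with injected dependencies. This is a minimal sketch, not the gateway's implementation. The `TurnDeps` interface, the `handleTurn` function, and the event names are all hypothetical. It also assumes at most one tool round per turn and emits the response as one chunk rather than a true stream.

```typescript
// Hypothetical types: the real STT, LLM, tool, and TTS clients are not defined yet.
type LlmDecision =
  | { kind: "response"; text: string }
  | { kind: "toolCall"; tool: string; args: Record<string, unknown> };

interface TurnDeps {
  transcribe(audio: Uint8Array, onPartial: (text: string) => void): Promise<string>;
  askLlm(transcript: string, toolResult?: string): Promise<LlmDecision>;
  runTool(tool: string, args: Record<string, unknown>): Promise<string>;
  synthesize(text: string): Promise<Uint8Array>;
  emit(event: { type: string; payload: unknown }): void;
}

async function handleTurn(audio: Uint8Array, deps: TurnDeps): Promise<void> {
  // Steps 1-3: run STT and forward partial transcripts as they arrive.
  const transcript = await deps.transcribe(audio, (text) =>
    deps.emit({ type: "transcript.partial", payload: text }),
  );
  deps.emit({ type: "transcript.final", payload: transcript });

  // Steps 4-5: the LLM either answers directly or requests a tool.
  let decision = await deps.askLlm(transcript);

  // Steps 6-7: execute the tool and feed its result back to the LLM.
  if (decision.kind === "toolCall") {
    const result = await deps.runTool(decision.tool, decision.args);
    decision = await deps.askLlm(transcript, result);
  }

  // Step 8: this sketch supports a single tool round only.
  if (decision.kind !== "response") {
    throw new Error("LLM requested a second tool call");
  }

  // Steps 9-10: send the text, then the synthesized audio.
  deps.emit({ type: "response.text", payload: decision.text });
  deps.emit({ type: "response.audio", payload: await deps.synthesize(decision.text) });
}
```

Passing the services in as `deps` keeps the flow testable with in-memory fakes, which may help while STT and TTS can run on either the NAS or the NanoPi.
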
## LLM Layer

### Location

- NAS with RTX 3050 8GB

### Role

- intent parsing
- tool selection
- response generation

### Constraints

- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits

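The first two constraints imply that the gateway, not the model, decides what actually runs: the LLM only proposes a structured tool call, and the gateway validates it against a registry before executing anything. The sketch below illustrates that idea under stated assumptions. The `{ name, arguments }` call shape, the `ToolSpec` interface, and the `web_search` tool are hypothetical, and the tool body is a stub that performs no network request.

```typescript
// Hypothetical registry entry: required arguments are treated as strings for simplicity.
interface ToolSpec {
  requiredArgs: string[];
  run(args: Record<string, string>): string;
}

type ToolOutcome = { ok: true; output: string } | { ok: false; error: string };

// Validate a raw tool call emitted by the LLM and run it only if it is well-formed.
function executeToolCall(raw: string, registry: Map<string, ToolSpec>): ToolOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "tool call is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, error: "tool call must be a JSON object" };
  }

  const call = parsed as { name?: unknown; arguments?: unknown };
  if (typeof call.name !== "string") {
    return { ok: false, error: "tool call has no name" };
  }

  // Only tools present in the registry can ever run.
  const spec = registry.get(call.name);
  if (spec === undefined) {
    return { ok: false, error: `unknown tool: ${call.name}` };
  }

  const rawArgs = call.arguments;
  if (typeof rawArgs !== "object" || rawArgs === null) {
    return { ok: false, error: "tool call has no arguments object" };
  }

  // Copy across only the declared arguments, rejecting missing or non-string values.
  const args: Record<string, string> = {};
  for (const key of spec.requiredArgs) {
    const value = (rawArgs as Record<string, unknown>)[key];
    if (typeof value !== "string") {
      return { ok: false, error: `missing or non-string argument: ${key}` };
    }
    args[key] = value;
  }
  return { ok: true, output: spec.run(args) };
}

// Example registry with one stubbed tool (no real search is performed).
const exampleRegistry = new Map<string, ToolSpec>([
  ["web_search", { requiredArgs: ["query"], run: (a) => `results for ${a.query}` }],
]);
```

Returning an error value instead of throwing lets the gateway feed the failure back to the model as a tool result, so the model can retry or explain the problem to the user.
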
## Naming

- system: **Vela**
- gateway: `vela-gateway`
- UI: `vela-ui`
- voice profile: `vela-neutral`