feat: bootstrap vela UI and gateway workspace
Establish the monorepo, tooling, and starter apps so UI and gateway development can begin from a documented, runnable baseline.
docs/architecture.md (new file):
# Vela Architecture

## High-Level Architecture
```text
[ Browser (PWA UI) ]
          |
      WebSocket
          |
[ Vela Gateway (NanoPi R6S) ]
          |
          +--> STT (local or NAS)
          +--> Ollama (NAS GPU)
          +--> Kokoro TTS (NAS or NanoPi)
          +--> Home Assistant
          +--> SearXNG
```
## Repository Structure

```text
apps/
  vela-ui/
  vela-gateway/
```

The repository now includes separate runnable workspaces for the UI and gateway so implementation can proceed independently while staying aligned through shared documentation.

## Core Components
### Frontend — `vela-ui`

#### Tech

- SvelteKit
- PWA enabled
- WebSocket client

The current implementation is a minimal SvelteKit app with a single starter page. PWA behavior, microphone capture, and the WebSocket client are later increments.
#### Responsibilities

- audio capture from microphone
- audio playback for TTS
- UI state rendering
- session management
- interrupt handling

#### Main Screen

- large mic button
- live transcript
- streamed assistant response text
- state indicator:
  - idle
  - listening
  - thinking
  - speaking
- interrupt button during speaking
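The state indicator above can be sketched as a small state machine. This is a hypothetical sketch; the state and event names are assumptions, not the actual `vela-ui` implementation.

```typescript
// Hypothetical sketch of the UI state indicator; event names are
// assumptions, not the actual vela-ui implementation.
type UiState = "idle" | "listening" | "thinking" | "speaking";

type UiEvent =
  | "mic_pressed"      // user taps the large mic button
  | "final_transcript" // STT produced the final transcript
  | "tts_started"      // assistant audio playback begins
  | "tts_finished"     // playback done
  | "interrupt";       // interrupt button during speaking

// Legal transitions; any event not listed keeps the current state.
const transitions: Record<UiState, Partial<Record<UiEvent, UiState>>> = {
  idle:      { mic_pressed: "listening" },
  listening: { final_transcript: "thinking", interrupt: "idle" },
  thinking:  { tts_started: "speaking" },
  speaking:  { tts_finished: "idle", interrupt: "idle" },
};

function next(state: UiState, event: UiEvent): UiState {
  return transitions[state][event] ?? state;
}
```

Keeping transitions in one table makes the interrupt behavior easy to audit: an interrupt during speaking or listening drops straight back to idle.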
### Backend — `vela-gateway`

#### Tech

- Fastify (Node)
- WebSocket-based session layer

The current implementation is a minimal Fastify service with `/` and `/health` HTTP endpoints. The WebSocket session layer is a later increment.
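As a dependency-free sketch of the same endpoint shape, here is the `/` and `/health` surface using Node's built-in `http` module. The real service uses Fastify; the response bodies here are assumptions for illustration, not the actual gateway contract.

```typescript
// Dependency-free sketch of the gateway's `/` and `/health` endpoints
// using Node's built-in http module. The actual service uses Fastify;
// the JSON response shapes here are assumptions.
import { createServer } from "node:http";

const server = createServer((req, res) => {
  if (req.url === "/health") {
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ status: "ok" }));
  } else if (req.url === "/") {
    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify({ service: "vela-gateway" }));
  } else {
    res.writeHead(404);
    res.end();
  }
});

server.listen(0); // ephemeral port for local testing
```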
#### Responsibilities

- session lifecycle
- audio ingestion
- STT orchestration
- LLM orchestration
- tool execution
- TTS orchestration
- event streaming
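The event-streaming responsibility could be modeled as a tagged message type sent as JSON over the WebSocket session layer. The message names and fields below are assumptions, not a documented protocol:

```typescript
// Hypothetical shape for the gateway -> UI event stream; message names
// and fields are assumptions, not a documented protocol.
type GatewayEvent =
  | { type: "transcript.partial"; text: string }
  | { type: "transcript.final"; text: string }
  | { type: "response.delta"; text: string }   // streamed assistant text
  | { type: "response.done" }
  | { type: "tts.chunk"; audioBase64: string } // audio stream to the UI
  | { type: "error"; message: string };

// Events travel as JSON text frames over the WebSocket session layer.
function encode(ev: GatewayEvent): string {
  return JSON.stringify(ev);
}

function decode(raw: string): GatewayEvent {
  return JSON.parse(raw) as GatewayEvent;
}
```

A discriminated union like this lets the UI switch on `type` and get the matching fields type-checked on each branch.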
## Voice Pipeline

```text
Mic → Gateway → STT → Transcript
                    → LLM → Tool Calls → Results
                    → LLM → Final Response
                    → TTS → Audio Stream → UI
```
## Gateway Internal Flow

```text
1. Receive audio
2. Run STT (streaming)
3. Emit partial transcripts
4. On final:
   → call LLM
5. LLM decides:
   → direct response OR tool call
6. Execute tool
7. Feed result back to LLM
8. Generate final response
9. Send text stream
10. Send TTS stream
```
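Steps 4–10 of the flow above can be sketched as an async pipeline. The stage functions here are stubs standing in for the STT/LLM/TTS services; the names and signatures are assumptions for illustration, not the gateway's actual interfaces.

```typescript
// Sketch of steps 4-10 of the gateway flow with stubbed stages; the
// function names and signatures are assumptions for illustration.
type LlmTurn =
  | { kind: "response"; text: string }
  | { kind: "tool_call"; tool: string; args: unknown };

interface Stages {
  llm(prompt: string): Promise<LlmTurn>;
  runTool(tool: string, args: unknown): Promise<string>;
  tts(text: string): Promise<Uint8Array>;
}

// Given a final transcript, produce the response text and TTS audio.
async function handleFinalTranscript(transcript: string, s: Stages) {
  let turn = await s.llm(transcript);                           // 4-5: LLM decides
  if (turn.kind === "tool_call") {
    const result = await s.runTool(turn.tool, turn.args);       // 6: execute tool
    turn = await s.llm(`${transcript}\n[tool result] ${result}`); // 7-8: feed back
  }
  const text = turn.kind === "response" ? turn.text : "";
  const audio = await s.tts(text);                              // 10: TTS stream
  return { text, audio };                                       // 9: text stream
}
```

Because the stages are injected, the same pipeline shape works whether STT/LLM/TTS live on the NAS or the NanoPi.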
## LLM Layer

### Location

- NAS with RTX 3050 (8 GB)

### Role

- intent parsing
- tool selection
- response generation

### Constraints

- must use a tool-calling schema
- must not directly control systems
- target approximately 7B-class models because of hardware limits
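A tool-calling schema of the kind the first constraint describes could look like the sketch below. The tool names and argument shapes are assumptions, not Vela's actual contract with the model; the point is that the gateway validates and executes tool calls, so the model never controls systems directly.

```typescript
// Hypothetical tool-calling schema; tool names and argument shapes are
// assumptions, not Vela's actual contract with the model.
interface ToolCall {
  tool: string;                  // e.g. "home_assistant.call_service" (assumed name)
  args: Record<string, unknown>; // validated by the gateway before execution
}

// The gateway, not the model, executes tools: the model only emits a
// ToolCall as JSON, keeping direct system control out of the LLM layer.
function parseToolCall(raw: string): ToolCall | null {
  try {
    const v = JSON.parse(raw);
    if (typeof v?.tool === "string" && typeof v?.args === "object" && v.args !== null) {
      return { tool: v.tool, args: v.args };
    }
  } catch {
    // not valid JSON: fall through and reject
  }
  return null;
}
```

Rejecting anything that does not parse into this shape is what makes the "must not directly control systems" constraint enforceable at the gateway boundary.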
## Naming

- system: **Vela**
- gateway: `vela-gateway`
- UI: `vela-ui`
- voice profile: `vela-neutral`