Esoteric v0.2 uses ten specialist model seats. No single model does everything. Each seat has a specific purpose, latency target, and activation condition.
This is the opposite of "one LLM to rule them all." The renderer speaks. The code specialist reasons about code. The vision specialist sees. The verifier judges. Code orchestrates all of them.
gemma4:e2b
Fallback: gemma4:e4b
Latency: <50ms
Purpose: Fast realtime responses, simple routing, event watching
Activation: Majority of turns

gemma4:e2b
Latency: <2s
Purpose: Screen capture, OCR, visual understanding for quick tasks
Activation: On-demand visual queries

qwen3-vl:8b
Latency: Background
Purpose: Heavy visual analysis, detailed image understanding
Activation: BG1 worker lane

deepcoder:14b
Latency: Background
Purpose: Code generation, debugging, analysis, refactoring (Pass 1)
Activation: BG1 worker lane

rnj-1:8b
Latency: Background
Purpose: Reviews and profiles deepcoder output (Pass 2 of Sequential Relay)
Activation: BG1 worker lane

deepseek-r1:8b
Latency: Background
Purpose: Deep reasoning for math, programming logic, and research (300s timeout)
Activation: BG1 worker lane

nomic-embed-text-v2-moe
Latency: <200ms
Purpose: Semantic search, RAG retrieval, intent miss analysis
Activation: Memory enrichment

gemma4:e4b
Latency: <100ms
Purpose: Evidence checking, claim tagging via deterministic code (judge.py)
Activation: Pre-output verification

small.en
Latency: <200ms
Purpose: Local whisper inference for realtime voice ingress
Activation: Voice input lane

Kokoro-82M
Latency: <500ms
Purpose: High-quality neural TTS via ONNX runtime
Activation: Voice output lane
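
The seat list above could be held in code as a small registry. The following is a minimal sketch, not the project's actual data model: the `Seat` class, the `SEATS` dictionary, and the short key names (`renderer`, `vision_quick`, `stt`, and so on) are illustrative assumptions. Only the model names, latency targets, purposes, and activation conditions come from the list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Seat:
    """One model seat; the field names are assumptions for illustration."""
    model: str
    latency: str      # latency target exactly as stated in the seat list
    purpose: str
    activation: str
    fallback: Optional[str] = None


# Values copied from the seat list; the dictionary keys are hypothetical labels.
SEATS: Dict[str, Seat] = {
    "renderer": Seat("gemma4:e2b", "<50ms",
                     "Fast realtime responses, simple routing, event watching",
                     "Majority of turns", fallback="gemma4:e4b"),
    "vision_quick": Seat("gemma4:e2b", "<2s",
                         "Screen capture, OCR, visual understanding for quick tasks",
                         "On-demand visual queries"),
    "vision_heavy": Seat("qwen3-vl:8b", "Background",
                         "Heavy visual analysis, detailed image understanding",
                         "BG1 worker lane"),
    "code": Seat("deepcoder:14b", "Background",
                 "Code generation, debugging, analysis, refactoring (Pass 1)",
                 "BG1 worker lane"),
    "review": Seat("rnj-1:8b", "Background",
                   "Reviews and profiles deepcoder output (Pass 2)",
                   "BG1 worker lane"),
    "logic": Seat("deepseek-r1:8b", "Background",
                  "Deep reasoning for math, programming logic, and research",
                  "BG1 worker lane"),
    "embedding": Seat("nomic-embed-text-v2-moe", "<200ms",
                      "Semantic search, RAG retrieval, intent miss analysis",
                      "Memory enrichment"),
    "verifier": Seat("gemma4:e4b", "<100ms",
                     "Evidence checking, claim tagging via judge.py",
                     "Pre-output verification"),
    "stt": Seat("small.en", "<200ms",
                "Local whisper inference for realtime voice ingress",
                "Voice input lane"),
    "tts": Seat("Kokoro-82M", "<500ms",
                "High-quality neural TTS via ONNX runtime",
                "Voice output lane"),
}


def background_seats() -> List[str]:
    """Return the keys of seats whose latency target is 'Background'."""
    return [name for name, seat in SEATS.items() if seat.latency == "Background"]
```

Calling `background_seats()` on this registry returns the four BG1 specialists, which is a quick way to check that the table and the lane description agree.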
Realtime Lane Priority: Renderer handles 70-85% of turns without calling other seats.
BG1 Sequential Relay: Code generation (14b) followed by automated review (8b) for maximum reliability.
VRAM Keepalive Strategy: 2h keepalive for renderer/embedding; 0s for BG1 specialists to free VRAM immediately.
Fallback Chain: Renderer preferred (e2b) → fallback (e4b) → deterministic text if all models are unavailable.
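
The fallback chain and keepalive strategy above might look roughly like this in code. This is a sketch under stated assumptions: `ModelUnavailable`, `render_with_fallback`, `keep_alive_for`, the `generate` callable, the seat keys, and the wording of `DETERMINISTIC_REPLY` are all hypothetical names introduced for illustration. Only the model order (e2b, then e4b, then deterministic text) and the 2h/0s keepalive values come from the text.

```python
from typing import Callable, Optional, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model cannot serve a request."""


# Order taken from the fallback chain: preferred model first, then fallback.
RENDERER_CHAIN: Sequence[str] = ("gemma4:e2b", "gemma4:e4b")

# Placeholder wording; the source only says "deterministic text".
DETERMINISTIC_REPLY = "The renderer is unavailable right now."


def render_with_fallback(prompt: str,
                         generate: Callable[[str, str], str],
                         chain: Sequence[str] = RENDERER_CHAIN) -> str:
    """Try each model in order; return fixed text if every model fails."""
    for model in chain:
        try:
            return generate(model, prompt)
        except ModelUnavailable:
            continue  # move on to the next model in the chain
    return DETERMINISTIC_REPLY


# Keepalive values as stated: 2h for renderer/embedding, 0s for BG1 specialists.
# Seats the text does not mention are deliberately left out.
KEEPALIVE = {
    "renderer": "2h", "embedding": "2h",
    "code": "0s", "review": "0s", "vision_heavy": "0s", "logic": "0s",
}


def keep_alive_for(seat: str) -> Optional[str]:
    """Return the stated keepalive for a seat, or None if the text is silent."""
    return KEEPALIVE.get(seat)
```

Returning `None` for unlisted seats keeps the sketch from guessing a policy the text does not state, for example for the verifier or the voice seats.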
Latency Targets by Seat
Code Specialist: Background
Logic Specialist: Background
The renderer targets sub-50ms latency for instant responses. BG1 specialists run asynchronously and report progress at 15%, 60%, and 95%.
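
One way to picture asynchronous BG1 work with the three progress marks is the `asyncio` sketch below. It assumes, purely for illustration, that a job is split into three stages and that one mark is reported after each stage completes. The source does not say what the 15%, 60%, and 95% marks correspond to, and `run_bg1_job`, `stages`, and `report` are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence

# Progress marks as stated in the text.
PROGRESS_MARKS = (15, 60, 95)

Stage = Callable[[Any], Awaitable[Any]]


async def run_bg1_job(stages: Sequence[Stage],
                      report: Callable[[int], None]) -> Any:
    """Run stages in order, passing each result on and reporting a mark after each.

    Assumption: exactly one stage per progress mark.
    """
    if len(stages) != len(PROGRESS_MARKS):
        raise ValueError("expected one stage per progress mark")
    result: Any = None
    for mark, stage in zip(PROGRESS_MARKS, stages):
        result = await stage(result)
        report(mark)
    return result
```

A caller would pass three coroutine functions and a callback, for example `asyncio.run(run_bg1_job([load, analyze, summarize], print))`, where the three stage names are again placeholders.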
Model Seat Orchestration
Realtime Lane: Renderer (e2b/e4b), Embedding, Verifier
BG1 Worker: Code (14b) → Review (8b), Vision (8b), Logic (8b)
The 5-stage dispatcher routes quick tasks (70-85% of turns) to the realtime lane. Heavy tasks enter the BG1 queue and activate specialist seats on demand, including the sequential code relay and the deep-reasoning logic specialist.
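
The lane split and the sequential code relay described above can be sketched as follows. This does not reproduce the five dispatcher stages, which the text does not enumerate. Instead it assumes a hypothetical `heavy` flag that some earlier stage has already set. `Task`, `dispatch`, `sequential_relay`, and the `realtime`, `generate`, and `review` callables are illustrative names; only the two model names and the pass order come from the text.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class Task:
    text: str
    heavy: bool  # hypothetical flag; the real dispatcher decides this in 5 stages


def dispatch(task: Task,
             realtime: Callable[[str], str],
             bg1_queue: Deque[Task]) -> Optional[str]:
    """Answer quick tasks immediately; enqueue heavy tasks for the BG1 worker."""
    if task.heavy:
        bg1_queue.append(task)
        return None  # the result arrives later from the background lane
    return realtime(task.text)


def sequential_relay(request: str,
                     generate: Callable[[str, str], str],
                     review: Callable[[str, str], str]) -> str:
    """Pass 1: the code seat drafts. Pass 2: the review seat checks the draft."""
    draft = generate("deepcoder:14b", request)
    return review("rnj-1:8b", draft)
```

Keeping the relay as a plain function of two callables makes the pass order explicit: the reviewer only ever sees the output of the generator, never the raw request.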