How Herald works, step by step.

The Pipeline

Ingress: Raw input accepted
Normalizer: Wake word stripped, profile mapped
Dispatcher: Intent classified deterministically
Lane: Realtime or background worker
Compiler: Facts assembled into packet
Renderer: Facts rephrased into speech
Formatter: Internal tokens cleaned
Output: Delivered to sink
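
The eight stages above can be read as a chain of functions that each enrich a shared context. The following is a minimal sketch under stated assumptions, not Herald's actual code: the wake word "herald", the "[[draft]]" internal token, the context keys, and the toy intent rule are all hypothetical placeholders chosen for illustration.

```python
# Hypothetical sketch of the pipeline order; every name here is illustrative.
WAKE_WORD = "herald"          # assumption: the real wake word is not stated
INTERNAL_TOKEN = "[[draft]] "  # assumption: stands in for internal tokens


def ingress(ctx):
    ctx["text"] = str(ctx["raw"])  # raw input accepted as-is
    return ctx


def normalizer(ctx):
    words = ctx["text"].strip().split()
    # Strip the wake word if it leads the utterance.
    if words and words[0].lower().strip(",.!?") == WAKE_WORD:
        words = words[1:]
    ctx["text"] = " ".join(words).lower()
    return ctx


def dispatcher(ctx):
    # Toy deterministic rule; the real dispatcher has five stages.
    ctx["intent"] = "time" if "time" in ctx["text"] else "general_chat"
    return ctx


def lane(ctx):
    ctx["lane"] = "realtime" if ctx["intent"] == "time" else "background"
    return ctx


def compiler(ctx):
    ctx["facts"] = {"intent": ctx["intent"], "lane": ctx["lane"]}
    return ctx


def renderer(ctx):
    body = ", ".join(f"{k}={v}" for k, v in ctx["facts"].items())
    ctx["speech"] = INTERNAL_TOKEN + body
    return ctx


def formatter(ctx):
    ctx["speech"] = ctx["speech"].replace(INTERNAL_TOKEN, "")
    return ctx


def output(ctx):
    ctx["sink"].append(ctx["speech"])  # deliver to the sink
    return ctx


PIPELINE = [ingress, normalizer, dispatcher, lane,
            compiler, renderer, formatter, output]


def run(ctx):
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx
```

Calling run({"raw": "Herald, what TIME is it", "sink": []}) passes the context through each stage in the listed order and appends the formatted result to the sink.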

5-Stage Deterministic Routing

No LLM decides where your input goes. Five stages of deterministic classification run in sequence. The first match wins.

Stage 1: Exact Match (O(1))
Hardcoded phrases. "time", "status", "help" match instantly. No inference, no scoring.
Stage 2: Strict Deterministic (O(1))
Signal-based pattern matching (e.g. asks_time, asks_cancel, asks_weather).
Stage 3: Soft Deterministic (O(n))
Synonym expansion and entity extraction for greetings, identity, and app operations.
Stage 4: Semantic Match (vector)
Embedding-based intent matching via nomic-embed-text for fuzzy intent detection.
Stage 5: Classifier Fallback (O(n))
Keyword router that assigns unhandled queries to BG1 specialists or general_chat.
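
The first-match-wins ordering of the five stages can be sketched as a list of functions tried in sequence. This is an illustrative approximation only: the regular expressions, synonym sets, fallback keywords, and the intent name "bg1_research" are assumptions, and the semantic stage is a stub because a real one would compare embeddings from nomic-embed-text.

```python
import re
from typing import Callable, List, Optional, Tuple

# Stage 1: phrases taken from the text; intent names are assumed.
EXACT = {"time": "time", "status": "status", "help": "help"}

# Stage 2: signal names from the text; the patterns are hypothetical.
SIGNALS = {
    "asks_time": re.compile(r"\bwhat time\b|\btime is it\b"),
    "asks_cancel": re.compile(r"\b(cancel|stop|abort)\b"),
    "asks_weather": re.compile(r"\b(weather|forecast)\b"),
}

# Stage 3: hypothetical synonym sets.
SYNONYMS = {
    "greeting": {"hello", "hi", "hey"},
    "app_operation": {"open", "launch", "close"},
}


def exact_match(q: str) -> Optional[str]:
    return EXACT.get(q)  # dictionary lookup, O(1)


def strict_deterministic(q: str) -> Optional[str]:
    for signal, pattern in SIGNALS.items():
        if pattern.search(q):
            return signal
    return None


def soft_deterministic(q: str) -> Optional[str]:
    tokens = set(q.split())
    for intent, words in SYNONYMS.items():
        if tokens & words:
            return intent
    return None


def semantic_match(q: str) -> Optional[str]:
    return None  # stub: no embedding model is loaded in this sketch


def classifier_fallback(q: str) -> Optional[str]:
    # Always returns something, so routing never falls through.
    if any(k in q for k in ("explain", "research", "compare")):
        return "bg1_research"  # hypothetical specialist name
    return "general_chat"


STAGES: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("Stage 1: Exact Match", exact_match),
    ("Stage 2: Strict Deterministic", strict_deterministic),
    ("Stage 3: Soft Deterministic", soft_deterministic),
    ("Stage 4: Semantic Match", semantic_match),
    ("Stage 5: Classifier Fallback", classifier_fallback),
]


def route(query: str) -> Tuple[str, str]:
    q = query.strip().lower()
    for name, stage in STAGES:
        intent = stage(q)
        if intent is not None:
            return name, intent  # first match wins
    raise AssertionError("unreachable: the fallback always matches")
```

Because the stages are tried cheapest first, a query such as "time" never reaches pattern matching, and only queries that no deterministic stage recognizes reach the fallback.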

Try It: Dispatcher Playground (Simulation)

This is a JavaScript approximation of the Python dispatcher. Type any query to see which stage would catch it, where it routes, and whether the LLM is called. Latency estimates are design targets, not benchmarked values.

Matched: Stage 2: Strict Deterministic
Pattern: time
Routed to: Realtime
LLM called: No
Est. latency: <10ms

Real-Time BG1 Streaming

Background tasks (BG1) use specialized models such as deepseek-r1:8b for deep reasoning. You can watch the "thinking" process in real time before the final answer is rendered.
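
One way to separate streamed "thinking" from the final answer is to split the token stream on reasoning tags. The sketch below assumes the model wraps its reasoning in <think>...</think> tags, as DeepSeek-R1 models commonly do; whether Herald parses the stream this way is not stated, and the function name split_stream is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

OPEN, CLOSE = "<think>", "</think>"  # assumption: R1-style reasoning tags


def split_stream(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs where kind is 'thinking' or 'answer'."""
    buf = ""
    thinking = False
    for chunk in chunks:
        buf += chunk
        # Consume every complete tag currently in the buffer.
        while True:
            tag = CLOSE if thinking else OPEN
            idx = buf.find(tag)
            if idx == -1:
                break
            if idx:
                yield ("thinking" if thinking else "answer", buf[:idx])
            buf = buf[idx + len(tag):]
            thinking = not thinking
        # Hold back a tail that may be the start of a tag split across chunks.
        tag = CLOSE if thinking else OPEN
        keep = 0
        for n in range(min(len(tag) - 1, len(buf)), 0, -1):
            if buf.endswith(tag[:n]):
                keep = n
                break
        emit = buf[:len(buf) - keep]
        if emit:
            yield ("thinking" if thinking else "answer", emit)
        buf = buf[len(buf) - keep:]
    if buf:
        yield ("thinking" if thinking else "answer", buf)
```

A consumer can display 'thinking' pieces in a live panel as they arrive and pass only the 'answer' pieces on to the renderer.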

BG1 Specialist: deepseek-r1:8b
[System] Routing to Research Specialist...
[System] Context loaded (128k window).
[BG1 thinking] The user is asking about quantum computing. I need to explain the core concepts of qubits, superposition, and entanglement. I should also mention current limitations like decoherence and the state of NISQ devices. Searching evidence store for "quantum computing basics"... Found 3 corroborated claims.
Quantum computing leverages the principles of quantum mechanics to perform calculations that are currently impossible for classical computers...

Degraded Mode

When the LLM is unavailable (Ollama crashes, VRAM fills up, or a model file is corrupted), Herald continues operating. The deterministic brain never stops. Responses lose their natural-language polish but remain functionally correct.
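
The fallback behavior can be sketched as a renderer that attempts LLM rephrasing and returns the plain compiled facts when the call fails. This is a minimal illustration under assumptions: the function name render, the facts dictionary shape, and the rephrase callable are hypothetical, and a real system would likely catch narrower exception types.

```python
from typing import Callable, Dict


def render(facts: Dict[str, str], rephrase: Callable[[str], str]) -> str:
    """Return polished speech if the LLM responds, plain facts otherwise."""
    # The deterministic part: facts are always compiled into plain text.
    plain = "; ".join(f"{key}: {value}" for key, value in facts.items())
    try:
        return rephrase(plain)  # LLM adds natural-language polish
    except Exception:
        # Degraded mode: the LLM is unavailable, so return the plain facts.
        return plain
```

The design choice in this sketch is that the facts are assembled before the LLM is called, so an LLM failure costs only phrasing, not correctness.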

Skeptic: The Cognitive Layer

Everything described so far is the Herald pipeline, which is the rendering layer. Skeptic adds a cognitive layer above it: a persistent world-state model, a planner, specialist model seats, and an evidence-grounded judge that feed into this pipeline. See the World Model page for the full Concept D architecture.

The question isn't whether this works.

The speedup across 171 modules is measurable.