Herald / Skeptic -- From Constrained LLM Renderer to Local Assistant Operating System

Technical Deep-Dive

How Herald works, step by step.


The Pipeline

Ingress: Raw input accepted

Normalizer: Wake word stripped, profile mapped

Dispatcher: Intent classified deterministically

Lane: Realtime or background worker

Compiler: Facts assembled into packet

Renderer: Facts rephrased into speech

Formatter: Internal tokens cleaned

Output: Delivered to sink
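
The stages above can be read as a chain of small functions. The following is a minimal sketch, not Herald's actual code: the wake word "herald", the `[[...]]` internal token format, the `Packet` type, and every function name are assumptions made for illustration. The Lane step is omitted, and a fixed clock value keeps the output reproducible.

```python
import re
from dataclasses import dataclass, field

WAKE_WORD = "herald"  # assumed wake word, for illustration only


@dataclass
class Packet:
    """Hypothetical fact packet handed from the compiler to the renderer."""
    intent: str
    facts: dict = field(default_factory=dict)


def normalize(raw: str) -> str:
    """Normalizer: lowercase, trim, and strip a leading wake word."""
    text = raw.strip().lower()
    text = re.sub(rf"^(hey\s+)?{WAKE_WORD}[,\s]*", "", text)
    return text.strip()


def dispatch(text: str) -> str:
    """Dispatcher: toy deterministic classifier, no model call."""
    return "time" if "time" in text else "general_chat"


def compile_packet(intent: str) -> Packet:
    """Compiler: assemble facts (a fixed time stands in for a real clock)."""
    facts = {"time": "14:05"} if intent == "time" else {}
    return Packet(intent, facts)


def render(packet: Packet) -> str:
    """Renderer stand-in: phrases the facts and may leave internal tokens."""
    if packet.intent == "time":
        return f"[[intent:time]] It is {packet.facts['time']}."
    return "[[intent:general_chat]] I'm not sure yet."


def format_output(text: str) -> str:
    """Formatter: remove internal [[...]] tokens before delivery."""
    return re.sub(r"\[\[.*?\]\]\s*", "", text).strip()


def run_pipeline(raw: str) -> str:
    """Ingress to output, skipping lane selection for brevity."""
    text = normalize(raw)
    intent = dispatch(text)
    packet = compile_packet(intent)
    return format_output(render(packet))


print(run_pipeline("Hey Herald, what time is it?"))
```

In this sketch the renderer only rephrases facts that the compiler already assembled, which mirrors the split described above between deciding what to say and how to say it.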

5-Stage Deterministic Routing

No LLM decides where your input goes. Five stages of deterministic classification run in sequence. The first match wins.

Stage 1: Exact Match (O(1))

Hardcoded phrases. "time", "status", "help" match instantly. No inference, no scoring.

Stage 2: Strict Deterministic (O(1))

Signal-based pattern matching (e.g. asks_time, asks_cancel, asks_weather).

Stage 3: Soft Deterministic (O(n))

Synonym expansion and entity extraction for greetings, identity, and app operations.

Stage 4: Semantic Match (Vector)

Embedding similarity via nomic-embed-text, for fuzzy intent detection.

Stage 5: Classifier Fallback (O(n))

Keyword router that assigns unhandled queries to BG1 specialists or general_chat.
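
The five stages above amount to a first-match-wins cascade. Here is a rough sketch under stated assumptions: the route names (`realtime.time`, `bg1.research`, and so on), the regex patterns, the synonym table, the exemplar sentence, and the 0.6 threshold are all invented for illustration. Stage 4 uses a toy bag-of-words vector in place of nomic-embed-text so the example runs without a model server.

```python
import math
import re
from collections import Counter

# All tables below are hypothetical; only the signal names and
# "general_chat" come from the description above.
EXACT = {"time": "realtime.time", "status": "realtime.status", "help": "realtime.help"}
SIGNALS = {
    "asks_time": (re.compile(r"\bwhat time\b|\bcurrent time\b"), "realtime.time"),
    "asks_cancel": (re.compile(r"\b(cancel|abort)\b"), "realtime.cancel"),
    "asks_weather": (re.compile(r"\b(weather|forecast)\b"), "realtime.weather"),
}
SYNONYMS = {"hi": "hello", "hey": "hello", "howdy": "hello"}
SOFT_ROUTES = {"hello": "realtime.greeting"}
EXEMPLARS = {"realtime.time": "tell me the hour on the clock"}
KEYWORDS = {"research": "bg1.research", "code": "bg1.coding"}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(query, threshold=0.6):
    """Return (stage, target); the first stage that matches wins."""
    text = query.strip().lower()
    if text in EXACT:                                  # Stage 1: exact match
        return 1, EXACT[text]
    for pattern, target in SIGNALS.values():           # Stage 2: strict signals
        if pattern.search(text):
            return 2, target
    words = re.findall(r"[a-z']+", text)
    for word in words:                                 # Stage 3: synonym expansion
        canonical = SYNONYMS.get(word, word)
        if canonical in SOFT_ROUTES:
            return 3, SOFT_ROUTES[canonical]
    query_vec = embed(text)                            # Stage 4: vector similarity
    best_target, best_score = None, 0.0
    for target, exemplar in EXEMPLARS.items():
        score = cosine(query_vec, embed(exemplar))
        if score > best_score:
            best_target, best_score = target, score
    if best_target is not None and best_score >= threshold:
        return 4, best_target
    for word in words:                                 # Stage 5: keyword fallback
        if word in KEYWORDS:
            return 5, KEYWORDS[word]
    return 5, "general_chat"


for q in ["time", "What time is it?", "howdy there", "tell me the hour", "research quantum computing"]:
    print(q, "->", route(q))
```

Because each stage returns as soon as it matches, cheaper checks shield the more expensive ones, and no query reaches the embedding step unless the first three stages have all passed on it.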

Try It: Dispatcher Playground (Simulation)

This is a JavaScript approximation of the Python dispatcher. Type any query to see which stage would catch it, where it routes, and whether the LLM is called. Latency estimates are design targets, not benchmarked values.

Matched: Stage 2: Strict Deterministic

Pattern: time

Routed to: Realtime

LLM called: No

Est. latency: <10ms

Real-Time BG1 Streaming

Background tasks (BG1) use specialized models such as deepseek-r1:8b for deep reasoning. The model's "thinking" is streamed in real time, so you can watch it before the final answer is rendered.

BG1 Specialist: deepseek-r1:8b

[System] Routing to Research Specialist...

[System] Context loaded (128k window).

[BG1 thinking] The user is asking about quantum computing. I need to explain the core concepts of qubits, superposition, and entanglement. I should also mention current limitations like decoherence and the state of NISQ devices. Searching evidence store for "quantum computing basics"... Found 3 corroborated claims.

Quantum computing leverages the principles of quantum mechanics to perform calculations that are currently impossible for classical computers...
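
One way to separate the visible "thinking" from the final answer is to split the model's text stream on reasoning tags. The sketch below assumes the raw stream wraps reasoning in `<think>...</think>`, as DeepSeek-R1 models commonly do. Whether Herald parses tags this way is not stated here, and `split_stream` is a hypothetical helper. It handles tags that arrive split across chunk boundaries.

```python
from typing import Iterable, Iterator, Tuple

OPEN, CLOSE = "<think>", "</think>"


def split_stream(chunks: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs, where kind is 'thinking' or 'answer'."""
    buffer = ""
    thinking = False
    for chunk in chunks:
        buffer += chunk
        # Consume every complete tag currently in the buffer.
        while True:
            tag = CLOSE if thinking else OPEN
            idx = buffer.find(tag)
            if idx == -1:
                break
            if idx > 0:
                yield ("thinking" if thinking else "answer", buffer[:idx])
            buffer = buffer[idx + len(tag):]
            thinking = not thinking
        # Hold back a tail that could be the start of a tag split across chunks.
        tag = CLOSE if thinking else OPEN
        keep = 0
        for n in range(min(len(tag) - 1, len(buffer)), 0, -1):
            if buffer.endswith(tag[:n]):
                keep = n
                break
        emit = buffer[:len(buffer) - keep]
        if emit:
            yield ("thinking" if thinking else "answer", emit)
        buffer = buffer[len(buffer) - keep:]
    # Flush whatever is left once the stream ends.
    if buffer:
        yield ("thinking" if thinking else "answer", buffer)


# Simulated stream with tags deliberately broken across chunk boundaries.
demo = ["<thi", "nk>The user asks", " about qubits.</th", "ink>Quantum computing", " uses qubits."]
for kind, text in split_stream(demo):
    print(f"[{kind}] {text}")
```

Emitting text as soon as it cannot be part of a tag keeps the display responsive, while the held-back tail prevents a half-received tag from leaking into the output.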

Degraded Mode

When the LLM is unavailable (Ollama crashes, VRAM fills up, or a model file is corrupted), Herald keeps operating. The deterministic brain never stops. Responses lose their natural-language polish but remain functionally correct.
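
A degraded-mode fallback of this kind might look like the following sketch. It assumes the renderer is a callable that raises when the model is unreachable. `render_with_fallback`, `render_template`, and the exception types caught are illustrative choices, not Herald's actual interface.

```python
from typing import Callable, Dict, Tuple


def render_template(facts: Dict[str, str]) -> str:
    """Deterministic fallback: plain 'key: value' text, no model involved."""
    return "; ".join(f"{key}: {value}" for key, value in facts.items())


def render_with_fallback(
    facts: Dict[str, str],
    llm_render: Callable[[Dict[str, str]], str],
) -> Tuple[str, bool]:
    """Return (text, degraded). The facts are identical on both paths."""
    try:
        return llm_render(facts), False
    except (ConnectionError, TimeoutError, RuntimeError):
        # Model unavailable: answer with the same facts, minus the polish.
        return render_template(facts), True


def working_llm(facts: Dict[str, str]) -> str:
    """Stand-in for a healthy renderer."""
    return f"It's {facts['temperature']} and {facts['sky']} right now."


def broken_llm(facts: Dict[str, str]) -> str:
    """Stand-in for a crashed model server."""
    raise ConnectionError("model server unreachable")


facts = {"temperature": "18 C", "sky": "clear"}
print(render_with_fallback(facts, working_llm))
print(render_with_fallback(facts, broken_llm))
```

Since the facts are assembled before rendering, losing the model changes only the phrasing of the reply, not its content.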

Skeptic: The Cognitive Layer

Everything above describes the Herald pipeline, which is the rendering layer. Skeptic adds a cognitive layer above it: a persistent world-state model, a planner, specialist model seats, and an evidence-grounded judge, all of which feed into this pipeline. See the World Model page for the full Concept D architecture.

The question isn't whether this works.

The speedup across 171 modules is measurable.
