No LLM decides where your input goes. Five stages of deterministic classification run in sequence. The first match wins.
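The first-match-wins sequence can be sketched as an ordered list of predicates. The tier names, patterns, and routes below are illustrative stand-ins, not Herald's actual classification rules:

```python
# Minimal sketch of first-match-wins dispatch. Stage names, predicates,
# and routes are hypothetical examples, not Herald's real rules.
import re

# Stages run strictly in order; the first predicate that matches decides
# the route, and every later stage is skipped.
STAGES = [
    ("exact-command", lambda q: q.startswith("/"),        "command-handler"),
    ("keyword",       lambda q: "status" in q.lower(),    "status-report"),
    ("pattern",       lambda q: re.search(r"\d{4}-\d{2}-\d{2}", q) is not None, "date-lookup"),
    ("heuristic",     lambda q: len(q.split()) <= 3,      "quick-answer"),
    ("fallthrough",   lambda q: True,                     "llm"),  # only here is the LLM called
]

def dispatch(query: str) -> tuple[str, str]:
    """Return (stage, route) for the first matching stage."""
    for name, matches, route in STAGES:
        if matches(query):
            return name, route
    raise AssertionError("fallthrough stage always matches")

print(dispatch("/restart"))  # caught by the command tier; the LLM never runs
print(dispatch("explain how the dispatcher works end to end"))
```

Because the final stage always matches, dispatch is total: every query gets a route, and only queries that fall through every deterministic tier reach the LLM.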
This is a JavaScript approximation of the Python dispatcher. Type any query and see which tier would catch it, where it routes, and whether the LLM is called. Latency estimates are design targets, not benchmarked values.
Background tasks (BG1) use specialized models like deepseek-r1:8b for deep reasoning. You can see the "thinking" process in real time before the final answer is rendered.
When the LLM is unavailable (Ollama crashes, VRAM fills up, a model file is corrupted), Herald keeps operating. The deterministic brain never stops. Responses lose their natural-language polish but remain functionally correct.
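The degradation path follows from the architecture: the deterministic pipeline computes the answer either way, and the LLM only rephrases it. A minimal sketch, with hypothetical function names (Herald's real fallback logic may differ):

```python
# Sketch of graceful degradation: if the LLM rendering fails for any
# reason, fall back to a deterministic template with the same facts.

def render_template(facts: dict) -> str:
    # Deterministic rendering: always available, no model required.
    return "; ".join(f"{k}: {v}" for k, v in facts.items())

def render_with_llm(facts: dict) -> str:
    # Stand-in for the Ollama call; here it always fails, to show the path.
    raise ConnectionError("Ollama unreachable")

def respond(facts: dict) -> str:
    try:
        return render_with_llm(facts)
    except Exception:
        # Degraded but functionally correct: same facts, less polish.
        return render_template(facts)

print(respond({"uptime": "41d", "queue": "empty"}))  # → "uptime: 41d; queue: empty"
```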
This is the Herald pipeline—the rendering layer. Skeptic adds a cognitive layer above it: a persistent world-state model, a planner, specialist model seats, and an evidence-grounded judge that feed into this pipeline. See the World Model page for the full Concept D architecture.
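The layering can be pictured as a chain of stages that each enrich a shared context before Herald renders it. The stage names come from the text above; their interfaces here are invented purely for illustration:

```python
# Illustrative layering only: each Skeptic stage transforms a context
# dict, and the Herald rendering layer produces the final output.
# All function bodies are placeholder assumptions, not real Skeptic code.

def world_model(ctx):   return {**ctx, "state": "garage door: open"}
def planner(ctx):       return {**ctx, "plan": "report state"}
def judge(ctx):         return {**ctx, "grounded": True}
def herald_render(ctx): return f"{ctx['state']} (grounded={ctx['grounded']})"

def answer(query: str) -> str:
    ctx = {"query": query}
    # Specialist model seats omitted for brevity; they would sit
    # between the planner and the judge.
    for stage in (world_model, planner, judge):
        ctx = stage(ctx)
    return herald_render(ctx)

print(answer("is the garage door open?"))
```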
The question isn't whether this works.
The speedup across 171 modules is measurable.