Move the LLM from the center to the edge.

Herald inverts the conventional relationship between LLMs and application logic. A 5-stage deterministic cascade handles every routing decision across ten specialist model seats. The LLM is called once, at the very end, to make the answer sound human. Sequential relay manages VRAM so all seats share a single GPU.
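The cascade idea can be sketched in a few lines. This is a minimal illustration, not Herald's actual code: the stage functions, the `Route` type, and the seat names are all hypothetical. The shape is the point: each stage is plain deterministic code that either resolves the route or defers to the next stage, so a routing decision never requires an LLM call.

```python
# Hypothetical sketch of a deterministic routing cascade. Each stage is a
# plain function that either returns a Route or None (defer to next stage).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Route:
    seat: str                    # which specialist model seat handles the turn
    tool: Optional[str] = None   # optional tool to execute before rendering

def exact_command(text: str) -> Optional[Route]:
    # Stage 1: exact slash-command match (illustrative).
    return Route(seat="system", tool="help") if text.strip() == "/help" else None

def keyword_match(text: str) -> Optional[Route]:
    # Stage 2: keyword lookup (illustrative).
    if "weather" in text.lower():
        return Route(seat="weather", tool="forecast")
    return None

def fallback(text: str) -> Optional[Route]:
    # Final stage: always resolves, so the cascade never fails to route.
    return Route(seat="general")

STAGES: list[Callable[[str], Optional[Route]]] = [exact_command, keyword_match, fallback]

def route(text: str) -> Route:
    # First stage to return a non-None decision wins; no LLM in the loop.
    for stage in STAGES:
        decision = stage(text)
        if decision is not None:
            return decision
    raise RuntimeError("unreachable: fallback always resolves")
```

Because every stage is ordinary code, the whole decision path runs in microseconds and is fully testable, which is what makes a sub-30ms turn budget plausible.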

Traditional: LLM-as-Brain
Input → LLM: Intent → LLM: Pick tool → LLM: Format args → Tool exec → LLM: Read result → LLM: Compose → Output
5 LLM calls, ~2.4s per turn

Herald: LLM-as-Renderer
Input → 5-Stage Cascade → Code: Tool → Fact Compiler → LLM: Render → Output
0-1 LLM calls, <30ms per turn
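The sequential relay that lets ten seats share one GPU can also be sketched. This is an assumption-laden illustration, not Herald's implementation: `SeatRelay` and its placeholder "weights" dict stand in for real model loading and unloading. The invariant it shows is the real one: at most one seat's weights are resident in VRAM at a time, and acquiring a seat evicts the previous one.

```python
# Illustrative sequential relay: one GPU, many seats, one resident model.
# The dict "weights" is a placeholder for actually loading model weights.
class SeatRelay:
    def __init__(self, seats: set[str]):
        self.seats = seats
        self.active: str | None = None   # seat currently resident in VRAM
        self.model: dict | None = None

    def acquire(self, seat: str) -> dict:
        if seat not in self.seats:
            raise KeyError(f"unknown seat: {seat}")
        if self.active != seat:
            self.model = None                 # evict the previous seat's weights
            self.model = {"weights": seat}    # placeholder for a real load
            self.active = seat
        return self.model
```

Swapping weights per turn trades load latency for VRAM headroom; it works here because the cascade, not the LLM, does the per-turn work, so a seat is loaded at most once per turn.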
The key constraint

The LLM receives an immutable packet of pre-assembled facts from the response compiler. It can rephrase them into natural speech. It cannot add new facts, call tools, change the routing decision, or reason about what to do next. Ten model seats, one rendering boundary. The LLM is a rendering engine for human-readable text—nothing more.
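A minimal sketch of that boundary, assuming a frozen dataclass as the fact packet: `FactPacket` and `render_prompt` are illustrative names, not Herald's API. The packet is immutable by construction, and the prompt it produces constrains the LLM to rephrasing.

```python
# Hypothetical fact packet: frozen dataclass + tuple make it fully immutable.
from dataclasses import dataclass

@dataclass(frozen=True)
class FactPacket:
    intent: str
    facts: tuple[str, ...]   # tuple, not list, so contents cannot be appended

def render_prompt(packet: FactPacket) -> str:
    # The LLM sees only pre-assembled facts plus a rephrase-only instruction.
    lines = "\n".join(f"- {fact}" for fact in packet.facts)
    return (
        "Rephrase the following facts as one natural reply. "
        "Do not add information, call tools, or change the decision.\n"
        f"Intent: {packet.intent}\nFacts:\n{lines}"
    )
```

Because the packet is frozen, nothing downstream of the compiler, including the rendering call, can alter the facts; the LLM's only degree of freedom is wording.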

The question isn't whether this works. The numbers back it up: 46k+ lines of deterministic code, with every routing decision made before an LLM is ever called.