Herald inverts the conventional relationship between LLMs and application logic. A 5-stage deterministic cascade handles every routing decision across ten specialist model seats. The LLM is called once, at the very end, to make the answer sound human. A sequential relay manages VRAM, loading one seat at a time so all ten can share a single GPU.
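A staged deterministic router of this shape can be sketched in a few lines. This is a minimal illustration, not Herald's actual API: the names (`Cascade`, `keyword_stage`, the seat labels) are hypothetical, and two toy stages stand in for the real five.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A stage inspects the request and either claims it for a seat or passes.
Stage = Callable[[dict], Optional[str]]

@dataclass
class Cascade:
    stages: list          # ordered, deterministic; no LLM calls anywhere here
    fallback: str = "general"

    def route(self, request: dict) -> str:
        # First stage to return a seat wins; later stages never run.
        for stage in self.stages:
            seat = stage(request)
            if seat is not None:
                return seat
        return self.fallback

# Two toy stages: an exact-keyword match, then a length heuristic.
def keyword_stage(req: dict) -> Optional[str]:
    return "code" if "traceback" in req["text"].lower() else None

def length_stage(req: dict) -> Optional[str]:
    return "summarizer" if len(req["text"]) > 500 else None

cascade = Cascade(stages=[keyword_stage, length_stage])
print(cascade.route({"text": "Traceback (most recent call last): ..."}))  # code
print(cascade.route({"text": "hi"}))                                      # general
```

The point of the shape is auditability: every routing decision is a pure function of the request, reproducible and testable without a model in the loop.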
The LLM receives an immutable packet of pre-assembled facts from the response compiler. It can rephrase them into natural speech. It cannot add new facts, call tools, change the routing decision, or reason about what to do next. Ten model seats, one rendering boundary. The LLM is a rendering engine for human-readable text—nothing more.
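One way to enforce that boundary in code is to make the fact packet literally immutable and hand the LLM nothing but a rephrasing instruction. A minimal sketch, assuming Python; `FactPacket` and `render_prompt` are illustrative names, not Herald's:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FactPacket:
    facts: tuple          # pre-assembled by the response compiler; immutable
    routing_seat: str     # decided by the cascade before the LLM ever runs

def render_prompt(packet: FactPacket) -> str:
    # The only thing handed to the LLM: rephrase these facts, nothing else.
    bullets = "\n".join(f"- {f}" for f in packet.facts)
    return f"Rephrase the following facts as natural speech. Add nothing:\n{bullets}"

packet = FactPacket(facts=("order #123 shipped", "ETA Friday"),
                    routing_seat="orders")
try:
    packet.facts = ()     # any attempt to inject or drop facts fails
except FrozenInstanceError:
    pass

print(render_prompt(packet))
```

The frozen dataclass makes the rendering boundary structural rather than conventional: nothing downstream of the compiler can alter what the LLM is allowed to say.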
The question isn't whether this works. The numbers settle that: 46k+ lines of deterministic code.