Move the LLM from the center to the edge.

Herald inverts the conventional relationship between LLMs and application logic. A 5-stage deterministic cascade handles every routing decision across ten specialist model seats. The LLM is called once, at the very end, to make the answer sound human. Sequential relay manages VRAM so all seats share a single GPU.
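The cascade idea can be sketched in a few lines. This is a minimal illustration, not Herald's actual code: the stage functions, the `Route` type, and the seat names are all hypothetical. The shape is the point: each stage is plain deterministic code that either resolves the route or defers to the next stage, so a routing decision never requires an LLM call.

```python
# Hypothetical sketch of a deterministic routing cascade. Each stage is a
# plain function that either returns a Route or None (defer to next stage).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Route:
    seat: str                    # which specialist model seat handles the turn
    tool: Optional[str] = None   # optional tool to execute before rendering

def exact_command(text: str) -> Optional[Route]:
    # Stage 1: exact slash-command match (illustrative).
    return Route(seat="system", tool="help") if text.strip() == "/help" else None

def keyword_match(text: str) -> Optional[Route]:
    # Stage 2: keyword lookup (illustrative).
    if "weather" in text.lower():
        return Route(seat="weather", tool="forecast")
    return None

def fallback(text: str) -> Optional[Route]:
    # Final stage: always resolves, so the cascade never fails to route.
    return Route(seat="general")

STAGES: list[Callable[[str], Optional[Route]]] = [exact_command, keyword_match, fallback]

def route(text: str) -> Route:
    # First stage to return a non-None decision wins; no LLM in the loop.
    for stage in STAGES:
        decision = stage(text)
        if decision is not None:
            return decision
    raise RuntimeError("unreachable: fallback always resolves")
```

Because every stage is ordinary code, the whole decision path runs in microseconds and is fully testable, which is what makes a sub-30ms turn budget plausible.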

Traditional: LLM-as-Brain
Input → LLM: Intent → LLM: Pick tool → LLM: Format args → Tool exec → LLM: Read result → LLM: Compose → Output
5 LLM calls, ~2.4s per turn

Herald: LLM-as-Renderer
Input → 5-Stage Cascade → Code: Tool → Fact Compiler → LLM: Render → Output
0-1 LLM calls, <30ms per turn
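The sequential relay that lets ten seats share one GPU can also be sketched. This is an assumption-laden illustration, not Herald's implementation: `SeatRelay` and its placeholder "weights" dict stand in for real model loading and unloading. The invariant it shows is the real one: at most one seat's weights are resident in VRAM at a time, and acquiring a seat evicts the previous one.

```python
# Illustrative sequential relay: one GPU, many seats, one resident model.
# The dict "weights" is a placeholder for actually loading model weights.
class SeatRelay:
    def __init__(self, seats: set[str]):
        self.seats = seats
        self.active: str | None = None   # seat currently resident in VRAM
        self.model: dict | None = None

    def acquire(self, seat: str) -> dict:
        if seat not in self.seats:
            raise KeyError(f"unknown seat: {seat}")
        if self.active != seat:
            self.model = None                 # evict the previous seat's weights
            self.model = {"weights": seat}    # placeholder for a real load
            self.active = seat
        return self.model
```

Swapping weights per turn trades load latency for VRAM headroom; it works here because the cascade, not the LLM, does the per-turn work, so a seat is loaded at most once per turn.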
The key constraint

The LLM receives an immutable packet of pre-assembled facts from the response compiler. It can rephrase them into natural speech. It cannot add new facts, call tools, change the routing decision, or reason about what to do next. Ten model seats, one rendering boundary. The LLM is a rendering engine for human-readable text—nothing more.
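A minimal sketch of that boundary, assuming a frozen dataclass as the fact packet: `FactPacket` and `render_prompt` are illustrative names, not Herald's API. The packet is immutable by construction, and the prompt it produces constrains the LLM to rephrasing.

```python
# Hypothetical fact packet: frozen dataclass + tuple make it fully immutable.
from dataclasses import dataclass

@dataclass(frozen=True)
class FactPacket:
    intent: str
    facts: tuple[str, ...]   # tuple, not list, so contents cannot be appended

def render_prompt(packet: FactPacket) -> str:
    # The LLM sees only pre-assembled facts plus a rephrase-only instruction.
    lines = "\n".join(f"- {fact}" for fact in packet.facts)
    return (
        "Rephrase the following facts as one natural reply. "
        "Do not add information, call tools, or change the decision.\n"
        f"Intent: {packet.intent}\nFacts:\n{lines}"
    )
```

Because the packet is frozen, nothing downstream of the compiler, including the rendering call, can alter the facts; the LLM's only degree of freedom is wording.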

The question isn't whether this works. The numbers back it up: 46k+ lines of deterministic code, with every routing decision made before an LLM is ever called.