From Constrained LLM Renderer to Local Assistant Operating System
Every AI agent framework puts the LLM at the center of every decision. Herald inverts this entirely: a five-stage deterministic cascade routes requests, selects tools, and assembles facts across ten specialist model seats. The LLM renders once, at the end, constrained to rephrasing what the code already knows. A sequential relay manages VRAM so all ten seats share a single consumer GPU.
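To make the inversion concrete, here is a minimal sketch of the pattern: deterministic code handles routing, tool selection, and fact assembly, and only the final render step would touch an LLM. This is an illustrative assumption, not Herald's actual code; every name here (`Context`, `route`, `select_tools`, `assemble_facts`, `render`) is hypothetical, and the cascade is condensed from five stages to three deterministic stages plus the render.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    query: str
    seat: str = ""                                  # which specialist seat handles this
    tools: list[str] = field(default_factory=list)  # tools chosen deterministically
    facts: list[str] = field(default_factory=list)  # structured facts from tool runs

def route(ctx: Context) -> Context:
    # Stage 1: deterministic routing, e.g. keyword match -- no LLM involved.
    ctx.seat = "weather" if "weather" in ctx.query.lower() else "general"
    return ctx

def select_tools(ctx: Context) -> Context:
    # Stage 2: tool selection is a lookup table per seat, not a model call.
    ctx.tools = {"weather": ["forecast_api"], "general": []}[ctx.seat]
    return ctx

def assemble_facts(ctx: Context) -> Context:
    # Stage 3: run tools and collect structured facts (stubbed here).
    ctx.facts = [f"{tool}: (result)" for tool in ctx.tools]
    return ctx

def render(ctx: Context) -> str:
    # Final stage: the only place an LLM would run, constrained to
    # rephrasing the assembled facts. Stubbed as plain string formatting.
    return f"[{ctx.seat}] " + "; ".join(ctx.facts or ["no tools needed"])

def cascade(query: str) -> str:
    ctx = Context(query)
    for stage in (route, select_tools, assemble_facts):
        ctx = stage(ctx)
    return render(ctx)
```

The key property: by the time `render` runs, every decision has already been made by plain code, so the model can only rephrase, not decide.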
The question isn't whether this works.
Herald is the pattern. Skeptic is the operating system. 10 model seats. Zero cloud dependency.