Every major AI agent framework operates on the same assumption: the LLM is the brain. It receives user input, decides which tools to call, formats arguments, reads results, decides what to do next, and composes the final answer. The LLM is in the critical path of every single decision.
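The loop described above can be sketched in a few lines. This is a generic illustration of the pattern, not any specific framework's code; the names (`call_llm`, `tools`, the decision dict shape) are assumptions for the sketch:

```python
# Hypothetical sketch of the "LLM as the brain" loop: every decision --
# tool choice, argument formatting, and the final answer -- waits on a
# model round-trip. Names and message shapes are illustrative only.
def agent_turn(user_input: str, call_llm, tools: dict) -> str:
    history = [{"role": "user", "content": user_input}]
    while True:
        # Round-trip N: the model reads everything and decides what to do next.
        decision = call_llm(history)
        if decision["type"] == "final":
            return decision["content"]        # model composes the answer
        # The model picked a tool and formatted its arguments -- both can be wrong.
        name, args = decision["tool"], decision["args"]
        result = tools[name](**args)
        history.append({"role": "tool", "name": name, "content": result})
```

Note that nothing here runs without a model call first: the LLM sits in the critical path of every iteration, which is exactly where the latency, cost, and reliability problems below come from.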
Slow
2+ LLM round-trips per query. Claude Code averages 3-8 calls per turn. ChatGPT chains multiple calls for tool use. Every question waits for the model to think, decide, act, then think again.
Expensive
Tokens burned on every routing decision, every tool selection, every intermediate reasoning step. Cloud APIs charge per token. Herald runs 10 local model seats on a single consumer GPU at zero API cost.
Unreliable
LLMs hallucinate tool calls, pick the wrong functions, format arguments incorrectly, and go off-script.
Unauditable
When something breaks, you're debugging a black box. Why did the model pick that tool with those args?
Fragile
The model goes down, VRAM fills up, or the API rate-limits you, and the entire system is dead. Herald's sequential relay and degraded mode mean the deterministic brain never stops.
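To make the contrast concrete, here is a minimal sketch of the principle behind a sequential relay with a degraded mode: a deterministic supervisor tries model seats in order and still returns a useful response when every seat is down. This is an illustration of the idea, not Herald's actual implementation; the names (`route`, `seats`) are assumptions:

```python
# Illustrative sketch only -- not Herald's real relay code.
# A deterministic router walks the model seats in order; if every seat
# fails, it degrades gracefully instead of taking the system down.
def route(prompt: str, seats: list) -> str:
    for seat in seats:                  # sequential relay over available seats
        try:
            return seat(prompt)         # first healthy seat serves the request
        except Exception:
            continue                    # seat down or rate-limited: try the next
    # Degraded mode: the deterministic layer never stops responding.
    return "All model seats unavailable; request queued for retry."
```

Because the routing logic is plain code rather than a model call, a dead seat costs one exception handler, not the whole system.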