Most AI assistants are stateless, turn-by-turn chatbots. They forget what just happened. They don't track which tools are available. They can't tell you what they're currently working on. Skeptic is different: it maintains a persistent, structured world-state model that tracks everything the system knows about reality, updated after every event and read by every decision.
User Profile
Speaker identity, permission level, preferences, interaction history. The system knows who it is talking to.
Current Task Stack
Active tasks ordered by priority. What the system is doing right now and what is queued next.
Open Background Jobs
BG1 worker status, progress percentage, estimated completion. Heavy tasks tracked in real time.
Recent Failures
Last N failures with timestamps, error types, and affected subsystems. Patterns surface automatically.
Device Status
Microphone availability, speaker state, screen capture readiness, active window metadata.
Tool Availability Map
Which registered tools are currently reachable, healthy, and permitted for the active permission profile.
Model Availability Map
Which model seats are loaded, responding, and within latency bounds. CPU vs GPU state tracked.
Confidence Ledger
Aggregate confidence across recent turns. Tracks when the system is certain vs when it is guessing.
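The fields above could be sketched as a plain dataclass. This is a minimal illustration, not Skeptic's actual schema; every field name here is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    # Hypothetical field names mirroring the eight sections above.
    user_profile: dict = field(default_factory=dict)        # identity, permission level, preferences
    task_stack: list = field(default_factory=list)          # active tasks, highest priority first
    background_jobs: dict = field(default_factory=dict)     # job id -> progress, ETA
    recent_failures: list = field(default_factory=list)     # last N failures with timestamps
    device_status: dict = field(default_factory=dict)       # mic, speaker, screen capture, active window
    tool_availability: dict = field(default_factory=dict)   # tool -> reachable / healthy / permitted
    model_availability: dict = field(default_factory=dict)  # seat -> loaded, latency within bounds
    confidence_ledger: list = field(default_factory=list)   # per-turn confidence scores

state = WorldState()
state.task_stack.append({"name": "summarize_report", "priority": 1})
```

A flat, typed structure like this is what makes the later components possible: deterministic code can read and diff it without a model in the loop.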
Five components work together to maintain the world model and enforce evidence-grounded verification. Each has one job. None uses an LLM for state management.
State Builder
After each tool result, user input, or system event, the state builder applies deterministic update rules to the world-state. No LLM involved. State transitions are diffable and testable. The world-state is never modified by a model — only by code that processes grounded events.
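A deterministic update rule of this kind can be sketched as a pure event reducer. The event shapes below are assumptions for illustration, not Skeptic's actual wire format:

```python
def apply_event(state: dict, event: dict) -> dict:
    """Deterministic reducer: a pure function of (state, event). No LLM call."""
    new = dict(state)
    if event["type"] == "tool_result" and event["ok"]:
        new["tool_availability"] = {**state["tool_availability"], event["tool"]: "healthy"}
    elif event["type"] == "tool_result":
        # Keep only the last 10 failures so the failure window stays bounded.
        new["recent_failures"] = (state["recent_failures"] + [(event["tool"], event["error"])])[-10:]
    return new  # the old state is untouched, so every transition is diffable

before = {"tool_availability": {}, "recent_failures": []}
after = apply_event(before, {"type": "tool_result", "tool": "ocr", "ok": True})
```

Because the reducer never mutates its input, each transition can be unit-tested and diffed: compare `before` and `after` to see exactly what a given event changed.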
Planner
Reads the current world-state (task stack, failures, availability) and produces an action plan. This is not an LLM reasoning about 'what should I do?'; it is code reading structured data and applying priority rules. The planner feeds into the existing 5-stage dispatcher.
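Priority rules over structured state might look like the sketch below. The specific rules are invented for illustration and are not Skeptic's actual policy:

```python
def plan(state: dict) -> list:
    """Deterministic planner: priority rules over structured state, no model in the loop."""
    actions = []
    # Rule 1 (illustrative): a burst of recent failures triggers a health check before new work.
    if len(state.get("recent_failures", [])) >= 3:
        actions.append({"action": "health_check"})
    # Rule 2 (illustrative): dispatch queued tasks in priority order, skipping unhealthy tools.
    for task in sorted(state.get("task_stack", []), key=lambda t: t["priority"]):
        if state.get("tool_availability", {}).get(task["tool"]) == "healthy":
            actions.append({"action": "dispatch", "task": task["name"]})
    return actions

demo = {"recent_failures": [],
        "task_stack": [{"name": "ocr_screen", "tool": "ocr", "priority": 2},
                       {"name": "read_log", "tool": "fs", "priority": 1}],
        "tool_availability": {"ocr": "healthy", "fs": "healthy"}}
```

The point of keeping this in plain code is testability: the same state always produces the same plan.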
Evidence Store
Every tool result, file read, and screen capture OCR is stored with full provenance: source, timestamp, and confidence. The evidence store never contains inferences or hunches — only direct observations. This is what the Judge checks claims against.
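A provenance-carrying store could be as small as this sketch (class and method names are assumptions, not Skeptic's API):

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    claim: str         # the direct observation itself
    source: str        # the tool, file, or OCR pass that produced it
    timestamp: float
    confidence: float  # reported by the source, not guessed

class EvidenceStore:
    def __init__(self):
        self._items = []

    def record(self, claim: str, source: str, confidence: float = 1.0):
        # Only grounded observations enter the store -- never inferences or hunches.
        self._items.append(Evidence(claim, source, time.time(), confidence))

    def supports(self, claim: str) -> bool:
        return any(e.claim == claim for e in self._items)

store = EvidenceStore()
store.record("battery_level=87%", source="powercfg")
```

The frozen dataclass makes each record immutable, so provenance cannot be rewritten after the fact.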
Belief State
Holds what the system thinks is true but cannot prove: inferred user preferences, predicted next actions, default assumptions. It explicitly acknowledges uncertainty. Separate from the evidence store. Can be wrong.
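Keeping beliefs separate from evidence might look like this minimal sketch, with every belief carrying an explicit subjective probability (names and threshold are illustrative assumptions):

```python
class BeliefState:
    """Unproven beliefs with explicit probabilities. Separate from the evidence
    store; everything in here is allowed to be wrong."""
    def __init__(self):
        self.beliefs: dict[str, float] = {}

    def assume(self, belief: str, p: float):
        self.beliefs[belief] = p

    def shaky(self, threshold: float = 0.8) -> list[str]:
        # Beliefs below the threshold are flagged as low-confidence downstream.
        return [b for b, p in self.beliefs.items() if p < threshold]

beliefs = BeliefState()
beliefs.assume("user prefers concise answers", 0.9)
beliefs.assume("user will ask about the build next", 0.4)
```

Because belief and evidence live in different containers, nothing here can ever be mistaken for an observation.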
Judge / Verifier
Before any output is finalized, the Judge cross-references every claim against the evidence store. Claims with tool-output backing are tagged 'observed'. Claims from memory are 'recalled'. Reasoning chains are 'inferred'. Everything else is 'guessed'. Observed claims take priority.
Every claim that reaches the output is tagged with one of four categories. The evidence hierarchy is structural — architecture enforces what prompts cannot.
Observed
Directly from tool output, file read, or OCR result. Highest trust.
Recalled
Retrieved from persistent memory with provenance. Medium-high trust.
Inferred
Derived through reasoning chains. Medium trust. May be wrong.
Guessed
Low confidence. No supporting evidence. Flagged explicitly.
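The tagging pass can be sketched as a simple precedence chain; this is a simplification of what the Judge in judge.py presumably does, with all names and data shapes assumed:

```python
def tag_claim(claim: str, evidence: set, memory: set, reasoning: set) -> str:
    # Precedence mirrors the trust hierarchy: observed > recalled > inferred > guessed.
    if claim in evidence:
        return "observed"   # backed by tool output, file read, or OCR
    if claim in memory:
        return "recalled"   # retrieved from persistent memory with provenance
    if claim in reasoning:
        return "inferred"   # derived through a reasoning chain; may be wrong
    return "guessed"        # no supporting evidence; flagged explicitly

evidence = {"disk_free=120GB"}
memory = {"user timezone is UTC+2"}
reasoning = {"build will finish in ~5 min"}
```

The first matching source wins, which is exactly how observed claims structurally outrank everything else.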
Not one model doing everything. Ten specialist seats across eight distinct models, each with one job, coordinated by deterministic code via sequential relay.
llama3.2:1b/3b
Renderer
Handles the majority of turns: greetings, status, time, simple tool calls. Routed in under 100 ms. Escalates to a specialist only when the task requires domain expertise.
deepcoder:14b
Code Specialist
Code generation, debugging, refactoring, explanation. Runs in the BG1 worker lane with bounded observe-test-reason-retry loops.
rnj-1:8b
Logic Specialist
Logical reasoning, math, and structured problem solving. BG1 lane with sequential relay — loaded on-demand to conserve VRAM.
deepseek-r1:8b
Code Reviewer
Deep code review, architectural analysis, and reasoning chains. BG1 lane. Outputs are verified by the Judge against actual execution results.
qwen3-vl:8b
Vision Specialist
Screen captures, uploaded images, document scans. Processes actual pixel data. Outputs are grounded in what it literally sees.
gemma4:12b
Research Writer
Research synthesis and long-form summarization. BG1 lane for heavy research tasks requiring extended context.
gemma4:4b
Fast Reasoner
Quick reasoning for queries that need domain depth beyond the renderer. Realtime lane, sub-second response.
Controller + Verifier Prompt
Judge / Verifier
Reuses the renderer model (llama3.2:3b) with a specialized verifier prompt. Checks claims against the evidence store via deterministic code in judge.py.
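The seat table above might be wired up as a plain registry with deterministic routing. The model names come from the table; the seat keys, lane values, and routing rules below are assumptions:

```python
# Hypothetical seat registry; lanes marked None are not stated in the table above.
SEATS = {
    "renderer":    {"model": "llama3.2:3b",    "lane": "realtime"},
    "code":        {"model": "deepcoder:14b",  "lane": "bg1"},
    "logic":       {"model": "rnj-1:8b",       "lane": "bg1"},
    "review":      {"model": "deepseek-r1:8b", "lane": "bg1"},
    "vision":      {"model": "qwen3-vl:8b",    "lane": None},
    "research":    {"model": "gemma4:12b",     "lane": "bg1"},
    "fast_reason": {"model": "gemma4:4b",      "lane": "realtime"},
    "judge":       {"model": "llama3.2:3b",    "lane": None},  # reuses the renderer model
}

def route(intent: str) -> str:
    # Deterministic code, not a model, decides which seat runs.
    table = {"code": "code", "math": "logic", "review": "review",
             "image": "vision", "research": "research"}
    return table.get(intent, "renderer")  # default: the fast renderer seat
```

Routing by lookup table rather than by model keeps dispatch sub-millisecond and fully predictable.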
Why this feels closer to AGI
A model does not 'think forever by itself' in any magical sense. In practice, a system that behaves like a cognitive agent is built from scheduled loops, event listeners, memory-refresh tasks, background evaluators, and persistent state. Skeptic is exactly that: a well-designed cognitive architecture around multiple model roles.
Esoteric v0.2 stops behaving like a turn-by-turn chatbot. It starts behaving like an agent maintaining a live internal model of reality. Not AGI in the literal sense. But something much closer to a true cognitive system than a normal assistant — and one that stays honest because every claim is checked against grounded evidence.