Most AI assistants are stateless, turn-by-turn chatbots. They forget what just happened. They don't track which tools are available. They can't tell you what they're currently working on. Skeptic is different: it maintains a persistent, structured world-state model that tracks everything the system knows about reality, updated after every event and read by every decision.
User Profile
Speaker identity, permission level, preferences, interaction history. The system knows who it is talking to.
Current Task Stack
Active tasks ordered by priority. What the system is doing right now and what is queued next.
Open Background Jobs
BG1 worker status, progress percentage, estimated completion. Heavy tasks tracked in real time.
Recent Failures
Last N failures with timestamps, error types, and affected subsystems. Patterns surface automatically.
Device Status
Microphone availability, speaker state, screen capture readiness, active window metadata.
Tool Availability Map
Which registered tools are currently reachable, healthy, and permitted for the active permission profile.
Model Availability Map
Which model seats are loaded, responding, and within latency bounds. CPU vs GPU state tracked.
Confidence Ledger
Aggregate confidence across recent turns. Tracks when the system is certain vs when it is guessing.
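The fields above could be sketched as a plain dataclass. This is a minimal illustration, not Skeptic's actual schema; every field name here is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class WorldState:
    # Hypothetical field names mirroring the eight sections above.
    user_profile: dict = field(default_factory=dict)        # identity, permission level, preferences
    task_stack: list = field(default_factory=list)          # active tasks, highest priority first
    background_jobs: dict = field(default_factory=dict)     # job id -> progress, ETA
    recent_failures: list = field(default_factory=list)     # last N failures with timestamps
    device_status: dict = field(default_factory=dict)       # mic, speaker, screen capture, active window
    tool_availability: dict = field(default_factory=dict)   # tool -> reachable / healthy / permitted
    model_availability: dict = field(default_factory=dict)  # seat -> loaded, latency within bounds
    confidence_ledger: list = field(default_factory=list)   # per-turn confidence scores

state = WorldState()
state.task_stack.append({"name": "summarize_report", "priority": 1})
```

A flat, typed structure like this is what makes the later components possible: deterministic code can read and diff it without a model in the loop.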
Five components work together to maintain the world model and enforce evidence-grounded verification. Each has one job. None uses an LLM for state management.
State Builder
After each tool result, user input, or system event, the state builder applies deterministic update rules to the world-state. No LLM involved. State transitions are diffable and testable. The world-state is never modified by a model — only by code that processes grounded events.
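A deterministic update rule of this kind can be sketched as a pure event reducer. The event shapes below are assumptions for illustration, not Skeptic's actual wire format:

```python
def apply_event(state: dict, event: dict) -> dict:
    """Deterministic reducer: a pure function of (state, event). No LLM call."""
    new = dict(state)
    if event["type"] == "tool_result" and event["ok"]:
        new["tool_availability"] = {**state["tool_availability"], event["tool"]: "healthy"}
    elif event["type"] == "tool_result":
        # Keep only the last 10 failures so the failure window stays bounded.
        new["recent_failures"] = (state["recent_failures"] + [(event["tool"], event["error"])])[-10:]
    return new  # the old state is untouched, so every transition is diffable

before = {"tool_availability": {}, "recent_failures": []}
after = apply_event(before, {"type": "tool_result", "tool": "ocr", "ok": True})
```

Because the reducer never mutates its input, each transition can be unit-tested and diffed: compare `before` and `after` to see exactly what a given event changed.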
Planner
Reads the current world-state (task stack, failures, availability) and produces an action plan. This is not an LLM reasoning about 'what should I do?'; it is code reading structured data and applying priority rules. The planner feeds into the existing 5-stage dispatcher.
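Priority rules over structured state might look like the sketch below. The specific rules are invented for illustration and are not Skeptic's actual policy:

```python
def plan(state: dict) -> list:
    """Deterministic planner: priority rules over structured state, no model in the loop."""
    actions = []
    # Rule 1 (illustrative): a burst of recent failures triggers a health check before new work.
    if len(state.get("recent_failures", [])) >= 3:
        actions.append({"action": "health_check"})
    # Rule 2 (illustrative): dispatch queued tasks in priority order, skipping unhealthy tools.
    for task in sorted(state.get("task_stack", []), key=lambda t: t["priority"]):
        if state.get("tool_availability", {}).get(task["tool"]) == "healthy":
            actions.append({"action": "dispatch", "task": task["name"]})
    return actions

demo = {"recent_failures": [],
        "task_stack": [{"name": "ocr_screen", "tool": "ocr", "priority": 2},
                       {"name": "read_log", "tool": "fs", "priority": 1}],
        "tool_availability": {"ocr": "healthy", "fs": "healthy"}}
```

The point of keeping this in plain code is testability: the same state always produces the same plan.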
Evidence Store
Every tool result, file read, and screen capture OCR is stored with full provenance: source, timestamp, and confidence. The evidence store never contains inferences or hunches — only direct observations. This is what the Judge checks claims against.
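A provenance-carrying store could be as small as this sketch (class and method names are assumptions, not Skeptic's API):

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    claim: str         # the direct observation itself
    source: str        # the tool, file, or OCR pass that produced it
    timestamp: float
    confidence: float  # reported by the source, not guessed

class EvidenceStore:
    def __init__(self):
        self._items = []

    def record(self, claim: str, source: str, confidence: float = 1.0):
        # Only grounded observations enter the store -- never inferences or hunches.
        self._items.append(Evidence(claim, source, time.time(), confidence))

    def supports(self, claim: str) -> bool:
        return any(e.claim == claim for e in self._items)

store = EvidenceStore()
store.record("battery_level=87%", source="powercfg")
```

The frozen dataclass makes each record immutable, so provenance cannot be rewritten after the fact.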
Belief State
Holds what the system thinks is true but cannot prove: inferred user preferences, predicted next actions, default assumptions. It explicitly acknowledges uncertainty. Separate from the evidence store. Can be wrong.
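Keeping beliefs separate from evidence might look like this minimal sketch, with every belief carrying an explicit subjective probability (names and threshold are illustrative assumptions):

```python
class BeliefState:
    """Unproven beliefs with explicit probabilities. Separate from the evidence
    store; everything in here is allowed to be wrong."""
    def __init__(self):
        self.beliefs: dict[str, float] = {}

    def assume(self, belief: str, p: float):
        self.beliefs[belief] = p

    def shaky(self, threshold: float = 0.8) -> list[str]:
        # Beliefs below the threshold are flagged as low-confidence downstream.
        return [b for b, p in self.beliefs.items() if p < threshold]

beliefs = BeliefState()
beliefs.assume("user prefers concise answers", 0.9)
beliefs.assume("user will ask about the build next", 0.4)
```

Because belief and evidence live in different containers, nothing here can ever be mistaken for an observation.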
Judge / Verifier
Before any output is finalized, the Judge cross-references every claim against the evidence store. Claims with tool-output backing are tagged 'observed'. Claims from memory are 'recalled'. Reasoning chains are 'inferred'. Everything else is 'guessed'. Observed claims take priority.
Every claim that reaches the output is tagged with one of four categories. The evidence hierarchy is structural — architecture enforces what prompts cannot.
Observed
Directly from tool output, file read, or OCR result. Highest trust.
Recalled
Retrieved from persistent memory with provenance. Medium-high trust.
Inferred
Derived through reasoning chains. Medium trust. May be wrong.
Guessed
Low confidence. No supporting evidence. Flagged explicitly.
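The tagging pass can be sketched as a simple precedence chain; this is a simplification of what the Judge in judge.py presumably does, with all names and data shapes assumed:

```python
def tag_claim(claim: str, evidence: set, memory: set, reasoning: set) -> str:
    # Precedence mirrors the trust hierarchy: observed > recalled > inferred > guessed.
    if claim in evidence:
        return "observed"   # backed by tool output, file read, or OCR
    if claim in memory:
        return "recalled"   # retrieved from persistent memory with provenance
    if claim in reasoning:
        return "inferred"   # derived through a reasoning chain; may be wrong
    return "guessed"        # no supporting evidence; flagged explicitly

evidence = {"disk_free=120GB"}
memory = {"user timezone is UTC+2"}
reasoning = {"build will finish in ~5 min"}
```

The first matching source wins, which is exactly how observed claims structurally outrank everything else.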
Not one model doing everything. Ten specialist seats across eight distinct models, each with one job, coordinated by deterministic code via sequential relay.
llama3.2:1b/3b
Renderer
Handles the majority of turns: greetings, status, time, simple tool calls. Routed in under 100 ms. Escalates to a specialist only when the task requires domain expertise.
deepcoder:14b
Code Specialist
Code generation, debugging, refactoring, explanation. Runs in the BG1 worker lane with bounded observe-test-reason-retry loops.
rnj-1:8b
Logic Specialist
Logical reasoning, math, and structured problem solving. BG1 lane with sequential relay — loaded on-demand to conserve VRAM.
deepseek-r1:8b
Code Reviewer
Deep code review, architectural analysis, and reasoning chains. BG1 lane. Outputs are verified by the Judge against actual execution results.
qwen3-vl:8b
Vision Specialist
Screen captures, uploaded images, document scans. Processes actual pixel data. Outputs are grounded in what it literally sees.
gemma4:12b
Research Writer
Research synthesis and long-form summarization. BG1 lane for heavy research tasks requiring extended context.
gemma4:4b
Fast Reasoner
Quick reasoning for queries that need domain depth beyond the renderer. Realtime lane, sub-second response.
Controller + Verifier Prompt
Judge / Verifier
Reuses the renderer model (llama3.2:3b) with a specialized verifier prompt. Checks claims against the evidence store via deterministic code in judge.py.
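The seat table above might be wired up as a plain registry with deterministic routing. The model names come from the table; the seat keys, lane values, and routing rules below are assumptions:

```python
# Hypothetical seat registry; lanes marked None are not stated in the table above.
SEATS = {
    "renderer":    {"model": "llama3.2:3b",    "lane": "realtime"},
    "code":        {"model": "deepcoder:14b",  "lane": "bg1"},
    "logic":       {"model": "rnj-1:8b",       "lane": "bg1"},
    "review":      {"model": "deepseek-r1:8b", "lane": "bg1"},
    "vision":      {"model": "qwen3-vl:8b",    "lane": None},
    "research":    {"model": "gemma4:12b",     "lane": "bg1"},
    "fast_reason": {"model": "gemma4:4b",      "lane": "realtime"},
    "judge":       {"model": "llama3.2:3b",    "lane": None},  # reuses the renderer model
}

def route(intent: str) -> str:
    # Deterministic code, not a model, decides which seat runs.
    table = {"code": "code", "math": "logic", "review": "review",
             "image": "vision", "research": "research"}
    return table.get(intent, "renderer")  # default: the fast renderer seat
```

Routing by lookup table rather than by model keeps dispatch sub-millisecond and fully predictable.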
Why this feels closer to AGI
A model does not 'think forever by itself' in any magical sense. In practice, a system that behaves like a cognitive agent is built from scheduled loops, event listeners, memory-refresh tasks, background evaluators, and persistent state. Skeptic is exactly that: a well-designed cognitive architecture around multiple model roles.
Esoteric v0.2 stops behaving like a turn-by-turn chatbot. It starts behaving like an agent maintaining a live internal model of reality. Not AGI in the literal sense. But something much closer to a true cognitive system than a normal assistant — and one that stays honest because every claim is checked against grounded evidence.