An Image-Persistent Substrate for Agentic AI -- Concept Brief

Draft, 2026-05-01.

Christopher Allen <ChristopherA@LifeWithAlacrity.com>


The pain everyone in the room knows

Agent platforms today are orchestration layers wrapped around stateless LLM calls. Everything an agent needs to do real work -- accumulated memory, learned tools, identity boundaries, coordination with other agents, atomic recovery from partial failure, durable state across restarts -- is engineered separately, on substrates that were never designed for it. The result is the familiar stack: a vector DB to approximate memory, JSON files to approximate tool definitions, queues to approximate workflow durability, IAM policies to approximate capabilities, bespoke retry logic to approximate atomicity. Each layer is a separate vendor, a separate failure mode, a separate cost line.

The structural mismatch is between the agent's logical model -- I remember what I did, I have tools, I can coordinate with peers, I survive a restart -- and the substrate's actual model: I am a stateless function call against external services with no persistent identity.

Most modern bugs in agent infrastructure are artifacts of an impedance mismatch between in-memory semantics and persistence semantics: serialization boundaries between agent reasoning and stored memory, partial-failure ambiguity in multi-step plans, retry-versus-replay confusion when tool calls error mid-execution, cache-versus-source-of-truth divergence between vector DB and ground truth, schema drift between what the LLM thinks it knows and what the database actually stores.

A different substrate

There is a class of runtime where the application's entire state -- code, objects, in-flight operations, accumulated context -- lives in a single persistent image. State does not live in a database that the runtime synchronizes with; the runtime is the state. Snapshot the runtime, restore it tomorrow, work continues exactly from where it left off. No marshaling, no separate database, no synchronization layer.
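As a rough analogy only, the snapshot-and-restore behaviour can be imitated in Python with `pickle`. This is a sketch under stated assumptions, not the substrate's mechanism: `pickle` is explicit marshaling, which is exactly what an image-persistent runtime avoids, and the names `image`, `snapshot`, `restore`, and the agent keys are hypothetical. What it does show is the difference between persisting an object graph and persisting rows: references shared before the snapshot are still shared after the restore.

```python
import os
import pickle
import tempfile

# Hypothetical agent state: two agents that share one contact list object.
shared_contacts = ["alice@example.com"]
image = {
    "researcher": {"contacts": shared_contacts, "notes": []},
    "drafter": {"contacts": shared_contacts, "notes": []},
}


def snapshot(state, path):
    """Write the whole object graph in one step, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename: old image or new, never half


def restore(path):
    """Load the object graph back, shared references included."""
    with open(path, "rb") as f:
        return pickle.load(f)


image_path = os.path.join(tempfile.mkdtemp(), "agent.image")
image["researcher"]["notes"].append("customer prefers email")
snapshot(image, image_path)

del image, shared_contacts  # stands in for killing the process
image = restore(image_path)

# The graph shape survived: both agents still point at the same list object,
# so an append through one agent is visible through the other.
image["researcher"]["contacts"].append("bob@example.com")
```

In a real image-persistent runtime there would be no `snapshot` or `restore` call in application code at all; the sketch only makes the "graph, not rows" property visible.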

This is the lineage of the Smalltalk image, the Lisp image, and certain mainframe single-level-store designs -- a category that fell out of fashion when stateless services and relational databases became dominant in the 1990s. One member of this class has quietly run production services for nearly three decades. It powered iChat, the technology behind the first Yahoo! Chatrooms in the late 1990s. The codebase is open-source, actively maintained by its original author, and current as of 2026.

Eliminating the impedance mismatch changes the failure model in a way that is hard to replicate incrementally on a substrate that does not already carry the property. The bug classes above don't get patched -- they cease to be possible.

The architectural inversion: harness drives runtime

The conventional agent-platform pattern: the harness orchestrates LLM calls, treats LLMs as tools (alongside file read, web search, code execution), and rebuilds state, atomicity, identity, and event coordination from external infrastructure -- a database for state, a queue for atomicity, IAM middleware for identity, distributed locks for coordination.

The inversion this substrate enables: the agent harness uses the runtime as a tool, the same way it uses LLMs. The harness sends operations to the runtime; the runtime returns state and events. State, mutation safety, multi-agent coordination, identity boundaries, and event notification all arrive as a single coherent tool's natural output.

| Property | LLM-as-tool | Runtime-as-tool |
| --- | --- | --- |
| State across calls | None | Persistent shared graph |
| Mutation safety | None | Atomic call-tree rollback |
| Multi-agent coordination | External | Coherent runtime enforces |
| Identity boundaries | None | Capability-separated |
| Event notifications | None | Native asynchronous |
| Context window | Fixed tokens | Arbitrary state, queryable as needed |
| Observability | Opaque | Object introspection, event log |

LLMs and the runtime sit at the same architectural level in the harness's tool surface, distinguished by cadence and primitive set rather than by rank. The harness's authority over orchestration is preserved; what the runtime adds is the substrate-primitive set LLMs structurally cannot provide. The current agent stack rebuilds those primitives badly outside the runtime; the inversion is what lets the harness consume them as one tool's coherent output instead.
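One way to picture "same architectural level, different primitive set" is a harness whose tools all share one call shape. The sketch below is illustrative only: `StatelessLLMTool`, `RuntimeTool`, `Harness`, and the `set`/`get`/`events` operations are hypothetical names, not the substrate's interface, and the "LLM" is a stub that echoes its prompt.

```python
class StatelessLLMTool:
    """Stand-in for an LLM call: the output depends only on the request."""

    def call(self, request):
        return {"text": "echo: " + request["prompt"]}


class RuntimeTool:
    """Stand-in for the runtime: operations mutate state that outlives the call."""

    def __init__(self):
        self._state = {}
        self._events = []

    def call(self, request):
        op = request["op"]
        if op == "set":
            self._state[request["key"]] = request["value"]
            self._events.append(("changed", request["key"]))
            return {"ok": True}
        if op == "get":
            return {"value": self._state.get(request["key"])}
        if op == "events":
            return {"events": list(self._events)}
        raise ValueError("unknown operation: " + op)


class Harness:
    """The harness keeps orchestration authority; tools are peers on one surface."""

    def __init__(self, tools):
        self.tools = tools

    def use(self, tool_name, request):
        return self.tools[tool_name].call(request)


harness = Harness({"llm": StatelessLLMTool(), "runtime": RuntimeTool()})

# Same call shape, different guarantees: the LLM tool remembers nothing,
# the runtime tool returns state written by an earlier call.
reply_1 = harness.use("llm", {"prompt": "plan the trip"})
reply_2 = harness.use("llm", {"prompt": "plan the trip"})
harness.use("runtime", {"op": "set", "key": "budget", "value": 500})
budget = harness.use("runtime", {"op": "get", "key": "budget"})
events = harness.use("runtime", {"op": "events"})
```

The harness code does not change when a tool gains or loses state; only the rows of the table above differ between the two tools.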

What this gives an agent

The substrate's eight runtime primitives, in agent vocabulary:

  1. Image-persistent agent state. Every object the agent created, every variable it set, every piece of accumulated context -- durable across restarts as a single image. No vector DB to keep in sync. No "agent forgot the conversation" because the runtime forgot.

  2. Atomic agent operations. When the agent's multi-step plan fails halfway through (LLM returns garbage, tool errors, precondition violated), the entire operation rolls back. Not "we wrote three records but the fourth failed" -- the in-memory state reverts.

  3. Capability separation enforced at runtime. Each agent runs as a privileged identity with explicit capabilities. Out-of-capability operations don't return error codes the agent has to reason about; they fail at runtime and the operation's atomic boundary rolls back the partial work.

  4. Coherent multi-agent state. Multiple agents share one runtime instance. When two agents both read "budget = $500" and both try to spend it, the runtime serializes -- exactly one succeeds, the other sees the updated state and decides what to do next. No distributed-lock dance, no eventual consistency, no race conditions across agents.

  5. Hot reasoning update. Fix a bug in agent reasoning code while agents are mid-conversation. In-flight operations finish with the old logic; the next operation uses the new. No restart, no state loss, no warmup.

  6. Sandboxed code load for LLM-authored tools. When the agent (or you) wants to give the agent a new tool, the runtime compiles the tool's code into a capability-bounded namespace at native speed. The capability layer enforces what the new tool can do; the persistence layer ensures it stays loaded across restarts. Adding a skill is not a deployment.

  7. Atomic-with-state events. When state changes that other agents care about, the runtime synchronously notifies subscribers within the same atomic operation that caused the change. Agents react to state without polling, without race conditions, without missing events because of delivery-versus-write ordering bugs.

  8. Direct state introspection. Agents (and humans, and other LLMs) can query the runtime's state graph directly -- what objects exist, what their state is, what events have fired. The data model is the API; there is no synthesized layer the model has to reason about separately from the substrate.
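Primitives 4 and 7 can be sketched together. In the toy below, two agents race to spend the same $500 budget; a single lock stands in for the runtime's serialization, and subscribers are notified inside the same critical section that changes the state. The lock, the `Runtime` class, and the `spend` and `subscribe` names are assumptions made for illustration; the brief only states that the runtime serializes and notifies, not how.

```python
import threading


class Runtime:
    """Toy single-coherence-domain runtime: one lock serializes all mutations."""

    def __init__(self, budget):
        self._lock = threading.Lock()
        self._subscribers = []
        self.budget = budget

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def spend(self, agent, amount):
        with self._lock:
            if amount > self.budget:
                return False  # the loser sees the updated state and decides
            self.budget -= amount
            # Notify inside the same critical section as the write, so no
            # subscriber can observe the event before or without the change.
            for notify in self._subscribers:
                notify(agent, amount, self.budget)
            return True


runtime = Runtime(budget=500)
events = []
runtime.subscribe(lambda agent, amount, remaining: events.append((agent, amount, remaining)))

results = {}


def agent_task(name):
    results[name] = runtime.spend(name, 500)


threads = [threading.Thread(target=agent_task, args=(n,)) for n in ("agent_a", "agent_b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which agent wins is not deterministic, but exactly one `spend` returns `True`, the budget ends at zero, and exactly one event is recorded.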

Persistence is the foundational primitive; the others derive their value from it. Atomic operations matter because they atomically affect persistent state. Capability bounds matter because the bounded namespace persists. Hot reload matters because the running agent state survives the reload. Sandboxed tool load matters because compiled tools and the objects they create are persistent artifacts, not ephemeral process state. Without orthogonal persistence, these are seven local-process concerns with no continuity across the agent's lifetime; with it, they are substrate-scope guarantees that survive restart indefinitely.
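The hot-reload point can be illustrated with a small sketch: an operation binds its logic when it begins, so swapping the logic affects only operations that begin afterwards, while the agent's accumulated state is untouched. `Agent`, `begin`, `reasoners`, and the two `reply` functions are hypothetical names; a real substrate would do this at the level of compiled objects rather than a Python dictionary.

```python
def reply_v1(text):
    return text.upper()  # the "buggy" logic: shouts at the customer


def reply_v2(text):
    return text.capitalize()  # the fix


class Agent:
    def __init__(self):
        self.history = []  # state that must survive the update
        self.reasoners = {"reply": reply_v1}

    def begin(self, op):
        logic = self.reasoners[op]  # bound once, when the operation starts

        def finish(text):
            result = logic(text)
            self.history.append(result)
            return result

        return finish


agent = Agent()
in_flight = agent.begin("reply")      # starts under the old logic
agent.reasoners["reply"] = reply_v2   # hot update while it is in flight

first = in_flight("hello")             # finishes with the old logic
second = agent.begin("reply")("hello") # next operation uses the new logic
```

After both calls, `agent.history` holds one result from each version; nothing was restarted and nothing was lost.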

What becomes possible

Three patterns the agent ecosystem cannot cleanly assemble today:

Agents with durable, code-shaped memory. Most agent memory today is text in a vector database -- fuzzy, lossy, requires retrieval-with-reranking, and divergent from the agent's actual reasoning state. With orthogonal persistence, the agent's memory can be executable artifacts, not just text: a function the agent wrote yesterday is still callable today; an object the agent built remains in the heap; an inference rule the agent learned is part of its runtime, not an external memory store. The agent's accumulated competence is durable in the same image as its current execution.

LLM-authored tools that are first-class system citizens. The agent writes new tool code at runtime; the runtime compiles it into a capability-bounded namespace; the new tool persists as part of the image; subsequent agent operations call the tool directly at native speed. No deployment pipeline per tool. No Docker warmup per call. The capability layer prevents the tool from doing anything the agent isn't allowed to do. The customer experience changes from "the AI ran for thirty seconds in a sandboxed timeout" to "the AI built me a custom tool that's still here next week."

Multi-agent workflows that fail atomically. When five agents collaborate on a customer-service ticket -- one researches the customer's history, another drafts a response, a third checks compliance, a fourth schedules follow-up, a fifth logs the result -- atomic operations apply across the entire collaboration. If any agent's step fails (LLM hallucination, tool error, precondition violated), the runtime reverts the workspace to its pre-operation state. No partially-completed customer-service interaction with three actions taken and two pending in a queue somewhere.
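A minimal sketch of that all-or-nothing behaviour, assuming the workspace is a plain dictionary and that rollback can be imitated by copying it up front. The `atomic` helper, the step functions, and the ticket ID are hypothetical; the substrate is described as reverting in-memory state natively, not by deep-copying.

```python
import copy
from contextlib import contextmanager


@contextmanager
def atomic(workspace):
    """Revert the workspace to its pre-operation state if any step raises."""
    before = copy.deepcopy(workspace)
    try:
        yield workspace
    except Exception:
        workspace.clear()
        workspace.update(before)
        raise


def research(ws):
    ws["actions"].append("history pulled")


def draft(ws):
    ws["actions"].append("reply drafted")


def compliance(ws):
    raise RuntimeError("compliance check failed")


def handle_ticket(ws):
    # Three of the five collaborating agents, run as one atomic operation.
    with atomic(ws):
        for step in (research, draft, compliance):
            step(ws)


workspace = {"ticket": "T-1", "actions": []}
try:
    handle_ticket(workspace)
    outcome = "committed"
except RuntimeError:
    outcome = "rolled back"
```

The first two steps did run and did mutate the workspace, yet after the third step fails the workspace shows no trace of them.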

Five-axis containment for LLM-generated code

LLMs make mistakes. The substrate's safety story for LLM-authored code is structural: code that an agent (or an LLM) loads into the runtime sits inside five independent containment mechanisms, each bounding a different attack surface.

  1. Language constraint. The DSL agent code is written in cannot express dangerous operations -- no raw memory access, no escape to the host language, no direct system-call surface.
  2. Location constraint. Agent code lives only in agent-owned objects, never in privileged runtime layers. The kernel and system tiers are inaccessible to user-tier code regardless of what the language permits.
  3. Invocation constraint. Agent code is only entered through the event system, not by direct calls from arbitrary code. An attacker who has not subscribed to events cannot reach the code.
  4. Capability separation. What agent code can reach when it does run is mediated by the privilege layer -- privileged objects, system files, and resource-quota mutations are bounded.
  5. Atomic rollback. Any error in agent code reverts the entire operation. No partially-applied state changes, no manual cleanup, no half-corrupted state the next agent has to reconcile.

To break out, agent code would have to defeat all five mechanisms simultaneously. Stacked containment changes the cost calculus for an attacker: defeating language puts them in a location they cannot run from; defeating location puts them in front of an invocation constraint that prevents direct entry; defeating invocation puts them inside a capability layer that bounds reach; defeating capability still leaves them inside an atomic envelope that rolls back any partial state.
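Two of the five layers, capability separation and atomic rollback, can be sketched together to show how they stack. Everything here is an illustrative assumption: the `Handle` class, the `namespace/key` naming scheme, and `run_contained` are hypothetical, and Python itself cannot enforce the boundary (a determined caller could reach `_store`), which is why the brief places a language constraint underneath these layers.

```python
import copy


class CapabilityError(Exception):
    """Raised when code reaches outside the namespaces it was granted."""


class Handle:
    """What agent code receives instead of the raw object store."""

    def __init__(self, store, grants):
        self._store = store
        self._grants = grants  # set of (action, namespace) pairs

    def _check(self, action, key):
        namespace = key.split("/", 1)[0]
        if (action, namespace) not in self._grants:
            raise CapabilityError(f"{action} on {key!r} not granted")

    def read(self, key):
        self._check("read", key)
        return self._store[key]

    def write(self, key, value):
        self._check("write", key)
        self._store[key] = value


def run_contained(store, grants, tool):
    """Run a tool behind a capability handle; undo everything on any error."""
    before = copy.deepcopy(store)
    try:
        tool(Handle(store, grants))
        return "committed"
    except Exception:
        store.clear()
        store.update(before)
        return "rolled back"


store = {"kernel/quota": 100, "agent_a/draft": ""}
grants = {("read", "agent_a"), ("write", "agent_a")}


def well_behaved(handle):
    handle.write("agent_a/draft", "hello")


def overreaching(handle):
    handle.write("agent_a/draft", "tampered")  # permitted on its own
    handle.write("kernel/quota", 10**9)        # outside the grant


first = run_contained(store, grants, well_behaved)
second = run_contained(store, grants, overreaching)
```

The overreaching tool's permitted write is undone along with its forbidden one: the capability failure lands inside the atomic envelope, so the draft still reads `"hello"` and the quota is unchanged.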

WebAssembly with WASI offers a sandbox plus capability separation -- two of the five. Lua embedded in game engines offers a language sandbox -- one of the five. Erlang offers lightweight processes plus supervision-restart -- a different two of the five. None of the contemporary platforms widely deployed for user-supplied code in multi-tenant runtimes stacks all five.

For LLM-generated code specifically, this is the structural safety the workload requires. The runtime contains the mistakes -- not the developer, not the prompt-injection detector, not the human-in-the-loop reviewer.

The demonstration

A passkey-protected sandboxed Web REPL, framed as agent infrastructure.

An authenticated agent (or a human acting on behalf of an agent) drops into a code editor running against a bounded namespace inside the runtime: the agent's workspace, where it can build tools, materialize objects, accumulate state. The capability layer prevents the agent from touching the runtime kernel, other agents' workspaces, or the host operating system.

Kill the runtime process. Restart it. The agent reconnects.

Its workspace is exactly as it left it. Every tool it built, every object it materialized, every accumulated piece of context is intact -- not because anything was serialized to a database between sessions, but because the runtime is the state and orthogonal persistence keeps the entire image durable across restarts. Five minutes in a browser tab is enough to see what an agent runtime with isolation, persistence, atomic operations, capability boundaries, and hot reload as primitives can do that a LangChain + Postgres + Pinecone + AWS Lambda stack cannot reach without significant per-application engineering.

Where this fits in the AI ecosystem

This is a substrate, not an agent framework. It composes with the tools the audience already uses:

  • LangChain / LlamaIndex / DSPy / similar reasoning frameworks -- the substrate is the runtime your agents run on top of. These frameworks remain useful for prompt-shape and reasoning-pattern abstraction.
  • MCP / A2A / agent-to-agent protocols -- the substrate's HTTP and event primitives implement these protocols natively. The runtime can be both an MCP server (exposing tools to external agents) and an MCP client (calling external tools from inside).
  • Vector databases (Pinecone, Weaviate, Chroma, pgvector) -- still useful for fuzzy retrieval over large unstructured corpora. The substrate handles exact memory of structured agent state. Both can coexist.
  • Temporal / AWS Step Functions / Inngest -- the substrate handles small-to-medium-scale durable workflows natively. Temporal remains the right tool for cross-machine workflow durability at large scale.
  • Sandboxed code execution (E2B, Modal, Replit Agents, Daytona) -- the substrate makes sandboxed code-execution a runtime primitive rather than a per-call infrastructure choice. For one-off cloud-scale code execution, those tools remain appropriate.

The agent harness uses the runtime as a tool, the same way it uses LLMs. Multiple agents share a coherent runtime instance; the atomicity primitive contains failed operations; the capability layer separates agents; the event system tells the harness about state changes. The harness's job becomes orchestration; the runtime's job is everything else.

Where this doesn't fit

Honest constraints worth naming up front:

  • Single coherence domain. The substrate is single-machine. Modern hardware addresses terabytes of RAM and dozens of cores per machine, which makes vertical scale sufficient for most agent workloads. Workloads that genuinely need cross-machine distributed coherence are the wrong fit at the architecture level, not at the implementation level.
  • Stateless agent calls. If your agent is a stateless RAG query -- input prompt, retrieve, answer, forget -- the substrate's primitives are overkill. The architecture earns its keep where state continuity matters across operations.
  • Embarrassingly parallel inference. Cloud-scale parallel LLM calls (batch inference over millions of documents) are the wrong shape for a single coherence domain.
  • Five-nines uptime. Single-machine architecture inherits single-machine failure modes; multi-region active-active is not what this substrate is for.

What this brief is asking

Whether the framing lands for the agentic AI audience. Specifically, three questions worth a five-minute conversation:

  1. Does framing agent state as image-persistent rather than vector DB + episodic logs + tool registry match how you (or your customers) think about agent memory?
  2. Is making LLM-authored tools first-class persistent artifacts a problem you've been working around, or one that hadn't surfaced as a problem yet?
  3. What's the agent-infrastructure gap you wish existed today that orthogonal persistence + atomic operations + capability bounds + hot reload would address?