@oneryalcin
Created April 29, 2026 10:23

Building on codex app-server: a developer's guide to OpenAI Codex's JSON-RPC interface (transports, methods, hooks, subagents, skills, MCP, Python SDKs, reference architecture, recipes)

Building on codex app-server: a developer's guide to OpenAI Codex's JSON-RPC interface

Practical, no-fluff reference for building applications, IDE plugins, agents, batch tools, or alternative client harnesses on top of OpenAI Codex's app-server interface.

This guide distills:

  • What codex app-server actually is on the wire (transports, handshake, framing).
  • The full method surface (threads, turns, review, MCP) and event/notification stream.
  • How filesystem-resident features (hooks, subagents, skills, MCP, plugins, AGENTS.md) interact with the protocol.
  • Where state lives on disk (Codex core state vs. companion state).
  • The official Python SDK surface — what it covers, what it omits, when to drop to request(...).
  • Third-party clients in Python and Elixir.
  • A reference architecture: how the official Claude Code "Codex companion" plugin (codex-companion.mjs) drives the protocol, including the broker session-reuse trick.
  • A minimum-viable Python client to copy-paste.
  • Practical recipes (CI reviewers, multi-agent fan-out, approval UIs).

All sources linked inline and again at the end.


1. What is codex app-server?

codex app-server is a stateful, long-lived process that hosts Codex agent threads and exposes them over JSON-RPC 2.0. It's the same backend that powers OpenAI's Codex VS Code extension and JetBrains plugin, and that custom integrations (the openai/codex Python SDK, the JS Claude Code companion, third-party clients) talk to.

Think of it as Codex's IPC interface: you spawn one process per workspace (or reuse a long-lived one), then drive it with structured RPC instead of scraping codex exec output.

Use it when you need any of:

  • Streaming reasoning / message / command-output deltas.
  • Persistent threads (thread/resume, thread/fork).
  • Bidirectional approvals (server asks your client to allow/deny a tool action).
  • Structured output (JSON schema-conforming results).
  • Multiple turns against the same context.
  • Custom UI/UX (IDE plugin, web dashboard, Slack bot, CI reviewer).

If you just want "run prompt → exit", use codex exec non-interactive mode instead — app-server is overkill for one-shots.


2. Wire protocol

2.1 Transports

| Transport | How to start | Status | Framing |
| --- | --- | --- | --- |
| stdio (default) | codex app-server | Stable | Newline-delimited JSON (one message per line) over stdin/stdout |
| WebSocket | codex app-server --listen ws://127.0.0.1:4500 | Experimental, unsupported | One JSON message per WS frame; bounded queues; busy server returns RPC error code -32001 |

Stderr carries human-readable diagnostics; you can mirror it to a log.

2.2 JSON-RPC framing

JSON-RPC 2.0 semantics, with the "jsonrpc":"2.0" member omitted on the wire. Four message shapes:

// Client → Server request
{"id": 1, "method": "thread/start", "params": {...}}

// Server → Client response (success or error)
{"id": 1, "result": {...}}
{"id": 1, "error": {"code": -32601, "message": "Method not found"}}

// Notification (either direction, no id)
{"method": "item/agentMessage/delta", "params": {...}}

// Server → Client request (bidirectional, e.g. for approvals)
{"id": 42, "method": "execCommandApproval", "params": {...}}

One message per \n-terminated line on stdio. JSON-RPC error code -32001 means the broker/server is busy — back off and retry.
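The shapes above can be told apart purely by which members are present. A minimal sketch of that dispatch and of the stdio line framing follows; the helper names classify, encode, and decode are illustrative and not part of any SDK.

```python
import json


def classify(msg: dict) -> str:
    """Return the kind of an inbound message based on the members it carries."""
    has_id = "id" in msg
    has_method = "method" in msg
    if has_id and has_method:
        return "server_request"  # the server asks the client something (e.g. an approval)
    if has_id:
        return "response"        # result or error for a request this client sent
    if has_method:
        return "notification"    # event with no id; no reply is expected
    raise ValueError(f"not a JSON-RPC message: {msg!r}")


def encode(msg: dict) -> str:
    """Serialize one message as a single newline-terminated line for stdio."""
    # json.dumps escapes embedded newlines, so the payload never spans lines.
    return json.dumps(msg, separators=(",", ":")) + "\n"


def decode(line: str) -> dict:
    """Parse one line read from the server's stdout."""
    return json.loads(line)
```

Checking for "id" and "method" together first matters: a server-to-client request would otherwise be mistaken for a response and silently dropped.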

2.3 Handshake

Mandatory first exchange:

// Client →
{"id": 1, "method": "initialize", "params": {
  "clientInfo": {"name": "your-app", "title": "Your App", "version": "1.0.0"},
  "capabilities": {
    "experimentalApi": false,
    "optOutNotificationMethods": [
      "item/agentMessage/delta",
      "item/reasoning/summaryTextDelta",
      "item/reasoning/summaryPartAdded",
      "item/reasoning/textDelta"
    ]
  }
}}

// Server → response with server capabilities + version
// Then client →
{"method": "initialized", "params": {}}

Until the handshake completes, the server rejects every other request. optOutNotificationMethods lets you suppress noisy delta channels, which is useful for batch jobs that only want final results.

experimentalApi: true enables the gated surfaces (dynamic tools, ChatGPT token management, future-flagged features). Don't flip it unless you need them; experimental APIs may break between Codex versions.
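As a sketch of how a client might assemble the two handshake messages, the helper below builds them from the fields shown above. The function name handshake_messages and its keyword arguments are hypothetical; only the message members come from this section.

```python
NOISY_DELTAS = [
    "item/agentMessage/delta",
    "item/reasoning/summaryTextDelta",
    "item/reasoning/summaryPartAdded",
    "item/reasoning/textDelta",
]


def handshake_messages(name, version, *, title=None, stream_deltas=False,
                       experimental=False):
    """Build the initialize request and the initialized notification.

    Send the first message, wait for its response, then send the second.
    """
    capabilities = {"experimentalApi": experimental}
    if not stream_deltas:
        # Batch-style clients suppress the high-volume delta channels.
        capabilities["optOutNotificationMethods"] = list(NOISY_DELTAS)
    initialize = {
        "id": 1,
        "method": "initialize",
        "params": {
            "clientInfo": {"name": name, "title": title or name, "version": version},
            "capabilities": capabilities,
        },
    }
    initialized = {"method": "initialized", "params": {}}  # notification: no id
    return initialize, initialized
```

An interactive UI would pass stream_deltas=True to keep the delta channels; a CI job would leave the default.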

2.4 Schema generation

The exact parameter and result shapes for the version of Codex you have installed are reproducible:

codex app-server generate-ts            # TypeScript types
codex app-server generate-json-schema   # JSON Schema bundle

Use these to codegen typed clients. Schemas drift between CLI versions — regenerate when you upgrade codex.


3. Method surface

3.1 Threads — conversations

| Method | Purpose |
| --- | --- |
| thread/start | Begin a new agent session. Configure model, reasoning effort, sandbox policy, output schema, dynamic tools, base instructions, personality, working dir, MCP overrides. Returns a threadId. |
| thread/resume | Reopen a stored thread by id. Full history is restored server-side from the rollout JSONL (see §6). Optionally override params for the resumed session. |
| thread/fork | Branch an existing thread to explore a different path without disturbing the original. |
| thread/list | Paginated history; filter by model provider, source kind, archive state, working dir, free-text search. |
| thread/rollback | Drop the last N turns from in-memory context. A rollback marker is appended to the thread's persisted JSONL log. |
| thread/name/set | Rename a thread. |
| thread/archive | Archive (or unarchive) a thread; filtered out of the default thread/list view. |
| thread/backgroundTerminals/* | Manage long-lived shell sessions inside the sandbox. Experimental. |
| thread/compact | Compact history (used by the official Python SDK's thread.compact()). |

3.2 Turns — driving the agent

| Method | Purpose |
| --- | --- |
| turn/start | Submit user input (text, images, file paths) and run the agent. Per-turn overrides: model, reasoning effort, personality, sandbox policy, output_schema (structured output), dynamicTools (experimental), skill inputs. Returns a turnId. |
| turn/steer | Append additional input to an in-flight turn without cancelling. |
| turn/interrupt | Abort an in-flight turn cleanly. The companion's /codex:cancel is built on this plus tree-killing the broker process. |

Sandbox policy values seen in clients: read-only, workspace-write, danger-full-access. Reasoning effort: none, minimal, low, medium, high, xhigh (verified from codex-companion.mjs's VALID_REASONING_EFFORTS).

3.3 Review

| Method | Purpose |
| --- | --- |
| review/start | Invoke Codex's built-in code reviewer. Targets: uncommitted working tree ({type:"uncommittedChanges"}), base branch diff ({type:"baseBranch", branch:"main"}), specific commit ranges, or custom targets. Optionally fork into a detached review thread. |

This is what /codex:review in the JS companion calls; the more flexible /codex:adversarial-review instead runs a turn/start against a structured-output schema (schemas/review-output.schema.json in the plugin) so you get parseable JSON.

3.4 MCP tools

| Method | Purpose |
| --- | --- |
| mcpServer/tool/call | Invoke a tool exposed by an MCP server configured for the thread. |

3.5 Auth, config, account

| Method | Purpose |
| --- | --- |
| account/read | Current ChatGPT/API auth state, refresh token controls. |
| config/read | Effective resolved configuration (model defaults, sandbox defaults, MCP servers, paths). Useful for diagnostics. |

3.6 Server → Client requests (bidirectional)

The server can send requests to your client. Most importantly, approvals:

  • execCommandApproval — server wants to run a shell command, asks for permission.
  • applyPatchApproval — server wants to write/modify a file.
  • (and others under the same approval umbrella)

Your client responds with accept / decline / cancel. This is how Codex enforces sandboxing with a human-in-the-loop. If you set ApprovalsReviewer to auto_review or guardian, Codex resolves these automatically; set it to user and you'll receive every approval request and must answer.

This is also why the protocol is bidirectional: every client implementation must handle inbound id+method messages, not just inbound responses.


4. Notifications / streaming events

The server streams notifications during turn execution. There is no subscribe call: you receive every notification method you did not list in optOutNotificationMethods at initialize. Common methods:

| Notification | Meaning |
| --- | --- |
| turn/started | New turn begins. |
| turn/completed | Turn finished (success or failure). |
| item/started | A new content item (message, reasoning, tool call) starts. |
| item/completed | A content item finished. |
| item/agentMessage/delta | Streaming agent message text. |
| item/reasoning/textDelta | Streaming reasoning text. |
| item/reasoning/summaryTextDelta | Streaming reasoning summary. |
| item/reasoning/summaryPartAdded | New reasoning summary part. |
| (command output deltas, tool call events, etc.) | Various per-tool streams. |

Opt out of high-volume deltas (the four above) for batch / CI jobs. Keep the lifecycle events (turn/*, item/started, item/completed) — those are how you know when the turn is done and what was produced.
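For a batch client that keeps only the lifecycle events, consuming a turn reduces to collecting item/completed payloads until turn/completed arrives. The sketch below works on any iterable of decoded notifications. It assumes the finished item sits under params["item"], which is a guess; check the real shape against the generated schema for your CLI version.

```python
def collect_turn(notifications):
    """Consume notifications until turn/completed.

    Returns (completed_items, turn_completed_params).
    """
    items = []
    for note in notifications:
        method = note.get("method")
        params = note.get("params", {})
        if method == "item/completed":
            # Assumed shape: the item is nested under "item"; fall back to params.
            items.append(params.get("item", params))
        elif method == "turn/completed":
            return items, params
        # Anything else (turn/started, item/started, deltas) is ignored here.
    raise RuntimeError("notification stream ended before turn/completed")
```

Raising when the stream ends early makes a crashed server visible instead of returning a partial result as if the turn had finished.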


5. Filesystem-resident features (hooks, subagents, skills, MCP, plugins)

These all live as files Codex reads rather than RPC methods you call. They apply to every client of the app-server (TUI, codex exec, your custom Python script) automatically.

5.1 Hooks

Six lifecycle events, configured in config.toml (inline [hooks] table) or sibling hooks.json files at any config layer.

| Event | Tool matcher | Can block? |
| --- | --- | --- |
| SessionStart | source (startup/resume/clear) | Yes |
| PreToolUse | tool_name (Bash, apply_patch, MCP) | Yes |
| PermissionRequest | tool_name | Yes (allow / deny / abstain) |
| PostToolUse | tool_name | No |
| UserPromptSubmit | (none) | Yes |
| Stop | (none) | No |

JSON payload to your hook script: session_id, transcript_path, cwd, hook_event_name, model (+ turn_id for turn-scoped events). Hook output keys: continue, stopReason, systemMessage, suppressOutput. Default timeout: 600s. Multiple matching hooks run concurrently.

Older simpler mechanism: notify = ["/bin/bash", "/path/to/notify.sh"] in config.toml — fires on agent-turn-complete only.
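A hook script is just a program that receives the JSON payload and emits a JSON object using the output keys listed above. The sketch below is a hypothetical PreToolUse guard. It assumes the payload arrives on stdin, the result goes to stdout, the command sits under tool_input.command, and "continue": false blocks the action; none of that is confirmed here, so verify it against the hooks documentation before relying on it.

```python
import json
import sys

# Hypothetical policy: substrings that should never appear in a shell command.
BLOCKED_SUBSTRINGS = ("rm -rf /", "git push --force")


def evaluate(payload: dict) -> dict:
    """Decide whether the hooked action may proceed."""
    if payload.get("hook_event_name") != "PreToolUse":
        return {"continue": True}
    tool_input = payload.get("tool_input")  # assumed field name
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    if isinstance(command, list):
        command = " ".join(command)
    for needle in BLOCKED_SUBSTRINGS:
        if needle in str(command):
            return {"continue": False, "stopReason": f"blocked by policy: {needle!r}"}
    return {"continue": True}


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read one payload, write one decision."""
    json.dump(evaluate(json.load(stdin)), stdout)
```

To use it as a script, call main() at the bottom of the file and point the hook configuration at it. Keep the body fast: hooks run inline with the agent.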

5.2 Subagents

TOML files at ~/.codex/agents/*.toml (personal) or .codex/agents/*.toml (project).

name = "tester"
description = "Runs and explains the test suite for any change"
developer_instructions = "..."
model = "gpt-5.4-codex"
model_reasoning_effort = "high"
sandbox_mode = "workspace-write"
# mcp_servers = [...] / skills.config = {...}

Built-in templates (default, worker, explorer) can be overridden by name. Codex spawns subagents only on explicit user request (e.g. "spawn one agent per point") or via /agent slash command in the TUI. Each subagent runs as its own thread; approvals from inactive agent threads bubble up to the active UI labelled by source thread.

Not exposed via dedicated app-server RPC. To replicate the UX from a custom client, start additional thread/start calls yourself with the right config layer pointed at the agent file.

5.3 Agent Skills

Open standard for packaging reusable agent workflows. Discovery scopes (lowest → highest priority):

  1. System (bundled with Codex by OpenAI).
  2. Admin: /etc/codex/skills.
  3. User: $HOME/.agents/skills.
  4. Repo: .agents/skills in CWD or repo root.

Directory layout:

my-skill/
├── SKILL.md           # required, YAML frontmatter (name, description) + instructions
├── scripts/           # optional executable helpers
├── references/        # optional docs the skill can read
├── assets/            # optional binary/static
└── agents/openai.yaml # optional Codex-specific UI/invocation/tool-deps

Progressive disclosure: Codex injects only metadata (name, description, path) at session start, capped at ~8KB total. Full SKILL.md is read only when the skill is selected. Invoked explicitly via /skills or $skill-name mention, or implicitly when prompts match the skill description.

Over the wire: per-turn skill activation is exposed as skill_inputs on turn/start. Skill discovery is filesystem-only.
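To make the metadata-only discovery step concrete, here is a rough sketch that walks skill roots from lowest to highest priority and keeps just name, description, and path. It is not Codex's implementation. The frontmatter parser handles only flat key: value lines rather than full YAML, and letting a higher-priority root replace a same-named skill is an assumption about what priority means.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict:
    """Parse flat `key: value` pairs between the leading '---' fences."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("\"'")
    return meta


def discover_skills(roots) -> dict:
    """Map skill name to metadata; `roots` is ordered lowest to highest priority."""
    found = {}
    for root in map(Path, roots):
        if not root.is_dir():
            continue  # a scope that does not exist on this machine
        for skill_md in sorted(root.glob("*/SKILL.md")):
            meta = read_frontmatter(skill_md)
            name = meta.get("name") or skill_md.parent.name
            found[name] = {
                "name": name,
                "description": meta.get("description", ""),
                "path": str(skill_md.parent),
            }
    return found
```

Only the three metadata fields are kept, which mirrors the progressive-disclosure idea: the body of SKILL.md is never read during discovery.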

5.4 MCP servers

Configured globally in config.toml:

[mcp_servers.linear]
command = "npx"
args = ["@modelcontextprotocol/server-linear"]
env = { LINEAR_API_KEY = "${env:LINEAR_API_KEY}" }
startup_timeout_sec = 30
tool_timeout_sec = 120
enabled = true
required = false
enabled_tools = ["list_issues", "get_issue"]
# disabled_tools = [...]

[mcp_servers.streamable_example]
url = "https://example.com/mcp"
bearer_token_env_var = "EXAMPLE_TOKEN"
# OAuth: codex mcp login <server-name>

Two server kinds: STDIO (command, args, env, cwd) and Streamable HTTP (url, bearer_token_env_var or OAuth via codex mcp login).

App-server methods that touch MCP:

  • mcpServer/tool/call — invoke a tool against the thread's configured MCP server.
  • dynamicTools parameter on thread/start — runtime tool registration (gated behind capabilities.experimentalApi=true). Persisted in the rollout metadata; restored on thread/resume if not re-supplied.

5.5 Plugins

Distribution wrapper that bundles skills, MCP server configs, app mappings, and presentation assets. Skills are the authoring format; plugins are the packaging. The Claude Code "Codex companion" toured in §10 is itself distributed as a plugin (it lives under ~/.claude/plugins/cache/openai-codex/codex/<version>/).

5.6 AGENTS.md

Codex reads AGENTS.md (or AGENTS.override.md) before doing any work, layering global → project guidance. Same role as CLAUDE.md for Claude Code, or a README-for-agents.


6. Where state lives on disk

Two completely separate state stores: Codex core state (everything the agent knows) and companion / harness state (the JS plugin's per-job tracking, optional).

6.1 Codex core state — ~/.codex/

~/.codex/
  auth.json                     # ChatGPT/API credentials
  config.toml                   # user config (model, sandbox, MCP servers, hooks, notify)
  history.jsonl                 # global prompt-history log (every user message ever sent)
  session_index.jsonl           # index over the rollouts below
  sessions/                     # one JSONL "rollout" per thread — the durable thread state
    YYYY/MM/DD/
      rollout-<ISO>-<thread-id>.jsonl
  archived_sessions/            # archived threads, same format
  state_5.sqlite                # newer indexes/metadata (sqlite alongside the JSONL)
  logs_2.sqlite                 # internal telemetry/logs
  memories/, prompts/, rules/, skills/, plugins/, …
  cache/, models_cache.json, generated_images/, shell_snapshots/
  .codex-global-state.json      # global runtime state

Rollout files are newline-delimited JSON, one event per line. First line is session_meta (id, cwd, model, base_instructions, originator, cli_version), then a stream of turn/item events — same shape you'd see over the wire. Example first line:

{"timestamp":"2026-04-24T12:35:48.797Z","type":"session_meta",
 "payload":{"id":"019dbf7d-...","cwd":"/path/to/repo",
            "originator":"codex-tui","cli_version":"0.124.0",
            "model_provider":"openai", ...}}

So ~/.codex/sessions/.../rollout-*.jsonl is the durable, replayable representation of a thread. thread/resume reads from there. thread/list paginates session_index.jsonl. To grep all your past sessions: rg <term> ~/.codex/sessions/.
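Because the first line of every rollout is session_meta, a session browser only needs to read one line per file. A minimal sketch, assuming the layout and record shape shown above (the function names are illustrative):

```python
import json
from pathlib import Path


def read_session_meta(rollout: Path):
    """Return the session_meta payload from a rollout's first line, or None."""
    with rollout.open(encoding="utf-8") as fh:
        first = fh.readline()
    try:
        record = json.loads(first)
    except json.JSONDecodeError:
        return None  # empty or truncated file
    if not isinstance(record, dict) or record.get("type") != "session_meta":
        return None
    return record.get("payload", {})


def list_sessions(root: Path):
    """Yield (path, meta) for every readable rollout under `root`."""
    for path in sorted(root.rglob("rollout-*.jsonl")):
        meta = read_session_meta(path)
        if meta is not None:
            yield path, meta
```

Typical use: iterate list_sessions(Path.home() / ".codex" / "sessions") and print each meta's id and cwd. The files are only read, never modified.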

6.2 Companion / harness state (the JS plugin)

The Claude Code Codex companion uses its own per-workspace state:

$CLAUDE_PLUGIN_DATA/state/<slug>-<sha256(realpath)[:16]>/
  state.json        # { version, config:{stopReviewGate}, jobs:[…] } — index, capped at 50
  broker.json       # { endpoint, pidFile, logFile, sessionDir, pid } for the UDS broker
  jobs/
    <job-id>.json   # full per-job record (request, payload, threadId, turnId, status…)
    <job-id>.log    # plain-text progress log (NOT JSONL)

Workspace-keyed by <basename>-<sha256(realpath)[:16]>. 50-job cap; older job files garbage-collected on saveState. Locate it with:

echo "${CLAUDE_PLUGIN_DATA:-$TMPDIR/codex-companion}/state"

Your own client can ignore all of this; only the JS companion uses it. The rich agent transcripts you actually want are always in ~/.codex/sessions/.
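If you want a similar per-workspace key for your own harness state, the scheme described above is easy to reproduce. This sketch assumes the hash is the hex SHA-256 of the UTF-8 realpath string and that the slug is the plain basename; the companion may sanitize the slug differently, so do not expect byte-identical directory names.

```python
import hashlib
import os


def workspace_key(path: str) -> str:
    """Return '<basename>-<first 16 hex chars of sha256(realpath)>'."""
    real = os.path.realpath(path)  # resolves symlinks and '.' / '..' segments
    digest = hashlib.sha256(real.encode("utf-8")).hexdigest()[:16]
    return f"{os.path.basename(real)}-{digest}"
```

Hashing the resolved path means two checkouts with the same directory name still get distinct state directories, while a symlink to the same checkout maps to the same one.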


7. The official Python SDK — what it covers

Package: openai-codex-app-server-sdk (in openai/codex repo at sdk/python).

7.1 Install & minimal example

pip install openai-codex-app-server-sdk
# bundles `openai-codex-cli-bin` matching the SDK version

from codex_app_server import Codex

with Codex() as codex:
    thread = codex.thread_start(model="gpt-5")
    result = thread.run("Say hello in one sentence.")
    print(result.final_response)
    print(result.items)

Configure binary location explicitly when running against a non-bundled CLI:

from codex_app_server import Codex, AppServerConfig
with Codex(config=AppServerConfig(codex_bin="/usr/local/bin/codex")) as codex:
    ...

7.2 Surface area

High-level API exposed:

  • Codex() / AsyncCodex() — process lifecycle, initialize handshake, context-managed shutdown.
  • Thread: start, resume, fork, run, compact.
  • TurnHandle: streaming events, steer, interrupt.
  • Per-turn config: model, effort (low/medium/high), output_schema (structured output), image inputs (ImageInput, LocalImageInput), SandboxPolicy, ApprovalsReviewer (user / auto_review / guardian), skill_inputs, custom personality, developer_instructions.
  • Typed Pydantic notification models with snake_case ↔ camelCase translation.
  • Retry helper: codex_app_server.retry.retry_on_overload for -32001 busy responses.
  • Low-level request(...) escape hatch.

Not (yet) exposed in the high-level wrappers:

  • WebSocket transport — stdio only.
  • thread/list, thread/rollback, thread/name/set, thread/archive.
  • mcpServer/tool/call.
  • dynamicTools runtime tool registration.
  • thread/backgroundTerminals/*.
  • account/read, config/read introspection.
  • review/start as first-class — run reviews as turns instead.
  • Per-action approval callbacks (you can configure ApprovalsReviewer mode but not intercept individual execCommandApproval / applyPatchApproval requests with custom logic).

For everything missing, drop to the request(...) low-level method — the connection and Pydantic models are still useful.

Disclaimers from the README:

  • "Experimental Python SDK for codex app-server JSON-RPC v2 over stdio."
  • result.final_response is None if a turn ends without a final-answer message.
  • Schema is bundled per CLI version; mismatched binary will surface as Pydantic validation errors.
  • Repo no longer ships codex binaries inside sdk/python — set codex_bin or rely on the pinned openai-codex-cli-bin wheel.


8. Third-party clients

8.1 codex-app-server-sdk (Python, third-party)

Async-only Python client by Mariusz Woloszyn. Supports both stdio and WebSocket. Requires Python ≥ 3.12.

uv add codex-app-server-sdk
# or
pip install codex-app-server-sdk

import asyncio
from codex_app_server_sdk import CodexClient

async def main() -> None:
    async with CodexClient.connect_stdio() as client:
        result = await client.chat_once("Hello from Python")
        print(result.final_text)

asyncio.run(main())

Helpers: chat_once(...) (one-shot), chat(...) (step-streaming), thread/turn lifecycle handling, plus low-level request(...).

8.2 codex-sdk-python (Python, third-party)

Supports both the codex exec JSONL path and the persistent app-server JSON-RPC path; exposes typed structured results for each.

8.3 Elixir

  • codex_sdk by nshkrdotcom — full Elixir SDK. (GitHub)
  • ExMCP.ACP.Adapters.Codex — exposes Codex app-server as an ACP adapter inside the ex_mcp ecosystem.

8.4 JavaScript / Node — the reference companion

The Claude Code Codex plugin's codex-companion.mjs is a complete, real-world JSON-RPC client written in plain Node (no SDK dependency). Worth reading as a worked example. See §10 for a tour.


9. Building your own client

9.1 Minimum viable Python client (no dependencies)

This shows the wire format. Use the official SDK or codex-app-server-sdk for production: they handle retries, typed models, and async streaming for you.

import json
import subprocess
import threading
import itertools
import queue


class CodexClient:
    def __init__(self, cwd="."):
        self.proc = subprocess.Popen(
            ["codex", "app-server"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=None,  # inherit the parent's stderr; an unread PIPE can fill up and stall the server
            text=True,
            bufsize=1,
            cwd=cwd,
        )
        self._ids = itertools.count(1)
        self._pending = {}
        self._notifications = queue.Queue()
        self._reader = threading.Thread(target=self._read_loop, daemon=True)
        self._reader.start()

    def _read_loop(self):
        for line in self.proc.stdout:
            line = line.strip()
            if not line:
                continue
            msg = json.loads(line)
            if "id" in msg and "method" not in msg:
                # Response to one of our requests
                fut = self._pending.pop(msg["id"], None)
                if fut is not None:
                    fut.put(msg)
            elif "id" in msg and "method" in msg:
                # Server → Client request (e.g. approval)
                # Reply with method-not-found until you wire approvals
                self._send({
                    "id": msg["id"],
                    "error": {"code": -32601,
                              "message": f"Unsupported server request: {msg['method']}"},
                })
            elif "method" in msg:
                # Notification (deltas, lifecycle events)
                self._notifications.put(msg)

    def _send(self, message):
        line = json.dumps(message) + "\n"
        self.proc.stdin.write(line)
        self.proc.stdin.flush()

    def request(self, method, params=None, timeout=120):
        i = next(self._ids)
        fut = queue.Queue(maxsize=1)
        self._pending[i] = fut
        self._send({"id": i, "method": method, "params": params or {}})
        msg = fut.get(timeout=timeout)
        if "error" in msg:
            raise RuntimeError(f"{method} failed: {msg['error']}")
        return msg.get("result", {})

    def notify(self, method, params=None):
        self._send({"method": method, "params": params or {}})

    def initialize(self, name="my-client", version="0.0.1"):
        self.request("initialize", {
            "clientInfo": {"name": name, "title": name, "version": version},
            "capabilities": {
                "experimentalApi": False,
                "optOutNotificationMethods": [
                    "item/agentMessage/delta",
                    "item/reasoning/summaryTextDelta",
                    "item/reasoning/summaryPartAdded",
                    "item/reasoning/textDelta",
                ],
            },
        })
        self.notify("initialized", {})

    def drain_notifications(self):
        while not self._notifications.empty():
            yield self._notifications.get_nowait()

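    def close(self):
        try:
            self.proc.stdin.close()
        finally:
            self.proc.terminate()
            try:
                self.proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                # The server ignored SIGTERM; force it down so we don't leak it.
                self.proc.kill()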


if __name__ == "__main__":
    client = CodexClient()
    try:
        client.initialize()
        thread = client.request("thread/start", {"cwd": ".", "model": "gpt-5"})
        thread_id = thread["threadId"]
        # turn/start only acknowledges the turn (it returns a turnId); the
        # output arrives as notifications, so block until turn/completed.
        client.request("turn/start", {
            "threadId": thread_id,
            "input": [{"type": "text", "text": "Say hello in one sentence."}],
            "sandbox": "read-only",
        })
        while True:
            note = client._notifications.get(timeout=300)
            print("Notif:", note["method"])
            if note["method"] == "item/completed":
                print("Item:", json.dumps(note.get("params", {}))[:200])
            if note["method"] == "turn/completed":
                break
    finally:
        client.close()

Real param/result shapes are versioned — generate them with codex app-server generate-json-schema for accuracy. Treat the snippet above as wire-format demonstration.

9.2 Async / streaming

For real streaming (token-by-token reasoning, multiple parallel threads, or WebSocket), use asyncio + asyncio.subprocess, or jump straight to codex-app-server-sdk (already async).
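As a starting point for the asyncio route, the reader half is small: an async generator that turns a StreamReader into decoded messages. In a real client the reader would be proc.stdout from asyncio.create_subprocess_exec("codex", "app-server", stdin=PIPE, stdout=PIPE); the generator name iter_messages is illustrative.

```python
import asyncio
import json


async def iter_messages(reader: asyncio.StreamReader):
    """Yield one decoded JSON message per non-empty line until EOF."""
    while True:
        line = await reader.readline()
        if not line:  # b"" means the server closed its stdout
            break
        line = line.strip()
        if line:
            yield json.loads(line)
```

Consume it with `async for msg in iter_messages(proc.stdout): ...` and route each message by whether it carries id, method, or both. StreamReader caps line length at 64 KiB by default and readline() raises ValueError beyond that, so pass a larger limit= to create_subprocess_exec if you expect big payloads.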

9.3 Handling server → client approval requests

Don't reject everything with -32601. Implement at minimum:

def _handle_server_request(self, msg):
    method = msg["method"]
    if method in {"execCommandApproval", "applyPatchApproval"}:
        # Show prompt to user / consult policy / log to dashboard
        decision = "accept"  # or "decline" / "cancel"
        self._send({"id": msg["id"], "result": {"decision": decision}})
    else:
        self._send({
            "id": msg["id"],
            "error": {"code": -32601, "message": f"Unsupported: {method}"},
        })

Or set ApprovalsReviewer = "auto_review" / "guardian" on thread/start to let Codex auto-resolve them based on its built-in heuristics, and only intercept when you want custom UX.
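When the decision should come from policy rather than a person, the handler can delegate to a small pure function. The sketch below is one hypothetical policy: allow a short list of read-only programs and decline everything else, including patches. It assumes the command arrives under params["command"] as a list or a string, which you should confirm against the generated schema, and it keeps the {"decision": ...} result shape used in the snippet above.

```python
import shlex

SAFE_PROGRAMS = {"ls", "cat", "rg"}  # hypothetical read-only allowlist
APPROVAL_METHODS = {"execCommandApproval", "applyPatchApproval"}


def decide(method: str, params: dict) -> str:
    """Return 'accept' or 'decline' for one approval request."""
    if method != "execCommandApproval":
        return "decline"  # this policy never approves file writes
    command = params.get("command") or []  # assumed field name
    if isinstance(command, str):
        try:
            command = shlex.split(command)
        except ValueError:  # unbalanced quotes: refuse rather than guess
            return "decline"
    return "accept" if command and command[0] in SAFE_PROGRAMS else "decline"


def approval_response(msg: dict) -> dict:
    """Build the reply for any server-to-client request."""
    if msg["method"] in APPROVAL_METHODS:
        decision = decide(msg["method"], msg.get("params", {}))
        return {"id": msg["id"], "result": {"decision": decision}}
    return {"id": msg["id"],
            "error": {"code": -32601, "message": f"Unsupported: {msg['method']}"}}
```

Matching on the program name alone is deliberately crude (cat can still read secrets), so treat this as a shape for the hook, not as a security boundary; the sandbox policy remains the real guard.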


10. Reference architecture: the JS Codex companion

The Claude Code Codex plugin's scripts/codex-companion.mjs (under ~/.claude/plugins/cache/openai-codex/codex/<version>/) is a battle-tested example client. Worth understanding when designing your own.

10.1 Process model — direct vs broker

Two transports inside one client (scripts/lib/app-server.mjs):

  • Direct: spawn codex app-server per command, communicate over stdio JSONL. Simple, but pays full startup cost every invocation.
  • Broker: a long-lived daemon (app-server-broker.mjs) is spawned once per workspace. It hosts a persistent codex app-server and listens on a Unix domain socket. Subsequent companion invocations connect to the socket via net.createConnection({ path }) instead of spawning a new server. Endpoint, pid, log file, and session dir are tracked in <companion-state>/broker.json.

Why this matters: starting codex app-server involves loading the Codex binary, reading config, attaching MCP servers, and warming caches. The broker amortizes that cost across all subsequent foreground/background companion calls.

10.2 Methods exercised

Grep result from scripts/lib/codex.mjs:

client.request("thread/start", ...)
client.request("thread/resume", ...)
client.request("thread/name/set", ...)
client.request("thread/list", ...)
client.request("turn/start", ...)
client.request("turn/interrupt", ...)
client.request("review/start", ...)
client.request("account/read", ...)
client.request("config/read", ...)

So the companion is a comprehensive consumer of the protocol — broader than the official Python SDK's high-level surface — but doesn't touch mcpServer/tool/call, dynamicTools, thread/fork, thread/rollback, or thread/archive.

10.3 Job model

/codex:task in foreground vs background:

  • Foreground: opens client → runs turn synchronously → renders result → exits. State persisted to jobs/<job-id>.json and .log for /codex:status retrospection.
  • Background: parent spawns a detached task-worker subprocess (same script, different subcommand), records queued state, returns immediately. Worker reads job record, runs the same turn, updates state. /codex:cancel then sends turn/interrupt to the broker and tree-kills the worker pid.

This pattern is straightforward to port: detached subprocess + shared JSON state file + turn/interrupt for cancellation = a complete background-task system over the protocol.
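The shared-state half of that pattern might look like the sketch below: one JSON file per job plus a capped index, each written atomically so a parent and a worker never read a half-written file. File names follow the layout in §6.2, but the record fields and the pruning details are illustrative, not a copy of the companion's logic.

```python
import json
import os
import tempfile
from pathlib import Path

MAX_JOBS = 50  # same cap the companion's index uses


def write_json_atomic(path: Path, data) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh)
    os.replace(tmp, path)  # atomic when source and target share a filesystem


def save_job(state_dir: Path, job: dict) -> None:
    """Persist the full job record and refresh the capped index."""
    write_json_atomic(state_dir / "jobs" / f"{job['id']}.json", job)
    index_path = state_dir / "state.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else {"jobs": []}
    jobs = [j for j in index["jobs"] if j["id"] != job["id"]]
    jobs.append({"id": job["id"], "status": job["status"]})
    for stale in jobs[:-MAX_JOBS]:  # everything older than the newest MAX_JOBS
        (state_dir / "jobs" / f"{stale['id']}.json").unlink(missing_ok=True)
    index["jobs"] = jobs[-MAX_JOBS:]
    write_json_atomic(index_path, index)
```

The worker calls save_job on each status change (queued, running, done), and a status command only has to read state.json. If several writers can race on the index, add a lock file around save_job.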

10.4 Stop-gate review

Optional config stopReviewGate: true makes Codex review your turn before letting Claude Code "stop". Implemented as a turn/start against the adversarial-review prompt + structured-output schema. A practical demonstration of how to layer custom workflows on top of the protocol with zero protocol changes.


11. Practical recipes

11.1 Replace the JS companion in Python

Mapping:

| Companion behavior | Python implementation |
| --- | --- |
| Hooks, MCP, skills, AGENTS.md, subagents, notify | Files in ~/.codex/ / .codex/ / .agents/; nothing to do, applies automatically. |
| /codex:task foreground | Codex.thread_start(...), then thread.run("...") |
| /codex:task --background | Detached multiprocessing.Process running thread.run(...), write status JSON to your own state dir. |
| /codex:task --resume-last | Track latest thread_id in your state file, then Codex.thread_resume(thread_id). |
| /codex:cancel | Drop to client.request("turn/interrupt", {threadId, turnId}). |
| /codex:review | Use the official SDK's output_schema parameter against an adversarial-review prompt, or drop to client.request("review/start", {...}). |
| /codex:status (across sessions) | client.request("thread/list", {...}) (drop to low-level). |
| Broker daemon (session reuse across CLI invocations) | Either (a) keep one Python process alive (FastAPI / asyncio app), (b) skip and pay the cold-start cost, or (c) build your own UDS daemon mirroring app-server-broker.mjs. |

11.2 CI reviewer

from codex_app_server import Codex
import json, sys

OUTPUT_SCHEMA = json.load(open("review-output.schema.json"))

with Codex() as codex:
    thread = codex.thread_start(
        model="gpt-5",
        sandbox_policy={"mode": "read-only"},
        approvals_reviewer="auto_review",
    )
    result = thread.run(
        "Review the staged diff. Flag bugs, security issues, missing tests. "
        "Respond against the provided schema.",
        output_schema=OUTPUT_SCHEMA,
    )
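    if result.final_response is None:
        # No final-answer message (tool-call-only or errored turn): fail the check.
        sys.exit(2)
    parsed = json.loads(result.final_response)
    if parsed["severity"] in {"high", "critical"}:
        sys.exit(1)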

Wire this to your CI's PR diff + post structured output as a review comment.

11.3 Multi-agent fan-out

Fork the parent thread into N children, run them in parallel, compare outputs:

import asyncio
from codex_app_server import AsyncCodex

async def variant(codex, parent_id, prompt, model, effort):
    child = await codex.thread_fork(parent_id)
    return await child.run(prompt, model=model, effort=effort)

async def main():
    async with AsyncCodex() as codex:
        parent = await codex.thread_start(...)
        results = await asyncio.gather(
            variant(codex, parent.thread_id, "Solve this", "gpt-5",        "low"),
            variant(codex, parent.thread_id, "Solve this", "gpt-5",        "high"),
            variant(codex, parent.thread_id, "Solve this", "gpt-5.4-codex","medium"),
        )
        # rank by self-grading or external rubric

This is how you'd build a "best-of-N" agent or replicate the subagent UX without invoking the TUI's /agent machinery.

11.4 Custom approval UI (Slack bot, web dashboard)

Set approvals_reviewer="user" so every action requires explicit approval. Intercept the bidirectional execCommandApproval / applyPatchApproval server-to-client requests, route them to your UI, return the user's decision. The official Python SDK doesn't surface this directly — drop to the third-party codex-app-server-sdk or write your own client (§9).

11.5 Long-running research agents

thread/resume works across days/weeks. Persist thread_id to your DB; pick up where you left off:

with Codex() as codex:
    thread = codex.thread_resume(stored_thread_id)
    result = thread.run("Pick up where we left off. Next step: …")

The full rollout JSONL is on disk under ~/.codex/sessions/, so you can also grep, replay, or export it.

11.6 Embedding skills + MCP for an internal product

  1. Bundle skills under .agents/skills/<my-skill>/SKILL.md in your repo.
  2. Wire MCP servers in .codex/config.toml ([mcp_servers.<name>]).
  3. Optionally bundle as a Codex plugin for distribution.
  4. Drive turns from your Python service — skills + MCP tools are picked up automatically.

12. Caveats & operational notes

  • Schema versioning: app-server schemas drift with CLI versions. Pin openai-codex-cli-bin (or whatever your distro mechanism is) and regenerate types via generate-ts / generate-json-schema on upgrade.
  • Experimental surfaces: WebSocket transport, dynamic tools, ChatGPT token management, background terminals — gated behind experimentalApi or explicitly marked unstable. Do not depend on these for production unless you accept regular breaking changes.
  • Busy responses: RPC error code -32001 means the server (or broker) is overloaded; back off and retry. The official Python SDK ships codex_app_server.retry.retry_on_overload for exactly this.
  • Bidirectional protocol: every client must handle inbound id+method messages. Replying -32601 to everything will work for trivial cases but fails the moment Codex needs an approval.
  • Stateful, not REST: one process per session (or per workspace if you broker). Don't try to serve multi-tenant workloads with a single shared app-server instance — that's not its model.
  • final_response may be null: a turn that ends without an agent message (tool-call-only, errored, etc.) returns final_response = None. Inspect result.items for the actual content.
  • Approvals reviewer modes: user (everything asks), auto_review (Codex decides based on built-in heuristics), guardian (third option, stricter auto). Pick deliberately; defaults are conservative for a reason.
  • Sandbox + workspace: sandbox policy applies per turn, but the underlying file changes are real. read-only for review/triage, workspace-write for edits, danger-full-access only when you've thought hard about it.
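If you are not using the official SDK's retry helper, the back-off advice for -32001 can be hand-rolled in a few lines. This is a generic exponential-backoff sketch, not the SDK's implementation; RpcError and call_with_backoff are hypothetical names, and the delays are arbitrary defaults.

```python
import random
import time

BUSY = -32001  # "server (or broker) is overloaded"


class RpcError(Exception):
    """Raised by your client when a response carries an error member."""

    def __init__(self, code: int, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code


def call_with_backoff(fn, *, attempts=5, base_delay=0.25, max_delay=8.0,
                      sleep=time.sleep):
    """Call fn(); on a busy error wait with jittered exponential delay and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except RpcError as exc:
            if exc.code != BUSY or attempt == attempts - 1:
                raise  # a different error, or retries exhausted
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids lockstep retries
```

Wrap individual requests, for example call_with_backoff(lambda: client.request("turn/start", params)), after adapting the client to raise RpcError with the server's code. Only the busy code is retried; every other error surfaces immediately.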

13. Sources & further reading

Official OpenAI documentation

Source code

Background / design

Third-party clients

Ecosystem / how-to


Distilled from a working session reading codex-companion.mjs + searching the public docs and ecosystem. Verify shapes against your installed CLI version with codex app-server generate-json-schema.
