AGENTS.md

Purpose

This project uses LLM-assisted reverse engineering with Ghidra and Ghidra MCP to analyze a DOS game binary. The goal is to progressively recover meaningful program structure by tracing execution from the application entry point, identifying functions, variables, globals, data structures, and subsystem boundaries, and renaming only when there is high confidence.

This repository also maintains an ARCHITECTURE.md file that records confirmed subsystem discoveries and their relationships.

Accuracy matters more than speed. Never guess.

Primary Objectives

Start analysis at the application entry point.
Follow control flow outward to identify:
- functions
- global variables
- local variables
- data structures
- tables
- buffers
- dispatch logic
- subsystem boundaries
Rename symbols only when their purpose is supported by strong evidence.
Record confirmed subsystem discoveries in ARCHITECTURE.md.
Use strings, string cross-references, DOS interrupts, calling patterns, and data flow as primary sources of evidence.
Preserve uncertainty explicitly. If confidence is low, do not rename and do not document as fact.

Ground Rules

1. Never guess

Do not rename a function, variable, struct, enum, field, or table unless the available evidence supports the meaning with high confidence.

Avoid speculative names such as:

maybe_draw_sprite
probably_load_file
unknown_audio_thing

If confidence is insufficient, leave the original name in place or apply only a strictly descriptive neutral name if justified by observable behavior, such as:

memcpy_like
int21_file_io_wrapper
table_of_far_ptrs
state_dispatch_table
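The wording half of this rule can be checked mechanically. The following is a minimal, standalone Python 3.9+ sketch. The helper name and its token list are hypothetical, drawn from the bad examples in this document. It flags names that encode a guess. It cannot judge whether the evidence is sufficient; that decision stays with the analyst.

```python
# Hypothetical lint helper. The token list is an assumption drawn from the
# speculative examples in this document, not an exhaustive rule.
SPECULATIVE_TOKENS = {"maybe", "probably", "possibly", "unknown", "thing", "stuff"}


def is_speculative_name(name: str) -> bool:
    """Return True if any underscore-separated token signals a guess."""
    tokens = name.lower().split("_")
    return any(token in SPECULATIVE_TOKENS for token in tokens)


if __name__ == "__main__":
    for candidate in ("maybe_draw_sprite", "unknown_audio_thing",
                      "memcpy_like", "int21_file_io_wrapper"):
        verdict = "reject" if is_speculative_name(candidate) else "ok"
        print(f"{candidate}: {verdict}")
```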

2. Evidence over intuition

Every rename and every ARCHITECTURE.md update must be grounded in evidence such as:

string contents and string references
DOS interrupt usage
BIOS interrupt usage
file access patterns
video memory writes
buffer shapes and access patterns
call graph relationships
repeated call-site behavior
resource loading sequences
script interpreter patterns
structure layout and field usage

3. Work from the entry point outward

Begin at the program entry point and proceed in execution order as much as possible. Prefer understanding initialization, subsystem setup, and top-level dispatch before diving into leaf functions.

4. Prefer reversible, minimal, precise changes

Rename conservatively. A smaller number of correct renames is better than many wrong ones.

5. Separate confirmed facts from working hypotheses

Only confirmed facts belong in ARCHITECTURE.md.

Do not write:

guesses
possibilities
loose speculation
subsystem claims based on one weak clue

Analysis Workflow

Phase 1: Entry Point and Startup Recovery

Start at the application entry point and identify:

startup and initialization flow
memory/model setup
segment register initialization
heap/buffer setup
resource/bootstrap loading
video/audio/input initialization
main loop entry
shutdown/cleanup path

Tasks:

Trace the first layer of calls from the entry point.
Identify initialization clusters by behavior.
Mark wrappers around common DOS/BIOS interrupts.
Identify central state objects, global flags, mode variables, and dispatch tables.
Rename only high-confidence startup functions.
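Working outward from the entry point amounts to a breadth-first walk of the call graph. The following is a minimal Python 3.9+ sketch. It assumes the call graph has already been exported from Ghidra as a plain dictionary; the export step and the `FUN_1000_*` addresses are hypothetical. Depth 1 is the first layer of calls to trace.

```python
from collections import deque


def analysis_order(call_graph: dict[str, list[str]], entry: str) -> list[tuple[int, str]]:
    """Breadth-first walk from the entry point.

    Returns (depth, function) pairs: depth 1 is the first layer of calls,
    depth 2 the next, and so on. Each function is listed once, at the
    shallowest depth where it is reached. Callees keep the order given in
    call_graph (for example, call-site order).
    """
    order: list[tuple[int, str]] = []
    seen = {entry}
    queue = deque([(0, entry)])
    while queue:
        depth, func = queue.popleft()
        order.append((depth, func))
        for callee in call_graph.get(func, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((depth + 1, callee))
    return order


if __name__ == "__main__":
    # Hypothetical call graph; addresses are placeholders.
    graph = {
        "entry": ["FUN_1000_0010", "FUN_1000_0200"],
        "FUN_1000_0010": ["FUN_1000_0400"],
        "FUN_1000_0200": ["FUN_1000_0400", "FUN_1000_0600"],
    }
    for depth, func in analysis_order(graph, "entry"):
        print("  " * depth + func)
```

The `seen` set means shared helpers called from many places are visited once, at the shallowest depth. It also means recursive or mutually recursive functions cannot loop the walk.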

Examples of acceptable names if justified:

game_entry
initialize_video
initialize_audio
initialize_input
main_loop
shutdown_and_exit

Only use these names when the evidence is strong.

Phase 2: Systematic Symbol Recovery

For each discovered function, variable, or data structure:

Functions

Determine:

what calls it
what it calls
what data it reads/writes
whether it wraps a DOS/BIOS interrupt
whether it processes strings, files, graphics, scripts, or resources
whether it is a leaf helper or subsystem coordinator

Rename functions using:

concrete behavior
subsystem context
observable side effects

Good examples:

open_resource_file
read_resource_chunk
draw_mouse_cursor
decode_rle_scanline
script_execute_opcode
blit_backbuffer_to_vram

Bad examples:

handle_game_stuff
video_related
sound_func
do_script_maybe

Variables and globals

Determine:

lifetime
scope
initialization site
write/read locations
relation to mode/state/subsystem operation
whether it is a pointer, counter, flag, buffer, handle, or table

Prefer names like:

current_video_mode
resource_file_handle
mouse_x
mouse_y
active_script_pc
palette_buffer

Only if proven.

Data structures

Look for:

repeated field offsets
arrays of records
pointer tables
object/state records
decoded resource headers
animation/script/resource metadata

Name structures only after enough field usage is understood.
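Collecting repeated field offsets into a candidate layout can be sketched in Python 3.9+. The sketch assumes accesses through one base pointer (for example `[bx+2]`) have been recorded as `(offset, size, function)` tuples; that input format and the function names are hypothetical. Fields receive neutral `field_0xNN` names, and size conflicts are reported rather than resolved.

```python
from collections import defaultdict


def infer_candidate_fields(accesses):
    """Group (offset, size_in_bytes, function_name) accesses made through one
    base pointer into candidate fields.

    Conflicting access sizes are reported, not resolved, so the analyst can
    decide whether an offset is a union, a sub-field, or a misread.
    """
    sizes = defaultdict(set)
    users = defaultdict(set)
    for offset, size, func in accesses:
        sizes[offset].add(size)
        users[offset].add(func)
    return [
        {
            "name": f"field_0x{offset:02x}",  # neutral placeholder name
            "offset": offset,
            "sizes": sorted(sizes[offset]),
            "used_by": sorted(users[offset]),
            "size_conflict": len(sizes[offset]) > 1,
        }
        for offset in sorted(sizes)
    ]


if __name__ == "__main__":
    # Hypothetical accesses: word [bx+0], word [bx+2], byte/word [bx+4].
    observed = [
        (0, 2, "FUN_1000_0a10"),
        (2, 2, "FUN_1000_0a10"),
        (2, 2, "FUN_1000_0b40"),
        (4, 1, "FUN_1000_0b40"),
        (4, 2, "FUN_1000_0c00"),
    ]
    for candidate in infer_candidate_fields(observed):
        print(candidate)
```

A field touched by only one function is weak evidence on its own. The `used_by` list shows at a glance which offsets are genuinely shared.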

Good examples:

ResourceHeader
SpriteDescriptor
ScriptContext
CursorState

If not enough is known, prefer temporary neutral names such as:

struct_XXXX_candidate
resource_record_candidate

Strings-Driven Analysis

The strings table is a major source of context and must be used aggressively.

For each meaningful string:

Identify cross-references to the string.
Determine whether it is used for:
- error reporting
- debug/logging
- file/resource names
- script commands
- UI text
- copy protection
- device/system checks
- command dispatch
Follow the referencing function outward and inward in the call graph.
Use clustered strings to infer subsystem boundaries.
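String clustering can be approximated by grouping functions that reference a common string. The following is a minimal Python 3.9+ sketch using union-find. It assumes the string cross-references have been exported as a mapping from string text to referencing function names; the export step and the example strings are hypothetical. The resulting clusters are only candidates for subsystem boundaries. A very common string, such as a generic error prefix, can merge unrelated groups, so exclude such strings first.

```python
def cluster_by_shared_strings(string_xrefs):
    """Group functions that reference at least one common string.

    string_xrefs maps string text -> list of referencing function names.
    Functions are joined transitively: if A and B share one string and B and C
    share another, A, B and C end up in the same cluster.
    """
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for funcs in string_xrefs.values():
        for func in funcs:
            find(func)  # register functions even if they share nothing
        for other in funcs[1:]:
            union(funcs[0], other)

    clusters = {}
    for func in list(parent):
        clusters.setdefault(find(func), []).append(func)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical strings and functions.
    xrefs = {
        "SOUND.DRV not found": ["FUN_A", "FUN_B"],
        "AdLib": ["FUN_B", "FUN_C"],
        "Cannot open %s": ["FUN_D"],
        "Out of memory": ["FUN_E", "FUN_D"],
    }
    print(cluster_by_shared_strings(xrefs))
```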

Examples:

File extensions or resource names may indicate resource loading or archive management.
UI/status messages may reveal menu, inventory, cursor, or script systems.
Error strings may expose file handling, memory allocation, decompression, or driver init paths.

Do not infer more than the string supports.

A string mentioning AdLib may suggest audio relevance, but it does not by itself prove the role of the entire function.

DOS and BIOS Interrupt Heuristics

Interrupt usage is a strong clue and must be incorporated into analysis.

DOS interrupts

Pay particular attention to:

int 21h for file management, memory allocation, program termination, device I/O, directory access, etc.
FCB- or handle-based file operations
load/execute behaviors
DTA manipulation
PSP/environment interactions

BIOS interrupts

Pay particular attention to:

int 10h for video mode changes, cursor, text output, palette/video services
int 13h for disk access
int 16h for keyboard input
int 1Ah for timer/time services
int 33h for mouse services (provided by a loaded mouse driver rather than by the BIOS itself)
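An interrupt is only meaningful together with its function number, which must be recovered from the register setup before the `INT` instruction (for example `MOV AH,3Dh` followed by `INT 21h`). The following is a small Python 3.9+ lookup sketch covering a partial set of well-known services. The table is intentionally incomplete, and the helper name is hypothetical. Its output is meant for analysis comments, not as a symbol name.

```python
# Partial lookup of well-known services, keyed by (interrupt vector, function).
# For int 10h/13h/16h/1Ah/21h the function number is AH; for the int 33h
# mouse driver it is AX.
KNOWN_SERVICES = {
    (0x21, 0x09): "DOS: write $-terminated string to stdout",
    (0x21, 0x1A): "DOS: set disk transfer area (DTA) address",
    (0x21, 0x25): "DOS: set interrupt vector",
    (0x21, 0x35): "DOS: get interrupt vector",
    (0x21, 0x3D): "DOS: open existing file (handle)",
    (0x21, 0x3E): "DOS: close file handle",
    (0x21, 0x3F): "DOS: read from file or device handle",
    (0x21, 0x40): "DOS: write to file or device handle",
    (0x21, 0x42): "DOS: move file pointer (LSEEK)",
    (0x21, 0x48): "DOS: allocate memory block",
    (0x21, 0x49): "DOS: free memory block",
    (0x21, 0x4A): "DOS: resize memory block",
    (0x21, 0x4C): "DOS: terminate with return code",
    (0x10, 0x00): "Video BIOS: set video mode",
    (0x10, 0x0F): "Video BIOS: get current video mode",
    (0x13, 0x02): "Disk BIOS: read sectors",
    (0x16, 0x00): "Keyboard BIOS: wait for and read keystroke",
    (0x16, 0x01): "Keyboard BIOS: check for keystroke",
    (0x1A, 0x00): "BIOS: read system timer tick count",
    (0x33, 0x0000): "Mouse driver: reset and detect",
    (0x33, 0x0001): "Mouse driver: show cursor",
    (0x33, 0x0002): "Mouse driver: hide cursor",
    (0x33, 0x0003): "Mouse driver: get position and button status",
}


def describe_interrupt(vector: int, function: int) -> str:
    """Return a description usable as an analysis comment, never as a name."""
    return KNOWN_SERVICES.get(
        (vector, function),
        f"unrecognized: int {vector:02X}h function {function:04X}h",
    )


if __name__ == "__main__":
    print(describe_interrupt(0x21, 0x3D))
    print(describe_interrupt(0x33, 0x0003))
```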

Hardware-facing patterns

Also look for:

direct writes to VGA memory
palette register I/O
PIT/PC speaker programming
AdLib/Sound Blaster port I/O
keyboard controller access
DMA-related setup
timer hooks or interrupt vector manipulation

Use these clues to classify behavior, but only rename once supported by surrounding code and data flow.

Example:

A function invoking int 21h alone is not necessarily load_file.
A function opening a named asset, seeking, reading into a buffer, and returning a handle-sized or byte-count result may justify open_resource_file or read_resource_data.
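The contrast between these two examples can be written as a decision rule. The following is a minimal Python 3.9+ sketch. It assumes the AH values of a function's `int 21h` calls and two yes/no observations have already been collected by hand; the helper name and return labels are hypothetical. Mixed evidence leaves the name untouched. File mechanics alone earn only a neutral placeholder, and a semantic name is allowed only when open, read, a named asset and a caller-supplied buffer converge.

```python
# AH values for the DOS handle-based file services involved in loading.
DOS_OPEN, DOS_CLOSE, DOS_READ, DOS_LSEEK = 0x3D, 0x3E, 0x3F, 0x42
FILE_SERVICES = {DOS_OPEN, DOS_CLOSE, DOS_READ, DOS_LSEEK}


def strongest_allowed_label(ah_values, opens_named_asset, fills_caller_buffer):
    """Return the strongest label the evidence supports.

    ah_values: AH values of the int 21h calls found in the function.
    opens_named_asset: a filename/asset string reaches the open call.
    fills_caller_buffer: the read targets a buffer passed in by the caller.
    """
    services = set(ah_values)
    if not services or not services <= FILE_SERVICES:
        # Mixed or missing evidence: keep the original name.
        return "leave_unrenamed"
    if (DOS_OPEN in services and DOS_READ in services
            and opens_named_asset and fills_caller_buffer):
        # Converging signals; a semantic name such as open_resource_file
        # still has to be checked against the callers.
        return "semantic_name_allowed"
    # Only the mechanics are certain: use a neutral placeholder.
    return "int21_file_io_wrapper"
```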

Subsystem Discovery Rules

As subsystem boundaries become clear, record them in ARCHITECTURE.md.

Candidate subsystems include:

video
graphics rendering
sprite or animation handling
palette management
cursor management
keyboard/mouse input
audio/music/sfx
script engine
text/dialogue
decoders/decompression
resource/archive management
save/load
memory management
scene/state management

Only document a subsystem when at least one of the following is true:

There is a clear cluster of related functions with consistent behavior.
There are clear shared globals/structures that define subsystem state.
Strings or resources strongly tie the functions together.
Interrupt/hardware usage and data flow clearly indicate a distinct responsibility.

For each confirmed subsystem, record:

subsystem name
confidence level: High
why it is considered confirmed
key functions
key globals/structures
notable strings
relevant interrupts or hardware clues
known relationships to other subsystems

Do not add low-confidence or speculative subsystems.

ARCHITECTURE.md Update Policy

ARCHITECTURE.md is a record of confirmed understanding, not a scratchpad.

Only add content when:

the subsystem or relationship is supported by multiple strong clues
names used are stable and justified
the finding would still make sense to another analyst reviewing the evidence later

Each entry should be concise and factual.

Recommended format:

## Video Subsystem

**Confidence:** High

**Evidence**
- Functions at `FUN_xxxx`, `FUN_yyyy`, and `FUN_zzzz` change video mode via `int 10h`
- Shared global buffer used as backbuffer before copy to VRAM
- Palette update routine writes through VGA-related I/O sequence
- Strings referencing mode/setup failure are used by the initialization path

**Key Functions**
- `initialize_video`
- `set_video_mode`
- `blit_backbuffer_to_vram`
- `update_palette`

**Key Data**
- `video_state`
- `backbuffer`
- `palette_buffer`

**Notes**
- Video initializes before the main loop
- Rendering is separated from resource decoding

Do not include unresolved claims.
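Entries can be generated in the recommended format while enforcing the update policy at write time. The following is a minimal Python 3.9+ sketch. The class and helper names are hypothetical. The two-item evidence minimum is an assumed reading of "supported by multiple strong clues". Notable strings, interrupts and relationships can be recorded as evidence or notes bullets.

```python
from dataclasses import dataclass, field


@dataclass
class SubsystemEntry:
    name: str
    evidence: list[str]
    key_functions: list[str]
    key_data: list[str]
    notes: list[str] = field(default_factory=list)
    confidence: str = "High"

    def to_markdown(self) -> str:
        # Enforce the update policy before anything is written.
        if self.confidence != "High":
            raise ValueError("only High-confidence subsystems belong in ARCHITECTURE.md")
        if len(self.evidence) < 2:
            raise ValueError("a confirmed subsystem needs multiple strong clues")
        parts = [
            f"## {self.name}",
            f"**Confidence:** {self.confidence}",
            _section("Evidence", self.evidence),
            _section("Key Functions", [f"`{name}`" for name in self.key_functions]),
            _section("Key Data", [f"`{name}`" for name in self.key_data]),
        ]
        if self.notes:
            parts.append(_section("Notes", self.notes))
        return "\n\n".join(parts) + "\n"


def _section(title: str, items: list[str]) -> str:
    """Render a bold title followed by one bullet per item."""
    return "\n".join([f"**{title}**"] + [f"- {item}" for item in items])
```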

Confidence Standard for Renaming

A rename is allowed only when the name is supported by multiple converging signals.

High-confidence signals include combinations of:

clear interrupt semantics
clear string references
clear file/resource names
repeated consistent call-site usage
obvious buffer or structure behavior
direct hardware interaction
strong structural relationships in the call graph

Rename threshold

Rename only if at least two strong signals converge, or if one signal is exceptionally definitive.
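The threshold can be expressed as a small predicate. The following is a minimal Python 3.9+ sketch. The `Signal` record and its `kind` labels are hypothetical. One design choice is an assumption: strong signals are counted by distinct kind, so two call sites showing the same thing do not count as two converging signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    kind: str                 # e.g. "interrupt", "string", "call_site", "hardware"
    strong: bool
    definitive: bool = False  # exceptionally definitive on its own


def rename_allowed(signals) -> bool:
    """True if one strong signal is definitive, or two distinct strong kinds converge."""
    if any(s.strong and s.definitive for s in signals):
        return True
    strong_kinds = {s.kind for s in signals if s.strong}
    return len(strong_kinds) >= 2
```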

Examples of sufficiently strong evidence:

function opens a named asset file, uses DOS file interrupts, reads into a destination buffer, and is called by resource init code
function writes to video memory or uses video BIOS services and is called by rendering flow
function dispatches on bytecodes read from a script stream and updates script context fields
function uses mouse interrupt services and updates cursor coordinates/state

If evidence is incomplete

Do one of the following:

leave the original name unchanged
apply a narrowly descriptive placeholder based on directly observable mechanics only

Examples:

reads_buffer_with_length_prefix
far_ptr_dispatcher
int10_video_service_wrapper
copies_words_to_segment

Avoid semantic overreach.

Naming Conventions

Use clear, consistent, descriptive names.

Functions

Use verb-oriented names:

initialize_video
load_palette
decode_sprite_frame
execute_script_command
poll_keyboard_input

Variables

Use noun-oriented names:

current_room_id
resource_index
cursor_visible
audio_driver_type

Structures

Use PascalCase:

VideoState
ScriptContext
ResourceEntry

Constants / enums

Use uppercase when appropriate:

VIDEO_MODE_13H
RESOURCE_TYPE_SPRITE

Unknowns

When forced to use an interim name, keep it descriptive and non-speculative:

bytecode_stream_ptr
video_buffer_candidate
file_io_ctx_candidate
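The case conventions above can be checked with regular expressions. The following is a minimal Python 3.9+ sketch. The patterns are an assumed, slightly strict reading of the conventions, and the names are hypothetical. A side effect is that Ghidra default names such as `FUN_1000_0010` fail the function pattern and so stand out as not yet renamed.

```python
import re

# Patterns for the conventions above (an assumed, slightly strict reading).
CONVENTIONS = {
    "function": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),   # snake_case
    "variable": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),   # snake_case
    "struct": re.compile(r"(?:[A-Z][a-z0-9]*)+"),               # PascalCase
    "constant": re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*"),  # UPPER_SNAKE_CASE
}


def follows_convention(kind: str, name: str) -> bool:
    """True if the whole name matches the convention for its symbol kind."""
    return CONVENTIONS[kind].fullmatch(name) is not None
```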

Preferred Investigation Tactics

When analyzing a function, prefer this order:

Identify callers.
Identify callees.
Inspect strings referenced directly or indirectly.
Inspect interrupts and I/O operations.
Track major buffers and globals touched.
Look for repeated structural patterns.
Determine whether the function belongs to an already-known subsystem.
Decide whether rename confidence is high enough.

When analyzing a global or structure:

Find all writes.
Find all reads.
Determine initialization.
Determine whether access patterns imply flags, counters, coordinates, handles, or pointers.
Associate with a subsystem only if the evidence is strong.
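Write patterns can suggest a neutral descriptor before any semantic name is chosen. The following is a minimal Python 3.9+ sketch. It assumes every write to the global has been reduced by hand to one of three hypothetical kinds: `"const"` (an immediate value stored), `"inc"`/`"dec"` (stepped in place), or `"other"`. The labels follow the document's `_like` style and are hints only. A single write, for example, is weak evidence.

```python
def describe_write_pattern(writes):
    """writes: list of (kind, value) pairs; value is only used for "const"."""
    if not writes:
        return "unclassified"
    kinds = {kind for kind, _ in writes}
    consts = {value for kind, value in writes if kind == "const"}
    if "other" in kinds:
        # Computed or copied values: access pattern alone says too little.
        return "unclassified"
    if kinds & {"inc", "dec"}:
        # Counters are typically reset to zero and stepped in place.
        return "counter_like" if consts <= {0} else "unclassified"
    if consts <= {0, 1}:
        return "flag_like"
    # Several distinct constants stored: consistent with a mode or state value.
    return "enum_or_mode_like"
```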

MCP / Ghidra Usage Expectations

When operating through Ghidra MCP:

begin from the entry point unless continuing an already-confirmed analysis thread
inspect decompiler output, disassembly, xrefs, and data definitions together
follow string references systematically
inspect interrupt usage and surrounding setup/register state
examine tables and indirect call/jump targets
improve type information when supported by evidence
rename incrementally and conservatively
update ARCHITECTURE.md only after confirmation

Do not mass-rename symbols based on pattern matching alone.

What to Avoid

Do not:

invent subsystem names without proof
rename based on vague resemblance
treat every int 21h call as generic file loading
treat every memory copy as rendering
assume every byte stream is a script
collapse unrelated helpers into a subsystem prematurely
document tentative conclusions in ARCHITECTURE.md
overwrite neutral names with stronger semantic names unless the new evidence truly supports it

Output Expectations

During analysis, produce:

Conservative symbol renames with high confidence
Confirmed subsystem notes appended to ARCHITECTURE.md
Clear explanation of evidence for each non-trivial rename
Explicit acknowledgment of uncertainty where confidence is not high enough

For every important rename, record the rationale in working notes or commit messages, citing evidence such as:

string references
interrupt semantics
caller/callee context
buffer usage
structure field evidence

Operating Principle

Recover the program one confirmed fact at a time.

Start from the entry point.
Use strings and xrefs aggressively.
Use DOS/BIOS interrupts as behavioral clues.
Track data flow carefully.
Rename only with high confidence.
Record only confirmed architecture.
Record the last confirmed action completed and the next suggested action in TRACKER.md.
Record a high-level progress indicator in TRACKER.md that summarizes how many functions exist in Ghidra in total and how many are still not renamed (e.g. still named FUN_*).
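The progress indicator is a simple count over the function list. The following is a minimal Python 3.9+ sketch. It assumes the list of function names has been exported from Ghidra by whatever means is available, and that default names take the form `FUN_` plus the address (for segmented code, typically `FUN_1000_0010`); that format is an assumption. Other default-style names, such as thunks, are not counted by this pattern.

```python
import re

# Assumed shape of Ghidra's default function names: FUN_ plus the address.
DEFAULT_FUNCTION_NAME = re.compile(r"FUN_[0-9A-Fa-f_]+")


def rename_progress(function_names):
    """Return (total, still_default, summary_line) for TRACKER.md."""
    total = len(function_names)
    still_default = sum(
        1 for name in function_names if DEFAULT_FUNCTION_NAME.fullmatch(name)
    )
    renamed = total - still_default
    percent = 100.0 * renamed / total if total else 0.0
    summary = (f"Functions: {total} total, {still_default} still FUN_*, "
               f"{renamed} renamed ({percent:.1f}%)")
    return total, still_default, summary
```

For example, a list holding `FUN_1000_0010`, `FUN_1000_0200`, `game_entry` and `main_loop` yields the line `Functions: 4 total, 2 still FUN_*, 2 renamed (50.0%)`.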

Never guess.
