This project uses LLM-assisted reverse engineering with Ghidra and Ghidra MCP to analyze a DOS game binary. The goal is to progressively recover meaningful program structure by tracing execution from the application entry point, identifying functions, variables, globals, data structures, and subsystem boundaries, and renaming only when there is high confidence.
This repository also maintains an ARCHITECTURE.md file that records confirmed subsystem discoveries and their relationships.
Accuracy matters more than speed. Never guess.
- Start analysis at the application entry point.
- Follow control flow outward to identify:
  - functions
  - global variables
  - local variables
  - data structures
  - tables
  - buffers
  - dispatch logic
  - subsystem boundaries
- Rename symbols only when their purpose is supported by strong evidence.
- Record confirmed subsystem discoveries in `ARCHITECTURE.md`.
- Use strings, string cross-references, DOS interrupts, calling patterns, and data flow as primary sources of evidence.
- Preserve uncertainty explicitly. If confidence is low, do not rename and do not document as fact.
Do not rename a function, variable, struct, enum, field, or table unless the available evidence supports the meaning with high confidence.
Avoid speculative names such as:
- `maybe_draw_sprite`
- `probably_load_file`
- `unknown_audio_thing`
If confidence is insufficient, leave the original name in place or apply only a strictly descriptive neutral name if justified by observable behavior, such as:
- `memcpy_like`
- `int21_file_io_wrapper`
- `table_of_far_ptrs`
- `state_dispatch_table`
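The distinction between speculative and neutral names can be checked mechanically before a rename is applied. The following is a minimal sketch, assuming a hypothetical helper `is_speculative_name` and an illustrative word list; neither is part of the project rules or of Ghidra.

```python
import re

# Hedging words that signal a guess rather than an observation.
# This word list is an illustrative assumption; extend it as needed.
SPECULATIVE_TOKENS = {"maybe", "probably", "possibly", "unknown", "thing", "stuff"}


def is_speculative_name(name: str) -> bool:
    """Return True if any underscore-separated token of `name` is a hedging word."""
    tokens = re.split(r"_+", name.lower())
    return any(token in SPECULATIVE_TOKENS for token in tokens)


if __name__ == "__main__":
    for candidate in ("maybe_draw_sprite", "memcpy_like", "state_dispatch_table"):
        verdict = "reject" if is_speculative_name(candidate) else "ok"
        print(f"{candidate}: {verdict}")
```

A check like this only catches hedging vocabulary. It cannot tell whether a confident-looking name is actually supported by evidence, so it complements the rules above rather than replacing them.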
Every rename and every ARCHITECTURE.md update must be grounded in evidence such as:
- string contents and string references
- DOS interrupt usage
- BIOS interrupt usage
- file access patterns
- video memory writes
- buffer shapes and access patterns
- call graph relationships
- repeated call-site behavior
- resource loading sequences
- script interpreter patterns
- structure layout and field usage
Begin at the program entry point and proceed in execution order as much as possible. Prefer understanding initialization, subsystem setup, and top-level dispatch before diving into leaf functions.
Rename conservatively. A smaller number of correct renames is better than many wrong ones.
Only confirmed facts belong in ARCHITECTURE.md.
Do not write:
- guesses
- possibilities
- loose speculation
- subsystem claims based on one weak clue
Start at the application entry point and identify:
- startup and initialization flow
- memory/model setup
- segment register initialization
- heap/buffer setup
- resource/bootstrap loading
- video/audio/input initialization
- main loop entry
- shutdown/cleanup path
Tasks:
- Trace the first layer of calls from the entry point.
- Identify initialization clusters by behavior.
- Mark wrappers around common DOS/BIOS interrupts.
- Identify central state objects, global flags, mode variables, and dispatch tables.
- Rename only high-confidence startup functions.
Examples of acceptable names if justified:
- `game_entry`
- `initialize_video`
- `initialize_audio`
- `initialize_input`
- `main_loop`
- `shutdown_and_exit`
Only use these names when the evidence is strong.
For each discovered function, variable, or data structure:
For functions, determine:
- what calls it
- what it calls
- what data it reads/writes
- whether it wraps a DOS/BIOS interrupt
- whether it processes strings, files, graphics, scripts, or resources
- whether it is a leaf helper or subsystem coordinator
Rename functions using:
- concrete behavior
- subsystem context
- observable side effects
Good examples:
- `open_resource_file`
- `read_resource_chunk`
- `draw_mouse_cursor`
- `decode_rle_scanline`
- `script_execute_opcode`
- `blit_backbuffer_to_vram`
Bad examples:
- `handle_game_stuff`
- `video_related`
- `sound_func`
- `do_script_maybe`
For variables and globals, determine:
- lifetime
- scope
- initialization site
- write/read locations
- relation to mode/state/subsystem operation
- whether it is a pointer, counter, flag, buffer, handle, or table
Prefer names like:
- `current_video_mode`
- `resource_file_handle`
- `mouse_x`
- `mouse_y`
- `active_script_pc`
- `palette_buffer`
Only if proven.
For data structures, look for:
- repeated field offsets
- arrays of records
- pointer tables
- object/state records
- decoded resource headers
- animation/script/resource metadata
Name structures only after enough field usage is understood.
Good examples:
- `ResourceHeader`
- `SpriteDescriptor`
- `ScriptContext`
- `CursorState`
If not enough is known, prefer temporary neutral names such as:
- `struct_XXXX_candidate`
- `resource_record_candidate`
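Repeated field offsets are easier to judge once they are tallied. The sketch below assumes the analyst has already collected `(offset, size)` pairs for accesses made relative to one base pointer; the function name `candidate_fields` and the `min_count` threshold are hypothetical choices for illustration.

```python
from collections import Counter


def candidate_fields(accesses, min_count=2):
    """Tally (offset, size) accesses seen relative to a single base pointer.

    Returns a sorted list of (offset, size, count) tuples for accesses that
    were observed at least `min_count` times. The threshold is an assumption:
    a single access is treated as too weak to suggest a field.
    """
    counts = Counter(accesses)
    return sorted(
        (offset, size, n)
        for (offset, size), n in counts.items()
        if n >= min_count
    )


if __name__ == "__main__":
    observed = [(0, 2), (4, 4), (0, 2), (4, 4), (4, 4), (2, 2)]
    for offset, size, n in candidate_fields(observed):
        print(f"+0x{offset:02X} size={size} seen {n}x")
```

The result is only a candidate layout. It says where fields probably sit, not what they mean, so any structure built from it should keep a neutral name until field usage is understood.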
The strings table is a major source of context and must be used aggressively.
For each meaningful string:
- Identify cross-references to the string.
- Determine whether it is used for:
  - error reporting
  - debug/logging
  - file/resource names
  - script commands
  - UI text
  - copy protection
  - device/system checks
  - command dispatch
- Follow the referencing function outward and inward in the call graph.
- Use clustered strings to infer subsystem boundaries.
Examples:
- File extensions or resource names may indicate resource loading or archive management.
- UI/status messages may reveal menu, inventory, cursor, or script systems.
- Error strings may expose file handling, memory allocation, decompression, or driver init paths.
Do not infer more than the string supports.
A string saying AdLib may suggest audio relevance. It does not by itself prove the exact role of the entire function.
Interrupt usage is a strong clue and must be incorporated into analysis.
Pay particular attention to:
- `int 21h` for file management, memory allocation, program termination, device I/O, directory access, etc.
- FCB- or handle-based file operations
- load/execute behaviors
- DTA manipulation
- PSP/environment interactions
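When reading an `int 21h` call site, the function number loaded into `AH` determines which DOS service is requested. The table below is a small, non-exhaustive sketch covering commonly seen functions; the helper name `describe_int21` is hypothetical, and the value of `AH` must still be recovered from the surrounding register setup.

```python
# Partial map of INT 21h function numbers (value in AH) to DOS services.
# Deliberately incomplete: it covers only services relevant to the list above.
INT21_FUNCTIONS = {
    0x0F: "open file (FCB)",
    0x1A: "set DTA address",
    0x25: "set interrupt vector",
    0x2F: "get DTA address",
    0x35: "get interrupt vector",
    0x3C: "create file (handle)",
    0x3D: "open file (handle)",
    0x3E: "close file (handle)",
    0x3F: "read from file or device (handle)",
    0x40: "write to file or device (handle)",
    0x42: "move file pointer (seek)",
    0x48: "allocate memory",
    0x49: "free memory",
    0x4A: "resize memory block",
    0x4B: "load and/or execute program",
    0x4C: "terminate with return code",
    0x4E: "find first matching file",
    0x4F: "find next matching file",
    0x62: "get PSP address",
}


def describe_int21(ah: int) -> str:
    """Return a short description of the INT 21h service selected by AH."""
    ah &= 0xFF
    return INT21_FUNCTIONS.get(ah, f"not in table (AH={ah:02X}h)")


if __name__ == "__main__":
    for ah in (0x3D, 0x42, 0x3F, 0x99):
        print(f"AH={ah:02X}h -> {describe_int21(ah)}")
```

Knowing the service narrows what a wrapper does mechanically. It does not identify which subsystem owns the call.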
Pay particular attention to:
- `int 10h` for video mode changes, cursor, text output, palette/video services
- `int 13h` for disk access
- `int 16h` for keyboard input
- `int 1Ah` for timer/time services
- `int 33h` for mouse services, if present via driver interrupt interface
Also look for:
- direct writes to VGA memory
- palette register I/O
- PIT/PC speaker programming
- AdLib/Sound Blaster port I/O
- keyboard controller access
- DMA-related setup
- timer hooks or interrupt vector manipulation
Use these clues to classify behavior, but only rename once supported by surrounding code and data flow.
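Port I/O can be classified the same way as interrupts. The following sketch maps a few well-known PC I/O port ranges to the hardware behind them; the helper name `classify_port` is hypothetical and the table is intentionally small. Sound Blaster ports are omitted because the base address is configurable per machine.

```python
# (first_port, last_port, description) for a few standard PC I/O ranges.
# This is a partial table for illustration, not a full port map.
PORT_RANGES = [
    (0x000, 0x00F, "DMA controller 1 (8237)"),
    (0x020, 0x021, "interrupt controller 1 (8259 PIC)"),
    (0x040, 0x043, "PIT (8253/8254 timer)"),
    (0x060, 0x060, "keyboard controller data"),
    (0x061, 0x061, "system control port B (PC speaker gate)"),
    (0x064, 0x064, "keyboard controller status/command"),
    (0x388, 0x389, "AdLib (OPL2) address/data"),
    (0x3C7, 0x3C9, "VGA DAC (palette) registers"),
    (0x3DA, 0x3DA, "VGA input status 1 (retrace)"),
]


def classify_port(port: int):
    """Return a description for `port`, or None if it is not in the table."""
    for first, last, description in PORT_RANGES:
        if first <= port <= last:
            return description
    return None


if __name__ == "__main__":
    for port in (0x3C8, 0x43, 0x388, 0x1234):
        print(f"0x{port:03X}: {classify_port(port) or 'unclassified'}")
```

An unclassified port is not evidence of anything. Treat a `None` result as "look closer", not as a reason to pick a name.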
Example:
- A function invoking `int 21h` alone is not necessarily `load_file`.
- A function opening a named asset, seeking, reading into a buffer, and returning a handle-sized or byte-count result may justify `open_resource_file` or `read_resource_data`.
As subsystem boundaries become clear, record them in ARCHITECTURE.md.
Candidate subsystems include:
- video
- graphics rendering
- sprite or animation handling
- palette management
- cursor management
- keyboard/mouse input
- audio/music/sfx
- script engine
- text/dialogue
- decoders/decompression
- resource/archive management
- save/load
- memory management
- scene/state management
- There is a clear cluster of related functions with consistent behavior.
- There are clear shared globals/structures that define subsystem state.
- Strings or resources strongly tie the functions together.
- Interrupt/hardware usage and data flow clearly indicate a distinct responsibility.
- subsystem name
- confidence level: High
- why it is considered confirmed
- key functions
- key globals/structures
- notable strings
- relevant interrupts or hardware clues
- known relationships to other subsystems
Do not add low-confidence or speculative subsystems.
ARCHITECTURE.md is a record of confirmed understanding, not a scratchpad.
Only add content when:
- the subsystem or relationship is supported by multiple strong clues
- names used are stable and justified
- the finding would still make sense to another analyst reviewing the evidence later
Each entry should be concise and factual.
Recommended format:
## Video Subsystem
**Confidence:** High
**Evidence**
- Functions at `FUN_xxxx`, `FUN_yyyy`, and `FUN_zzzz` change video mode via `int 10h`
- Shared global buffer used as backbuffer before copy to VRAM
- Palette update routine writes through VGA-related I/O sequence
- Strings referencing mode/setup failure are used by the initialization path
**Key Functions**
- `initialize_video`
- `set_video_mode`
- `blit_backbuffer_to_vram`
- `update_palette`
**Key Data**
- `video_state`
- `backbuffer`
- `palette_buffer`
**Notes**
- Video initializes before the main loop
- Rendering appears to be separated from resource decoding
Do not include unresolved claims.
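Entries in the recommended format can be generated from structured notes so that every record has the same shape. This is a minimal sketch, assuming a hypothetical helper `render_entry`; the refusal to render with fewer than two evidence items is one possible reading of "multiple strong clues", not a rule stated by the project.

```python
def render_entry(name, evidence, key_functions, key_data, notes=()):
    """Render one ARCHITECTURE.md entry in the recommended format.

    Raises ValueError when fewer than two evidence items are supplied, as a
    guard against recording a subsystem on the strength of a single clue.
    """
    if len(evidence) < 2:
        raise ValueError("need multiple strong clues before recording a subsystem")
    lines = [f"## {name}", "**Confidence:** High", "**Evidence**"]
    lines += [f"- {item}" for item in evidence]
    lines.append("**Key Functions**")
    lines += [f"- `{func}`" for func in key_functions]
    lines.append("**Key Data**")
    lines += [f"- `{datum}`" for datum in key_data]
    if notes:
        lines.append("**Notes**")
        lines += [f"- {note}" for note in notes]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_entry(
        "Video Subsystem",
        ["Mode set via int 10h", "Shared backbuffer copied to VRAM"],
        ["initialize_video", "blit_backbuffer_to_vram"],
        ["video_state", "backbuffer"],
        ["Video initializes before the main loop"],
    ))
```

The confidence line is hard-coded to High because only confirmed subsystems are recorded; anything less stays out of the file.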
A rename is allowed only when the name is supported by multiple converging signals.
High-confidence signals include combinations of:
- clear interrupt semantics
- clear string references
- clear file/resource names
- repeated consistent call-site usage
- obvious buffer or structure behavior
- direct hardware interaction
- strong structural relationships in the call graph
Rename only if at least two or more strong signals converge, or one signal is exceptionally definitive.
Examples of sufficiently strong evidence:
- function opens a named asset file, uses DOS file interrupts, reads into a destination buffer, and is called by resource init code
- function writes to video memory or uses video BIOS services and is called by rendering flow
- function dispatches on bytecodes read from a script stream and updates script context fields
- function uses mouse interrupt services and updates cursor coordinates/state
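The convergence rule above can be written as a small gate that runs before any rename. The sketch below is hypothetical: the `Signal` type, its field names, and the choice to count distinct signal kinds (so that two string references alone do not count as convergence) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    kind: str                 # e.g. "string_xref", "interrupt", "call_graph"
    strong: bool              # analyst's judgment that this clue is strong
    definitive: bool = False  # True only for an exceptionally definitive clue


def rename_allowed(signals) -> bool:
    """Allow a rename if two or more distinct kinds of strong signal converge,
    or if a single strong signal is marked as exceptionally definitive."""
    strong = [s for s in signals if s.strong]
    if any(s.definitive for s in strong):
        return True
    # Assumption: convergence means different kinds of evidence agree.
    return len({s.kind for s in strong}) >= 2


if __name__ == "__main__":
    clues = [
        Signal("interrupt", strong=True),
        Signal("string_xref", strong=True),
        Signal("call_graph", strong=False),
    ]
    print("rename allowed:", rename_allowed(clues))
```

Whether a signal is strong or definitive remains an analyst judgment. The gate only enforces that the judgment is made explicitly and recorded.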
When confidence is insufficient, do one of the following:
- leave the original name unchanged
- apply a narrowly descriptive placeholder based on directly observable mechanics only
Examples:
- `reads_buffer_with_length_prefix`
- `far_ptr_dispatcher`
- `int10_video_service_wrapper`
- `copies_words_to_segment`
Avoid semantic overreach.
Use clear, consistent, descriptive names.
For functions, use verb-oriented names:
- `initialize_video`
- `load_palette`
- `decode_sprite_frame`
- `execute_script_command`
- `poll_keyboard_input`
For variables and globals, use noun-oriented names:
- `current_room_id`
- `resource_index`
- `cursor_visible`
- `audio_driver_type`
For structures and types, use PascalCase:
- `VideoState`
- `ScriptContext`
- `ResourceEntry`
For constants and enum values, use uppercase when appropriate:
- `VIDEO_MODE_13H`
- `RESOURCE_TYPE_SPRITE`
When forced to use an interim name, keep it descriptive and non-speculative:
- `bytecode_stream_ptr`
- `video_buffer_candidate`
- `file_io_ctx_candidate`
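The casing conventions can be verified with a few regular expressions before a name is committed. This is a minimal sketch; the helper name `convention_of` and the exact patterns are assumptions, and a project may reasonably allow forms these patterns reject.

```python
import re

# Patterns for the three casing styles used in this document.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
PASCAL_CASE = re.compile(r"^(?:[A-Z][a-z0-9]+)+$")
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")


def convention_of(name: str):
    """Return the casing convention `name` follows, or None if it fits none."""
    if SNAKE_CASE.match(name):
        return "snake_case"
    if PASCAL_CASE.match(name):
        return "PascalCase"
    if UPPER_SNAKE.match(name):
        return "UPPER_SNAKE"
    return None


if __name__ == "__main__":
    for name in ("initialize_video", "ScriptContext", "VIDEO_MODE_13H", "Mixed_case"):
        print(f"{name}: {convention_of(name)}")
```

Casing is checked independently of meaning: a well-formed name can still be unsupported by evidence.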
When analyzing a function, prefer this order:
- Identify callers.
- Identify callees.
- Inspect strings referenced directly or indirectly.
- Inspect interrupts and I/O operations.
- Track major buffers and globals touched.
- Look for repeated structural patterns.
- Determine whether the function belongs to an already-known subsystem.
- Decide whether rename confidence is high enough.
When analyzing a global or structure:
- Find all writes.
- Find all reads.
- Determine initialization.
- Determine whether access patterns imply flags, counters, coordinates, handles, or pointers.
- Associate with a subsystem only if the evidence is strong.
When operating through Ghidra MCP:
- begin from the entry point unless continuing an already-confirmed analysis thread
- inspect decompiler output, disassembly, xrefs, and data definitions together
- follow string references systematically
- inspect interrupt usage and surrounding setup/register state
- examine tables and indirect call/jump targets
- improve type information when supported by evidence
- rename incrementally and conservatively
- update `ARCHITECTURE.md` only after confirmation
Do not mass-rename symbols based on pattern matching alone.
Do not:
- invent subsystem names without proof
- rename based on vague resemblance
- treat every `int 21h` call as generic file loading
- treat every memory copy as rendering
- assume every byte stream is a script
- collapse unrelated helpers into a subsystem prematurely
- document tentative conclusions in `ARCHITECTURE.md`
- overwrite neutral names with stronger semantic names unless the new evidence truly supports it
During analysis, produce:
- Conservative symbol renames with high confidence
- Confirmed subsystem notes appended to `ARCHITECTURE.md`
- Clear explanation of evidence for each non-trivial rename
- Explicit acknowledgment of uncertainty where confidence is not high enough
For every important rename, record the rationale in working notes or commit messages, citing evidence such as:
- string references
- interrupt semantics
- caller/callee context
- buffer usage
- structure field evidence
Recover the program one confirmed fact at a time.
- Start from the entry point.
- Use strings and xrefs aggressively.
- Use DOS/BIOS interrupts as behavioral clues.
- Track data flow carefully.
- Rename only with high confidence.
- Record only confirmed architecture.
- Record the last confirmed action completed and the next suggested action in TRACKER.md.
- Record a high-level progress indicator in TRACKER.md that summarizes how many total functions are in Ghidra and how many are still not renamed (e.g., `FUN_*`).
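The progress indicator can be computed from a plain list of function names exported from Ghidra. The sketch below assumes such a list is already available; the helper names and the exact wording of the TRACKER.md line are hypothetical.

```python
def rename_progress(function_names):
    """Count total functions and those still carrying a default FUN_ name."""
    names = list(function_names)
    total = len(names)
    unnamed = sum(1 for name in names if name.startswith("FUN_"))
    renamed = total - unnamed
    percent = (100.0 * renamed / total) if total else 0.0
    return {"total": total, "unnamed": unnamed, "renamed": renamed, "percent": percent}


def format_progress_line(stats) -> str:
    """Format the statistics as a single line suitable for TRACKER.md."""
    return (
        f"Functions: {stats['total']} total, "
        f"{stats['unnamed']} still FUN_*, "
        f"{stats['renamed']} renamed ({stats['percent']:.1f}%)"
    )


if __name__ == "__main__":
    sample = ["entry", "FUN_1000_0010", "initialize_video", "FUN_1000_0200"]
    print(format_progress_line(rename_progress(sample)))
```

A rising percentage is not a goal in itself. A low number backed by evidence is preferable to a high one built on guesses.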
Never guess.