On reverse engineering using AI
Leveraging reverse Ralph loops to understand legacy codebases
There’s a technique doing the rounds in agentic engineering circles called the Ralph loop, coined by Geoffrey Huntley. In its forward direction, it’s an autonomous coding pattern: give an AI agent a set of specs, a goal, and a loop — and let it build. One task per iteration. Fresh context each time. No accumulating noise. No compaction.
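The forward loop is almost trivial to sketch. Here is a minimal Python version - everything in it is a stand-in: `run_agent` represents spawning your agent CLI as a fresh process, and `TODO.md` is an assumed task file, not a prescribed name.

```python
from pathlib import Path

def ralph_forward(todo_file: Path, run_agent) -> int:
    """One task per iteration, fresh context each time.

    `run_agent` is a stand-in for invoking an agent CLI as a new
    process (so each task gets a clean context window).
    """
    iterations = 0
    while True:
        tasks = [t for t in todo_file.read_text().splitlines() if t.strip()]
        if not tasks:
            break  # nothing left to build
        run_agent(tasks[0])  # fresh process, no accumulated conversation
        # Mark the task done by removing it from the list.
        todo_file.write_text("\n".join(tasks[1:]) + "\n")
        iterations += 1
    return iterations
```

The loop itself carries no memory; all state lives in the task file, which is what makes it restartable.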
But Ralph runs in both directions. And the reverse direction is where things get interesting for anyone facing a legacy codebase migration.
This article is an edited summary of a Claude conversation on this topic - I wanted to share what I’d learnt in a hopefully digestible form.
What is a Reverse Ralph?
A forward Ralph loop takes specifications and produces code. A reverse Ralph does the opposite: it takes existing code and produces specifications. The goal isn’t to understand every line — it’s to extract enough structured knowledge that a migration or decomposition effort can begin from a solid foundation rather than archaeology.
Huntley himself described the approach for porting codebases: run a loop over your test files to extract behavioural specs, run a second loop over your source files to document what each module does, then use those specs as the input to whatever comes next. The source code becomes read-only. The specs become the new source of truth. Huntley has a fun example of taking machine code, deriving specs and then producing the application for the ZX80.
Why Loop at All?
The honest answer is that a single long AI conversation would use fewer tokens overall, and might even finish faster, since you don’t pay a startup cost for each iteration. But total tokens isn’t the right metric.
In a single long context session, every new file you process gets added to an ever-growing conversation. By the time you’re looking at file 30, the model is carrying the full weight of files 1 through 29 — most of which are irrelevant to the current task. Quality degrades. The model loses the thread. Eventually the provider compresses older context and detail is silently lost. This is what’s known as context rot.
The Ralph loop sidesteps this entirely. Each iteration is a fresh process with a clean context window. Prior iterations don’t accumulate in the conversation - their output is externalised to the filesystem as spec files, in a directory structure that mirrors the source - which also makes re-runs idempotent. The model always starts from a known state. The cost is a small, predictable overhead per loop. The benefit is consistent, auditable output across every iteration, and cheap recovery when something goes wrong.
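The externalised state is what makes the loop resumable. A sketch of the bookkeeping, assuming Python source and markdown specs: because the spec tree mirrors the source tree, "has this file been processed?" is just a filesystem check, so a crashed run resumes where it left off.

```python
from pathlib import Path

def next_unprocessed(repo: Path, specs_root: Path, pattern: str = "*.py"):
    """Yield (source file, target spec path) pairs with no spec yet.

    The spec tree mirrors the source tree, so progress is recorded
    implicitly on disk - no checkpoint file, no shared context needed.
    """
    for src in sorted(repo.rglob(pattern)):
        spec = specs_root / src.relative_to(repo).with_suffix(".md")
        if not spec.exists():
            yield src, spec
```

Each loop iteration pulls one pair from this generator, runs a fresh agent on the source file, and writes the spec.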
For large codebases, there’s a further advantage: loops can be parallelised across independent directories in a way that a single long session never can.
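Because each directory’s loop writes only under its own mirrored spec subtree, fanning out is straightforward. A sketch, with `run_loop` standing in for one full reverse-Ralph loop over a directory (a thread pool suffices here since the real work happens in separate agent processes):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parallel_loops(repo: Path, specs_root: Path, run_loop, workers: int = 4):
    """Run one reverse-Ralph loop per top-level source directory.

    Safe to parallelise: loops share nothing but the filesystem, and
    each writes only under its own mirrored spec subtree.
    """
    dirs = [d for d in sorted(repo.iterdir()) if d.is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_loop, d, specs_root / d.name) for d in dirs]
        return [f.result() for f in futures]
```

The same trick does not work for the later domain phases, which deliberately read across the whole spec tree.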
Adding Domain Modelling
A spec-per-file approach captures what each module does. But it doesn’t tell you how the business concepts hang together — which entities matter, how they relate, where the natural seams are for decomposition.
Domain modelling is cross-cutting by nature. A concept like Order might live across a controller, a service layer, a database model, a validator, and an event handler. No single file contains the full picture, so it can’t emerge from a file-by-file loop alone.
The solution is a secondary stage that reads specs rather than source. By the time you’ve extracted specs from every file, you have a compressed, normalised view of the codebase that’s far easier to synthesise across. An entity like Order will appear in dozens of spec files — and a second pass can aggregate those references into a coherent domain picture without ever touching the original source again.
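The mechanical part of that second pass doesn’t even need an agent. A sketch of the aggregation step, assuming markdown specs and plain substring matching - a real pass would hand the collected mentions to the model to judge ambiguous ones:

```python
from pathlib import Path

def aggregate_entity(entity: str, specs_dir: Path) -> dict[str, list[str]]:
    """Second pass reads specs, never source.

    Returns every spec file that mentions the entity, with the matching
    lines, ready to be folded into that entity's domain spec file.
    """
    hits: dict[str, list[str]] = {}
    for spec in sorted(specs_dir.rglob("*.md")):
        lines = [l.strip() for l in spec.read_text().splitlines() if entity in l]
        if lines:
            hits[str(spec.relative_to(specs_dir))] = lines
    return hits
```

Because the specs are short and normalised, this cross-cut is cheap in a way that grepping raw source never is.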
The Plan — In Short
The full process runs in six phases, each producing output that feeds the next. Phases 1 through 5 are strictly read-only on source files. Humans review at two points before forward work begins.
Phase 1 — Test specs: Loop over every test file, extract one behaviour spec per iteration into specs/tests/. Capture acceptance criteria, domain nouns referenced, and edge cases. Each iteration uses a fresh context.
Phase 2 — Source specs: Loop over every source file, extract one module spec per iteration into specs/src/. Capture purpose, public interface, data in/out, dependencies, and domain events. This enriched template is what makes the domain modelling phases possible. Each iteration uses a fresh context.
Phase 3a — Entity capture: Loop over the specs (not the source), identify domain entities, and document each one in specs/domain/entities/. This uses a shared context window.
Phase 3b — Entity extraction: Loop over the files in specs/domain/entities/, search across all the spec files for instances of each entity, and capture them in that entity’s spec file. Each iteration uses a fresh context.
Phase 4 — Relationship mapping: Single invocation, shared context. Read all entity specs, produce a relationship map documenting how entities connect and where the ownership boundaries are.
Phase 5 — Bounded context proposal: Single invocation, shared context. Read the relationship map, propose a decomposition into bounded contexts with rationale.
Phase 6 — Migration TODO: Single invocation, shared context. Read everything, produce a prioritised task list for forward work.
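The phase structure can be captured as plain data for an orchestration script to walk. The output paths for Phases 4 to 6 below are my assumptions, not prescribed names; the point is the shape: fresh versus shared context, and the fact that nothing after Phase 2 reads source.

```python
# Each phase: (name, fresh_context_per_iteration, reads_from, writes_to)
PHASES = [
    ("1 test specs",         True,  "tests/",                "specs/tests/"),
    ("2 source specs",       True,  "src/",                  "specs/src/"),
    ("3a entity capture",    False, "specs/",                "specs/domain/entities/"),
    ("3b entity extraction", True,  "specs/",                "specs/domain/entities/"),
    ("4 relationship map",   False, "specs/domain/entities/", "specs/domain/relationships.md"),
    ("5 bounded contexts",   False, "specs/domain/",         "specs/domain/contexts.md"),
    ("6 migration todo",     False, "specs/",                "TODO.md"),
]

# From Phase 3a onward, every phase reads specs only - never source.
assert all(reads.startswith("specs/") for _, _, reads, _ in PHASES[2:])
```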
Each phase script enforces read-only guardrails at both the prompt level and the shell level. If the agent touches source files it shouldn’t, the script detects it and reverts automatically.
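In practice the shell-level guard is usually a `git status` check followed by a `git checkout` to revert, but the idea can be shown self-contained: snapshot source content before the agent runs, then restore anything that changed. This is an illustrative sketch, not the scripts the article describes.

```python
from pathlib import Path

def snapshot(src: Path) -> dict[Path, bytes]:
    """Record every source file's content before the agent runs."""
    return {p: p.read_bytes() for p in sorted(src.rglob("*")) if p.is_file()}

def enforce_read_only(src: Path, before: dict[Path, bytes]) -> list[Path]:
    """After the agent runs: revert any source file it touched."""
    reverted = []
    after = snapshot(src)
    for path, content in before.items():
        if after.get(path) != content:
            path.write_bytes(content)  # restore original bytes
            reverted.append(path)
    for path in set(after) - set(before):
        path.unlink()  # delete files the agent created under source
        reverted.append(path)
    return reverted
```

Running the check after every iteration, rather than at the end, keeps the blast radius of a misbehaving iteration to a single loop.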
Two Things Worth Remembering
The specs are the new source of truth. After Phase 2, the agent never needs to read the original codebase again. Every downstream phase works from specs. This means the quality of your spec template in Phase 2 directly determines the quality of everything that follows.
The bounded context decisions need a human. The agent will propose a decomposition based on technical coupling signals. It has no knowledge of your team structure, your deployment constraints, your political landscape, or which parts of the system your organisation actually cares about. Phase 5 output is a starting draft for a conversation — not an answer.
