From Black Box to Blueprint: Thoughtworks Uses Generative AI to Extract Legacy System Functionality

Thoughtworks consultants successfully harnessed generative AI to decode legacy systems lacking source code. Using Gemini 2.5 Pro, they accelerated reverse engineering, creating validated “blueprints” of functionality in just two weeks. The pilot showcased AI’s potential to drastically reduce time and risk in modernizing opaque systems while balancing speed with validation.
Thoughtworks consultants recently described an experiment that applied generative AI to a legacy system with no available source code.
The article, shared on Martin Fowler’s blog, highlighted a pilot where a five-person team analyzed the system’s database, UI, and binaries in parallel.
InfoQ reached out to the authors, Thiyagu Palanisamy and Chandirasekar Thiagarajan, who explained that during the two-week pilot the team used Gemini 2.5 Pro to analyse a thin slice of what was an enormous legacy system. The output of that analysis was a functional specification: a “blueprint” of the black-box system that domain experts were able to validate.
AI proved most effective in decoding code, summarizing binaries, and mapping database changes, while also easing schema discovery.
AI made a significant difference in reverse engineering the ASM code. Traditional approaches would have taken months to decode the logic specified in ASM and to distinguish system functions from business functionality.
The exercise demonstrated how AI can accelerate reverse engineering, providing insights into legacy systems at a pace difficult to achieve through manual methods alone.
Enterprises often rely on critical systems that have become opaque after many years of use. Documentation is incomplete, source code may be missing, and institutional knowledge erodes over time.
The article frames this as the “black box” problem: the system works, but its internal rules are hidden. The goal is not to regenerate code but to reconstruct a “blueprint” of functional intent that can inform modernization with lower risk.
The pilot combined several techniques. One strand focused on connecting dots across data sources by correlating what could be observed in the UI, database schema, and runtime behavior. Another applied change data capture to trace how specific user actions triggered mutations in the database.
Change Data Capture Methodology (source: martinfowler.com)
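The idea of tracing a user action to the database mutations it triggers can be illustrated with a small sketch. The article does not describe the team's tooling, so what follows is an assumption-laden stand-in: it uses a simple before/after snapshot diff on an in-memory SQLite database rather than a real change data capture pipeline, and the `orders` and `audit` tables and the `approve_order` action are hypothetical.

```python
import sqlite3


def snapshot(conn, tables):
    """Return {table: {key: row}} for each listed table.

    Assumes (for illustration only) that the first column is the primary key.
    """
    state = {}
    for table in tables:
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
        state[table] = {row[0]: row for row in rows}
    return state


def diff(before, after):
    """List (table, operation, key) tuples describing what changed."""
    changes = []
    for table in after:
        old, new = before.get(table, {}), after[table]
        for key in new.keys() - old.keys():
            changes.append((table, "INSERT", key))
        for key in old.keys() - new.keys():
            changes.append((table, "DELETE", key))
        for key in old.keys() & new.keys():
            if old[key] != new[key]:
                changes.append((table, "UPDATE", key))
    return sorted(changes)


def trace_action(conn, tables, action):
    """Run one simulated user action and report the mutations it caused."""
    before = snapshot(conn, tables)
    action(conn)
    return diff(before, snapshot(conn, tables))


def demo():
    # Hypothetical schema standing in for the legacy database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE audit (id INTEGER PRIMARY KEY, note TEXT)")
    conn.execute("INSERT INTO orders VALUES (1, 'NEW')")

    def approve_order(c):
        # Stands in for clicking "Approve" in the legacy UI.
        c.execute("UPDATE orders SET status = 'APPROVED' WHERE id = 1")
        c.execute("INSERT INTO audit VALUES (1, 'order 1 approved')")

    return trace_action(conn, ["orders", "audit"], approve_order)


if __name__ == "__main__":
    for change in demo():
        print(change)
```

Repeating this for one UI action at a time yields a map from actions to the tables they touch, which is the kind of evidence a domain expert can then confirm or reject.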
From there, the team attempted server logic inference by linking database activity with binary calls. This extended into what they describe as AI-assisted binary archaeology, where decompilation tools and large language models helped summarize functions and propose candidate responsibilities.
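One plausible way to link database activity with binary calls is to correlate timestamps from a call trace with timestamps of observed mutations. The article does not say how the team did this, so the sketch below is only an illustration: the time-window heuristic, the `correlate` helper, the function names such as `sub_401000`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def correlate(calls, mutations, window=0.5):
    """Map each traced function to the tables mutated shortly after it ran.

    calls:     list of (timestamp_seconds, function_name)
    mutations: list of (timestamp_seconds, table_name)
    window:    how long after a call a mutation still counts as related
               (an arbitrary illustrative threshold)
    """
    links = defaultdict(set)
    for call_time, function in calls:
        for mutation_time, table in mutations:
            if 0 <= mutation_time - call_time <= window:
                links[function].add(table)
    return {function: sorted(tables) for function, tables in links.items()}


def demo():
    # Hypothetical trace of decompiled functions and captured table changes.
    calls = [(10.0, "sub_401000"), (12.0, "sub_402000")]
    mutations = [(10.1, "orders"), (10.2, "audit"), (12.3, "inventory")]
    return correlate(calls, mutations)


if __name__ == "__main__":
    for function, tables in demo().items():
        print(function, "->", tables)
```

Links produced this way are candidates rather than conclusions. Concurrent activity can create false matches, which is consistent with the pilot's emphasis on having domain experts validate the resulting blueprint.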
