nvisia AI Lab · Prototype
Sherlock — Legacy Code Intelligence
When the code has been running for 30 years and the person who wrote it is gone — how do you figure out what it does?
Built by Ruben Rotteveel
An AI agent system that reads legacy code — COBOL, Java, C#, batch files — and reverse-engineers what it does.
No documentation needed. No original developer needed. Sherlock figures it out.
COBOL
Decades-old mainframe logic
Java
Enterprise executables
C#
Modern legacy systems
Batch
Scheduled jobs & scripts
The Institutional Knowledge Crisis
Somewhere in your organization, there's a system that's been running for 20 or 30 years. The developers who built it are gone. The documentation either never existed or stopped being accurate a decade ago. And yet — it runs. Every day. Processing your most critical data.
When you need to modernize a system like that, you face a fundamental question: how do you rebuild something you don't understand? That's what Sherlock is built to answer.
"A scheduled job started in 2016 and it's still running regularly. For 10 years, no one's touched it and it just keeps doing the same thing. Some of these have been running for 30 years. The guys that wrote it are gone — and no one is there to say, 'here's what's happening.'"
— Ruben Rotteveel
Reverse-Engineering Requirements from Code
Sherlock takes legacy code, whatever the language or era, and works backward. It reads the files, traces the relationships, and extracts what the system is actually doing. From that, it produces three things:
Architecture Document
How does this system fit together? Sherlock maps the execution model — batch files, Java executables, database-driven configuration, stored procedures — and describes the layers of the system in plain language.
Knowledge Graph
Every program, every code file, every extracted requirement — connected. A navigable map of the system where you can click any node and see everything it touches: what calls it, what it calls, what requirements flow from it.
Requirements Documents
The end output: Word documents containing validated, structured requirements — ready to hand to a development team to rebuild the system in a modern language and framework.
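The knowledge graph described above can be sketched as a simple adjacency structure. This is a hypothetical illustration only, not Sherlock's actual implementation; the `KnowledgeGraph` class, its method names, and the file names are invented here to show the idea of clicking a node and seeing what calls it, what it calls, and which requirements flow from it:

```python
from collections import defaultdict

# Hypothetical sketch: nodes are programs, code files, or requirements;
# edges record "calls" relationships and extracted requirements.
class KnowledgeGraph:
    def __init__(self):
        self.calls = defaultdict(set)         # caller -> callees
        self.called_by = defaultdict(set)     # callee -> callers
        self.requirements = defaultdict(set)  # program -> requirements

    def add_call(self, caller, callee):
        self.calls[caller].add(callee)
        self.called_by[callee].add(caller)

    def add_requirement(self, program, requirement):
        self.requirements[program].add(requirement)

    def neighbors(self, node):
        """Everything a node touches: callees, callers, and requirements."""
        return {
            "calls": sorted(self.calls[node]),
            "called_by": sorted(self.called_by[node]),
            "requirements": sorted(self.requirements[node]),
        }

g = KnowledgeGraph()
g.add_call("NIGHTLY.BAT", "PAYROLL.CBL")
g.add_call("PAYROLL.CBL", "TAXCALC.CBL")
g.add_requirement("PAYROLL.CBL", "REQ-012: compute gross pay per pay period")
print(g.neighbors("PAYROLL.CBL"))
```

Selecting one node then yields its full context in a single lookup, which is what makes the graph navigable rather than just a list of files.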
Agents That Review Each Other
Sherlock doesn't just run one agent and hope for the best. It uses a multi-agent review loop — one of the most important design decisions in the system.
The core challenge: AI agents are trained to make the user happy. Left unchecked, an agent will fill in gaps rather than surface them, claiming success even when the analysis is incomplete.
"The reviewer will go really deep in their analysis to find the flaws in the original model's work. You can kind of pit them against each other — and they do a pretty good job."
— Ruben Rotteveel
Ruben is actively tuning specialized reviewers for different dimensions: architecture quality, coding practices, knowledge graph structure, and requirements completeness.
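The shape of a generate-review loop like the one described above can be sketched as follows. This is a minimal stand-in, not Sherlock's actual prompts or agent framework: `analyst` and `reviewer` are stub functions invented for illustration, where a real system would call LLM agents with specialized instructions:

```python
# Hypothetical sketch of a multi-agent review loop: an "analyst" drafts
# an artifact, a "reviewer" critiques it, and the analyst revises until
# the reviewer finds no flaws or the retry budget runs out.
def analyst(code, feedback=None):
    draft = {"summary": f"analysis of {len(code)} files", "gaps": []}
    if feedback:  # revision pass: address each flaw the reviewer raised
        draft["resolved"] = list(feedback)
    return draft

def reviewer(draft):
    # A real reviewer agent probes the draft deeply for flaws; this stub
    # simply flags any draft that never went through a revision pass.
    return [] if "resolved" in draft else ["no evidence of gap analysis"]

def review_loop(code, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        draft = analyst(code, feedback)
        feedback = reviewer(draft)
        if not feedback:  # reviewer approves: accept the draft
            return draft
    raise RuntimeError("review budget exhausted without approval")

result = review_loop(["PAYROLL.CBL", "NIGHTLY.BAT"])
```

The key design point is that acceptance is decided by the adversarial reviewer, not by the generating agent's own claim of success, which is exactly the failure mode the loop exists to counter.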
A Week vs. Six Months
The token costs are real. Running multi-agent review loops on large codebases burns tokens. But the comparison that matters isn't tokens vs. zero — it's tokens vs. humans.
$1K
Token Cost
Ruben reached near-completion on a legacy codebase in one week
$100K
Human Time
A business analyst spent 6 months on the same codebase — and gave up
Beyond Direct Cost
There's also opportunity cost: while a six-month review is underway, the client's most critical systems go unimproved, frustration mounts, and the business is at risk.

Speed of understanding is itself a form of value.
"Comparing it to people, it's still cheaper. A week versus six months is a big difference in cost."
— Ruben Rotteveel
What's Working — and What Still Needs Work
What's Working
Requirements Extraction
Sherlock reliably surfaces what legacy code is doing, even from COBOL and decades-old batch files with no documentation.
Architecture Mapping
The system traces execution layers and describes how components connect in ways that would take human analysts weeks to produce.
Multi-Agent Review
Reviewer agents catch gaps and surface issues that a single-pass approach would miss.
Knowledge Graph
The graph itself is built and functional; the fully navigable, spatial interface for traversing all relationships that Ruben envisions is still in development.
The Economics
Even with high token costs, the speed advantage over human analysis is dramatic.
⚠️ Still In Progress
Reviewer Tuning
Specialized reviewers for different quality dimensions — architecture, coding practices, requirements completeness — are still being developed.
Recovering "Why"
Sherlock can tell you what a system does and surface structural problems. Recovering the reasoning behind decades-old decisions remains a genuine limitation.
Under the Hood
Sherlock is built on a real client engagement and is actively running on production legacy codebases.
Languages Analyzed
COBOL, Java, C#, batch files, stored procedures
Read-Only Access
Sherlock reads and analyzes but never modifies source code
Agent Framework
Multi-agent with specialized reviewer agents for each quality dimension
Knowledge Representation
Graph-based, linking programs → executables → requirements
Output Formats
Architecture documents, knowledge graph, requirements Word documents
Model
Claude (via Anthropic) — powering analysis and multi-agent review loops
Prototype by Ruben Rotteveel · nvisia AI Lab
See Sherlock at the Symposium
nvisia AI Symposium · Chicago · May 7, 2025
Visit the Sherlock booth at the nvisia AI Symposium to see it all in action. Ruben Rotteveel will be on-site to demo and discuss.
Live Requirements Extraction
A live extraction from a real legacy codebase — watch Sherlock read code it has never seen before.
The Architecture Document
See how Sherlock maps a system — execution layers, component relationships, plain-language descriptions.
The Knowledge Graph
Work in progress — come see where it's headed and what the navigable interface will look like.
Multi-Agent Review Loop
Watch agents critique each other's work in real time — the core quality mechanism behind Sherlock.
The Economics Discussion
Token cost vs. human time in real engagements — the numbers that make the business case undeniable.