nvisia AI Lab · Prototype
Sherlock: Legacy Code Intelligence
When the code has been running for 30 years and the person who wrote it is gone — how do you go from 'we don't understand it' to 'we're ready to rebuild it'?
Hosted by Ruben Rotteveel & Marianela Crissman
Ruben Rotteveel, Technical Director
Marianela Crissman, Technical Lead
An AI agent system that reads legacy code — COBOL, Java, C#, batch files — and reverse-engineers what it does.
No documentation needed. No original developer needed. Sherlock figures it out.
COBOL · Decades-old mainframe logic
Java · Enterprise executables
C# · Modern legacy systems
Batch · Scheduled jobs & scripts
The Institutional Knowledge Crisis
Somewhere in your organization, there's a system that's been running for 20 or 30 years. The developers who built it are gone. The documentation either never existed or stopped being accurate a decade ago. And yet — it runs. Every day. Processing your most critical data.
When you need to modernize a system like that, you face a fundamental question: how do you rebuild something you don't understand? That's what Sherlock is built to answer.
"A scheduled job started in 2016 and it's still running regularly. For 10 years, no one's touched it and it just keeps doing the same thing. Some of these have been running for 30 years. The guys that wrote it are gone — and no one is there to say, 'here's what's happening.'"
— Ruben Rotteveel
Reverse-Engineering Requirements from Code
Sherlock takes legacy code, whatever the language and whatever the era, and works backward. It reads the files, traces the relationships, and extracts what the system is actually doing. From that, it produces three things:
Architecture Document
How does this system fit together? Sherlock maps the execution model — batch files, Java executables, database-driven configuration, stored procedures — and describes the layers of the system in plain language.
Knowledge Graph
Legacy systems are built layer by layer over decades — and that history is invisible. The Knowledge Graph makes it visible. Every component, every relationship, every requirement, all connected in a map you can actually navigate. Pull on any thread and see exactly what it's attached to.
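To make "pull on any thread" concrete, here is a minimal sketch of a graph that links programs, executables, and requirements (the layering listed under Under the Hood), plus a breadth-first traversal that returns everything connected to a starting node. The node names and the plain adjacency-list representation are hypothetical; the source does not describe how Sherlock actually stores its graph.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """Hypothetical undirected graph linking programs -> executables -> requirements.

    Illustrative only; not Sherlock's actual storage model.
    """

    def __init__(self) -> None:
        self.edges: dict[str, set[str]] = defaultdict(set)

    def link(self, a: str, b: str) -> None:
        # Store both directions so a thread can be pulled from either end.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def attached_to(self, start: str) -> set[str]:
        """Return every node reachable from `start` (breadth-first search)."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        seen.discard(start)
        return seen


graph = KnowledgeGraph()
graph.link("program:PAYCALC.cbl", "exe:nightly_payroll.bat")
graph.link("exe:nightly_payroll.bat", "req:REQ-0042 overtime rounding")
graph.link("program:RPTGEN.java", "exe:monthly_report.jar")

print(sorted(graph.attached_to("program:PAYCALC.cbl")))
# ['exe:nightly_payroll.bat', 'req:REQ-0042 overtime rounding']
```

Because edges are stored in both directions, the same call works from a requirement back to the program that implements it.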
Requirements Documents
The structured requirements Sherlock extracts — organized by rule type, functional vs. non-functional, and linked to their source — form the foundation of a downstream migration pipeline. From here, requirements are clustered by domain, classified, reviewed by architects, and exported as implementation-ready specs for development teams or direct AI agent ingestion.
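The description above implies that each extracted requirement carries at least a rule type, a functional/non-functional flag, and a link back to its source. A minimal sketch of such a record follows; every field name, the `RuleType` values, and the example data are hypothetical, since Sherlock's actual schema is not published.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(Enum):
    # Hypothetical rule categories; Sherlock's actual taxonomy is not published.
    CALCULATION = "calculation"
    VALIDATION = "validation"
    DATA_TRANSFORMATION = "data_transformation"
    SCHEDULING = "scheduling"


@dataclass(frozen=True)
class SourceRef:
    file: str        # path of the legacy source file
    start_line: int  # first line of the code the requirement was derived from
    end_line: int    # last line of that code


@dataclass(frozen=True)
class Requirement:
    req_id: str
    text: str
    rule_type: RuleType
    functional: bool   # False for non-functional requirements (timing, volume, ...)
    source: SourceRef  # traceability back to the code


req = Requirement(
    req_id="REQ-0042",
    text="Overtime hours are rounded up to the nearest quarter hour.",
    rule_type=RuleType.CALCULATION,
    functional=True,
    source=SourceRef("src/PAYCALC.cbl", 310, 342),
)
print(req.rule_type.value, req.source.file)  # calculation src/PAYCALC.cbl
```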
From Requirements to Migration-Ready Specs
Sherlock extracts the requirements. What happens next is where the migration actually becomes possible.
Marianela Crissman has built a downstream pipeline that takes Sherlock's output and carries it through to implementation-ready deliverables — using embeddings, clustering, and a purpose-built Domain Explorer app.
"It's like a sequel."
– Marianela Crissman
Cluster the building blocks
Sherlock's extracted requirements are passed through a pipeline that generates vector embeddings and runs clustering algorithms with tuned hyperparameters to surface natural groupings. On a typical legacy engagement, this converts 20,000 extracted requirements into approximately 700 discrete building blocks, each with an LLM-generated title and summary describing what that part of the system does. (The exact count varies with the analysis performed.)
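The source does not name the clustering algorithm or the embedding model, so the sketch below swaps in plain k-means (with `k` standing in for a tuned hyperparameter) over tiny hand-made vectors that stand in for real requirement embeddings. It shows only the grouping step; generating embeddings and writing the LLM titles and summaries are out of scope.

```python
import math


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> list[int]:
    """Plain k-means. Returns a cluster label for each point.

    Centroids are seeded with the first k points to keep the example
    deterministic; real pipelines typically use k-means++ or similar seeding.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy 2-D vectors standing in for real requirement embeddings (hypothetical).
requirements = {
    "Round overtime up to the nearest quarter hour": [0.9, 0.1],
    "Apply the weekend pay multiplier": [0.8, 0.2],
    "Reject records with a missing employee ID": [0.1, 0.9],
    "Validate department codes against the lookup table": [0.2, 0.8],
}

labels = kmeans(list(requirements.values()), k=2)

# Group requirement texts into candidate building blocks by cluster label.
blocks: dict[int, list[str]] = {}
for text, label in zip(requirements, labels):
    blocks.setdefault(label, []).append(text)

for label, members in sorted(blocks.items()):
    print(f"block {label}: {members}")
```

With `k=2`, the two pay rules and the two validation rules land in separate blocks. On real data, `k` (or the equivalent parameter of whichever algorithm is used) is the kind of hyperparameter that gets tuned, and each resulting block would then receive its LLM-generated title and summary.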
Discover and validate domains
The clustering process runs a bottom-up discovery of system domains from the data itself. In parallel, subject matter experts and architects identify domains top-down through conversation. The two lists are reconciled: domains are confirmed, merged, added, or expanded based on both sources of signal.
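Reconciliation is ultimately a human judgment call, but the bookkeeping behind it can be sketched with set operations. In this hypothetical example, domains found by both sources are confirmed, domains named only by subject matter experts become candidates to add, and domains surfaced only by clustering are flagged for architect review, where they may be merged or expanded. The domain names are invented for illustration.

```python
def reconcile(bottom_up: set[str], top_down: set[str]) -> dict[str, set[str]]:
    """Sort domain names by which source of signal produced them."""
    return {
        "confirmed": bottom_up & top_down,        # both sources agree
        "sme_only": top_down - bottom_up,         # named by experts, not surfaced by clustering
        "discovered_only": bottom_up - top_down,  # surfaced by clustering, not named by experts
    }


clusters = {"Payroll", "Timekeeping", "Audit Logging"}
experts = {"Payroll", "Timekeeping", "Benefits"}
result = reconcile(clusters, experts)

print(sorted(result["confirmed"]))        # ['Payroll', 'Timekeeping']
print(sorted(result["sme_only"]))         # ['Benefits']
print(sorted(result["discovered_only"]))  # ['Audit Logging']
```

Exact string matching is the weak point of this sketch: real reconciliation also has to recognize when two differently named domains describe the same thing, which is where architects decide on merges.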
Domain Explorer
A standalone web app built to help architects and developers navigate the results. The left panel shows all validated domains. Expanding a domain reveals its building blocks — each with the original requirements extracted by Sherlock, rationale, acceptance criteria, and source traceability. Architects can flag incorrect assignments, submit corrections, and review decisions before export. This app was shown to the client on May 18, 2026.
The Domain Explorer is a standalone application — not embedded in Sherlock, but purpose-built to work alongside it. It was originally created by Marianela Crissman to give herself visibility into the clustered results, and grew into a multi-user tool that architects and developers can use independently to review, correct, and export migration deliverables.
Export to developers or directly to Claude Code
Once domains are reviewed and finalized, the pipeline exports either:
  • A PDF per domain — a complete implementation brief for a developer to pick up and start building
  • A JSON feed — the same structured data, fed directly into Claude Code with guardrails, allowing it to plan, write specs, get human review, and begin implementation
The full agentic path (JSON → Claude Code → implement) is designed to minimize human-in-the-loop steps without removing oversight. It is currently in development and has not yet been end-to-end tested.
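The schema of the JSON feed is not published. As a rough sketch, one domain's export might carry its building blocks along with their source-linked requirements and acceptance criteria; every key name below is an assumption, and only the standard-library `json` module is used.

```python
import json


def export_domain(domain: str, blocks: list[dict]) -> str:
    """Serialize one reviewed domain into a JSON document (hypothetical schema)."""
    payload = {
        "domain": domain,
        "status": "reviewed",
        "building_blocks": [
            {
                "title": b["title"],
                "summary": b["summary"],
                "requirements": b["requirements"],
                "acceptance_criteria": b.get("acceptance_criteria", []),
            }
            for b in blocks
        ],
    }
    return json.dumps(payload, indent=2)


doc = export_domain(
    "Payroll",
    [{
        "title": "Overtime calculation",
        "summary": "Computes overtime pay from timecard hours.",
        "requirements": [{"id": "REQ-0042", "source": "src/PAYCALC.cbl:310-342"}],
        "acceptance_criteria": ["Overtime hours round up to the nearest 0.25 h"],
    }],
)
print(doc)
```

Keeping the source reference on every requirement preserves the traceability that architects review in the Domain Explorer, so a downstream agent or developer can always get back to the original code.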
Feedback back to Sherlock
The domain and cluster assignments are exported and fed back into Sherlock. Sherlock then knows how all requirements are distributed across domains — so future analysis runs don't re-surface already-processed requirements.
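A minimal sketch of how the fed-back assignments could keep later runs from re-surfacing processed requirements: hold a map from requirement ID to assigned domain and skip anything already in it. The ID-based matching and all names below are assumptions; the source does not say how Sherlock identifies the same requirement across runs.

```python
def new_requirements(extracted: list[str], assignments: dict[str, str]) -> list[str]:
    """Return only requirement IDs that have no domain assignment yet."""
    return [req_id for req_id in extracted if req_id not in assignments]


# Domain/cluster assignments exported from the review step (hypothetical IDs).
assignments = {"REQ-0042": "Payroll", "REQ-0107": "Timekeeping"}

# A later analysis run extracts some requirements it has already seen.
latest_run = ["REQ-0042", "REQ-0107", "REQ-0311"]
print(new_requirements(latest_run, assignments))  # ['REQ-0311']
```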
Agents That Review Each Other
Sherlock doesn't just run one agent and hope for the best. It uses a multi-agent review loop — one of the most important design decisions in the system.
The core challenge: AI agents are trained to make the user happy. Left unchecked, an agent will fill in gaps rather than surface them, claiming success even when the analysis is incomplete.
"The reviewer will go really deep in their analysis to find the flaws in the original model's work. You can kind of pit them against each other — and they do a pretty good job."
— Ruben Rotteveel
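The worker/reviewer pattern can be sketched as a loop: one agent drafts, a second agent critiques, and the draft is revised until the reviewer raises no issues or a round limit is reached. The worker and reviewer below are plain-Python stand-ins for model calls; the page says Sherlock uses Claude, but the function signatures, stopping rule, and `max_rounds` limit here are hypothetical.

```python
from typing import Callable

# Hypothetical signatures: a worker turns (task, feedback) into a draft,
# a reviewer turns a draft into a list of issues (empty list = approved).
Worker = Callable[[str, list[str]], str]
Reviewer = Callable[[str], list[str]]


def review_loop(task: str, worker: Worker, reviewer: Reviewer,
                max_rounds: int = 3) -> tuple[str, list[str]]:
    """Alternate drafting and critique; return the last draft and any open issues."""
    feedback: list[str] = []
    draft = ""
    for _ in range(max_rounds):
        draft = worker(task, feedback)
        feedback = reviewer(draft)
        if not feedback:
            return draft, []  # reviewer found nothing left to flag
    return draft, feedback    # round limit reached: surface the gaps instead of hiding them


# Toy stand-ins so the loop runs without a model.
def toy_worker(task: str, feedback: list[str]) -> str:
    draft = f"Analysis of {task}"
    if any("error handling" in issue for issue in feedback):
        draft += " incl. error handling"
    return draft


def toy_reviewer(draft: str) -> list[str]:
    return [] if "error handling" in draft else ["missing error handling path"]


final, open_issues = review_loop("PAYCALC.cbl", toy_worker, toy_reviewer)
print(final, open_issues)  # Analysis of PAYCALC.cbl incl. error handling []
```

Returning unresolved issues, rather than just the last draft, is the point of the design: when the loop runs out of rounds, the gaps are reported instead of being papered over.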
Ruben is actively tuning specialized reviewers for different dimensions: architecture quality, coding practices, knowledge graph structure, and requirements completeness.
A Week vs. Six Months
The token costs are real. Running multi-agent review loops on large codebases burns tokens. But the comparison that matters isn't tokens vs. zero — it's tokens vs. humans.
$1K · Token Cost
Ruben reached near-completion on a legacy codebase in one week.
$100K · Human Time
A business analyst spent 6 months on the same codebase, then gave up.
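The headline figures reduce to two ratios. A quick check, assuming six months is roughly 26 weeks (the page gives only "6 months"):

```python
token_cost = 1_000    # USD, one week of multi-agent analysis
human_cost = 100_000  # USD, six months of business analyst time
weeks_agent = 1
weeks_human = 26      # assumption: ~6 months is roughly 26 weeks

print(human_cost / token_cost)    # 100.0  (cost ratio)
print(weeks_human / weeks_agent)  # 26.0   (elapsed-time ratio)
```

Both ratios understate the gap somewhat: the human effort was abandoned before completion, so its figures are cost to date, not cost to finish.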
Beyond Direct Cost
There is also an opportunity cost: while a six-month review is underway, the client's most critical systems go unimproved, clients grow frustrated, and the business is put at risk.
"Comparing it to people, it's still cheaper. A week versus six months is a big difference in cost."
— Ruben Rotteveel
What's Working — and What Still Needs Work
What's Working
Requirements Extraction
Sherlock reliably surfaces what legacy code is doing, even from COBOL and decades-old batch files with no documentation.
Architecture Mapping
The system traces execution layers and describes how components connect, producing a map that would take human analysts weeks to build by hand.
Multi-Agent Review
Reviewer agents catch gaps and surface issues that a single-pass approach would miss.
Knowledge Graph
The graph-based representation is functional. Marianela Crissman has built a companion tool, the Domain Explorer, that lets architects and developers navigate domains and trace requirements.
The Economics
Even with high token costs, the speed advantage over human analysis is dramatic.
Downstream Domain Pipeline
Embeddings-based clustering of ~20,000 extracted requirements into ~700 discrete building blocks, with LLM-generated summaries and domain classification. Validated on a real client engagement and presented to the client in May 2026.
Domain Explorer App
A purpose-built web interface for architects and developers to navigate discovered domains, review building block assignments, flag corrections, and export implementation-ready specs. In active use.
⚠️ Still In Progress
Reviewer Tuning
Specialized reviewers for different quality dimensions — architecture, coding practices, requirements completeness — are still being developed.
Recovering "Why"
Sherlock can tell you what a system does and surface structural problems. Recovering the reasoning behind decades-old decisions remains a genuine limitation.
Full Agentic Implementation Pipeline
The path from Domain Explorer JSON export → Claude Code → spec generation → implementation is designed and partially built. End-to-end testing has not yet been completed.
Under the Hood
Sherlock was developed on a real client engagement and is actively running on production legacy codebases.
Languages Analyzed
COBOL, Java, C#, batch files, stored procedures
Read-Only Access
Sherlock reads and analyzes but never modifies source code
Agent Framework
Multi-agent, with specialized reviewer agents for each quality dimension (tuning in progress)
Knowledge Representation
Graph-based, linking programs → executables → requirements
Output Formats
Architecture documents, knowledge graph, requirements Word documents
Model
Claude (via Anthropic) — powering analysis and multi-agent review loops
Embeddings + Clustering
Vector embeddings and hyperparameter-tuned clustering algorithms for requirements grouping (Python / Jupyter Notebooks)
Domain Explorer
Purpose-built web app for domain navigation, requirement review, and decision export
Claude Code
Target agent for downstream implementation pipeline (in development)
Export Formats
PDF (developer briefs) and JSON (AI agent ingestion)
Prototype by Ruben Rotteveel · Migration Pipeline by Marianela Crissman · nvisia AI Lab
See Sherlock at the AI Lab — ELITE Tech Showcase
nvisia ELITE Tech Showcase · June 24, 2026 · Grand Geneva Resort & Spa
Ruben Rotteveel and Marianela Crissman will both be on-site to demo and discuss.
Loramoor B, Lower Level · 12:30–3:30 PM
What you can see at the booth:
Live Requirements Extraction
Watch Sherlock extract requirements from a real legacy codebase
Domain Explorer
Navigate discovered domains and drill into building blocks
Full Migration Pipeline
From raw legacy code to implementation-ready specs
Multi-Agent Review Loop
Agents critiquing each other's analysis in real time
The Economics
$1K in token costs vs. $100K in human analyst time