Case Study: Sovereign AI in Enterprise Software Modernization

The Challenge
A major telecommunications provider needed to modernize millions of lines of legacy COBOL and migrate proprietary data transformation workflows built on Informatica — mission-critical systems that powered core business operations. The target was modern, maintainable platforms: Java for application logic, PySpark and Airflow for data processing and orchestration.
The codebase had all the hallmarks of decades-old enterprise software. No documentation. No requirements tracing. No test suites. The business logic that kept the operation running was locked inside code that few people could understand, let alone safely modify. The organization had reached the point where the business was shaped around the limitations of its software, rather than the software adapting to the needs of a dynamic business operating at scale.
Traditionally, a modernization effort of this scale would require teams of hundreds of engineers, 24 or more months of effort, and a high tolerance for risk. Industry data bears this out: large-scale modernization projects fail at staggering rates, and even the successful ones routinely overrun their timelines and budgets several times over.
Adding to the complexity: the client's security policy required that all source code remain within their network boundary. No exceptions. Any AI-assisted approach would need to operate entirely inside the client's infrastructure — a constraint that immediately disqualified most of the available options.
The Approach
This project used a multi-agent AI platform purpose-built for enterprise-scale code understanding and software engineering. Rather than deploying hundreds of engineers to partition and port the codebase, the platform took a fundamentally different strategy: reverse-engineer the missing artifacts before generating a single line of modernized code.
The foundation was deep, structural understanding of the existing codebase. The platform parsed every file in the repository, constructed full syntax trees of the source code, and built a complete dependency graph — a map of every relationship between every unit of code in the application. This wasn't keyword search or pattern matching. It was a precise, machine-readable model of how the entire system fit together.
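As a rough illustration of what such a dependency graph provides, the sketch below builds one for a toy codebase using Python's standard ast module. This is a simplification with invented names, and Python stands in purely for illustration; the actual platform parsed COBOL and Informatica artifacts.

```python
import ast
from collections import defaultdict

def build_dependency_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each function to the functions it calls (simplified sketch)."""
    defined: set[str] = set()
    calls: dict[str, set[str]] = defaultdict(set)
    trees = {}
    # First pass: parse every file into a full syntax tree and record
    # every defined unit of code.
    for name, text in sources.items():
        tree = ast.parse(text)
        trees[name] = tree
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                defined.add(node.name)
    # Second pass: record an edge for every call from one known unit
    # to another, yielding a machine-readable map of relationships.
    for tree in trees.values():
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                for inner in ast.walk(node):
                    if (isinstance(inner, ast.Call)
                            and isinstance(inner.func, ast.Name)
                            and inner.func.id in defined):
                        calls[node.name].add(inner.func.id)
    return dict(calls)

graph = build_dependency_graph({
    "billing.py": "def tax(x):\n    return x * 0.2\n\n"
                  "def total(x):\n    return x + tax(x)\n",
})
# graph records that total depends on tax
```

A production-grade graph would also track data flow, module boundaries, and external interfaces, but the principle is the same: every relationship between every unit of code, made explicit.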
From that foundation, the platform deployed specialized AI agents in capability layers, each building on the outputs of the one before it:
- Documentation generation produced rich technical documentation for every file and module — the documentation that had either never existed or had decayed beyond usefulness as the codebase aged.
- Requirements generation reverse-engineered business intent from the source code itself — functional requirements, user stories, and application-level features — treating the code as the authoritative source of truth for what the system actually does.
- Test generation produced test suites for the existing functionality, creating the validation capabilities needed to confirm that modernized code behaves identically to the legacy system it replaces.
- Code generation — the final layer, consuming outputs of all the layers beneath it — transpiled the legacy code into the modern target languages with far higher fidelity than a model working without that rich context.
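The layering above can be sketched as a pipeline in which each stage consumes the artifacts produced by the stages before it. Everything here is hypothetical: `agent` stands in for a model call, and the task names are invented.

```python
from dataclasses import dataclass

@dataclass
class Artifacts:
    """Everything the platform accumulates for one unit of code."""
    source: str
    documentation: str = ""
    requirements: str = ""
    tests: str = ""
    modern_code: str = ""

def run_pipeline(source: str, agent) -> Artifacts:
    """Run the capability layers in order; `agent(task, context)` is a
    stand-in for a scoped model invocation."""
    a = Artifacts(source=source)
    # Each layer's context includes the outputs of the layers beneath it.
    a.documentation = agent("document", {"source": a.source})
    a.requirements = agent("derive_requirements",
                           {"source": a.source, "docs": a.documentation})
    a.tests = agent("generate_tests",
                    {"source": a.source, "requirements": a.requirements})
    a.modern_code = agent("transpile",
                          {"source": a.source, "docs": a.documentation,
                           "requirements": a.requirements, "tests": a.tests})
    return a
```

The point of the structure is that the final code-generation step never works from raw source alone; it inherits documentation, requirements, and tests as context.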
Each agent interaction was scoped by the dependency graph. When an agent worked on a unit of code, the platform's context service provided that unit and its mapped dependencies — nothing more. The model never saw the full codebase in a single interaction. It saw exactly what it needed to see.
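A minimal sketch of that scoping rule, with invented names: given the dependency graph and a map of unit sources, the context service returns the unit under work plus its mapped dependencies, and nothing else.

```python
def context_for(unit: str,
                graph: dict[str, set[str]],
                sources: dict[str, str]) -> dict[str, str]:
    """Return exactly the slice of the codebase an agent may see:
    the unit itself and its direct dependencies. Hypothetical sketch;
    the model never receives the full codebase."""
    wanted = {unit} | graph.get(unit, set())
    return {name: sources[name] for name in wanted if name in sources}

sources = {"total": "def total(x): ...", "tax": "def tax(x): ...",
           "invoice": "def invoice(x): ..."}
graph = {"total": {"tax"}}
ctx = context_for("total", graph, sources)
# ctx contains total and tax; invoice is never exposed
```

Because the slice is computed from the graph rather than chosen by an operator, the minimum-context rule is enforced architecturally.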
To satisfy the client's requirement that source code never leave their network, a frontier AI model was deployed directly within the client's infrastructure. The platform's model-agnostic architecture made this possible — it was engineered to work with any model, so the hosting constraint shaped the model selection without limiting the platform's capabilities.
Two parallel workstreams executed simultaneously: COBOL to Java for application logic, and Informatica to PySpark and Airflow for data transformation and orchestration. The entire engagement was operated by a team of four, in conjunction with the client's relevant business and technical stakeholders.
Sovereign AI in Practice
What made this project successful was not just the platform's technical capabilities. It was the deliberate, infrastructure-level decisions that governed how those capabilities were designed and deployed — decisions that reflect the three pillars of AI sovereignty.
Choice
The platform was engineered to be model-agnostic — every agent, across every capability layer, could run against any model. This meant the client's requirement that AI models run inside their infrastructure became a model selection decision, not an architectural constraint. The hosting requirement shaped which model was chosen, not how the platform operated. And because no part of the platform was coupled to a specific provider, that decision remained a flexible one — open to revision as the landscape evolved, whether through new providers offering on-premises deployment, open-source models reaching sufficient capability, or changes in the client's own risk tolerance.
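In code terms, model agnosticism of this kind can be sketched as agents written against a narrow protocol rather than any provider's SDK. The class and method names below are assumptions for illustration, not the platform's actual interface.

```python
from typing import Protocol

class ChatModel(Protocol):
    """The only contract agents depend on: any model, any host."""
    def complete(self, prompt: str) -> str: ...

class EchoModel:
    """Trivial stand-in model; a real adapter would call an inference
    endpoint inside the client's network."""
    def complete(self, prompt: str) -> str:
        return prompt

def run_agent(model: ChatModel, task: str, context: str) -> str:
    # Agents call the protocol, never a provider SDK, so swapping the
    # model (cloud, on-premises, open-source) is a configuration change,
    # not a rewrite.
    return model.complete(f"{task}\n\n{context}")
```

Under this shape, the client's hosting requirement changes which adapter is configured, and nothing else.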
Control
The client's source code — millions of lines of mission-critical application logic — never left their network. This was not a contractual assurance from a vendor. It was an infrastructure-level guarantee: the model ran inside the client's environment, and the code physically could not reach an external provider.
Within that boundary, control extended further. Each agent interaction was scoped by the dependency graph to a single unit of code and its direct dependencies. The model never received the full codebase. It received the minimum context required to do its job — enforced by the platform's architecture, not by the judgment of the person operating it.
Each agent class operated within a defined role. Documentation agents produced documentation. Requirements agents produced requirements. No single agent had unconstrained access to the full pipeline or the authority to act beyond its designated task.
What served as a control mechanism also served as a quality mechanism — scoped context and defined roles meant each agent worked more effectively and accurately.
Clarity
Building codebase understanding in layers produced a complete, inspectable chain of artifacts at every stage of the modernization: parsed source code, dependency maps, technical documentation, reverse-engineered business requirements, test suites, and modernized code. Each artifact was associated with the specific unit of code that produced it. Each was available for human review.
This was not a black-box transpilation where legacy code went in and modern code came out. At every intermediate step, the reasoning was visible: here is what the code does (documentation), here is what it is meant to accomplish (requirements), here is how we will verify it (tests), and here is the modernized implementation (code generation). If a question arose about any unit of modernized code, the full chain — from original source through every generated artifact — was traceable.
The test suites generated by the platform served as the primary validation mechanism, confirming that modernized code behaved identically to the legacy system. This was evaluation grounded in the client's actual codebase and business logic — not a generic benchmark, not a vendor's quality score, but a direct, verifiable answer to the question: does this new code do what the old code did?
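The validation idea can be sketched as a parity check: run the same inputs through the legacy behavior and the modernized implementation and require identical outputs. The discount functions below are invented stand-ins, not the client's logic.

```python
def legacy_discount(amount: float) -> float:
    # Stand-in for behavior captured from the legacy system.
    return round(amount * 0.9, 2) if amount > 100 else amount

def modern_discount(amount: float) -> float:
    # Stand-in for the generated modern implementation under test.
    return round(amount * 0.9, 2) if amount > 100 else amount

def assert_equivalent(cases):
    """Fail loudly on any input where the new code diverges from the old."""
    for amount in cases:
        assert modern_discount(amount) == legacy_discount(amount), amount

# Boundary-heavy cases, since legacy edge behavior is where ports break.
assert_equivalent([0.0, 99.99, 100.0, 100.01, 250.0])
```

The generated suites played the role of `assert_equivalent` here, grounding the pass/fail signal in the system's own behavior rather than a generic benchmark.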
The Outcome
The modernized codebase — Java application logic, PySpark and Airflow data pipelines — was deployed into production. What would have traditionally required hundreds of engineers and two or more years of effort was completed by a team of four in a matter of months.
But the more significant outcome was not speed: it was what the client had at the end.
They did not receive a mechanical transpilation that traded one maintenance burden for another. They received modernized code backed by a full set of the artifacts that their legacy systems didn't have: comprehensive technical documentation, detailed business requirements, and test suites that validated functional equivalence. For the first time, the organization had a complete, documented understanding of what these systems do — expressed in languages and formats that their current and future engineers can read, maintain, and build upon.
The fidelity of these outputs — the accuracy of the documentation, the precision of the reverse-engineered requirements, the correctness of the modernized code — was exceptionally high. This was not incidental. It was a direct consequence of sovereign AI architecture. Because inference ran inside the client's infrastructure, the model operated on complete, unredacted source code and data — no sanitization, no loss of fidelity at the input layer. Scoped context and defined agent roles meant each agent worked with precisely relevant information on a precisely defined task. And because codebase understanding was built in layers — each producing inspectable, verifiable artifacts — errors were caught at intermediate stages, not compounded through to the final output. Sovereign AI drives quality at every layer.
The business was no longer shaped around the limitations of its software. The software could now adapt to the needs of the business.
A project like this requires AI to operate deep inside a client's most sensitive systems — their source code, their business logic, their production infrastructure. Without the ability to decide where models run, what they can access, how they're scoped, and how their outputs are validated, an organization either accepts risk it cannot quantify or forgoes the project entirely.
Which model handled the work? The client decided — and could change that decision at any time. Where did inference run? Inside their network, by policy. What did the model see? Only what the architecture allowed — scoped by the context service, not by trust. How was quality validated? Against the client's own codebase, through reverse-engineered requirements and generated test suites. At every layer, the answer was the same: the client decided, deliberately.
The choice of model and hosting was deliberate — and flexible. The control over data, scope, and agent authority was architectural — not contractual. The clarity into every artifact, every decision, and every validation was complete — and auditable. This is sovereign AI.