Data Lineage: The Foundation of Regulatory Reporting
In January 2013, the Basel Committee on Banking Supervision published a document that would keep Chief Data Officers awake for the next decade: BCBS 239 (Principles for effective risk data aggregation and risk reporting).
Before BCBS 239, banks were largely treated as "Black Boxes" by regulators. The regulator would ask for a report (e.g., Liquidity Coverage Ratio), and the bank would provide a number. As long as the number looked reasonable, everyone was happy. BCBS 239 changed the game. It effectively said: "It is not enough to give us the right number; you must prove how you got it."
This requirement made Data Lineage a core discipline in modern banking.
What is Data Lineage?
Data Lineage is the map of the data's journey through the organization. It tracks the data lifecycle from creation to consumption. Think of it like a supply chain for information. If you buy a car, you want to know where the steel came from, where the engine was built, and who assembled it. If you look at a Risk Report, you need to know:
- Source: Where was this trade captured? (e.g., Murex, Calypso).
- Transformation: How was it changed? (e.g., "We converted EUR to USD using the 4 PM fix").
- Aggregation: How was it grouped? (e.g., "Sum of all exposures to Counterparty X").
- Reporting: Where is it displayed? (e.g., Cell C4 on the COREP return).
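To make the four stages concrete, here is a minimal Python sketch that stores lineage as a graph and walks from a report cell back to its sources. The node names (`COREP:exposure_cell`, `src:murex_trades`, and so on) and the dictionary layout are hypothetical. Real lineage tools keep far richer metadata on every node and edge.

```python
from collections import deque

# Hypothetical lineage graph: each key is a node, each value lists the
# upstream nodes it is derived from (edges point from consumer to input).
UPSTREAM: dict[str, list[str]] = {
    "COREP:exposure_cell": ["agg:exposure_by_counterparty"],   # Reporting
    "agg:exposure_by_counterparty": ["xform:trades_usd"],      # Aggregation
    "xform:trades_usd": ["src:murex_trades", "src:fx_rates_4pm"],  # Transformation
    "src:murex_trades": [],                                    # Source
    "src:fx_rates_4pm": [],                                    # Source
}


def trace_upstream(node: str, upstream: dict[str, list[str]]) -> list[str]:
    """Return every node `node` depends on, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(upstream.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(upstream.get(current, []))
    return order


def sources_of(node: str, upstream: dict[str, list[str]]) -> list[str]:
    """Upstream nodes with no inputs of their own, i.e. the original sources."""
    return [n for n in trace_upstream(node, upstream) if not upstream.get(n)]
```

Calling `sources_of("COREP:exposure_cell", UPSTREAM)` returns `['src:murex_trades', 'src:fx_rates_4pm']`: the trade capture system and the FX fix used in the conversion step.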
The "EUC" Nightmare
Historically, banks built their reporting processes on End User Computing applications (EUCs), better known as massive, complex Excel spreadsheets and Access databases.
- System A dumps a CSV.
- Analyst B copies it into Excel.
- Macro C runs a VLOOKUP against another sheet.
- Analyst D makes a "manual adjustment" because the number "looked wrong."
- The Final Report is emailed to the regulator.
This is a lineage nightmare. It is opaque, fragile, and prone to error. If a regulator asks, "Why did this number change?", and the answer is "Because Dave adjusted it in the spreadsheet," you have failed the exam.
Implementing Lineage: A Practical Approach
You cannot map every single data point in the bank. That is boiling the ocean. You must prioritize.
1. Identify Critical Data Elements (CDEs)
Start with the report. Look at the most important fields (e.g., Total Exposure, RWA, LCR). These are your Critical Data Elements. Trace only these fields back to the source.
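The payoff of CDE prioritization is a smaller scope. The sketch below, assuming a hypothetical lineage graph with four report fields of which only two are flagged as CDEs, collects just the nodes needed to trace those CDEs back to source.

```python
# Hypothetical lineage: each node mapped to its direct upstream inputs.
UPSTREAM: dict[str, list[str]] = {
    "report:total_exposure": ["agg:exposure"],
    "report:rwa": ["calc:rwa"],
    "report:footnote_text": ["src:static_text"],
    "report:branch_count": ["src:hr_system"],
    "agg:exposure": ["src:trades"],
    "calc:rwa": ["agg:exposure", "src:risk_weights"],
    "src:trades": [],
    "src:risk_weights": [],
    "src:static_text": [],
    "src:hr_system": [],
}

# Hypothetical CDE list: only the fields that matter most to the report.
CDES = ["report:total_exposure", "report:rwa"]


def lineage_scope(fields: list[str], upstream: dict[str, list[str]]) -> set[str]:
    """All nodes that must be documented to trace `fields` back to source."""
    scope: set[str] = set()
    stack = list(fields)
    while stack:
        node = stack.pop()
        if node in scope:
            continue  # shared dependency, already in scope
        scope.add(node)
        stack.extend(upstream.get(node, []))
    return scope
```

Here `lineage_scope(CDES, UPSTREAM)` covers 6 of the 10 nodes; the footnote text and the HR feed drop out of scope entirely.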
2. Technical vs. Business Lineage
- Technical Lineage: This is derived by scanning code, such as SQL scripts, ETL jobs (Informatica, Talend), and Python, to see exactly how data moves between tables. It is detailed but hard for humans to read.
- Business Lineage: This is the high-level flow, e.g., "Trade Data moves from Front Office to Risk Engine." This is what business users and auditors want to see. You need tools (such as Collibra, Solidatus, or Ab Initio) that capture both views and link them, so each business-level flow can be drilled down into the technical steps behind it.
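One way to picture the link between the two views: take table-level edges (the kind a code scanner extracts) and a mapping from tables to business systems, then collapse the edges into system-level flows. This is a minimal sketch; the table names, system names, and mapping are all hypothetical, and commercial tools use their own metadata models.

```python
# Hypothetical technical lineage: (source_table, target_table) edges, as a
# code-scanning tool might extract them from SQL or ETL jobs.
TECHNICAL_EDGES: list[tuple[str, str]] = [
    ("fo.trades_raw", "fo.trades_clean"),
    ("fo.trades_clean", "risk.positions"),
    ("ref.fx_rates", "risk.positions"),
    ("risk.positions", "risk.exposure_agg"),
    ("risk.exposure_agg", "rpt.corep_c07"),
]

# Hypothetical mapping from physical tables to business-level systems.
TABLE_TO_SYSTEM: dict[str, str] = {
    "fo.trades_raw": "Front Office",
    "fo.trades_clean": "Front Office",
    "ref.fx_rates": "Reference Data",
    "risk.positions": "Risk Engine",
    "risk.exposure_agg": "Risk Engine",
    "rpt.corep_c07": "Regulatory Reporting",
}


def business_lineage(
    edges: list[tuple[str, str]], table_to_system: dict[str, str]
) -> list[tuple[str, str]]:
    """Collapse table-level edges into system-level flows.

    Hops inside a single system are dropped. An unmapped table raises
    KeyError, which surfaces gaps in the business mapping.
    """
    flows: set[tuple[str, str]] = set()
    for src, tgt in edges:
        a, b = table_to_system[src], table_to_system[tgt]
        if a != b:
            flows.add((a, b))
    return sorted(flows)
```

The five technical edges reduce to three business flows: Front Office to Risk Engine, Reference Data to Risk Engine, and Risk Engine to Regulatory Reporting.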
3. Identify Control Points
Lineage allows you to see where things can go wrong. You should overlay your Data Quality Controls onto the lineage map.
- "Between System A and System B, we have a Reconciliation control."
- "At the entry to the Risk Engine, we have a Validation check (No null values)."
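Overlaying controls on the map also makes gaps visible: any edge with no control attached is a place where errors can pass through unchecked. The sketch below assumes a hypothetical control inventory keyed by lineage edge, plus a simple no-null validation of the kind quoted above; the field names are illustrative.

```python
# Hypothetical system-level lineage edges: (upstream, downstream).
EDGES: list[tuple[str, str]] = [
    ("System A", "System B"),
    ("System B", "Risk Engine"),
    ("Risk Engine", "Regulatory Report"),
]

# Hypothetical control inventory, keyed by the edge each control sits on.
CONTROLS: dict[tuple[str, str], list[str]] = {
    ("System A", "System B"): ["Reconciliation"],
    ("System B", "Risk Engine"): ["Validation: no null values"],
}


def uncontrolled_edges(
    edges: list[tuple[str, str]], controls: dict[tuple[str, str], list[str]]
) -> list[tuple[str, str]]:
    """Edges on the lineage map with no data quality control attached."""
    return [e for e in edges if not controls.get(e)]


def null_check_failures(records: list[dict], required_fields: list[str]) -> list[int]:
    """Indices of records where any required field is missing or None."""
    return [
        i
        for i, record in enumerate(records)
        if any(record.get(f) is None for f in required_fields)
    ]
```

In this example, `uncontrolled_edges(EDGES, CONTROLS)` flags the hop from the Risk Engine to the report as the one stretch with no control.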
The Business Value Beyond Compliance
While regulators force you to do this, there is immense business value in it.
- Root Cause Analysis: When a report is wrong (and it will be), lineage tells you exactly where to look. You can trace the error upstream to the specific source system or transformation at fault, often in minutes rather than days.
- Change Impact Analysis: If the Front Office wants to upgrade its trading system, lineage tells you which downstream reports could be affected. You can plan your testing accordingly.
- Decommissioning: You can see which legacy systems are feeding reports. If a system feeds nothing, turn it off.
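Impact analysis and decommissioning both reduce to following lineage edges downstream. Here is a minimal sketch over hypothetical systems and reports. One caveat: "feeds nothing" here means "reaches no report on this map"; a system flagged this way may still feed consumers that the lineage does not cover, so treat the result as a candidate list, not a shutdown order.

```python
from collections import defaultdict

# Hypothetical lineage edges: (upstream, downstream).
EDGES: list[tuple[str, str]] = [
    ("Trading System", "Risk Engine"),
    ("Risk Engine", "COREP Return"),
    ("Risk Engine", "LCR Report"),
    ("Legacy Pricer", "Archive DB"),
]
REPORTS: set[str] = {"COREP Return", "LCR Report"}


def downstream_map(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Adjacency list from each node to the nodes it feeds."""
    graph: dict[str, list[str]] = defaultdict(list)
    for up, down in edges:
        graph[up].append(down)
    return graph


def reachable(start: str, graph: dict[str, list[str]]) -> set[str]:
    """Every node downstream of `start` (depth-first)."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def impacted_reports(system: str, edges: list[tuple[str, str]], reports: set[str]) -> set[str]:
    """Reports that could be affected by a change to `system`."""
    return reachable(system, downstream_map(edges)) & reports


def decommission_candidates(edges: list[tuple[str, str]], reports: set[str]) -> list[str]:
    """Non-report nodes with no downstream path to any report."""
    graph = downstream_map(edges)
    nodes = {up for up, _ in edges} | {down for _, down in edges}
    return sorted(n for n in nodes - reports if not (reachable(n, graph) & reports))
```

Upgrading the Trading System touches both reports, while the Legacy Pricer and the Archive DB it feeds reach no report at all.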
Conclusion
Data Lineage is the nervous system of the bank. It connects the sensory inputs (transactions) to the brain (reporting and decision making). In a post-BCBS 239 world, having a clear, documented, and controlled lineage is not just a "nice to have"; it is the license to operate.
Further Reading
- BCBS 239: Principles for effective risk data aggregation and risk reporting (Basel Committee on Banking Supervision, January 2013). The original publication that set the standard for risk data governance in banking.
- EDM Council: Data Management Capability Assessment Model (DCAM). An industry framework for assessing data management maturity.
- DAMA International: DAMA-DMBOK (Data Management Body of Knowledge). A widely used reference for data management best practices, including data lineage.