The Data Dictionary as a Regulatory Artefact: BCBS 239 and Beyond
The data dictionary is an artefact that rarely gets the attention it deserves. It sits in a SharePoint folder, maintained by a small data team, referenced occasionally when someone cannot figure out what a field means. Then a regulator asks why the firm's Common Equity Tier 1 ratio differs from the ratio reported in the prior quarter, and the answer involves tracing the definition of a single field back through four systems, two vendor feeds and a transformation layer. That trace is the data dictionary's real job.
This post covers the data dictionary as a regulatory artefact in financial services, how to structure it so it holds up under scrutiny, and the failure modes that keep appearing in examination findings. The framing is drawn from BCBS 239 principles but applies equally to Solvency II, DORA data requirements, and broader data governance regimes.
Why the data dictionary matters
BCBS 239 was issued in 2013 and, more than a decade later, remains a top-three source of supervisory findings for G-SIBs. The reason is that firms underestimated how much effort it takes to document what their data actually means, where it comes from, how it transforms, and who is accountable for its quality. The data dictionary is the artefact that records all of this. When it is incomplete, the downstream effects are severe: unreliable regulatory reports, inability to explain variances, reworked numbers under time pressure, and supervisory confidence that erodes with every miss.
The failure is not a failure of intent. Most firms know the data dictionary matters. The failure is of discipline and tooling. The dictionary is kept current during build. It drifts during run. Change management does not touch it. By year three, 30 percent of the definitions are subtly wrong, and the errors compound because downstream teams rely on them.
What a regulatory-grade data dictionary contains
A dictionary suitable for regulated reporting contains more than a field name and a data type. The minimum regulatory-grade structure has eight elements per field.
1. Canonical name and synonyms
The canonical name is the authoritative label for the field. The synonyms section lists every alias the field uses across the estate: the name in the source system, the name in the data warehouse, the name in the regulatory template, and any common shorthand the business uses. Synonyms are where most confusion starts. The same economic quantity called "exposure" in one system, "notional" in another, and "EAD" in a third is a classic BCBS 239 failure mode.
2. Business definition
A plain-language definition written for a business audience. Not a technical description. A business user reading the definition should be able to answer the question "what does this number mean" in one sentence. "Exposure at default" is not a business definition. "The amount the bank stands to lose if the counterparty defaults today, measured in the reporting currency, calculated as on-balance-sheet exposure plus the regulatory add-on for off-balance-sheet commitments" is a business definition.
3. Technical definition
A precise technical definition: data type, allowed values, null behaviour, encoding, precision. For numerical fields, the unit of measure and the rounding rule. For date fields, the timezone convention and the calendar convention. These are the details where edge cases live.
4. Source of truth
The authoritative source system for the field. Not "the warehouse", which is a consumer. The system that creates the value. If the field is derived, the source of truth is the calculation, and the calculation's inputs are separately sourced. This links directly to the data lineage mapping artefact.
5. Lineage
The end-to-end path from source of truth through every system, transformation and store until the field reaches the regulatory report or downstream consumer. Lineage is a first-class artefact in its own right. The dictionary entry references it rather than duplicating it.
6. Quality rules
The specific data quality rules that apply to the field: validity checks, completeness thresholds, referential integrity, business rule consistency. Each rule has a measurement frequency and an exception threshold. Without quality rules, the dictionary is a definition without a guarantee.
7. Ownership
The accountable business owner, the steward responsible for day-to-day quality, and the data custodian responsible for the technical implementation. Under BCBS 239 principle 1 (governance), this ownership chain is not optional. The FCA and PRA expect a named senior manager accountable for the integrity of material data assets.
8. Regulatory use
The regulatory reports or returns the field feeds, with the specific line item or field reference. A field that contributes to COREP, FINREP, the ECB STE templates, or the FCA RAG returns should say so explicitly. This is the column that auditors trace backward from when a regulatory figure looks wrong.
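To make the eight elements concrete, here is a minimal sketch of a single entry as a Python dataclass. The class name, attribute names and sample values are all illustrative assumptions rather than a standard schema. The point is only that each element becomes a named slot that can be checked for emptiness.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One field in the dictionary. Attribute names are hypothetical."""
    canonical_name: str
    synonyms: dict[str, str]              # system or template -> local alias
    business_definition: str
    technical_definition: dict[str, str]  # data type, unit, rounding, nulls
    source_of_truth: str                  # originating system, not a consumer
    lineage_ref: str                      # pointer to the lineage artefact
    quality_rules: list[str] = field(default_factory=list)
    owners: dict[str, str] = field(default_factory=dict)  # role -> person
    regulatory_use: list[str] = field(default_factory=list)

    def missing_elements(self) -> list[str]:
        """Return the names of elements that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Sample values are invented for illustration only.
ead = DictionaryEntry(
    canonical_name="exposure_at_default",
    synonyms={"risk_engine": "EAD", "warehouse": "exposure"},
    business_definition=(
        "The amount the bank stands to lose if the counterparty "
        "defaults today, measured in the reporting currency."
    ),
    technical_definition={"type": "decimal(18,2)", "unit": "reporting currency"},
    source_of_truth="risk_engine",
    lineage_ref="lineage/exposure_at_default",
    quality_rules=["value >= 0"],
    owners={"accountable_owner": "Head of Credit Risk"},
)

print(ead.missing_elements())
```

In this sample the entry has no regulatory use recorded, so `missing_elements()` reports that one gap. The same check applied across the whole dictionary gives a crude completeness measure.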
Structural patterns that work
Pattern one: one dictionary, many views
The common mistake is to maintain multiple dictionaries: one for the data warehouse, one for the regulatory reporting team, one for the risk team. They inevitably drift. The pattern that works is a single authoritative dictionary with views tailored to each consumer. The warehouse team sees the lineage view. The regulatory team sees the report mapping view. The risk team sees the calculation view. Same underlying metadata, different projections.
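A view in this sense can be as simple as a projection over one entry. The sketch below assumes the dictionary is held as plain Python dicts. The view names and field lists are hypothetical choices made for illustration.

```python
# A single authoritative entry. Keys and values are illustrative.
ENTRY = {
    "canonical_name": "net_exposure",
    "business_definition": "Exposure net of eligible collateral.",
    "source_of_truth": "collateral_engine",
    "lineage_ref": "lineage/net_exposure",
    "derivation": "gross_exposure - adjusted_collateral",
    "regulatory_use": ["COREP"],
}

# Each consumer sees a subset of the same metadata (hypothetical views).
VIEWS = {
    "warehouse": ["canonical_name", "source_of_truth", "lineage_ref"],
    "regulatory": ["canonical_name", "business_definition", "regulatory_use"],
    "risk": ["canonical_name", "derivation"],
}


def project(entry: dict, view: str) -> dict:
    """Return only the keys that the named view exposes."""
    return {key: entry[key] for key in VIEWS[view]}


print(project(ENTRY, "risk"))
```

Because every view is computed from `ENTRY`, a correction to the underlying definition reaches all three audiences at once, and there is no second copy to drift.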
Pattern two: the dictionary is code, not prose
Dictionary entries live in a structured format (JSON, YAML, or a metadata repository with a schema) that can be validated, diffed, version-controlled and queried. Prose dictionaries in Word or Confluence are human-readable but machine-opaque. Structured dictionaries can be checked for completeness, cross-referenced with the data catalog, and rebuilt as reports for any audience.
Pattern three: change through pull request
Changes to the dictionary follow the same discipline as changes to code. A pull request, a reviewer from the data governance function, a reviewer from the business owner, a pipeline that checks the change does not break downstream dependencies. This discipline prevents the silent drift that kills dictionary accuracy over time.
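One check such a pipeline might run is that every input a derived field cites still has its own entry after the proposed change. The following is a rough sketch under the assumption that entries carry an `inputs` list. That key and the sample fields are invented for illustration.

```python
def unresolved_references(dictionary: dict) -> dict:
    """Map each field to the cited inputs that have no entry of their own."""
    problems = {}
    for name, entry in dictionary.items():
        missing = [ref for ref in entry.get("inputs", []) if ref not in dictionary]
        if missing:
            problems[name] = missing
    return problems


# Dictionary state as proposed in a pull request (illustrative).
proposed = {
    "gross_exposure": {"inputs": []},
    "net_exposure": {"inputs": ["gross_exposure", "adjusted_collateral"]},
}

print(unresolved_references(proposed))
```

A continuous integration job could fail the pull request whenever this function returns a non-empty result, which turns a silent break into a visible review comment.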
For a broader view of how data governance discipline connects to AI enablement and operational resilience, see our post on data layer as constraint on enterprise AI.
Common failure modes
Failure 1: the dictionary built for certification, not operation
The dictionary is built as a deliverable for a specific regulatory exercise (often BCBS 239 implementation), signed off, and then frozen. By year two, it describes a data landscape that has moved on. The fix is to treat the dictionary as operational infrastructure, not a one-off deliverable. It must be updated as part of every change that affects a field's definition, source or lineage.
Failure 2: synonyms that do not agree
Two systems call the same field by different names, and the dictionary entries for the two names have slightly different definitions. This is the most common source of regulatory reporting discrepancies. The fix is a canonical name discipline: one name is authoritative, the others are synonyms that point to it, and the synonym definitions are derived from the canonical definition.
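The canonical name discipline can be sketched as an alias index: every synonym resolves to one canonical entry, and the definition is always read from that entry. The structure below is an assumption about how entries might be stored. The function names are hypothetical.

```python
# Entries keyed by canonical name. Synonyms carry no definition of their own.
ENTRIES = {
    "exposure_at_default": {
        "business_definition": (
            "The amount the bank stands to lose if the counterparty defaults today."
        ),
        "synonyms": ["exposure", "notional", "EAD"],
    },
}


def build_alias_index(entries: dict) -> dict:
    """Map every alias (case-insensitive) to exactly one canonical name."""
    index = {}
    for canonical, entry in entries.items():
        for alias in [canonical, *entry["synonyms"]]:
            owner = index.setdefault(alias.lower(), canonical)
            if owner != canonical:
                raise ValueError(
                    f"alias {alias!r} maps to both {owner!r} and {canonical!r}"
                )
    return index


def definition_for(name: str, entries: dict, index: dict) -> str:
    """Resolve any alias to the canonical definition."""
    return entries[index[name.lower()]]["business_definition"]


ALIASES = build_alias_index(ENTRIES)
print(definition_for("notional", ENTRIES, ALIASES))
```

Since synonyms hold no text of their own, two aliases cannot disagree about what the field means. An alias claimed by two canonical fields is rejected when the index is built.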
Failure 3: missing derivation logic
The field "net exposure" is in the dictionary. The definition says "exposure net of collateral". The dictionary does not say which collateral valuations are used, whether a haircut is applied to the collateral, or which FX rate applies if the collateral is denominated in a different currency from the exposure. Three teams implement "net exposure" three different ways. Fix: the dictionary entry for a derived field includes the full derivation logic, with references to the source fields and their own entries.
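As an illustration of what "full derivation logic" could look like, the sketch below records the inputs and formula next to an executable version. The formula itself (haircut applied to collateral, conversion at a single FX rate, floor at zero) is an assumption chosen for the example, not a regulatory calculation, and all field names are hypothetical.

```python
from decimal import Decimal

# Dictionary entry for the derived field, with its inputs made explicit.
NET_EXPOSURE = {
    "canonical_name": "net_exposure",
    "inputs": [
        "gross_exposure",        # reporting currency
        "collateral_value",      # collateral currency
        "collateral_haircut",    # fraction between 0 and 1
        "fx_rate_to_reporting",  # collateral currency -> reporting currency
    ],
    "derivation": (
        "max(gross_exposure - collateral_value * (1 - collateral_haircut)"
        " * fx_rate_to_reporting, 0)"
    ),
}


def net_exposure(gross_exposure, collateral_value, collateral_haircut,
                 fx_rate_to_reporting):
    """Illustrative implementation of the derivation string above."""
    adjusted = collateral_value * (1 - collateral_haircut) * fx_rate_to_reporting
    return max(gross_exposure - adjusted, Decimal("0"))


result = net_exposure(
    gross_exposure=Decimal("1000"),
    collateral_value=Decimal("500"),
    collateral_haircut=Decimal("0.10"),
    fx_rate_to_reporting=Decimal("1.20"),
)
print(result)  # numerically equal to 460
```

With the haircut, the FX rate and the floor written down once, three teams implementing the field have a single reference to test against rather than three readings of "net of collateral".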
Failure 4: no ownership, no updates
Fields exist in the dictionary with no owner. When the underlying business process changes, nobody updates the field. Three years later, the field's definition no longer matches reality. Fix: enforce ownership as a mandatory metadata element. Fields without an owner fail dictionary validation and surface in the governance dashboard.
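Enforcing ownership as mandatory metadata can be a very small validation step. This sketch assumes each entry has an `owners` mapping with three roles. The role keys and sample names are illustrative.

```python
REQUIRED_ROLES = ("accountable_owner", "steward", "custodian")


def ownership_gaps(dictionary: dict) -> dict:
    """Map each field to the ownership roles that are missing or blank."""
    gaps = {}
    for name, entry in dictionary.items():
        owners = entry.get("owners", {})
        missing = [role for role in REQUIRED_ROLES if not owners.get(role)]
        if missing:
            gaps[name] = missing
    return gaps


# Sample dictionary with one fully owned field and one orphan (illustrative).
dictionary = {
    "net_exposure": {
        "owners": {
            "accountable_owner": "Head of Credit Risk",
            "steward": "Credit Risk Data Steward",
            "custodian": "Risk Platform Team",
        }
    },
    "legacy_limit_code": {"owners": {"steward": ""}},
}

print(ownership_gaps(dictionary))
```

The returned mapping is the list a governance dashboard would surface. An empty result means every field has all three roles filled.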
Failure 5: the dictionary disconnected from the lineage
The dictionary says where the field comes from. The lineage says how it gets there. They are maintained by different teams and they drift. Fix: the dictionary and the lineage are two views over the same underlying metadata, generated from a shared model.
The role in regulatory reporting
For COREP, FINREP, the ECB STE returns, MiFIR transaction reporting, and the FCA regulatory returns, the data dictionary is the artefact that connects the reported figures to the underlying data. When a regulator asks why a figure looks the way it does, the dictionary is the starting point of the answer.
The depth required depends on the regulatory regime. Under the BIS and ECB frameworks, traceability to source is explicit. Under the FCA regime for UK firms, the expectation is similar though less prescriptive. Under DORA, the data underlying incident reporting must be documented with its quality controls and lineage. The common thread: the dictionary is the artefact that provides the provenance for every regulated figure the firm reports.
Our data lineage regulatory post covers the lineage dimension in detail, and together with the dictionary the two artefacts provide the core data governance stack for regulated reporting.
Building the dictionary for a new programme
If you are setting up data governance for a new programme, the sequence we recommend is:
- Start with the regulatory report and work backward. List every field in the report. Each field gets a dictionary entry.
- Identify the source of truth for each field. The system that originates the value. Not the data warehouse.
- Document the lineage from source of truth to report. Every transformation, with the business rule that governs it.
- Name the owners. Accountable owner, steward, custodian. No field without an owner.
- Define the quality rules. What must be true for the field value to be trusted. Measurement frequency. Exception threshold.
- Instrument the pipeline. Every dictionary entry has monitoring. Every breach surfaces as an incident.
- Review on every change. The dictionary is not frozen. It is updated with every change that affects a field.
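The quality-rule and instrumentation steps in the sequence above can be sketched as a rule with a measure, a frequency and an exception threshold. The rule structure, the 99 percent threshold and the sample values are assumptions for illustration only.

```python
def completeness(values: list) -> float:
    """Share of values that are populated (not None)."""
    return sum(v is not None for v in values) / len(values)


def breaches(rule: dict, values: list) -> bool:
    """True when the measured value falls below the exception threshold."""
    return rule["measure"](values) < rule["threshold"]


# A hypothetical quality rule attached to one dictionary entry.
rule = {
    "field": "counterparty_id",
    "check": "completeness",
    "measure": completeness,
    "threshold": 0.99,
    "frequency": "daily",
}

sample = ["C1", "C2", None, "C4"]

if breaches(rule, sample):
    # In a real pipeline this is where an incident would be raised.
    print(f"{rule['field']}: {rule['check']} below {rule['threshold']}")
```

Keeping the threshold in the rule rather than in the monitoring code means that tightening it is a dictionary change, reviewed and versioned like any other.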
The BIAN service domain model and the EDM Council DCAM framework provide reference architectures for the data governance layer that surrounds the dictionary. The EBA BCBS 239 monitoring report sets out what regulators actually find when they examine dictionary quality.
The short version
The data dictionary is the artefact that connects regulated figures to underlying data. If it is accurate, complete and current, the firm can explain its numbers and defend them in supervisory dialogue. If it is not, every variance investigation becomes a forensic exercise and every regulatory ask becomes a fire drill.
The discipline that keeps the dictionary accurate is mostly about ownership, tooling, and change management. Treat it as operational infrastructure, not a one-time deliverable. Enforce structure through metadata schemas, not through goodwill. Connect it to the lineage, the quality rules, and the ownership model.
Our business analysis service covers the data artefact stack as part of regulated programme delivery, and the data lineage mapping resource is the direct companion to the dictionary pattern described here. For the broader framing of data as a regulatory capability, our regulatory compliance transformation service is the entry point.