The Action Data Layer: Why Your Reporting Data Cannot Power AI Decisions
There is a structural distinction that determines whether enterprise AI works or does not, and most organisations have not yet reckoned with it. The distinction is between reporting data and action data, and almost every data investment in the last 20 years has been optimised for the wrong one.
Reporting data is captured to answer questions about what happened. It is aggregated, cleaned, transformed, and stored in formats optimised for human analysts and regulatory reports. It tolerates latency (last night's batch is fine). It tolerates missing fields (a null value in a quarterly report is a gap, not a crisis). It tolerates inconsistent definitions across business units (as long as each unit's report is internally consistent).
Action data is captured to drive what happens next inside a workflow. It is structured to be consumed by other systems in real time, with no human intervention. It does not tolerate latency, because a delayed decision is a wrong decision. It does not tolerate missing fields, because a system that encounters a null must either halt or guess, and both are unacceptable in a regulated environment. It does not tolerate inconsistent definitions, because two systems that disagree about what a field means will produce conflicting decisions.
Most enterprise data infrastructure is built for reporting. AI needs the action variant. This mismatch is the single most common reason AI initiatives plateau after 12 months of promising pilot results.
For the foundational argument on why the data layer is the binding constraint, see the earlier essay. This post goes deeper into the architecture: what an action-data layer actually looks like, how it differs from a data warehouse, and how to build one.
The architecture in practice
An action-data layer has five structural characteristics that distinguish it from a traditional data warehouse or data lake:
1. Captured at the point of action, not at end-of-day
In a reporting architecture, data is extracted from source systems in nightly batch jobs, transformed, and loaded into the warehouse. In an action-data architecture, data is captured as events at the point they happen: when a trade is executed, when a customer submits a document, when a claims handler resolves a case, when a sensor reading arrives from a grid asset.
The event capture must include the full context: what happened, who did it (or which system did it), when it happened, and what state the world was in when it happened. This event stream is the foundation that everything else builds on.
The technology layer varies by context: Apache Kafka for high-volume financial event streams, Apache Flink for real-time processing, cloud-native event buses (AWS EventBridge, GCP Pub/Sub) for lower-volume use cases. The choice depends on your volume, latency requirements, and existing infrastructure.
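Whatever the transport, the captured event needs the same shape. The sketch below shows one way to structure it in Python; the field names and the example payload are illustrative, not a reference schema.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
import json
import uuid


@dataclass
class ActionEvent:
    """One captured event: what happened, who did it, when, and in what state."""

    event_type: str      # what happened, e.g. "trade.executed"
    actor: str           # who did it, or which system did it
    occurred_at: str     # when it happened, as an ISO-8601 UTC timestamp
    payload: dict        # the business data of the event itself
    state_snapshot: dict # relevant world state at the moment of capture
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        """Serialise for the event bus (Kafka, EventBridge, Pub/Sub, ...)."""
        return json.dumps(asdict(self))


# A trade execution captured at the point of action, not at end-of-day
event = ActionEvent(
    event_type="trade.executed",
    actor="oms.desk-3",
    occurred_at=datetime.now(timezone.utc).isoformat(),
    payload={"isin": "GB00B03MLX29", "quantity": 10_000, "price": 27.41},
    state_snapshot={"book": "EU-equities", "limit_utilisation": 0.62},
)
```

The `state_snapshot` field is what makes the event replayable: a downstream system can reconstruct not just what happened but the conditions under which it happened.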
2. Standardised at capture time, not at report time
In a reporting architecture, data is standardised (mapped to a common schema) during the ETL process, often days after capture. In an action-data architecture, data is standardised at the point of capture. This means the schema is enforced upstream, at the source system, not downstream in the warehouse.
This is the hardest part of the rebuild because it requires source system owners to agree on field definitions, data types, and business rules before data enters the pipeline. In a typical large financial institution, this means negotiating with 5-15 vendor platforms, each with its own data model. In asset management, BlackRock Aladdin, SimCorp, Charles River, and the custodian platforms all have different position schemas. The action-data layer must normalise these into a single position-and-fund wide-row that downstream systems can consume consistently.
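Mechanically, capture-time standardisation means a mapping plus a hard gate per source system. The sketch below uses a hypothetical position schema and a made-up vendor layout; the point is the shape of the gate, not the specific fields.

```python
# Hypothetical common position schema: field name -> required type
POSITION_SCHEMA = {
    "fund_id": str,
    "instrument_id": str,
    "quantity": float,
    "valuation_ccy": str,
    "as_of": str,
}


class SchemaViolation(Exception):
    """Raised at capture time; a bad record never enters the pipeline."""


def enforce_schema(record: dict, schema: dict) -> dict:
    """Validate a record against the common schema before ingestion."""
    for name, expected in schema.items():
        if record.get(name) is None:
            raise SchemaViolation(f"missing required field: {name}")
        if not isinstance(record[name], expected):
            raise SchemaViolation(
                f"{name}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return record


def from_vendor_a(row: dict) -> dict:
    """Map one (fictional) vendor's position layout onto the common schema."""
    return enforce_schema(
        {
            "fund_id": row["FundCode"],
            "instrument_id": row["Isin"],
            "quantity": float(row["Qty"]),
            "valuation_ccy": row["Ccy"],
            "as_of": row["AsOfDate"],
        },
        POSITION_SCHEMA,
    )
```

One adapter per vendor platform, one schema for everything downstream. The negotiation over what goes in `POSITION_SCHEMA` is the hard part; the code is the easy part.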
3. Structured around the workflow, not the report
A data warehouse is structured around the dimensions of the regulatory report: entity, product, counterparty, time period. An action-data layer is structured around the workflow it serves: a customer wide-row for KYC, a policy-and-claim wide-row for claims operations, a position-and-fund wide-row for NAV production, a trade-and-execution event stream for capital markets.
The wide-row pattern means that every piece of information the AI system needs to make a decision about an entity is available in a single, denormalised record. No joins at query time. No lookups across systems. The wide-row is pre-computed and maintained continuously.
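A minimal way to picture the pattern: fold each event into a single denormalised record keyed by the entity, so a decision system reads one row with no joins. This is an in-memory sketch with illustrative field names; a production wide-row would live in a low-latency store.

```python
# Wide-rows keyed by entity; maintained continuously, read without joins
wide_rows: dict[str, dict] = {}


def apply_event(event: dict) -> None:
    """Fold one event into its wide-row at capture time, not at query time."""
    key = f"{event['fund_id']}:{event['instrument_id']}"
    row = wide_rows.setdefault(key, {
        "fund_id": event["fund_id"],
        "instrument_id": event["instrument_id"],
        "quantity": 0.0,
        "last_price": None,
        "last_updated": None,
    })
    if event["type"] == "trade.executed":
        row["quantity"] += event["quantity"]
        row["last_price"] = event["price"]
    elif event["type"] == "price.updated":
        row["last_price"] = event["price"]
    row["last_updated"] = event["occurred_at"]


apply_event({
    "type": "trade.executed", "fund_id": "UKEQ1",
    "instrument_id": "GB00B03MLX29", "quantity": 10_000,
    "price": 27.41, "occurred_at": "2026-04-01T09:30:00Z",
})

# The decision system reads one pre-computed record
row = wide_rows["UKEQ1:GB00B03MLX29"]
```

The cost of this pattern is write-time complexity: every event type needs a fold rule. The payoff is that read-time latency is a single key lookup, which is what an in-workflow decision can actually afford.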
4. Observable lineage from source to model
In a reporting architecture, lineage is documented (sometimes). In an action-data architecture, lineage is observable: for any individual AI decision, you can trace the data from the source system through every transformation to the model input. This is not optional in regulated industries. Under PRA SS1/23, EU AI Act, and equivalent frameworks, the regulator expects to be able to reconstruct any individual decision end-to-end.
The tooling layer for observable lineage has matured significantly: OpenLineage is the de facto open standard, with integrations in dbt, Airflow, Spark, and most major orchestration tools. DataHub and OpenMetadata provide the metadata catalogue layer.
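Under the hood, observable lineage means every record carries (or is linked to) a trail of the steps it passed through. The sketch below shows the idea in plain Python, with hypothetical step and source names; in production this metadata would be emitted as OpenLineage events rather than embedded in the record.

```python
import hashlib
import json
from datetime import datetime, timezone


def record_step(record: dict, step: str, source: str) -> dict:
    """Append one lineage entry so a model input can be traced back to source."""
    # Hash the record as it leaves this step (excluding the trail itself),
    # so a regulator-facing replay can verify nothing was altered en route
    payload = {k: v for k, v in record.items() if k != "_lineage"}
    entry = {
        "step": step,
        "source": source,
        "at": datetime.now(timezone.utc).isoformat(),
        "content_hash": hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
    }
    record.setdefault("_lineage", []).append(entry)
    return record


rec = {"fund_id": "UKEQ1", "nav": 104.2}
rec = record_step(rec, "ingest", "custodian-feed")
rec = record_step(rec, "fx-normalise", "rates-service")
# rec["_lineage"] now reconstructs the path from source system to model input
```

For any individual AI decision, walking this trail backwards answers the regulator's question: which source produced this input, and what happened to it on the way.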
5. Quality monitored continuously, not on a schedule
In a reporting architecture, data quality is assessed periodically (monthly or quarterly data quality reports). In an action-data architecture, data quality is monitored continuously at every stage of the pipeline: completeness checks at ingestion, schema validation at standardisation, anomaly detection on the wide-row, and end-to-end reconciliation against source systems.
Great Expectations, Monte Carlo, and Bigeye provide the data quality monitoring layer. The key is that quality checks run on every event, not on a schedule, and quality failures trigger alerts rather than appearing in a report six weeks later.
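A per-event quality gate can be sketched in a few lines. This is a deliberately simple stand-in for what those tools do, with hypothetical field names and a crude z-score anomaly check; the structural point is that it runs on every event and returns failures to alert on immediately.

```python
def check_event(event: dict, price_history: list[float]) -> list[str]:
    """Run quality checks on one event at ingestion; return failures to alert on."""
    failures = []

    # Completeness: every required field present and non-null
    for name in ("fund_id", "instrument_id", "price"):
        if event.get(name) is None:
            failures.append(f"completeness: {name} missing")

    # Anomaly: crude z-score of the price against this instrument's history
    price = event.get("price")
    if price is not None and len(price_history) >= 20:
        mean = sum(price_history) / len(price_history)
        var = sum((p - mean) ** 2 for p in price_history) / len(price_history)
        std = var ** 0.5
        if std > 0 and abs(price - mean) > 4 * std:
            failures.append(f"anomaly: price {price} is >4 sigma from recent mean")

    return failures


# Checks run on every event; a failure triggers an alert, not a report line
bad = check_event({"fund_id": "UKEQ1", "instrument_id": None, "price": 27.4}, [])
# bad == ["completeness: instrument_id missing"]
```

The alerting threshold and the anomaly test are policy choices per field; the non-negotiable part is the placement of the check inside the pipeline rather than on a reporting calendar.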
What this costs and what it produces
The action-data layer is the most expensive, most unglamorous, and most consequential component of the AI enablement work. It typically takes 6-12 months to build for one priority value stream and consumes 40-60% of the total enablement programme budget.
The reason to do it despite the cost is that the action-data layer is reusable. The wide-row built for NAV production feeds regulatory reporting, ESG operations, and distribution analytics. The customer wide-row built for KYC feeds credit decisioning, customer operations, and Consumer Duty monitoring. The case study on asset management NAV production shows this compounding in practice: the data layer built for the first workflow is now feeding three adjacent ones.
This is the compounding mechanism. Each subsequent workflow that the organisation redesigns starts with the data layer partially built. The second project costs less than the first. The third costs less than the second. By the fifth project, the data layer is mature enough that new workflows can be redesigned in weeks rather than months.
How to start
The practical first step is an honest assessment of your current data layer against the five characteristics above. Our AI Enablement Maturity Diagnostic includes a data-layer pillar that scores your current state across these dimensions and identifies where the binding constraint sits.
If the diagnostic reveals data-layer gaps (it usually does), the Data Foundations for AI course covers the architectural patterns, the tooling landscape, and the implementation approach in seven self-paced modules.
For the hands-on build, the data layer is the first deliverable in every AI Enablement transformation programme. It comes before the workflow redesign, before the governance instrumentation, and before the role design, because everything else depends on it.
The firms that build the action-data layer first will compound advantage on every subsequent AI initiative. The firms that do not will find that every new initiative is a standalone project that starts from scratch.
Score this against your own organisation
Take the AI Enablement Maturity Diagnostic — 25 questions across the five pillars (production function, data layer, decision systems, operating model, governance). Per-pillar breakdown and prioritised next steps in 5 minutes.
Take the diagnostic

Ready to do the structural work?
Our AI Enablement engagements are built around the five pillars in this article. We start with a focused diagnostic, then redesign one priority workflow end-to-end as proof — including the data layer, decision rights, and governance machinery.
Explore the AI Enablement service

More like this — once a month
Get the next long-form essay on AI enablement, embedded governance, and operating-model design straight to your inbox. One considered piece per month, written for senior practitioners in regulated industries.
No spam. Unsubscribe anytime. Read by senior practitioners across FS, healthcare, energy, and the public sector.
Related insights
AI Model Risk Management Under PRA SS1/23: What COOs and CROs Actually Need to Do
A practical guide to managing AI model risk under PRA SS1/23 for banking and insurance leaders. Covers the supervisory expectations, the model lifecycle, and the governance machinery that satisfies the PRA on first reading.
April 10, 2026

Building a Data Flywheel in Financial Services: The Compounding Mechanism Most Firms Are Missing
Why most AI initiatives in banking, insurance, and asset management plateau after 12 months, and how building a working data flywheel turns operational data into a structural moat that compounds quarter over quarter.
April 09, 2026

Three Lines of Defence for AI: How to Design Governance That Works in Regulated Industries
A practical framework for applying the three-lines-of-defence model to AI deployments in banking, insurance, healthcare, and energy. Covers first-line ownership, second-line challenge, and third-line assurance.
April 08, 2026