
AI Enablement Maturity: The Four Levels and Where Most Regulated Firms Get Stuck

March 23, 2026

Every regulated firm we work with believes it is further along its AI journey than it actually is. The CEO reports to the board that the firm has "deployed AI across multiple business lines." The reality, when you look at the workflows, the data layer, and the governance framework, is that the firm has bolted a handful of AI tools onto existing processes and called it transformation.

This gap between perception and reality is why we built the AI Enablement Maturity Diagnostic. The diagnostic scores firms across five pillars (production function, data layer, decision systems, operating model, and governance) and assigns an overall maturity level. The four levels are not arbitrary; they describe qualitatively different relationships between AI and the operating model.

This post explains each level in detail, with concrete examples from banking, insurance, and healthcare. More importantly, it explains why most firms get stuck at the transition from Level 2 (Augmentation) to Level 3 (Structural), and what it takes to break through.

Level 1: Surface

Definition. AI is present in the organisation, but it does not participate in any regulated workflow. The use cases are limited to internal productivity tools (email summarisation, meeting transcription, document search) and exploratory pilots that have not reached production.

What it looks like in practice.

In banking: the innovation team has run three pilots (a chatbot for internal IT support, a document classifier for KYC, and a sentiment analyser for customer complaints). None have been validated under PRA SS1/23, none are integrated into the production workflow, and none have a data feedback loop. The board receives a quarterly update on "AI initiatives" that reports on activity, not outcomes.

In insurance: the claims team is experimenting with an image recognition tool for motor claims damage assessment. The tool runs alongside the human adjuster, and the adjuster does not use its output because they do not trust it. The experiment has been running for nine months with no decision on whether to proceed, scale, or stop.

In healthcare: the clinical team has access to a diagnostic support tool that suggests differential diagnoses based on patient symptoms. Clinicians report that they "sometimes look at it" but it does not affect their clinical decisions. The tool has no integration with the electronic health record, no audit trail, and no governance framework.

Why firms get stuck here. Surface-level firms have not made the institutional decision to treat AI as a structural capability. AI is a series of experiments, not a strategic commitment. The experiments are led by enthusiastic individuals, not by a programme with executive sponsorship, a governance framework, and a budget. McKinsey's State of AI report finds that roughly 40% of large enterprises remain at this level.

How to move to Level 2. Pick one use case with clear operational value, validate it under your regulatory framework, integrate it into the production workflow, and demonstrate measurable outcomes. The AI enablement playbook describes the selection criteria.

Level 2: Augmentation

Definition. AI is deployed in production workflows, but as a tool that assists human decision-makers. The human remains the primary decision-maker; the AI provides recommendations, scores, or alerts that the human can accept, modify, or override.

What it looks like in practice.

In banking: the fraud detection system uses an ML model to score transactions and generate alerts. Investigators review every alert, make the disposition decision, and file the SAR if warranted. The model reduces the volume of obviously benign alerts that reach the investigator, but the investigator still reviews every flagged case. The model is validated under PRA SS1/23, monitored for drift, and retrained annually.

In insurance: the underwriting workflow uses an AI model to pre-populate the risk assessment for standard commercial lines. The underwriter reviews the pre-populated assessment, adjusts it based on their judgment, and makes the final pricing decision. The AI saves time on data gathering and initial analysis, but the underwriting decision remains fully human.

In healthcare: the radiology department uses an AI tool to highlight potential anomalies in chest X-rays. The radiologist reviews every image, uses the AI's highlighting as one input among many, and makes the diagnostic decision. The tool is regulated as a Class IIa medical device and has a clinical safety case.

The augmentation trap. Most regulated firms that have deployed AI in production are at Level 2. The problem is not that Level 2 is bad; it delivers real value (typically 15-25% efficiency improvement in the augmented workflows). The problem is that Level 2 does not compound. The AI tool gets marginally better through periodic retraining, but the operating model does not change. The same number of people do the same jobs, slightly faster, with AI assistance.

MIT Sloan's research on AI in organisations describes this as the "augmentation plateau": firms invest in AI, achieve initial gains, and then find that the gains flatten because the underlying operating model has not been redesigned to take advantage of the AI capability.

This is why most AI pilots do not compound. The pilot demonstrates value at Level 2, the firm scales it, and the scaled version delivers the same per-unit improvement across more volume. The improvement is linear, not exponential. There is no structural shift in the production function.
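To make the linear-versus-compounding contrast concrete, here is a minimal arithmetic sketch. The numbers are illustrative assumptions, not benchmarks: the 20% per-unit gain echoes the 15-25% range quoted above, while the 5% quarterly flywheel improvement at Level 3 is hypothetical.

```python
# Illustrative arithmetic only: the 20% per-unit gain echoes the 15-25%
# range quoted above; the 5% quarterly flywheel improvement is an assumption.

def level2_savings(cost_per_case: float, volume: int,
                   efficiency_gain: float = 0.20) -> float:
    """Level 2: savings scale linearly with volume; the per-unit gain is fixed."""
    return cost_per_case * efficiency_gain * volume

def level3_automation_rate(initial_rate: float = 0.60,
                           quarterly_improvement: float = 0.05,
                           quarters: int = 8) -> float:
    """Level 3: the share of cases handled end-to-end compounds each quarter
    as human decisions feed back into the model."""
    rate = initial_rate
    for _ in range(quarters):
        rate *= 1 + quarterly_improvement
    return min(rate, 0.95)  # cap: some share always needs human judgment
```

Doubling the volume at Level 2 doubles the savings but leaves the per-unit gain untouched; at Level 3 the automation rate itself grows, which is the structural difference the next section describes.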

Level 3: Structural

Definition. The operating model has been redesigned around AI as a native capability. Workflows, roles, decision rights, and the data layer are structurally different from the pre-AI state. AI does not merely assist human decision-makers; it handles routine decisions end-to-end, and humans focus on exceptions, judgment calls, and system supervision.

What it looks like in practice.

In banking: the transaction monitoring system uses AI to handle 80% of alerts end-to-end (auto-close with structured rationale, auto-escalate with pre-populated investigation pack). Investigators focus exclusively on the genuinely ambiguous 20%. Every investigator decision feeds back to the model as structured training signal, creating a data flywheel that improves the model's accuracy quarter over quarter. The first-line roles have been redesigned: the "Level 1 investigator" role no longer exists, replaced by "system supervisor" and "complex investigation specialist."

In insurance: the claims workflow routes straightforward claims (clear liability, standard damage, within policy limits) through an AI-driven straight-through process that settles without human intervention. Complex claims (disputed liability, large loss, potential fraud) are routed to specialist adjusters with an AI-generated investigation pack. The combined ratio improves quarter over quarter because the straight-through process reduces handling costs and the specialist adjusters spend their time on the cases that actually require judgment.

In healthcare: the diagnostic pathway for a specific condition (say, diabetic retinopathy screening) is AI-led. The AI system reads the retinal images, classifies them, and routes normal results directly to the patient's GP with a structured report. Only abnormal or uncertain results are reviewed by an ophthalmologist. The clinical safety case demonstrates that the AI-led pathway produces diagnostic accuracy equivalent to or better than the fully human pathway, and the throughput is an order of magnitude higher.
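The routing pattern in the banking example can be sketched as a confidence-threshold triage. This is a hypothetical illustration: the thresholds, field names, and the `triage` function are assumptions for this post, not the design of any real monitoring system.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Level 3 triage pattern: the model disposes of
# routine alerts end-to-end and routes only the ambiguous band to humans.
# Thresholds and field names are illustrative assumptions.

CLOSE_BELOW = 0.05     # model is confident the alert is benign
ESCALATE_ABOVE = 0.90  # model is confident the alert is suspicious

@dataclass
class Disposition:
    action: str        # "auto_close" | "auto_escalate" | "human_review"
    rationale: str     # structured rationale retained for the audit trail

def triage(alert_score: float) -> Disposition:
    if alert_score < CLOSE_BELOW:
        return Disposition("auto_close",
                           f"score {alert_score:.2f} below close threshold")
    if alert_score > ESCALATE_ABOVE:
        return Disposition("auto_escalate",
                           f"score {alert_score:.2f} above escalate threshold")
    return Disposition("human_review",
                       f"score {alert_score:.2f} in the ambiguous band")
```

Widening the ambiguous band shifts work back to humans; narrowing it (as the model improves on feedback) is what moves the 80/20 split over time.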

What makes Level 3 structurally different. The distinguishing feature of Level 3 is that the operating model has changed, not just the technology. The roles are different, the decision rights are different, the data layer supports a feedback loop, and the governance framework accommodates continuous model improvement. This is the target operating model redesign that AI enablement is designed to produce.

Why the transition from Level 2 to Level 3 is the hardest. Three barriers make this transition the most difficult in the maturity model:

1. Workflow redesign requires operating model authority. Workflows cannot be redesigned from within the technology team: redesign changes roles, decision rights, and accountability structures. It requires COO-level sponsorship and, in most firms, board-level approval for the role changes. The technology team can build Level 2 tools; only the operating model owner can build Level 3 workflows.

2. The data layer must support feedback loops. Level 2 can operate on a traditional data infrastructure (batch-processed, report-oriented). Level 3 requires the action-data layer: a real-time, feedback-enabled data architecture that captures every human decision at every model interaction point and feeds it back to the training pipeline. Building this data layer is the most expensive and unglamorous part of the Level 2 to Level 3 transition.

3. The governance framework must accommodate continuous improvement. At Level 2, models are validated once and monitored. At Level 3, models are continuously retrained on production feedback, and the governance framework must validate each iteration without creating a bottleneck that stops the flywheel. This requires governance that accelerates deployment, not governance that gates it.
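The action-data layer in point 2 can be illustrated with a minimal decision-capture sketch: each human decision at a model interaction point becomes a structured record for the training pipeline. All field names and the `capture_decision` helper are hypothetical, chosen to illustrate the idea rather than any particular schema.

```python
import time

# Illustrative sketch of the "action-data layer": every human decision at a
# model interaction point is captured as a structured record and fed back to
# the training pipeline. Field names are assumptions for illustration.

def capture_decision(alert_id: str, model_score: float,
                     model_action: str, human_action: str,
                     rationale: str) -> dict:
    return {
        "alert_id": alert_id,
        "model_score": model_score,
        "model_action": model_action,
        "human_action": human_action,
        "override": model_action != human_action,  # disagreement is the richest signal
        "rationale": rationale,
        "captured_at": time.time(),
        # In practice this record would flow to a streaming pipeline,
        # not be returned to the caller.
    }
```

The point of the `override` flag is that disagreements between model and human are precisely the cases the next retraining iteration needs most.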

The World Economic Forum's AI Readiness framework and Gartner's AI Maturity Model both describe a similar transition barrier, though with different terminology. The structural insight is the same: the hardest step is not deploying AI, but redesigning the operating model to be AI-native.

Level 4: AI-Native

Definition. The operating model is designed from first principles around AI as the primary processing capability. Humans participate as supervisors, exception handlers, and strategic decision-makers. The data flywheel operates continuously. The organisation's competitive advantage is structural, embedded in the operating model rather than in any individual AI model.

What it looks like in practice.

In banking: a digital-native lender where the entire credit decisioning pipeline (application processing, identity verification, credit assessment, pricing, documentation generation, portfolio monitoring) is AI-driven. Human credit officers handle only cases outside the model's decision boundary, and their decisions feed back to expand the boundary over time.

In insurance: an MGA where underwriting, pricing, and claims processes are AI-native. Models accept or decline risks, adjust pricing in real time, and process straightforward claims end-to-end. Human specialists focus on portfolio strategy, complex cases, and model governance.

In healthcare: a digital health provider where triage, diagnostic, and treatment recommendation pathways are AI-led for defined clinical scenarios, with clinicians providing oversight and handling the judgment calls that require empathy and clinical intuition.

Who is at Level 4 today? Very few regulated firms are at Level 4 across their entire operations. Some neobanks, digital insurers, and digital health providers have built AI-native operating models for specific value streams. Most incumbent regulated firms are at Level 2 with aspirations toward Level 3.

The path from Level 3 to Level 4 is less about overcoming barriers and more about sustained execution. Once the operating model, data layer, governance framework, and talent model are aligned at Level 3, the progression to Level 4 is a matter of expanding the scope (more value streams, more decision types, wider decision boundaries) and deepening the flywheel (faster feedback loops, richer training signal, more sophisticated models).

Where to start

If you recognise your organisation in the descriptions above, three practical actions will help clarify where you stand and what to do next:

1. Take the AI Enablement Maturity Diagnostic. The diagnostic produces a per-pillar score and an overall maturity level assessment. It also identifies the binding constraint: the pillar that is holding you back from the next level.

2. Focus on the binding constraint, not the overall score. If your diagnostic shows a strong production function but a weak data layer, the data layer is the binding constraint. If it shows strong data but weak governance, governance is the binding constraint. The cheapest way to advance is to fix the weakest pillar, not to invest more in the strongest.

3. Read the sector-specific guidance. The AI Enablement for Banking and AI Enablement for Healthcare service pages describe the maturity progression with sector-specific examples and regulatory considerations. The AI enablement service overview describes the five-pillar framework that supports progression through the levels.
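The binding-constraint logic in point 2 above reduces to finding the lowest-scoring pillar. A minimal sketch, using the pillar names from this post and purely illustrative scores:

```python
# Hypothetical sketch of the binding-constraint idea: the pillar with the
# lowest score is the one holding you back. Scores are illustrative.

def binding_constraint(pillar_scores: dict) -> str:
    """Return the lowest-scoring pillar."""
    return min(pillar_scores, key=pillar_scores.get)

scores = {
    "production function": 3.5,
    "data layer": 1.8,
    "decision systems": 2.9,
    "operating model": 2.4,
    "governance": 3.1,
}
# binding_constraint(scores) -> "data layer"
```

The point is not the arithmetic but the discipline: investment decisions follow the minimum, not the average.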

For leaders who want the detailed mechanics, the production function shift essay explains the economic logic behind the maturity progression, and the data flywheel essay explains the compounding mechanism that makes Level 3 and Level 4 structurally different from Levels 1 and 2.

Score this against your own organisation

Take the AI Enablement Maturity Diagnostic — 25 questions across the five pillars (production function, data layer, decision systems, operating model, governance). Per-pillar breakdown and prioritised next steps in 5 minutes.

Take the diagnostic

Ready to do the structural work?

Our AI Enablement engagements are built around the five pillars in this article. We start with a focused diagnostic, then redesign one priority workflow end-to-end as proof — including the data layer, decision rights, and governance machinery.

Explore the AI Enablement service