Module 6 of 7

Model Risk Operations — Monitoring, Drift, Overrides, Incidents

The day-to-day machinery of running a deployed AI system in good standing — monitoring, drift detection, override review, and incident response.


The work after deployment

Most AI governance discussions focus on what happens before a model goes live: validation, approval, controls. The harder reality is what happens after: the day-to-day operational machinery of keeping the model in good standing once it is live and making real decisions.

This is where most enterprise governance programmes are weakest. The pre-deployment work is well-defined and well-resourced. The post-deployment work is often ad-hoc, under-resourced, and quietly fails. This module is about how to do the post-deployment work properly.

Monitoring SLOs for models

The starting point is the same one we used for data in the Data Foundations course: service-level objectives. Models, like data, need explicit, measurable commitments to their behaviour, with alerts when those commitments are breached.

For AI/ML models, the SLOs that matter most are:

  • Accuracy. "The model's accuracy on a holdout sample will not fall below X%, measured weekly."
  • Drift. "The distribution of input feature Y will not shift by more than Z standard deviations in a 7-day window without triggering an alert."
  • Override rate. "The override rate on the model's decisions will stay between A% and B% in any rolling 14-day window."
  • Freshness. "Inputs feeding the model will be no more than X minutes stale, 99% of the time."
  • Latency. "P95 inference latency will be under X ms."
  • Coverage. "The model will produce a confident decision on at least Y% of incoming cases; the rest will escalate."
  • Outcome quality. "Customer complaints related to model decisions will not exceed X per 1,000 decisions."

These SLOs should be defined per model, calibrated to the model's role, and committed in writing. They should be visible to first line, second line, and (for high-risk models) audit. They should drive monitoring, alerts, and incident response.
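SLOs of this shape become enforceable when the breach test is machine-checkable rather than buried in a policy document. A minimal sketch in Python; the `ModelSLO` class, the metric names, and the specific bounds are all hypothetical illustrations, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class ModelSLO:
    """A written, per-model commitment with an explicit breach test."""
    name: str
    lower: float   # inclusive lower bound; use float("-inf") if none
    upper: float   # inclusive upper bound; use float("inf") if none
    window: str    # measurement window, e.g. "weekly holdout", "rolling 14d"

    def breached(self, observed: float) -> bool:
        return not (self.lower <= observed <= self.upper)

# Hypothetical SLOs for one model, mirroring the list above.
SLOS = [
    ModelSLO("accuracy", lower=0.92, upper=1.0, window="weekly holdout"),
    ModelSLO("override_rate", lower=0.02, upper=0.10, window="rolling 14d"),
    ModelSLO("p95_latency_ms", lower=0.0, upper=250.0, window="5m"),
]

def evaluate(observations: dict) -> list:
    """Return the names of breached SLOs for a set of observed metrics."""
    return [s.name for s in SLOS if s.breached(observations[s.name])]
```

Committing the bounds in code (or version-controlled config) gives first and second line one shared, auditable definition of "in good standing".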

Drift — the silent killer

Drift is the most common operational issue for deployed models, and the one most enterprises are not actively monitoring for. There are two kinds:

Data drift is when the distribution of the model's inputs changes over time, even if the relationship between inputs and outputs is stable. Example: a customer base that gets younger over time, shifting the input feature distribution. The model may still be doing the right thing, but it's now operating outside the input range it was trained on.

Concept drift is when the relationship between the inputs and the correct answer changes. Example: a fraud detection model trained before a new fraud pattern emerged. The inputs look familiar, but the right answer for those inputs has changed.

Both kinds hurt performance, but each calls for a different response:

  • Data drift can sometimes be addressed by recalibrating thresholds, expanding the training distribution, or narrowing the model's deployment scope.
  • Concept drift usually requires retraining on more recent data and may require investigating what changed in the underlying domain.

Detection requires monitoring both the input distributions (for data drift) and the model's predictive accuracy on a labelled sample (for concept drift). The labelled sample is key — without ground truth labels, you cannot detect concept drift directly. For high-stakes models, building the labelling pipeline is part of the deployment, not an afterthought.
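For the input-distribution half of this monitoring, one common choice is the Population Stability Index (PSI) between a training-time baseline and a recent window. A self-contained sketch; the 10-bin layout and the usual 0.1 / 0.25 rule-of-thumb thresholds are conventions, not requirements:

```python
import math

def psi(baseline: list, recent: list, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a recent one.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift."""
    lo, hi = min(baseline), max(baseline)

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range values into the edge bins.
            i = min(int((x - lo) / (hi - lo) * bins), bins - 1) if hi > lo else 0
            counts[max(i, 0)] += 1
        n = len(sample)
        # Small floor avoids log(0) when a bin is empty.
        return [max(c / n, 1e-6) for c in counts]

    b, r = fractions(baseline), fractions(recent)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))
```

Note this only covers data drift; concept drift still requires the labelled sample and an accuracy metric, exactly as described above.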

Override patterns as an early signal

The override rate is one of the most useful operational signals you have. Sudden changes in override rate are almost always meaningful, even before they show up in accuracy metrics.

Things to watch for:

  • Rising override rate. Usually means the model is performing worse than it used to. Could be drift, could be a change in the data, could be a change in operator behaviour. Investigate.
  • Falling override rate. Looks good but isn't necessarily. Could mean operators are getting tired of reviewing and are rubber-stamping. Sample some recent overrides to check whether the human review is still meaningful.
  • Skewed overrides by category. Operators overriding certain categories of cases more often than others. May indicate the model is biased on those categories, or that operators have additional context the model doesn't.
  • Operator-level differences. Different operators showing very different override rates on similar cases. May indicate inconsistent training or operator-specific bias.

A monthly review of override patterns by the model risk function should be a standing item for any material model. The review should be substantive: what changed, why, what should we do about it?
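The monthly review can start from a simple tabulation of override rates by category (or by operator, with the same code). A minimal sketch, where the grouping key and the "twice the overall rate" flagging threshold are illustrative assumptions:

```python
from collections import defaultdict

def override_rates(decisions):
    """decisions: iterable of (category, was_overridden) pairs.
    Returns {category: override_rate}."""
    total = defaultdict(int)
    overridden = defaultdict(int)
    for category, was_overridden in decisions:
        total[category] += 1
        overridden[category] += int(was_overridden)
    return {c: overridden[c] / total[c] for c in total}

def skewed_categories(rates: dict, overall_rate: float, ratio: float = 2.0) -> list:
    """Flag categories whose override rate is at least `ratio` x the overall rate."""
    return sorted(c for c, r in rates.items()
                  if overall_rate > 0 and r >= ratio * overall_rate)
```

The same grouping run per operator surfaces the operator-level differences in the list above; the flagged items are where the substantive "what changed, why" discussion should start.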

Incident response for models

When something goes wrong — a breached SLO, a sudden override spike, an anomalous decision, a customer complaint, an audit finding — the response needs to be operational, not bureaucratic.

A working incident response process:

  1. Detection. SLO breach, anomaly alert, escalation, complaint. Anything that suggests the model is behaving unexpectedly.
  2. Triage. Is this a model issue or an upstream data issue? What's the severity? What's the customer impact?
  3. Containment. If severity is high, can you pause the model or fall back to a manual workflow while you investigate? Material models should always have a fallback.
  4. Investigation. Walk the lineage, examine the inputs, sample affected decisions, check the model's recent behaviour. The decision log is your friend here.
  5. Remediation. Fix the immediate issue (rollback, recalibration, retraining, scope change).
  6. Notification. Inform second line, audit, and (where appropriate) the regulator. Document what was communicated and when.
  7. Post-mortem. Documented review of what happened, why, and what should change to prevent recurrence. Shared with the second line and audit.

The hardest part of model incident response is usually step 3: containment. Many enterprises deploy models without a fallback, which means containment requires shutting down the workflow entirely. Building the fallback into the design adds engineering work upfront but is essential for material deployments.
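The containment decision is easier under pressure if the routing is agreed in advance. A toy sketch of that routing; the severity levels, role names, and actions are hypothetical placeholders for whatever your incident process defines:

```python
def triage(severity: str, has_fallback: bool):
    """Route a detected model incident: returns (containment_action, who_is_paged).
    severity is one of "low", "medium", "high" (hypothetical levels)."""
    if severity == "high":
        # Material incidents page second line alongside the on-call engineer.
        paged = ["on_call_engineer", "second_line_risk_officer"]
        # Fall back to the manual workflow if one exists; otherwise the only
        # containment option left is pausing the workflow entirely.
        action = "fallback_to_manual" if has_fallback else "pause_workflow"
    else:
        paged = ["on_call_engineer"]
        action = "continue_with_monitoring"
    return action, paged
```

The `pause_workflow` branch is the cost of deploying without a fallback made explicit: for a high-severity incident there is nothing gentler available.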

Working with the second line

Second line should not learn about model incidents from the post-mortem. They should be involved during the incident, in real time. The pattern that works:

  • Material model incidents page the assigned second-line risk officer alongside the on-call engineer.
  • The second-line risk officer participates in the triage call.
  • They help decide on containment and notification.
  • They sign off on the post-mortem and corrective actions.
  • They report the incident pattern to the model risk committee.

This is uncomfortable for second-line functions that are not used to operational pace. It is essential for material AI deployments, because the timeline of an incident does not wait for committee meetings.

Regulatory notification

When does an AI incident need to be notified to a regulator? The honest answer is: it depends on the regulation and the materiality. Some triggers to discuss with your compliance and legal teams:

  • EU AI Act: Serious incidents in high-risk AI systems must be reported to authorities (Article 73).
  • DORA: Major ICT incidents have explicit reporting timelines and templates.
  • PRA SS1/23: Material model failures should be discussed with supervisors as part of normal supervisory dialogue.
  • FCA: Customer harm from automated decisions may trigger reporting under SUP 15 or Consumer Duty.

The right time to define your notification thresholds and triggers is before an incident, not during. If you have to figure out whether to call the regulator while you're in the middle of an incident, you have already failed to plan.
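One way to honour that is to write the agreed triggers down as executable checks, reviewed with compliance and legal. The trigger logic below is deliberately simplified and illustrative, not a statement of what any regulation actually requires:

```python
# Hypothetical, pre-agreed notification triggers, written down before any
# incident. Each rule is a predicate over a simple incident record; the real
# criteria must come from compliance and legal, not from engineering.
NOTIFICATION_TRIGGERS = {
    "eu_ai_act_art_73": lambda i: i["high_risk_system"] and i["serious_incident"],
    "dora_major_ict":   lambda i: i["ict_incident"] and i["severity"] == "major",
    "fca_sup_15":       lambda i: i["customer_harm"],
}

def regulators_to_notify(incident: dict) -> list:
    """Return the regimes whose pre-agreed trigger fires for this incident."""
    return sorted(k for k, rule in NOTIFICATION_TRIGGERS.items() if rule(incident))
```

The value is not the code itself but the discipline: the thresholds exist, are versioned, and can be applied mechanically during an incident instead of debated during one.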

What's next

In Module 7 we'll close the course with the governance roadmap — how to mature your AI governance over 12-24 months, sequencing the work so each phase delivers a defensible posture and the organisation can absorb the change.

Module Quiz

5 questions — Pass mark: 60%

Q1. What is model drift?

Q2. What is concept drift vs data drift?

Q3. What is the right cadence for reviewing override patterns?

Q4. What should trigger a model incident?

Q5. What is the right response to a model incident?