
Fixing Trade Surveillance: Why 95% False Positive Rates Are a Structural Problem, Not a Tuning Problem

March 21, 2026

If you run a compliance or surveillance function on the sell side, you know the number. Somewhere between 90% and 95% of the alerts generated by your trade surveillance system are false positives. Your surveillance analysts spend their days closing alerts that should never have been generated, writing "no further action" on case after case, while the genuinely suspicious patterns hide in the noise.

This is not a calibration problem. It is not a threshold tuning problem. It is not a "we need better rules" problem. It is a structural problem with the way trade surveillance systems are designed, and the fix requires structural redesign, not incremental improvement.

This post explains why the false positive rate is structural, describes the AI-enabled redesign that addresses it, and covers the regulatory framework (MAR, MiFID II, FINRA) that both constrains and enables the approach. It builds on our work across capital markets AI enablement engagements and extends the principles described in the transaction monitoring and SAR filing post to the trade surveillance domain.

Why the false positive rate is structural

Trade surveillance systems operate on deterministic rules. A typical system has 50-200 rules, each designed to detect a specific market abuse pattern: spoofing, layering, wash trading, insider dealing, and other manipulative practices. Each rule has parameters (cancellation timing, order size thresholds, repetition counts) that define the detection boundary.

The parameters must be set conservatively. ESMA's MAR guidelines and the FCA's market abuse guidance both require firms to have effective arrangements to detect, prevent, and report market abuse. "Effective" is interpreted as "do not miss anything." The consequence of a false negative (missing actual market abuse) is regulatory sanction and criminal prosecution. The consequence of a false positive is analyst time.

Given this asymmetry, every compliance function sets thresholds conservatively. The result is mathematically predictable: a system designed to catch every potentially suspicious pattern will flag an enormous number of benign patterns that happen to match the rule's parameters.

Tuning the thresholds does not fix this. Raising the cancellation time threshold reduces false positives but increases false negative risk. The compliance function will not accept that trade-off, so the threshold stays low.

Adding more rules does not fix this. Each new rule generates its own false positives. The total alert volume increases without reducing the false positive rate of existing rules.

Better data does not fix this by itself. Enriching the alert with context helps analysts close false positives faster, but does not reduce the number of false positives generated.

The structural problem is that deterministic rules cannot distinguish between a pattern that is suspicious and one that merely looks suspicious. The same cancellation pattern that indicates spoofing also occurs in legitimate market-making, algorithmic execution, and hedging. Rules detect patterns, not intent. In a market where algorithmic trading accounts for 60-80% of volume, the number of benign patterns matching surveillance rules is enormous.

The structural redesign

The AI-enabled approach to trade surveillance is structurally different from rule-based surveillance. It does not replace rules with models; it uses models to process the alerts that rules generate, and it redesigns the surveillance workflow so that human analysts focus on the cases that actually require human judgment.

The redesign has three components:

Component 1: AI-powered alert triage

Every alert generated by the rules engine passes through an ML model that scores it from "almost certainly benign" to "almost certainly suspicious." The model is trained on the firm's historical alert data: thousands of disposed alerts from the past 3-5 years, where rule parameters are the features and analyst dispositions are the labels.

The model learns patterns specific to the firm, its asset classes, its clients' strategies, and its primary venues. An off-the-shelf surveillance model cannot learn these patterns because it lacks access to the firm's disposition history.
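The training setup described above can be sketched in a few lines: rule parameters become the feature vector, the analyst's disposition becomes the label. Every field name below (cancel_rate, avg_ms_to_cancel, and so on) is an illustrative assumption, not the schema of any real surveillance system.

```python
def alert_to_example(alert: dict) -> tuple[list[float], int]:
    """Map a historical disposed alert to (feature_vector, label).

    Features come from the rule parameters that fired the alert;
    the label comes from the analyst's disposition (1 = escalated
    as suspicious, 0 = closed as benign / no further action).
    All field names are illustrative assumptions.
    """
    features = [
        alert["cancel_rate"],            # fraction of orders cancelled
        alert["avg_ms_to_cancel"],       # mean time-to-cancel in milliseconds
        alert["order_size_vs_adv"],      # order size relative to average daily volume
        alert["repetitions_in_window"],  # how often the pattern repeated
    ]
    label = 1 if alert["disposition"] == "escalated" else 0
    return features, label

example_alert = {
    "cancel_rate": 0.92,
    "avg_ms_to_cancel": 180.0,
    "order_size_vs_adv": 0.03,
    "repetitions_in_window": 7,
    "disposition": "no_further_action",
}
X, y = alert_to_example(example_alert)
```

In practice the feature set is far richer (venue, instrument, time-of-day, market regime), but the shape of the problem is exactly this: supervised classification over the firm's own disposition history.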

The scored alerts are routed into three tiers:

Tier 1: Auto-close with structured rationale (60-70% of alerts). Alerts that the model scores as overwhelmingly likely to be benign, and where the structured rationale (generated by the model) would satisfy a regulator's review. The rationale includes: the alert parameters, the trader's historical behaviour for this pattern, the market context at the time, and the model's scoring explanation. These alerts are closed without human review but with a full audit trail.

Tier 2: Streamlined review (20-25% of alerts). Alerts that the model scores as likely benign but with enough ambiguity that human review is warranted. The analyst receives the alert with the model's pre-populated analysis, reviews the structured evidence, and makes the disposition decision in a streamlined interface. The review takes 2-5 minutes rather than the 15-30 minutes typical of a cold review.

Tier 3: Full investigation (5-10% of alerts). Alerts that the model scores as genuinely suspicious. The analyst receives the alert with the model's full investigation pack: the alert parameters, the trader's behavioural profile, the market context, related alerts across instruments and time periods, and the model's explanation of why this alert is different from the benign patterns. The analyst conducts a thorough investigation and makes the SAR/STR filing decision if warranted.
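The three-tier routing reduces to a threshold function over the model's suspicion score. The cut-offs below are illustrative assumptions; in practice each firm's compliance function sets its own and revisits them as the model matures.

```python
def route_alert(suspicion_score: float,
                auto_close_below: float = 0.02,
                investigate_above: float = 0.60) -> str:
    """Route a scored alert into one of the three tiers.

    suspicion_score is the model's estimated probability that the
    alert is genuinely suspicious. The default thresholds are
    illustrative assumptions, not recommended values.
    """
    if suspicion_score < auto_close_below:
        return "tier1_auto_close"        # closed with structured rationale
    if suspicion_score < investigate_above:
        return "tier2_streamlined"       # analyst review, pre-populated analysis
    return "tier3_full_investigation"    # full investigation pack

tiers = [route_alert(s) for s in (0.005, 0.30, 0.85)]
```

The important design choice is asymmetry: the auto-close threshold sits far below the investigation threshold, so the model must be near-certain an alert is benign before it bypasses a human, while only moderate suspicion is needed to keep one in front of an analyst.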

Component 2: The data flywheel

The critical structural feature is the feedback loop. Every analyst disposition (Tier 2 and Tier 3) feeds back to the model as new training signal. The model is not static; it learns continuously from the firm's own surveillance decisions.

This creates the data flywheel: better triage produces more accurate tier assignments, which produces higher-quality disposition decisions, which produces better training signal, which improves triage accuracy. The flywheel spins faster as the volume of disposed alerts grows.

The flywheel also captures something that rules cannot: the analyst's reasoning. When an analyst closes a Tier 2 alert, the structured disposition form captures not just the decision but the reasoning (why this pattern is benign for this trader in this context). That reasoning, captured in structured fields rather than free-text notes, is the training signal that teaches the model to distinguish intent from pattern.

Over 12-24 months, the Tier 1 auto-close percentage increases (from 60% to 75-80%) as the model learns more patterns that are reliably benign. The false positive rate for Tier 3 alerts decreases (from 40-50% to 15-25%) as the model becomes better at identifying genuinely suspicious patterns. The analyst team's time shifts from routine closures to complex investigations, which improves both the quality of surveillance and the job satisfaction of the analysts.
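The feedback loop can be sketched as a buffer of structured dispositions that triggers retraining once enough new signal accumulates. The record fields, reason codes, and batch size here are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Disposition:
    """One analyst decision, captured in structured fields.
    Field names and reason codes are illustrative assumptions."""
    alert_id: str
    decision: str     # "benign" or "escalated"
    reason_code: str  # structured reason, e.g. "market_making_strategy"
    trader_id: str

@dataclass
class Flywheel:
    retrain_every: int = 500              # illustrative batch size
    buffer: list = field(default_factory=list)
    retrain_count: int = 0

    def record(self, d: Disposition) -> bool:
        """Add a disposition; return True when a retrain is triggered."""
        self.buffer.append(d)
        if len(self.buffer) >= self.retrain_every:
            self.retrain_count += 1       # in production: kick off a training job
            self.buffer.clear()
            return True
        return False

fw = Flywheel(retrain_every=3)
triggered = [fw.record(Disposition(str(i), "benign", "market_making_strategy", "T1"))
             for i in range(4)]
```

The structured reason_code is the point: free-text notes are hard to learn from, whereas a controlled vocabulary of closure reasons gives the model a clean label for why a pattern was benign, not just that it was.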

Component 3: Behavioural profiling

The third structural component goes beyond alert-level analysis to trader-level behavioural profiling. The system builds a behavioural profile for every trader and every client account, capturing their normal trading patterns across instruments, venues, time periods, and market conditions.

Alerts are assessed not just against absolute thresholds but against the trader's individual baseline. A cancellation rate of 80% might be suspicious for a client who normally has a cancellation rate of 20%, but entirely normal for an algorithmic market-maker whose strategy inherently involves high cancellation rates. Rule-based systems cannot make this distinction because they apply the same threshold to every trader. The behavioural profiling model makes that distinction automatically.
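The simplest version of baseline-relative assessment is a z-score against the trader's own history. The standard deviations below are invented for illustration; the cancellation rates are the ones from the example above.

```python
def baseline_deviation(observed: float, baseline_mean: float,
                       baseline_std: float) -> float:
    """How many standard deviations the observed value sits from the
    trader's own baseline. A fixed rule threshold ignores this entirely."""
    return (observed - baseline_mean) / baseline_std

# Same 80% cancellation rate, two different traders
# (baseline standard deviations are assumptions):
client_z = baseline_deviation(0.80, baseline_mean=0.20, baseline_std=0.05)
mm_z = baseline_deviation(0.80, baseline_mean=0.85, baseline_std=0.05)
```

The occasional-cancel client lands many standard deviations above their norm, while the market-maker sits comfortably within theirs, even though both hit the same absolute threshold. Production profiling is richer than a single z-score (multivariate, regime-aware, seasonal), but the principle is the same.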

This approach aligns with FINRA's surveillance guidance, which emphasises the importance of understanding customer trading behaviour in context rather than applying uniform thresholds.

The regulatory framework

MAR (Market Abuse Regulation)

Article 16 of MAR requires firms that operate a trading venue or execute transactions to have effective arrangements to detect and report suspicious orders and transactions. ESMA's technical standards specify that these arrangements must include automated surveillance systems.

MAR does not prescribe the technology. It requires effectiveness. An AI-enabled system that produces a 20% false positive rate and catches 99% of genuinely suspicious patterns is more effective than a rule-based system that produces a 95% false positive rate and catches 97% of suspicious patterns. The regulatory argument for AI-enabled surveillance is that it is more effective, not that it is cheaper.

The compliance challenge is demonstrating that the auto-closed alerts (Tier 1) are genuinely benign. The structured rationale generated by the model must be sufficiently detailed that a regulator reviewing the auto-closure can understand why the alert was closed and agree that the closure was appropriate. This is an explainability requirement, not a transparency requirement: the regulator does not need to understand the model's internal weights; they need to understand the reasoning for each decision.
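One way to make the explainability requirement concrete is to treat the auto-close rationale as a structured, serialisable record containing the four elements listed above. The schema below is a hypothetical sketch, not a regulatory format.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class AutoCloseRationale:
    """Audit-trail record for a Tier 1 auto-closure.
    Every field name is an illustrative assumption."""
    alert_id: str
    alert_parameters: dict   # what the rule saw
    trader_history: str      # the trader's record for this pattern
    market_context: str      # conditions at the time of the alert
    model_explanation: str   # top features driving the benign score
    benign_confidence: float

rationale = AutoCloseRationale(
    alert_id="A-2026-001",
    alert_parameters={"rule": "spoofing_v3", "cancel_rate": 0.91},
    trader_history="412 similar alerts for this trader, all closed benign",
    market_context="index rebalance day; venue-wide cancel rates elevated",
    model_explanation="cancel timing and order sizes consistent with market-making",
    benign_confidence=0.993,
)
record = asdict(rationale)  # serialisable for the audit trail
```

A record like this is what a regulator samples: not model weights, but a per-decision account of why this alert, for this trader, in this market, was benign.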

MiFID II organisational requirements

MiFID II Article 16 and the ESMA guidelines on MiFID II organisational requirements require firms to maintain adequate and effective surveillance arrangements. The SYSC (Systems and Controls) requirements in the FCA Handbook add specificity: the surveillance function must be adequately resourced, the compliance function must have adequate authority, and the governance framework must ensure that surveillance findings are escalated appropriately.

The AI-enabled approach strengthens the MiFID II posture because: surveillance analysts spend more time on genuinely suspicious cases (better resource allocation), the model's behavioural profiling produces richer context for each investigation (better effectiveness), and the structured disposition data provides a continuous quality metric for the surveillance function (better governance).

FINRA (US)

FINRA's regulatory oversight reports have consistently highlighted surveillance effectiveness as a priority. FINRA Rule 3110 requires broker-dealers to establish supervisory systems that are reasonably designed to achieve compliance with securities laws. FINRA has signalled openness to AI-enhanced surveillance, provided the firm can demonstrate that the AI system is properly validated, monitored, and governed.

For firms operating across jurisdictions (a common pattern for sell-side firms with US, UK, and EU presence), the AI-enabled approach has the advantage of a single surveillance model that can be adapted to jurisdiction-specific requirements through configuration rather than redesign.

The governance requirements

The three lines of defence framework applies to AI-enabled trade surveillance with specific structural requirements:

First line: the surveillance team. Analysts operate as system supervisors: reviewing Tier 2 and Tier 3 alerts, making disposition decisions, capturing structured feedback, and monitoring performance dashboards daily. The decision rights matrix defines which alerts can be auto-closed, which require analyst review, and which must be escalated.

Second line: the compliance function. Compliance validates triage accuracy through periodic sample reviews of auto-closed alerts, sets auto-close thresholds, approves model updates, and owns the regulatory dialogue.

Third line: audit. Internal audit assesses whether the governance framework operates as designed: sample reviews happening, overrides captured, model performance monitored, regulatory reports accurate.

The AI Enablement Maturity Diagnostic includes a surveillance-specific assessment for capital markets firms that scores the current state across model effectiveness, workflow integration, data layer maturity, governance readiness, and analyst capability.

The implementation sequence

The practical path to AI-enabled trade surveillance follows a defined sequence:

Phase 1 (months 1-3): Historical analysis. Extract 3-5 years of disposed alert data. Clean and structure the data (disposition decisions, alert parameters, market context). Train the initial triage model. Validate model performance against a holdout sample of historical alerts.

Phase 2 (months 3-6): Shadow mode. Deploy the model in shadow mode alongside the existing surveillance process. The model scores every alert, but analysts continue to review all alerts as before. Compare the model's triage against the analysts' actual dispositions. Measure the accuracy of the model's Tier 1 (auto-close) recommendations.

Phase 3 (months 6-9): Graduated deployment. Begin auto-closing Tier 1 alerts for the highest-confidence subset (the alerts where the model's confidence is above 95%). Analysts continue to review Tier 2 and Tier 3 alerts. Compliance conducts sample reviews of auto-closed alerts. Measure the time savings and the quality metrics.

Phase 4 (months 9-18): Full deployment. Expand the auto-close population as confidence builds. Activate the data flywheel (analyst dispositions feed back to the model). Implement behavioural profiling. Measure the quarter-over-quarter improvement in false positive rate and investigation quality.

For the detailed mechanics, Module 5 of the AI Governance and Model Risk course covers the surveillance-specific governance framework, and the AI Enablement for Capital Markets service page describes the full engagement approach.

Ready to do the structural work?

Our AI Enablement engagements are built around the structural redesign described in this article. We start with a focused diagnostic, then redesign one priority workflow end-to-end as proof, including the data layer, decision rights, and governance machinery.

Explore the AI Enablement service
