What percentage of AML alerts are false positives?

Quick answer

Between 90% and 95% of AML alerts at most banks are false positives. Some institutions running older, rules-based monitoring systems report rates as high as 98%. The rate depends on threshold calibration and system age, but 95% is the widely cited industry benchmark.

What percentage of AML alerts are false positives?

Between 90% and 95% of AML transaction monitoring alerts at most banks are false positives. The number has been consistently reported at this level across industry surveys for over a decade. Some institutions, particularly those running legacy rules-based systems that haven't been recalibrated in years, report rates as high as 98%.
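
To make the figure concrete: the "false positive rate" quoted here is the share of alerts that close without a SAR or escalation. A minimal sketch of that calculation follows; the function name and the monthly alert counts are illustrative assumptions, not figures from any survey.

```python
def false_positive_share(total_alerts: int, productive_alerts: int) -> float:
    """Share of alerts that did not lead to a SAR or escalation."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    if not 0 <= productive_alerts <= total_alerts:
        raise ValueError("productive_alerts must be between 0 and total_alerts")
    return (total_alerts - productive_alerts) / total_alerts


# Hypothetical month: 10,000 alerts, 500 of which led to a SAR.
rate = false_positive_share(10_000, 500)
print(f"{rate:.0%}")  # 95%
```

Institutions differ on what counts as "productive" (SAR filed, case escalated, account restricted), so the definition should be fixed before comparing rates across teams or vendors.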

This isn't a niche problem. It affects every financial institution running a transaction monitoring program, with direct costs in analyst time, SAR filing decisions, and regulatory exposure. FinCEN's SAR Statistics show that US institutions filed over 3.6 million SARs in 2022. If 95% of the upstream alerts were false positives, that's an enormous compliance operation producing limited useful output for law enforcement.

The LexisNexis Risk Solutions True Cost of Financial Crime study estimates total AML compliance costs in the US at over $30 billion annually, with alert review consuming a disproportionate share of that budget.

Why is the false positive rate so high in AML transaction monitoring?

Rules-based transaction monitoring was designed for a different era. Thresholds were set to flag anything that might be suspicious, erring toward over-reporting because the cost of missing a real case was considered higher than the cost of reviewing too many alerts. Regulators reinforced this through examination guidance that treated high SAR volumes as evidence of a thorough program.

The result is monitoring systems that flag legitimate business activity constantly. A $10,001 cash deposit. A wire transfer to a country on an internal risk list. Three transactions in a week that sum to just under the reporting threshold. Each of these can indicate structuring or layering, but they are also how ordinary businesses in cash-intensive industries operate.
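
A rough sketch of the third pattern shows why such rules over-alert. The rule below is a simplified illustration, not any vendor's logic: the `CashDeposit` type, the 90% floor, the customer IDs, and the amounts are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

REPORTING_THRESHOLD = 10_000  # USD
WINDOW_DAYS = 7


@dataclass(frozen=True)
class CashDeposit:
    customer_id: str
    day: date
    amount: float


def structuring_alerts(deposits, min_count=3, floor_ratio=0.9):
    """Flag customers with 3+ deposits in a 7-day window totalling just under the threshold."""
    by_customer = {}
    for deposit in deposits:
        by_customer.setdefault(deposit.customer_id, []).append(deposit)

    flagged = set()
    for customer_id, items in by_customer.items():
        items.sort(key=lambda d: d.day)
        for first in items:
            # All of this customer's deposits in the 7 days starting at `first`.
            window = [d for d in items if 0 <= (d.day - first.day).days < WINDOW_DAYS]
            total = sum(d.amount for d in window)
            near_threshold = floor_ratio * REPORTING_THRESHOLD <= total < REPORTING_THRESHOLD
            if len(window) >= min_count and near_threshold:
                flagged.add(customer_id)
                break
    return flagged


# Hypothetical data: a laundromat banking its weekly cash takings, and a retail customer.
deposits = [
    CashDeposit("laundromat-01", date(2024, 3, 4), 3_200),
    CashDeposit("laundromat-01", date(2024, 3, 6), 3_100),
    CashDeposit("laundromat-01", date(2024, 3, 8), 3_300),
    CashDeposit("retail-77", date(2024, 3, 5), 2_000),
]
print(structuring_alerts(deposits))  # {'laundromat-01'}
```

The rule has no idea the laundromat is a cash business with steady takings, so it will fire on the same customer week after week.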

FATF Recommendation 1 on the risk-based approach was meant to shift this. Banks should allocate monitoring intensity based on actual customer risk, not blanket rules. In practice, many institutions layer risk-based programs on top of existing rules-based systems rather than replacing them. That doesn't reduce false positives and sometimes increases them.

The calibration problem is partly organizational. Adjusting thresholds requires sign-off from compliance, risk, and sometimes legal. If a threshold is raised and a SAR-worthy case is later missed, the institution faces liability for the change. So thresholds stay conservative, and false positive rates stay high.

Poor Customer Due Diligence (CDD) data compounds this. If the monitoring system doesn't know a customer is a licensed cash-intensive business, it will alert on normal operating behavior indefinitely.

Why this matters for compliance teams

High false positive rates have four direct consequences.

Analyst capacity is consumed by low-value work. A team of 40 analysts reviewing 1,000 alerts per day spends most of its time confirming that transactions are legitimate. Real suspicious activity gets less attention, not more. We've seen banks where the SAR backlog hit 6,000 cases because the review queue was buried under routine alerts.

SAR quality suffers. When analysts are under volume pressure, SARs get filed quickly and without thorough investigation. Law enforcement receives reports that are technically compliant but operationally useless. The FATF effectiveness evaluation framework explicitly assesses whether SARs are being used productively by authorities. Low-quality filings drag down a jurisdiction's effectiveness score.

Regulatory examination risk increases. Examiners reviewing your transaction monitoring program don't just check whether you filed SARs. They look at whether your tuning methodology is documented, whether you've done lookback reviews, and whether your false positive rate reflects defensible calibration. An undocumented 97% false positive rate is a finding. See what triggers a regulatory exam for context on how examiners frame these reviews.

AML compliance costs stay elevated without producing better outcomes. Every alert a human analyst reviews has a cost. At scale, if each alert costs roughly the same to review, a 95% false positive rate means about 95 cents of every alert-review dollar is spent confirming that legitimate activity is legitimate. That's the number that gets compliance budgets cut and headcount frozen, which then increases the risk of real misses.
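
The capacity and cost points can be worked through with the 40-analyst, 1,000-alert team described above. This is back-of-the-envelope arithmetic only: the $25 cost per alert is a placeholder assumption, not a sourced figure, and the sketch assumes every alert costs the same to review.

```python
def review_workload(analysts, alerts_per_day, false_positive_pct, cost_per_alert):
    """Summarize daily alert-review load and the cost of each productive alert."""
    false_alerts = alerts_per_day * false_positive_pct // 100
    productive_alerts = alerts_per_day - false_alerts
    total_cost = alerts_per_day * cost_per_alert
    return {
        "alerts_per_analyst": alerts_per_day / analysts,
        "false_alerts": false_alerts,
        "productive_alerts": productive_alerts,
        # Total review spend divided by the alerts that actually mattered.
        "cost_per_productive_alert": (
            total_cost / productive_alerts if productive_alerts else float("inf")
        ),
    }


# 40 analysts, 1,000 alerts/day, 95% false positives, hypothetical $25 per review.
summary = review_workload(40, 1_000, 95, 25.0)
print(summary["alerts_per_analyst"])         # 25.0
print(summary["false_alerts"])               # 950
print(summary["productive_alerts"])          # 50
print(summary["cost_per_productive_alert"])  # 500.0
```

Under these assumptions, a $25 review becomes a $500 cost per productive alert, which is the figure a budget owner ends up looking at.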

What can reduce AML false positive rates?

Three approaches have demonstrated results in practice.

Threshold recalibration. Most institutions haven't reviewed their monitoring rules in years. A structured tuning exercise, comparing alert populations against SAR outcomes and adjusting thresholds based on actual risk, can reduce false positive rates by 20-30 percentage points without increasing regulatory risk. The key is documentation: any threshold change needs to be recorded, justified, and defensible to an examiner.
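
As a rough illustration of comparing alert populations against SAR outcomes, the sketch below replays historical alerts for a single amount-based rule at several candidate thresholds. The `HistoricalAlert` type, the sample amounts, and the candidate values are all hypothetical; a real tuning exercise would use the institution's own alert history and, typically, below-the-line testing as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoricalAlert:
    amount: float      # the value the rule compares against its threshold
    led_to_sar: bool   # outcome of the original investigation


def evaluate_thresholds(alerts, candidates):
    """For each candidate threshold, report alert volume, SARs kept/missed, and FP share."""
    total_sars = sum(a.led_to_sar for a in alerts)
    rows = []
    for threshold in sorted(candidates):
        kept = [a for a in alerts if a.amount >= threshold]
        sars_kept = sum(a.led_to_sar for a in kept)
        fp_share = (len(kept) - sars_kept) / len(kept) if kept else 0.0
        rows.append({
            "threshold": threshold,
            "alerts": len(kept),
            "sars_kept": sars_kept,
            "sars_missed": total_sars - sars_kept,
            "false_positive_share": fp_share,
        })
    return rows


# Hypothetical alert history: (amount, led to a SAR?)
history = [HistoricalAlert(a, s) for a, s in [
    (5_000, False), (6_000, False), (7_000, False), (8_000, False), (9_000, False),
    (12_000, False), (15_000, True), (18_000, False), (25_000, True), (40_000, False),
]]

for row in evaluate_thresholds(history, [5_000, 10_000, 15_000, 20_000]):
    print(row)
```

In this toy data, moving the threshold from 5,000 to 15,000 cuts alerts from 10 to 4 without losing either SAR, while 20,000 drops one. A table like this, kept alongside the rationale for the chosen value, is the kind of record an examiner would expect to see.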

Behavioral analytics. Risk scoring that builds individual customer baselines and alerts on deviations from expected behavior generates fewer false positives than rules applied uniformly across a customer population. A business that regularly receives six-figure wire transfers from the same counterparties shouldn't generate alerts on every transaction. AI-based transaction monitoring enables this kind of segmentation at scale, and FATF Recommendation 15 on new technologies explicitly supports its use.
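
One simple way to picture a per-customer baseline is a z-score against the customer's own transaction history. The sketch below is a deliberately minimal stand-in for what production systems do: it ignores counterparties, seasonality, and peer groups, and the cutoff of 3 standard deviations and the minimum history length are arbitrary assumptions.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, amount, z_cutoff=3.0, min_history=5):
    """Return True if `amount` is unusual relative to this customer's own past amounts."""
    if len(history) < min_history:
        # Too little history to form a baseline: alert by default (a conservative assumption).
        return True
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_cutoff


# Hypothetical customers receiving the same 252,000 wire.
importer_history = [250_000, 240_000, 260_000, 255_000, 245_000, 250_000]
retail_history = [1_200, 900, 1_500, 1_100, 1_300]

print(deviates_from_baseline(importer_history, 252_000))  # False
print(deviates_from_baseline(retail_history, 252_000))    # True
```

A uniform rule such as "alert on any wire over 100,000" would fire on both customers; the baseline approach alerts only where the activity is out of character.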

Better CDD and EDD data. A large share of false positives come from insufficient customer context at onboarding and refresh. Enhanced Due Diligence for higher-risk relationships, combined with regular refresh cycles, gives the monitoring system the context it needs to distinguish normal activity from suspicious patterns.

None of these is fast. Behavioral analytics implementations typically take 12-18 months before false positive rates stabilize at a new baseline. Threshold recalibration requires examiner-quality documentation before any threshold is touched. Running a 95% false positive rate indefinitely isn't cost-neutral, but neither is a poorly documented tuning program that draws a finding on its own.
