
Sanctions Screening Accuracy Benchmark: 2024 Statistics, Trends, and Analysis


Sweden's banking regulator tested 19 banks against 5,000 names from UN and EU sanctions lists in 2024. On correctly spelled entries, average match accuracy reached 97.2%, but no institution caught every name. Industry data puts false positive rates at 90–95%. Only 16% of countries assessed by FATF demonstrate substantial effectiveness in targeted financial sanctions implementation.

Methodology

These figures come from three primary sources, supplemented by OFAC's official enforcement register and LexisNexis Risk Solutions' compliance cost research.

The central benchmark is Finansinspektionen's FI Supervision 30 report, published December 2024. Sweden's financial regulator gave 19 Swedish banks a test list of 5,000 names drawn from UN and EU sanctions lists, then measured how well each bank's automated screening systems detected matches. The report covers customer screening and transaction screening separately and provides an average match accuracy figure across all participating institutions. The test was conducted in a controlled environment, not against live production traffic. FI also compared results against a parallel dataset from 75 banks and financial institutions in other countries that had conducted equivalent tests in April 2024 through a shared technical supplier.

The peer-reviewed complement is Kim and Yang (2024), published in Frontiers in Artificial Intelligence, which tested both traditional fuzzy matching and an NLP prototype against a controlled dataset of sanctioned and non-sanctioned entities.

Country-level effectiveness data comes from FATF's 2024 report on complex proliferation financing and sanctions evasion schemes. OFAC enforcement figures are taken directly from OFAC's 2024 civil penalties register, with context from Morrison Foerster's April 2025 enforcement review. Compliance cost data is from LexisNexis Risk Solutions' 2024 True Cost of Financial Crime Compliance studies (US/Canada and EMEA editions).

One caveat applies to the Finansinspektionen figure: 97.2% accuracy covers correctly spelled names only. Accuracy for name variants, transliterations, and aliases is materially lower and was not disclosed at the institution level. Treat 97.2% as a ceiling for baseline system performance, not an operational average.
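To see why the correctly spelled figure works as a ceiling, it helps to score a screening run separately for each name category. The sketch below is illustrative only: the names are fictional, the function hit_rate_by_category and its deliberately naive exact matcher are assumptions made for this example, and none of it reflects FI's actual test harness or any bank's system.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace; deliberately naive."""
    return " ".join(name.casefold().split())


def hit_rate_by_category(test_cases, watchlist):
    """Share of test names detected, split by category.

    Every query in test_cases refers to a listed party, so each miss
    is a false negative.
    """
    listed = {normalize(n) for n in watchlist}
    hits, totals = defaultdict(int), defaultdict(int)
    for query, category in test_cases:
        totals[category] += 1
        if normalize(query) in listed:
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Fictional data, not taken from the FI test list.
WATCHLIST = ["Ivan Petrov", "Mohammed Al-Rashid"]
TEST_CASES = [
    ("Ivan Petrov", "correctly_spelled"),
    ("MOHAMMED AL-RASHID", "correctly_spelled"),
    ("Iwan Petrow", "variant"),
    ("Muhammad Al Rashid", "variant"),
]

rates = hit_rate_by_category(TEST_CASES, WATCHLIST)
for category, rate in rates.items():
    print(f"{category}: {rate:.0%} detected")
```

A matcher that scores perfectly on the correctly spelled rows can still miss every variant, which is why a single headline figure needs the category it was measured on.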


Full data table

| Metric | Value | Year | Source |
|---|---|---|---|
| Avg match accuracy, correctly spelled names (19 Swedish banks) | 97.2% | 2024 | Finansinspektionen FI Supervision 30 |
| Banks achieving 100% match rate in FI test | 0 of 19 | 2024 | Finansinspektionen FI Supervision 30 |
| NLP model sensitivity for sanctioned entity detection | 70.96% | 2024 | Kim & Yang, Frontiers in Artificial Intelligence |
| NLP model overall accuracy (true positives + false positives combined) | 47.80% | 2024 | Kim & Yang, Frontiers in Artificial Intelligence |
| Industry-average false positive rate, sanctions alerts | 90–95% (est.) | 2025 | Alessa Sanctions Screening Trends Survey |
| ML alert discount rate achieved in KPMG client deployment | 99.27% | 2020 | KPMG Sanctions Screening Optimization |
| Countries with high/substantial TFS effectiveness (FATF IO-11) | 16% | 2024 | FATF |
| OFAC civil monetary penalties, total | $48.8M | 2024 | OFAC / Morrison Foerster |
| OFAC civil monetary penalties, total | $1.5B | 2023 | OFAC / Morrison Foerster |
| Orgs in EMEA reporting rising screening alert volumes | 78% | 2024 | LexisNexis Risk Solutions |

Sources: Finansinspektionen FI Supervision 30 (December 2024); Kim and Yang, Frontiers in Artificial Intelligence (November 2024); Alessa Sanctions Screening Trends Survey 2026; FATF Complex Proliferation Financing and Sanctions Evasion Schemes (2024); OFAC 2024 Civil Penalties Register; Morrison Foerster, April 2025; LexisNexis Risk Solutions True Cost of Financial Crime Compliance EMEA (2024); KPMG Sanctions Screening Optimization (2020).


Key findings

  • No institution scored 100%. Sweden's financial regulator tested 19 banks against 5,000 sanctioned names in 2024. On correctly spelled entries, the average accuracy was 97.2%. For aliases, transliterations, and spelling variants, accuracy dropped further. FI's conclusion was direct: the systems could be more effective, and some banks have room to improve. Larger banks consistently outperformed smaller ones (Finansinspektionen FI Supervision 30, December 2024).

  • NLP improves detection but creates a different problem. A 2024 peer-reviewed study tested NLP models for sanctions name matching. The model caught 70.96% of sanctioned entities, but generated 13,336 false positive alerts from a dataset of 450 innocent entities. Overall accuracy landed at 47.80%. Detection improved; precision collapsed. That trade-off is unresolved in current NLP approaches (Kim and Yang, Frontiers in Artificial Intelligence, November 2024).

  • False positive rates run at 90–95%. Across institutions globally, industry benchmarks from Alessa and KPMG consistently put false positive rates in this range. For every 100 sanctions alerts generated, only 5 to 10 identify an actual match. KPMG's machine-learning prototype flagged 99.27% of a client's alert queue as discountable, giving a concrete sense of the scale (KPMG Sanctions Screening Optimization, 2020).

  • Only 16% of countries implement sanctions effectively. FATF's 2024 proliferation financing report found that just 16% of assessed countries achieved high or substantial effectiveness on Immediate Outcome 11, the criterion for targeted financial sanctions implementation. That systemic gap puts pressure on banks with cross-border operations: correspondent relationships in lower-effectiveness jurisdictions carry materially higher exposure (FATF, 2024).

  • OFAC penalties dropped in 2024, but enforcement philosophy hardened. OFAC assessed $48.8 million across 12 actions in 2024, down from $1.5 billion in 2023. This doesn't reflect reduced risk. OFAC characterized multiple 2024 violations as willful and egregious, and zero of the twelve cases involved voluntary self-disclosure. The signal is consistent: detection failures attract the most severe outcomes (OFAC, 2024; Morrison Foerster, April 2025).


Year-over-year trends

OFAC's penalty trajectory shows why single-year figures mislead. Totals went from approximately $42.7 million in 2022 to $1.5 billion in 2023 (driven primarily by a handful of large multi-year investigation settlements), then back to $48.8 million in 2024 across 12 actions. That volatility reflects how investigations conclude, not how frequently violations occur. The stable signal is enforcement posture: OFAC's 2024 descriptions of violations as egregious and its zero self-disclosure rate across all 12 cases are consistent with prior years.

Alert volumes are rising. LexisNexis found that 78% of EMEA institutions reported increasing screening alert volumes in 2024. If the 90–95% false positive rate holds constant, a growing alert base means the volume of noise grows in proportion, so analysts lose more time to false alerts each year even though the rate itself has not worsened.
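The arithmetic behind that point can be sketched in a few lines. Every input here is a hypothetical chosen for illustration (alert counts, minutes per alert, and the 92% rate picked from inside the 90–95% range); none is a figure from LexisNexis or any other source cited here.

```python
def false_positive_alerts(total_alerts: int, false_positive_rate: float) -> float:
    """Alerts that turn out not to be a real match."""
    return total_alerts * false_positive_rate


def review_hours(alerts: float, minutes_per_alert: float) -> float:
    """Analyst hours needed to clear the given number of alerts."""
    return alerts * minutes_per_alert / 60


# Hypothetical inputs; only the 90-95% range comes from the text.
FALSE_POSITIVE_RATE = 0.92
MINUTES_PER_ALERT = 6
YEARLY_ALERTS = {"year 1": 100_000, "year 2": 120_000}

noise_hours = {
    year: review_hours(
        false_positive_alerts(total, FALSE_POSITIVE_RATE), MINUTES_PER_ALERT
    )
    for year, total in YEARLY_ALERTS.items()
}

for year, hours in noise_hours.items():
    print(f"{year}: {hours:,.0f} analyst hours spent on false positives")
```

With these made-up inputs, a 20% rise in alert volume produces a 20% rise in hours spent on noise, with no change in the false positive rate.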

The FI Supervision 30 benchmark is new. Finansinspektionen had not previously published a system-level accuracy test across Swedish banks, so 2024 is the baseline. The 97.2% figure will become meaningful for trend analysis when FI repeats the exercise.

The NLP research trajectory points toward a split in performance metrics. Traditional fuzzy matching misses name variants but generates fewer false positives. NLP-based approaches catch more variants but generate far more false alerts. The Kim and Yang 2024 study quantifies the NLP side of that trade: sensitivity of 70.96% and overall accuracy of 47.80%. As institutions adopt machine learning and NLP tools, benchmarks will need to track sensitivity and precision separately rather than a single accuracy number.
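A small sketch shows why sensitivity and precision need to be reported separately. The confusion-matrix counts below are invented for illustration and are not the Kim and Yang data; the class name Confusion and the two example matchers are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # sanctioned entity, alert raised
    fp: int  # non-sanctioned entity, alert raised
    fn: int  # sanctioned entity, no alert
    tn: int  # non-sanctioned entity, no alert

    @property
    def sensitivity(self) -> float:
        """Share of sanctioned entities that were caught."""
        return self.tp / (self.tp + self.fn)

    @property
    def precision(self) -> float:
        """Share of alerts that were real matches."""
        return self.tp / (self.tp + self.fp)

    @property
    def accuracy(self) -> float:
        """Share of all decisions that were correct."""
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical counts over the same 1,000 screened entities.
strict_matcher = Confusion(tp=60, fp=20, fn=40, tn=880)
loose_matcher = Confusion(tp=90, fp=600, fn=10, tn=300)

for label, m in [("strict", strict_matcher), ("loose", loose_matcher)]:
    print(
        f"{label}: sensitivity={m.sensitivity:.2f} "
        f"precision={m.precision:.2f} accuracy={m.accuracy:.2f}"
    )
```

In this invented example the looser matcher catches more sanctioned entities while its precision and accuracy fall sharply, a pattern that a single accuracy number would hide.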

The FATF IO-11 statistic of 16% effective implementation has remained low across multiple assessment cycles. FATF's 2024 report found no improvement in the aggregate figure, and the underlying structural issues (porous borders, governance gaps, incomplete list implementation) will not be corrected in the short term.


What this means for compliance teams

The 97.2% figure from Finansinspektionen looks like a pass. It isn't. It covers correctly spelled entries only, and no institution in the test caught all 5,000 names. In live operations the gap widens further. Name transliteration alone accounts for a significant share of missed matches. That's the same category of failure that featured in cases like the Standard Chartered 2019 sanctions violations and the BNP Paribas 2014 penalties: transactions that cleared nominal screening parameters but involved sanctioned parties under variant name entries.

Teams running sanctions screening should treat the 90–95% false positive rate as a capacity planning problem, not just a tuning problem. At a bank processing thousands of screening alerts per day, that rate means analysts spend the overwhelming majority of their time on noise. That's a direct drain on the review capacity available for genuine risks. It also raises the risk of alert fatigue: exhausted teams reviewing high-volume false alert queues are more likely to miss the real match when it appears.

The fix isn't simply tightening match thresholds. Tighter matching reduces false positives but also reduces true positive detection. Looser matching catches more real matches but buries them in volume. The Finansinspektionen results make this concrete: larger banks with more resources configured their systems to catch more, accepting a higher alert volume. Smaller banks tuned for efficiency and missed more actual matches.
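The threshold trade-off can be made concrete with a toy fuzzy matcher. This is a minimal sketch, assuming Python's standard-library difflib as a stand-in for a commercial matching engine; the names, the ground-truth labels, and the 0.9 and 0.8 thresholds are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def screen(customers, watchlist, threshold):
    """Return customer names whose best watchlist score meets the threshold."""
    return [
        name
        for name in customers
        if any(similarity(name, listed) >= threshold for listed in watchlist)
    ]


# Fictional data. True means the customer really is the listed party.
WATCHLIST = ["Ivan Petrov"]
CUSTOMERS = {
    "Ivan Petrov": True,       # exact spelling
    "Iwan Petrow": True,       # transliteration variant
    "Ivana Petrova": False,    # different person, similar name
    "Ivan Petrenko": False,    # different person, similar name
    "Maria Lindqvist": False,  # unrelated
}


def summarize(threshold: float) -> dict:
    """Split the alerts raised at a threshold into hits, noise, and misses."""
    alerts = screen(CUSTOMERS, WATCHLIST, threshold)
    return {
        "true_hits": [n for n in alerts if CUSTOMERS[n]],
        "false_alerts": [n for n in alerts if not CUSTOMERS[n]],
        "missed": [n for n, listed in CUSTOMERS.items() if listed and n not in alerts],
    }


for threshold in (0.9, 0.8):
    print(threshold, summarize(threshold))
```

In this toy set, loosening the threshold from 0.9 to 0.8 recovers the variant spelling but also adds a second false alert, which is the same trade the paragraph above describes at bank scale.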

Where transaction monitoring and sanctions screening run as separate systems, alert deduplication matters. An entity triggering both systems needs coordinated review. Without it, you get duplicated effort or coverage gaps, both of which create compliance exposure.
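One way to approach coordinated review is to group alerts by a shared entity identifier before they reach analysts. The sketch below assumes both systems emit an entity_id that refers to the same customer record; that field, the Alert class, and the system labels are hypothetical and would depend on how a given institution resolves entities.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_id: str
    entity_id: str  # assumed to be consistent across both systems
    system: str     # "sanctions" or "transaction_monitoring"


def group_by_entity(alerts):
    """Bundle alerts into one review case per entity."""
    cases = defaultdict(list)
    for alert in alerts:
        cases[alert.entity_id].append(alert)
    return dict(cases)


def needs_coordinated_review(cases):
    """Entities that triggered more than one system."""
    return sorted(
        entity_id
        for entity_id, alerts in cases.items()
        if len({a.system for a in alerts}) > 1
    )


# Hypothetical alert feed.
ALERTS = [
    Alert("S-1", "cust-001", "sanctions"),
    Alert("T-7", "cust-001", "transaction_monitoring"),
    Alert("S-2", "cust-002", "sanctions"),
    Alert("T-8", "cust-003", "transaction_monitoring"),
    Alert("T-9", "cust-003", "transaction_monitoring"),
]

cases = group_by_entity(ALERTS)
coordinated = needs_coordinated_review(cases)
print(f"{len(ALERTS)} alerts -> {len(cases)} cases; coordinated review: {coordinated}")
```

Grouping first means one analyst sees both alerts for the same entity together, rather than two teams working them independently.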

Regulatory compliance automation addresses the volume problem at both ends: better entity resolution cuts false positives at intake, and workflow tooling reduces analyst time per alert that does require review. OFAC's 2024 enforcement posture adds a timing argument. None of the twelve 2024 actions involved self-disclosure. Institutions that identify issues and report proactively consistently receive materially better outcomes. That requires detection capability, not just screening coverage.

The FATF country effectiveness gap is a practical issue, not just a geopolitical one. Compliance teams at internationally active banks can't rely solely on their own screening accuracy when 84% of assessed countries fall below the high or substantial effectiveness threshold. Correspondent relationships, trade finance flows, and cross-border payments all carry exposure that internal screening alone can't fully address.

