True Positive: Definition and Use in Compliance

A true positive is a detection outcome in AML monitoring where a system-generated alert is confirmed, after analyst investigation, to reflect genuine suspicious or criminal financial activity. Its counterpart is the false positive, an alert on activity that proves legitimate.

What is True Positive?

A true positive in AML is a confirmed hit: a transaction monitoring system generates an alert, an analyst reviews the evidence, and the activity turns out to reflect genuine suspicious or criminal financial behavior. The alert was right.

This sits inside the standard binary classification framework. Any detection system produces four possible outcomes. True positives are correctly flagged suspicious cases. False positives are alerts on activity that turns out to be clean. False negatives are criminal transactions the system missed entirely. True negatives are clean transactions correctly left unflagged.

Two ratios define the health of any detection program. Recall, also called true positive rate or sensitivity, measures what fraction of actual financial crime the system catches. If a system flagged 700 of 1,000 genuinely suspicious transactions in a given month, recall is 70%. The remaining 300 are false negatives: missed crimes.

Precision is the second ratio. It measures what fraction of generated alerts are genuine. Most bank transaction monitoring systems run at 1% to 10% precision. For every 100 alerts analysts review, only 1 to 10 reflect real financial crime. The rest are false positives consuming analyst time with nothing to show for it.

The business consequence is direct. A bank generating 10,000 monthly alerts at 3% precision has 9,700 false positives hitting analyst queues. The Wolfsberg Group's 2019 Statement on Effectiveness for AML Programmes identified this precision gap as one of the primary drivers of AML operational cost. Institutions focused on SAR filing volume rather than confirmed true positives are, as the statement put it, measuring the wrong outcome.
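As a minimal sketch, the two ratios can be computed directly from confirmed counts. The function names are illustrative, and the inputs are the figures used in the text above (700 of 1,000 suspicious transactions caught; 10,000 monthly alerts at 3% precision).

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of generated alerts that were confirmed as genuine."""
    total_alerts = true_positives + false_positives
    return true_positives / total_alerts if total_alerts else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of genuinely suspicious activity that the system alerted on."""
    actual_positives = true_positives + false_negatives
    return true_positives / actual_positives if actual_positives else 0.0


# Recall example from the text: 700 of 1,000 suspicious transactions flagged.
recall_example = recall(true_positives=700, false_negatives=300)

# Precision example from the text: 10,000 monthly alerts, 300 confirmed.
monthly_alerts = 10_000
confirmed = 300
false_alerts = monthly_alerts - confirmed
precision_example = precision(confirmed, false_alerts)

print(f"recall={recall_example:.0%}, precision={precision_example:.0%}, "
      f"false positives={false_alerts}")
```

In practice the false negative count is never fully known, so recall is usually estimated from labeled samples or later-discovered cases rather than measured exactly.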

The practical goal for any AML detection program is to increase both the absolute count of true positives and the precision rate, without widening the false negative gap that allows criminal activity to pass through undetected.

How is True Positive used in practice?

When an alert fires in a transaction monitoring system, it enters an analyst review queue. The analyst opens the case, reviews transaction details, account history, linked entity data, and behavioral flags. The decision that follows is alert disposition: close as a false positive, escalate for investigation, or confirm as a true positive.

Confirmed true positives trigger a specific workflow. The analyst documents the suspicious indicators and reasoning, then routes the case to the compliance officer or MLRO. If the activity meets the legal reporting threshold, the institution files a Suspicious Activity Report (SAR) with FinCEN in the US or with the National Crime Agency (NCA) in the UK, or a Suspicious Transaction Report (STR) or equivalent with the relevant Financial Intelligence Unit (FIU) elsewhere.

Consider a concrete example of structuring, sometimes called smurfing: a customer makes seven cash deposits over three weeks, each just under the $10,000 Currency Transaction Report (CTR) threshold. The monitoring rule fires. The analyst reviews the deposits against the customer's declared business purpose in the Customer Due Diligence (CDD) file and finds no legitimate explanation. That's a confirmed true positive: structuring in violation of 31 U.S.C. § 5324. A SAR gets filed.
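A monitoring rule for this pattern might look roughly like the sketch below. It is not any vendor's actual rule: the $9,000 floor for "just under", the 21-day window, and the minimum of three deposits are all assumptions chosen for illustration, as are the example dates and amounts.

```python
from datetime import date, timedelta

CTR_THRESHOLD = 10_000          # CTR threshold named in the text
NEAR_THRESHOLD_FLOOR = 9_000    # assumption: what counts as "just under"
WINDOW = timedelta(days=21)     # assumption: three-week lookback
MIN_DEPOSITS = 3                # assumption: minimum count before alerting


def structuring_alert(deposits: list[tuple[date, float]]) -> bool:
    """Return True if enough near-threshold cash deposits fall in one window."""
    near = sorted(day for day, amount in deposits
                  if NEAR_THRESHOLD_FLOOR <= amount < CTR_THRESHOLD)
    for i, start in enumerate(near):
        # Count near-threshold deposits within WINDOW of this starting deposit.
        in_window = [day for day in near[i:] if day - start <= WINDOW]
        if len(in_window) >= MIN_DEPOSITS:
            return True
    return False


# Illustrative data: seven $9,500 deposits, three days apart.
seven_deposits = [(date(2024, 3, 1) + timedelta(days=3 * i), 9_500.0)
                  for i in range(7)]
print(structuring_alert(seven_deposits))
```

The rule only raises the alert. Whether it is a true positive still depends on the analyst's review against the CDD file.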

At the portfolio level, true positive tracking drives tuning decisions. Compliance teams measure precision, the share of alerts confirmed as true positives, by detection rule, customer risk tier, and business segment. A rule producing 1% precision for six consecutive quarters is a candidate for restructuring. A rule running at 15% precision across 500 alerts is worth extending to capture more activity in the same risk category.
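Per-rule tracking of this kind can be sketched from a list of alert dispositions. The rule identifiers `R-CASH` and `R-WIRE` and the 2% review floor are hypothetical; the 1% and 15% figures mirror the examples above.

```python
from collections import defaultdict

PRECISION_FLOOR = 0.02  # assumption: rules below 2% precision get reviewed


def precision_by_rule(dispositions):
    """dispositions: iterable of (rule_id, confirmed_true_positive) pairs."""
    counts = defaultdict(lambda: [0, 0])  # rule_id -> [true positives, alerts]
    for rule_id, confirmed in dispositions:
        counts[rule_id][1] += 1
        if confirmed:
            counts[rule_id][0] += 1
    return {rule: tp / total for rule, (tp, total) in counts.items()}


def rules_to_review(dispositions, floor=PRECISION_FLOOR):
    """Rule IDs whose precision is below the floor, sorted by name."""
    by_rule = precision_by_rule(dispositions)
    return sorted(rule for rule, p in by_rule.items() if p < floor)


# Hypothetical dispositions: 1 of 100 confirmed for one rule, 75 of 500 for another.
dispositions = ([("R-CASH", i < 1) for i in range(100)]
                + [("R-WIRE", i < 75) for i in range(500)])
print(precision_by_rule(dispositions))
print(rules_to_review(dispositions))
```

The same grouping works for customer risk tier or business segment by swapping the key.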

Case management systems that record analyst decisions create a feedback loop. When analysts mark alerts as false positives, that data can feed back into model refinement. Over time, the system gets better at distinguishing patterns that lead to confirmed true positives from patterns that don't.

True Positive in regulatory context

Regulators care deeply about true positive rates, though they rarely use that term directly. What examiners ask about is "effectiveness": whether a detection program actually catches financial crime, produces quality intelligence, and results in useful reporting to the authorities.

Under FATF Recommendation 20, jurisdictions and institutions must have mechanisms in place to detect and report suspicious transactions. FATF's Methodology for Assessing Technical Compliance with the FATF Recommendations and the Effectiveness of AML/CFT Systems, first published in 2013 and updated since, evaluates whether suspicious transaction reporting produces actionable intelligence, which maps directly to whether reports are filed on true positives rather than false ones. FATF mutual evaluation reports routinely cite low SAR quality and high false positive rates as indicators of a weak detection environment.

FinCEN tracks SAR quality as well as volume. Its published SAR Activity Reviews note that law enforcement agencies rely on SAR narratives to open and develop investigations. A SAR filed on a false positive adds noise to the financial intelligence picture. A missed true positive is criminal activity going unreported. Both draw regulatory attention.

From a model risk angle, the Federal Reserve's SR 11-7 guidance requires institutions to test models against known outcomes and track performance over time. Applied to transaction monitoring, that means measuring true positive and false positive rates against labeled historical data and documenting those metrics for examiners. The FFIEC BSA/AML Examination Manual extends these requirements specifically to transaction monitoring systems, covering thresholds, tuning processes, and validation documentation.

The risk-based approach ties these metrics directly to regulatory expectation. A private bank serving politically exposed persons (PEPs) that demonstrates a near-zero true positive rate will face hard questions from examiners. Combined with Enhanced Due Diligence (EDD) reviews, the true positive metric tells examiners whether controls are genuinely effective or just technically present.

Common challenges and how to address them

The core challenge is recognizing true positives efficiently. Financial crime is designed to blend in. Layering through multiple entities, the use of money mule accounts, and structuring behavior all produce transaction flows that individually appear normal. The alert fires because a rule or model detects a pattern across multiple data points, not because any single transaction is obviously wrong.

This creates two opposing pressures. Make detection thresholds more sensitive and precision drops: more alerts fire on legitimate activity, true positives fall as a percentage of total alerts, and analyst workload grows without a proportional increase in confirmed cases. Make them less sensitive and false negatives increase: criminal activity slips through, exposing the institution to regulatory and reputational risk.
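The trade-off can be illustrated by sweeping an alerting cutoff over scored, labeled cases. This sketch assumes the detection system emits a risk score per case and alerts when the score meets the cutoff; the ten cases are invented for illustration.

```python
# (risk score, confirmed suspicious?) pairs: illustrative data only.
SCORED_CASES = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True), (0.60, False),
    (0.55, True), (0.50, False), (0.40, False), (0.30, False), (0.20, False),
]


def metrics_at(threshold, scored_cases):
    """Return (alert_count, precision, recall) when alerting on score >= threshold."""
    alerts = [suspicious for score, suspicious in scored_cases if score >= threshold]
    true_positives = sum(alerts)
    actual_positives = sum(suspicious for _, suspicious in scored_cases)
    precision = true_positives / len(alerts) if alerts else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return len(alerts), precision, recall


for cutoff in (0.9, 0.8, 0.5):
    count, p, r = metrics_at(cutoff, SCORED_CASES)
    print(f"cutoff={cutoff}: alerts={count}, precision={p:.2f}, recall={r:.2f}")
```

On this data, lowering the cutoff from 0.9 to 0.5 raises recall from 0.50 to 1.00 while precision falls from 1.00 to 0.57 and the alert count grows from 2 to 7.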

Threshold tuning is one part of the solution. Revisiting detection parameters quarterly, and after typology changes or regulatory updates, keeps the system calibrated. A rule generating 98% false positives in 2022 can be restructured using updated behavioral baselines and more recent typology data. The recalibration adds governance overhead, but the accuracy gain is worth it.

Behavioral analytics and peer group analysis improve precision by adding context. Instead of flagging every customer who deposits $9,500, a system that compares that deposit against the customer's historical behavior and against peers in the same business category produces alerts with richer context. Analysts confirm or dismiss true positives faster, and the confirmation is better documented for SAR filing.
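One simple way to add that context is a z-score against the customer's own history and against a peer group. The sketch below is an assumption about how such a check could work, not a description of any specific product: the cutoff of 3 standard deviations, the rule that both comparisons must be unusual, and all amounts are illustrative.

```python
from statistics import mean, stdev


def z_score(value, history):
    """Standard deviations between value and the mean of history."""
    if len(history) < 2:
        return 0.0
    spread = stdev(history)
    return (value - mean(history)) / spread if spread else 0.0


def contextual_alert(deposit, own_history, peer_deposits, cutoff=3.0):
    """Alert only if the deposit is unusual for the customer and for peers."""
    return (z_score(deposit, own_history) > cutoff
            and z_score(deposit, peer_deposits) > cutoff)


# A cash-intensive business that routinely deposits around $9,500.
routine = contextual_alert(9_500,
                           own_history=[9_200, 9_800, 9_400, 9_600, 9_500],
                           peer_deposits=[9_000, 9_900, 9_300, 9_700, 9_600])

# A customer whose deposits, and whose peers' deposits, are normally far smaller.
unusual = contextual_alert(9_500,
                           own_history=[400, 500, 450, 550, 600],
                           peer_deposits=[600, 700, 500, 800, 650])
print(routine, unusual)
```

The same $9,500 deposit is ignored for the first customer and alerted for the second, which is the precision gain the paragraph describes.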

Model monitoring tracks true positive rates over time and flags when performance drifts. A model performing at 8% precision at launch may drop to 3% two years later without any code changes, because the typologies it was trained on have shifted. Ongoing measurement catches that drift before an examiner does.
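A drift check of this kind can be as simple as comparing each period's precision with a launch baseline. In this sketch the 50% relative-drop trigger and the period labels are assumptions; the 8% and 3% figures come from the example above, and the 6% midpoint is invented.

```python
def precision_drift(periods, baseline, max_relative_drop=0.5):
    """Return labels of periods whose precision fell too far below baseline.

    periods: list of (label, confirmed_true_positives, total_alerts).
    A period is flagged when precision < baseline * (1 - max_relative_drop).
    """
    floor = baseline * (1 - max_relative_drop)
    flagged = []
    for label, true_positives, alerts in periods:
        current = true_positives / alerts if alerts else 0.0
        if current < floor:
            flagged.append(label)
    return flagged


# Illustrative history: 8% at launch, 6% after a year, 3% after two years.
history = [("launch", 80, 1_000), ("year 1", 60, 1_000), ("year 2", 30, 1_000)]
print(precision_drift(history, baseline=0.08))
```

Here only "year 2" is flagged, because 3% is below the 4% floor implied by the baseline and trigger.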

Banks that replaced rule-only detection with hybrid models combining behavioral analytics with rule-based thresholds have reported reducing alert volume by 40 to 60 percent while maintaining or increasing their absolute true positive counts. Fewer alerts, same confirmed cases, lower cost per SAR filed.

Related terms and concepts

True positive sits inside a cluster of measurement terms that compliance teams and model risk functions use together to evaluate detection system health.

The confusion matrix is the framework that organizes all four detection outcomes: true positive, false positive, true negative, and false negative. Evaluating any detection model starts by populating the confusion matrix on a labeled dataset.
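Populating the matrix is a counting exercise over labeled cases. A minimal sketch, assuming each case records whether it was alerted and whether it was ultimately judged suspicious (the example counts are illustrative):

```python
from collections import Counter

OUTCOME_NAMES = {
    (True, True): "TP",    # alerted and genuinely suspicious
    (True, False): "FP",   # alerted but clean
    (False, True): "FN",   # suspicious but not alerted
    (False, False): "TN",  # clean and not alerted
}


def confusion_matrix(cases):
    """cases: iterable of (alerted, suspicious) pairs. Returns counts per outcome."""
    counts = Counter(OUTCOME_NAMES[(bool(alerted), bool(suspicious))]
                     for alerted, suspicious in cases)
    return {name: counts.get(name, 0) for name in ("TP", "FP", "FN", "TN")}


labeled_cases = ([(True, True)] * 2 + [(True, False)] * 3
                 + [(False, True)] * 1 + [(False, False)] * 4)
print(confusion_matrix(labeled_cases))
```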

Precision is the ratio of true positives to all positive predictions. If a system generates 1,000 alerts and 50 are confirmed as suspicious, precision is 5%. This metric is most directly tied to analyst workload and operational cost.

Recall, also called sensitivity or true positive rate, is the ratio of true positives to all actual positives in the dataset. If there were 200 genuine suspicious cases and the system caught 50, recall is 25%. This metric reflects how much criminal activity is slipping through.

The F1 Score is the harmonic mean of precision and recall. It's the standard single-number summary when both matter, and in AML they always do.
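As a worked example, the precision and recall figures above (50 confirmed cases out of 1,000 alerts, with 200 genuine cases in total) imply 950 false positives and 150 false negatives. The helper name below is illustrative.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, or 0.0 when undefined."""
    alerts = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / alerts if alerts else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 5% precision and 25% recall, as in the examples above.
f1 = f1_score(true_positives=50, false_positives=950, false_negatives=150)
print(round(f1, 4))
```

The result is about 0.083, much closer to the 5% precision than to the 25% recall, because the harmonic mean is dominated by the weaker of the two ratios.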

ROC AUC measures discriminatory power across all possible classification thresholds. AUC scores above 0.85 are generally considered strong for AML transaction monitoring. A high AUC means the model can separate criminal from clean activity at many different sensitivity settings.
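AUC has a direct interpretation that is easy to compute on small samples: the probability that a randomly chosen suspicious case receives a higher score than a randomly chosen clean one. The sketch below uses that pairwise definition on invented scores; production tooling would normally use a library implementation on much larger datasets.

```python
def roc_auc(scores, labels):
    """Share of (suspicious, clean) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one suspicious and one clean case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Illustrative model scores and confirmed labels (1 = suspicious, 0 = clean).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(round(roc_auc(scores, labels), 3))
```

Eight of the nine suspicious/clean pairs are ordered correctly here, giving an AUC of about 0.889.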

Explainability connects directly to true positive work. When analysts confirm a true positive and file a SAR, they need to document the specific indicators that made the activity suspicious. Systems that provide clear, per-alert reasons make that documentation faster and more defensible in an examination. The audit trail from alert generation through SAR filing is now a standard examiner request.

Model validation uses true positive and false positive rates as core performance inputs when assessing whether a detection model is fit for purpose. Ongoing model monitoring then tracks those rates over time to catch performance drift before it becomes an examination finding.


Where does the term come from?

The term comes from signal detection theory, formalized by Wilson Tanner and John Swets in their 1954 paper in Psychological Review on visual detection decision-making, and later codified by David Green and Swets in their 1966 textbook Signal Detection Theory and Psychophysics. Medical diagnostics adopted the language in the 1970s for test sensitivity and specificity analysis.

In AML, the vocabulary became standard through the growth of statistical transaction monitoring models and FATF's effectiveness methodology framework, published in 2013, which explicitly evaluated whether detection systems produce actionable intelligence rather than simply generating SAR volume.


How FluxForce handles true positive

FluxForce AI agents monitor true positive-related patterns in real time, flag anomalies for analyst review, and generate evidence-backed decisions with full audit trails.
