What is the difference between rule-based and ML-based fraud detection?
Quick answer
Rule-based fraud detection fires on hard-coded conditions: flag every wire above $10,000 from an account under 30 days old. ML-based detection scores risk by learning patterns from historical data across thousands of variables. Regulators accept both; ML models must satisfy model risk management requirements under frameworks like SR 11-7.
The full answer
Rule-based fraud detection uses explicit conditions written by humans. ML-based fraud detection uses patterns learned from data. That's the core distinction, and everything else follows from it.
A rule fires when specific conditions are met: cash deposits over $9,999, three international wires in 24 hours, a new account sending a large transfer within 48 hours of opening. The logic is binary. Either the condition is met or it isn't. An examiner can read the exact rule and understand why a transaction was flagged. Banks deployed these systems in the 1990s precisely because they were auditable and simple to explain to supervisors.
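The binary logic described above can be sketched in a few lines. This is a minimal illustration, not any vendor's rule engine: the `Transaction` fields, the rule names, and the `large_amount` cutoff are hypothetical, since the text does not define what counts as a "large" transfer.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float             # transaction amount in USD
    kind: str                 # e.g. "cash_deposit" or "wire"
    account_age_hours: float  # hours since the account was opened


def cash_over_threshold(tx: Transaction) -> bool:
    """Fires on cash deposits over $9,999."""
    return tx.kind == "cash_deposit" and tx.amount > 9_999


def new_account_large_transfer(tx: Transaction, large_amount: float = 5_000) -> bool:
    """Fires when an account under 48 hours old sends a large wire.

    `large_amount` is an assumed cutoff chosen for illustration.
    """
    return tx.kind == "wire" and tx.account_age_hours < 48 and tx.amount >= large_amount


# Each rule is a named, readable condition, which is what makes it auditable.
RULES = {
    "cash_over_threshold": cash_over_threshold,
    "new_account_large_transfer": new_account_large_transfer,
}


def fired_rules(tx: Transaction) -> list[str]:
    """Return the names of every rule the transaction trips."""
    return [name for name, rule in RULES.items() if rule(tx)]
```

Because the output is a list of rule names rather than a score, an examiner can trace any flag back to the exact condition that produced it.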
Rules don't adapt. Fraudsters learn the thresholds and route around them. Structuring transactions to stay below reporting limits is the oldest technique in the book. A rule written for $10,000 cash deposits doesn't catch ten $1,100 deposits spread across different branches, even though they total $11,000. Rule maintenance is also expensive: a mid-size bank might manage 500 to 1,000 active rules, each requiring periodic review and sign-off from compliance and risk.
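A short sketch shows why a per-transaction threshold misses structuring, and how a same-day aggregate would see it. The deposit figures, customer ID, and branch names below are invented for illustration, and daily aggregation per customer is only one possible countermeasure.

```python
from collections import defaultdict

REPORTING_THRESHOLD = 10_000  # per-transaction cash threshold from the text


def per_transaction_rule(deposits):
    """Flag only individual deposits at or above the threshold."""
    return [d for d in deposits if d["amount"] >= REPORTING_THRESHOLD]


def daily_aggregate_rule(deposits):
    """Flag (customer, date) pairs whose same-day deposits sum to the threshold or more."""
    totals = defaultdict(float)
    for d in deposits:
        totals[(d["customer"], d["date"])] += d["amount"]
    return {key: total for key, total in totals.items() if total >= REPORTING_THRESHOLD}


# Four $2,900 deposits by one customer on one day, each at a different branch.
deposits = [
    {"customer": "C-1", "date": "2024-03-01", "branch": f"B-{i}", "amount": 2_900}
    for i in range(4)
]

print(per_transaction_rule(deposits))  # no single deposit reaches the threshold
print(daily_aggregate_rule(deposits))  # the combined total does
```

The aggregate rule closes this particular gap, but it is still a fixed threshold: spreading the same deposits over several days or several accounts would evade it in turn.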
ML-based detection learns from labeled historical data. A model trained on millions of clean transactions and thousands of confirmed fraud cases finds which variable combinations predict risk, without an analyst specifying them in advance. The output is a probability score. Transactions above a review threshold enter an analyst queue; those below don't. The model can incorporate customer history, peer benchmarks, and network relationships simultaneously. Most rule engines can't do any of that.
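The learn-then-score flow can be illustrated with a deliberately tiny model. The sketch below uses a plain logistic regression trained by gradient descent in pure Python, standing in for whatever model an institution would actually deploy. The three features, the ten labeled rows, and the 0.5 review threshold are all invented for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias) -> float:
    """Probability-like risk score in [0, 1] for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, epochs=1000, lr=0.5):
    """Fit weights and bias by batch gradient descent on log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = score(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical features: [scaled amount, is_new_account, is_international]
ROWS = [
    [0.10, 0, 0], [0.20, 0, 0], [0.30, 0, 1], [0.10, 1, 0], [0.40, 0, 0], [0.20, 0, 1],
    [0.90, 1, 1], [0.80, 1, 0], [0.95, 1, 1], [0.70, 1, 1],
]
LABELS = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # 1 = confirmed fraud

REVIEW_THRESHOLD = 0.5  # assumed cutoff; real programs tune this to analyst capacity

weights, bias = train_logistic(ROWS, LABELS)


def route(x) -> str:
    """Send high-scoring transactions to analysts, clear the rest."""
    return "analyst_queue" if score(x, weights, bias) >= REVIEW_THRESHOLD else "auto_clear"
```

Nobody wrote a condition such as "new account and high amount" here; the weights that express it come out of the labeled data, which is the distinction the section draws.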
False positive rates are where the business case becomes concrete. Rule-heavy AML programs consistently generate false positive rates above 90%, a pattern documented in FinCEN's 2019 Innovation Notice and confirmed across industry surveys. The rates vary by institution type (what percentage of AML alerts are false positives has the full picture), but a team reviewing 1,000 alerts to find 50 real cases, a 95% false positive rate, isn't a sustainable operation. We've seen institutions where analysts spend 80% of their time clearing false positives and 20% on actual investigation.
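The arithmetic behind the 1,000-alert example is worth making explicit. The helper names below are hypothetical, and "false positive rate" is used the way the text uses it: the share of reviewed alerts that were not real cases.

```python
def false_positive_rate(alerts_reviewed: int, confirmed_cases: int) -> float:
    """Share of reviewed alerts that turned out not to be real cases."""
    if alerts_reviewed <= 0:
        raise ValueError("alerts_reviewed must be positive")
    if not 0 <= confirmed_cases <= alerts_reviewed:
        raise ValueError("confirmed_cases must be between 0 and alerts_reviewed")
    return (alerts_reviewed - confirmed_cases) / alerts_reviewed


def alerts_per_real_case(alerts_reviewed: int, confirmed_cases: int) -> float:
    """How many alerts an analyst clears, on average, per genuine case."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return alerts_reviewed / confirmed_cases


rate = false_positive_rate(alerts_reviewed=1_000, confirmed_cases=50)
print(f"{rate:.0%}")                    # 95%
print(alerts_per_real_case(1_000, 50))  # 20.0
```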
ML doesn't eliminate false positives, but it can cut the review queue, by as much as half, without reducing detection coverage. The FATF's 2021 report on new technologies for AML/CFT found that institutions piloting ML-based transaction monitoring reported meaningful reductions in alert volume alongside maintained or improved detection rates.
The related question Can AI be used for AML transaction monitoring? covers the regulatory permission in detail. The short answer is yes, with governance requirements.
Why this matters
SAR filing timelines don't wait for analyst backlogs. Banks must file SARs within 30 calendar days of detecting suspicious activity, and that limit on how long banks have to file a SAR is a hard regulatory deadline. A rule-based system generating thousands of false positives weekly creates a review backlog that delays genuine case investigation. How FinCEN defines suspicious activity sets the legal standard both systems must meet, whatever the detection method.
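The deadline pressure is easy to quantify per case. The sketch below applies only the 30-calendar-day window stated above and ignores any extension a regulation may allow; the function names are hypothetical.

```python
from datetime import date, timedelta

SAR_FILING_WINDOW_DAYS = 30  # calendar days from detection, as stated in the text


def sar_due_date(detected_on: date) -> date:
    """Last day to file, counting calendar days from detection."""
    return detected_on + timedelta(days=SAR_FILING_WINDOW_DAYS)


def days_remaining(detected_on: date, today: date) -> int:
    """Days left before the deadline; negative means the filing is overdue."""
    return (sar_due_date(detected_on) - today).days


detected = date(2024, 3, 1)
print(sar_due_date(detected))                      # 2024-03-31
print(days_remaining(detected, date(2024, 3, 25))) # 6
```

Sorting an alert queue by `days_remaining` rather than by arrival order is one simple way to keep a false positive backlog from pushing genuine cases past the deadline.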
Authorized push payment (APP) fraud and mule accounts break rule logic. What APP fraud is explains the core problem: the victim authorizes the payment themselves, so rules keyed to transaction size and velocity don't fire. The Payment Systems Regulator's mandatory reimbursement rules, effective October 2024, make this a direct financial liability for UK banks. How mule accounts get detected shows the same gap: mule activity looks clean when each transaction is viewed in isolation but leaves statistical patterns across behavioral sequences that ML models pick up. Rules written for specific transaction thresholds miss it almost entirely.
Model risk management is a compliance burden that doesn't apply to rules. The Federal Reserve's SR 11-7 guidance, issued by the OCC as Bulletin 2011-12, calls for independent validation, documentation of model assumptions, and ongoing monitoring (including for drift) for any quantitative model used in risk decisions. A fraud scoring model is in scope. Model governance gaps are a known escalation driver among what triggers a regulatory exam. You can update a rule threshold in an afternoon. Updating an ML model requires a full validation cycle. That's not a reason to avoid ML; it's a reason to build the governance infrastructure before the exam.
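Drift monitoring can be as simple as comparing the distribution of scores at validation time with the distribution in production. SR 11-7 does not prescribe a metric; the sketch below uses the population stability index (PSI), one commonly used choice, as an assumption. The bin edges, sample scores, and the 0.25 alert level are illustrative and not taken from any guidance.

```python
import math


def bin_shares(scores, bin_edges, floor):
    """Fraction of scores in each bin, floored to avoid log(0)."""
    counts = [0] * (len(bin_edges) - 1)
    for s in scores:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            below_upper = s < bin_edges[i + 1] or (is_last and s == bin_edges[-1])
            if bin_edges[i] <= s and below_upper:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no scores fall inside the bin edges")
    return [max(c / total, floor) for c in counts]


def population_stability_index(expected, actual, bin_edges, floor=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    e = bin_shares(expected, bin_edges, floor)
    a = bin_shares(actual, bin_edges, floor)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]
DRIFT_ALERT = 0.25  # assumed rule-of-thumb level for "investigate"

baseline = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]  # scores at validation
recent = [0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]    # scores in production

psi = population_stability_index(baseline, recent, EDGES)
print("drift review needed" if psi > DRIFT_ALERT else "stable")
```

A check like this does not replace independent validation, but it gives the validation team a documented, repeatable trigger for when a model needs another look.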
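The EU AI Act adds another layer for ML deployments. Who needs to comply with the EU AI Act and when it takes effect are now live planning questions for any institution deploying ML fraud models in EU markets. Annex III of Regulation (EU) 2024/1689 lists creditworthiness assessment and credit scoring of natural persons as high-risk AI, with an explicit exception for AI systems used to detect financial fraud. A model used purely for fraud detection is therefore not high-risk on that basis alone, but one whose output also feeds credit decisions or another Annex III use can be. For systems that are in scope, conformity assessments, technical documentation, and human oversight controls are mandatory before deployment. Hand-written rules aren't subject to the same requirements.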
Most institutions run both: rules for explicit regulatory triggers (CTR thresholds, sanctions screening, specific named typologies) and ML for behavioral anomaly detection at scale. Neither approach is right for every situation. The governance burden differs, the false positive profile differs, and the detection surface differs. The choice isn't ideological; it's about matching the tool to the threat type and building the compliance infrastructure to support whichever approach you run.
Related questions
- Can AI be used for AML transaction monitoring?
- What percentage of AML alerts are false positives?
- How do mule accounts get detected?
- What is APP fraud?
- How does FinCEN define suspicious activity?