Entity Resolution: Definition and Use in Compliance
Entity Resolution is a data management process that determines whether records across disparate datasets describe the same real-world person, company, or object, using probabilistic matching and deduplication to produce a single authoritative profile.
What is Entity Resolution?
Entity resolution is the process of determining whether two or more records, from one or more databases, describe the same real-world entity. The entity might be a person, a company, an account, or an asset. Records differ because of data entry errors, name transliterations, legal name changes, or deliberate obfuscation by someone trying to avoid detection.
Here's the practical problem. A customer named "Fatima Al-Hassan" applies to open a current account. Your core banking system holds a record for "Fatimah AlHassan," flagged two years earlier for structuring cash deposits. The spellings differ by only two characters: an added "h" and a dropped hyphen. Date of birth matches. The address is different but in the same city. Without a systematic resolution process, an analyst may never connect the two.
The methods fall into three categories. Deterministic matching applies hard rules: two records sharing the same national ID number or passport number are the same entity, full stop. Probabilistic matching scores record pairs across multiple fields (tax ID, date of birth, address, phone number), weighting each field by its reliability and comparing the total against a decision threshold. Machine learning models trained on confirmed match and non-match examples improve accuracy on edge cases, particularly transliterated names and common surnames with many bearers.
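The first two approaches can be combined in a few lines. The sketch below assumes dictionary records with hypothetical field names (`national_id`, `passport_no`, `tax_id`, `date_of_birth`, `postcode`, `phone`). The deterministic rule runs first, and only pairs it cannot settle are scored probabilistically. The scoring uses log-likelihood weights in the style of the Fellegi-Sunter model: agreement on a field adds log2(m/u), disagreement adds log2((1-m)/(1-u)). The m/u probabilities and the threshold of 10 are illustrative assumptions, not calibrated values.

```python
import math

# Hypothetical m/u probabilities per field (illustrative, not calibrated):
# m = P(field agrees | same entity), u = P(field agrees | different entities).
FIELD_PROBS = {
    "tax_id": (0.95, 0.0001),
    "date_of_birth": (0.97, 0.003),
    "postcode": (0.80, 0.01),
    "phone": (0.70, 0.001),
}


def deterministic_match(a: dict, b: dict) -> bool:
    """Hard rule: a shared, non-empty national ID or passport number is a match."""
    for key in ("national_id", "passport_no"):
        if a.get(key) and a.get(key) == b.get(key):
            return True
    return False


def probabilistic_score(a: dict, b: dict) -> float:
    """Sum log2 agreement/disagreement weights over fields present in both records."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # missing data contributes no evidence either way
        if va == vb:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def classify(a: dict, b: dict, threshold: float = 10.0) -> str:
    """Deterministic rules first; fall back to the weighted score."""
    if deterministic_match(a, b):
        return "match"
    return "match" if probabilistic_score(a, b) >= threshold else "non-match"
```

Running the cheap deterministic check first keeps unambiguous identifier matches out of the scoring path entirely. Skipping missing fields, rather than treating them as disagreement, avoids penalizing sparse legacy records.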
The output is a golden record: the authoritative, consolidated customer view that all compliance systems reference. It's what transaction monitoring runs rules against. It's what gets queried when FinCEN circulates a Section 314(a) information request. It's the file the BSA Officer signs off on.
Deduplication is the downstream action: once two records are confirmed as the same entity, the system merges or suppresses the duplicate. The distinction matters because deduplication destroys data, creating tension with BSA record retention requirements and GDPR obligations. Resolve first; then deduplicate deliberately.
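One way to "deduplicate deliberately" is to build the golden record as a view over linked source records instead of deleting them. The sketch below is a minimal illustration under stated assumptions: the class names, the field layout, and the survivorship rule (most recently updated non-empty value wins) are all hypothetical choices, not a prescribed design.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    record_id: str
    system: str
    fields: dict
    updated: str  # ISO-8601 date, so string comparison sorts chronologically


@dataclass
class GoldenRecord:
    golden_id: str
    fields: dict = field(default_factory=dict)
    lineage: dict = field(default_factory=dict)  # field name -> source record_id
    members: list = field(default_factory=list)  # every linked source record_id


def build_golden(golden_id: str, sources: list) -> GoldenRecord:
    """Hypothetical survivorship rule: for each field, keep the most recently
    updated non-empty value. Sources are linked, never deleted, so the
    original records remain available for retention and audit."""
    golden = GoldenRecord(golden_id)
    for rec in sorted(sources, key=lambda r: r.updated):
        golden.members.append(rec.record_id)
        for name, value in rec.fields.items():
            if value:  # an empty value never overwrites a populated one
                golden.fields[name] = value
                golden.lineage[name] = rec.record_id
    return golden
```

The `lineage` map records which source supplied each surviving value, so an examiner can trace any field in the golden record back to the system it came from.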
Entity resolution is also called record linkage. The terms are used interchangeably in most compliance contexts. Some data scientists use "record linkage" for cross-database matching and "entity resolution" for within-database work, but regulators and vendors don't draw that line.
How is Entity Resolution used in practice?
Compliance teams apply entity resolution across the full customer lifecycle. The impact is highest at three points.
Onboarding. Before creating a new customer record, the system compares incoming application data against the existing customer master and watchlists. This is where banks catch previously exited customers who return under a slightly different name after being offboarded for fraud or Know Your Customer (KYC) failures. One mid-sized US bank found in 2023 that 4.2% of new account applications matched a previously exited customer when entity resolution ran systematically, compared to under 0.5% caught by manual review alone.
Periodic review. Enhanced Due Diligence (EDD) for high-risk accounts uses entity resolution to answer a specific question: does this customer own, control, or share addresses with other entities in our book? A Politically Exposed Person (PEP) who holds a personal account while directing funds through a private investment vehicle they control represents one risk picture. Banks that can't see it as one end up running separate risk assessments for what is a single customer relationship, and regulators notice.
Fraud investigation. When analysts build a case around a suspected mule network, entity resolution connects accounts that present under different names but share phone numbers, device fingerprints, or IP addresses. A European bank collapsed 340 apparent mule accounts into 12 coordinated clusters in 2022, cutting investigation time from six weeks to four days.
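Collapsing accounts into clusters by shared attributes is a connected-components problem, and a union-find (disjoint-set) structure is one standard way to compute it. The sketch below assumes each account is a dictionary of hypothetical attribute names (`phone`, `device`, `ip`); any two accounts sharing a value for the same attribute are merged, and the merge is transitive.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_accounts(accounts: dict) -> list:
    """accounts: {account_id: {attribute: value}}.
    Returns clusters (sets of account IDs) linked by any shared attribute value."""
    uf = UnionFind()
    first_seen = {}  # (attribute, value) -> first account observed with it
    for acct, attrs in accounts.items():
        uf.find(acct)  # register accounts that share nothing as singletons
        for attr, value in attrs.items():
            if not value:
                continue
            key = (attr, value)
            if key in first_seen:
                uf.union(first_seen[key], acct)
            else:
                first_seen[key] = acct
    clusters = defaultdict(set)
    for acct in accounts:
        clusters[uf.find(acct)].add(acct)
    return sorted(clusters.values(), key=sorted)
```

For example, if A1 and A2 share a device and A2 and A3 share a phone, all three land in one cluster even though A1 and A3 share nothing directly. In practice, very common values (a branch phone number, a public Wi-Fi IP) would over-link and would need to be excluded before clustering.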
Adverse media screening depends on the same capability. A hit against a named director of a corporate customer is actionable only if the system resolves that name to a specific customer record with sufficient confidence. Without resolution, the hit sits unmatched and the risk goes unseen.
In practice, most institutions run entity resolution in real time at account opening and as an overnight batch process, catching matches triggered by new watchlist additions since the previous day's run.
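The overnight run only needs to compare the customer book against watchlist entries added since the last run, which keeps the batch small. A minimal sketch, assuming a hypothetical `(entry_id, name, added_at)` watchlist layout and using the standard library's `difflib.SequenceMatcher` ratio as a stand-in for a production name matcher; the 0.85 threshold is an illustrative assumption.

```python
from datetime import datetime
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """difflib ratio (0..1) as a placeholder for a production fuzzy matcher."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def overnight_delta_screen(customers: dict, watchlist: list,
                           last_run: datetime, threshold: float = 0.85) -> list:
    """Screen every customer only against watchlist entries added since last_run.
    customers: {customer_id: name}; watchlist: [(entry_id, name, added_at)]."""
    new_entries = [(eid, name) for eid, name, added in watchlist if added > last_run]
    hits = []
    for cid, cname in customers.items():
        for eid, ename in new_entries:
            score = name_similarity(cname, ename)
            if score >= threshold:
                hits.append((cid, eid, score))
    return hits
```

Entries already screened in earlier runs are skipped here; changes on the customer side (new accounts, edited names) are what the real-time check at onboarding is there to catch.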
Entity Resolution in regulatory context
Regulators don't use the phrase "entity resolution" in statute. The requirement is implicit in every major AML framework, and examiners expect institutions to have the capability even when it isn't named explicitly.
FATF Recommendation 10 requires financial institutions to identify and verify customer identity using reliable, independent source documents, and Recommendation 11 requires them to retain those records for at least five years and make them available to competent authorities. If a bank holds three fragmented records for the same customer, which five-year file governs? Entity resolution makes the answer unambiguous.
In the United States, FinCEN's 2016 Customer Due Diligence Rule added beneficial ownership requirements: banks must identify and verify individuals who own or control legal entity customers. Linking a corporate customer to its Ultimate Beneficial Owner (UBO) requires entity resolution. That UBO may already exist in the system as a retail customer under a name variant. Without resolution, the connection is invisible.
In the EU, the Fifth Anti-Money Laundering Directive (5AMLD) mandated public beneficial ownership registers and required institutions to report discrepancies between those registers and their own records. You can't identify a discrepancy if you can't first determine whether the person in the public register and the person in your system are the same individual.
The ECB's targeted reviews and the EBA's AML risk assessment guidelines both expect consolidated customer risk views, and fragmented customer records are written up as supervisory findings in their own right. OCC and FDIC examiners have cited inadequate customer data consolidation in consent orders, requiring specific technology remediation on defined timelines.
For Know Your Business (KYB) programs, entity resolution extends to corporate identities: matching a company's trading name, registered name, and Legal Entity Identifier (LEI) across data sources before extending credit or correspondent relationships. The OFAC 50 Percent Rule makes this a sanctions compliance requirement: any entity owned 50% or more, directly or indirectly, individually or in the aggregate, by one or more blocked persons is itself blocked, even if it isn't listed on the SDN List.
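Once ownership records are resolved to the right entities, applying the 50 Percent Rule is a fixed-point calculation: an entity becomes blocked when blocked owners together hold 50% or more, and a newly blocked entity's own stakes then count toward the entities it owns. The sketch below is simplified and assumes a hypothetical `{entity: {owner: percent}}` ownership map; it ignores control without ownership and incomplete ownership data, both of which matter in practice.

```python
def blocked_entities(ownership: dict, sdn: set) -> set:
    """ownership: {entity: {owner: percent_held}}; sdn: directly listed parties.
    Simplified 50 Percent Rule: an entity is blocked when blocked owners,
    counted in the aggregate, hold >= 50% of it."""
    blocked = set(sdn)
    changed = True
    while changed:  # repeat until no new entity is blocked (indirect ownership)
        changed = False
        for entity, owners in ownership.items():
            if entity in blocked:
                continue
            blocked_stake = sum(pct for owner, pct in owners.items() if owner in blocked)
            if blocked_stake >= 50:
                blocked.add(entity)
                changed = True
    return blocked
```

With Person X listed and holding 50% of HoldCo, HoldCo is blocked; HoldCo's 60% stake then blocks OpCo, which no list names directly. Two listed persons holding 25% each of a joint venture block it through aggregation.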
Common challenges and how to address them
The algorithms are the easy part. The data is not.
Name variation. Names transliterate from Arabic, Chinese, and Cyrillic scripts differently depending on who completed the form and when. "Liu Wei" and "Wei Liu" are one person written in different name-order conventions. "Mohamed," "Muhammad," and "Mohammad" share a root, yet no two are spelled alike, so exact matching treats them as three different people. Phonetic algorithms like Soundex handle some variation but generate high false-positive rates on short names and common surnames. Better implementations combine phonetic scoring with character-level edit distance metrics and, where available, native-script matching.
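Combining a phonetic code with edit distance can be sketched with two textbook algorithms: American Soundex and Levenshtein distance. The blend in `name_score` and its 0.4/0.6 weighting are illustrative assumptions, not a calibrated scoring model; Soundex is also designed around English spelling, which is one reason it is weak on transliterated names.

```python
def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate letters sharing a code
            prev = code
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def name_score(a: str, b: str, phonetic_weight: float = 0.4) -> float:
    """Blend Soundex agreement with normalised edit similarity (0..1).
    The default weighting is a hypothetical choice for illustration."""
    a, b = a.casefold(), b.casefold()
    phonetic = 1.0 if soundex(a) == soundex(b) else 0.0
    longest = max(len(a), len(b)) or 1
    edit_sim = 1 - levenshtein(a, b) / longest
    return phonetic_weight * phonetic + (1 - phonetic_weight) * edit_sim
```

All three spellings of Mohamed share the Soundex code M530, so the phonetic component agrees while the edit-distance component still reflects how far apart the spellings are.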
Date of birth quality. In many jurisdictions, customers without a confirmed birth date default to 01/01 of their birth year. Thousands of records in any large dataset share this value. Effective systems weight date of birth more lightly when day and month are 01/01 and rely instead on document numbers or biometric data.
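The down-weighting described above can be sketched as a field-weight function. The weight values are illustrative assumptions, and the rule that a 1 January date in the same year neither supports nor contradicts a real date is a hypothetical refinement in the same spirit: a placeholder cannot be evidence against a genuine birthday.

```python
from datetime import date


def dob_weight(a: date, b: date, agree: float = 8.0,
               placeholder_agree: float = 1.5, disagree: float = -5.0) -> float:
    """Evidence weight for a date-of-birth comparison (illustrative values).
    Agreement on 01/01 counts for little, since it is a common default
    for customers without a confirmed birth date."""
    placeholder_a = (a.month, a.day) == (1, 1)
    placeholder_b = (b.month, b.day) == (1, 1)
    if a == b:
        return placeholder_agree if placeholder_a else agree
    if (placeholder_a or placeholder_b) and a.year == b.year:
        return 0.0  # a default date cannot contradict a real one in the same year
    return disagree
```

The remaining evidence then has to come from stronger fields such as document numbers, as the paragraph above recommends.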
Cross-system inconsistency. Banks that grew through acquisition hold the same customer in multiple systems with incompatible formats: a trading name in one, a company registration number in another, an LEI in a third. None of these fields maps directly to the others without a controlled transformation step. Standardizing data before matching, not after, is the fix.
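Standardization is mostly unglamorous normalization code. The sketch below shows two pieces: a company-name normalizer (casefold, strip accents and punctuation, drop legal-form suffixes) and an LEI format check. The suffix list is a deliberately short, hypothetical example. The LEI check follows ISO 17442, whose last two characters are ISO 7064 MOD 97-10 check digits: reading letters as numbers (A=10 through Z=35), the full 20-character string must equal 1 modulo 97.

```python
import re
import unicodedata

# Hypothetical, deliberately short list of legal-form suffixes to drop.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llc", "inc", "gmbh", "sa", "bv"}


def normalise_company_name(name: str) -> str:
    """Casefold, strip accents and punctuation, drop legal-form suffix tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = re.sub(r"[^\w\s]", " ", no_accents.casefold()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def lei_is_valid(lei: str) -> bool:
    """ISO 17442 format check: 18 alphanumerics plus 2 check digits,
    with the whole string (A=10..Z=35) congruent to 1 mod 97."""
    lei = lei.strip().upper()
    if not re.fullmatch(r"[A-Z0-9]{18}[0-9]{2}", lei):
        return False
    return int("".join(str(int(c, 36)) for c in lei)) % 97 == 1
```

A valid check digit only proves the identifier is well formed; confirming that the LEI belongs to the company in question still requires a lookup against the GLEIF reference data.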
Threshold calibration. Fuzzy matching confidence thresholds must differ by use case. Sanctions screening warrants a low match threshold: accept false positives rather than risk a false negative on an SDN match, since OFAC violations carry strict-liability civil penalties and willful violations can be prosecuted criminally. Customer deduplication warrants a higher threshold: incorrectly merging two real customers with similar names corrupts both records and can expose one customer's personal data to the other, which is a data protection breach.
A common architecture partitions outcomes into three zones, for example: auto-link above 90% confidence, auto-reject below 40%, and human review for the middle band. The review band adds operational cost, but a missed SDN link or a wrongly merged customer file costs more.
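The three-zone routing with use-case-specific bands fits in a small lookup table. In this sketch the deduplication band mirrors the 40%/90% example above, while the sanctions band is a hypothetical setting chosen only to show a lower rejection cut-off that pushes more pairs into review.

```python
from enum import Enum


class Decision(Enum):
    AUTO_LINK = "auto-link"
    REVIEW = "human review"
    AUTO_REJECT = "auto-reject"


# (auto_reject_below, auto_link_above) per use case. "customer_dedup" mirrors
# the 40/90 example; the "sanctions" band is a hypothetical illustration.
THRESHOLDS = {
    "customer_dedup": (0.40, 0.90),
    "sanctions": (0.20, 0.97),
}


def route(score: float, use_case: str) -> Decision:
    """Map a match confidence (0..1) to one of three outcome zones."""
    reject_below, link_above = THRESHOLDS[use_case]
    if score > link_above:
        return Decision.AUTO_LINK
    if score < reject_below:
        return Decision.AUTO_REJECT
    return Decision.REVIEW
```

The same 0.30 score is auto-rejected for deduplication but sent to an analyst for sanctions, which is the asymmetry the threshold-calibration paragraph argues for.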
Related terms and concepts
Entity resolution sits at the intersection of data engineering and financial crime compliance. The adjacent terms appear frequently in vendor documentation, regulatory guidelines, and audit reports.
Record Linkage is the older, technically precise name for the same process. Ivan Fellegi and Alan Sunter formalized it in 1969. In most compliance and vendor contexts, the two terms are used interchangeably.
Deduplication is the action that follows entity resolution. Once two records are confirmed as the same entity, the system produces a golden record and suppresses the duplicate. This creates records management obligations: the BSA requires five-year retention of customer identification records, and GDPR's right to erasure adds complexity when a "duplicate" record contains data the primary doesn't.
Fuzzy Matching is the algorithmic technique most commonly used within entity resolution. It quantifies the similarity between two text strings using edit distance, phonetic equivalence, or token overlap. Entity resolution orchestrates fuzzy matching alongside other signals, including structured identifiers and behavioral data.
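Token overlap is the simplest of the three similarity families to illustrate. The sketch below computes Jaccard similarity over name tokens; the function names are hypothetical. It shows both why token overlap is useful (word order is ignored) and why it is never used alone (it sees no similarity at all between spelling variants).

```python
import re


def tokens(name: str) -> set:
    """Casefolded word tokens; punctuation such as hyphens splits tokens."""
    return set(re.findall(r"\w+", name.casefold()))


def token_jaccard(a: str, b: str) -> float:
    """Shared distinct tokens divided by all distinct tokens (0..1).
    Token order is ignored, so 'Liu Wei' and 'Wei Liu' score 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

"Fatima Al-Hassan" and "Fatimah AlHassan" share no tokens and score 0.0, although their edit distance is only two. Entity resolution compensates by combining measures rather than relying on any single one.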
Identity Verification (IDV) confirms that a claimed identity corresponds to a real person, typically through document verification and liveness checks. Entity resolution confirms that a verified identity appears consistently across internal records. IDV asks: is this person real? Entity resolution asks: is this the same person we already know? Both feed the Customer Due Diligence (CDD) file, but they answer different questions.
Graph Analytics extends entity resolution into network territory. Once entities are resolved to golden records, graph analytics maps connections between them: shared addresses, common directors, overlapping counterparties. This is how investigators expose shell company structures and mule networks spanning hundreds of accounts. Network analysis built on unresolved, fragmented records produces misleading results: connections appear broken where they're actually whole.
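Once records are resolved, the connections can be stored as an undirected graph keyed by golden-record IDs, and a breadth-first search bounded by hop count answers questions like "what is within two links of this company?". The edge format and node labels below are hypothetical; a production system would typically use a graph database rather than in-memory sets.

```python
from collections import defaultdict, deque


def build_graph(edges) -> dict:
    """edges: iterable of (golden_id_a, golden_id_b, relation) between resolved entities."""
    graph = defaultdict(set)
    for a, b, _relation in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def within_hops(graph: dict, start: str, max_hops: int) -> dict:
    """Breadth-first search: {entity: hop_distance} for everything reachable
    from start in at most max_hops edges (start itself excluded)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]
    return seen
```

If the shared director had been left as two unresolved records ("Director P" in one system, a name variant in another), the path from Co A to Co B would not exist in the graph, which is the broken-connection failure described above.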
The sequence is: verify identity, resolve to existing records, deduplicate into a golden record, then run graph analytics to expose hidden relationships.
Where does the term come from?
The record linkage problem was formalized by Ivan Fellegi and Alan Sunter in their 1969 paper "A Theory for Record Linkage," published in the Journal of the American Statistical Association. Their probabilistic framework, scoring record pairs across multiple fields with weighted thresholds, is still the model that most modern systems follow. The phrase "entity resolution" emerged in computer science literature during the 1990s as the discipline expanded to include graph methods and machine learning.
In financial services, the regulatory obligation became explicit through FATF Recommendation 10, which requires customer identification across all products and channels. The EU's Fourth Anti-Money Laundering Directive (4AMLD, 2015) extended this to the group level, requiring group-wide policies and procedures, including for sharing customer information across branches and subsidiaries, and effectively making entity resolution an operational requirement rather than a best practice.
How FluxForce handles entity resolution
FluxForce AI agents monitor entity resolution-related patterns in real time, flag anomalies for analyst review, and generate evidence-backed decisions with full audit trails.