
Operational Resilience Testing: What It Is, What Regulators Expect, and What Gets You Cited


Operational Resilience Testing is the structured process by which financial institutions verify that critical business services can withstand disruption and recover within pre-defined impact tolerances. It is required in the UK by the PRA's SS1/21 and the FCA's PS21/3 and in the EU by the Digital Operational Resilience Act (DORA, Regulation (EU) 2022/2554). It is also set out in the Basel Committee's 2021 Principles for Operational Resilience. For in-scope firms, it is a core supervisory expectation.

What is Operational Resilience Testing?

Operational Resilience Testing is the structured, recurring process by which financial institutions validate that their critical business services can absorb operational disruption, recover within defined impact tolerances, and continue functioning without causing intolerable harm to customers, counterparties, or market stability.

The concept sits at the intersection of IT risk management, business continuity, and prudential supervision. Unlike a standard disaster recovery drill, it's designed to expose real failure modes: cloud provider outages, payment system breakdowns, data corruption events, third-party vendor collapses. The goal isn't to avoid disruption entirely. It's to confirm that when disruption happens, the institution stays inside its own pre-defined tolerances for downtime, data loss, and customer impact.

Regulators now frame the discipline around business services rather than individual systems. Firms must map their important business services (IBS), identify every resource those services depend on, and then test what breaks when any of those dependencies fails. In practice, that means stress-testing everything from core banking platforms to the MLRO's ability to file suspicious activity reports during a major incident.
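The service-level mapping can be held as a simple lookup. The following Python is a minimal sketch, not a reference implementation. The service and resource names in `IBS_DEPENDENCIES` are hypothetical, and a production map would also carry owners and review dates. The sketch inverts the map so that, given one or more failed dependencies, it returns every important business service affected.

```python
from collections import defaultdict

# Hypothetical IBS-to-resource map: people, processes, technology,
# facilities, and third parties each appear as a named resource.
IBS_DEPENDENCIES = {
    "retail_payments": {"core_banking", "payment_gateway", "sanctions_screening", "cloud_provider_a"},
    "sar_filing": {"case_management", "mlro_team", "transaction_monitoring"},
    "customer_onboarding": {"kyc_vendor", "core_banking", "sanctions_screening"},
}


def build_reverse_index(dependencies):
    """Invert the map so each resource points to the services that rely on it."""
    index = defaultdict(set)
    for service, resources in dependencies.items():
        for resource in resources:
            index[resource].add(service)
    return index


def services_affected_by(failed_resources, dependencies):
    """Return, sorted, every IBS that loses at least one of its dependencies."""
    index = build_reverse_index(dependencies)
    affected = set()
    for resource in failed_resources:
        affected |= index.get(resource, set())
    return sorted(affected)


if __name__ == "__main__":
    # Scenario: the sanctions screening platform goes offline.
    print(services_affected_by({"sanctions_screening"}, IBS_DEPENDENCIES))
```

For a sanctions-screening outage it prints `['customer_onboarding', 'retail_payments']`. Those are the services whose impact tolerances that scenario needs to test.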

The control appears under several related labels: scenario testing, impact tolerance testing, business continuity testing, and operational risk stress testing. All are components of the same overarching discipline.

The Basel Committee's 2021 Principles for Operational Resilience set out seven principles for banks, one of which covers business continuity planning and testing. The UK regulators went further, setting binding rules with hard compliance deadlines. What was once voluntary best practice is now a regulatory requirement, with enforcement consequences for institutions that can't demonstrate it.

Why is Operational Resilience Testing required?

The regulatory case for Operational Resilience Testing has hardened considerably since 2019. It's now explicitly mandated across multiple jurisdictions, with specific deadlines, documentation requirements, and supervisory testing.

In the UK, the PRA (PS6/21 and SS1/21) and the FCA (PS21/3) published final rules in March 2021. From 31 March 2025, in-scope firms, including banks, building societies, insurers, and PRA-designated investment firms, must be able to demonstrate that they can remain within impact tolerances for their important business services. Testing is mandatory: firms must test under a range of severe but plausible disruption scenarios and document the results.

The EU's Digital Operational Resilience Act (DORA) entered into force in January 2023 and has applied since 17 January 2025. DORA mandates ICT risk management frameworks, a digital operational resilience testing programme for all in-scope entities, and threat-led penetration testing for the largest and most systemic institutions. Banks, payment institutions, and crypto-asset service providers are in scope. Critical ICT third-party providers fall under a dedicated EU oversight framework.

In the US, the Federal Reserve, the OCC, and the FDIC set out supervisory expectations for operational resilience across the banking sector in their October 2020 interagency paper, Sound Practices to Strengthen Operational Resilience (Federal Reserve SR 20-24; OCC Bulletin 2020-94). The FFIEC Business Continuity Management booklet expects scenario-based testing, with documented results reported to the board.

FATF Recommendation 1 expects financial institutions to identify, assess, and mitigate their money laundering and terrorist financing risks under a risk-based approach, and a control that fails during an outage mitigates nothing. Sanctions screening platforms and transaction detection controls therefore belong in the scope of resilience validation. When these controls degrade or go offline during an operational incident, firms face compounded regulatory exposure across both prudential and financial crime supervision: two separate enforcement regimes asking the same uncomfortable question.

What do regulators expect to see?

On exam day, examiners want evidence that resilience testing is real, documented, and acted upon. Vague policy frameworks and self-assessments don't pass.

The PRA's SS1/21 is specific: firms must maintain a mapping from important business services to the people, processes, technology, facilities, and third parties that deliver them. Examiners will ask to see that mapping, and they'll trace specific test results back to it.
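Tracing test results back to the mapping is, in effect, a two-way reconciliation. The following is a minimal sketch that assumes each scenario-test record lists the resources it disrupted. The record shape, `EXAMPLE_MAP`, and `EXAMPLE_TESTS` are all hypothetical. The sketch reports mapped resources that no test has exercised, and test references that point at nothing in the map.

```python
# Hypothetical IBS map and scenario-test records, for illustration only.
EXAMPLE_MAP = {
    "retail_payments": {"core_banking", "payment_gateway"},
    "sar_filing": {"case_management"},
}
EXAMPLE_TESTS = [
    {"test_id": "T1", "disrupted": {"core_banking"}},
    {"test_id": "T2", "disrupted": {"legacy_fax"}},
]


def trace_tests_to_map(ibs_map, test_records):
    """Cross-check scenario tests against the IBS resource map.

    Returns (untested, orphaned):
      untested: IBS -> sorted resources that no test has disrupted
      orphaned: test_id -> sorted resources that appear in no IBS map
    """
    mapped = set().union(*ibs_map.values())
    exercised = set()
    orphaned = {}
    for record in test_records:
        disrupted = set(record["disrupted"])
        exercised |= disrupted
        unknown = disrupted - mapped
        if unknown:
            orphaned[record["test_id"]] = sorted(unknown)

    untested = {}
    for ibs, resources in ibs_map.items():
        gaps = resources - exercised
        if gaps:
            untested[ibs] = sorted(gaps)
    return untested, orphaned


if __name__ == "__main__":
    untested, orphaned = trace_tests_to_map(EXAMPLE_MAP, EXAMPLE_TESTS)
    print("Mapped but never tested:", untested)
    print("Tested but not mapped:", orphaned)
```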

Board-approved governance documentation. A resilience framework with clear executive ownership (typically CRO or COO), defined impact tolerances for each important business service, and evidence of at least annual board review.

Scenario test plans and results. Not theoretical scenarios. Documented tests of severe but plausible events: cyber attacks, third-party outages, mass staff unavailability, data loss events. Results must show whether the firm stayed inside its tolerances or breached them, and what remediation followed.

Third-party dependency testing. Firms can't disclaim responsibility because a vendor failed. Examiners expect evidence that critical third-party relationships are included in resilience testing, with contracts that give the firm the right to audit.

AML and financial crime continuity plans. MLROs face direct scrutiny on whether suspicious activity report filing can continue during a major incident. If transaction monitoring goes down and the firm continues processing transactions without detection capability, that's a material gap requiring immediate escalation — not a footnote in an incident report.

Board MI. Meeting minutes showing test results were presented to the board, with clear escalation paths when tolerances were breached.

Remediation tracking. Findings from tests must feed into a tracked register with closure dates. Examiners look for closure rates and item aging, not just a list of open issues.

What does good Operational Resilience Testing look like?

Best practice goes well beyond annual disaster recovery drills. The Basel Committee's 2021 Principles for Operational Resilience and the FCA's final rules both describe a continuous, iterative process built around real service outcomes, not just system availability metrics.

A mature resilience testing programme works through these steps:

  1. Map important business services to resource dependencies. Document every person, process, technology, data set, and third party that an IBS depends on. Update the map whenever the service changes.

  2. Set quantified impact tolerances. For each IBS, define a maximum tolerable duration of disruption (for example: "payment processing offline for no more than 4 hours"), a data loss threshold, and a customer impact ceiling.

  3. Test against tolerances, not just recovery time objectives. A recovery time objective (RTO) measures when a system comes back online. An impact tolerance measures whether the business service stayed within acceptable bounds during the outage. These are different questions, and regulators want evidence of the latter.

  4. Test the worst case. The FCA expects "severe but plausible" scenarios: simultaneous outage of primary and backup data centres, compromise of the firm's cloud provider, prolonged loss of key personnel under a pandemic-type scenario.

  5. Include financial crime controls in scope. Customer due diligence processes and transaction detection systems must be resilience-tested. A firm that can process payments but can't screen them isn't operationally resilient, regardless of what its IT recovery metrics show.

  6. Test third-party dependencies directly. Review SLAs, but also run tabletop exercises or simulation tests with critical vendors. Evidence of actual vendor testing is increasingly a standard examiner request.

  7. Report findings to the board with quantified gap assessments. Each test should produce a pass/fail verdict per IBS, a root cause analysis for any breach, and a tracked remediation plan with named owners and deadlines.
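Steps 2, 3, and 7 can be sketched together in Python. This is an illustration under stated assumptions, not a regulatory template. The field names, the 3-hour RTO, and the 50,000-customer ceiling are hypothetical; the 4-hour duration limit mirrors the example in step 2. The sketch shows that an RTO check and an impact tolerance check can give different answers for the same outage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactTolerance:
    """Tolerance for one IBS. Field names and thresholds are illustrative."""
    max_outage_hours: float      # maximum tolerable duration of disruption
    max_records_lost: int        # data loss threshold
    max_customers_affected: int  # customer impact ceiling


@dataclass(frozen=True)
class ScenarioOutcome:
    """What one scenario test observed for one IBS."""
    system_restored_hours: float  # when the underlying system came back (the RTO view)
    service_outage_hours: float   # how long customers could not use the service
    records_lost: int
    customers_affected: int


def rto_met(rto_hours, outcome):
    """System-level question: was the system back within its RTO?"""
    return outcome.system_restored_hours <= rto_hours


def tolerance_verdict(tolerance, outcome):
    """Service-level question: PASS/FAIL plus the tolerance dimensions breached."""
    breaches = []
    if outcome.service_outage_hours > tolerance.max_outage_hours:
        breaches.append("duration")
    if outcome.records_lost > tolerance.max_records_lost:
        breaches.append("data_loss")
    if outcome.customers_affected > tolerance.max_customers_affected:
        breaches.append("customer_impact")
    return ("FAIL" if breaches else "PASS"), breaches


# The platform is back after 2 hours, meeting a 3-hour RTO, but a payment
# backlog keeps the service unusable for 6 hours against a 4-hour tolerance.
PAYMENTS_TOLERANCE = ImpactTolerance(max_outage_hours=4, max_records_lost=0, max_customers_affected=50_000)
EXAMPLE_OUTCOME = ScenarioOutcome(system_restored_hours=2, service_outage_hours=6,
                                  records_lost=0, customers_affected=12_000)

if __name__ == "__main__":
    print("RTO met:", rto_met(3, EXAMPLE_OUTCOME))
    print("Impact tolerance:", tolerance_verdict(PAYMENTS_TOLERANCE, EXAMPLE_OUTCOME))
```

Here the RTO check passes while the tolerance verdict is `('FAIL', ['duration'])`. That is exactly the gap step 3 warns about.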

Common audit findings and exam citations

Regulators have cited institutions for operational resilience failures across five recurring themes.

Untested controls. The most common finding: controls existed on paper but hadn't been tested under realistic stress conditions. In the Danske Bank Estonia case, the branch's AML controls were nominally in place but could not cope with the volume and risk of its non-resident business. No resilience testing caught the gap before around €200 billion in payments, a large share of them suspicious, had flowed through the branch. A rule that hasn't been stress-tested against real volumes isn't a working control.

Impact tolerances set to pass, not to challenge. Firms that set tolerances so wide that they're trivially met attract supervisory scrutiny. The FCA has been explicit in multiple Dear CEO letters: tolerances must be set to protect customers, not to avoid exam failures.

Poor governance of test results. Multiple citations describe boards that approved resilience frameworks but never received test results. Without board-level oversight, remediation doesn't happen at pace, and the same findings recur year over year.

Third-party blind spots. The Deutsche Bank mirror trades case demonstrated what happens when oversight of complex counterparty relationships is weak. Regulators now expect firms to test whether they can operate through a critical vendor failure, not just whether the vendor has an acceptable recovery plan on file.

AML control outages treated as technical incidents, not compliance failures. Multiple FCA reviews since 2020 have flagged firms that continued processing transactions while PEP screening or detection systems were offline. Processing without screening is a regulatory violation regardless of cause.
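One design that prevents processing without screening is a fail-closed gate in front of payment release. The sketch below is an assumption about how such a gate could look, not a description of any particular platform. `route_payment`, `simple_screen`, and the exact-match `WATCH_LIST` are hypothetical, and real screening relies on fuzzy name matching rather than set lookups.

```python
from enum import Enum


class Decision(Enum):
    RELEASE = "release"  # screened, no match
    REFER = "refer"      # screened, potential match: route to investigators
    HOLD = "hold"        # not screened yet: queue until screening is restored


def route_payment(payment, screening_available, screen):
    """Fail-closed routing: a payment is only released after it has been screened.

    `screen` is any callable returning True when the payment produces a
    potential sanctions or PEP match; its implementation is out of scope here.
    """
    if not screening_available:
        return Decision.HOLD
    return Decision.REFER if screen(payment) else Decision.RELEASE


# Hypothetical watch list and toy exact-match screen, for illustration only.
WATCH_LIST = {"ACME SHELL LTD"}


def simple_screen(payment):
    return payment["beneficiary"].upper() in WATCH_LIST


if __name__ == "__main__":
    payment = {"id": "P1", "beneficiary": "Acme Shell Ltd", "amount": 5_000}
    print(route_payment(payment, screening_available=False, screen=simple_screen))
    print(route_payment(payment, screening_available=True, screen=simple_screen))
```

Holding payments preserves the screening obligation but delays customers. The length of any such hold is therefore something the relevant IBS impact tolerance has to accommodate.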

The FCA's 2021 multi-firm review of operational resilience found that fewer than 30% of surveyed firms could demonstrate scenario testing that genuinely challenged their impact tolerances. That's where examiner attention concentrates in every subsequent review.

Metrics and KPIs

Measuring control health for operational resilience testing requires going beyond whether tests were completed on schedule. The goal is to know whether the firm can actually stay inside its tolerances when it matters.

IBS coverage rate. What percentage of important business services have been tested in the last 12 months? A mature programme targets 100%. Below 80% is a gap regulators will flag.
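The following is a minimal sketch of the coverage calculation. It assumes the firm records the date of each IBS's most recent scenario test. The service names and dates are hypothetical, and the 80% flag mirrors the figure above.

```python
from datetime import date, timedelta

FLAG_BELOW_PCT = 80.0  # threshold quoted in the text above


def ibs_coverage(all_ibs, last_tested, as_of, window_days=365):
    """Return (coverage %, sorted IBS with no test inside the trailing window).

    last_tested maps IBS -> date of its most recent scenario test;
    an IBS missing from the dict counts as never tested.
    """
    cutoff = as_of - timedelta(days=window_days)
    services = list(all_ibs)
    stale = sorted(s for s in services if last_tested.get(s) is None or last_tested[s] < cutoff)
    rate = 100.0 * (len(services) - len(stale)) / len(services) if services else 0.0
    return rate, stale


# Hypothetical portfolio: "fx" has never been tested.
SERVICES = ["payments", "sar_filing", "onboarding", "lending", "fx"]
LAST_TESTED = {
    "payments": date(2024, 11, 1),
    "sar_filing": date(2023, 6, 30),
    "onboarding": date(2024, 3, 15),
    "lending": date(2024, 9, 1),
}

if __name__ == "__main__":
    rate, stale = ibs_coverage(SERVICES, LAST_TESTED, as_of=date(2024, 12, 31))
    print(f"Coverage {rate:.0f}% (flag: {rate < FLAG_BELOW_PCT}); untested: {stale}")
```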

Scenario severity distribution. What percentage of tests were "severe but plausible" versus low-risk or theoretical? Track the ratio. Programmes weighted toward easy scenarios produce reassurance, not assurance.
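The severity ratio is a simple count once each test carries a severity label. The labels below are an assumption; firms will use their own taxonomy. The sketch reports each label's share of the cycle and the ratio of severe-but-plausible tests to everything else.

```python
from collections import Counter

SEVERE = "severe_but_plausible"  # hypothetical label used in the test inventory


def severity_mix(test_severities):
    """Return (percentage share per severity label, ratio of severe to all other tests)."""
    counts = Counter(test_severities)
    total = sum(counts.values())
    if total == 0:
        return {}, 0.0
    shares = {label: round(100.0 * n / total, 1) for label, n in counts.items()}
    others = total - counts[SEVERE]
    ratio = counts[SEVERE] / others if others else float("inf")
    return shares, ratio


# Hypothetical cycle of ten tests.
CYCLE_TESTS = [SEVERE] * 3 + ["low_risk"] * 5 + ["theoretical"] * 2

if __name__ == "__main__":
    shares, ratio = severity_mix(CYCLE_TESTS)
    print(shares, f"severe:other = {ratio:.2f}")
```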

Impact tolerance breach rate per test cycle. In each scenario test, how many IBS breached their defined tolerance? Track this over time. A declining breach rate signals genuine improvement. A flat one signals untested controls or tolerances set too wide.
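The following sketch computes the per-cycle breach rate. It assumes each cycle records a breached or not-breached result per IBS; the cycle data is hypothetical. The trend check deliberately compares only the first and latest cycles. A real programme would look at more data points and at which IBS drove the change.

```python
def breach_rates(cycles):
    """cycles: chronological list of (label, {ibs: breached?}).

    Returns [(label, % of tested IBS that breached tolerance)].
    """
    rates = []
    for label, results in cycles:
        breached = sum(1 for hit in results.values() if hit)
        rates.append((label, 100.0 * breached / len(results) if results else 0.0))
    return rates


def trend(rates):
    """Deliberately simple: compare the first and latest cycle only."""
    if len(rates) < 2:
        return "insufficient data"
    first, last = rates[0][1], rates[-1][1]
    if last < first:
        return "declining"
    return "rising" if last > first else "flat"


# Hypothetical results for two annual cycles.
CYCLES = [
    ("2023", {"payments": True, "sar_filing": True, "onboarding": False, "lending": False}),
    ("2024", {"payments": True, "sar_filing": False, "onboarding": False, "lending": False}),
]

if __name__ == "__main__":
    rates = breach_rates(CYCLES)
    print(rates, trend(rates))
```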

Remediation closure rate and item age. Open findings from resilience tests should be tracked to closure. Average age above 90 days is a governance warning sign. Regulators treat aged, unresolved findings as evidence of weak board oversight.
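Closure rate and ageing can be computed straight from the remediation register. The following is a minimal sketch that assumes each finding carries an opened date and an optional closed date; the `REGISTER` entries are hypothetical. It applies the 90-day warning threshold above to the average age of open items.

```python
from datetime import date

AGE_WARNING_DAYS = 90  # warning threshold quoted in the text above


def remediation_health(findings, as_of):
    """Closure rate and ageing for a resilience remediation register.

    Each finding is a dict with "opened" (date) and "closed" (date or None).
    """
    total = len(findings)
    open_ages = [(as_of - f["opened"]).days for f in findings if f["closed"] is None]
    closure_rate = 100.0 * (total - len(open_ages)) / total if total else 0.0
    avg_age = sum(open_ages) / len(open_ages) if open_ages else 0.0
    return {
        "closure_rate_pct": closure_rate,
        "avg_open_age_days": avg_age,
        "open_over_threshold": sum(1 for age in open_ages if age > AGE_WARNING_DAYS),
        "warning": avg_age > AGE_WARNING_DAYS,
    }


REGISTER = [
    {"id": "F1", "opened": date(2024, 6, 1), "closed": date(2024, 8, 1)},
    {"id": "F2", "opened": date(2024, 7, 1), "closed": None},
    {"id": "F3", "opened": date(2024, 11, 30), "closed": None},
    {"id": "F4", "opened": date(2024, 10, 1), "closed": date(2024, 11, 15)},
]

if __name__ == "__main__":
    print(remediation_health(REGISTER, as_of=date(2024, 12, 31)))
```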

Third-party testing coverage. Of the firm's critical third-party providers, what percentage have been subject to resilience testing or exercise-based simulation in the past 12 months?

AML control downtime incidents. Track every instance where transaction monitoring or screening systems experienced unplanned downtime, with duration, impact assessment, and any regulatory notification made. This metric sits at the intersection of resilience and AML compliance, and it's one regulators ask about directly.
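The following is a minimal sketch of an AML control downtime log and its summary; the field names and entries are hypothetical. It totals downtime per system and surfaces incidents where processing continued and no notification was recorded. That escalation rule is an illustrative assumption, not a regulatory test.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ControlOutage:
    """One unplanned outage of an AML control. Field names are hypothetical."""
    system: str                 # e.g. "transaction_monitoring", "sanctions_screening"
    start: datetime
    end: datetime
    impact_assessment: str
    processing_continued: bool  # were transactions processed while the control was down?
    regulator_notified: bool

    @property
    def minutes(self):
        return (self.end - self.start).total_seconds() / 60


def summarise(outages):
    """Total downtime minutes per system, plus incidents queued for escalation review.

    Escalation rule (illustrative assumption): processing continued and no
    regulatory notification was recorded.
    """
    downtime = defaultdict(float)
    for outage in outages:
        downtime[outage.system] += outage.minutes
    escalate = [o for o in outages if o.processing_continued and not o.regulator_notified]
    return dict(downtime), escalate


LOG = [
    ControlOutage("sanctions_screening", datetime(2024, 5, 3, 2, 0), datetime(2024, 5, 3, 3, 30),
                  "overnight batch delayed", processing_continued=True, regulator_notified=False),
    ControlOutage("transaction_monitoring", datetime(2024, 6, 9, 10, 0), datetime(2024, 6, 9, 10, 45),
                  "alerts queued, no payments released", processing_continued=False, regulator_notified=False),
    ControlOutage("sanctions_screening", datetime(2024, 8, 1, 14, 0), datetime(2024, 8, 1, 14, 45),
                  "failover to secondary instance", processing_continued=False, regulator_notified=False),
]

if __name__ == "__main__":
    totals, escalate = summarise(LOG)
    print(totals, [o.start.isoformat() for o in escalate])
```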

Time to restore versus tolerance. Track actual time to restore (TTR) against the stated impact tolerance for each IBS. Consistent overruns mean the tolerance is wrong or the recovery capability is insufficient.
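The following sketch compares time to restore with tolerance. It assumes observed restore times are kept per IBS in hours; the figures are hypothetical. "Consistent overrun" is defined here, as an assumption, as every one of the last three tests exceeding the tolerance.

```python
def overrun_report(ttr_hours_by_ibs, tolerance_hours, window=3):
    """Compare time to restore (TTR) with each IBS's stated impact tolerance.

    ttr_hours_by_ibs: IBS -> observed TTRs in hours, oldest first.
    tolerance_hours:  IBS -> stated tolerance in hours.
    "Consistent" means every one of the last `window` tests overran
    (an illustrative definition, not a regulatory one).
    """
    report = {}
    for ibs, ttrs in ttr_hours_by_ibs.items():
        recent = ttrs[-window:]
        limit = tolerance_hours[ibs]
        overruns = [ttr - limit for ttr in recent if ttr > limit]
        report[ibs] = {
            "overruns": len(overruns),
            "tests": len(recent),
            "worst_overrun_h": max(overruns, default=0.0),
            "consistent_overrun": len(recent) == window and len(overruns) == window,
        }
    return report


TTR = {"payments": [5, 6, 4.5], "sar_filing": [2, 7, 3]}
TOLERANCE = {"payments": 4, "sar_filing": 6}

if __name__ == "__main__":
    for ibs, row in overrun_report(TTR, TOLERANCE).items():
        print(ibs, row)
```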

Board reporting should cover all IBS with traffic-light status, scenario test outcomes, and a tracked remediation register presented at least quarterly.
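Traffic-light status needs an explicit rule behind it. The rule below is a hypothetical example: a breach or a missing test means red, and open findings mean amber. Results are sorted so red items appear first in the board pack; the portfolio entries are illustrative.

```python
from dataclasses import dataclass


@dataclass
class IbsStatus:
    name: str
    last_verdict: str   # "PASS", "FAIL", or "UNTESTED"
    open_findings: int  # open items in the remediation register


def rag(status):
    """Illustrative rule: a breach or a missing test is red, open findings are amber."""
    if status.last_verdict in ("FAIL", "UNTESTED"):
        return "RED"
    if status.open_findings > 0:
        return "AMBER"
    return "GREEN"


ORDER = {"RED": 0, "AMBER": 1, "GREEN": 2}


def board_summary(statuses):
    """(colour, IBS) rows sorted so the board sees red items first."""
    return sorted(((rag(s), s.name) for s in statuses), key=lambda row: (ORDER[row[0]], row[1]))


PORTFOLIO = [
    IbsStatus("payments", "FAIL", 2),
    IbsStatus("sar_filing", "PASS", 0),
    IbsStatus("onboarding", "PASS", 1),
    IbsStatus("lending", "UNTESTED", 0),
]

if __name__ == "__main__":
    for colour, name in board_summary(PORTFOLIO):
        print(f"{colour:<5} {name}")
```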

How Operational Resilience Testing connects to other controls

Operational resilience testing is the quality assurance layer that sits across every other financial crime and risk control in the institution.

The most direct connection is with transaction detection: any resilience programme that doesn't treat monitoring system availability as a critical IBS metric is leaving a material exposure unaddressed. Regulators treat detection control outages as AML failures, not IT incidents.

The typologies that exploit degraded controls are worth naming. Layering, smurfing, and structuring activity tends to accelerate during periods when detection systems are impaired. Sophisticated criminal networks time high-volume, rapid-fire transactions to coincide with known maintenance windows or incident-response periods when monitoring is reduced.
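Where detection was impaired, a retrospective lookback over the affected window is one way to search for the structuring pattern described above. The following is a minimal sketch. The threshold, the 10% "just below" band, and the three-transaction trigger are illustrative assumptions rather than regulatory values, and the transactions are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def structuring_lookback(transactions, window, threshold, band=0.10, min_count=3):
    """Accounts with repeated just-below-threshold amounts inside an impaired window.

    transactions: list of (account, amount, timestamp).
    window: (start, end) of the monitoring outage; end is exclusive.
    threshold, band, min_count: illustrative parameters, not regulatory values.
    """
    start, end = window
    lower = threshold * (1 - band)
    hits = defaultdict(int)
    for account, amount, ts in transactions:
        if start <= ts < end and lower <= amount < threshold:
            hits[account] += 1
    return sorted(account for account, n in hits.items() if n >= min_count)


OUTAGE = (datetime(2024, 3, 2, 1, 0), datetime(2024, 3, 2, 5, 0))
TXNS = [
    ("A", 9_500, datetime(2024, 3, 2, 1, 10)),
    ("A", 9_800, datetime(2024, 3, 2, 1, 20)),
    ("A", 9_900, datetime(2024, 3, 2, 1, 35)),
    ("B", 9_500, datetime(2024, 3, 2, 2, 0)),
    ("B", 9_600, datetime(2024, 3, 2, 2, 5)),
    ("C", 9_700, datetime(2024, 3, 2, 3, 0)),
    ("C", 9_700, datetime(2024, 3, 2, 3, 5)),
    ("C", 9_700, datetime(2024, 3, 2, 6, 0)),  # after monitoring was restored
]

if __name__ == "__main__":
    print(structuring_lookback(TXNS, OUTAGE, threshold=10_000))
```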

Sanctions exposure follows the same pattern. If screening systems go offline during an incident and transactions are processed without checks, the firm has breached its obligations regardless of intent. That's a bilateral exposure: prudential supervisor and sanctions authority asking separate questions about the same event.
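The first step in responding to a screening outage is identifying the affected population: transactions processed inside the outage window need retrospective screening. The following is a minimal sketch that assumes outage windows are recorded with start and end timestamps and treats the end as exclusive. The transaction data is hypothetical.

```python
from datetime import datetime


def unscreened_transactions(transactions, outage_windows):
    """IDs of transactions processed while screening was offline.

    transactions: list of (txn_id, processed_at).
    outage_windows: list of (start, end) datetimes; end is exclusive (an assumption).
    """
    return [
        txn_id
        for txn_id, processed_at in transactions
        if any(start <= processed_at < end for start, end in outage_windows)
    ]


WINDOWS = [(datetime(2024, 5, 3, 2, 0), datetime(2024, 5, 3, 3, 30))]
PROCESSED = [
    ("T100", datetime(2024, 5, 3, 1, 59)),
    ("T101", datetime(2024, 5, 3, 2, 15)),
    ("T102", datetime(2024, 5, 3, 3, 29)),
    ("T103", datetime(2024, 5, 3, 3, 30)),
]

if __name__ == "__main__":
    print(unscreened_transactions(PROCESSED, WINDOWS))
```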

FATF Recommendation 11 record-keeping requirements apply during resilience events too. Firms must maintain records even when primary systems are down. Testing whether backup record-keeping is adequate should be a standard scenario component in every annual test cycle.
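A backup record-keeping scenario can be scored by reconciling the backup copy against the primary store. The following is a minimal sketch that assumes both stores can be exported as dictionaries keyed by record ID, a simplification of real archives. It hashes each record's content so that field order does not matter, then reports missing, altered, and unexpected records. The sample records are hypothetical.

```python
import hashlib
import json


def record_digest(record):
    """Stable SHA-256 of a record's content; sort_keys makes field order irrelevant."""
    payload = json.dumps(record, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def reconcile(primary, backup):
    """Compare primary and backup stores, each a dict of record_id -> record."""
    shared = primary.keys() & backup.keys()
    return {
        "missing_in_backup": sorted(primary.keys() - backup.keys()),
        "content_mismatch": sorted(r for r in shared if record_digest(primary[r]) != record_digest(backup[r])),
        "only_in_backup": sorted(backup.keys() - primary.keys()),
    }


PRIMARY = {
    "t1": {"amount": 100, "currency": "GBP"},
    "t2": {"amount": 250, "currency": "EUR"},
    "t3": {"amount": 75, "currency": "USD"},
}
BACKUP = {
    "t1": {"currency": "GBP", "amount": 100},  # same content, different field order
    "t2": {"amount": 205, "currency": "EUR"},  # amount altered in the copy
    "t4": {"amount": 10, "currency": "GBP"},
}

if __name__ == "__main__":
    print(reconcile(PRIMARY, BACKUP))
```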

A firm that can demonstrate its financial crime controls remain operational under severe disruption is one regulators trust. One that can't explain what happens to its AML controls during a cloud outage is going to struggle in any supervisory review that asks the question — and they're all asking it now.

How FluxForce supports Operational Resilience Testing

FluxForce agents continuously monitor financial crime control availability, alert throughput, and detection rates in real time. When a control degrades or goes offline, the platform flags the gap immediately and creates an auditable incident record with timestamps, affected services, and downstream exposure.

Automated evidence capture means every test run, every alert processed, and every detection decision is logged with a full audit trail available for examiner review. The reporting layer surfaces IBS coverage metrics, remediation status, and board-level dashboards without manual extraction.

For compliance teams running scenario tests, FluxForce tracks whether controls stayed inside their impact tolerances throughout each test cycle.

Book a demo to see how FluxForce supports resilience testing.

