In this three-part blog series, I explore three critical challenges we’ve encountered while helping financial institutions move to the cloud. By “the cloud”, I mean hyperscaler cloud platforms such as AWS, Azure and GCP.
This blog post explores how to strike the right balance between building high resilience and managing the required investments, such as cloud capacity, engineering effort, and operational work, while also developing comprehensive disaster recovery capabilities for critical services.
In part two, we’ll dive into how to build cybersecurity capabilities tailored to the financial sector’s unique landscape. Finally, the third part focuses on how financial institutions can maintain business agility while operating within a strictly regulated environment. I’ll also share key insights on how to overcome these challenges to build a secure, resilient, and future-ready organisation.
Building comprehensive disaster recovery capabilities and finding the right balance between high resilience and right-sized investments
With the introduction of DORA (Digital Operational Resilience Act), we now work closely with financial institutions to help them strengthen their resilience in a structured, regulatory-driven way. DORA requires organisations to specifically identify their critical and important services and ensure that formal disaster recovery (DR) planning and regular testing are carried out for these services.
Generally, disaster recovery planning in the cloud has focused on business continuity: responding to, and recovering from, a major incident or outage. While this remains essential for financial institutions as well, the added responsibility for societal impacts and security of critical financial infrastructure brings an extra layer of complexity to the planning process.
Financial institutions are not only responsible for keeping their own operations running. They must also ensure that critical services, such as payment systems and access to trusted networks, continue to function. These services have a direct and significant impact on citizens’ access to essential services and, just as importantly, on their trust in the financial system as a whole.
When designing disaster recovery (DR) strategies, a comprehensive approach must be taken to address both technical resilience and the people and service operations around critical systems.
For example, disaster recovery capabilities for a business system can be built by distributing resources across multiple regions and availability zones, enabling automated failover, data replication, and rapid recovery in case of disruption. However, technology alone isn’t enough. Clear roles and responsibilities must be defined, service processes must be resilient, and regular crisis exercises must be conducted. This ensures that not only the systems, but also the people and operations supporting them, are fully prepared to maintain essential services and uphold public trust, especially from a security of supply perspective.
Also, building resilience capabilities requires investment, so it’s essential to strike the right balance between ensuring operational continuity and managing capacity costs.
While a multi-region strategy across all business solutions would deliver strong resilience, it would also lead to significant costs. That’s why right-sizing your disaster recovery strategies and clearly understanding which business solutions require the highest levels of resilience is crucial.
Recommended approach in short
As financial institutions navigate this new era of operational resilience, success lies in making informed, risk-based decisions that align with both regulatory expectations and practical realities. By taking a strategic, right-sized approach to disaster recovery, grounded in clear priorities, robust processes, and continuous testing, organisations can not only meet DORA requirements but also build lasting trust, stability, and readiness for the challenges ahead.
Customer-level disaster recovery planning starts by assessing the criticality and dependencies of each system. Once these are defined, baseline disaster recovery (DR) strategies for each criticality class are established. These baselines set the minimum recovery requirements that every service must meet and are aligned with the availability expectations of their respective class.
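To make the idea of per-class baselines more concrete, here is a minimal Python sketch of how criticality classes, baselines and dependencies could be recorded and checked. Everything in it is an illustrative assumption rather than something prescribed by DORA or taken from a real engagement: the class names, the use of recovery time and recovery point objectives (RTO/RPO) as the baseline measures, the numeric thresholds, and the example services.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    # Hypothetical classes; a lower value means a more critical service.
    CRITICAL = 1
    IMPORTANT = 2
    STANDARD = 3


@dataclass(frozen=True)
class DrBaseline:
    max_rto_minutes: int        # longest acceptable time to restore the service
    max_rpo_minutes: int        # largest acceptable window of data loss
    multi_region: bool          # whether a second region is required
    min_dr_tests_per_year: int  # how often recovery must be exercised


# Assumed example thresholds, one baseline per criticality class.
BASELINES = {
    Criticality.CRITICAL: DrBaseline(60, 5, True, 4),
    Criticality.IMPORTANT: DrBaseline(240, 60, False, 2),
    Criticality.STANDARD: DrBaseline(1440, 1440, False, 1),
}


@dataclass(frozen=True)
class Service:
    name: str
    criticality: Criticality
    rto_minutes: int
    rpo_minutes: int
    multi_region: bool
    dr_tests_per_year: int
    depends_on: tuple[str, ...] = ()


def baseline_gaps(service: Service) -> list[str]:
    """Return the baseline requirements of its class that the service misses."""
    baseline = BASELINES[service.criticality]
    gaps = []
    if service.rto_minutes > baseline.max_rto_minutes:
        gaps.append("rto")
    if service.rpo_minutes > baseline.max_rpo_minutes:
        gaps.append("rpo")
    if baseline.multi_region and not service.multi_region:
        gaps.append("multi_region")
    if service.dr_tests_per_year < baseline.min_dr_tests_per_year:
        gaps.append("dr_tests")
    return gaps


def weaker_dependencies(service: Service, catalogue: dict[str, Service]) -> list[str]:
    """Return dependencies classified as less critical than the service itself."""
    return [
        name
        for name in service.depends_on
        if catalogue[name].criticality.value > service.criticality.value
    ]


# Hypothetical service catalogue.
catalogue = {
    "payments": Service("payments", Criticality.CRITICAL, 30, 5, False, 4, ("ledger",)),
    "ledger": Service("ledger", Criticality.IMPORTANT, 120, 30, False, 2),
}

for svc in catalogue.values():
    print(svc.name, baseline_gaps(svc), weaker_dependencies(svc, catalogue))
```

In this made-up catalogue, the payments service meets its recovery targets but lacks the multi-region setup its class requires, and it depends on a ledger service that sits in a lower criticality class. Surfacing both kinds of gap early is what helps direct investment towards the services that actually need the highest levels of resilience.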
Also, both technical capabilities (such as infrastructure failover and backup) and operational elements (like processes, roles, and communication plans) must be clearly defined. An important part of the work is to design and implement a cross-account testing plan that allows for regular validation that the implemented capabilities are truly in place and effectively supporting the needs of the business.
If you’re planning or already navigating your cloud journey, we’re happy to share what we’ve learned and support you in solving the toughest parts securely, pragmatically and in line with your regulatory environment. Don’t hesitate to contact me or Juuso Lehto.