Fluctuating markets, inflation, and rising interest rates have all tightened budgets. With growth and earnings slowing, companies must find budget-friendly ways to strengthen their credibility with clients by demonstrating continuous improvement. Experience tells us that organizations that stand still in a downturn fall behind the curve when the economy is ready to expand.
Resiliency is an often-overlooked area that raises customer satisfaction without breaking the bank. And with the capital markets industry inching toward same-day (T+0) settlement, clients are holding themselves and their vendors to higher standards than ever before.
Several well-known outages could have been avoided with the right level of resiliency built in. For example, a leading North American exchange suffered what it described as a technical glitch that produced hundreds of erroneous stock prices. The public was given few details beyond “system issues,” but it stands to reason that the right level of resiliency would have prevented it.
In the UK, the Financial Conduct Authority (FCA) has been “deeply concerned” about the growing number of technology outages for years. At one of the FCA’s annual public meetings, the regulator’s executive director of supervision said the number of “operation resilience breaks” reported as IT failures had increased 300% year-on-year.
With these events in mind, how can companies ensure resiliency and set operations up for success? In this post, we’ll explore common misconceptions about resiliency and five ways to improve your approach.
Challenges to Overcome in Creating Resiliency
Making resiliency a priority can be challenging. Before establishing your approach, it’s important to understand the common misconceptions about it.
These misconceptions include the following:
- Only Front-Office Systems Need Resiliency – Conventional wisdom tells us that trading systems need to be always on, or there can be a significant impact depending on how markets are moving. However, as settlement cycles tighten, it's becoming critical that back-office systems are also continuously available, both to ensure clean and timely settlement and to reconcile trade differences before the start of the day.
- Resiliency is an Infrastructure Problem to Solve – Business leaders and application owners alike often see an outage and assume there’s a hardware problem. Increasing capacity or adding servers can patch the immediate problem, but it often only delays the next outage. Shifting to a software-based approach to resiliency will put you on a path to success.
- Resiliency Will Destroy Your Budget – When you factor in the hidden costs of outages, including client penalties and the operational overhead of manual work, architecting for resiliency pays for itself quickly. Resilient operations let your organization focus on revenue-generating activities rather than spending valuable time fixing outages. In addition, there are many low-cost architecture options for keeping your applications up and running without client impact. One approach is to classify each application into a tier, and choose its data replication strategy, based on its performance and availability requirements.
- Transitioning to the Cloud Solves the Problem – Cloud service providers can support your resiliency needs in various ways. The struggle organizations often run into is that they carry the same infrastructure-based mindset into the cloud, and their on-premises problems along with it. If resiliency isn't designed into the cloud application architecture upfront, you'll likely share the fate of the many organizations that were shut out for hours during the cloud outages of 2021.
- Resiliency is Only Needed When a Data Center Goes Down – It's true that you need to prepare for a scenario where you've lost all connectivity to your landing zone, whether that's a traditional data center or a region within the cloud. In reality, this is rare. Far more often, issues occur within your application while the landing zone as a whole remains intact. The focus therefore needs to be on providing resiliency within the same environment, with failover to another region reserved as a last resort.
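To make the tiering idea and the “in-region first” principle more concrete, here is a minimal Python sketch. The tier names, downtime targets, replication strategies, and recovery steps are all hypothetical assumptions chosen for illustration; they are not a prescribed classification or any cloud provider’s API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Replication(Enum):
    # Hypothetical replication strategies, from most to least robust.
    SYNC_REPLICA = "synchronous replica across zones in the same region"
    ASYNC_REPLICA = "asynchronous replica in the same region"
    BACKUP_RESTORE = "periodic backups, restored on demand"


@dataclass(frozen=True)
class Tier:
    name: str
    max_downtime_minutes: int  # assumed availability target for the tier
    replication: Replication


# Hypothetical tiers, ordered from strictest (and costliest) to most relaxed.
TIERS = [
    Tier("tier-1", 5, Replication.SYNC_REPLICA),
    Tier("tier-2", 60, Replication.ASYNC_REPLICA),
    Tier("tier-3", 24 * 60, Replication.BACKUP_RESTORE),
]


def classify(tolerable_downtime_minutes: int) -> Tier:
    """Pick the most relaxed (cheapest) tier that still meets the requirement."""
    for tier in reversed(TIERS):
        if tier.max_downtime_minutes <= tolerable_downtime_minutes:
            return tier
    return TIERS[0]  # stricter than every tier: fall back to the strongest


# Recovery steps ordered from least to most disruptive; cross-region
# failover is deliberately the last resort.
RECOVERY_LADDER = [
    "restart the failed process",
    "replace the instance in the same zone",
    "fail over to a replica in another zone of the same region",
    "fail over to another region",
]


def recover(try_step: Callable[[str], bool]) -> str:
    """Attempt each step in order and return the first one that succeeds."""
    for step in RECOVERY_LADDER:
        if try_step(step):
            return step
    raise RuntimeError("every recovery step failed")
```

For example, an application that can tolerate two hours of downtime lands in the hypothetical `tier-2` with an asynchronous in-region replica, and `recover` only reaches the cross-region step after every in-region option has failed. That ordering mirrors the point that regional failover should be the exception rather than the default response.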
It's often challenging to get teams mobilized around resiliency. It's not a revenue generator, and it doesn't stand out on a balance sheet as a cost saving. The savings are there, without question, but they need to be measured differently: operational efficiencies, higher productivity from engineers who can work on revenue-generating activities instead of production issues, and, of course, cost avoidance.
5 Ways to Improve Your Approach to Resiliency
With these challenges and misconceptions in mind, how can companies design an effective approach to resiliency?
Here are five helpful ways to improve:
- Start With Your Software – It's important to take a software-based approach to resiliency. Product Owners, Architects, Application Developers, SREs, and Application Support teams need to collaborate early and often in the software development lifecycle (SDLC) to understand the application's integration points and architect it to support an always-on environment. Once the feedback loop is established, infrastructure support teams can be brought in to help provide the options that best suit the application's needs. Deep collaboration and engagement across multiple levels of the tech stack are a must.
- Don't Make Resiliency a Money Decision – Product Owners are naturally focused on new features and functions. That's the exciting part of the job and, let's face it, it often generates more revenue. But when this work is prioritized over resiliency, your architecture can cost more in the long run: re-engineering an application later is more expensive and time-consuming than building in basic resiliency features from the start. Additionally, the more often your application goes down, the less likely your clients are to invest further with you, which costs you more in the end.
- Start Small – There is a tremendous amount of technical debt out there. Past database or middleware decisions may not be well positioned for the future, and older programming languages may not be flexible enough to get you to the leading edge. When assessing your tech stack, it’s easy to fixate on the big picture, with its massive price tag to re-architect and re-engineer. Instead, start small: get your core application into an always-on, steady state, and then move to the next layer.
- Shift Your Mindset – The best organizations know that transformation requires you to invest in your people. Resiliency isn't a one-person job, and to successfully collaborate across multiple layers of your organization, you need the right mindset. This often includes being willing to break old habits and traditions.
- Chaos Testing – The importance of chaos testing for understanding your failure points and user experience cannot be overstated. It should be a highly disciplined approach to testing a system’s integrity: proactively simulating failures to identify weaknesses before they lead to unplanned downtime or a negative user experience. Think of it in terms of what will happen if you pull out any of the pieces that make your application successful, including its upstream and downstream dependencies. To gain the most benefit, perform these tests in coordination with clients and simulate conditions as close to live activity as possible.
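The core mechanic of a chaos experiment, injecting failures into a dependency and measuring how the application copes, can be sketched in a few lines of Python. This is a minimal illustration, not a chaos-engineering tool: `FaultInjector`, `upstream_price_feed`, `PricingService`, and `run_experiment` are hypothetical names, and the fallback behavior (serving the last known price) is an assumed design choice for the example.

```python
import random
from typing import Callable, Dict, Optional


class DependencyDown(Exception):
    """Raised when an injected fault makes a dependency unavailable."""


class FaultInjector:
    """Wraps a dependency call and makes it fail with a given probability."""

    def __init__(self, call: Callable[[str], float], failure_rate: float, seed: int = 0):
        self.call = call
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so each experiment is repeatable

    def __call__(self, symbol: str) -> float:
        if self.rng.random() < self.failure_rate:
            raise DependencyDown(f"injected failure fetching {symbol}")
        return self.call(symbol)


def upstream_price_feed(symbol: str) -> float:
    """Hypothetical upstream dependency; always returns a fixed price here."""
    return 100.0


class PricingService:
    """Hypothetical application under test that caches the last good price."""

    def __init__(self, feed: Callable[[str], float]):
        self.feed = feed
        self.last_known: Dict[str, float] = {}

    def price(self, symbol: str) -> Optional[float]:
        try:
            value = self.feed(symbol)
        except DependencyDown:
            # Degrade gracefully: serve the last known price, or None if there is none.
            return self.last_known.get(symbol)
        self.last_known[symbol] = value
        return value


def run_experiment(failure_rate: float, requests: int = 1000, seed: int = 0) -> float:
    """Return the fraction of requests answered while faults are injected."""
    service = PricingService(FaultInjector(upstream_price_feed, failure_rate, seed))
    answered = sum(service.price("ABC") is not None for _ in range(requests))
    return answered / requests
```

With no injected faults, `run_experiment(0.0)` answers every request, and with an intermittently failing feed the cached fallback keeps nearly all requests answered. With the feed fully down from the start, however, `run_experiment(1.0)` answers nothing, because there is no cached price to fall back on. Surfacing that kind of failure point before a real outage does is exactly what the experiment is for.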
Finding a More Resilient Path Forward
Not investing in resiliency can risk damaging your reputation, causing clients to think twice about investing more with you. Outages can cause executions to be missed, settlements to be delayed, and portfolio managers to start the trading day uninformed about their current holdings.
Operating without resiliency can lead to fines, penalties, regulatory scrutiny, and top-tier investment firms questioning their relationships and looking for ways to break their contracts. It's a small industry—and a bad reputation travels further and faster than a good one does.
There are budget-friendly ways to architect your application that ensure the right level of resiliency while meeting your clients’ needs. Shifting ownership of resiliency to your product and application owners will help position your application to support critical industry changes. However, making resiliency decisions purely on budget will ultimately cost you client investment in the long run.
Avoiding these risks requires the industry to change its attitude toward investing in resiliency. The first step is to take a proactive approach to ensuring that critical trading and post-trade applications are always on.
It’s also important to note that there are other critical areas to focus on, such as infrastructure. This can include hardware, database and middleware replication technologies, network, monitoring, and cybersecurity solutions. Operational structure should also be considered, including site reliability engineering. Stay tuned for future posts on steps you can take to continue improving your approach.
To learn more about current challenges in the market and how Endava can help, visit our industries page.