This article was written in collaboration with Markus Boehme and Toby Dixon.
THE DATA UNIVERSE IS IN EXPANSION
Data-driven business models are at the top of the strategic agenda of many Corporate and Investment Banks (CIBs): analytics, machine learning (ML), client insight, hyper-personalisation, process mining, enhanced risk management, surveillance – to name a few. Data makes the difference: any solution is only as good as its input data. While data is ubiquitous, it lives in a universe in expansion. A data big bang has scattered data sources from the original on-premises mainframe across a cloud of microservices constellations, beyond the organisation’s frontier. Regional constraints, such as GDPR (General Data Protection Regulation), also pull apart some of the data in global organisations.
CIB data comes in all sorts of volume, velocity, and variety (the “3 Vs”): from a single transaction record to years of market data timeseries, from accounting batch reports to real-time risk exposure, from structured data to dark (matter?) data. Microservices with their dedicated storage and the fintech model with its highly interoperable systems, each focused on a specific business feature, will continue fragmenting the data universe.
Leveraging data comes with costs spread across acquisition, enhancement, storage, access, and processing. For example, market data is notoriously onerous to acquire and manipulate. AFME (Association for Financial Markets in Europe) recently published a report on “The Rising Cost of European Fixed Income Market Data”. At the same time, the UK’s FCA (Financial Conduct Authority) announced that it will review the costs of market data. There are numerous stories of multiple lines of business of the same bank paying twice for the same data, of a Value at Risk system in UAT (user acceptance testing) pulling several years of historical market data, or of teams struggling to comply with licensing restrictions. Overall end-to-end costs of data are often underestimated at the design stage and have a habit of getting out of hand in live operation.
ADRIFT IN THE DATA SPACE
How do you engineer a data platform across different business lines, asset classes, front to back, across systems including SaaS (Software as a Service) third parties, in a way that is readily accessible to all end users, whether they are tech-savvy data scientists or “low code” analysts?
Several CIBs went through an evolutionary, ‘tag on and gradually improve’ approach, with on-premises data warehouses and then data lakes. However, the results were often disappointing. They provided limited data integration, scalability, and maintainability. While they made some data accessible, they didn’t reduce the underlying inefficiency and added to the cost base. More confusing yet, multiple approaches have emerged beyond the traditional centralised data warehouse, such as data mesh, data fabric, and data virtualisation. They differ mainly in where they place the centre of gravity of storage, the serving layer, and governance. Assessing the maturity and impact of these approaches can be daunting.
In their pursuit of keeping projects manageable, these approaches ended up settling for less. With rare exceptions, they did not deliver the expected benefits, and business users grew increasingly frustrated.
REACH FOR THE STARS
It is time to tackle those challenges at a fundamental level: simultaneously architecting for end-to-end cost and business impact. It might at first feel like organising a trip to Mars. The right architecture, leveraging the latest technology in a phased approach, makes this journey possible.
Third-party vendors, too, have replaced their data warehouses and adopted SaaS solutions to distribute their data. “Data marketplaces” are emerging as a simple way to discover and access data outside an organisation, from mere business calendars to market insights. Some vendors go a step further by offering to host all client data and become the central data platform, although with some major constraints.
A large asset manager recently planned to replace an on-premises Enterprise Data Management system with a state-of-the-art Cloud Data Platform, including a proprietary Operations Portal, while the Portfolio Management System and the associated data sources and processes are being replaced. The proposed solution, based on the integration of PaaS and SaaS components, provides unprecedented agility to deal with the sheer number of data sources, while retaining full control over operations.
The technology around data is still evolving quickly. Most vendor platforms have had a major update within the past twelve months. They are usually more integrated across the whole data chain (ingestion, storage, transformation, serving, data science), and they provide end-to-end governance, such as monitoring, data lineage, catalogue, and access control, and reduce the stitching effort.
The cloud has become the natural home of modern data platforms. It is now accepted by clients and regulators, and it provides the elasticity required to scale computing and storage separately. But this could be another quantum leap for organisations that have not yet adopted a cloud strategy.
AI is now powering several traditionally painful processes, such as cataloguing data, identifying outliers, pairing, and reconciling. This can reduce the need for a data steward and accelerate the work of the analysts. ML algorithms can assist the business end users, based on their interactions with reports, to be truly self-reliant without knowledge of SQL, data models, or even the data format. They can recommend relevant data, enrichment patterns, transformations, and cleansing.
THE RIGHT CREW WITH THE RIGHT FLIGHT PLAN
The data journey usually involves all applications in the firm (and some data sources outside your solar system), all types of users, and an array of recent technologies. It is crucial to have the right team and the right approach.
A client had been struggling for years with the governance of hundreds of user reports, the lack of accuracy and performance of regulatory MiFID reports, the limitations of their executive dashboard, and even the aggregation of investment positions data. The new platform simplified the reporting estate by 50% and added robust monitoring; the new regulatory reports integrated more data sources for accuracy, and their generation speed increased tenfold; the management team had a user-friendly, detailed, and accurate dashboard that was generated ten times faster; a single source of truth was finally available to fuel analytics.
Furthermore, the right team is often multidisciplinary. In addition to the usual suspects, such as data engineers, data architects, and data analysts, other experts must support the multiple dimensions of a project. For example, a cloud architect can support the associated cloud migration; an industry SME (subject matter expert) will help navigate the business processes and the data semantics and ontology challenges; an API design expert will unlock data for other systems. Whilst getting the design right is of paramount importance, so is the team that delivers it: the data will evolve almost as fast as the platform, and agile delivery with multidisciplinary teams lends itself to the constant need to correct course.
An experienced team will avoid the many pitfalls, apply best practices, advise on the options and trade-offs of each design approach, and take advantage of the latest technology while keeping costs in mind.
Despite the many moving parts, the right flight plan can still deliver incremental value. For example, a data platform can evolve from a simple storage and reporting platform to a data master and book of records, then to an integration hub between systems, and finally to an advanced machine learning sandpit.
Finally, the right flight plan will include post-landing aspects. It will look ahead to make sure there is a sound operating model to reap the expected benefits, such as the accessibility of high-quality data to all end users under clear governance, the reliability and adaptability of the platform when data sources change, and the capability to automate and develop advanced analytics.
DON’T SETTLE FOR LESS
The data challenge is well understood by now, both in terms of the opportunity it presents to organisations and the size of the task they face to take advantage of it. We are at an inflection point between these two competing forces: technology has advanced enough in the past five years that it can be safely invested in and iterated on, whilst the size and complexity of the data we are all now required to manage will only continue to grow exponentially.
Firms that address their data estate now will prosper – those that don’t will be left behind. Now is the right time for organisations that haven’t started, or that have tried and been disappointed, to future-proof their data platforms.