Data mesh appears to herald a paradigm shift in how data is stored and processed. Instead of relying on central data repositories such as data warehouses or data lakes, companies may in future adopt a distributed data architecture that finally lets them exploit the full potential of their data. Let’s take a look at the principles of this new architectural concept, its advantages, and what needs to be considered when deciding whether it is a good fit for a company.
“Data is the new oil.” This quote by British mathematician Clive Humby is over 15 years old, and most companies have since recognised the meaning of his words and are trying to exploit the potential of their data. To do this, they collect ever larger amounts of data in central data stores, where it is cleaned and prepared so that it can be used downstream as high-quality data.
The data originates from internal operational and transactional systems and from domains that are essential to business operations. In addition, data from external sources that provides companies with supplementary information is fed into the data warehouse or data lake.
DATA VOLUMES BECOME A PROBLEM FOR DATA REPOSITORIES
However, companies are slowly reaching the limits of this monolithic data platform architecture, and often they do not even achieve the desired results. They struggle to control their ever-growing data volumes and to harmonise them so that the data can reach its full potential. This process is also costly and time-consuming. As a result, their ability to react quickly and flexibly to the growing number of internal and external data sources, and to connect these sources to their existing data, is limited.
Furthermore, the origin of the data in these repositories often cannot be fully traced. Which system did it originally come from? Through which other systems did it pass? When was it changed, how, and by whom? This lineage information is important for ensuring a high level of data quality. However, because of the large amount of data that ends up in the repository and the speed at which that data changes, lineage is sometimes neglected and not fully tracked and recorded. This usually makes the subject matter experts who are supposed to work with the data reluctant to use it.
As a result, companies struggle to generate meaningful insights from their data and to identify new use cases, such as new products or services for their customers. In addition, transforming the data and making it ready for its consumers takes time, especially if a company does not employ enough data specialists who know exactly how the data must be processed to serve its purpose.
GETTING MORE OUT OF DATA
The data mesh concept attempts to address these challenges by managing data as a product. This means that data is organised into data domains, has designated data owners, and is properly catalogued so that anyone in the company who is interested in certain data can easily access its metadata. The team that generates the data is considered its owner and must prepare it in such a way that other data consumers in the company can use it easily through self-service options. To do this, the team needs to satisfy several principles when building and managing its data products, such as Data Integrity, Discoverability, Self-Description, and Interoperability. This increases consumers’ confidence in the products.
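To make the data-as-a-product idea more concrete, here is a minimal Python sketch of the metadata a producing team might publish alongside its data. The class `DataProduct`, its fields, and its checks are hypothetical illustrations, not part of any data mesh standard or tool. The sketch simply ties each named principle to one observable property: records that must match a declared schema (Data Integrity), search tags (Discoverability), a description (Self-Description), and a declared schema that other teams can rely on (Interoperability).

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a producing team publishes for one data product."""

    name: str
    domain: str  # business domain the product belongs to
    owner: str  # producing team acting as data owner
    description: str  # human-readable self-description for consumers
    schema: dict[str, type]  # column name -> expected Python type
    tags: list[str] = field(default_factory=list)  # search keywords for discoverability

    def integrity_errors(self, records: list[dict]) -> list[str]:
        """Report records whose fields or value types do not match the schema."""
        errors = []
        for i, record in enumerate(records):
            if set(record) != set(self.schema):
                errors.append(f"record {i}: fields {sorted(record)} do not match schema")
                continue
            for column, expected in self.schema.items():
                if not isinstance(record[column], expected):
                    errors.append(f"record {i}: {column} is not {expected.__name__}")
        return errors

    def missing_principles(self) -> list[str]:
        """Name the principles this descriptor visibly fails to support."""
        missing = []
        if not self.schema:
            missing.append("Interoperability")
        if not self.tags:
            missing.append("Discoverability")
        if not self.description.strip():
            missing.append("Self-Description")
        return missing


# Example usage with made-up values.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner="sales-data-team",
    description="One row per order, refreshed daily.",
    schema={"order_id": str, "amount_eur": float},
    tags=["orders", "revenue"],
)
print(orders.missing_principles())  # []
print(orders.integrity_errors([{"order_id": "A1", "amount_eur": "9.50"}]))
# ['record 0: amount_eur is not float']
```

In a real organisation these checks would be far richer; the point is only that each principle can be made checkable by the producing team before consumers ever see the data.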
The biggest advantage here is that the data-producing departments naturally know their data best. This makes it easier for them to derive value from it and to develop new use cases.
In this new data architecture, the role of data scientists and engineers also changes: they no longer act as go-betweens for the data-producing and data-consuming teams but instead become part of the data-producing team. In this way, they acquire the domain knowledge they need to support their colleagues as well as possible when preparing data products. This simplifies and speeds up the entire process, which in turn lowers overall costs.
CENTRAL STANDARDS AND A CENTRAL REGISTER
The data mesh approach is particularly suitable for larger companies that work with very large data sets and a variety of data sources. Smaller companies, on the other hand, can usually get by with a central data repository. When implementing a data mesh approach, companies should consider two key elements in order to set up the necessary processes:
A central data governance model: data mesh only works if all data products in a company adhere to consistent standards and guidelines. Only then are they interoperable, so that data consumers can combine multiple data products and work with them according to their individual needs. Therefore, companies must first define standards and policies that determine how data products are categorised, managed, and accessed.
A central data catalogue: for data consumers to be able to find data products, companies need a central data catalogue. It lists all existing data products together with additional information, such as the origin of the data. Data owners can also add sample data sets that consumers can use to try out a product before committing to it.
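To illustrate how these two elements can work together, here is a minimal Python sketch of a central catalogue that refuses to list data products violating a central governance policy. Everything in it is hypothetical: `GovernancePolicy`, `CatalogueEntry`, `DataCatalogue`, their fields, and the example domains, access levels, and source system are assumptions for illustration, not the API of any real catalogue product. It only shows the ideas described above: consistent rules for categorisation (known domains), management (every product names an owner), and access (permitted access levels), plus a single place to find products, look up the origin of their data, and try out sample data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GovernancePolicy:
    """Hypothetical company-wide standards that every data product must follow."""

    allowed_domains: frozenset[str]  # categorisation: each product belongs to a known domain
    access_levels: frozenset[str]  # access: permitted classification labels


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record for one data product."""

    name: str
    domain: str
    owner: str
    access: str
    description: str
    origin: str  # system the data originally comes from
    sample: list[dict] = field(default_factory=list)  # small data set for trying the product


class DataCatalogue:
    """In-memory stand-in for a central data catalogue that enforces the policy."""

    def __init__(self, policy: GovernancePolicy) -> None:
        self._policy = policy
        self._entries: dict[str, CatalogueEntry] = {}

    def violations(self, entry: CatalogueEntry) -> list[str]:
        """List every way an entry breaks the central policy."""
        problems = []
        if entry.domain not in self._policy.allowed_domains:
            problems.append(f"unknown domain: {entry.domain}")
        if entry.access not in self._policy.access_levels:
            problems.append(f"unknown access level: {entry.access}")
        if not entry.owner:
            problems.append("missing owner")  # management: ownership must be explicit
        return problems

    def register(self, entry: CatalogueEntry) -> None:
        """Add a product only if it is new and complies with the central policy."""
        if entry.name in self._entries:
            raise ValueError(f"data product already registered: {entry.name}")
        problems = self.violations(entry)
        if problems:
            raise ValueError("; ".join(problems))
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Names of products whose name or description mentions the keyword."""
        keyword = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if keyword in e.name.lower() or keyword in e.description.lower()
        )

    def origin(self, name: str) -> str:
        return self._entries[name].origin

    def sample(self, name: str, rows: int = 5) -> list[dict]:
        return self._entries[name].sample[:rows]


# Example usage with made-up values.
catalogue = DataCatalogue(GovernancePolicy(
    allowed_domains=frozenset({"sales", "logistics"}),
    access_levels=frozenset({"public", "internal", "restricted"}),
))
catalogue.register(CatalogueEntry(
    name="orders_daily", domain="sales", owner="sales-data-team", access="internal",
    description="Daily order totals per customer", origin="ERP order system",
    sample=[{"customer_id": "C1", "total_eur": 120.0}],
))
print(catalogue.search("order"))  # ['orders_daily']
print(catalogue.origin("orders_daily"))  # ERP order system
print(catalogue.sample("orders_daily"))  # [{'customer_id': 'C1', 'total_eur': 120.0}]
```

Checking the policy at registration time is just one possible design choice: it keeps non-compliant products out of the catalogue entirely, whereas an organisation might equally choose to list them with warnings or enforce its standards elsewhere in the data platform.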
CONCLUSION: THE PARADIGM SHIFT IS IMMINENT, AND FEW ARE READY TO CAPTURE THIS UNAVOIDABLE MARKET TRANSITION
Data mesh is a new, decentralised approach to storing and processing data that might see widespread adoption: the more companies realise that the data repositories established in recent years no longer meet their requirements, the more they will look for alternatives. Data mesh offers them the opportunity to get more out of their existing data while deploying their staff more efficiently and making internal processes more effective and flexible.