Is Data Mesh Going to Replace Centralised Repositories?

 
 

Insights Through Data | Adriana Calomfirescu | 26 July 2022

Data Mesh seems to herald a paradigm shift in data storage and processing. Instead of central data repositories, such as data warehouses or data lakes, companies could in future rely on a distributed data architecture to finally be able to exploit the full potential of their data. Let’s take a look at the principles of this new data architecture concept, its advantages, and what needs to be considered when deciding whether it is a good fit for a company.

INTRODUCTION

“Data is the new oil” – this quote by British mathematician Clive Humby is over 15 years old, and most companies have since recognised the meaning of his words: they are trying to exploit the potential of their data. To do this, they are collecting ever larger amounts of data in central data stores, where it is cleaned and processed so that it can then be consumed as high-quality data.

The data originates from internal operational and transactional systems and domains that are essential for business operations. Furthermore, data from external sources that offer companies additional information is also fed into the data warehouse or data lake.

DATA VOLUMES BECOME A PROBLEM FOR DATA REPOSITORIES

However, companies are slowly reaching the limits of this monolithic data platform architecture – and often without achieving the desired results. They face the challenge of controlling their ever-growing data volumes and harmonising them to realise their full potential, a process that costs money and takes time. This limits their ability to react quickly and flexibly to the increasing number of internal and external data sources and to connect them to their existing data.

Furthermore, the origin of the data in these repositories often cannot be fully traced: from which system did it originally come? Through which other systems did it migrate? When was it changed, how, and by whom? This information is important for ensuring a high level of data quality. However, due to the large amount of data that ends up in the repository, and the speed at which it changes, this lineage is sometimes neglected and not fully tracked and recorded. As a consequence, the subject matter experts who are supposed to work with the data often become reluctant to use it.
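
To illustrate, the traceability questions above can be captured as a simple lineage record kept alongside each data set. The Python sketch below is purely illustrative; the ProvenanceRecord and ChangeEvent classes and their fields are hypothetical, not part of any particular lineage tool.

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class ChangeEvent:
        """One modification applied to the data on its way into the repository."""
        changed_at: datetime   # when was it changed?
        changed_by: str        # by whom?
        description: str       # how, i.e. which transformation was applied?

    @dataclass
    class ProvenanceRecord:
        """Answers the traceability questions for a single data set."""
        dataset: str
        source_system: str                                     # from which system did it originally come?
        migrated_through: list = field(default_factory=list)  # through which other systems did it migrate?
        changes: list = field(default_factory=list)           # when, how, and by whom was it changed?

    record = ProvenanceRecord(
        dataset="customer_orders",
        source_system="erp",
        migrated_through=["staging_db", "data_lake"],
        changes=[ChangeEvent(datetime(2022, 7, 1), "etl_pipeline_v2",
                             "normalised currency codes to ISO 4217")],
    )

Maintaining a record of this kind for every data set is what keeps the origin question answerable later on.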

As a result, companies struggle to generate meaningful insights from their data and to identify new use cases, such as new products or services for their customers. In addition, it takes time to transform the data and make it ready for its consumers, especially if a company does not employ enough data specialists who know exactly how the data should be processed to serve its purpose.

GETTING MORE OUT OF DATA

The data mesh concept attempts to address these challenges by managing data as a product. This means that the data is structured into data domains, has data owners, and is properly catalogued, so that anyone in the company who is interested in certain data can easily access its metadata. The team generating the data is considered the data owner and must prepare its data in such a way that other data consumers in the company can use it easily via self-service options. To do this, the team needs to satisfy several principles when building and managing its data products, such as Data Integrity, Discoverability, Self-Description, and Interoperability. This increases consumer confidence in the products.
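
As a rough illustration, the metadata a producing team publishes alongside its data product could look like the following sketch. The DataProduct class and its field names are assumptions made for illustration, not the API of any specific data mesh platform.

    from dataclasses import dataclass

    @dataclass
    class DataProduct:
        """Self-describing metadata published alongside a data product."""
        name: str             # Discoverability: a stable, searchable identifier
        domain: str           # the data domain the product belongs to
        owner: str            # the data owner, i.e. the producing team
        description: str      # Self-Description: what the data means and how to use it
        schema: dict          # Interoperability: agreed column names and types
        quality_checks: list  # Data Integrity: validations the product guarantees

    orders = DataProduct(
        name="retail.customer_orders",
        domain="retail",
        owner="order-management-team",
        description="All confirmed customer orders, updated daily.",
        schema={"order_id": "string", "customer_id": "string", "total": "decimal"},
        quality_checks=["order_id is unique", "total >= 0"],
    )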

The biggest advantage here is that the data-producing departments naturally know their data best. Accordingly, it is easier for them to derive benefits from it and to develop new possible use cases.

In this new data architecture, the role of data scientists and engineers also changes: they no longer act as go-betweens for the data-producing and data-consuming teams but become part of the data-producing team. In this way, they acquire the domain knowledge necessary to support their team colleagues in the best possible way when preparing the data products. This simplifies and speeds up the entire process, which in turn lowers overall costs.

CENTRAL STANDARDS AND A CENTRAL CATALOGUE

The data mesh approach is particularly suitable for larger companies that work with very large data sets and a variety of data sources. Smaller companies, on the other hand, can usually get by with a central data repository. When implementing a data mesh approach, companies should consider two key things to set up the necessary processes:

A central data governance model: data mesh only works if all data products in a company adhere to consistent standards and guidelines. Only then are they interoperable, allowing data consumers to merge multiple data products and work with them according to their individual needs. Companies must therefore first define standards and policies that determine how data products are categorised, managed, and accessed (see the sketch after these two points).

A central data catalogue: for data consumers to be able to find data products, companies need a central data catalogue. All existing data products are listed in this catalogue, together with additional information such as the origin of the data. Furthermore, data owners can add sample data sets that consumers can use to try out a product before working with their own data sets.
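
Taken together, these two points might translate into something like the following sketch: a central catalogue that rejects data products violating the agreed standards at registration time and lets consumers search what has been registered. All names here are hypothetical, chosen only to make the idea concrete.

    REQUIRED_FIELDS = {"name", "domain", "owner", "description", "schema"}

    class DataCatalogue:
        """A central register of data products; illustrative only."""

        def __init__(self):
            self._products = {}

        def register(self, product):
            # Central governance: reject products that do not meet the agreed standards.
            missing = REQUIRED_FIELDS - product.keys()
            if missing:
                raise ValueError(f"product violates governance standards, missing: {missing}")
            self._products[product["name"]] = product

        def search(self, keyword):
            # Discoverability: consumers find products by name or description.
            return [p for p in self._products.values()
                    if keyword in p["name"] or keyword in p["description"]]

    catalogue = DataCatalogue()
    catalogue.register({
        "name": "retail.customer_orders",
        "domain": "retail",
        "owner": "order-management-team",
        "description": "All confirmed customer orders, updated daily.",
        "schema": {"order_id": "string", "total": "decimal"},
        "sample_data": [{"order_id": "A-1001", "total": "19.99"}],  # try-before-use sample
    })
    print(catalogue.search("orders"))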

CONCLUSION: THE PARADIGM SHIFT IS IMMINENT, AND FEW ARE READY FOR IT

Data mesh is a new, decentralised approach to storing and processing data that might see widespread adoption. As more companies realise that the data repositories which became established in recent years no longer meet their requirements, they will increasingly look for alternatives. Data mesh offers them the opportunity to get more out of their existing data while deploying their staff more efficiently and making internal processes more effective and flexible.

Adriana Calomfirescu

Global Head of Data Delivery

Adriana has 25+ years of progressive leadership experience across the analysis, design, and implementation of information technology and data systems. She is responsible for identifying technology trends in the data world, ensuring the constant growth of technical competences in the data discipline, and providing governance for data projects at Endava. Starting with a small, dedicated team of data engineers in 2015, the Data Delivery discipline has grown under Adriana’s leadership to include over 400 associates in 17 locations across the globe.

 
