
Insights Through Data | Calin Constantinov |
27 January 2021

A graph database is now among the Top 20 most popular DBMS solutions!

DB-Engines.com, a platform tracking the popularity of over 360 DBMS (database management system) solutions, published a new set of rankings in November 2020. For the very first time, a graph database made it into the Top 20, and the trend continued in December 2020!


This should come as no surprise. Of all data models, graph databases have seen an almost uninterrupted, steep rise in popularity over the last seven years. The company behind the graph database system Neo4j claims that about 100K developers now engage with its product each month. What’s more, about 47K professionals list Neo4j as a skill on their LinkedIn profile. While there are multiple graph database implementations, Neo4j is, all things considered, an industry leader (Forrester, 2020).


So, what are graph databases?

We often hear data scientists say, “If I just had more data, then I could improve my predictive lift.” But the thing is, you already have more data – they’re called relationships, and they’re already hiding in your data sets (Neo4j Insider Guide, 2020). Arguably, graph databases have the analytical and discovery capabilities that no other technology can provide.
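To make this concrete, here is a minimal Python sketch (with invented account names) of how relationships that sit implicitly in a flat, table-like data set become explicit, traversable structure once the same rows are viewed as a graph:

```python
from collections import defaultdict

# A flat, table-like data set: each row is a money transfer.
# The relationships are only implicit, hiding in shared account names.
rows = [
    ("alice", "bob", 120.0),
    ("bob", "carol", 80.0),
    ("carol", "alice", 50.0),
    ("alice", "dave", 200.0),
]

# Make the relationships explicit as a directed graph (adjacency list):
# each account points to the accounts it has sent money to.
graph = defaultdict(list)
for sender, receiver, amount in rows:
    graph[sender].append((receiver, amount))

print(dict(graph))
```

In a graph database, each account would be a node and each transfer a relationship, so multi-hop questions like “whom did Alice’s counterparties pay?” become traversals rather than chains of joins.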

Not only do they open up unexplored avenues, but they can also power otherwise impractically expensive computations that enable data-driven insights. Such technologies can handle petabytes of data while supporting rates of millions of updates per second. Let’s not forget, companies that use data technologies are 23 times more likely to find new customers and nine times more likely to make existing ones loyal (McKinsey, 2014). Graphs also come with a great ‘feature’ which humans tend to value very much: they are visual! A graph can easily be drawn on a whiteboard and immediately grasped by a wide audience, because such representations preserve the natural structure of the underlying data.

While many still see graphs as a niche technology, finding relationships in combinations of diverse data by using graph techniques at scale will form the foundation of modern data and analytics. By 2023, graph technologies will facilitate rapid contextualisation for decision-making in 30% of organisations worldwide (Gartner, 2020), and some analysts consider graph databases to have the potential to replace the existing relational market by 2030 (Forbes, 2020). Before then, the application of graph processing and graph DBMSs is expected to grow at 100% annually through 2022, continuously accelerating data preparation and enabling more complex and adaptive data science. Much of this growth in graph analytics stems from the need to ask complex questions across complex data, which is not always practical, or even possible, at scale using SQL queries (Gartner, 2019).

But who uses Neo4j?

Just three years ago, 51% of global data and analytics technology decision-makers either were implementing, had already implemented, or were upgrading their graph databases (Forrester, 2017). As a reflection of the aforementioned growth in popularity, the numbers have increased spectacularly since then.


Many financial services and insurance companies are using Neo4j across a wide range of risk management use cases. Endava has used it with a number of our clients, too.

What are companies using Neo4j for?

Most people assume that banks use machine learning and artificial intelligence to detect activities related to money laundering. However, in the US, AML (anti-money laundering) compliance staff numbers at major banks have increased up to tenfold in recent years (McKinsey, 2014). The reason is that most of the work is still done manually.

In response, numerous companies have started using Neo4j for modelling financial risk, detecting fraud rings, and solving AML compliance challenges. A common scenario involves deciding whether activity in a set of financial transactions is fraudulent.

With the real-time data analysis and visualisation provided by Neo4j, fraud patterns are identified more accurately. Graph analysis systems can typically account for customers up to 10 degrees of separation apart. Much more historical information can also be reviewed, often uncovering additional suspicious activities and even leading to the detection of emerging clusters of ‘money mules’. For instance, PayPal leverages Neo4j to process more than 1 billion transactions per day across graphs of more than 3 billion nodes. This technology has already saved PayPal more than $700 million while enabling the company to perform predictive fraud analysis (Bank Administration Institute, 2016).
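As a rough illustration of the ‘degrees of separation’ idea, here is a small breadth-first search over a hypothetical transfer network (the account names and the helper function are illustrative only, not Neo4j’s implementation):

```python
from collections import deque

def degrees_of_separation(graph, start, target, max_depth=10):
    """Breadth-first search: how many hops separate two accounts?

    Returns the hop count, or None if target is unreachable
    within max_depth hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return depth
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return None

# Hypothetical accounts; an edge means "sent money to".
transfers = {
    "acct_a": ["acct_b", "acct_c"],
    "acct_b": ["acct_d"],
    "acct_c": ["acct_d"],
    "acct_d": ["acct_e"],
}
print(degrees_of_separation(transfers, "acct_a", "acct_e"))  # → 3
```

A native graph store answers this kind of neighbourhood query by walking relationships directly, which is what makes 10-hop analysis practical at scale.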


Additionally, transactions need to be fulfilled within imposed cost and time limits. Moreover, even if it never touches US soil (or a US bank), a transaction that is at some point converted to USD is subject to multiple compliance obligations. Yet there is no fully automated solution today for end-to-end routing of payments that checks all legal requirements and performs risk assessments for all intermediate entities. Fortunately, this is also a graph problem which Neo4j is happy to help with!
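Payment routing under cost constraints is, at its core, a shortest-path problem. The sketch below uses Dijkstra’s algorithm over hypothetical correspondent-banking fees; a production system would also need to encode the compliance checks described above:

```python
import heapq

def cheapest_route(fees, source, dest):
    """Dijkstra's shortest path: find the lowest-fee chain of
    intermediary institutions between two banks."""
    best = {source: 0.0}
    queue = [(0.0, source, [source])]
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dest:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was found
        for nxt, fee in fees.get(node, {}).items():
            new_cost = cost + fee
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return None

# Hypothetical per-hop fees between correspondent banks.
fees = {
    "bank_ro": {"bank_de": 5.0, "bank_uk": 9.0},
    "bank_de": {"bank_us": 4.0},
    "bank_uk": {"bank_us": 2.0},
}
print(cheapest_route(fees, "bank_ro", "bank_us"))
# → (9.0, ['bank_ro', 'bank_de', 'bank_us'])
```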


Apart from the financial sector, there are many other use cases for graphs. For instance, searching for information takes up 14 to 30% of engineers’ time (Deloitte, 2019). Fortunately, Neo4j can play an important role in easing the implementation of Knowledge Graphs. In fact, NASA’s Knowledge Graph of Historical Lessons Learned is built around Neo4j. A seemingly unrelated application of the same technology is eBay’s ShopBot for Graph-Powered Conversational Commerce, a system that is very good at determining the next question to ask the user. In this application, a knowledge graph was coupled with natural language understanding and artificial intelligence to store, remember, and learn from past interactions with shoppers. An interesting space to watch is the use of graph technologies for improving AI and ML.
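The knowledge-graph idea can be sketched in a few lines of Python: facts stored as subject-predicate-object triples, queried by pattern matching. The facts below are invented for illustration (they are not drawn from NASA’s actual graph); a real deployment would hold them as nodes and relationships in Neo4j and query them with Cypher:

```python
# A tiny knowledge-graph sketch: facts as (subject, predicate, object)
# triples, with None acting as a wildcard in query patterns.
triples = [
    ("mission_x", "experienced", "sensor_failure"),
    ("sensor_failure", "mitigated_by", "redundant_sensor"),
    ("mission_y", "experienced", "sensor_failure"),
]

def match(pattern):
    """Return every stored triple matching the (s, p, o) pattern."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Which missions experienced a sensor failure?
print(match((None, "experienced", "sensor_failure")))
```

Chaining such patterns (find the failure, then find its mitigation) is exactly the kind of multi-hop lookup that lets a lessons-learned graph surface relevant past experience.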


Additionally, the Panama Papers are a perfect example of how powerful the right analytics tool can be. For the Panama Papers investigation, the ICIJ (International Consortium of Investigative Journalists) team fed 2.6 TB of spaghetti data, made up of 11.5M heterogeneous documents, into Neo4j and ended up winning the Pulitzer Prize in 2017. Lastly, Neo4j was also behind NBC News’ effort to uncover 200K tweets tied to Russian trolls and their role in the 2016 US presidential election.

What does the future hold for graph databases?

The lack of standards has long been a problem for the graph database world. As a data exchange format, GraphML is slowly gaining wide adoption. For data querying, SQL is unlikely to be the right model for a graph-centric language. The good news is that, since September 2019, the Graph Query Language (GQL) has been developed and maintained by the same international working group that maintains the SQL standard. GQL is heavily based on Cypher, Neo4j’s expressive and intuitive query language. Just as SQL composes tables, GQL will compose graphs. The two might even complement each other, interoperating by wrapping subqueries of the other type.


With all that in mind, Neo4j (and graph databases in general) should clearly not be neglected. Many of us will likely encounter and even use graphs on a daily basis very soon!

Calin Constantinov

Innovation Community Lead

Calin is a Java technical lead, mentor, and speaker with a lot of stories to tell. He has a Ph.D. in graph data analysis. His preferred use cases include real-time recommendation systems, trust & reputation engines, and social ranking algorithms. Naturally, his main passion is Neo4j. Calin now models everything as a graph. This is where he gets stubborn: if you ever meet him, he will spend most of his time convincing you that whatever else you're doing is wrong! Outside of work he refers to himself as an "Enthusiastic Beer Drinker".

