Following the Patterns – The Rise of Neo4j and Graph Databases

 
 

Insights Through Data | Calin Constantinov |
27 January 2021

A graph database is now among the Top 20 most popular DBMS solutions!

DB-Engines.com, a platform covering the popularity of over 360 DBMS (database management system) alternatives, published a new set of rankings in November 2020. For the very first time, a graph database made it into the Top 20, and the trend continued in December 2020!


Graph-1

This should come as no surprise. Of all database models, graph databases have seen an almost constant, steep rise in popularity over the last seven years. The people behind the graph database system Neo4j claim that about 100K developers now engage with their product each month. What’s more, about 47K professionals list Neo4j as a skill on their LinkedIn profile. While there are multiple graph database implementations, all things considered, Neo4j is an industry leader (Forrester, 2020).

Graph-2

So, what are graph databases?

We often hear data scientists say, “If I just had more data, then I could improve my predictive lift.” But the thing is, you already have more data – they’re called relationships, and they’re already hiding in your data sets (Neo4j Insider Guide, 2020). Arguably, graph databases offer analytical and discovery capabilities that no other technology can provide.

Not only do they open up unexplored avenues, but they can also power computations that would otherwise be impractically expensive, enabling so-called data-driven insights. Such technologies can handle petabytes of data while supporting rates of millions of updates per second. Let’s not forget, companies that use data technologies are 23 times more likely to find new customers and nine times more likely to make existing ones loyal (McKinsey, 2014). Also, graphs come with a great ‘feature’ which humans tend to value very much: they are visual! A graph can easily be drawn on a whiteboard and immediately grasped by a wide audience, because such representations maintain the fundamental structure of naturally occurring data.

While many only see them as a niche technology, finding relationships in combinations of diverse data by using graph techniques at scale will form the foundation of modern data and analytics. By 2023, graph technologies will facilitate rapid contextualisation for decision-making in 30% of organisations worldwide (Gartner, 2020), and some analysts consider graph databases to have the potential to replace the existing relational market by 2030 (Forbes, 2020). Before then, the application of graph processing and graph DBMSs will grow at 100% annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science. Specifically, graph analytics growth will be observed due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries (Gartner, 2019).

But who uses Neo4j?

Just three years ago, 51% of global data and analytics technology decision-makers either were implementing, had already implemented, or were upgrading their graph databases (Forrester, 2017). As a reflection of the aforementioned growth in popularity, the numbers have increased spectacularly since then.


Graph-3

Many financial services and insurance companies are using Neo4j across a wide range of risk management use cases. Endava has used it with a number of our clients, too.

What are companies using Neo4j for?

Most people assume that banks use machine learning and artificial intelligence to detect activities related to money laundering. However, in the US, AML (anti-money laundering) compliance teams at major banks have grown up to tenfold in recent years (McKinsey, 2014). The reason is that most of the work is still done manually.

As an alternative to this manual effort, numerous companies have started using Neo4j for modelling financial risk, for detecting fraud rings, and for solving AML compliance challenges. A common scenario relates to decisions regarding fraudulent activity in financial transactions.

With the real-time data analysis and visualisation provided by Neo4j, fraud patterns are more accurately identified. Graph analysis systems can typically account for data from customers up to 10 degrees of separation apart. Additionally, much more historical information can be reviewed, often uncovering additional suspicious activities, and even leading to the detection of emergent clusters of ‘money mules’. For instance, PayPal leverages Neo4j to process more than 1 billion transactions per day across 3 billion node graphs. This technology has already saved PayPal more than $700 million while enabling the company to perform predictive fraud analysis (Bank Administration Institute, 2016).
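To make the "degrees of separation" idea concrete, here is a minimal Python sketch, not Neo4j code and not how any of the companies mentioned implement it. It assumes a toy data set (the `ACCOUNTS` dictionary, the identifier strings, and both function names are hypothetical) in which accounts are linked whenever they share an identifier such as a phone number or a card. A breadth-first search then collects everything reachable within a chosen number of hops.

```python
from collections import deque

# Hypothetical toy data: each account lists the identifiers it uses.
ACCOUNTS = {
    "acc_1": {"phone:555-0100", "addr:12 Elm St"},
    "acc_2": {"phone:555-0100", "card:4111"},
    "acc_3": {"card:4111", "addr:9 Oak Ave"},
    "acc_4": {"addr:77 Pine Rd"},
}


def build_graph(accounts):
    """Build an undirected graph linking each account to its identifiers."""
    graph = {}
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            graph.setdefault(account, set()).add(identifier)
            graph.setdefault(identifier, set()).add(account)
    return graph


def within_degrees(graph, start, max_degrees):
    """Return {node: hops} for every node within max_degrees hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_degrees:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances
```

Calling `within_degrees(build_graph(ACCOUNTS), "acc_1", 4)` reaches `acc_2` after two hops (through the shared phone number) and `acc_3` after four (through the shared card), while `acc_4` is never reached. In this modelling choice, one account-to-account step costs two hops because it passes through the shared identifier; a graph database would run this kind of traversal natively rather than in application code.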


Graph-4

Additionally, transactions need to be fulfilled within imposed cost and time limits. Even if it never touches US soil (or a US bank), a transaction that is at some point converted to USD is subject to multiple compliance obligations. Yet there is no fully automated solution today for end-to-end routing of payments that checks all legal requirements and performs all risk assessments for the intermediate entities. Fortunately, this is also a graph problem which Neo4j is happy to help with!
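Why is payment routing a graph problem? Banks are nodes, correspondent links are edges, and each edge carries a fee and a processing time. The sketch below is only an illustration under stated assumptions: the `ROUTES` network, the bank names, the `blocked` set standing in for a failed risk assessment, and the `cheapest_route` function are all hypothetical. It searches for the cheapest route that still meets a time limit, using a Dijkstra-style search ordered by fee.

```python
import heapq

# Hypothetical correspondent network: each edge is (next_bank, fee, hours).
ROUTES = {
    "BankA": [("BankB", 5.0, 2), ("BankC", 1.0, 10)],
    "BankB": [("BankD", 5.0, 2)],
    "BankC": [("BankD", 1.0, 10)],
    "BankD": [],
}


def cheapest_route(routes, source, target, max_hours, blocked=frozenset()):
    """Return (fee, hours, path) for the cheapest route within max_hours, or None."""
    # Heap entries are (total_fee, total_hours, node, path), popped cheapest first.
    heap = [(0.0, 0, source, [source])]
    best_hours = {}  # fastest arrival seen so far among already-popped entries
    while heap:
        fee, hours, node, path = heapq.heappop(heap)
        if node == target:
            return fee, hours, path
        # An earlier (cheaper or equal) entry that was also at least as fast dominates this one.
        if node in best_hours and best_hours[node] <= hours:
            continue
        best_hours[node] = hours
        for nxt, edge_fee, edge_hours in routes.get(node, []):
            if nxt in blocked or hours + edge_hours > max_hours:
                continue  # skip entities that fail the checks or break the time limit
            heapq.heappush(
                heap, (fee + edge_fee, hours + edge_hours, nxt, path + [nxt])
            )
    return None
```

With a 24-hour limit, `cheapest_route(ROUTES, "BankA", "BankD", 24)` picks the slow but cheap route through `BankC`. Tightening the limit to 8 hours forces the more expensive route through `BankB`, and blocking `BankB` as well leaves no valid route at all. Real compliance checks are far richer than a set of blocked names, but the shape of the problem stays the same.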


Graph-5

Apart from the financial sector, there are many other use cases for graphs. For instance, searching for information takes up 14 to 30% of engineers’ time (Deloitte, 2019). Fortunately, Neo4j can play an important role in easing the implementation of Knowledge Graphs. In fact, the Knowledge Graph of Historical Lessons Learned at NASA is built around Neo4j. A seemingly unrelated application of the same technology is eBay’s ShopBot for Graph-Powered Conversational Commerce, a system that is very good at determining the next question to ask the user. In this application, a knowledge graph was coupled with natural language understanding and artificial intelligence to store, remember, and learn from past interactions with shoppers. An interesting space to watch is the use of graph technologies for improving AI and ML.


Graph-6

Additionally, the Panama Papers are a perfect example of how powerful the right analytics tool can be. For the Panama Papers investigation, the ICIJ (International Consortium of Investigative Journalists) team fed 2.6 TB of spaghetti data made up of 11.5M heterogeneous documents to Neo4j and ended up winning the Pulitzer Prize in 2017. Lastly, Neo4j was also behind NBC News’ effort to uncover 200K tweets tied to Russian trolls and their role in the 2016 US presidential election.

Graph database tech’s prospective future

The lack of standards has long been a problem for the graph database world. As a data exchange format, GraphML is slowly becoming widely adopted. In terms of data querying, SQL is unlikely to be the right model for a graph-centric language. The good news is that since September 2019, Graph Query Language (GQL) has been developed and maintained by the same international working group that also maintains the SQL standard. GQL is heavily based on Cypher, Neo4j’s expressive and intuitive query language. Similar to how SQL composes tables, GQL will compose graphs. The two might even complement each other, interoperating by wrapping subqueries of the other type.


Graph-7

With all that in mind, Neo4j (and graph databases in general) should clearly not be neglected. Many of us will likely encounter and even use graphs on a daily basis very soon!

Calin Constantinov

Innovation Community Lead

Calin is a Java technical lead, mentor, and speaker with a lot of stories to tell. He has a Ph.D. in graph data analysis. His preferred use cases include real-time recommendation systems, trust & reputation engines, and social ranking algorithms. Naturally, his main passion is Neo4j. Calin now models everything as a graph. This is where he gets stubborn: if you ever meet him, he will spend most of his time convincing you that whatever else you're doing is wrong! Outside of work he refers to himself as an "Enthusiastic Beer Drinker".

 

 
