A graph database is now among the Top 20 most popular DBMS solutions!
DB-Engines.com, a platform covering the popularity of over 360 DBMS (database management system) alternatives, published a new set of rankings in November 2020. For the very first time, a graph database made it into the Top 20, and the trend continued in December 2020!
This should come as no surprise. Out of all database models, graph databases have seen an almost continuous, steep rise in popularity over the last seven years. The people behind the graph database system Neo4j claim that about 100K developers now engage with their product each month. What’s more, about 47K professionals list Neo4j as a skill on their LinkedIn profile. While there are multiple graph database implementations, all things considered, Neo4j is an industry leader (Forrester, 2020).
So, what are graph databases?
We often hear data scientists say, “If I just had more data, then I could improve my predictive lift.” But the thing is, you already have more data – they’re called relationships, and they’re already hiding in your data sets (Neo4j Insider Guide, 2020). Arguably, graph databases have the analytical and discovery capabilities that no other technology can provide.
Not only do they open up unexplored avenues, but they can also power otherwise impractically expensive computations for enabling so-called data-driven insights. Such technologies can handle petabytes of data while supporting rates of millions of updates per second. Let’s not forget, companies that use data technologies are 23 times more likely to find new customers and nine times more likely to make existing ones loyal (McKinsey, 2014). Also, graphs come with a great ‘feature’ which humans tend to value very much: They are visual! A graph can easily be represented on a whiteboard and immediately grasped by a wide audience. This comes from the fact that such representations maintain the fundamental structure of naturally occurring data.
While many only see them as a niche technology, finding relationships in combinations of diverse data by using graph techniques at scale will form the foundation of modern data and analytics. By 2023, graph technologies will facilitate rapid contextualisation for decision-making in 30% of organisations worldwide (Gartner, 2020), and some analysts consider graph databases to have the potential to replace the existing relational market by 2030 (Forbes, 2020). Before then, the application of graph processing and graph DBMSs will grow at 100% annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science. Specifically, graph analytics growth will be observed due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries (Gartner, 2019).
But who uses Neo4j?
Just three years ago, 51% of global data and analytics technology decision-makers either were implementing, had already implemented, or were upgrading their graph databases (Forrester, 2017). As a reflection of the aforementioned growth in popularity, the numbers have increased spectacularly since then.
What are companies using Neo4j for?
Most people assume that banks use machine learning and artificial intelligence to detect activities related to money laundering. However, in the US, AML (anti-money laundering) compliance headcount has increased up to tenfold at major banks in recent years (McKinsey, 2014), because most of the work is still done manually.
As an alternative to manual review, numerous companies have started using Neo4j for modelling financial risk, for detecting fraud rings, and for solving AML compliance challenges. A common scenario is deciding whether a financial transaction is fraudulent.
With the real-time data analysis and visualisation provided by Neo4j, fraud patterns are more accurately identified. Graph analysis systems can typically account for data from customers up to 10 degrees of separation apart. Additionally, much more historical information can be reviewed, often uncovering additional suspicious activities, and even leading to the detection of emergent clusters of ‘money mules’. For instance, PayPal leverages Neo4j to process more than 1 billion transactions per day across 3 billion node graphs. This technology has already saved PayPal more than $700 million while enabling the company to perform predictive fraud analysis (Bank Administration Institute, 2016).
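To make "degrees of separation" concrete, here is a minimal sketch in plain Python of a bounded breadth-first search that collects everything within a given number of hops of a starting entity. It is not Neo4j code: the `within_degrees` function, the adjacency-dictionary layout, and the sample account and phone identifiers are all assumptions made for illustration.

```python
from collections import deque


def within_degrees(graph, start, max_degrees):
    """Return {node: hops} for every node within max_degrees hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit at the hop limit.
        if distances[node] == max_degrees:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


# Hypothetical data: customers linked through accounts and a shared phone number.
links = {
    "alice": ["acct_1"],
    "acct_1": ["alice", "phone_555"],
    "phone_555": ["acct_1", "acct_2"],
    "acct_2": ["phone_555", "bob"],
    "bob": ["acct_2"],
}

reachable = within_degrees(links, "alice", 3)
```

In this toy data, "bob" only becomes visible once the limit is raised to four hops. That is the kind of indirect link (here, a shared phone number) that a narrow lookup would miss.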
Additionally, transactions need to be fulfilled within imposed cost and time limits. Even if it does not touch US soil (or a US bank), a transaction that is at some point converted to USD is subject to multiple compliance obligations. Yet there is no fully automated solution today for end-to-end routing of payments that checks all legal requirements and performs all risk assessments for the intermediate entities. Fortunately, this is also a graph problem which Neo4j is happy to help with!
Apart from the financial sector, there are many other use cases for graphs. For instance, the search for information takes 14 to 30% of engineers’ time (Deloitte, 2019). Fortunately, Neo4j can play an important role in easing the implementation of Knowledge Graphs. In fact, the Knowledge Graph of Historical Lessons Learned at NASA is built around Neo4j. A seemingly unrelated application of the same technology is eBay’s ShopBot for Graph-Powered Conversational Commerce, a system that is very good at determining the next question to ask the user. In this application, a knowledge graph was coupled with natural language understanding and artificial intelligence to store, remember, and learn from past interactions with shoppers. An interesting space to watch is the use of graph technologies for improving AI and ML.
Additionally, the Panama Papers are a perfect example of how powerful the right analytics tool can be. For the Panama Papers investigation, the ICIJ (International Consortium of Investigative Journalists) team fed 2.6 TB of spaghetti data made up of 11.5M heterogeneous documents to Neo4j and ended up winning the Pulitzer Prize in 2017. Lastly, Neo4j was also behind NBC News’ effort of uncovering 200K tweets tied to Russian trolls and their role in the 2016 US presidential elections.
Graph database tech’s prospective future
The lack of standards has long been a problem for the graph database world. As a data exchange format, GraphML is slowly becoming widely adopted. In terms of data querying, SQL is unlikely to be the right model for a graph-centric language. The good news is that since September 2019, Graph Query Language (GQL) has been developed and maintained by the same international working group that also maintains the SQL standard. GQL is heavily based on Cypher, Neo4j’s expressive and intuitive query language. Similar to how SQL composes tables, GQL will compose graphs. The two might even complement each other, interoperating by wrapping subqueries of the other type.
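For readers who have not seen GraphML, the following is a minimal sketch of what such an exchange document looks like, built with Python's standard library only. The `to_graphml` helper and the sample node identifiers are assumptions made for illustration. The element and attribute names (`graphml`, `graph`, `node`, `edge`, `edgedefault`) follow the GraphML format, but the sketch leaves out optional parts such as `key`/`data` property declarations.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialise node ids and (source, target) pairs as a GraphML string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for node_id in nodes:
        ET.SubElement(graph, "node", {"id": node_id})
    for source, target in edges:
        ET.SubElement(graph, "edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-node graph: one customer who owns one account.
document = to_graphml(["customer_1", "account_1"], [("customer_1", "account_1")])
print(document)
```

Because the result is plain XML, any tool that reads GraphML can in principle import it without knowing which system produced it.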
Innovation Community Lead
Calin is a Java technical lead, mentor, and speaker with a lot of stories to tell. He has a Ph.D. in graph data analysis. His preferred use cases include real-time recommendation systems, trust & reputation engines, and social ranking algorithms. Naturally, his main passion is Neo4j. Calin now models everything as a graph. This is where he gets stubborn: if you ever meet him, he will spend most of his time convincing you that whatever else you're doing is wrong! Outside of work he refers to himself as an "Enthusiastic Beer Drinker".