Insights Through Data
Gabriel Preda
27 May 2021
INTRODUCTION
A multidisciplinary team represented Endava at the NASA COVID-19 Space Apps Challenge Hackathon on 30-31 May 2020. The challenge consisted of using a wide variety of data from NASA and external sources to study the impact of the COVID-19 epidemic in the spring of 2020.
The Endava team was made up of Data Engineers, Developers and Data Scientists as well as a UX Designer. We tried to solve the challenge using data from various sources, e.g. NASA satellite measurements, EU social and economic data, COVID-19 data from GitHub and UN economic and social data on countries from Kaggle. Our aim was to understand whether there are correlations between the energy and transportation sectors, pollution, unemployment rates and the incidence and dynamics of COVID-19 cases.
The analysis was done only for European countries, so all the data and findings are related to those only.
OUR APPROACH
For this challenge, our aim was to accomplish the following objectives:
- Understand if there are correlations between NASA satellite data about pollution and various economic and social data, including transportation, energy, oil and coal production and unemployment, as well as health indicators and the COVID-19 incidence.
- Use these correlations to establish a risk score for each country.
- Build a user-friendly application to display this data.
We started by investigating NASA satellite measurement data for Europe, selecting it by country coordinates and/or boundary limits, and extracted the data from the netCDF4 format. To be more specific, we analysed and processed MERRA-2 CO2, SO2 and IASI METOP CO data as well as Copernicus air quality data (SO2, CO2, PM2.5 and PM10).
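As a rough illustration of selecting gridded measurements by boundary limits, the sketch below averages the cells of a latitude/longitude grid that fall inside a bounding box. It is not the team's code: the function name `mean_in_bbox`, the toy grid and the bounding box are all assumptions made for this example. With real data, the coordinate and value arrays would first be read from the netCDF files with a netCDF reader.

```python
from statistics import fmean


def mean_in_bbox(lats, lons, grid, bbox):
    """Average the grid cells whose centre falls inside a bounding box.

    lats, lons: 1-D coordinate lists; grid[i][j] is the value at (lats[i], lons[j]).
    bbox: (lat_min, lat_max, lon_min, lon_max) in degrees.
    Cells holding None are treated as missing and skipped.
    Returns None when no valid cell lies inside the box.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    values = [
        grid[i][j]
        for i, lat in enumerate(lats)
        if lat_min <= lat <= lat_max
        for j, lon in enumerate(lons)
        if lon_min <= lon <= lon_max and grid[i][j] is not None
    ]
    return fmean(values) if values else None


# Toy 3x4 grid standing in for one satellite variable (values are made up).
lats = [44.0, 46.0, 48.0]
lons = [20.0, 24.0, 28.0, 32.0]
grid = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, None, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

# Hypothetical bounding box, for illustration only (not a real country boundary).
example_bbox = (43.5, 48.5, 20.0, 30.0)
country_mean = mean_in_bbox(lats, lons, grid, example_bbox)
```

A bounding box is only a coarse approximation of a country. Masking against the actual boundary polygon would be more precise, at the cost of extra geometry handling.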
IASI CO distribution across Europe, April 2020
For COVID-19 data, we used the dataset provided by Johns Hopkins University, which we curated ourselves, again with a focus on European countries.
We ingested, processed and enriched data related to air and maritime transport extracted from UN open-source data. This included:
- international intra-EU freight and mail air transport data provided by main airports in each reporting country and EU partner country,
- air passenger transport data provided by main airports in each reporting country,
- airport traffic data from reporting airports and airlines,
- data on the gross weight of goods transported to/from main ports by direction and type of traffic (national and international).
Starting from the initial datasets, we added information using the lookup files and, where needed, transformed quarterly data into monthly data by treating each quarterly value as the average for its three months.
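The quarterly-to-monthly step can be sketched as follows. This is a minimal illustration under one reading of the text, namely that each month simply inherits its quarter's value. The label formats (`2020-Q1`, `2020-01`), the function name `quarterly_to_monthly` and the sample figures are assumptions, not taken from the original datasets.

```python
def quarterly_to_monthly(quarterly):
    """Expand quarterly values into monthly ones.

    quarterly: dict mapping labels like "2020-Q1" to a numeric value.
    Each of the three months in a quarter receives the quarterly value,
    i.e. the quarterly figure is treated as the monthly average.
    Returns a dict keyed by "YYYY-MM".
    """
    monthly = {}
    for label, value in quarterly.items():
        year, quarter = label.split("-Q")
        first_month = 3 * (int(quarter) - 1) + 1  # Q1 -> 1, Q2 -> 4, ...
        for month in range(first_month, first_month + 3):
            monthly[f"{year}-{month:02d}"] = value
    return monthly


# Made-up quarterly figures, for illustration only.
freight = {"2019-Q4": 120.0, "2020-Q1": 90.0}
monthly_freight = quarterly_to_monthly(freight)
```

If the quarterly figure were a total rather than an average, each month would instead receive one third of it. Which reading applies depends on how the source dataset defines the indicator.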
Then we investigated various factors related to the impact of COVID-19. Some relations could be direct, like the correlation of morbidity and mortality with healthcare sector development in each country. Others were indirect and could be inferred through the measures imposed by each country, for example changes in unemployment rates among young people, or the prevalence of internet access, which became relevant because of the massive shift towards working from home and online education.
Earthdata MERRA-2 CO Column Burden (COCL) distribution, April 2020
We also looked at sectors which saw major shifts, like mobility and transportation, or which might have a relationship with pollution. For example, the energy and industrial sectors are heavy energy consumers and may have faced lockdown-related work restrictions. In addition, we looked at factors that might have an impact on COVID-19 dynamics, like education, literacy, healthcare system quality, population age, the forest percentage of the total land area as well as GDP per capita, population density and percentage of seats held by women in parliament.
Even though the hackathon and our analysis took place at the beginning of a period in which the COVID-19 epidemic would turn into a global pandemic, we were already able to see some interesting correlations. For example, we observed an inverse correlation between the number of physicians per 1,000 people and the number of fatalities. We also found a correlation between the percentage of fatalities in a population and the unemployment rate.
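To make the notion of an inverse correlation concrete, here is a small sketch of the Pearson correlation coefficient computed over per-country indicator pairs. The article does not state which correlation measure was used, so Pearson is an assumption, and the numbers below are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented values, one entry per hypothetical country.
physicians_per_1000 = [2.0, 3.0, 4.0, 5.0]
fatalities_per_100k = [40.0, 30.0, 25.0, 10.0]

# A negative coefficient indicates an inverse relationship.
r = pearson(physicians_per_1000, fatalities_per_100k)
```

A coefficient close to -1 or 1 signals a strong linear relationship, but it says nothing about causation, and with only a few countries per group the estimate can be quite unstable.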
Correlation between COVID-19 aggregate indicators and UN economic and social indicators (Western Europe)
CONCLUSION
While focusing on the potential methods and execution of such complex data analysis projects, the work of the Endava team already highlighted various interesting insights about how the COVID-19 epidemic impacts certain areas of the EU economy as well as various social factors. Some of our findings are captured in the visualisation and can be used as a starting point for further targeted analysis.
Over the last year, the COVID-19 pandemic has disrupted the way that people live and work on a global level. It could therefore be very valuable to take the data analysis methods developed during the NASA Hackathon, including the analysis application developed by the Endava team, and apply them to the ever-growing body of environmental, logistical and social data. This could help pinpoint COVID-19 risk factors more precisely and thereby reduce and prevent the spread of the disease. Looking to the future, robust yet flexible data analysis frameworks can also be used to support countries in their overall healthcare management systems, independent of COVID-19.
Gabriel Preda
Principal Data Scientist
Gabriel has a PhD in computational electromagnetics and started his career in academic and private research. He co-founded two technology start-ups and has worked in software development for 15+ years. Currently, Gabriel is a Principal Data Scientist at Endava, working for a range of industries and writing about advanced data analytics, geospatial analysis, natural language processing (NLP), anomaly detection, MLOps and generative AI. He is a high-profile contributor in the world of competitive machine learning and currently one of the few triple Kaggle Grandmasters. Outside of data science and machine learning, Gabriel enjoys hiking, climbing and reading.