
Insights Through Data | Gabriel Preda | 27 May 2021

INTRODUCTION

A multidisciplinary team represented Endava at the NASA COVID-19 Space Apps Challenge Hackathon on 30-31 May 2020. The challenge was to use a wide variety of data from NASA and external sources to study the impact of the COVID-19 epidemic in the spring of 2020.

The Endava team was made up of Data Engineers, Developers and Data Scientists as well as a UX Designer. We approached the challenge using data from various sources, including NASA satellite measurements, EU social and economic data, GitHub COVID-19 data and Kaggle UN countries economic and social data. Our aim was to understand whether there are correlations between the energy and transportation sectors, pollution, unemployment rates and the incidence and dynamics of COVID-19 cases.

The analysis covered European countries only, so all data and findings relate to those countries alone.

OUR APPROACH

For this challenge, our aim was to accomplish the following objectives:

  • Understand if there are correlations between NASA satellite data about pollution and various economic and social data, including transportation, energy, oil and coal production and unemployment, as well as health indicators and the COVID-19 incidence.
  • Use these correlations to establish a risk score for each country.
  • Build a user-friendly application to display this data.
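
The article does not describe how the risk score was actually computed, so the following is only a hypothetical sketch of one common approach: scale each indicator to the range 0 to 1 across countries, then combine the scaled values with signed weights. The indicator names, the country labels, the weights and the function names are all assumptions made for illustration, and the numbers are made up.

```python
def min_max(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def risk_scores(indicators, weights):
    """Combine indicators into one score per country.

    indicators: {indicator_name: {country: value}}
    weights: {indicator_name: signed weight}; a negative weight means that
             a higher indicator value is assumed to lower the risk.
    Returns {country: score}, with scores between 0 and 1.
    """
    countries = sorted(next(iter(indicators.values())))
    total_weight = sum(abs(w) for w in weights.values())
    scores = {country: 0.0 for country in countries}
    for name, by_country in indicators.items():
        scaled = min_max([by_country[c] for c in countries])
        weight = weights[name]
        for country, s in zip(countries, scaled):
            contribution = s if weight >= 0 else 1.0 - s
            scores[country] += abs(weight) * contribution
    return {c: score / total_weight for c, score in scores.items()}


# Made-up values for three placeholder countries.
indicators = {
    "unemployment_rate": {"A": 4.0, "B": 8.0, "C": 6.0},
    "physicians_per_1000": {"A": 5.0, "B": 3.0, "C": 4.0},
}
weights = {"unemployment_rate": 1.0, "physicians_per_1000": -1.0}
scores = risk_scores(indicators, weights)
print(scores)  # {'A': 0.0, 'B': 1.0, 'C': 0.5}
```

In a sketch like this, the sign and size of each weight could be taken from the observed correlations, which is why the score depends entirely on how those are chosen.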

We started by investigating NASA satellite measurement data for Europe, using country coordinates and/or boundary limits to select the relevant area, and extracted the data from the netCDF4 format. Specifically, we analysed and processed MERRA-2 CO2, SO2 and IASI METOP CO data as well as Copernicus air quality data (SO2, CO2, PM2.5 and PM10).

IASI CO distribution across Europe, April 2020
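
As a rough illustration of selecting gridded measurements by country boundary limits, the sketch below averages all grid cells whose centre falls inside a bounding box. It uses plain Python lists and a synthetic grid so that it runs on its own; in practice the latitude, longitude and value arrays would be read from the netCDF files (for example with the `netCDF4` Python package). The function name `bbox_mean`, the grid values and the bounding box are assumptions for illustration, not the team's actual code or real borders.

```python
from statistics import mean


def bbox_mean(lats, lons, grid, bbox):
    """Average the grid cells whose centre lies inside a bounding box.

    lats, lons: 1-D sequences of cell-centre coordinates in degrees.
    grid: 2-D sequence indexed as grid[lat_index][lon_index];
          None marks a missing measurement.
    bbox: (lat_min, lat_max, lon_min, lon_max).
    Returns None if no valid cell falls inside the box.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    inside = [
        grid[i][j]
        for i, lat in enumerate(lats) if lat_min <= lat <= lat_max
        for j, lon in enumerate(lons) if lon_min <= lon <= lon_max
        if grid[i][j] is not None
    ]
    return mean(inside) if inside else None


# Synthetic 3x4 grid standing in for one time step of a satellite variable.
lats = [44.0, 46.0, 48.0]
lons = [20.0, 22.0, 24.0, 26.0]
grid = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, None, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
example_bbox = (45.0, 49.0, 21.0, 25.0)  # illustrative box, not a real border
print(bbox_mean(lats, lons, grid, example_bbox))  # 9.0
```

A bounding box is the simplest selection; using the actual country polygon would be more precise but needs a geometry library.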

 

For COVID-19 data, we used the dataset provided by Johns Hopkins University, curated by us and likewise focused on European countries.

We ingested, processed and enriched air and maritime transport data extracted from UN open-source data. Starting from the initial datasets, we added information using the lookup files and, where needed, converted quarterly data to monthly data by treating each quarterly value as the average for the three months of that quarter. The transport data covered international intra-EU freight and mail air transport, reported by the main airports in each reporting country and EU partner country; air passenger transport, reported by the main airports in each reporting country; airport traffic from reporting airports and airlines; and the gross weight of goods transported to and from main ports, by direction and type of traffic (national and international).
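
The quarterly-to-monthly step can be sketched as follows. This is a minimal illustration, assuming the data is keyed by `(year, quarter)` and that each quarterly value is reused as the value for its three months. The function name and the `is_total` switch are hypothetical: the switch covers the alternative case where the quarterly figure is a total that should be split evenly across the months. The sample values are made up.

```python
def quarterly_to_monthly(quarterly, is_total=False):
    """Expand {(year, quarter): value} into {(year, month): value}.

    By default each month receives the quarterly value unchanged, i.e. the
    quarterly figure is treated as a monthly average. With is_total=True the
    quarterly figure is treated as a sum and divided by three.
    """
    monthly = {}
    for (year, quarter), value in quarterly.items():
        if quarter not in (1, 2, 3, 4):
            raise ValueError(f"invalid quarter: {quarter}")
        per_month = value / 3 if is_total else value
        first_month = 3 * (quarter - 1) + 1
        for month in range(first_month, first_month + 3):
            monthly[(year, month)] = per_month
    return monthly


# Made-up quarterly values for illustration.
freight = {(2020, 1): 300.0, (2020, 2): 120.0}
monthly = quarterly_to_monthly(freight)
print(monthly[(2020, 3)], monthly[(2020, 4)])  # 300.0 120.0
```

Either way the result is a step function within each quarter, so any month-to-month variation inside a quarter is lost.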

Then we investigated various factors related to the impact of COVID-19. Some relations were direct, such as the correlation between morbidity and mortality and the development of each country's healthcare sector. Others were indirect and could be inferred through the measures each country imposed, for example changes in unemployment rates among young people, or the prevalence of internet access, which mattered because of the massive shift towards working from home and online education.

Earthdata MERRA-2 CO Column Burden (COCL) distribution, April 2020

We also looked at sectors that saw major shifts, such as mobility and transportation, or that might have a relationship with pollution. For example, the energy and industrial sectors consume large amounts of energy and may have been subject to lockdown-related work restrictions. In addition, we looked at factors that might have an impact on COVID-19 dynamics, such as education, literacy, healthcare system quality, population age, forest cover as a percentage of total land area, GDP per capita, population density and the percentage of seats held by women in parliament.

The hackathon and our analysis took place at the very beginning of what would turn out to be a global pandemic, yet we were already able to see some interesting correlations. For example, we observed an inverse correlation between the number of physicians per 1,000 people and the number of fatalities. We also found a correlation between the percentage of fatalities in a population and the unemployment rate.

Correlation between COVID-19 aggregate indicators and UN economic and social indicators (Western Europe)
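
To make the idea of an inverse correlation concrete, the sketch below computes a Pearson correlation coefficient between two indicators. The article does not say which correlation measure was used, so Pearson is an assumption here, and the numbers are made up purely to show a negative coefficient; they are not the team's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up values, one entry per hypothetical country.
physicians_per_1000 = [2.0, 3.0, 4.0, 5.0]
fatalities_per_100k = [8.0, 6.0, 4.0, 2.0]
r = pearson(physicians_per_1000, fatalities_per_100k)
print(round(r, 3))  # -1.0
```

A value near -1 indicates a strong inverse linear relationship and a value near +1 a strong direct one; with real country data the coefficients would fall somewhere in between.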

CONCLUSION

Although our focus was on the methods and execution of such complex data analysis projects, the Endava team's work already highlighted various interesting insights into how the COVID-19 epidemic affected certain areas of the EU economy as well as various social factors. Some of our findings are captured in the visualisation and can be used as a starting point for further targeted analysis.

Over the last year, the COVID-19 pandemic has disrupted the way people live and work on a global level. It could therefore be very valuable to take the data analysis methods developed during the NASA Hackathon, including the analysis application developed by the Endava team, and apply them to the ever-growing body of environmental, logistical and social data. This could help pinpoint COVID-19 risk factors more precisely and thereby help reduce and prevent the spread of the disease. Looking to the future, robust yet flexible data analysis frameworks can also be used to support countries in their overall healthcare management systems, independent of COVID-19.

Gabriel Preda

Principal Data Scientist

Gabriel has a PhD in computational electromagnetics and started his career in academic and private research. He co-founded two technology start-ups and has worked in software development for 15+ years. Currently, Gabriel is a Principal Data Scientist at Endava, working for a range of industries and writing about advanced data analytics, geospatial analysis, natural language processing (NLP), anomaly detection, MLOps and generative AI. He is a high-profile contributor in the world of competitive machine learning and currently one of the few triple Kaggle Grandmasters. Outside of data science and machine learning, Gabriel enjoys hiking, climbing and reading.

 
