How Synthetic Data could solve the Patient Privacy Dilemma

 
 

AI | Adrian Sutherland | 14 April 2023

Marlon Brando once snapped at an interviewer that ‘privacy is not something that I’m merely entitled to, it’s an absolute prerequisite’. This holds as true for medical patients in 2023 as it did for Hollywood's greatest leading man in 1960.

Your average patient today has benefited from the digitisation of medical information. Electronic Health Records (EHRs) have undoubtedly made healthcare more efficient. But the technology’s main upsides – convenience and ease of access – raise a big question: who should have access to other people’s medical data?

There is no easy answer here, only a conflict of priorities. Let’s take a closer look at the disputes in play, and how an emerging technology promises to resolve many of them.

WHY PATIENT DATA IS GOLD DUST

It’s worth reminding ourselves of what is at stake here. In the hands of medical and pharmaceutical researchers, real-world data (RWD) is the key to major breakthroughs that improve patient outcomes.

The more data available, the easier it is to identify trends and patterns. Scaling up development of diagnostics, therapies and treatments could spell countless gains in quality-adjusted life years.

THE PATIENT DATA PRIVACY QUANDARY

While the benefits of sharing healthcare data are clear, clinicians tend to take a more cautious view of its uses.

Physician-patient privilege matters, regardless of its consequences for scientific progress. Patients must always be able to trust that what they disclose to their doctor is strictly confidential.

This is not only a matter of professional ethics. Health records are now a bigger target for fraudsters than credit cards, accounting for 95% of all identity theft. Only the sturdiest firewalls can prevent identity theft on an industrial scale.

Different jurisdictions have different rules in place to establish patients’ informed consent for their records to be used in research. In almost every case, they will stipulate that personal identifiers are removed or obscured. Yet this is not always enough to keep data safe.

DE-IDENTIFICATION IS NOT A MAGIC BULLET

Even rigorously anonymised data can be reconnected to its sources. On various occasions, researchers have tested the mettle of de-identification methods and found them lacking.

By triangulating de-identified data with other information available online, scientists have been able to identify participants in genomic sequencing projects.
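To make the idea of triangulation concrete, here is a minimal sketch of a linkage attack. Every record, name and field below is invented for illustration; it assumes the attacker holds a public register that shares a few quasi-identifiers (postcode area, birth year, sex) with the de-identified medical data. Real attacks on genomic datasets are more involved than this.

```python
from collections import defaultdict

# Hypothetical toy data: all records are invented for illustration.
deidentified_records = [
    {"postcode": "AB1", "birth_year": 1954, "sex": "F", "diagnosis": "type 2 diabetes"},
    {"postcode": "AB1", "birth_year": 1987, "sex": "M", "diagnosis": "asthma"},
    {"postcode": "CD2", "birth_year": 1987, "sex": "M", "diagnosis": "hypertension"},
    {"postcode": "CD2", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

public_register = [
    {"name": "Person A", "postcode": "AB1", "birth_year": 1954, "sex": "F"},
    {"name": "Person B", "postcode": "CD2", "birth_year": 1987, "sex": "M"},
    {"name": "Person C", "postcode": "CD2", "birth_year": 1990, "sex": "F"},
    {"name": "Person D", "postcode": "CD2", "birth_year": 1990, "sex": "F"},
]

# Fields that are not names, but can still narrow down who a record belongs to.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def quasi_key(record):
    """Build a comparable key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link_records(medical, register):
    """Return {name: diagnosis} for every unambiguous one-to-one match."""
    names_by_key = defaultdict(list)
    for person in register:
        names_by_key[quasi_key(person)].append(person["name"])

    records_by_key = defaultdict(list)
    for record in medical:
        records_by_key[quasi_key(record)].append(record)

    matches = {}
    for key, records in records_by_key.items():
        names = names_by_key.get(key, [])
        # Only a unique record matched to a unique person counts as re-identified.
        if len(records) == 1 and len(names) == 1:
            matches[names[0]] = records[0]["diagnosis"]
    return matches


reidentified = link_records(deidentified_records, public_register)
print(reidentified)
```

In this toy case two of the four "anonymous" records are tied back to a single named person. The migraine record stays ambiguous only because two people in the register share its quasi-identifiers, which is the intuition behind grouping-based protections such as k-anonymity.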

As machine learning tools get better, the risks of adversarial attacks on medical databases will only grow.

But what if the data never belonged to anyone in the first place?

HOW SYNTHETIC DATA PROTECTS PRIVACY

AI’s creative abilities stretch beyond writing superhero films and rendering 3D video game graphics. Neural networks trained on real-world data can now generate synthetic data that credibly resembles its sources. While artificial, this data is designed to closely mirror the statistical properties of the real-world data it was trained on.

This field is developing rapidly. In just the last few years, Generative Adversarial Network (GAN) modelling has improved upon randomised ‘Monte Carlo’ approaches. The data it produces retains deeper relationships between variables, which makes it more meaningful to analyse.
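A small sketch can show why preserving relationships between variables matters. This is not a GAN: as a stand-in, it fits a simple two-variable Gaussian model, and compares it with a naive randomised baseline that samples each column on its own. The "real" cohort (age and systolic blood pressure) is itself invented, and all names and numbers are assumptions made for illustration.

```python
import math
import random


def mean_std(values):
    """Return the mean and population standard deviation of a list."""
    n = len(values)
    mean = sum(values) / n
    return mean, math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, _ = mean_std(xs)
    my, _ = mean_std(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Invented "real" cohort: blood pressure rises with age, plus noise.
ages = [rng.uniform(30, 80) for _ in range(5000)]
pressures = [100 + 0.6 * age + rng.gauss(0, 8) for age in ages]
real_corr = pearson(ages, pressures)

# Naive baseline: resample each column independently of the other.
indep_ages = [rng.choice(ages) for _ in ages]
indep_pressures = [rng.choice(pressures) for _ in pressures]
indep_corr = pearson(indep_ages, indep_pressures)

# Joint model: fit means, spreads and correlation, then sample pairs together.
mean_age, std_age = mean_std(ages)
mean_bp, std_bp = mean_std(pressures)
joint_ages, joint_pressures = [], []
for _ in ages:
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    joint_ages.append(mean_age + std_age * z1)
    # Mixing z1 into the second variable reproduces the fitted correlation.
    mixed = real_corr * z1 + math.sqrt(1 - real_corr ** 2) * z2
    joint_pressures.append(mean_bp + std_bp * mixed)
joint_corr = pearson(joint_ages, joint_pressures)

print(f"real: {real_corr:.2f}, independent: {indep_corr:.2f}, joint: {joint_corr:.2f}")
```

Both synthetic sets have plausible ages and blood pressures when each column is viewed alone, but only the jointly modelled set keeps the link between them. A GAN aims to capture the same kind of structure across many more variables and without assuming a Gaussian shape.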

Researchers can use this data to create imitation EHRs for patients who don’t actually exist, and then work with those records without violating privacy law. So long as protocols are followed to stop real patient data ‘leaking’ into the artificial datasets, research can be done faster and at greater scale.
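One simple check against this kind of leakage is to measure how close each synthetic record sits to its nearest real record, and flag any that are identical or nearly so. The sketch below assumes numeric features already scaled to a common 0 to 1 range; the records, the function names and the 0.05 threshold are all hypothetical, and a production protocol would involve considerably more than this.

```python
import math


def nearest_real_distance(synthetic_row, real_rows):
    """Euclidean distance from one synthetic record to its closest real record."""
    return min(math.dist(synthetic_row, real_row) for real_row in real_rows)


def flag_possible_leaks(synthetic_rows, real_rows, threshold):
    """Return indices of synthetic records that sit suspiciously close to a real one."""
    return [
        index
        for index, row in enumerate(synthetic_rows)
        if nearest_real_distance(row, real_rows) <= threshold
    ]


# Invented, pre-scaled feature vectors (for example: age and a lab value).
real_rows = [(0.10, 0.20), (0.50, 0.90), (0.80, 0.40)]
synthetic_rows = [
    (0.10, 0.20),  # exact copy of a real record
    (0.52, 0.88),  # near-copy of a real record
    (0.30, 0.60),  # comfortably far from every real record
]

flagged = flag_possible_leaks(synthetic_rows, real_rows, threshold=0.05)
print(flagged)
```

Flagged records can then be dropped or regenerated before the dataset is shared. The right threshold depends on the data, so it is best treated as a tunable assumption rather than a fixed rule.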

The research community needs to convince wary stakeholders that this technology respects privacy by design.

PUTTING SYNTHETIC DATA INTO PRACTICE

Synthetic data’s research benefits are most pronounced when real-world data is hardest to come by. Let’s take two examples.

Treating rare conditions

Some diseases are so rare that it’s hard to find enough data to design treatments for them. With only a few hundred sufferers worldwide, designing clinical trials for a condition like Progeria is almost impossible.

Instead, generative AI algorithms can take data from a small sample of patients and ‘amplify’ it into a credible representation of a larger population. Researchers can then test out different variables ‘in silico’, using computational models to see what works.

Later phases of these experiments will still require validation with real patient data. Nonetheless, the use of synthetic data could markedly accelerate these processes. This brings treatment closer for the medically underserved without risking real, vulnerable people’s health.

Responding to emergencies

As we learned in 2020, what starts small can snowball very fast. Highly infectious diseases spread at exponential rates that far outpace our ability to track them. Confounding factors, like uneven access to testing, can dilute the quality of real-world data.

Privacy concerns also come into play here. At febrile moments like the early stages of a pandemic, protecting the identities of the infected becomes even more important.

Using predictive AI tools on synthetic datasets could help decision-makers make quick and ethical decisions during emergencies. By creating a ‘digital twin’ of a population, we can model critical variables like cases, deaths and hospital occupancy.
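As a rough illustration of modelling cases, deaths and hospital occupancy, here is a minimal compartmental (SIR-style) simulation. It is far simpler than a true digital twin, which would model individuals and their contacts, and every parameter value below is invented rather than taken from any real outbreak.

```python
def simulate_outbreak(population, initial_infected, beta, gamma,
                      hospital_rate, fatality_rate, days):
    """Run a daily-step SIR-style model and return one summary row per day.

    beta:          infections caused per infected person per day (early on)
    gamma:         fraction of infected people leaving that state each day
    hospital_rate: assumed share of currently infected people needing a bed
    fatality_rate: assumed share of people leaving infection who die
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    recovered = 0.0
    deaths = 0.0
    history = []
    for day in range(1, days + 1):
        new_infections = beta * susceptible * infected / population
        removals = gamma * infected
        new_deaths = fatality_rate * removals

        susceptible -= new_infections
        infected += new_infections - removals
        recovered += removals - new_deaths
        deaths += new_deaths

        history.append({
            "day": day,
            "susceptible": susceptible,
            "infected": infected,
            "recovered": recovered,
            "deaths": deaths,
            "hospital_beds": hospital_rate * infected,
        })
    return history


# All values are illustrative assumptions, not estimates for any real disease.
history = simulate_outbreak(
    population=1_000_000, initial_infected=10,
    beta=0.3, gamma=0.1, hospital_rate=0.05, fatality_rate=0.01, days=200,
)
peak = max(history, key=lambda row: row["hospital_beds"])
print(f"Peak bed demand on day {peak['day']}: {peak['hospital_beds']:.0f}")
```

Running the same function with different values of `beta` (for instance, to represent a distancing policy) gives decision-makers a quick way to compare scenarios before any real-world data is available.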

During the COVID-19 pandemic, UK drug discovery company BenevolentAI used synthetic data to successfully predict that a drug for rheumatoid arthritis could be repurposed to treat COVID-19. We expect to see many more examples in the future of ersatz data delivering very real benefits.

TAKING SYNTHETIC DATA FORWARD

While synthetic data is already bearing fruit, its path to wider use remains strewn with obstacles.

First, it will only be as good as its inputs and the model used to process them. If the original data is biased, its synthetic twin will be too. This is a particular problem in healthcare, where data is often noisy and incomplete. The time and manpower needed to verify the original inputs can make synthetic data less efficient than it seems.

Second, synthetic data is not a perfect replica of its source data, just an approximation. This means it can lose outliers that are critical for truly representing a living, human population.

Further research should help to overcome these challenges. At Endava, we are currently trying to validate whether models trained on a combination of real and synthetic data can outperform those trained on real data alone. We’re using conditional GANs to generate synthetic mammography mass measurements and feeding them to our classification models. The results could inform the development of less intrusive breast cancer diagnostics.

Stay tuned for more updates on this project. In the meantime, read Armin’s latest article for another perspective on how AI is changing the constants of medical research.

Adrian Sutherland

Principal Healthcare Architect

Adrian is an experienced solution director, architect and designer with 25+ years of diverse technology experience. Specialising in healthcare IT, he has worked with major national healthcare organisations in the UK, Europe, Australia and the US as well as on numerous projects for providers and payers. An expert in healthcare and digital transformation, Adrian has authored whitepapers on enhancing the patient experience, delivered keynote speeches on future wellness and well-being, and spoken at various conferences on the role of information design in patient care management and medication alert fatigue.

 
