Innovation | Radu Orghidan |
08 July 2020

In May 2020, we held a two-day hackathon with Microsoft. The aim was to take medical forms completed in several languages and translate them for our travel insurance client, testing whether an automated translation solution could be a viable replacement for the current process at a fraction of the cost.

BUSINESS CONTEXT

We all consider the possibility of getting sick while traveling. Travel insurance companies know this and provide compelling packages that help us not only gain peace of mind but also recover the expenses incurred by a health incident. However, the process can be complicated by several factors, one of the foremost being the difficulty of communication between travel insurance companies and remote doctors. Upon returning home from traveling abroad, patients have to send the receipts, medical letters, and all other documents provided by the foreign medical institution to the travel insurance company.

These documents are potentially in different languages, have different formats, and use medical terminology. Needless to say, insurance companies require accurate translations of them. One of our customers, a travel insurance company, challenged us to build a system that can help them deal with these documents. We accepted the challenge in the form of a virtual hackathon, originally planned as a one-day event, with the help of our Microsoft partners.

PREPARING FOR THE PROCESS

We started two weeks in advance, requesting samples of documents to be translated and asking all the necessary questions in order to gain a common understanding of our customer’s objectives and current status. The resulting process workflow is depicted further below in Figure 1. It begins with the email being received by our automated document translation service. Attachments are extracted, and an image is produced for each page of the attached documents (usually as PDFs). Each image is processed, and then optical character recognition (OCR) is performed in order to extract the text.

Document layout analysis is an optional step that helps with documents whose structure is known in advance. The extracted text is translated and sent back by email. Reconstruction can be performed if the translated document needs to resemble the layout of the original one.
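
The core of this workflow (extract the text of each page image, then translate it) can be sketched in a few lines. This is a minimal illustration in Python rather than the C# used in the hackathon. The names `TranslatedAttachment`, `translate_attachment`, `fake_ocr` and `fake_translate` are hypothetical, and the two stand-in functions only mimic the real OCR and translation services.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TranslatedAttachment:
    """Result for one attachment: its name and the translated text of each page."""
    name: str
    pages: List[str] = field(default_factory=list)


def translate_attachment(
    name: str,
    page_images: List[bytes],
    extract_text: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> TranslatedAttachment:
    """Run text extraction and then translation on every page image, in order."""
    result = TranslatedAttachment(name=name)
    for image in page_images:
        text = extract_text(image)            # OCR step
        result.pages.append(translate(text))  # translation step
    return result


# Hypothetical stand-ins for the real OCR and translation services.
def fake_ocr(image: bytes) -> str:
    return image.decode("utf-8")


def fake_translate(text: str) -> str:
    return text.upper()


demo = translate_attachment("informe.pdf", [b"hola", b"mundo"], fake_ocr, fake_translate)
print(demo.pages)
```

Passing the two services in as functions keeps the flow independent of any particular OCR or translation provider, which makes it easy to swap one service for another.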

ARCHITECTURE AND TECHNOLOGY STACK

As shown in Figure 1, our approach involves three main Microsoft Azure technologies:

- Logic Apps, which we used for the process flow automation,
- Functions, which are bits of serverless C# code triggered by Logic Apps along the pipeline and used for pre- and post-processing and for calling the Cognitive Services,
- Cognitive Services, which offer the respective machine learning (ML) functionalities for reading the document (using OCR) and for the translation.

All data gets stored in Azure Storage.

Figure 1. The solution architecture.

Here is a general overview of these technologies, together with how we used them in this specific case:

Azure Logic Apps enable the connection of apps and services, automating workflows without writing code. By using Logic Apps, our dev teams can create business processes and workflows, integrate with SaaS and enterprise applications, and take advantage of the Microsoft Cloud to enhance the integration solutions.

Azure Functions is a serverless compute service that lets you run small pieces of code triggered by events without having to explicitly provision or manage infrastructure. Functions can be used for integrating systems, working with the Internet of Things (IoT), and building simple APIs and microservices for processing large data volumes. There are several pricing plans, depending on the usage needs. We used Azure Functions to call the suitable Cognitive Services for OCR, for context understanding using Read API, and for translation.

Cognitive Services bring ML models into the hands of our developers without the need to build and train models from scratch. By simply calling an API, the app is enhanced with human-like abilities (seeing, hearing, speaking, searching) and accelerated decision-making. Cognitive Services offer all of the above while supporting a wide choice of programming languages.

To clarify, we initially planned to use OCR on input images, but upon the advice of the Microsoft experts, we ended up using Read API. It provided superior results and could also read PDF documents directly. In this case, the expertise brought in by our colleagues from Microsoft was key to our choice of a more suitable technology.

TRANSLATION PROCEDURE

In our particular case, the arrival of a new email in a dedicated mailbox is the trigger for starting the analysis of the attachments (see Figure 2). After that, we iterate through all the attachments and store their IDs.

Figure 2. The arrival of a new email triggers the analysis of the attachments.

Next, the metadata file for each email is created – including some details about the email (see Figure 3).

Figure 3. Creation of the metadata file for each email.
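
As an illustration, such a metadata file could be a small JSON document. The sketch below is in Python rather than the C# used in the hackathon, and every field name is an assumption. The article only states that the file holds some details about the email, and that the attachment IDs are later read back from it.

```python
import json
from typing import List


def build_email_metadata(email_id: str, sender: str, subject: str,
                         attachment_ids: List[str]) -> str:
    """Serialise the details of one email as a JSON document."""
    metadata = {
        "emailId": email_id,
        "sender": sender,                 # hypothetical field, useful for the reply
        "subject": subject,               # hypothetical field
        "attachmentIds": attachment_ids,  # read back later to find the attachments
    }
    return json.dumps(metadata, indent=2)


print(build_email_metadata("email-001", "patient@example.com",
                           "Medical report", ["att-1", "att-2"]))
```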

The file hierarchy necessary for storing each message and its attachments is created in the Azure blob storage, as shown in Figure 4.

A new folder is created for each email, provided it has attachments. The folder name is the email ID. This folder contains subfolders for each attachment with the name being the attachment ID. Each subfolder contains a structure as shown in Figure 5, where:

- The ‘binary’ folder contains the original file received as attachment;
- The ‘read’ folder contains the JSON object with the text extracted by Read API/OCR;
- The ‘reconstruction’ folder contains a txt file with the translated text;
- The ‘metadata’ file contains details about the initial file.

Figure 4. Azure blob storage for the attachments.

Figure 5. The file hierarchy for storing the messages.
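
The hierarchy described above can be expressed as a small helper that derives the blob paths for one attachment. This is a Python sketch with a hypothetical function name. The folder names come from the description above, while the file names inside the 'read' and 'reconstruction' folders and the name of the metadata file are assumptions.

```python
from typing import Dict


def attachment_blob_paths(email_id: str, attachment_id: str,
                          original_name: str) -> Dict[str, str]:
    """Return the blob paths used for one attachment of one email."""
    base = f"{email_id}/{attachment_id}"
    return {
        # Original file, exactly as received.
        "binary": f"{base}/binary/{original_name}",
        # JSON object with the extracted text (file name is an assumption).
        "read": f"{base}/read/result.json",
        # Translated text (file name is an assumption).
        "reconstruction": f"{base}/reconstruction/translated.txt",
        # Details about the initial file (file name is an assumption).
        "metadata": f"{base}/metadata.json",
    }


paths = attachment_blob_paths("email-001", "att-1", "informe.pdf")
print(paths["binary"])
```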

Finally, the blob for saving the email metadata is created (see Figure 6), and the second Logic App is called. The email ID and its attachment IDs are sent to the second application.

Figure 6. Creation of the blob for saving the email metadata.

In the second Logic App, two functions are called for each attachment: one using Read API for extracting the text from the images, and the other one to translate the extracted text to English, as shown in Figure 7.

Figure 7. The two functions called for each attachment are Read API and Translate.
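
The hand-over between the two functions is the extracted text. A minimal Python sketch of that step follows, rather than the C# used in the hackathon. It assumes the JSON layout of the Read v3.x REST response, where the text sits under `analyzeResult.readResults[].lines[].text`, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def pages_from_read_result(read_result: Dict[str, Any]) -> List[str]:
    """Collect the recognised lines of each page into one string per page."""
    pages = []
    for page in read_result.get("analyzeResult", {}).get("readResults", []):
        lines = [line["text"] for line in page.get("lines", [])]
        pages.append("\n".join(lines))
    return pages


# Shortened sample in the assumed response layout.
sample = {
    "status": "succeeded",
    "analyzeResult": {
        "readResults": [
            {"page": 1, "lines": [{"text": "Informe médico"}, {"text": "Paciente: ..."}]},
            {"page": 2, "lines": [{"text": "Diagnóstico"}]},
        ]
    },
}
print(pages_from_read_result(sample))
```

Keeping one string per page preserves the page boundaries, so the translated text can later be reassembled in the original order.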

Finally, the third Logic App is used for sending the email (see Figure 8).

Figure 8. Function for sending the email.

The third Logic App, shown in Figure 9, receives the email ID from the previous app and retrieves the attachment IDs from the email metadata.


Figure 9. Get attachment IDs from the email metadata.

For each attachment, we get the metadata and the reconstructed txt file from the storage and reconstruction folders, respectively, as shown in Figure 10.

Figure 10. Obtain the metadata and the reconstructed txt file from the storage and reconstruction folders.

The file content is stored in an object and the txt file gets the name of the original document received. In the final step, the reply for the initial email is created. It will contain the txt files with the translated text as attachments and a message in the email body.

Figure 11. The reply for the initial email is created.
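
The composition of such a reply can be sketched with Python's standard `email` package, whereas the hackathon solution used a Logic App for this step. The function name, the body text and the addresses are hypothetical. The sketch also assumes that "the name of the original document" means the original file name with its extension changed to `.txt`.

```python
from email.message import EmailMessage
from pathlib import PurePosixPath
from typing import Dict


def build_reply(original_sender: str, original_subject: str,
                translations: Dict[str, str], reply_from: str) -> EmailMessage:
    """Create a reply that carries one txt attachment per translated document."""
    reply = EmailMessage()
    reply["From"] = reply_from
    reply["To"] = original_sender
    reply["Subject"] = f"RE: {original_subject}"
    reply.set_content("Please find the translated documents attached.")
    for original_name, text in translations.items():
        # Reuse the original document name, with a .txt extension (assumption).
        txt_name = PurePosixPath(original_name).with_suffix(".txt").name
        reply.add_attachment(text, filename=txt_name)
    return reply


reply = build_reply(
    "patient@example.com",
    "Medical report",
    {"informe_medico.pdf": "Medical report: ..."},
    "translations@example.com",
)
print([part.get_filename() for part in reply.iter_attachments()])
```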

THE HACKATHON 

We had originally wanted to run this hackathon in a more traditional, face-to-face format at our Cluj office. However, due to current circumstances requiring us to work remotely, we had to be flexible. On the day of the hackathon, four developers and a business analyst from Endava joined two solution architects from Microsoft on a Teams video conference call (see Figure 12). The customer was also invited to join the event kick-off and the wrap-up meeting.

Figure 12. The morning planning session.

The hackathon team consisted of:

Endava
  • Pavel Spataru
  • Dorin Bazgan
  • Daniel Moniry-Abyaneh
  • Jay Chitnis
  • Radu Orghidan
  • Bradley Howard
  • Razvan Berinde

Microsoft

The day started with an online setup session on Teams with everybody involved. The objectives were presented, and the details were quickly aligned with the customer. Then the team split up, with each developer tackling one or two tasks. The development process evolved throughout the day, with the partners from Microsoft having one-on-one sessions with our colleagues.

The demo with the customer was scheduled for 5:30 pm. An hour before, we had an all-hands meeting for a technical status update.

As described below, the planned pipeline was presented during the demo.

First, the email containing the PDF document to be translated is sent (see Figure 13).

Figure 13. The PDF document to be translated is sent by email.

The document is received and processed in less than 30 seconds. The translation is returned to the sender as a text attachment. It can be compared with the original Spanish document for an assessment of the quality of the translation.

When solving a problem related to automated text recognition, it is important to understand the options offered by the Cognitive Services for printed and handwritten text in order to use the most suitable function for each case. In our situation, we needed to asynchronously process text-heavy content in both images and PDF while considering the context. The Read API ticked all the boxes and became our preferred option (see Figure 14).

Figure 14. Context understanding using Read API.
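
Asynchronous processing means that the caller submits a document, receives the URL of the pending operation, and then polls that URL until the result is ready. The polling part can be sketched as below, in Python rather than the C# used in the hackathon. The function and parameter names are hypothetical, and the status values `succeeded` and `failed` are assumed to follow the Read v3.x REST response. The `get_status` argument stands in for the HTTP GET on the operation URL.

```python
import time
from typing import Any, Callable, Dict


def wait_for_read_result(
    get_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a pending Read operation until it succeeds, fails, or times out."""
    for _ in range(max_attempts):
        result = get_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Read operation failed")
        sleep(poll_interval)  # still queued or running: wait and retry
    raise TimeoutError("Read operation did not finish in time")


# Simulated sequence of responses from the operation URL.
responses = iter([
    {"status": "notStarted"},
    {"status": "running"},
    {"status": "succeeded", "analyzeResult": {"readResults": []}},
])
final = wait_for_read_result(lambda: next(responses), sleep=lambda seconds: None)
print(final["status"])
```

Bounding the number of attempts keeps a stuck operation from blocking the rest of the pipeline.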

The quality of the translation provided by the Translator Text API can be enhanced by using Custom Translator, which enabled us to build customised dictionaries that accurately translate medical terms (see Figure 15).

Figure 15. Translation functionality.
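
A translation call of this kind can be sketched as follows, in Python rather than the C# used in the hackathon. The sketch assumes version 3.0 of the Translator REST API, where a Custom Translator model is selected through the `category` query parameter. The helper names, the key, the region and the category ID are placeholders, and no request is actually sent.

```python
import json
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(
    texts: List[str], key: str, region: str,
    to_lang: str = "en", category: Optional[str] = None,
) -> Tuple[str, Dict[str, str], bytes]:
    """Build the URL, headers and JSON body for one translate call."""
    params = {"api-version": "3.0", "to": to_lang}
    if category:
        params["category"] = category  # ID of a Custom Translator model
    url = f"{TRANSLATOR_ENDPOINT}?{urlencode(params)}"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    body = json.dumps([{"Text": text} for text in texts]).encode("utf-8")
    return url, headers, body


def parse_translate_response(payload: bytes) -> List[str]:
    """Return the first translation of every input text."""
    return [item["translations"][0]["text"] for item in json.loads(payload)]


url, headers, body = build_translate_request(
    ["Informe médico"], key="<key>", region="<region>", category="<category-id>"
)
print(url)
```

Leaving `category` unset falls back to the general translation model, which makes it simple to compare results with and without the customised dictionary.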

A few examples of automated translations are shown in Figure 16. Details are also presented in Figure 17 and Figure 18.

Figure 16. Side by side of translation and original document.


Figure 17. Detail of the translation.


Figure 18. Original document.

LESSONS LEARNED AND FUTURE WORK

We originally planned for a one-day event, but we extended it to a second day because one day was not enough to properly tackle all the technical issues that appeared along the way and to test the final PoC. We now recommend running two-day events as a minimum, especially if they’re happening online.

The learning curve was flattened with the help of the experts from Microsoft. Their deep understanding of specific technologies perfectly complements our broader, but sometimes shallower, expertise. The Microsoft architects didn’t write any code themselves, but instead helped by teaching us the principles of each service more quickly than we could have learned them by ourselves. If you can, ask the vendor for technical mentors to help your teams adopt new technologies faster.

During the hackathon, we were able to focus on the problem and technology without interruptions. We recommend regular, but sparse check-up calls. Approximately three times a day should be enough.

A hackathon can result in an improved relationship with the customer. In our case, the customer was involved in the hackathon, in the requirements analysis, the creative process, and the demo session, and was able to appreciate our talent and hard work. The PoC is still used by the customer as a reference to compare the accuracy and speed of the results with the current solutions. This leads to new opportunities and insightful discussions about the technical possibilities.

The solutions that we envisaged initially were eventually enhanced by having the opportunity to discuss them with the Microsoft team, who brought alternative solutions to our attention. For example, the OCR functionality that we originally planned to use was replaced by the Read API, a better option that not only considers the context of the phrases, but also provides more accurate results.

During the interaction with our team, our colleagues from Microsoft seized the opportunity to discuss other practical applications in domains of strategic value for us, such as insurance, fintech, and multimedia. This has inspired our team to host joint hackathons as part of our approach to developing healthy relationships with our customers and technical partners. Moreover, virtual hackathons, as opposed to in-person events, can increase the speed of knowledge transfer while respecting social distancing.

CONCLUSION

Running a hackathon, even a virtual one, can be a fantastic way to accelerate the sales and delivery of a project by using new technology. We look forward to running more events like this in the future.

Radu Orghidan

VP Cognitive Computing

Radu is passionate about understanding the inner mechanisms of innovation and using them to solve business challenges through cloud and on-premises cognitive computing systems. He is currently focused on machine learning and generative AI to create systems that enhance users’ ability to understand and interact with the physical and digital reality. In Endava, Radu is also looking at strategic approaches to align novel technical tools with business goals. In his free time, Radu is a keen motorcycle rider and loves spending time with his family.

 
