In May 2020, we held a two-day hackathon with Microsoft. The aim was to take medical forms completed in several languages and translate them for our travel insurance client. We wanted to test whether an automated translation solution could be a viable replacement for the client's current process at a fraction of the cost.
We all consider the possibility of getting sick while traveling. Travel insurance companies know this and provide compelling packages that give us peace of mind and help us recover the expenses incurred by a health incident. However, the process can be complicated by several factors, one of the foremost being the difficulty of communication between travel insurance companies and doctors abroad. Upon returning home from traveling abroad, patients have to send the receipts, medical letters, and all other documents provided by the foreign medical institution to the travel insurance company.
These documents may be written in different languages, come in different formats, and use medical terminology. Needless to say, insurance companies require accurate translations of them. One of our customers, a travel insurance company, challenged us to build a system that can help them deal with these documents. We accepted the challenge and organised what was planned as a one-day virtual hackathon with the help of our Microsoft partners.
PREPARING FOR THE PROCESS
We started two weeks in advance, requesting samples of documents to be translated and asking all the necessary questions in order to gain a common understanding of our customer’s objectives and current status. The resulting process workflow is depicted further below in Figure 1. It begins with the email being received by our automated document translation service. Attachments are extracted, and an image is produced for each page of the attached documents (usually PDFs). Each image is processed, and then optical character recognition (OCR) is performed in order to extract the text.
Document layout analysis is an optional step that helps with documents whose structure is known in advance. The extracted text is translated and sent back by email. Reconstruction can be performed if the translated document needs to look similar to the original one.
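The workflow above can be sketched as a chain of small steps. The snippet below is illustrative only: it is written in Python rather than the C# used for the Functions, the `Attachment` type and `run_pipeline` function are hypothetical names, and the OCR and translation steps are passed in as plain callables so that the real services can be swapped for stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attachment:
    """One email attachment moving through the pipeline (hypothetical type)."""
    attachment_id: str
    content: bytes
    extracted_text: str = ""
    translated_text: str = ""


def run_pipeline(
    attachments: List[Attachment],
    extract_text: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> List[Attachment]:
    """Run text extraction, then translation, on every attachment in order."""
    for attachment in attachments:
        attachment.extracted_text = extract_text(attachment.content)
        attachment.translated_text = translate(attachment.extracted_text)
    return attachments


# Stand-ins for the real OCR and translation services (assumptions).
def fake_ocr(data: bytes) -> str:
    return data.decode("utf-8")


def fake_translate(text: str) -> str:
    return text.upper()


processed = run_pipeline([Attachment("a1", b"hola")], fake_ocr, fake_translate)
```

Keeping the two service calls behind plain callables mirrors the idea of separate, independently triggered steps, and makes each step easy to test in isolation.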
ARCHITECTURE AND TECHNOLOGY STACK
As shown in Figure 1, our approach involves three main Microsoft Azure technologies:
- Logic Apps, which we used for the process flow automation,
- Functions, which are bits of serverless C# code triggered by Logic Apps along the pipeline and used for pre- and post-processing and for calling the Cognitive Services,
- Cognitive Services, which offer the respective machine learning (ML) functionalities for reading the document (using OCR) and for the translation.
Figure 1. The solution architecture.
Here is a general overview of these technologies, along with how we used them in this specific case:
Azure Logic Apps enable the connection of apps and services, automating workflows without writing code. By using Logic Apps, our dev teams can create business processes and workflows, integrate with SaaS and enterprise applications, and take advantage of the Microsoft Cloud to enhance the integration solutions.
Azure Functions is a serverless compute service that lets you run small pieces of code triggered by events without having to explicitly provision or manage infrastructure. Functions can be used for integrating systems, working with the Internet of Things (IoT), and building simple APIs and microservices for processing large data volumes. There are several pricing plans, depending on the usage needs. We used Azure Functions to call the suitable Cognitive Services for OCR, for context understanding using Read API, and for translation.
Cognitive Services bring ML models into the hands of our developers without the need to build and train models from scratch. By simply calling an API, the app is enhanced with human-like abilities (seeing, hearing, speaking, searching) and accelerated decision-making. Cognitive Services offer all of the above while supporting a wide choice of programming languages.
To clarify further: while we initially planned to use OCR on input images, on the advice of the Microsoft experts we ended up using Read API. It provided more accurate results and could also read PDF documents directly. In this case, the expertise brought in by our colleagues from Microsoft was key to our choice of a more suitable technology.
In our particular case, the arrival of a new email into a dedicated mailbox is the trigger for starting the analysis of the attachments (see Figure 2). After that, we iterate through all the attachments and store the IDs of the attachments.
Next, the metadata file for each email is created – including some details about the email (see Figure 3).
Figure 2. The arrival of a new email triggers the analysis of the attachments.
Figure 3. Creation of the metadata file for each email.
A new folder is created for each email, provided it has attachments. The folder name is the email ID. This folder contains subfolders for each attachment with the name being the attachment ID. Each subfolder contains a structure as shown in Figure 5, where:
- The ‘binary’ folder contains the original file received as attachment;
- The ‘read’ folder contains the JSON object with the text extracted by Read API/OCR;
- The ‘reconstruction’ folder contains a txt file with the translated text;
- The ‘metadata’ file contains details about the initial file.
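As a rough illustration of this layout, the sketch below builds the storage prefixes for one attachment. It is written in Python for brevity, `attachment_paths` is a hypothetical helper, and the actual file names inside each folder are not specified here, so only the folder-level paths are produced.

```python
from pathlib import PurePosixPath
from typing import Dict


def attachment_paths(email_id: str, attachment_id: str) -> Dict[str, str]:
    """Return the storage paths for one attachment.

    Layout assumed from the description above:
    <email ID>/<attachment ID>/{binary, read, reconstruction, metadata}
    """
    base = PurePosixPath(email_id) / attachment_id
    return {
        "binary": str(base / "binary"),                  # original attachment
        "read": str(base / "read"),                      # JSON from Read API/OCR
        "reconstruction": str(base / "reconstruction"),  # translated txt file
        "metadata": str(base / "metadata"),              # details of the initial file
    }


paths = attachment_paths("email-123", "attachment-456")
```

Deriving every path from the email ID and the attachment ID means that any later step can locate its inputs from those two identifiers alone.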
Figure 4. Azure blob storage for the attachments.
Figure 5. The file hierarchy for storing the messages.
Figure 6. Creation of the blob for saving the email metadata.
Figure 7. The two functions called for each attachment are Read API and Translate.
Figure 8. Function for sending the email.
Figure 9. Get attachment IDs from the email metadata.
Figure 10. Obtain the metadata and the reconstructed txt file from the storage and reconstruction folders.
Figure 11. The reply for the initial email is created.
We had originally wanted to run this hackathon in a more traditional, face-to-face format at our Cluj office. However, due to current circumstances requiring us to work remotely, we had to be flexible. On the day of the hackathon, the four developers and a business analyst from Endava joined the two solution architects from Microsoft on the Teams video conference call (see Figure 12). The customer was also invited to join the event kick-off and the wrap-up meeting.
Figure 12. The morning planning session.
- Pavel Spataru
- Dorin Bazgan
- Daniel Moniry-Abyaneh
- Jay Chitnis
- Radu Orghidan
- Bradley Howard
- Razvan Berinde
The day started with an online setup session on Teams with everybody involved. The objectives were presented, and the details were quickly aligned with the customer. Then, the team split up, with each developer tackling one or two tasks. The development process evolved throughout the day, with the partners from Microsoft having one-on-one sessions with our colleagues.
The demo with the customer was scheduled for 5:30 pm. An hour before, we had an all-hands for a technical status update.
During the demo, we presented the planned pipeline, as described below.
First, the email containing the PDF document to be translated is sent (see Figure 13).
Figure 13. The PDF document to be translated is sent by email.
The document is received and processed in less than 30 seconds. The translation is returned to the sender as a text attachment. It can be compared with the original Spanish document for an assessment of the quality of the translation.
When solving a problem related to automated text recognition, it is important to understand the options offered by the Cognitive Services for printed and handwritten text in order to use the most suitable function for each case. In our situation, we needed to asynchronously process text-heavy content in both images and PDFs while considering the context. The Read API ticked all the boxes and became our preferred option (see Figure 14).
Figure 14. Context understanding using Read API.
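Because the Read API works asynchronously, a caller submits a document and then polls for the result. The Python sketch below illustrates that pattern without calling the service: `fetch_status` is a hypothetical callable standing in for the HTTP request, and the JSON shape (`status`, `analyzeResult.readResults[].lines[].text`) is assumed from version 3 of the Read API and may differ in other versions.

```python
import time
from typing import Callable, Dict, List


def wait_for_read_result(
    fetch_status: Callable[[], Dict],
    poll_seconds: float = 1.0,
    max_polls: int = 30,
) -> Dict:
    """Poll until the read operation succeeds, fails, or runs out of attempts."""
    for _ in range(max_polls):
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(poll_seconds)  # still "notStarted" or "running"
    raise TimeoutError("Read operation did not finish in time")


def extract_lines(result: Dict) -> List[str]:
    """Flatten the recognised text lines of all pages into one list."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]


# Canned responses standing in for two successive polls (illustrative data).
responses = iter([
    {"status": "running"},
    {
        "status": "succeeded",
        "analyzeResult": {
            "readResults": [
                {"lines": [{"text": "Informe médico"}, {"text": "Paciente"}]}
            ]
        },
    },
])
lines = extract_lines(wait_for_read_result(lambda: next(responses), poll_seconds=0))
```

Injecting the status call keeps the polling logic independent of any particular HTTP client and lets it be exercised with canned responses.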
The quality of the translation provided by the Translator Text API can be enhanced by using Custom Translator, which enabled us to build customised dictionaries that accurately handle the translation of medical terms (see Figure 15).
Figure 15. Translation functionality.
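To give an idea of what a translation call involves, the Python sketch below builds a request for version 3.0 of the Translator Text API and parses a response, without sending anything over the network. The helper names are hypothetical, and the `category` value, which is how a Custom Translator model is selected, is a placeholder. A real call would also need the subscription key sent in the `Ocp-Apim-Subscription-Key` header.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode

TRANSLATOR_ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_translate_request(
    texts: List[str],
    to_language: str = "en",
    category: Optional[str] = None,
) -> Tuple[str, str]:
    """Return the URL and JSON body for a translate call."""
    params = {"api-version": "3.0", "to": to_language}
    if category:
        # Selects a custom model; omit it to use the general model.
        params["category"] = category
    url = f"{TRANSLATOR_ENDPOINT}?{urlencode(params)}"
    body = json.dumps([{"Text": text} for text in texts], ensure_ascii=False)
    return url, body


def parse_translate_response(payload: str) -> List[str]:
    """Take the first translation returned for each input text."""
    return [item["translations"][0]["text"] for item in json.loads(payload)]


url, body = build_translate_request(
    ["Informe médico"], category="placeholder-category-id"
)
```

Separating request construction from the HTTP call makes it straightforward to check the query parameters and body before any text leaves the system.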
Figure 16. Side by side of translation and original document.
Figure 17. Detail of the translation.
Figure 18. Original document.
LESSONS LEARNED AND FUTURE WORK
We originally planned for a one-day event, but we extended it to a second day because one day was not enough to properly tackle all the technical issues that appeared along the way and to test the final PoC. We now recommend running two-day events as a minimum, especially if they’re happening online.
The learning curve was flattened with the help of the experts from Microsoft. Their deep understanding of specific technologies perfectly complements our broader, but sometimes shallower expertise. The Microsoft architects didn’t write any code themselves, but instead helped by teaching us the principles of each service more quickly than it would have taken us to learn by ourselves. If you can, ask the vendor for technical mentors to help your teams adopt new technologies faster.
During the hackathon, we were able to focus on the problem and technology without interruptions. We recommend regular, but sparse check-up calls. Approximately three times a day should be enough.
A hackathon can result in an improved relationship with the customer. In our case, the customer was involved in the hackathon, in the requirements analysis, the creative process, and the demo session, and was able to appreciate our talent and hard work. The PoC is still used by the customer as a reference to compare the accuracy and speed of the results with the current solutions. This leads to new opportunities and insightful discussions about the technical possibilities.
The solutions that we envisaged initially were eventually enhanced by having the opportunity to discuss them with the Microsoft team, who brought alternative solutions to our attention. For example, the OCR functionality that we originally planned to use was replaced by the Read API, a better option that not only considers the context of the phrases, but also provides more accurate results.
During the interaction with our team, the colleagues from Microsoft seized the opportunity to discuss other practical applications in domains of strategic value for us, such as insurance, fintech, and multimedia. This has inspired our team to host joint hackathons as part of our approach to developing healthy relationships with our customers and technical partners. Moreover, virtual hackathons, as opposed to in-person events, can increase the knowledge transfer speed without affecting social distancing.
Running a hackathon, even a virtual one, can be a fantastic way to accelerate the sales and delivery of a project by using new technology. We look forward to running more events like this in the future.