AR & ML deployment in the wild – a story about friendly animals

Cloud | Radu Orghidan | 30 April 2020

What would you say if, right after a good breakfast with your family, you looked up to see a shark swimming above their heads? Yesterday morning, we had several such visits in my house. Amazingly, we were able to accommodate quite a large tiger on our sofa, a curious turtle on the carpet and a friendly shark under our kitchen’s ceiling. Not to mention the cute pony that was waiting patiently on the terrace for my daughter.


In April 2020, as I sit and write this article, millions of children around the globe cannot go to school or on outings because they are on lockdown. This has created the need to find new ways to overcome our physical boundaries and access knowledge, keeping us both mentally healthy and entertained as we wait to return to normal life.

The techniques used to blur the boundaries between individuals and the world around them, extending the reach of their minds, are called cognitive augmentation (CA). Human interfaces for augmented cognition have been studied for a long time, as their use leads to seamless knowledge transfer and improved learning and decision-making.

Let’s have a look at the submerged part of the 'iceberg' that brings to life the AR applications running on your mobile phone.

What’s under Augmented Reality’s hood? 

The augmented reality app from Google gave us the opportunity to meet animals up close, at life size. The rest of the animals (you can find the full list here) will certainly be 'visiting us' over the next couple of days.

The zoological uncanny valley that I was expecting is successfully bridged by Google’s AR app through an exceptional combination of spatial stability and model quality. The animals are able to ‘sit’ or ‘move’ on planar surfaces (the floor, table, or ceiling), with their 3D models rendered without any glitches. The animations are natural and well done. Of course, there is more work to be done on the animals with long fur, such as the lion’s mane, but this is to be expected if you think about the complexity of modelling hair. On the other hand, flat or smooth surfaces, such as the turtle’s back, look awesome. The true-to-life scale is also a great asset for understanding the size of the animal.

Another great AR application is the virtual measuring tape from Apple, which allows you to take physical measurements of the world using a smartphone.


Technically speaking, AR is based on classical computer vision algorithms with SLAM (Simultaneous Localisation And Mapping) at its core. Such an algorithm compares visual features of the scene from successive camera frames in order to calculate the relative movement between frames. There are already very good SLAM implementations from frameworks like ARKit (Apple), ARCore (Google), MRTK (Microsoft) or the cross-platform AR Foundation (Unity3D), so there is no need to implement SLAM from scratch.
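To make that core idea concrete, here is a minimal sketch, in Python with OpenCV rather than any of the frameworks above, of how the relative camera movement between two successive frames can be recovered from matched visual features. The frame file names and the intrinsic matrix K are placeholder assumptions, not values from a real device.

```python
# Minimal sketch of the idea at the core of SLAM: estimate the relative
# camera motion between two consecutive frames from matched visual features.
import cv2
import numpy as np

# Assumed camera intrinsics (focal lengths and principal point) - placeholders.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

img1 = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
img2 = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE)

# Detect and describe ORB features in both frames.
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors between the two frames.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Recover the essential matrix and decompose it into a rotation and a
# translation: the camera's relative movement (translation up to scale).
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print("Relative rotation:\n", R)
print("Relative translation (up to scale):\n", t)
```

A production SLAM system layers mapping, loop closure and sensor fusion on top of this frame-to-frame estimation, which is precisely what the frameworks above provide out of the box.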

In spite of the advances offered by the available SLAM frameworks, more ingredients are needed to reach a high-quality AR experience. First, the application has to run at the edge, meaning on devices that are near the source of data (the camera, microphone, etc.) and are not necessarily connected to the internet. These devices must be able to process data and make decisions in real time, without the lag introduced by back-and-forth communication with remote servers. Then, there is an increasing need for intelligent, machine-learning-specific functionalities that we, as human beings, take for granted. For instance, the recognition of people, objects and even larger scenes, pose estimation, semantic segmentation, motion understanding, visual anomaly detection, text reading (OCR), audio recognition and text translation are just a few areas in which humans expect high performance and reliability from machines. More complex applications rely on algorithms that provide humanlike performance. For instance, semantic segmentation is crucial for placing virtual objects behind real ones in a scene.
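As an illustration of that last point, here is a hedged sketch of how a pretrained semantic segmentation model could produce a person mask that an AR renderer uses for occlusion. It uses torchvision's DeepLabV3, which is my choice for the example rather than anything the AR frameworks mandate, and the input file name is a placeholder.

```python
# Sketch: segment the people in a camera frame so virtual objects can be
# rendered *behind* them (occlusion). "frame.png" is a placeholder.
import torch
import numpy as np
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

frame = Image.open("frame.png").convert("RGB")
batch = preprocess(frame).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)["out"][0]   # per-class logits, shape (21, H, W)

# Class 15 is "person" in the Pascal VOC label set this model was trained on.
person_mask = (logits.argmax(0) == 15).numpy().astype(np.uint8)

# An AR renderer could now draw the virtual animal only where person_mask == 0,
# making it appear to walk behind the people in the scene.
print("person pixels:", person_mask.sum())
```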

AR applications are fertile ground for machine learning

With the camera always on, a continuous image stream is produced, providing structured data (reliably tracked objects, orientation, displacement, etc.) in a repeatable manner (the same object or features are seen from different angles). Therefore, ML models can be used to boost AR applications and make their use even more natural for us humans.

Just as my kids paid attention to the details of a 3D giant sea turtle and engaged with it for much longer than they normally would have with its (2D) picture in a book, AR can be applied in other domains as an enabler for cognitive augmentation. Let’s take a look at a few examples.

At Endava, we used AR for remote collaboration in factories and for creating a fully interactive, explorable virtual motor show. These experiences can be delivered through VR headsets or specific AR devices such as Microsoft’s HoloLens or smart contact lenses with embedded cameras. Similarly, field workers in the surveillance industry can use AR-enhanced glasses to quickly spot potential threats detected by ML.

The real estate industry can also benefit from AR and VR by helping users understand the configuration of a space or explore various decoration options through an immersive experience. Pinterest’s tool for placing real objects in virtual scenes enables visual discovery of items, which, in turn, leads to higher user engagement. Devices such as the Matterport camera can create a photorealistic 3D model of the environment that visitors can access remotely. Ikea uses a similar idea for virtually placing furniture in our homes.

All the above examples show useful applications of AR or VR built upon ML predictions. However, even more important is the powerful combination of these technologies, which enables double-loop learning: we can collect data about how we interact with our world when we have access to its augmented version, and understand the 'why' of our actions based on the patterns driving our decisions.

The AR & ML technology stack

A successful AR application is one that is in the hands of consumers and is heavily used. A new AR app can help a provider reach new customers, offer bespoke solutions or create new revenue opportunities through direct or cross-sales. However, to make it successful in the long run, it’s important to collect data, discover usage patterns and improve upon them. Let’s return to Google’s application that shows 3D animals in AR. What if the 'animal' could detect the emotion on a child’s face and interact accordingly? What if the app could recognise the objects in a room, so the cat could jump up onto a table or the dog could try to play with a toy? The sky is the limit for the interactions that AR assets can have with the objects and living creatures in the scene.
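Purely as an illustration of that 'what if', the sketch below detects a face with OpenCV's stock Haar cascade and branches the virtual animal's behaviour on a detected emotion. The classify_emotion stub and the animal.play_animation API are hypothetical placeholders, not part of Google's app.

```python
# Illustrative sketch: detect the child's face and let the virtual animal
# react. Only the Haar cascade face detector is real; everything downstream
# is a hypothetical stand-in for what such an app would actually ship.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def classify_emotion(face_img):
    # Hypothetical placeholder: a real app would run a trained emotion
    # classifier on the cropped face here.
    return "happy"

def react_to_viewer(frame, animal):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        emotion = classify_emotion(frame[y:y + h, x:x + w])
        if emotion == "happy":
            animal.play_animation("wag_tail")    # hypothetical animal API
        else:
            animal.play_animation("sit_calmly")  # hypothetical animal API
```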

Edge-deployed AR & ML apps are the result of a technology stack like the one shown in Figure 2. The training and inference of the ML models use the computing power provided by the hardware at the bottom layer of the stack, and the AR applications rely on that edge device’s processing power. The AR assets are placed in the scene as (static or animated) 3D objects whose appearance is boosted by ML models. With training data provided by the initial AR applications, the ML models are either created using available Platforms as a Service (PaaS) or accessed through Software as a Service (SaaS) APIs. New usage patterns, discovered while the applications are in use, are fed back into the platform to create or improve the ML models.

Figure 2. AR & ML technology stack

One of the main challenges in running ML on the edge is the need for low latency. In the case of AR apps, this is literally visible. Usually, a graphics card or a neural compute stick (NCS), such as Intel’s Movidius, is used to increase inference speed by running massively parallel algorithms.
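As a sketch of what that looks like in practice, the snippet below runs a model through ONNX Runtime and lets it fall back from a GPU to the CPU. I'm using ONNX Runtime here as a generic stand-in, since the Movidius stick itself is normally driven through Intel's OpenVINO toolkit; the model file, input name and shape are placeholders.

```python
# Sketch: hardware-accelerated edge inference with graceful CPU fallback.
# "model.onnx" and the dummy input shape are placeholder assumptions.
import numpy as np
import onnxruntime as ort

# Providers are tried in order: use the GPU if available, else the CPU.
session = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

input_name = session.get_inputs()[0].name
dummy_frame = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder

outputs = session.run(None, {input_name: dummy_frame})
print("inference output shape:", outputs[0].shape)
```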

What’s the optimal team shape? 

This brief overview reveals the complexity of an ecosystem that lives at our fingertips. As data scientists or researchers, we often tend to see a technology or an ML model as a goal in itself. Once our own intricate mix of algorithms and hardware works, in perfectly controlled conditions, we tend to claim victory. Unfortunately, this huge effort, from our perspective, represents only a tiny share of the effort required for the production-level deployment of a mature application that works out there, in the hands of merciless users. Nobody will appreciate the robustness of the SLAM algorithm if it fails, without warning, in low-light conditions. Users will abandon the application if the animals turn upside down because of a gimbal lock error. And no one cares if the app crashes because the load balancer didn’t work properly.

My point here is that, for a successful product, several teams must cooperate along the whole data science lifecycle. Let’s take a quick glance at this process, which involves at least three teams: a Data Science (DS) team, a Development and Testing (DT) team and an MLOps team. While the teams may have overlapping roles, a few quite distinct profiles are needed for a successful project.

First, the business need must be clearly understood with the help of the Business Analyst from the DS team. If you aim to deliver real value in the market, make sure that the problem you pick is structured, repeatable and predictable.

Then, the next step is data acquisition. As a rule of thumb, the available data needs to check the four V's: Volume, Variety, Velocity and Veracity. The DS team must include a Data Analyst, whose role is to draw inspiration from the data and help decision makers avoid confirmation bias. Since data is the new oil, plan from day one how you’ll ramp up the volumes of real data handled by the application, and involve a Software Architect from the DT team early in the discussions.

The model’s proof of concept (PoC) is usually built by the Applied ML Engineer (part of the DS team) by experimenting with existing models. Although transfer learning often works just fine, in some cases the models for solving narrow problems have to be built from scratch by another member of the DS team: the Researcher.
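A typical starting point for such a PoC is transfer learning along these lines: reuse a pretrained backbone and retrain only a new classification head. The class count and the dummy batch below are assumptions for illustration.

```python
# Minimal transfer-learning sketch: freeze a pretrained feature extractor
# and train a fresh head. NUM_CLASSES and the dummy batch are placeholders.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # e.g. the handful of categories the PoC must recognise

model = models.resnet18(pretrained=True)

# Freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the final layer with a new, trainable classification head.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch:
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("dummy-batch loss:", loss.item())
```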

Once the PoC provides satisfactory results, it has to leave the Jupyter notebook and move down the pipeline towards the end users. The engineering skills needed for productisation, integration, end-to-end testing, monitoring and quality assurance are provided by the collaboration between the DT and DS teams.

In software engineering, deployment at scale, infrastructure and the monitoring of live operations are ensured by the DevOps team. MLOps, the equivalent of DevOps for ML, is the set of practices for a healthy development lifecycle that leads to systems that are operable, manageable and maintainable. Therefore, to successfully deploy an ML-enhanced AR application in the wild, we’ll need a cross-functional team with an eye on the end goal and a tight connection with the end users.

Depending on the complexity of the project, the technical team must also involve roles such as a UX Designer, a Data Product Manager, a Graphic Designer, a Project Manager and a Product Owner.

Besides the technical considerations, a successful team also needs to consider the major challenge of integrating the product into the end user’s 'production line'. Even for an entertainment application, users must adopt it and be able to enjoy its benefits. Therefore, a few other roles should be involved: a Domain Expert, an Ethicist, a Philosopher or a Musician, among others.

The users of the final applications, like the kids in my example, are implicit supporters of Moravec’s paradox. They are not shocked by a lion in the living room and expect the animal to navigate between the objects without difficulty. As digital natives, they take for granted that the technology understands them and thinks like humans do. We are truly on a path on which multi-experience replaces technology-literate people with people-literate technology.

The fact that, after a while, my kids left me alone admiring the functionality of the app and imagining the wonderful improvements that could be made proves that it needs ML for a more engaging experience! Or that I’m a hopeless geek.

Radu Orghidan

VP Cognitive Computing

Radu is passionate about understanding the inner mechanisms of innovation and using them to solve business challenges through cloud and on-premises cognitive computing systems. He is currently focused on machine learning and generative AI to create systems that enhance users’ ability to understand and interact with the physical and digital reality. In Endava, Radu is also looking at strategic approaches to align novel technical tools with business goals. In his free time, Radu is a keen motorcycle rider and loves spending time with his family.

 
