
Cloud | Radu Orghidan |
01 October 2019


Cloud service providers currently offer their customers similar computational services and upsell their offerings through unique and compelling AI functionality. Because the available services are complex, companies often lack a set of criteria for clearly distinguishing between providers, platforms, or product instances and for making the best decision for their needs.

The main goal of this paper is to offer a structured view of the commercial AI cloud solutions currently on the market. It proposes a trifold perspective according to: the business focus areas, the technologies driving AI and the start-up situation in 2019.

Additionally, performing predictive analysis in a real-time scenario is not a simple task. Besides the technical and administrative aspects that need to be considered, it requires a strong commitment from the main stakeholders. The secondary objective of the paper is to offer a set of guidelines for a successful cognitive computing approach.

This paper is made up of four sections and is published in two parts:

Part one, which you can access here, consists of:

Introduction: Distinguishing between Narrow AI and Artificial General Intelligence (AGI) and how Cognitive Computing Systems are a step in the right direction toward achieving AGI.

Chapter 1: This chapter tackles the trends in the enterprise cloud market. It covers issues related to operational costs, main players, market share and growth forecast. It sets the scene for understanding how the complexity of the ML domain and the speed at which these services are evolving make it difficult to find a reliable, up-to-date comparison between the available services.

Chapter 2: This chapter approaches this endeavour by looking at the AI domain from three different perspectives. First, we examine the key initiatives that drive companies to use AI: Insights, User Experience and Process Automation. Second, we analyse the classification of services proposed by the public cloud vendors and examine the position of the AI services among them. Finally, the 2019 start-ups that use the publicly available services are presented and the relation between the unicorns and their preferred technologies is highlighted.

Part two:

Chapter 3: This chapter focuses on the AI functionalities, grouped around the three key initiatives. The services offered by the main players in the cloud market are referred to by their commercial names and compared at a high level.

Conclusion: The paper ends with a set of general recommendations and an Annex which presents a summary of the different concepts offered as a service.

3. AI functionalities offered by the main cloud providers

The cloud providers offer similar tools for the most important AI functionalities, as presented in detail in Table 4. Because competition is intense, all of them have adopted a red ocean strategy, which involves a continuous value-cost trade-off and a choice between differentiation and low cost.

In this chapter, we present a comparison between these tools together with the differentiation factors of each vendor.



| | Amazon Web Services | Google Cloud | Microsoft Azure |
| --- | --- | --- | --- |
| Computer Vision | | | |
| | Amazon Rekognition, Amazon Textract | Cloud Vision, AutoML Vision | Azure Cognitive Services - Computer Vision |
| | Amazon Rekognition Video | Cloud Video Intelligence, AutoML Video | Azure Cognitive Services - Video Indexer |
| Natural Language Processing | | | |
| | Amazon Transcribe | Cloud Speech-to-Text | Speech Services, Azure Bot Service |
| | Amazon Comprehend | Cloud Natural Language, AutoML Natural Language, Document Understanding AI | Speaker Recognition |
| | Amazon Translate | Cloud Translation | Microsoft Bot Framework, Azure Bot Service |
| Conversational Interface | Amazon Lex | Dialogflow Enterprise Edition | |
| Machine Learning and Deep Learning | | | |
| Fully Managed ML | Amazon SageMaker | Cloud ML Engine | Azure ML Studio |
| Auto-generated Models | | Cloud AutoML (beta) | Azure ML Service |

Table 4. AI Tools offered by the main cloud providers.

Computer Vision

The Computer Vision related functionalities enable computers to gain, ideally, high-level understanding from images and video sequences in a similar way to humans. Computer vision algorithms are designed to automatically identify objects, people, text, scenes, and human activities. Moravec's paradox states that low-level perception skills, such as recognising or following an object in an image, are much more difficult to emulate by computers than high-level, abstract reasoning, such as chess playing. This is especially true in the computer vision domain because of the unpredictable environmental conditions, object topology or shapes, occlusions etc.

Traditionally, computer vision algorithms tackled these problems using camera geometry and predefined geometrical properties of objects together with a limited set of image primitives. Pixel-wise probabilistic classifiers (such as naive Bayes), logistic regression or the usual distance functions were used for object recognition, with error rates that hovered around 26%.

However, in 2012, convolutional neural networks (CNNs) brought a breakthrough in computer vision, leading to a dramatic increase in the accuracy of object recognition methods. Even though CNNs have been around since the 1980s, they became practical only once machines were powerful enough and GPUs started being used at large scale for training and inference.
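The core operation of a CNN layer is a small filter slid across the image. The following is a minimal sketch of that operation in plain Python, assuming a greyscale image stored as a list of rows and a hand-picked 3x3 kernel. In a real CNN the kernel weights are learned during training, and the names `convolve2d`, `EDGE_KERNEL` and `IMAGE` are illustrative only.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation CNN layers call convolution."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            # Weighted sum of the image patch under the kernel.
            for i in range(kernel_h):
                for j in range(kernel_w):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Hand-picked kernel that responds to dark-to-bright vertical edges.
EDGE_KERNEL = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Made-up 4x5 toy image: dark on the left (0), bright on the right (9).
IMAGE = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

feature_map = convolve2d(IMAGE, EDGE_KERNEL)
print(feature_map)  # [[27, 27, 0], [27, 27, 0]]
```

The response is high where the kernel straddles the edge and zero over the uniform bright region, which is the kind of local feature a trained network stacks into higher-level detectors.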

Nowadays, cloud providers offer both the tools and the necessary hardware to run such computationally intensive algorithms:

Amazon Rekognition can recognise and extract objects, recognise activities and faces, perform highly accurate emotion analysis, capture the paths of people in a scene and even detect inappropriate content. Text can also be extracted from images and documents using the Amazon Textract service, launched in May 2019, which goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.
Amazon Rekognition Video can analyse live streams to detect and understand activities and emotions, and to recognise and track people, objects, celebrities, and inappropriate content. It uses Amazon Kinesis Video Streams to receive and process a video stream.

Google Cloud Vision offers powerful pre-trained machine learning models but also enables the user to automate the training of custom machine learning models (AutoML Vision) both in the cloud and on edge devices. Moreover, Google has a dedicated team of people for high-quality annotation of images, videos, and text.
Cloud Video Intelligence offers pre-trained (API) and trainable machine learning models (AutoML Video) that can be used for recognising objects, places and actions in video streams in near real-time.

Microsoft’s Azure Vision Services provide developers with the tools needed for image classification and tagging of thousands of objects within an image, for face recognition and for OCR, both in the cloud and on edge devices, backed by advanced computer vision algorithms.
Video Indexer is a cloud application built on Azure Media Analytics, Azure Search and Cognitive Services (such as the Face API, Microsoft Translator, the Computer Vision API, and Custom Speech Service). It extracts insights from videos through face detection, celebrity identification, OCR, content moderation, label identification, scene segmentation, black frame detection, keyframe extraction, rolling credits detection etc.
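To make the services described above more concrete, here is a hedged sketch of how image labelling with Amazon Rekognition is typically consumed from Python. It assumes the `detect_labels` operation of the boto3 Rekognition client. The helper name `top_labels` is hypothetical, and `FakeRekognition` is an offline stand-in with made-up labels so that the sketch runs without AWS credentials.

```python
def top_labels(rekognition_client, image_bytes, min_confidence=80.0, max_labels=5):
    """Return (label, confidence) pairs for an image, highest confidence first."""
    response = rekognition_client.detect_labels(
        Image={"Bytes": image_bytes},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    labels = [(item["Name"], item["Confidence"]) for item in response["Labels"]]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


class FakeRekognition:
    """Offline stand-in; the labels below are invented for illustration."""

    def detect_labels(self, Image, MaxLabels, MinConfidence):
        canned = [
            {"Name": "Dog", "Confidence": 97.1},
            {"Name": "Grass", "Confidence": 88.4},
            {"Name": "Ball", "Confidence": 61.0},
        ]
        kept = [label for label in canned if label["Confidence"] >= MinConfidence]
        return {"Labels": kept[:MaxLabels]}


labels = top_labels(FakeRekognition(), b"not-a-real-image")
print(labels)  # [('Dog', 97.1), ('Grass', 88.4)]
```

With a real account, the same function would be called with `boto3.client("rekognition")` and the bytes of an actual image file. Passing the client in as a parameter keeps the provider-specific part in one place.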

Since OCR belongs to the Computer Vision domain, a recent study [Harding, 2019] is relevant here. It compares the accuracy and price of the services on three image samples: a handwritten letter, webpage text, and text written on a whiteboard. The author claims that, in this experiment, Microsoft’s solution performed much better than the other solutions and was also cheaper.

Table 5. Accuracy and price comparison between three of the most important cloud providers. Source [Harding, 2019].
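How the accuracy in such a comparison is scored is not stated here. One common choice, shown below as a sketch and not as the method used in [Harding, 2019], is character-level accuracy derived from the edit distance between the ground-truth text and the OCR output. The function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """1 minus the character error rate, clamped at zero."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    error_rate = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - error_rate)


# One wrong character out of eleven.
print(round(character_accuracy("hello world", "hallo world"), 3))  # 0.909
```

Running the same reference text through each provider's OCR output gives comparable scores, which can then be set against the price per request.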

Natural Language Processing

The Natural Language Processing related functionalities include automatic speech recognition, which developers use to add speech-to-text capability to their applications. The first really successful implementation of end-to-end speech recognition was released in 2006 by the research group led by Jürgen Schmidhuber. The group used a recurrent neural network called long short-term memory (LSTM) together with a connectionist temporal classification (CTC) training algorithm [Graves et. al, 2006]. The same LSTM approach was used by Google for speech recognition on its smartphones in 2015 and later in the Google Translate service. LSTM was also used by Apple for Siri and the QuickType function, by Amazon in Alexa and by Facebook for automatic translations. Currently, developers can use speech-related functionalities from all the main cloud providers:

Amazon Transcribe offers the ability to expand and customise the speech recognition vocabulary, to recognise multiple speakers, to transcribe audio to text in real time and to transcribe speakers recorded on separate channels.
Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in unstructured text data. For the medical field, Amazon developed Comprehend Medical, which extracts information such as medical conditions, medications, dosages, strengths, and frequencies from unstructured text coming from a variety of sources. Amazon Comprehend Medical also identifies the relationships among the extracted medication, test, treatment and procedure information for easier analysis.
Amazon Translate is a tool for language translation automation that uses deep learning models to deliver more accurate and more natural-sounding translation than traditional statistical and rule-based translation algorithms. Amazon Translate allows you to localise content, such as websites and applications, for international users, and to translate large volumes of text efficiently.
Amazon Lex is a real-time service for building conversational interfaces into any application using voice and text. It is natively integrated with Amazon Connect and provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU) to recognise the intent of the text. Amazon Lex offers developers the same deep learning technologies as Amazon Alexa for building sophisticated chatbots.

Google Cloud Speech-to-Text enables developers to convert audio to text, to recognise 120 languages, to issue voice command-and-control, to transcribe audio from call centres among other functionalities.
Cloud Natural Language can analyse the structure and meaning of the text and extract information about people, places, and events, and better understand social media sentiment and customer conversations. Natural Language enables you to analyse text and also integrate it with your document storage on Google Cloud Storage.
AutoML Natural Language enables developers to create custom ML models for the classification of English language content in a set of predefined categories.
Document Understanding AI uses ML to analyse documents by classifying them, extracting the essential data and enriching this information. This tool can speed up the digitisation of companies by classifying, interpreting and structuring electronic or scanned documents.
Cloud Translation is a development tool that can be used to dynamically translate between languages using either AutoML Translation to train custom models or Translation API’s pre-trained neural machine translation.
Dialogflow Enterprise Edition is an end-to-end development suite that can be integrated in websites, mobile applications, popular messaging platforms and IoT devices for seamless conversational interfaces. Speech-to-text is supported for 120-plus languages, while text-to-speech is supported for 20-plus languages. Dialogflow originated as API.AI, which Google acquired in 2016, and has been trained mainly using data collected from YouTube captions.

Microsoft’s Azure Speech Services offer SDKs and APIs to perform speech-to-text, text-to-speech, and speech translation.
Speaker Recognition APIs provide advanced algorithms for speaker verification and speaker identification.

Speaker Verification APIs can verify the identity of a user from their voice, much as a fingerprint would.
Speaker Identification APIs can determine which person is speaking in an audio file, among a set of previously registered speakers.

Microsoft Bot Services provides an orchestration platform for virtual agents adapted to particular use cases. A bot represents the foundation of any conversational platform and Microsoft’s framework provides the foundational pieces that can be further extended using Cognitive Services that enable the bot to see, hear and understand in a more human-like manner.
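Returning to the connectionist temporal classification (CTC) approach cited at the start of this section: a CTC-trained network emits one score vector per audio frame, including a special blank label, and the text is recovered by collapsing that frame sequence. The sketch below shows only the simplest (greedy) decoding step, not the training algorithm. The blank symbol, the alphabet and the per-frame scores are made up for illustration.

```python
from itertools import groupby

BLANK = "_"  # arbitrary placeholder for the CTC blank label


def ctc_greedy_decode(frame_scores, alphabet):
    """Pick the best label per frame, merge repeated labels, then drop blanks."""
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    merged = [label for label, _ in groupby(best_path)]
    return "".join(label for label in merged if label != BLANK)


ALPHABET = [BLANK, "c", "a", "t"]

# Invented scores over ALPHABET for seven consecutive audio frames.
FRAMES = [
    [0.10, 0.80, 0.05, 0.05],  # c
    [0.10, 0.70, 0.10, 0.10],  # c again, merged with the previous frame
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # a
    [0.20, 0.10, 0.60, 0.10],  # a again
    [0.10, 0.10, 0.10, 0.70],  # t
    [0.80, 0.10, 0.05, 0.05],  # blank
]

decoded = ctc_greedy_decode(FRAMES, ALPHABET)
print(decoded)  # cat
```

The blank label is what lets the scheme represent genuine double letters: the frame path `a _ a` decodes to `aa`, whereas `a a` decodes to a single `a`.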

Machine Learning and Deep Learning

The Machine Learning and Deep Learning services provide software engineers and data scientists with the ability to build, train, and deploy machine learning models. The machine learning workflow includes operations related to data preparation, the choice of the appropriate algorithm, the choice of the platform for training the model, and the platform for the inference. Then, the developer has to scale and manage the cloud/edge/hybrid production environment, to ensure the correct operation and deliver the expected results.
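The workflow stages listed above can be sketched end to end in a few lines of plain Python. This is a toy illustration under stated assumptions, not any provider's API: the data is made up, the algorithm chosen is a nearest-centroid classifier, and the function names are hypothetical. A managed service runs the same stages on hosted infrastructure.

```python
import random


def prepare(rows, test_fraction=0.25, seed=0):
    """Data preparation: shuffle reproducibly and split into train and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train(rows):
    """Training: a nearest-centroid model, one mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Inference: return the class whose centroid is closest to the input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def accuracy(model, rows):
    """Evaluation: fraction of held-out rows classified correctly."""
    return sum(predict(model, f) == y for f, y in rows) / len(rows)


# Made-up, well-separated toy data as (features, label) pairs.
DATA = [([x, x + 1.0], "low") for x in range(0, 8)] + \
       [([x, x + 1.0], "high") for x in range(100, 108)]

train_rows, test_rows = prepare(DATA)
model = train(train_rows)
score = accuracy(model, test_rows)
print(score)  # 1.0
```

Keeping preparation, training, inference and evaluation as separate functions mirrors how the managed platforms split the workflow, so each stage can later be moved to a different environment (cloud, edge or hybrid) independently.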

All three cloud providers considered in this article offer end-to-end solutions for building custom machine learning solutions:

Amazon SageMaker is a fully managed service that covers the entire ML workflow, including a Mechanical Turk approach for difficult data labelling. It provides a visual interface and is complemented by tools such as SageMaker Neo and SageMaker Ground Truth.
Google Cloud ML Engine is a managed service that enables developers and data scientists to build ML models and bring them into production. All functionalities can benefit from ML libraries such as TensorFlow, PyTorch and scikit-learn, and can run on top of hardware accelerators, such as Google’s TPUs (Tensor Processing Units) or classic GPUs. Moreover, Kubeflow, with its high-level Python SDK, enables ML architects to build ML pipelines by orchestrating the jobs and the models and visualising the results.
Microsoft’s Azure Machine Learning Studio gives access to a visual workspace that can be used to easily build, test, and iterate on a predictive analysis model, with no programming required. Other cognitive services, for vision, speech or language processing, are available and pre-trained models can be accessed through AutoML.


Conclusions and recommendations

This paper presents an overview of the market expansion of the main cloud providers and highlights the current trends, with a focus on AI services. The commercial cloud solutions available can be classified according to several criteria: the business focus areas (Insights, User Experience and Process Automation), the technologies driving AI (Computer Vision, Natural Language Processing and Machine Learning and Deep Learning) and the start-up landscape in 2019. Our conclusion is that, currently, the main focus of young companies is on AI solutions to improve user experience, followed by process automation services and solutions for fraud detection.

The commercially available AI functionalities offer similar performance, with a few weak spots for each provider. The cloud market is therefore not a zero-sum game, and a hybrid approach is recommended in order to take advantage of the best-performing capabilities of each commercial solution.
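One way to picture such a hybrid approach is a thin routing layer that sends each task to whichever provider was chosen for it. The sketch below is purely illustrative. `HybridRouter`, the provider names and the adapter functions are hypothetical; in practice each adapter would wrap one vendor's SDK call.

```python
class HybridRouter:
    """Send each AI task to whichever provider adapter was registered for it."""

    def __init__(self):
        self._routes = {}

    def register(self, task, provider_name, handler):
        self._routes[task] = (provider_name, handler)

    def run(self, task, payload):
        if task not in self._routes:
            raise KeyError(f"no provider registered for task {task!r}")
        provider_name, handler = self._routes[task]
        return {"task": task, "provider": provider_name, "result": handler(payload)}


# Hypothetical adapters standing in for real vendor SDK calls.
def provider_a_ocr(payload):
    return f"text extracted from {len(payload)} bytes"


def provider_b_translate(payload):
    return f"translation of {payload!r}"


router = HybridRouter()
router.register("ocr", "provider-a", provider_a_ocr)
router.register("translate", "provider-b", provider_b_translate)

ocr_result = router.run("ocr", b"\x00\x01\x02")
print(ocr_result["provider"], "->", ocr_result["result"])
```

Because the rest of the application only talks to the router, swapping the provider behind a task after a new benchmark becomes a one-line change in the registration.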

Performing predictive analysis in a real-time scenario is not a simple task. Besides the technical and administrative aspects that need to be considered, it requires a strong commitment from the main stakeholders. It is crucial to educate and engage the senior management leaders in discussions about strategic priorities in relation to AI automation, building digital twins, predictive analysis, human activity augmentation, etc.

The development team has to find ways to gather data from any relevant physical device within the customer environment. The development process must begin with an exploratory phase that brings to light the principal components of the data involved. Then, the most suitable machine learning model and provider(s) can be determined. It is advisable to develop a Proof of Concept (PoC) first in order to limit start-up costs and to progress in a lean way by checking the results periodically.


[Graves et. al, 2006] Graves, Alex; Fernández, Santiago; Gomez, Faustino; Schmidhuber, Jürgen (2006). "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks". In Proceedings of the International Conference on Machine Learning, ICML 2006: 369–376.

[Harding, 2019] Bill Harding, 2019 Examples to Compare OCR Services: Amazon Textract/Rekognition vs Google Vision vs Microsoft Cognitive Services

[Barot, 2019] Soyeb Barot, Solution Comparison for Cloud-Based AI Services, ID: G00377714, Gartner, 23 May 2019

Radu Orghidan

VP Cognitive Computing

Radu is passionate about understanding the inner mechanisms of innovation and using them to solve business challenges through cloud and on-premises cognitive computing systems. He is currently focused on machine learning and generative AI to create systems that enhance users’ ability to understand and interact with the physical and digital reality. In Endava, Radu is also looking at strategic approaches to align novel technical tools with business goals. In his free time, Radu is a keen motorcycle rider and loves spending time with his family.

