Artificial Intelligence (AI) is rapidly penetrating every pore of our lives. Industry 4.0, robotisation and digitalisation, in all of which AI plays a leading role, have transformed many jobs and made some disappear altogether. Until now, jobs that require a high degree of creativity and specialisation seemed untouchable. But is that really the case? The latest findings suggest that AI will bring a degree of automation even to the most demanding professions, allowing professionals to focus on what they do best, creativity and problem-solving, instead of performing repetitive tasks. Not even developers, writers or artists will be exempt.
LARGE LANGUAGE MODELS FOR TEXT GENERATION
Generative models that create content are able to produce written work, such as essays, news articles, poems and prose. They can also generate realistic visual content based on natural language instructions. In the near future, we can expect these models to also create video and audio material. The most recognisable pioneer in this field is certainly the company OpenAI. In 2018, it presented the first language model of its GPT series, the Generative Pre-trained Transformer, designed to continue a given text in the most meaningful way. For example, given the prompt “Once upon a time, a dog named Piki met a fox with a broken paw,” the model can generate a fairy tale of any length. In addition, these models can translate and summarise texts, answer questions, and provide logical explanations.
These models are trained in a self-supervised way, meaning they do not need a manually labelled training set. Instead, they learn by trying to predict the next word at each position in a huge corpus of freely available text. The result of this type of training is their ability to complete a given prompt with its most natural continuation.
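To make the next-word objective concrete, here is a deliberately tiny sketch: it merely counts which word follows which in raw, unlabelled text and then greedily continues a prompt. The corpus and the `complete` helper are invented for illustration; real language models learn the same objective with neural networks over vastly larger corpora.

```python
from collections import Counter, defaultdict

# Unlabelled "training data": the text itself is the only supervision.
corpus = (
    "the dog chased the fox . the fox ran into the forest . "
    "the dog barked at the forest ."
).split()

# Count, for every word, which words follow it and how often.
successors = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    successors[current_word][next_word] += 1

def complete(prompt_word: str, length: int = 4) -> list:
    """Greedily continue a one-word prompt with the most frequent successor."""
    words = [prompt_word]
    for _ in range(length):
        if words[-1] not in successors:
            break
        words.append(successors[words[-1]].most_common(1)[0][0])
    return words

print(complete("the"))
```

A neural language model replaces the raw counts with a learned probability distribution over the whole vocabulary, conditioned on all preceding words rather than just the last one.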
A breakthrough in the development of generative models occurred five years ago in the field of natural language processing with the invention of so-called attention mechanisms. These enable language models to attend to earlier parts of the text much more effectively and thereby capture a wider context, allowing them to generate longer, more convincing and coherent content. A neural network architecture built around attention mechanisms is called a transformer network.
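The core of the mechanism can be sketched in a few lines of NumPy: each token compares its query against every key, turns the similarities into weights with a softmax, and mixes the corresponding values. This is a minimal single-head sketch, not a full transformer; the causal mask is what lets a language model condition only on earlier text.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=True):
    """Each position blends the values of positions it attends to,
    weighted by query-key similarity (scaled dot-product attention)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise similarities
    if causal:
        # mask out future positions so a token only "sees" earlier text
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    # row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))   # 5 tokens, 8-dim embeddings; self-attention uses Q = K = V = x
out, w = scaled_dot_product_attention(x, x, x)
print(out.shape)
```

In a real transformer this operation runs with many heads in parallel, on learned projections of the input, and is stacked in dozens of layers.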
After the original GPT model was published, it was quickly followed by others. GPT-2 and GPT-3 came with even more parameters and a larger training set. Soon, models like Google’s T5 and PaLM, DeepMind’s Chinchilla, and Meta’s NLLB-200, which can translate between as many as 200 languages, were developed. The texts generated by the latest models are incredibly convincing and often difficult to distinguish from text written by people. Most recently, OpenAI introduced ChatGPT, their most knowledgeable and human-like chatbot created so far, which impressed not only the AI world but quickly became popular among the general public as well.
Having been trained on huge text corpora, these models hold a significant amount of information. For example, GPT-3’s training data drew on roughly 45 TB of text, including the whole of English Wikipedia, a large corpus of books and a vast crawl of web pages. But even though such models ‘know’ a lot, we must be careful with the information they serve us, because the model may just as well make something up. It is also worth noting that training such models can take weeks and an enormous amount of computing resources, so their development is increasingly limited to big players like OpenAI, Google, Meta, Amazon or DeepMind.
How these models will help creative professions in the future was well demonstrated by Boris Cergol, our Adriatic Region Head of Data, at this year’s Slovenian Advertising Festival. Using the GPT-3 model, he created an idea for an innovative imaginary product, its target customer segment, product name and slogan, corporate values of the manufacturer, a sales pitch for potential investors and content for a TV ad in less than ten minutes. He has also given an insightful talk on generative AI that you can read about and watch here.
CAN LANGUAGE MODELS CODE?
Language models can generate not only text but also program code. To some extent, general language models such as GPT-3 are already capable of this, but their developers have started training specialised models adapted to code generation. Such neural networks keep the same architecture, but their training set contains large quantities of code in different programming languages in addition to natural-language corpora.
For example, the training set for the Codex model, also developed by OpenAI last year, includes public code from across the GitHub platform. Codex and other models that followed, such as Facebook’s InCoder or Salesforce’s CodeGen, are capable of not only continuing program code from a given prompt but also of generating code from natural language comments, generating functions or classes from so-called docstrings, generating unit tests from given code and/or specifications, restructuring and interpreting code, and finding vulnerabilities in software.
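Generating a function from a docstring looks roughly like this. The example is hypothetical: the signature and docstring stand in for the developer’s prompt, and the body shows the kind of completion such a model typically produces; the function name and behaviour are invented for illustration, not taken from any model’s actual output.

```python
# The developer writes only the signature and docstring (the prompt)...
def is_palindrome(text: str) -> bool:
    """Return True if `text` reads the same forwards and backwards,
    ignoring case and spaces."""
    # ...and a code model fills in a plausible body such as:
    cleaned = text.lower().replace(" ", "")
    return cleaned == cleaned[::-1]

print(is_palindrome("Never odd or even"))
```

The same prompt-completion pattern extends to the other tasks listed above: given a function body as the prompt, the model can continue with unit tests or an explanatory comment instead.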
Although such models are still in their infancy, there are indications that they will help developers perform routine and repetitive tasks in the not-so-distant future. The Codex model is already available as the engine behind GitHub Copilot, an extension for code editors such as Microsoft’s Visual Studio Code, which allows developers to intelligently complete dozens of lines of code directly in the editor. With this kind of automation of repetitive routine tasks, developers can save time that would otherwise be spent searching for solutions in documentation or on platforms like Stack Overflow.
VISUAL ASSISTANTS CAN DRAW, PAINT AND EVEN GENERATE PHOTOS
Recently, the public focus has extended from text generation to visual content. The most visible breakthrough was once again achieved by OpenAI with the DALL-E model, which can generate convincing visual material from instructions given in natural language. It produces a wide variety of content, from real to imaginary situations in practically any style, such as a photorealistic still life on the moon or a teddy bear riding across a rainbow in the company of a unicorn in the style of Van Gogh.
The DALL-E model was quickly followed by DALL-E 2 and Google’s Imagen, but the real revolution was recently caused by the company Stability AI. In contrast to the prevailing practice, they released Stable Diffusion, their model for generating image material from text, to the public under a permissive licence.
It will be very intriguing to see how the models develop further and what we will be using them for. As a final test: can you guess whether the present text was written by a human or AI?