In September 2022, we were invited to host a discussion about “The Future of Art & AI in Game Production” at the XDS conference in Vancouver. XDS stands for External Development Summit and has been hosted in Vancouver since its inception ten years ago, when nothing like it existed for the games industry. Its main purpose is to connect large game studios and publishers with external service providers for engineering, art, QA, UX and localization. Watch this video looking back at the past 10 years of XDS to learn more about the event.
With this article, we would like to give those who could not attend the discussion the opportunity to still benefit. The participants agreed to have their names and the outcome publicized.
Together with leading industry experts, we set out to explore how the current stunning and rapid progress in AI-assisted art might change how games are made in the future. I led the conversation for 90 minutes and was joined for a very engaging discussion by:
- Arno Schmitz, Lead Character Artist at Guerrilla
- Chris MacDonald, Art Director of External Development at Electronic Arts (EA)
- Declan Paul, COO at Airship Images Limited
- John Mayhew, Director of Development, External Art at BioWare
- Radu Orghidan, Vice President Cognitive Computing at Endava
Photo credit: Chris Wren / XDS
We started by each sharing our experience with AI-assisted art, and it quickly became apparent that everyone – either personally or with their team – had experimented with Midjourney, Stable Diffusion or similar text-to-2D-art generators. The common view was that these are tools for inspiration or even for creating mood boards. There was an expectation that this technology would evolve extremely fast and also reach other aspects of the art pipeline. Interestingly, Declan mentioned approaches to create a whole website, game screenshots and UI rather than ‘just’ concept art with those tools.
At Endava, we inspired a few people to create a #midjourneypoem by feeding the AI only a poem or lullaby rather than elaborate prompts. Chris touched on other examples, like how AI has already disrupted stock trading and the latest powerful AI additions to Photoshop. We agreed that AI-generated assets are already being used to illustrate articles and as a basis for creating video games.
Radu highlighted the differences between Narrow AI and General AI and the importance of context. Narrow AI in particular, with its small, specific tasks and pattern-finding, is making rapid progress. He specifically pointed out that art and 3D generation are among the most promising areas for research and future development in the field. Earlier in the year, Radu had already presented some findings of our research on how to create 3D assets for games with the help of neural networks.
After these initial thoughts, I had the group vote on a few claims by show of hands.
It seems that those statements weren’t very controversial in our group, but they sparked further discussion. I also introduced the idea that towards the end of the session, every participant should write down a similar claim and we would all vote on those again.
From there, we headed right into the nature of how we work as artists and whether AI will change what creativity – or our human role in it – is. AI art has already reached the point where it can surprise us. In a lot of its output, we can recognize the style of famous artists or even how the AI-generated result was probably influenced by certain pictures or artifacts of known IP. But is this so different from how human artists work, basing their output on experience and the influence of existing art? AI can certainly produce happy accidents, but can it be controlled and iterated into a final result? As an artist, after all, you want and need control over the output; otherwise, this technology will remain stuck in the ideation phase.
We discussed how difficult it is to reproduce the same asset or character given the wide range of data used for training the text-to-image AI models. Everyone agreed on the need to build narrower models, fine-tuned to a given style. One application could be to transfer realistic assets into a stylized version matching your game’s art direction, trained on the work of your studio’s artists. The ability to create personalized models, focused on a specific look and feel, is the next step in the evolution of art-generating AI. There are already prompt-recovery solutions, such as Lexica.art, that can surface the prompt used to generate an AI image.
Let’s take a brief technical excursion into the magic behind AI art with our VP Cognitive Computing Radu:
Image-generation models combine breakthroughs from research on zero-data learning, natural language understanding and multimodal learning.
The zero-shot learning approach refers to the ability of a model to generalize its training space to unseen object categories. These models combine the observed and non-observed categories through auxiliary attributes which encode distinguishing properties of the objects. The encoding of these properties is a compressed representation of the data that defines a new dimension, known as the latent space. The latent space, also known as the embedding space, is useful for clustering similar representations or for interpolating between data points to generate new samples. The idea of combining word and image embeddings with the attention mechanism of transformer models – which weights the significance of different parts of the input data – led to the creation of CLIP (Contrastive Language-Image Pre-training).
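To make the idea of interpolating in a latent space more concrete, here is a minimal, illustrative sketch. The encoder that would produce real latent codes is out of scope; the two vectors below are random stand-ins, and spherical interpolation (slerp) is one common choice for high-dimensional embeddings:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical interpolation between two latent vectors.

    Often preferred over plain linear interpolation for
    high-dimensional embeddings, which tend to lie near a hypersphere.
    """
    a_n = a / np.linalg.norm(a)
    b_n = b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        # Vectors are (nearly) parallel; fall back to linear blend
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# Two hypothetical latent codes, e.g. one for "knight", one for "robot"
z_a = np.random.default_rng(0).normal(size=512)
z_b = np.random.default_rng(1).normal(size=512)

# Walking t from 0 to 1 yields a smooth blend between the two concepts;
# decoding each step would produce the in-between images.
midpoint = slerp(z_a, z_b, 0.5)
```

At t=0 and t=1 the interpolation returns the original vectors, which is what makes the path between them a controllable blend.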
CLIP resolves the correspondence between text and images in both directions, and its architecture is a fundamental part of image-generation models. However, given the subjectivity of human language, this approach can lead to an abundance of false positives. Thus, using CLIP for stylization requires additional regularization, which can be obtained through tools such as generative adversarial networks (GANs).
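As an illustration of the contrastive idea behind CLIP – not the real model, just toy vectors standing in for the outputs of its image and text encoders – matching images to captions reduces to a cosine-similarity lookup in the shared embedding space:

```python
import numpy as np

def cosine_sim(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two sets of embeddings."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return x @ y.T

# Toy stand-ins for encoder outputs: in the real model, a trained image
# encoder and text encoder project into the same shared space.
rng = np.random.default_rng(42)
image_embeddings = rng.normal(size=(2, 8))  # 2 "images"
text_embeddings = np.vstack([
    image_embeddings[0] + 0.1 * rng.normal(size=8),  # caption near image 0
    image_embeddings[1] + 0.1 * rng.normal(size=8),  # caption near image 1
])

sim = cosine_sim(image_embeddings, text_embeddings)
best_caption = sim.argmax(axis=1)  # each image picks its closest caption
```

Because the training objective pulls matching text-image pairs together and pushes mismatched pairs apart, the same argmax works in both directions: images retrieve captions and captions retrieve images.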
An important question arising out of this is how it can play into a product or IP. Can artists opt out of being included in the training? Does a human need to do further editing in order to secure rights as part of a product, or do we need to create custom, trained versions that rely only on your own data pool? Do we need a persistent metadata trail of how specific pieces of art were created? If you asked the AI to create a 3D car model that looks like a Ferrari, would you be able to legally protect that piece of art, or would you have to restrict yourself to asking the AI for a sports car with certain features? Of course, this whole discussion relates to how we currently regulate these questions and where we draw boundaries. What is inspiration and what is piracy? Surely, we won’t have global regulations for this before AI truly establishes itself as a common tool.
Rather than diving into the various phases of an art production pipeline, our discussion focused on these foundational questions. The notion of how to train your AI and what data to use also directly leads into the topic of diversification, especially considering six middle-aged men sitting around the table. On the one hand, an AI should not be biased and be based on a well-balanced dataset; on the other hand, AI needs to also take the context of the user into account to offer its best results. Think about it like entering a search query into Google – often, it wouldn’t be as helpful if it didn’t also consider your current location and language.
So, what does it mean if I only ask the AI for beautiful architecture without any additional specifications? Should it take into account what I prompted earlier and what game I am making and that I might be a European located in Amsterdam? Or should it rather help me as a person to overcome the limitations of my own experiences and surroundings? While this becomes much more of an issue the more general the AI is, it should always be considered.
Even the best discussion has to come to an end eventually, so we entered the final phase. Trying to be a bit more concrete again, we each wrote down a claim on the table in front of us in order to vote on it. During the vote, we still had some debate and contextualization going on. Thus, similar to the prompts of AI art tools, we quickly realized that with such a complicated topic, statements need quite a bit of extra elaboration to be judged. You can clearly see this if you compare the last two statements. Nevertheless, the following image shows that we had very clear votes across the board.
Only time will tell how accurate these predictions are, but if the expected time frame of 3 to 5 years for a massive impact on the games production pipeline turns out to be true, it shows how pressing it is to invest in this technology and approach. In the final minutes of our discussion, we addressed the elephant in the room: where will this investment come from? Will it be big publishers creating proprietary tools across their studios? You can see efforts like EA’s SEED division already pushing the envelope.
Will it be service providers aiming to get a competitive advantage or start-ups coming out with amazing services? While big software companies, like Unity or Adobe, also heavily invest in this field, they aren’t usually focused on very isolated issues or a specific production pipeline. So, there is a great opportunity for smaller game studios to embrace AI as an integral part of their games, like AI-generated biomes and environments or individually crafted items.
Undoubtedly, we are looking at a huge impact and a massive potential to change how games of the future will be made!