NVIDIA’s research into artificial intelligence has produced a number of impressive AI-powered tools, one of the most recent being StyleGAN2, which could well revolutionize image generation as we know it.
NVIDIA presented StyleGAN2 at this year’s (virtual) Conference on Computer Vision and Pattern Recognition, which took place in mid-June. They aired a video showcasing StyleGAN2 in action - viewable here - and dang, is it cool. But first, let’s backtrack a bit for those who are unsure exactly what they’re looking at (as I was initially)…
What is a GAN?
A GAN - or Generative Adversarial Network - is a type of machine learning model first introduced by Ian Goodfellow and colleagues only six years ago, in 2014. This form of AI essentially works by pitting two neural networks against one another in a kind of game (a neural network being an algorithm that learns to recognize patterns within a set of data, which it tends to do with a significant level of accuracy after a given training period).
An example of this would be a neural network learning to recognize a human face once it has “studied” a large set of photos of actual human faces.
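To make the “pattern recognition after training” idea concrete, here is a toy sketch of my own (not NVIDIA’s code, and far simpler than a real face recognizer): a single artificial “neuron” that learns, by gradient descent, to separate two clusters of 2-D points. The data, learning rate, and iteration count are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two labelled clusters: class 0 centred at (-2, -2), class 1 at (+2, +2).
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = np.zeros(2), 0.0            # learnable weights and bias

def predict(X):
    # Sigmoid activation: output is the probability of class 1.
    return 1 / (1 + np.exp(-(X @ w + b)))

# Simple gradient-descent training loop: nudge w and b to reduce error.
for _ in range(200):
    p = predict(X)
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

accuracy = np.mean((predict(X) > 0.5) == y)
```

After training on its “study” data, the neuron classifies the two clusters almost perfectly - the same learn-from-examples principle, scaled up enormously, underlies networks that recognize faces.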
How Does A GAN Work?
The two neural networks “playing” against each other in a GAN are referred to as the Generator and the Discriminator. The Generator produces fake data points (such as images), which are then mixed with real examples from the training data and passed to the Discriminator, whose job is to discriminate between them and identify which data points are real and which are fake.
The Generator’s goal is to try to fool the Discriminator with these data points, and over time the “game” becomes more and more challenging; the fake data points become harder to correctly identify. Each network’s improvement forces the other to sharpen up in turn, so the generated images grow increasingly convincing until a final product that looks remarkably authentic is produced.
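The “game” described above can be written down as two opposing loss functions - this is a sketch of the standard GAN objective from the original 2014 paper, not StyleGAN2’s exact training code. The Discriminator outputs a probability that a sample is real; it is rewarded for scoring real samples high and fakes low, while the Generator is rewarded when the Discriminator scores its fakes high.

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator's loss: it wants d_real -> 1 and d_fake -> 0."""
    return -(np.log(d_real) + np.log(1 - d_fake))

def g_loss(d_fake):
    """Generator's loss: it wants the Discriminator fooled, d_fake -> 1."""
    return -np.log(d_fake)

# Early in training the Discriminator easily spots fakes (low d_fake),
# so the Generator's loss is large...
early = g_loss(0.05)

# ...but as the Generator improves and fools the Discriminator half the
# time, its loss shrinks - that pressure is what drives the arms race.
later = g_loss(0.5)
```

A confident, accurate Discriminator (say, scoring reals at 0.9 and fakes at 0.1) has a much lower loss than one reduced to guessing at 0.5 - which is exactly the state a fully trained Generator tries to force it into.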
NVIDIA’s StyleGAN2
StyleGAN2 is NVIDIA’s most recent GAN development, and as you’ll see from the video, using so-called transfer learning it has managed to generate a seemingly infinite number of portraits depicting different human faces in a huge variety of painting styles. The video shows an immensely satisfying demonstration of how StyleGAN2 can shift seamlessly between these procedurally generated portraits, each one of them truly beautiful.
It is explained in the video that the StyleGAN2 neural networks succeed in “separating elements of style from aspects of content.” It was apparently trained to do so using a dataset in which every image was unique in both style and subject.
The subjects in these images appear to retain the same pose, eye-gaze direction, lighting, and facial expression, whilst everything else - from hair color and style to face shape, age, and even gender - is smoothly transitioned in real time as the mouse cursor moves across a “palette” of faces - a “style map.”
The main point here is that all of the complexity behind this GAN has been condensed into a tool that anyone can use for fluid image generation. On that note, the particularly great thing is that all the relevant code behind StyleGAN2 has been made openly available on GitHub, so anyone can make use of it.
The Impact Of StyleGAN2
So what could this mean for the way we generate images in the future? While the video showcases StyleGAN2 working its magic on portraits of human faces, there’s really no limit to what kind of image could be played around with here. As mentioned in the video, this tool could open up opportunities across a whole range of industries in which visual concept creation plays a central role.
One of the most prominent benefits would be the sheer efficiency it makes possible. Companies that need to produce a large repository of product images for a website or catalog in a short period of time could easily do so. An entire cast of anime characters - each totally unique - could be conjured up within minutes. The process at the foundation of visual concept creation in video game design could be completely revolutionized.
The Consequences Of StyleGAN2
An obvious concern that may spring to mind at this point, however, is the potential negative impact this kind of AI-driven image generation could have on jobs at an individual level. It’s tempting to get mesmerized by the prospect of an astounding drop in time, effort, and money spent - especially for larger companies. But the gravitation towards such tools and away from employing artists, photographers, models, and so on could leave some content creators worried about their roles in their respective industries.
On the other hand, perhaps this same tool could also prove immensely helpful for smaller-scale or freelance creators who might benefit from the boost in ideation and instant visualization that would save them additional time and resources, too.
Despite their infancy as an AI development, GANs such as this one already boast some impressive achievements, which - as always - might best be approached with a level of cautious optimism for now.
Source: NVIDIA