Next time you search for an image on Google, imagine Google actually making the image for you, to your search specifications.
Text-to-image models are pieces of AI-driven software which will do exactly this for you. In the last few months the latest releases have improved vastly in quality, resolution and usability, and the resulting imagery is all over social media and beyond. This brings huge opportunities, as well as significant threats, across the creative industries.
As with every genie, you need to be careful what you wish for, or more precisely how you describe your wish. Results can be unpredictable, though this is what makes the process so exciting.
Text-to-image in action
The best way to explain text-to-image systems is to show some examples of what they can do. I’m going to focus on two areas I know, entertainment and brand licensing.
Product concepts
Character concepts
Marketing concepts
Storyboarding
Fashion
…And the plain weird
How to do it yourself
It all looks so easy! You may be keen to try it out yourself. The most-used systems at the moment are Stable Diffusion, Midjourney and DALL·E 2. I’ve been using Stable Diffusion UI up until now, as it’s free to use if you install it on your own computer, and fairly straightforward to set up.
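If you’re comfortable with a little Python, another route to running Stable Diffusion locally is Hugging Face’s diffusers library (this isn’t what I’ve been using myself, just a popular alternative). Here’s a minimal sketch, assuming you have a CUDA-capable GPU, have installed the library, and have access to the model weights; the model id, prompt and output filename are illustrative:

```python
# Minimal Stable Diffusion sketch using Hugging Face's diffusers library.
# Assumes: a CUDA GPU, `pip install diffusers transformers torch`,
# and access to the model weights on Hugging Face.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",  # example model id
    torch_dtype=torch.float16,        # half precision to reduce GPU memory use
).to("cuda")

# Fixing the random seed makes a run reproducible, handy when iterating on prompts.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    "a vintage tin robot toy, product photography, studio lighting",
    num_inference_steps=50,  # more steps is slower but often gives finer detail
    guidance_scale=7.5,      # how strictly the model follows the prompt
    generator=generator,
).images[0]

image.save("robot.png")
```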
If you don’t want to go to this trouble, there are now browser-based interfaces which let you dive straight into Stable Diffusion and start generating images. Dream Studio is a popular one.
As I found through experience, the text prompts which you use are all-important. I find The DALL·E 2 prompt book and Promptomania very useful.
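To give a flavour of what those guides teach: prompts tend to work best as a clear subject followed by style, lighting and quality modifiers. A hypothetical example, continuing the diffusers sketch above (the specific modifiers are illustrative, not a recipe):

```python
# A hypothetical prompt assembled the way the prompt guides suggest:
# subject first, then medium, lighting and quality modifiers.
prompt = ", ".join([
    "a plush toy fox wearing a space suit",    # subject
    "product concept art",                     # medium / style
    "soft studio lighting",                    # lighting
    "highly detailed",                         # quality modifier
])

# A negative prompt steers the model away from unwanted traits.
negative_prompt = "blurry, low quality, watermark, text"

image = pipe(prompt, negative_prompt=negative_prompt).images[0]
image.save("fox.png")
```

Small wording changes in a prompt can produce surprisingly different results, which is why it pays to fix the seed and vary one modifier at a time.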
How worried should creatives be?
The software is still very new. There are issues with resolution, quality and unpredictability, though all of these are improving fast.
Having been trained on images found on the web, these systems come with many racial, cultural and gender biases baked in.
Artists with distinctive visual styles are already complaining about their work being ripped off. Brands will need to keep a close eye on developments, though it’s unclear how they will be able to police this explosion of content.
The law has some catching up to do, and legal judgments differ on whether AI-generated images can be copyrighted. On top of this, each piece of AI image generation software has its own terms and conditions.
And who knows how the algorithms will be affected once they start learning from AI-generated images as well as human-generated ones?
Where’s it going next?
We know how Instagram has influenced food, fashion, and holidays. It’s very conceivable that AI-generated images will do the same for product design, architecture, fashion and more.
While it seems to be mainly creatives and tech folk using these tools at present, this will surely change very soon. Users generating their own content based on their favourite brands, with or without the brands’ blessing, is likely to take off.
Right now it seems inevitable that text-to-image models will quickly become an essential creative tool. Like other tools which lower the barrier to entry to a specialism, image generation is likely to mean reduced costs, new people entering the world of creatives, and potentially a threat to the work of some specialists. However, I believe that the democratisation of image-making, combined with the cross-pollination of ideas and techniques, makes for exciting times to come.
New uses are being tried out daily. Some obvious areas are text-to-video, which is already in development, as well as 3D graphics and models, and combining text-to-image with text-generation systems.
I hope to have some text-to-image results worth sharing soon. I’d love to see what others come up with and hear what ideas you might have: find me on Twitter or LinkedIn and let’s learn together!
By Steve McInerny, Director of Sharp Sharp