AI Painting technology was not born in 2022, but 2022 is the “year of the AI Painting explosion”, the year it went from a niche pursuit to mass popularity.
The formal name of AI Painting is Text-to-Image, a form of cross-modal generation: converting one modality (text, image, speech) into another while keeping the semantics consistent between the two.
Why are people so surprised by the progress of AI Painting? From the beginning of this year until now, AI Painting technology has seen continuous breakthroughs.
First the CLIP model was trained on a massive number of Internet images without manual annotation; then the open-sourcing of CLIP triggered a boom in building AI Painting models; then the diffusion model was found to be a better image generation algorithm; and finally Stable Diffusion solved the problem of huge time and memory consumption. Together these events show that, this year, change in AI Painting has been measured in days.
The explosion of AI Painting is driven not only by technical innovation, but also by the commercial market and the trend of gradually opening up to the public. (Besides selling the images themselves, the keywords used to describe and generate a work can also be bought and sold on platforms such as PromptBase.) Since the beginning of this year, AI Painting platforms have been appearing one after another.
- DALL-E 2
Both DALL-E and the upgraded DALL-E 2 come from OpenAI, a top international AI research organization co-founded by Elon Musk. In June 2020 OpenAI announced the Image GPT model, which brought the Transformer, a breakthrough in natural language processing, to image inpainting and generation tasks.
In January 2021 they not only open-sourced their new deep learning model CLIP (Contrastive Language-Image Pre-Training), one of the most advanced image classification AIs available today, but also introduced a new AI model, DALL-E. Given nothing more than a text description, DALL-E can paint a series of alternative images that match the requirements. It is the first platform to enable “Text2Image”.
DALL-E 2 is its upgraded version. It has a realistic style, is simple to operate, produces highly finished results, and is fast enough to serve as a search engine: 10 images (1024 × 1024) are generated in 60 seconds, with endlessly extendable variations and even partial erasure and regeneration.
New users can generate 200 images for free in the first month and 60 free images per month after that; beyond the free quota, $15 buys 460 images.
In terms of copyright, OpenAI, the organization behind DALL-E 2, lists several strict restrictions: the copyright of generated images ultimately belongs to OpenAI; the images are for personal learning and exploration only, not for commercial use, and cannot be used to produce NFTs; and generated faces that are too realistic cannot be published on social media, because of the risk of portrait infringement.
- Midjourney
Midjourney is the platform whose generated painting “Space Opera” won a competition against human painters.
It is characterized by a simple page and a wide range of options. Midjourney is built on the chat software Discord. After typing “/imagine” in the dialog box, you enter the description in English and hit enter. The process is like chatting with an AI. 60 seconds later, you receive 4 rendered images in the dialog box. If you are not satisfied with the first image, you can click the “U1” button to upscale it with more detail, or the “V1” button to generate variations of it, until you are satisfied.
Midjourney has a creative community, an almost zero barrier to entry, and very good output. Its output is clearly optimized for portraits and has a distinct stylistic tendency.
Each new user can generate 25 images for free. To generate more, you need to pay $10/month for the basic membership, which allows 200 images, or $30/month for the standard membership, which allows unlimited images.
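To put the paid tiers quoted so far on a common footing, the price per image can be worked out with simple division. The sketch below is illustrative only: the function name `cost_per_image` and the `plans` dictionary are made up for this example, and the figures are just the ones quoted in this article, which may no longer match the platforms’ current pricing.

```python
def cost_per_image(price_usd: float, images: int) -> float:
    """Return the average price of one image in US dollars."""
    if images <= 0:
        raise ValueError("images must be positive")
    return price_usd / images


# Figures as quoted in this article (assumed 2022 pricing, may have changed).
plans = {
    "DALL-E 2 ($15 for 460 images)": (15.0, 460),
    "Midjourney basic ($10/month for 200 images)": (10.0, 200),
}

# Compute the per-image price for every plan listed above.
per_image = {
    name: cost_per_image(price, images)
    for name, (price, images) in plans.items()
}

for name, cost in per_image.items():
    print(f"{name}: about ${cost:.3f} per image")
```

By these figures, DALL-E 2 works out to roughly 3 cents per image and Midjourney’s basic plan to 5 cents. The $30 standard plan has no fixed per-image price because it is unlimited.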
In terms of copyright, if you are a free user, the copyright of the images belongs to the AI; after paying $30 per month, you can use the images commercially. However, if you make a profit of $20,000 or more, you have to give Midjourney a 20% share of the profits. Because it is a paid business, Midjourney iterates on its product very quickly.
- Cutout.pro
Cutout.pro is an image processing platform that has been in development for many years. The platform’s AI Painting function uses the latest Stable Diffusion algorithm. The function page is easy to operate and offers many auxiliary tools: you can upload an initial image to modify, an automatically generated example on the left guides you in entering a prompt, and 16 templates with distinctive styles are available on the right.
The platform is a web-based application, so users do not need to install software on their own computers; everything runs on the cloud servers provided by the website. Generation is also very fast: a picture takes only a few seconds, and if you are not satisfied you can keep clicking “Generate Again” for a new one.
In terms of results, the website generates very good pictures, and the pictures become more detailed as more keywords are added. Characters and backgrounds adapt well to different styles, from 17th-century Dutch painters to contemporary cyberpunk and Japanese anime.
In terms of cost, users can generate unlimited free images with a watermark, while HD images consume points. Copyright follows the widely used CC0 1.0 policy.
- Disco Diffusion
If DALL-E 2 is good at realism, Disco Diffusion is better at portraying atmospheres, landscapes and various kinds of concept art.
Disco Diffusion can handle the most complex keyword descriptions and supports many user-adjustable parameters, making it a huge treasure trove to explore. However, it takes a long time to produce a picture, and the interface is relatively complicated. You don’t need to download any software: it runs directly in your browser and is currently free. But the whole web page is full of code and is cumbersome to operate, and you generally need to wait about half an hour per image. If you watch the screen, you will see the image start out as pure noise, gradually become clear, and fill in with detail. During use, Disco Diffusion may prompt you to free up memory on your computer, but because it runs on computing resources such as GPUs that Google provides for free, it demands little of the user’s own hardware: just open your browser and run it. Besides typing text and letting the AI play freely, you can also supply an initial image in advance to constrain the AI’s creation.
Disco Diffusion can in theory be used commercially: its program is released under the MIT open-source license, which allows all Internet users to use, copy, modify, and even sell the generated images for free. However, there is a risk that the wording of your prompt will lead to controversy about plagiarism.
- Stable Diffusion
Stable Diffusion is considered the most powerful AI drawing tool available. It is fully open source, with many “variants” on the market, such as Waifu Diffusion, which is dedicated to generating anime portraits. It needs only a consumer-grade 8GB 2060 graphics card to achieve DALL-E 2-level image generation, and generation efficiency can be increased by a factor of 30. The style is noticeably more artistic, and manual operation is not difficult.
Fee: there is a free quota of 200 images, after which you need to pay for points (the more complex the generation and the larger the image, the more points are consumed).
Copyright requirements: you can use the images you create commercially. However, images generated through DreamStudio are automatically licensed under CC0 1.0, so the service provider Stability.ai can also use your images without paying you or even asking for your consent, and the images become public-domain, royalty-free resources. If you deploy the open-source Stable Diffusion yourself and use your own GPU resources, you own the copyright.
- Imagen and Parti
Google was one of the first companies to work on AI Painting: they launched Deep Dream back in 2015, and recently they released two models at once: Imagen and Parti.
Imagen generates images with a diffusion model similar to that of DALL-E 2, but its input is processed by a large AI language model, whose stronger language understanding allows better image generation from text descriptions.
The new AI model Parti (Pathways Autoregressive Text-to-Image) tries an alternative, autoregressive architecture. This more closely resembles the way large language models work, predicting appropriate new words from the context of the preceding words, sentences or paragraphs. Parti can accurately translate long and complex text inputs into images, which suggests that it better understands the relationship between language and subject matter.
The release of Parti was accompanied by a blog post describing the process of creating images using Google’s text-to-image model, which can be accessed here: https://blog.google/technology/research/how-ai-creates-photorealistic-images-from-text/
However, Imagen and Parti have not yet been released, even in beta, so judging their practical usability will have to wait a while.
- NovelAI and others
If you want to try the anime style, you should try NovelAI. Its AI Painting feature costs $10 per month for 1000 tokens or $25 per month for 10,000 tokens, and each painting consumes tokens. NovelAI images are CC0, i.e. in the public domain.
In addition to these applications, more models and commercial applications are also emerging: NUWA-Infinity by Microsoft, Make-A-Scene by Meta, and other platforms such as NightCafe Creator and WOMBO Dream.
- Indicator Comparison
|DALL-E 2||Midjourney||Cutout.pro||Disco Diffusion||Stable Diffusion||NovelAI|
|Output speed||Fast||Fast||Fast||Slow||Very fast||Fast||Fast|
|Style||Realistic||Stylized||Diverse||Atmospheric / Landscape||Diverse||Conceptualized||Anime|
While AI Painting is gaining a lot of attention and popularity, it also faces problems such as copyright, privacy and plagiarism. But it is undeniable that a great era of AI Painting has arrived.
Where does the road of artificial intelligence ultimately lead? We may borrow the slogan on the home page of Stability.AI to answer: “AI by the people, for the people”.