Buckle up, ladies and gentlemen: we have a new AI image generator in town, and it’s surprisingly good.
It’s surprising because it comes from Google and because it’s not the basic, somewhat ugly, lazy generator you’re used to seeing in Bard. It’s also hidden from the general public, but that doesn’t mean you can’t use it.
Its name is ImageFX and it’s Google’s latest venture into the realm of AI image generation. It’s available via Google’s AI Test Kitchen, an experimental platform that allows users to interact with Google’s projects while they’re still in development.
Despite being in an early beta phase, ImageFX delivers impressive accuracy and photorealism. Its availability, however, is confined to the U.S., Kenya, New Zealand, and Australia, and it accepts prompts only in English. That reflects Google’s cautious approach and its desire for a controlled environment for user feedback and system refinements.
Those living outside the allowed regions could bypass geographical restrictions with methods like VPNs or proxies—at their own risk.
Powering ImageFX is Imagen 2, a sophisticated AI model developed by Google’s renowned AI lab, DeepMind. Imagen 2 is designed to interpret and visualize textual prompts, boasting capabilities to produce diverse images and styles. Google asserts that Imagen 2 sets a new standard for image quality among its generation of AI models.
The introduction of ImageFX is part of Google’s broader strategy to explore various facets of generative artificial intelligence. It joins a suite of specialized tools, including MusicFX for music creation and TextFX for stylized text generation.
Google vs. Dall-E 3 vs. MidJourney
Google’s ImageFX marks a notable entry into the realm of AI-driven image generators, directly competing with established players like Dall-E 3 and MidJourney. A distinct edge for ImageFX in its early beta phase is its cost-free access, diverging from Dall-E’s integration with ChatGPT at a monthly rate of $20, and MidJourney’s annual subscription nearing $100.
While cost-effectiveness is a big factor, it’s the comparative features and output quality that set these tools apart. ImageFX excels in producing hyperrealistic images, surpassing Dall-E 3’s somewhat cartoonish renditions and MidJourney’s focus on aesthetically appealing visuals.
But just because ImageFX is free doesn’t mean it’s bad. ImageFX offers unique features like seed control, allowing users to finely tune the creative process by adjusting the initial noise configuration. This level of control is unmatched by Dall-E 3 or MidJourney, allowing users to make subtle adjustments while maintaining the core elements of the image.
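To make the idea of seed control concrete, here is a minimal sketch in plain Python. It is not the ImageFX or Imagen 2 implementation; the function name `initial_noise` and the four-value noise vector are hypothetical stand-ins. It only illustrates the general principle that a fixed seed reproduces the same starting noise, while a different seed produces a different starting point.

```python
import random


def initial_noise(seed: int, size: int = 4) -> list[float]:
    """Return Gaussian noise values determined entirely by the seed.

    Hypothetical helper for illustration only; real image generators
    use far larger noise tensors, but the principle is the same.
    """
    rng = random.Random(seed)  # dedicated generator, global state untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(seed=42)
same_b = initial_noise(seed=42)
other = initial_noise(seed=43)

print(same_a == same_b)  # same seed, identical starting noise
print(same_a == other)   # different seed, different starting noise
```

Reusing a seed while changing the prompt slightly is what lets a user tweak details without starting from an entirely different image.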
Additionally, ImageFX can highlight key prompt words and suggest creative alternatives—a feature not available from its competitors.
ImageFX does have its limitations, however. The tool exclusively generates square images, whereas Dall-E 3 and MidJourney provide flexibility in aspect ratios. Moreover, unlike MidJourney, ImageFX does not support image editing features like inpaint and outpaint, limiting its versatility. Finally, the conversational feature of Dall-E 3—which allows beginners to instruct the model in natural language—contrasts with the keyword-based prompting required by ImageFX and MidJourney.
The approach to prompting differs significantly among these models, too. ImageFX does not support negative prompting, which lets users specify what to exclude from the image. MidJourney offers this functionality, adding a layer of precision to the creative process. Dall-E 3 also lacks direct negative prompting, but its conversational interface allows users to guide the model indirectly, offering a different approach to refining image outputs.
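As a rough illustration of negative prompting, MidJourney accepts a `--no` parameter followed by the items to exclude. The small Python helper below, whose name `with_exclusions` is hypothetical, simply assembles such a prompt string; it does not call any MidJourney API and is only a sketch of how the text of the prompt is structured.

```python
def with_exclusions(prompt: str, exclusions: list[str]) -> str:
    """Append a MidJourney-style `--no` clause listing items to exclude.

    Hypothetical helper for illustration; it only builds the prompt text.
    """
    if not exclusions:
        return prompt  # nothing to exclude, leave the prompt unchanged
    return f"{prompt} --no {', '.join(exclusions)}"


print(with_exclusions("a futuristic city at night", ["cars", "people"]))
```

With ImageFX, which has no equivalent, the only option is to reword the prompt itself so that the unwanted elements are less likely to appear.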
An image is worth a thousand words
Decrypt got access to ImageFX and was able to compare its generations against MidJourney and Dall-E 3. We used the same prompt for all models and the results below are always presented in the same order from left to right: First is ImageFX, second is MidJourney, and third is Dall-E 3.
Photorealism:
Prompt: Photo of a cryptocurrency trader with worried expression
Both ImageFX and MidJourney generated fairly realistic results. In terms of style, however, ImageFX looks photorealistic whereas MidJourney looks a bit more hyperrealistic: the first is truer to life, while the second is more artistic, with saturated colors, exaggerated bokeh, and so on.
Dall-E 3 failed to generate a photo. Instead, it created a 3D render that focused more on the content. It’s easier to tell the subject is a crypto trader because of the charts in the background, but the result is definitely not a photo.
Illustrations:
Prompt: Illustration of a mysterious bear surfing a cybernetic wave
This prompt was a bit more abstract, to test how the models interpret non-standard ideas. ImageFX and MidJourney generated the most aesthetically pleasing images, but MidJourney’s looks more like a render than an illustration. ImageFX tried to capture the essence of what a cybernetic wave could be, whereas MidJourney associated the term “cybernetic” with the bear instead. Dall-E 3 captured the essence more closely: its output is obviously an illustration and resembles the cybernetic aesthetic, but the bear’s morphology is wrong, and the image quality falls short of its competitors.
Long natural-language:
Prompt: Highly detailed photography scifi close up of a mysterious computer expert working on a laptop . Behind him, an FBI agent awaits to capture him wide shot photorealistic intricate
In order to conduct this comparison, the prompt for MidJourney was changed to “highly detailed photography scifi close up of a mysterious computer expert working on a laptop with an FBI agent behind him awaiting to capture him, wide shot, photorealistic, intricate.”
MidJourney refused to generate images under the first prompt.
ImageFX generated a nice, detailed photograph that respects all the details of the prompt. MidJourney didn’t generate a “mysterious” computer expert. It also stuck to its signature style, with excessive bokeh and attention-grabbing light trails or rain droplets across its generations. This was the best example of its output, as the rest seemed to depict an astronaut, a cyberpunk marine, or something similar. Dall-E 3 generated an image in which all the elements of the prompt are recognizable (the FBI logo, the mysterious computer expert, etc.), but it is not a photo, and the anatomy of the hacker is wrong, featuring the typical spaghetti fingers.
Text in Image:
Prompt: A futuristic city with a neon sign saying “EMERGE by Decrypt”
Usually, the best text generator is Dall-E 3 by far. However, in this specific case and under the conditions set by the comparison’s methodology, it didn’t write the text properly. ImageFX couldn’t generate the whole phrase; its text generation capabilities are there, but they are probably the least impressive of the bunch.
That said, Dall-E 3 and ImageFX were the best at capturing the essence of what a futuristic city is, whereas MidJourney generated an aesthetically pleasing city but not one that’s futuristic at all.
Conclusion
AI aficionados are now blessed with a cornucopia of AI models that serve many needs. With most offered for free, there’s no need to pick winners—each has a specific use case that makes it stand out.
ImageFX is the best of the three if you don’t want to spend money. It is also the best in terms of photorealism.
MidJourney is not good at respecting the prompts but is perfect for those looking for aesthetically pleasing images.
Dall-E 3 is the best for beginners who want to generate renders without thinking about prompt engineering, keywords, and parameters, and who would rather just talk to the AI as if it were another friend.
But yeah, if you want a conclusion, we liked ImageFX—a lot.
Edited by Ryan Ozawa.