
Google Whisk: Is It The Coolest AI Image Generation Tool in Town?

ClickInsights

We are over two months late with this post, and there’s a good chance that many of you already know about this AI tool, which Google launched in the US in December last year. But since the tool was made available in more than 100 countries last week, we thought that now would be a good time to introduce it to those of you who have still not heard of it.



So, What is Whisk?

Whisk is a Google AI tool that takes image generation beyond the text prompt. It is built on Google’s Imagen 3 image generation model, and instead of relying solely on text, it lets you use other images as the prompt. All you need to do is drag and drop images to start generating new ones.


However, there are a few nuances to how it does this, so let’s take a closer look.


How Does Whisk Work?

Google Whisk isn’t a brand-new AI model. It combines two existing deep-learning models, Google Gemini and Google Imagen 3, and lets anyone customize how the image will come out. It accepts separate image inputs for the main subject, the scene, and the preferred art style.



In the background, the Gemini model automatically writes a detailed caption for each of the chosen images. It then feeds those descriptions into Imagen 3. Because only the descriptions are passed along, this process captures the subject’s essence without creating an exact replica. Users can then remix their subjects, scenes, and styles to create something that is uniquely theirs.
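For the curious, here is a rough sketch of that caption-then-generate flow in Python. Nothing in it is Google’s actual code or API: `caption_image`, `build_prompt`, `WhiskInputs`, and the file names are all made up for illustration, and the captions are canned strings standing in for what a vision model like Gemini would write. The point is simply that the image generator only ever sees text, which is why the result keeps the gist of your picture rather than copying it.

```python
from dataclasses import dataclass


@dataclass
class WhiskInputs:
    """Captions for the three image slots: subject, scene, and style."""
    subject: str
    scene: str
    style: str


def caption_image(image_name: str) -> str:
    """Hypothetical stand-in for the captioning step.

    A real system would send the image to a vision model. Here we look up
    a canned caption so the sketch runs offline.
    """
    canned = {
        "creatures.png": "three small mythical creatures standing together",
        "park.png": "a sunny city park with a fountain",
        "plushie.png": "a soft stitched dinosaur plushie",
    }
    return canned.get(image_name, "an unidentified image")


def build_prompt(inputs: WhiskInputs, extra: str = "") -> str:
    """Merge the three captions, plus an optional text tweak, into one prompt."""
    parts = [
        f"Subject: {inputs.subject}.",
        f"Scene: {inputs.scene}.",
        f"Style: {inputs.style}.",
    ]
    if extra:
        parts.append(f"Also: {extra}.")
    return " ".join(parts)


# Caption each uploaded image, then combine the captions into the text
# that would be handed to the image generation model.
inputs = WhiskInputs(
    subject=caption_image("creatures.png"),
    scene=caption_image("park.png"),
    style=caption_image("plushie.png"),
)
prompt = build_prompt(inputs, extra="the character is eating ice cream")
print(prompt)
```

Swapping just one caption (say, a different style image) and rebuilding the prompt is, in spirit, what remixing amounts to in this sketch.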


Exploring Whisk

Exploring Whisk feels like a fast-paced rollercoaster compared to text-based tools that make you spell out every little detail to craft an image.



After skimming through the Welcome page (which, let’s be honest, I only half-read), clicking past the email sign-up (no thanks), and briefly pretending to care about the privacy policy, I finally landed on Whisk’s main page. Right away, I saw a prompt with a dinosaur plushie as the image style. The other options? An enamel pin and a sticker. I went with the dinosaur—because, obviously.


Next up: uploading an image for the subject. Feeling adventurous, I uploaded a photo of my smartwatch on my wrist. Big mistake. The third option on the right just kept spinning like it was buffering into another dimension. So, I tried again with a more cartoonish image from my hard drive. This time, it worked instantly, turning into plushie figurines of three mythical creatures. Lesson learned: realism is not Whisk’s thing.



Once my image was generated, I hopped into the editing section, where there’s a text prompt area. I went with the suggested one—“the character is eating ice cream”—and boom! New images appeared with my little creatures happily holding ice cream cones. If you’re feeling more creative, you can scroll down and hit “start from scratch,” which lets you upload your own images or type in your own text. Stuck? There’s an “Inspire Me” button that auto-generates images so you don’t have to think too hard.



Whisk also has a My Library section, where all your creations live. You can toggle the library on or off (in case you don’t want to keep evidence of your AI art experiments), download or delete images, or even copy prompts to use elsewhere. Fun fact: even after I thought my smartwatch experiment was a fail, Whisk actually did generate an image blending the plushie and the smartwatch—it just quietly saved it in My Library. Moral of the story? If things go sideways, check your library before assuming all hope is lost.


Wrapping Up

Using Whisk reminded me a lot of the Microsoft Designer prompt that lets you create Funko Pop!-style figures. Microsoft Designer can churn out both whimsical and realistic images, but it relies only on text prompts and runs on DALL-E 3. So, naturally, I had to test it out. I fed my plushie smartwatch prompt into Microsoft Designer, and let’s just say... the results were nightmare fuel.


Instead of a cute plushie smartwatch, I got watches with disturbingly human faces. Not exactly what I had in mind. In this test at least, Whisk’s image-based approach, with Gemini describing the uploaded images for Imagen 3, did a better job of capturing context than Designer’s text-only route through DALL-E 3.


That said, Whisk isn’t perfect—it can still “miss the mark,” as Google politely puts it. But that’s why the text prompt feature exists: so you can step in and steer the AI back on track when needed.
