This is legitimately game-changing for a feature in my SaaS where customers can generate event flyers. Up until now I had Nano Banana generate just a decorative border and had the actual text rendered via Pillow, controlled by an LLM. The result worked, but didn’t look good.
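For context, the Pillow half was roughly this kind of thing (filenames, font, and the layout structure here are illustrative, not my actual code; the LLM's job was to decide the text, positions, and sizes):

    from PIL import Image, ImageDraw, ImageFont

    # Load the decorative border Nano Banana generated
    flyer = Image.open("border.png").convert("RGBA")
    draw = ImageDraw.Draw(flyer)

    # The LLM returns layout decisions as structured data: text, position, size
    layout = [
        {"text": "Summer Jazz Night", "xy": (120, 80), "size": 64},
        {"text": "Sat June 14, 7pm - Riverside Park", "xy": (120, 180), "size": 28},
    ]

    for item in layout:
        font = ImageFont.truetype("DejaVuSans-Bold.ttf", item["size"])
        draw.text(item["xy"], item["text"], font=font, fill="black")

    flyer.save("flyer.png")

It works, but the text always looks pasted on rather than designed into the flyer, which is why native text rendering matters so much here.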
That said, I wonder if text is only good in small chunks (less than a sentence) or if it can properly render full sentences.
Even generating a standard piano with 7 full octaves that are consistent is pretty hard. If you ask it to invert the colors of the naturals and sharps/flats you'll completely break them.
It even worked really well at creating an infographic for one of my quirkier projects which doesn't have that much information online (other than its repo).
Then the question becomes: can it incorporate targeted feedback, or is it a one-shot-or-bust affair?
My experience is that ChatGPT is very good at iterating on text (prose, code) but fairly bad at iterating on images. It struggles to integrate small changes, choosing instead to start over from scratch, with wildly different results. Thinking especially here of architectural stuff, where it does a great job laying out furniture in a room, but when I ask it to keep everything the same but change the colour of one piece, it goes completely off the rails.
I would assume it depends on how it generates the images.
I've used Claude to generate fairly simple icons and launch images for an iOS game and I make sure to have it start with SVG files since those can be defined as code first. This way it's easier to iterate on specific elements of the image (certain shapes need to be moved to a different position, color needs to be changed, text needs an update, etc.).
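The payoff is that a follow-up change becomes an ordinary code edit instead of a regeneration. A minimal sketch of what that looks like, assuming the SVG has element ids (filenames and ids here are made up):

    import xml.etree.ElementTree as ET

    # SVG lives in a namespace; register it so the output isn't prefixed with ns0:
    SVG_NS = "http://www.w3.org/2000/svg"
    ET.register_namespace("", SVG_NS)

    tree = ET.parse("app_icon.svg")
    root = tree.getroot()

    # Find the one shape to change (id assigned when Claude wrote the file)
    for elem in root.iter(f"{{{SVG_NS}}}circle"):
        if elem.get("id") == "badge-background":
            elem.set("fill", "#ff6b35")  # only the colour changes, nothing else moves

    tree.write("app_icon_v2.svg")

Because the edit touches exactly one attribute, everything else in the icon is guaranteed to stay put, which is the opposite of what you get when re-prompting a raster model.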
Claude approaches image generation in surprising ways - we did a small evaluation [1] of different frontier models for image generation and understanding, and Claude produced by far the most surprising results.
You can use targeted feedback - but it's on the user to verify that the edits were actually localized. In my experience NB mostly makes relatively surgical edits, but if you're not careful it'll introduce other minute changes.
At that point you can either start over or just feather/mask with the original in any Photoshop-type application.
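You don't even need Photoshop for that step; a minimal Pillow sketch, assuming the original and the edited image are the same size and you have a rough mask of the edited region (filenames hypothetical):

    from PIL import Image, ImageFilter

    original = Image.open("original.png").convert("RGBA")
    edited = Image.open("nano_banana_edit.png").convert("RGBA")

    # White where you want to keep the edit, black where you want the original
    mask = Image.open("edit_region_mask.png").convert("L")
    # Feather the edge so the seam doesn't show
    mask = mask.filter(ImageFilter.GaussianBlur(radius=12))

    # Composite: take `edited` where the mask is white, `original` elsewhere
    result = Image.composite(edited, original, mask)
    result.save("final.png")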
It would be great if Google could make SynthID openly available so OpenAI etc. could also implement it. Then websites like Facebook, or even local browsers, could show an "AI warning".
I’ve been really excited for your infographic generation. Previous models from Google and OpenAI had very low detail/resolution for these things.
I’ve found in general that the first generation may not be accurate but a few rolls of the dice and you should have enough to pick a style and format that works, which you can iterate on.
I'm finding it bad at instruction following for architectural specs (physical, not software): you tell it what goes where, and it ignores you and does some average-ish thing it's seen before. It looks visually appealing, though.
I tried this prompt:
Here's the result: https://simonwillison.net/2025/Nov/20/nano-banana-pro/#creat...