
This thing's ability to produce entire infographics from a short prompt is really impressive, especially since it can run extra Google searches first.

I tried this prompt:

  Infographic explaining how the Datasette open source project works
Here's the result: https://simonwillison.net/2025/Nov/20/nano-banana-pro/#creat...
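
If you want to script that, the same prompt works through the Gemini API. A minimal sketch with the google-genai Python SDK (the model id is my assumption for what Nano Banana Pro is called there):

  from google import genai
  from google.genai import types

  client = genai.Client()  # reads GEMINI_API_KEY from the environment
  response = client.models.generate_content(
      model="gemini-3-pro-image-preview",  # assumed id for Nano Banana Pro
      contents="Infographic explaining how the Datasette open source project works",
      config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
  )
  for part in response.candidates[0].content.parts:
      if part.inline_data:  # image bytes come back as inline data
          with open("infographic.png", "wb") as f:
              f.write(part.inline_data.data)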


This is legitimately game changing for a feature in my SaaS where customers can generate event flyers. Up until now I had Nano Banana generate just a decorative border and had the actual text rendered via Pillow, controlled by an LLM. The result worked, but didn't look good.
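
Concretely, that pipeline was roughly this (a simplified sketch: the fonts, coordinates, and function name are illustrative, and in practice the LLM picked the layout values):

  from PIL import Image, ImageDraw, ImageFont

  def render_flyer(border_path, title, body):
      img = Image.open(border_path).convert("RGB")  # Nano Banana's decorative border
      draw = ImageDraw.Draw(img)
      title_font = ImageFont.truetype("DejaVuSans-Bold.ttf", 64)
      body_font = ImageFont.truetype("DejaVuSans.ttf", 28)
      # Center the title; the LLM chose positions and sizes from the event details
      width = draw.textlength(title, font=title_font)
      draw.text(((img.width - width) / 2, 60), title, font=title_font, fill="black")
      draw.multiline_text((80, 180), body, font=body_font, fill="black", spacing=8)
      return img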

That said, I wonder whether the text is only good in small chunks (less than a sentence) or whether it can properly render full sentences.


It can render full sentences.


It didn’t do so well at finding middle C on a piano keyboard:

https://gemini.google.com/share/c9af8de05628

I did manage to get one image of a piano keyboard where the black keys were correct, but not consistently.


I've tried similar stuff, such as: "Show a piano with an outstretched hand playing an Emaj triad on the E, G#, and B keys".

https://imgur.com/ogPnHcO

Even generating a standard piano with seven consistent full octaves is pretty hard. If you ask it to invert the colors of the naturals and sharps/flats, you'll completely break them.


The reflection seems slightly wrong as well.


Fooled me because it was locally correct!


It even worked really well at creating an infographic for one of my quirkier projects which doesn't have that much information online (other than its repo).

"An infographic explaining how player.html works (from the player.html project on Github). https://github.com/pseudosavant/player.html"

And then it made one formatted for social: "Change it to be an infographic formatted to fit on Instagram as a 1:1 square image."


Is the infographic accurate in terms of the way Datasette works?


Almost entirely. I called out the one discrepancy in my post:

> “Data Ingestion (Read-Only)” is a bit off.


It’s subtly incorrect. Read/write permissions, for example, are described incorrectly on some nodes.


Then the question becomes: can it incorporate targeted feedback, or is it a one-shot-or-bust affair?

My experience is that ChatGPT is very good at iterating on text (prose, code) but fairly bad at iterating on images. It struggles to integrate small changes, choosing instead to start over from scratch with wildly different results. I'm thinking especially of architectural stuff here: it does a great job laying out furniture in a room, but when I ask it to keep everything the same and just change the colour of one piece, it goes completely off the rails.


Nano Banana is really good at iterating on images, as shown by the pancake skull example I borrowed from Max Woolf: https://simonwillison.net/2025/Nov/20/nano-banana-pro/#tryin...

I've tried iterating on slides with text on them a bit and it seems to be competent at that too.
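
In API terms that kind of iteration is just a multi-turn chat where each turn returns a new image. A sketch with the google-genai SDK (the model id is my assumption, and the prompts are made up):

  from google import genai
  from google.genai import types

  client = genai.Client()
  chat = client.chats.create(
      model="gemini-3-pro-image-preview",  # assumed id for Nano Banana Pro
      config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
  )
  slide = chat.send_message("A title slide that says 'Datasette in Production'")
  # The follow-up edit references the image already in the conversation history
  fixed = chat.send_message("Keep everything the same, but make the title dark blue")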


I would assume it depends on how it generates the images.

I've used Claude to generate fairly simple icons and launch images for an iOS game and I make sure to have it start with SVG files since those can be defined as code first. This way it's easier to iterate on specific elements of the image (certain shapes need to be moved to a different position, color needs to be changed, text needs an update, etc.).
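
That's the advantage of SVG-first: an "edit" is a targeted change to one element instead of a regeneration. A toy example (the icon, ids, and colors are made up):

  import xml.etree.ElementTree as ET

  ICON = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64">
    <circle id="bg" cx="32" cy="32" r="30" fill="#4A90D9"/>
    <polygon id="play" points="24,18 48,32 24,46" fill="#FFFFFF"/>
  </svg>"""

  ET.register_namespace("", "http://www.w3.org/2000/svg")
  root = ET.fromstring(ICON)
  ns = {"s": "http://www.w3.org/2000/svg"}
  # Recolor only the background circle; the play triangle is untouched
  root.find(".//s:circle[@id='bg']", ns).set("fill", "#E74C3C")
  print(ET.tostring(root, encoding="unicode"))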

FWIW not sure how Nano Banana Pro works though.


Claude does image generation in surprising ways - we did a small evaluation [1] of different frontier models for image generation and understanding, and Claude's results were by far the most surprising.

[1] https://chat.vlm.run/showdown

[2] https://news.ycombinator.com/item?id=45996392


You can use targeted feedback - but it's on the user to verify whether the edits were completely localized. In my experience NB mostly tends to make relatively surgical edits but if you're not careful it'll introduce other minute changes.

At that point you can either start over or just feather/mask with the original in any Photoshop-type application.
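
The feathering step is easy to script too. A minimal Pillow sketch, assuming the edit and the original are the same size (the filenames and region box are made up):

  from PIL import Image, ImageFilter

  original = Image.open("original.png").convert("RGB")
  edited = Image.open("nano_banana_edit.png").convert("RGB")

  # White where the edit should win, black where the original should;
  # the blur feathers the seam so the splice isn't visible.
  mask = Image.new("L", original.size, 0)
  mask.paste(255, (300, 200, 700, 500))  # the region the model was asked to change
  mask = mask.filter(ImageFilter.GaussianBlur(12))

  Image.composite(edited, original, mask).save("merged.png")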


None of it was accurate.

But boy was it beautiful.


Funny thing to say considering the author of Datasette himself says it's accurate.


It would be great if Google could make SynthID openly available so OpenAI etc. could implement it too. Then websites like Facebook, or even local browsers, could show an "AI warning".


I’ve been really excited for this infographic generation. Previous models from Google and OpenAI had very low detail/resolution for these things.

I’ve found in general that the first generation may not be accurate, but after a few rolls of the dice you should have enough to pick a style and format that works, which you can then iterate on.
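
Rerolling is easy to script against the Gemini API, too. A sketch with the google-genai SDK (the model id is my assumption for Nano Banana Pro):

  from google import genai
  from google.genai import types

  client = genai.Client()
  for i in range(4):  # a few rolls of the dice
      response = client.models.generate_content(
          model="gemini-3-pro-image-preview",  # assumed id for Nano Banana Pro
          contents="An infographic explaining how my project works",
          config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
      )
      for part in response.candidates[0].content.parts:
          if part.inline_data:
              with open(f"candidate_{i}.png", "wb") as f:
                  f.write(part.inline_data.data)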


Game changer for architecture diagrams.


I'm finding it bad at instruction following for architectural specs (physical, not software): you tell it what goes where, and it ignores you and does some average-ish thing it's seen before. It looks visually appealing, though.


Did you check whether SynthID still works when you edit the photos with filters like grayscale?
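
Building the test variants is straightforward with Pillow; checking them is the hard part, since as far as I know there's no public SynthID detector API to call. A sketch (filenames made up):

  import io
  from PIL import Image, ImageOps

  img = Image.open("generated.png")
  variants = {
      "grayscale": ImageOps.grayscale(img),
      "half_size": img.resize((img.width // 2, img.height // 2)),
  }
  # JPEG recompression is another common watermark stress test
  buf = io.BytesIO()
  img.convert("RGB").save(buf, format="JPEG", quality=60)
  variants["jpeg60"] = Image.open(buf)

  for name, variant in variants.items():
      variant.save(f"variant_{name}.png")  # then run each through a detector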



