Learning to paint hyper-realistic paintings is something that takes years, if not decades, of hard work. Learning how to formulate your queries so the algorithm outputs what you want takes days at worst.
If you want something unique and abstract, you're going to need to go through a lot of trial and error to get what you want. That's still a lot easier than teaching yourself how to create such art.
"Graphical starting states" in this instance aren't that hard: a very rough MS Paint picture of the general shapes you want things to appear in is enough. Alternatively you can grab rough cutouts from stock art, position them right, and the algorithm should figure out how to turn it into a single, flowing picture.
Take a look at these examples (https://huggingface.co/spaces/huggingface/diffuse-the-rest/d...), they're far from perfect but the autocompletion is done quite well. It should be noted that the demo application doesn't expose a lot of the flexibility the underlying model provides (like blend strength and such) but I don't know a free online alternative that does.
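For anyone curious what that "blend strength" knob actually does: in image-to-image diffusion, the starting image is partially replaced with noise before the model denoises it, and the strength setting controls how much. Here's a toy numpy sketch of just that core idea, not the real pipeline; actual models like Stable Diffusion use a timestep-dependent noise schedule, and the function name here is illustrative:

```python
import numpy as np

def noise_init_image(init_image: np.ndarray, strength: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Blend an init image with Gaussian noise.

    strength=0.0 leaves the init image untouched (output mirrors input);
    strength=1.0 replaces it entirely with noise, which is effectively
    plain text-to-image generation. Real pipelines noise the image to an
    intermediate timestep instead of linearly blending, but the intuition
    is the same: higher strength = more freedom to deviate from the sketch.
    """
    noise = rng.standard_normal(init_image.shape)
    return (1.0 - strength) * init_image + strength * noise

# A flat gray 8x8 "MS Paint" starting state, values in [0, 1].
init = np.full((8, 8), 0.5)
rng = np.random.default_rng(0)

untouched = noise_init_image(init, 0.0, rng)  # keeps the sketch intact
all_noise = noise_init_image(init, 1.0, rng)  # discards the sketch
```

Low strength keeps your rough composition and only restyles it; high strength treats the sketch as little more than a color hint, which is why the demo's fixed setting feels limiting.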
Trollface:
Disambiguation of lines in the source image is poorly executed. The model appears confused as to whether those lines are indicative of depth, or lighting artifacts. The shape and perspective are poorly chosen, and in all the resulting images the lighting arrangement is quite inconsistent.
The ears are completely unspecified, so too the nose. This is somewhat of a deliberate omission in a trollface, and adding them in without careful thought as to how it changes the piece is... Well, not the best move. The eyes are terribly arranged in all submissions.
The plate of meat, fries, and beans:
You can barely see the beans in the first sample; they're hidden underneath the fries, enough that an inattentive eye may miss them entirely. No specification was given as to the state of the meat, or its kind, so I suppose its being cut is a nice bonus. Interesting in a sense, since one may get the impression the model confused the grammatical deep structure such that "fries and beans" was taken as a compound predicate.
The second with the meat surrounded by the beans is an interesting contrast, but without more samples, I have questions about why all the curated samples include rare beef instead of say, sausage.
The Colosseum:
I too could use Photoshop and select a particular palette. The more interesting aspect here seems to be the color palette processing, and I'll admit that I wasn't able to find source works by the artist being imitated to compare against. Still looking for those.
The Unicorn/Butterfly:
These still disturb me in the sense that once again, we're replacing actual artistic technique with the ability to tweak prompts or assemble graphical starting state/prompt combos. Is it making some hellish form of combined natural language/graphical programming pipeline? Yes.
However, none of this would have any value without being trained on works done by previous artists who likely were not asked whether or not they wanted their works included in the dataset.
As the guy who blew up a Philosophy of Art class by positing that a well-executed forgery was as much a work of Art as the imitated piece, I still see here more problems than solutions. Yes, a new art form may have emerged. However, with it comes serious questions around data curation practices. As for the efficacy of the model/runtime characteristics/how this bodes for the environment... I'm increasingly concerned the more I apply my "what if everyone started doing this?" supposition.
In short, I see a hell of a lot of hype, but precious little coming to terms with what will ultimately be the hard questions.