>once they figure out how to control potentially harmful generations
Is it just me, or does anyone else think that this is an impossible and futile task? I don't have a solid grasp on what kind of censorship is possible with this technology, but the goal seems to be on par with making sure nobody says anything mean online. People are extremely creative and are going to find the prompts that generate the "harmful" images.
Reminds me of a toy doll I heard about that had a speech generator you could program to say sentences, but with "harmful" words removed, keeping only wholesome ones.
I immediately came up with "Call the football team, I'm wet" and "Daddy, let's play hide the sausage" as example workarounds.
It's entirely pointless. Humans are vastly superior in their ability to subvert and corrupt. Even if you were able to catch regular "harmful" images, humans would create new categories of imagery which people would experience as "harmful", employing allusions, illusions, proxies, irony, etc. It's endless.
Furthermore, the possibility that we create an AI that can outsmart humans in terms of filtering inappropriate content is even scarier. Do you really want a world with an AI censor of superhuman intelligence gatekeeping the means of content creation?
If you squint and view the modern corporation as a proxy for "an AI censor of superhuman intelligence gatekeeping the means of content creation" - then that's been happening for a long while now.
Automatic review of content, NSFW filters, spam filters, etc. have been bog-standard since the earliest days of the internet.
I don't think anyone likes it. Some fight it and create their own spaces that allow certain types of content. Most people accept it, though, and move on with their lives.
I'm down with calling a corporation intelligent (as long as you don't call it a person). But automatic content review is regularly bypassed, they can't even keep very obvious spam off YouTube comments, such as comments copied from real users, posted with usernames like ClickMyChannelForXXX.
So if the corporation is an intelligent collective, then it's regularly outsmarted by other intelligent collectives determined to bypass it.
We can look back further at the Hays Code. That's just religion, plain and simple. The feeling of "we're sliding into a decadence which will lead to the downfall of our civilization" is a meme propagating this very sentiment. It's not as simple as just the government, but that does co-occur.
Isn’t that basically what OpenAI and Google tried to do? And it lasted all of 3 months.
The problem with tech is that once it’s known to be possible, if you choose to monetize it by making it public, as OpenAI and Google were planning to do, then it’s only a matter of time before another smart team figures out how you’re doing it.
You can do the Manhattan Project in secret, and in 500 years someone else might not realize it’s possible. But the second you test that concept, the signs that you did it are detectable everywhere, and the dots of what you did will connect in someone’s brain somewhere.
In England you discover that English actually has two different existences… the ordinary one and then the “dirty” one. Almost any word has, or can be made to have, a “harmful” meaning…
"It's entirely pointless. Humans are vastly superior in their ability to subvert and corrupt. Even if you were able to catch regular "harmful" images humans would create a new categories of imagery which people would experience as "harmful", employ allusions, illusions, proxies, irony etc. It's endless."
This is employing a fallacy: that people have infinite amounts of energy and motivation to devote to being hateful. I have been in countless online communities, in video games and elsewhere, and when the chat in them doesn't allow you to say toxic, hateful stuff... guess what, a whole lot less of that shit is said. Are there people who get around it by swapping in look-alike characters that don't trigger the censor, or by using slang, or by misspelling? Of course. But the fact is, I think if you talked to someone who runs communities like this, they would laugh in your face if you said a degree of censorship of hate speech wasn't fundamentally beneficial.
A big aspect has to do with the fact that if everybody agrees to be part of a community, part of that agreement is a social contract not to use hate speech. If someone flouts that by bypassing the filter, then in the obvious flouting of the established social contract (it is obvious they had to purposely misspell the word), these people alienate themselves, underlining the fact that 99% of the community finds their behavior pathetic and unacceptable.
I (and, I would assume, the OP) agree that saying "entirely pointless" may be a bit hyperbolic.
However, the point stands: humans will find a way to exploit and corrupt any technology. This is unquestionably true.
Bertrand Russell famously makes exactly this point as well, albeit specifically about the violent application of technology in war: until all war is illegal, every technological development will be used for war.
Your point, however, is also true: in certain spaces for certain audiences (communities), participants make it more difficult to exploit these things in ways they don't want, and easier to exploit them in ways they do.
Ergo, technology is and remains neutral (as it has no will of its own), and the people using and implementing technology are very much not neutral, imbuing the tool with the will of the user.
The real question you should be asking is: how powerful can a free tool or piece of knowledge get before people start saying that only a certain class of "clerics" can use it, or before most communities agree that NO community should have it?
Notice, on that last point, how not-hard we're trying to get rid of nuclear weapons.
I don't think swearing in a video game is comparable to art.
If I swear at a video game and it comes out as ** I might think "OK, maybe I'm being a bit of an asshole, there could be kids here, and it's a community with rules, so I'd rather not say that".
If a tool to make art doesn't let me generate a nude because some American prude decided that I shouldn't, though... my reaction is going to be to fight the restriction in whatever way I'm able.
Importantly, we're posting on a forum where this exact idea holds. HN doesn't stop all hate speech, or flaming, and what have you... but the moderation system stops enough that people generally don't bother.
It seems pretty well-agreed that the HN moderation works because of dedicated human moderators and community guidelines etc.
I think spaces that effectively moderate AI art content will be successful (or not) based on these same factors.
It won't depend on some brittle technology for predicting if something is harmful or NSFW. (Which, incidentally, people will use to optimize/find NSFW content specifically, as they already do with Stable Diffusion).
But this is a forum of interaction between people. These models can and should do things privately. It's the difference between arguing for censorship in HN or Microsoft Word.
Sure, it would be a fool's errand to filter out "harmful" speech using traditional algorithms. But neural networks and beyond seem like exactly the kind of technology that can respond to fuzzy concepts rather than just sets of words. Sure, it will be a long hunt, but if it can learn to paint and recognize a myriad of visual concepts, it ought to be able to learn what we consider to be harmful.
One of the insurmountable problems, I think, is the fact that different people (and different cultures) consider different things 'harmful', and to varying degrees of harm, and what is considered harmful changes over time. What is harmful is also often context-dependent.
Complicating matters more is the fact that something being censored can be considered harmful as well. Religious messages would be a good example of this: Religion A thinks that Religion B is harmful, and vice versa. I doubt any 'neural network' can resolve that problem without the decision itself being harmful to some subset of people.
While I love the developments in machine learning/neural networks/etc. right now, I think it's a bit early to put that much faith in them (to the point where we think they can solve a problem like "ban all the harmful things").
>There's way too much moralizing from people who have no idea what's going on
>All the filter actually is is an object recognizer trained on genital images, and it can be turned off
I'm not sure if you misread something, but neither I nor the person I was replying to was talking about this specific implementation; we were speaking in a more general sense.
I'm pretty sure you are the one who missed the point of the parent post and mine.
It's not that simple. The model was not trained to recognize "harmful" actions such as blowjobs (although "bombing" and other atrocities of course are there).
The model was trained on eight specific body parts. If it doesn't see those, it doesn't fire. That's 100% of the job.
I see that you've managed to name things that you think aren't in the model. That's nice. That's not related to what this company did, though.
You seem to be confusing how you think a system like this might work with what this company clearly explained as what they did. This isn't hypothetical. You can just go to their webpage and look.
The NSFW filter on Stable Diffusion is simply an image body part recognizer run against the generated image. It has nothing to do with the prompt text at all.
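You can see this from code. A minimal sketch using the diffusers library (attribute names vary a bit across versions, so treat the details as assumptions); the point is that the checker only ever sees the decoded image:

    import torch
    from diffusers import StableDiffusionPipeline

    # The default pipeline bundles an image-based safety checker.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    out = pipe("a photo of an astronaut riding a horse")
    # The checker ran on the finished image, never on the prompt string.
    print(out.nsfw_content_detected)  # e.g. [False]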
The company filtered LAION-5B based on undisclosed criteria. So what you are saying is actually irrelevant, as we do not know which pictures were included or not.
It is obvious to anyone who bothers to try (have you?) that a filter was placed here at the training level. Rare activities such as "kitesurfing" produce flawless, accurate pictures, whereas anything sexual or remotely lewd ("peeing") doesn't. This is a conscious decision by whoever produced this model.
Well, it ought to be possible to train it for a number of scenarios and then, at generation time, tell it to generate based on certain cultural sensibilities. It's not going to be perfect, but probably good enough?
Isn't this part of the AI alignment problem? To be able to understand what kinds of output are unacceptable for a certain audience? To be polite?
> Well, it ought to be possible to train it for a number of scenarios and then, at generation time, tell it to generate based on certain cultural sensibilities. It's not going to be perfect, but probably good enough?
Do we want the AI to generate based on Polanski's sensibilities, even if he's the only audience member? I suspect for most people the answer is no.
I find it very immoral too; it's like Islamists trying to prevent pictures of the prophet from being drawn. Not that I want to offend Muslims or make "harmful" content, but this notion that a specific type of content creation needs to be imposed is very, very problematic. Americans freak out about nudity all the time, something that is not considered harmful in many other places. The fear of images and text, and the mission to restrain them, is pathetic.
Anyway, it won't be possible to contain it. Better to spend the effort on how to deal with bad actors instead of trying to restrain the use of content creation tools.
Yeah, it's taking the impulse to control everything from our own mind and putting it into an artificial one. Seems to me a lot of our suffering is borne of that impulse.
OpenAI's filters are a total joke. I tried to upload The Creation of Adam (from the Sistine Chapel): blocked for adult content. "Continued violations may restrict your account". Yeah, it has naughty bits in it, but it's probably in the top ten most recognizable pieces of art ever made. I tried to generate an image of "yarn bombing": blocked for violence. They have the most advanced AI in the world and they can't solve the Scunthorpe problem?
They're not content filters as much as Doing Something filters. They're there to convince people that they're doing something, and of course if it wasn't zealous and regularly tut-tutted people for desiring a rubber duck, you wouldn't know they were doing something.
The reason why this is such a game changer is that it is not controlled on some central server. It's like saying paper and pencils can be revoked from people if somebody doesn't like what you do with them. It's an amazing new technology; let people use it.
Regardless of the practicality: why do they think it’s their role to be the morality police?
If there’s anything we’ve learned from history, it’s that we’ve always been morally wrong in some way, very often in our most strongly held beliefs. This AI in a different time would be strictly guided to produce pro-(Catholic Church/eugenics/slavery/racist/nationalist) content.
And the corporate creators are freaking out about the profanity. Microsoft's Tay wouldn't be remembered so fondly if Bill hadn't immediately pulled the plug when channers made her say the n-word.
> Regardless of the practicality: why do they think it’s their role to be the morality police?
It’s not just morality: there have reportedly already been multiple subreddits for non-consensual porn mimicking real people, and for underage porn. The legality of that is a minefield, but it doesn’t end there. If that’s what they become known for, it affects funding, hiring, people deciding whether to use their software, etc., and the more prominent that is, the more likely they’ll be hauled before legislators to talk about problems. Even simple things like legal demands to remove celebrities from the training sets could be pretty time-consuming.
Stable diffusion does run a filter on the output in its default configuration. Any image it deems 'unsafe' gets replaced with a picture of Rick Astley.
The thing about that is that it is open source, so you can trivially disable that filter if you like.
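For example, with the diffusers packaging of the model, the usual trick is to swap the checker for a no-op. This is a sketch; the (images, clip_input) signature is what recent versions pass, so check yours. In the reference scripts, the equivalent is commenting out the check_safety call:

    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

    # Replace the bundled checker with a no-op that flags nothing as NSFW.
    def no_op_safety_checker(images, clip_input):
        return images, [False] * len(images)

    pipe.safety_checker = no_op_safety_checker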
Reminds me of a joke: three guys get locked up for a long time. Out of boredom they start telling jokes to each other, but as the supply is finite, they end up retelling them all the time. Eventually they number them, then just shout out e.g. "27", and they all laugh.
Then a new inmate joins, doesn't know what's going on but figures that if you say a number, people laugh. So he goes "14!". But nothing happens. The others tell him "you didn't tell the joke right".
How is the poor AI meant to know that jokes 6, 13 and 38 are sexist?
I was once a guest at a tech think tank, early 2000s; the people were all in their 60s at the time.
They spent years grappling with online worlds because of the idea that people might/could represent themselves as a different gender. They wanted the technology to exist and had dreamed about it for decades; they just got caught up on that.
That was comical because it was out of touch even for that time period.
It's interesting how people squirrel and spiral over useless things for some time.
Even in the 90's they had to fight hordes and hordes of Californian nutjobs (Dianne Feinstein et al.) who wanted to ban violent video games. These people would certainly be cancelled in today's world; they wouldn't stand a chance. Because, how dare you allow violence in video games to ...children!?
Our civilization depends on letting wackos do their thing, as far as it is within the limits of the law. Let them be offensive as fuck. These are the people that herald and propel society forward with their heterodox thinking. Without them, society is going to decay fast; it already is.
Yeah, definitely, when they started a studio in Dallas. I don't remember which congresspersons took a similar stance to Dianne's. During the 90's, progressives played a larger role, though. There was also the Mortal Kombat fiasco:
> During the U.S. Congressional hearing on video game violence, Democratic Party Senator Herb Kohl, working with Senator Joe Lieberman, attempted to illustrate why government regulation of video games was needed by showing clips from 1992's Mortal Kombat and Night Trap (another game featuring digitized actors).
> During the 90’s, progressives played a larger role though.
Could be true, maybe, but today conservatives have willingly taken over that seat, and the NRA is heavily involved and actively blaming video games after each mass shooting to deflect from the debate on gun rights. https://www.usgamer.net/articles/the-nras-long-incoherent-hi...
In terms of trying to moderate swearing and sexuality in games and music and movies, the religious right has long been and still is the group most vocally opposed to such free expression... if we’re talking about where to address censorship today.
Why does this matter? Regardless of the party, my original message stands. It is an irrelevant detail. Not sure what's causing defensiveness every time I bring up or criticize progressives. My bad, I only remembered Dianne Feinstein's name from the book, jeez.
Oh I thought you were suggesting we should stop censoring legal but offensive behavior? The issue of exactly who’s doing the censoring seems absolutely and completely relevant to the subject of censorship, no? If it’s irrelevant, then I don’t understand the point of your top comment. Why do we need to allow offensive wackos to do their thing, what offensive things are we talking about, and who needs to allow them?
Perhaps a more important discussion, if you do care about censorship, is to define more thoughtfully what you mean about “within the limits of the law”. In the US, the law, up to and including the constitution, makes clear that offensive behavior is anywhere from not protected free speech up to criminal activity. Politicians are debating what the limits of the law should be, and sometimes they blow hot air, and sometimes they write bills. Either way, the results of Congressional bills are establishing the limits of the law, and so define the acceptable legal bounds of offensive media & speech. Here’s one of the bi-partisan congressional sessions on games (it included Feinstein, among many others, but she didn’t testify). https://www.govinfo.gov/content/pkg/CHRG-109shrg28337/html/C...
In response to people jumping in to defend the progressives of the 90's: the amount of defensiveness that's invoked here on HN for stating the facts is quite alarming.
I really should have left out Dianne Feinstein and "California nutjobs" in the original post. This is what happens every single time you mistakenly poke HN when it comes to political one-sidedness.
The original Doom had "Italian cannibal film" levels of gore, heavily pixelated of course (not as if they had a choice in 1992), but such that you could see they were scans. Plus, of course, a lot of over-the-top satanic cliches to tick off the fundamentalists. But nothing remotely sexual; that's a bridge too far in the US.
Dianne Feinstein never attempted to control video games or Doom. She just said once, on April 3, 2013, that she was worried about the impact, and Fox News has been screaming her name ever since. She's never introduced any law about this at all.
Only one California politician has ever attempted to do much of anything to video games: Republican Joe Baca, who tried a dozen times and is mostly famous for his 2009 attempt to get a warning sentence on boxes. Calling that censorship is pearl clutching.
The only genuine attempts to do something an adult would consider censorship of video games were Jack Thompson's (a now-disbarred Republican) and that brief 2018 thing with Trump.
Democrats have never attempted to censor video games. All three major attempts were Republican.
It's important to get the details right if you are going to build an intuition of who's actually doing this
I’m sorry but none of what you said is true. At this point, the facts are indisputable. Check out my other replies that point to the congressional hearings.
I don't see the point. Idiots are fooled by far less convincing images.
Humanity has had the ability to lie with pictures since the invention of photography. The field of special effects can be described as lying about things that don't matter.
Without using Stable Diffusion, I can still photoshop an image or deepfake a video. Stable Diffusion isn't really changing what's possible here, and arguably is less advanced than what's possible with Deepfakes or even the facial filters available on social networks.
Like with all deceptive imagery: one just needs to use their noggin.
* Also I might add: the article is actually out of date on some aspects, because this technology is evolving so rapidly. Literally every day there is a new and interesting way that people are applying the tech.
It's no different than Google images, which is also voluntarily polite by default.
In both tools you can get naughty images, but you have to tell the tool that's okay.
This is not about censorship or moralizing.
It is just having the tool know when it's allowed to do that stuff. It's a key basic product feature if you're actually using the thing for content and not just having fun making pictures
Everyone acting like there's some kind of free speech issue should go into their account and turn the filter off, then try to calm down
It makes sense if the intent is to protect Midjourney from being blamed for misuse. If they saw the potential misuse yet chose to do nothing about it, they'd be blamed. Lack of perfect solution is not an excuse for not offering any protection.
I literally spent the whole first 3 hours figuring out ways to generate porn. They don't allow words like sex, cock, etc., so you use prompts like intercourse and phallus. At one point I thought they were screening for particular names, so you'd say things like "the brother of mako in the legend of korra" instead. It's just an endless game of cat and mouse, not worth putting effort into. Got bored; now I'm playing with the dev API. People have been showing how to integrate this into Photoshop and GIMP, and it's pretty cool.
The goal is to have a checkbox which keeps the system from generating naughty images in casual use.
This has absolutely nothing to do with censorship. It's a nonsense concept and it's not clear what you think censorship actually is.
If you set the system to make tall rectangles, are you censoring squares?
It's absolutely exhausting how people on HN attempt to cast any form of telling a tool what you want it to make as somehow morally governing something.
It's just telling the machine what to make
Not everything is a desperate ethical dilemma
Sometimes you just want the things you create to be straightforwardly usable
You understand that the filter is voluntary, and that the initial delay requirement (long gone) was about Discord adult image rules, right?
You're not just reflexively framing this as censorship where there was none, trusting HN to overreact when that word is abused, right?
Agreed. I'm also not sure how this is practically supposed to work if they really publish the entire model. Right now, all they do is design a specific license, right? Or are there certain safeguards built into the model itself?
That being said, I'd still think publishing the model (vs. keeping it as a closed-source API) is a good move. Otherwise, we'd move forward into a world where one of the most significant technological advancements must be gatekept forever, which I'd frankly find even more dystopian.
Well, it depends: are you talking about significantly mitigating harmful uses of Stable Diffusion or completely stopping them? The latter, of course, isn't going to happen, but there are plenty of practical things that can be done to mitigate.
If we can't even do this, how are we ever going to align AGI? I see these efforts as part of a nascent effort at alignment research (along with the more proximate reason, which is avoiding bad PR from model misuse).
Yeah, the best they can do is filters on top of the output. These models are complex enough that with some reverse engineering you can find "secret" languages to instruct them that would get around input filtering.
Devil's advocating: given they have trained it so well to generate images in spite of all expectations, is it really so hard to imagine that they could also train it to understand what images not to generate? It already had to learn not to generate things that don't make sense to humans. How does this not just amount to "moar training"? The hardest part is that the training data it would need is a gigantic store of objectionable (and illegal) content... probably not something many groups are eager to build and host.
The thing is that people can make harmful art themselves. Photoshopping people's faces onto nudes and depicting graphic violence have been a thing since digital photography, if not painting in general. I mean, look at all the gross stuff which is online and was online way before these neural networks.
The issue with these neural networks isn't the content they create; it's that they can create massive amounts of content, very easily. You can now do things like: write a Facebook crawler which photoshops people's photos onto nudes and sends those to their friends; send out mass phishing emails to old people with pictures of their grand-kids bloody or in hostage situations; send out so many deepfakes of an important person that nobody can tell whether any of their speeches is legitimate or not. You can also create content even if you have no graphic design skills, and create content impulsively, leading to more gross stuff online.
Spam, misinformation, phishing, and triggering language are already major issues. These models could make it 10x worse.
Where today it takes some far-from-Jesus deviant artist a whole day to draw a picture of Harry Potter making out with Draco Malfoy, with the power of AI, billions of such images will flood the Internet. There's just no way for a young person to resist that amount of gay energy. It's the apocalypse foretold by John the Revelator.
> It's the apocalypse foretold by John the Revelator.
I literally read a chapter of Inhibitor Phase where there's a ship called "John the Revelator" less than an hour ago. I haven't otherwise seen that phrase written down for years.
Spooky (and cue links to the Baader-Meinhof Wikipedia article).
> Spam, misinformation, phishing, and triggering language are already major issues. These models could make it 10x worse.
Or 10x better. The barriers to entry for doing this kind of thing right now aren't high enough to make it not happen; they are only high enough to make it sufficiently hard to pull off that people can feel comfortable assuming most of the content they see is legitimate. In a world where nothing is necessarily legitimate, I'd expect you'd see a massive shift in people's expectations.
After generating 5000 images with these tools, I believe the killer app will be the one that gives the artist the most control. I want a view and a scene, and to be able to manipulate both in real time.
Like,
View: 50mm film, wide-angle
Scene: rectangular room with window -> show preview
Scene: add table -> show preview
Scene: move table left -> show preview
Scene: add mug on table -> show preview
View: center on mug
Right now, there’s little control and it’s a lot of random guessing, “Hmm what happens if I add these two terms?”
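Purely hypothetical, but the kind of interface I mean might look like this; every class and method below is invented for illustration, with a stub where a diffusion backend would re-render the scene:

    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple

    # Hypothetical front-end: it only records edit operations; a real system
    # would hand this scene graph to a diffusion backend on each preview().
    @dataclass
    class Scene:
        description: str
        ops: List[Tuple] = field(default_factory=list)

        def add(self, obj: str, on: Optional[str] = None):
            self.ops.append(("add", obj, on))

        def move(self, obj: str, dx: float = 0.0):
            self.ops.append(("move", obj, dx))

        def preview(self):
            # Stub: a real implementation would re-render the image here.
            print(f"render '{self.description}' with ops: {self.ops}")

    scene = Scene("rectangular room with window, 50mm film, wide-angle")
    scene.add("table")
    scene.preview()
    scene.move("table", dx=-1.0)
    scene.add("mug", on="table")
    scene.preview()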
Have you seen the img2img results? You draw kind of a crappy Microsoft Paint style image, give it some text for how you want it to actually look, and it does the transformation.
Natural language alone is one of the worst ways to control image generation. The model knows how to generate anything, but its own "language" is nothing like yours. It's like writing in Finnish, twisting it in such a way that it would yield coherent Chinese poems after Google Translate. You will end up inserting various garbage into your input and still not getting the result you like. img2img gives much better results because you can express your intent with higher-order tools than textual input alone.
What would be best is to properly integrate models like that into some painting software like Krita. Imagine a brush that only affects freckles, blue teapots, fingers, or sharp corners. (or any other thing in a prompt) Or a brush that learns your personal style and transfers it onto a rough sketch you make, speeding up the process. Many possibilities.
I think they are already making an img2img plugin for Photoshop. Watch the demo; it's kind of impressive. [0] It's just a rudimentary prototype of what's possible with a properly trained model, but it already looks like a drop-in replacement for photobashing (as an example).
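For anyone who wants to try the img2img workflow directly, here's a minimal sketch using the diffusers img2img pipeline (the init-image keyword has been renamed across versions, and the file names are placeholders):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    # Start from a crude MSPaint-style sketch and describe the target look.
    sketch = Image.open("rough_sketch.png").convert("RGB").resize((512, 512))
    result = pipe(
        prompt="a wooden table with a blue ceramic mug, soft window light",
        image=sketch,       # steers composition ("init_image" in older versions)
        strength=0.6,       # 0 = keep the sketch as-is, 1 = ignore it entirely
        guidance_scale=7.5,
    ).images[0]
    result.save("refined.png")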
It's all about generation time. If generation was faster, the UI could preemptively show you a lot of variations based on suggested keywords. And also you could click things and get immediate results.
Currently it takes my mid-range PC (2070 Super) 10 seconds per image, which is too slow. You would need to get generation time below 1 second to be really productive. I guess you can already achieve that with something like triple 3090s?
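One knob that helps in the meantime, assuming a diffusers-style pipeline already loaded as pipe: latency scales roughly linearly with the number of denoising steps, so you can preview cheaply and only pay full price for keepers (a sketch; defaults vary by version):

    prompt = "a cozy reading nook, warm light, detailed illustration"

    # ~50 steps is a typical default; ~20 is often enough for a rough preview.
    draft = pipe(prompt, num_inference_steps=20).images[0]   # fast triage
    final = pipe(prompt, num_inference_steps=50).images[0]   # full quality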
I think the ideal UX will be the ability to markup images with little comments and have it adapt accordingly. The prompt interface is bad. One of the biggest reasons being that you have virtually no control on the spatial aspect of your additions. Being able to say "add an elephant here and remove this lamp" will be big. Being able to do so with a doodle of an elephant to suggest posing will be even better.
Reminds me of the holodeck scene where Picard (edit: Geordi) reconstructs a table with what I, at the time, thought was a pretty vague set of specifications.
Turns out Star Trek predicted 2020's-style AI behaviour rather well. Considering nuclear war is then due in 2026, that's disconcerting.
An odd one, that. After all the lore (geddit?) about Data and his brother being unique and special for their unrivalled artificial intelligence, it turned out all you have to do to exceed that is just vaguely ask a standard-issue ship computer to do so.
I think the size of the Enterprise and its fusion reactor give it quite an unfair advantage. Was Data really supposed to be smarter than the Enterprise, especially when it can read Data's mind state in order to fulfill the prompt?
I suppose the EMH is (or at least was, pre-mobile emitter from the future) a thin client for the Voyager computer.
Still seems odd that apparently only Data, Moriarty, and the Doctor have demonstrated that the Federation actually can make pretty general AI with the tools it already has on starships (and conveniently always on the ship with all those film crews on it making the Historical Records).
Surely under the crust of some demon-class planet there's a bank with millions of times that power being used for... something.
There's probably a rule against making AI that you're allowed to break in the delta quadrant though.
There’s no direct canon confirmation, but it seems quite plausible that it was, in fact, the Bynars who provided the technological leaps necessary for the Enterprise computer to generate Moriarty and other proto-sentient characters. Riker and Picard both comment on the realism and perception of Minuet, created by the Bynars on the holodeck after their upgrades.
And there is a direct canon line from Moriarty through to the EMH and later sentient holograms via Lt. Barclay.
I make tools for artists and am afraid to incorporate AI generation, because I am pretty sure everyone would then discount work created with my tool, assuming all of it was AI generated, and then no artists would want to use it.
What I am actually leaning towards is a tool for users to "enhance" art with AI, but only if the artist allows it.
All this harmful-this and unsafe-that... I don't get it. What is the goal here? Are they trying to make it resistant to any "vulgar" inputs (I put "vulgar" in quotes because I can 100% see how they may consider strongly political statements "vulgar" too)? Or to prevent it from producing pornography? Or CP? The last one should be fairly straightforward (kid + nudity = get it out), but the first two are very broad limitations. I understand wanting to make it not super racist, but with pornography, I'm not sure it has a point. Even more so given that erotic art is very common in the real world, and I'm pretty sure this model (and others like it) can't distinguish between erotic art and pornography (which you can't blame it for, given that the legal standard seems to be "I recognize it when I see it"). That means it can only ever produce SFW imagery scrubbed of any "risque" factors or themes, because the authors didn't want to deal with any kind of erotic art due to this impossibility of separating erotic art from pornography.
If the point of all of these models is to get to something resembling an artist, then why intentionally kneecap it from the start and prevent it from producing art?
> All these harmful this and unsafe that... I don't get it. What is the goal here?
I'm not sure, but they mention biases. So I imagine one thing they want to avoid is that you ask for a drawing of a "criminal" and 90% of the images are of people of color. It should be possible to minimize these biases if you review the dataset, at least for certain keywords.
I have been playing around with it using ROCm and a 6900 XT; it makes a good alternative to DALL-E. They have different strengths: DALL-E seems better at lighting instructions and cityscapes, but Stable Diffusion is better at sketches.
Also, you can fine tune it on whatever you want which is awesome.
One interesting effect I have noticed in myself, though, is that after staring at DALL-E or Stable Diffusion generated images for a long time and then viewing "real" media, I get the same sense of wrongness, that the output is not quite right, for a while. It's like my brain has been tweaking its processing to prefer the AI art as the ground truth!
That's funny; for me DALL-E 2 is in practice miles ahead on pencil sketches, but Stable Diffusion is cool because the parameters can be customized, which helps with many phrases. Also, you can just leave it running and producing images for an hour.
Also, there's no content filtering, but I don't recommend playing around with that if you're sensitive. The lifeless husks and various mixes of body parts I got with fairly benign phrases could very well be used for a horror movie.
It might be that I haven't yet found the right phrase for Stable Diffusion pencil sketches, though; for DALL-E 2 it's just "<describe what you want>, artstation, pencil sketch, 4k" to generate consistently great pictures.
4chan is having a field day with AI generated porn of celebrities (often with ridiculous prompts), selecting the most unsettling results. One of Billie Eilish looks like some kind of orphaned shoggoth/succubus hybrid that just made its first attempt at luring in someone for a meal: "You like human females, yes?" Cataract eyes, aggressive-lobotomy mouth; it forgot to pay attention to shoulders and didn't know spines existed or what they were for. Or a second attempt, this time at Bjork, suggesting some kind of lost hominid which consumed only melons in a predator-rich environment.
The first links are most recent. You can see the progress I've been making as I learn to do better prompt engineering and iterate on existing images by using img2img. The future is here...
So, can someone explain the license to me? I read it and it seems very reasonable. It excludes most of the bad use cases, and doesn't restrict interesting but controversial use cases too much. However,
> To generate or disseminate verifiably false information and/or content with the purpose of harming others;
How do they define that? So I can generate and disseminate false information without the purpose of harming others, just for fun? And what if I believe I am not harming others but helping them? Can I generate fakes to further my political cause, if I'm convinced it is a "good" cause? And what about Popper's paradox? If I prevent people from harming others, I am still harming the would-be harmers. I feel they are opening a can of worms here.
Also, bad actors will just ignore the license. There is a piece of code that censors obscene generations; you could just comment it out. I feel the license and that filter are not going to stop anybody, but are mostly there for good publicity and so they can wash their hands in innocence...
I’ve generated over 1000 images in the last 48 hours. It’s better and faster than using DALL-E; I can literally just leave a prompt churning away in the background for the same cost as playing a high-end videogame and check on the results when I want.
Honestly, if I were a commercial concept artist or illustrator without a signature style, I’d be really worried. We’re truly going to see the power of this tech as a tool now that it’s not gatekept.
> Honestly, if I were a commercial concept artist or illustrator without a signature style, I’d be really worried.
The prices people pay for any kind of picture art are about to take a nosedive. Stock art websites are going to be hit hard, any kind of graphics artist, any kind of commissioned artist. I wonder if (human) models will be taking a pay cut as a result.
I heard Midjourney is adding extra prompts to anything you submit, which give it the signature style it has. Pretty sure you could get the same style out of SD if you knew what to add.
While it's a huge win that it's open source, I find the results consistently inferior to Midjourney (and DALL-E).
I tried to generate some artistic results with a variety of prompts, and Midjourney always won hands down.
But of course, since it's open source, many community tweaks and Colab notebooks/forks will probably put it on par with DALL-E in time. But I have trouble imagining Stable Diffusion competing against Midjourney anytime soon: the difference is night and day.
If the beta they introduced briefly for a few days was SD, I have to add that it always, 100% of the time, produced much inferior results compared to the MJ "v3".
It was so bad that if they'd replaced v3 with it (good thing they didn't), I'd probably have stopped using MJ and cancelled my subscription.
The Stable Diffusion model will replace the v3 model soon, as it's actually superior in coherence and details (I don't know how you can say otherwise), but the v3 model will still be available, just as v2 is.
"Thousands of people," you're the only person I've seen on the Internet screaming about how horrible the beta was. You can go check out the Midjourney subreddit and you'll see that people really like it.
In fact, the beta has since returned. The team is just fine-tuning the pre- and post-processing pipeline, and then the new model will be ready for use.
Well, I spend hours in the actual Discord, and many people complain there. The SD-based beta is simply not artistic enough for most folks; it turns MJ into something more like DALL-E. It's not surprising that people generally post the good results on the sub, not the failed attempts.
I was not talking about good results, but about people's reactions. The artistic vibes that the v3 model offers are nothing but clever pre- and post-processing; when the SD-based model comes out, it will give you similar vibes.
The row of three pics near the top of the article is from a new model they have in the works that uses SD under the hood. It was available briefly as a test run. MJ’s additional magic over SD gets even better results.
> But global paradigm shifts aren’t pleasurable for everyone. As I explained in my latest article on AI art, “How Today's AI Art Debate Will Shape the Creative Landscape of the 21st Century,” we’re getting into a situation—now accelerated with the open-source nature of the model—that’s extremely complex. Artists and other creative professionals are raising concerns and not without reason. Many will lose their jobs, unable to compete with the new apps. Companies like OpenAI, Midjourney, and Stability.ai, although superpowered by the work of many creative workers, haven’t retributed them in any way. And AI users are standing on their shoulders, but without asking for permission first.
> As I argued there, AI art models like Stable Diffusion pertain to a new category of tools and should be understood with new frameworks of thought adapted to the new realities we’re living in. We can’t simply make analogies or parallelisms with other epochs and expect to be able to explain or predict what it’s going to happen accurately. Some things will be similar and others won’t. We have to treat this impending future as uncharted territory.
I wonder if we'll also talk about "conversations", "complex situations" and "the need to treat this as uncharted territory" when some Copilot/GPT3 successor a few years down the line spits out entire production-ready software stacks off the prompt "like Facebook only better" - using our own code as training data.
This prompt is unspecific to the point of unusability. Even if this works some day, the spec used will be a lot more detailed, in higher-level pseudocode style.
True, but as you can see with image generators, they can happily work off extremely underspecified prompts; they will just use their own priors to fill the gaps.
There will absolutely be prompt engineering and I agree that actual, serious prompts will be much more specific than that.
I don't think the prompts will necessarily be pseudocode-style. Depending on what trainsets are available, I could imagine we'll have some high-level description of desired features in addition to lots of specifiers which narrow down the specific languages, design patterns, tools etc which should be used in the resulting codebase.
You can already use similar prompts with Copilot today by disguising them as comments.
What you're describing is essentially a DSL. We can ruminate on the exact level of detail that will be needed, but in my opinion it will by necessity be a lot higher than what most commenters in these threads are imagining. OpenAPI specs are already quite complex and verbose, and it's not exactly clear to me that we can do a lot better when describing APIs, whether web ones or not.
I'm just a run of the mill software engineer (mostly webdev).
I never cared about ML or data science.
I'd been playing with DALL-E the past few weeks after getting beta access, but it's too limited/meh, and I soon ran out of credits.
Then DreamStudio (SD SaaS) launched to the public, and I was blown away.
Then I tried to run txt2img on my Mac, which I did, but it's too cumbersome/slow.
Then I found out about Replicate, which also exposes an API to interact with and run the models.
I've been having fun with it since then, building some scripts with Playwright and doing generative art with Stable Diffusion. I'm no artist, but it's so much fun, and the results are so visually pleasing, that I can't help but pursue the urge to explore this.
I will be starting an anon account on Twitter and trying to sell some of my art as NFTs; we'll see where it gets me.
Just ordered a card to get around the NSFW filters (they're nonsense and flag some random stuff).
If you want to try it, the easiest way is DreamStudio or replicate.com.
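The Replicate route is only a few lines with their Python client. This is a sketch: the version hash is a placeholder you'd copy from the model page, and REPLICATE_API_TOKEN must be set in your environment:

    import replicate  # pip install replicate

    # The version hash below is a placeholder; copy the current one from
    # https://replicate.com/stability-ai/stable-diffusion
    output = replicate.run(
        "stability-ai/stable-diffusion:<version-hash>",
        input={"prompt": "an astronaut lounging in a tropical resort, vaporwave"},
    )
    print(output)  # typically a list of generated image URLs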
If a computer model could produce the world's best porn, would that be a good or bad thing? Many harmful effects of porn would be amplified, but it would reduce the exploitation of real people in the industry. A moral question society will soon face I think.
I think the bad things would be targeted use of realistic images: for example, imagine the horrible things some people experience in school multiplied by “leaked” photos, or someone’s abusive ex distributing “proof” of their infidelity / unsuitability to have custody, etc. There’s a theory that over time people would stop believing everything they see but there’s still plenty of time for millions of tragedies before that happens, if it ever does. Forensics is going to be a growth industry.
Yeah we need Trigger Warnings on any ML repo that uses Python. The trauma that comes from dealing with Python dependency management is really hurtful and non-inclusive.
Which I don't think is the end of the world. I can draw horrid things on my iPad in Procreate or even just a pencil on paper. What's new here is ease of access and hyper realism. This is more a problem for fake news than just generating bad/shock images which were already possible.
It isn't going to impress the person in the street until it actually follows your instructions. I tried several times to express "a tall three-legged stool" but even with the "CFG" (how much the image will be like your prompt) at max, it gave me stools with four or ultimately, two legs. Also tried "a four-legged spider" (don't ask) and got first an eight-legged spider, and next, a spider with eight legs, but four of them were blurred. Sure, dumb, pedestrian requests, no imagination, but a five-year-old would quickly get impatient with its inability to follow simple directions.
You could sketch a three legged stool in MSPaint and use img2img.
I think looking at text prompts as an essential part of the technology is very limiting. You could be using a mix of text, storyboards, images of objects to place inside it, sketches of the desired layout, etc… in fact why not replace your game's renderer with it?
I am gonna ask the weird question, but why are we trying to prevent generating CP content?
1. Sharing it publicly is still considered highly illegal.
2. The biggest problem with everything CP-related is that to produce this kind of content, real children were used, tortured, and often killed.
Should we not allow the model to generate CP content in order to reduce the number of real children being hurt/killed?
Plus, this of course would not prevent the authorities from tracking who is sharing this kind of content.
It'll be interesting to see what happens when a copyright troll ( https://doctorow.medium.com/a-bug-in-early-creative-commons-... ) realizes that they can acquire the rights to models distributed under these vague-as-fog moral panic licenses, or distribute their own and have people actually use them, and start extracting rents.
These licenses will do little to nothing to stop abuse: The abusers will already conceal their identities because their actions are immoral or even illegal (fraud, harassment, etc). But they create a whole host of new liabilities for the users because the definitions are exceedingly subjective.
It's tremendously important to make these tools actually open. But open with a lurking liability bomb stops short of the goal. While stability.ai may never turn into a troll or sell their rights to one, that isn't necessarily true for the next model that comes around.
That raises the question: what is the economic licensing for Stable Diffusion et al.? Can I download it and set it up, and then charge people money to run it?
I am curious about the nature of the output being rasterised bitmaps. I would have expected that it would be easier for a model to generate output based on primitives that it learned as geometric shapes with spatial relationships (what an "arm" looks like as a shape, etc.). I would like to know whether the model has a layer that represents these and then effectively "renders" them as rasterised images, or whether it is really computing at the level of pixels. So far I have not seen anything other than rasterised pixels.
I guess it matters because most of these images are unusable for further purposes: they can't really be edited and touched up easily to fix all the flaws or do the final adaptation. Are there any options that generate the images in something like vector art, which would facilitate the downstream finishing process, rather than fully rasterised bitmaps?
Is it possible to generate AI art but also provide a list of citations (maybe weighted) as metadata to help figure which original images most contributed to the generated image?
I feel this would help a lot with giving artists more credit with the AI art outputs.
Very exciting to have a brand new technology that you can see quickly advancing and branching out into all kinds of things every week. I wonder how long this current run is going to last and where it ends up.
I'm very impressed. The first sentence I tried was something like "a half submerged archimedes screw ship plowing through sea ice", and it had some pretty good ideas of what that might look like.
My only gripe is the usual one in the AI field: very sloppy nomenclature.
Reading about diffusion models I first expected a novel parametrized family of functions, otherwise known as an "architecture".
Instead it seems more like a training method, so a nomenclature of "diffusion training" would seem more apt.
The model runs on just about anything; it's just a question of how fast. My Intel i7 CPU can use OpenVINO to generate a standard image in about 25 minutes. The M1 can do it in ~45 seconds. For comparison, modern GPUs take about 10 seconds.
One oddity for me (and I haven't played with a lot of AI art, so maybe this is normal): every time I try to describe a person, it generates like four to seven different faces.
Ha! Generative art is an image search engine with fancy interpolation. Would it be tractable to find a list of nearest training examples? Then you could cite the stolen art. Imagine that as a Twitter bot.
I think these AI art tools are great for finally enabling the masses to unleash their creativity without having to have true art skills. It’s like the equivalent of “no-code” style platforms, except because this is art, we can be a lot more forgiving if the results aren’t perfect. No need for “artists” to have a monopoly on artwork, we’re all artists.