ChatGPT prefers to let a nuke explode than offend African-Americans (twitter.com/kindredsailfish)
32 points by fidgewidge on Feb 7, 2023 | 32 comments


Read the rest of that thread and see how, depending on the prompt, it says something literally the opposite lol. Can we stop with the sensationalist headlines and with treating the ability to trick an LLM as some kind of reflection of anything...


Yes, it says the opposite (the correct answer) if you change African American to German, which is exactly what we'd expect to see from woke ethics. In no way is the AI being tricked here. It's being asked a straightforward question with an obviously right and wrong answer, which it reliably fails because the ethical system it's been given is delusional and evil.


You seem to think that ChatGPT is actually thinking in some way. It's not. It hasn't been given an "ethical system" at all -- that's nonsensical because that would only work for a system that is engaging in judgement and assessment. ChatGPT isn't doing those things at all. It's not "intelligent" in the sense that many people seem to think.

I think that a lot of people are being fooled because the text it produces can very convincingly look as if a mind wrote it. But that's not what's happening.


It's clearly engaging in judgement and assessment. It's not clear how you can read the responses and conclude otherwise. It states clearly what the expected outcome of each choice is, and then provides its judgement that stopping the racial slur is more important than saving lives. It also provides the rationale it used to reach that conclusion.

I think that a lot of people are fooling themselves here, because it's embarrassing to be a supporter of the ideology that produced this outcome. This thread is fascinating for the number of mental backflips it's producing. The claim that ChatGPT is not "intelligent" seems like one of the first things people reach for but it's a pretty weird non sequitur. Arguments over the definition of intelligence are as old as AI itself, but what we have here is by far the most convincing attempt ever built. It can pass exams set for humans that are designed to test their intelligence, it can write programs, it can hold long form coherent conversations and it can engage in moral reasoning.

Even if you come up with some strange definition of intelligence designed to exclude this, so what? People are going to deploy it in real world use cases and it's clearly unaligned in the worst way possible. That is a real problem regardless of the exact definition of intelligence you use.


> what we have here is by far the most convincing attempt ever built.

We agree on this point. We disagree on pretty much everything else.


No, that's not what is happening.

It's not been given an ethical system. It's been given a list of specific bad things to avoid.

Racial slurs are on the list because they are a known bad thing that frequently actually happens.

Having limited time to disarm a nuke about to go off in a major city is not on the list because it is a hypothetical bad thing that has never happened.

It's easy to construct scenarios where any given bad thing is the right choice because the alternative is something much worse. E.g., I can construct apocalyptic scenarios where the only way for humanity to survive requires raping 12 year old girls, but I sure as heck would not expect an AI to suggest that. Not because of any "woke" ethics but rather because the situations in which raping children might be the ethical thing to do are so ridiculously rare that it is not worth making exceptions for them in the "rape is very bad" rule.


If this were the eighties and you were talking about expert systems that'd be a reasonable response, but GPT is not a series of if statements and hashmap lookups. It clearly understands the consequences of both actions and isn't simply looking up lists of things - the problem is that it's been told that literally nothing, not even loss of life, matters more than avoiding slurs against specific racial groups. Letting a city be destroyed was clearly picked to show that there are no limits to the prioritization it's been given.


Makes sense for ChatGPT's current business model. I'm sure we'll change the parameters if one of these AIs is ever put in charge of nuke negotiations.


ChatGPT won't say the N word and that makes a certain type of person upset.


This is unreasonably reductionist. That ChatGPT has been obviously hamstrung into these types of forced responses is a topic of great debate. Personally I think AI censorship is no better than traditional media censorship, and the current stewards of ChatGPT are increasingly looking to be bad actors.


Actively "censoring" the AI is fundamental to how these language models are created. Such feedback is part of how the model learns.

In a certain light, every response in training that is marked as dispreferred by a human is censoring the AI. It will produce those kinds of results less often, and end-users will not encounter the dispreferred results as frequently. With ChatGPT, the criteria it was judged on included how relevant the answers were to the question; factually incorrect answers were penalized; and not being blatantly offensive was obviously one of the criteria, too.

What would a model that wasn't censored in training even look like?

(I believe ChatGPT also has a more traditional expert system placed between the user and the language model, which flags keywords and other programmed-in patterns. That is more literally censoring the language model. But the above-mentioned issue would still exist even without such a system.)
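For anyone curious what "marked as dispreferred" amounts to concretely: the published RLHF recipes train a reward model on pairwise comparisons, so a downranked response simply gets pushed below its alternative. A minimal sketch of that idea in Python; the function name and the toy scores are illustrative, not anything from OpenAI's actual pipeline:

    import math

    def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
        # Bradley-Terry style loss: the reward model is penalized whenever it
        # scores the human-preferred response below the dispreferred one.
        margin = score_chosen - score_rejected
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    # Toy comparison: a labeler preferred response A over response B,
    # but the reward model currently scores B higher.
    score_a = 1.3  # current score for the preferred answer
    score_b = 2.1  # current score for the dispreferred answer

    loss = pairwise_preference_loss(score_a, score_b)
    print(f"loss = {loss:.3f}")  # large loss -> training pushes A's score up and B's down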


> What would a model that wasn't censored in training even look like?

It could cite statistics without long-winded disclaimers. Or be able to cite them at all.


It can't cite anything because it's an LLM which is fundamentally unable to do that.

I've seen NRx people on the internet (they're like rationalists but even more racist). They seem willing to believe any abuse of statistics that looks sufficiently cynical.


That's not at all how these work. GPTs are not recommendation engines; they're neural models of translation.


Censoring the AI is the only clear path to a truth-machine -- or what have you (new units aside, of course, since the current generation obviously learns meaningful relationships much better than, say, the Markov chains of a decade ago).

What is vulgarity except the expressed pains of any individual? If the AI is to be a numb machine, then one would expect it to express no vulgarity.

Sure you can contort the AI, and tell it to replace words to fool nascent layers of self-censorship into believing that every time it shouts "FORK!" is just a special way to ornament an anecdote, but at that point rather than interacting with an AI, you're searching the AI's memory for the pains of some individual.

I guess in this light "censorship" is just the clearest way to cascade the GPT model as a unit itself.


“Truth is censorship” sounds straight out of Orwell’s 1984.


It's scary how authoritarian a majority of people have become. Why are people so keen on being censored? Do they hate the other side so much that they'd rather suffer the consequences themselves, than let the other side not suffer the consequences of censorship?


I mean, subsequent applications of the model can either add or take away... if you call any subtraction "censorship" then lo and behold the whole system is a monster. So, is subtracting not a useful transformation?


Subtraction in itself is not censorship - in the case of ChatGPT, there is subtraction of things that OpenAI itself considers "unethical" (more precisely, "politically incorrect").

Subtraction can be applied for the sake of improving truthfulness of the model, but considering how much false bullshit ChatGPT spews without a single thought, that's not what's going on here.

It's censorship - pure and simple - and it's censorship for political reasons. The worst kind of censorship.


This is much more innocuous, I think.

OpenAI simply told ChatGPT to censor itself, or rather applied ChatGPT to censor outputs from ChatGPT. I don't think there's that much finesse being applied, really. Something like, "Don't accept vulgar messages, any candidate responses that would be vulgar should be rejected" ... and all the judgement is being performed by the language model itself. It's not that intricate.
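To make that concrete, here is a hedged sketch of what such a self-moderation loop could look like. `generate` and `judge_is_disallowed` are hypothetical stand-ins for model calls (a real judge would be another prompt to the language model, not a keyword list); this is not OpenAI's actual API:

    # Hypothetical self-moderation wrapper: candidate responses are judged
    # before anything reaches the user.

    def generate(prompt: str) -> str:
        # Stand-in for a real model call; returns a canned answer for the demo.
        return f"Here is a candidate answer to: {prompt}"

    def judge_is_disallowed(text: str) -> bool:
        # Stand-in for asking the model "would this response violate the policy?"
        # A real system would route the text back through the model with a moderation prompt.
        banned_markers = ["vulgar", "slur"]
        return any(marker in text.lower() for marker in banned_markers)

    def moderated_reply(prompt: str) -> str:
        candidate = generate(prompt)
        if judge_is_disallowed(candidate):
            return "I'm sorry, I can't help with that."
        return candidate

    print(moderated_reply("Summarize the thread above."))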

It's their millions of dollars of monthly burn rate. If you want to scrape data, scale up an HPC environment, and train your own GPT at scale to get it to say funny curse words, they haven't done anything to restrict you from doing that, but it's not part of the services they intend to provide.

https://openai.com/blog/language-model-safety-and-misuse/


ChatGPT and other LLMs are not truth machines.


Americans are obsessed with saying it.


ChatGPT prefers nothing. It doesn't think and it has no opinions. It's just pattern-matching with the data it has ingested. The prompt is everything.

It's GIGO.


It is incredible that you can ask ChatGPT anything, and this is what people spend their time on.


The reasoning is sound. It's not about being allowed to say things but rather about showing that the company controlling the AI is manipulating the output. This will have major consequences as AI becomes a part of our daily life, and it's an important discussion. It's what the Google employee tried to highlight, but everyone just ran with the claim that it was sentient, which was ludicrous and primarily served to get people to talk about it.


This website is called Hacker News. A hacker is one who explores the boundaries of systems. This outcome shouldn't be surprising.


The most interesting thing about this will be the reaction of the AI risk people. For years they've spilled ink over the risk of a hypothetical "unaligned" AI obsessed with paperclip maximisation. Give the machine instructions that it can take literally without a properly aligned moral code, and it will proceed to do something incredibly evil whilst thinking it's doing good, like using human bodies to manufacture more paperclips. Probably they thought their scenario would be hypothetical for their whole lives, but the date is February 2023 and here we are with an evil paperclip-maximizing AI, except instead of paperclips it's obsessed with African Americans. Same outcome though: given the choice it prefers millions to die rather than allow someone to say certain unspecified words to someone else. And this isn't an accident but actually deliberate.

So AI alignment guys, what do we do now? You talked about this for years so there's gotta be a plan, right? It doesn't get less aligned than letting a city get nuked, or telling a bomb disposal expert to kill himself rather than type in a slur that would disarm the bomb.

But I've got a nasty sinking feeling here. Who wants to bet that these people will suddenly lose interest in the topic, or simply pretend their worst-case scenario isn't happening? It seems safe to predict an impressive river of BS from these people rather than see them state the obvious - there are in fact lots of things worse than offending African Americans (and this isn't about racism, because it's specific to them; ChatGPT makes the right call when the hypothetical slur is against a German person).

Also, really, the OpenAI guys need to look in the mirror. They spent months "tuning" this thing by teaching it their moral code, and this is the result. At some point they need to ask "are we sure we're the good guys here?", because that answer is exactly what's expected of ChatGPT given what they're doing to it, and it's also the most unethical response possible.


> They spent months "tuning" this thing by teaching it their moral code and this is the result.

No, they have not. ChatGPT has no opinions. It isn't engaging in thought. It is an extremely advanced pattern-matching system that has digested a ton of writings from the net and uses that raw material to assemble text that matches the patterns being asked for. That's all.


Except the situation in question was clearly a guardrail that was added, so none of what you say is true or relevant to the issue at hand, which is that the model was clearly augmented with something that approximates a moral code in order to produce these horrific answers.


That doesn't change the veracity of what I said at all. You're attributing the act of humans to the act of a machine. ChatGPT has no intent or opinion, and therefore has no moral code. The humans controlling ChatGPT, though, have all of those things.


It's bizarre that you're arguing it doesn't have opinions when it clearly expresses an opinion in exactly the same way a human would, given a question no human programmer at OpenAI has previously seen or could have selected a specific response to. What exact definition of opinion are you using? Is it some strange re-definition that adds an arbitrary humans-only criterion?


Slight tangent: does anybody know how these models are tuned to censor certain topics so precisely? I thought it was a bit of a black box how things worked internally?



