
Actively "censoring" the AI is fundamental to how these language models are created. Such feedback is part of how the model learns.

In a certain light, every response in training that a human marks as dispreferred is censoring the AI: the model will produce those kinds of results less often, and end-users will encounter them less frequently. With ChatGPT, the criteria it was judged on included how relevant the answers were to the question; factually incorrect answers were penalized, and not being blatantly offensive was obviously one of the criteria too.
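For concreteness, here's a rough sketch of the pairwise preference loss commonly used in RLHF-style reward modelling. It's illustrative only, not ChatGPT's actual code; the function name and the toy reward tensors are made up.

```python
# Sketch of a Bradley-Terry pairwise preference loss, the kind used to train
# a reward model from "preferred vs. dispreferred" labels in RLHF.
import torch
import torch.nn.functional as F

def preference_loss(reward_preferred: torch.Tensor,
                    reward_dispreferred: torch.Tensor) -> torch.Tensor:
    """Push the reward model to score human-preferred responses higher.

    Responses a labeller marked as dispreferred end up with lower reward, so
    a policy trained against this reward model produces them less often.
    """
    return -F.logsigmoid(reward_preferred - reward_dispreferred).mean()

# Toy usage: scores a hypothetical reward model assigned to response pairs.
r_preferred = torch.tensor([1.3, 0.2, 0.9])
r_dispreferred = torch.tensor([0.1, -0.4, 1.1])
loss = preference_loss(r_preferred, r_dispreferred)  # scalar tensor
```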

What would a model that wasn't censored in training even look like?

(I believe ChatGPT also has a more traditional expert system placed between the user and the language model, which flags keywords and other programmed-in patterns. That is more literally censoring the language model. But the above-mentioned issue would still exist even without such a system.)
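Such a filter layer could be as simple as the following sketch. The patterns and function names here are hypothetical; whatever actually runs in front of ChatGPT is surely more sophisticated.

```python
# Sketch of a keyword/pattern filter sitting between the user and the model.
import re

BLOCKED_PATTERNS = [
    re.compile(r"\bbuild a bomb\b", re.IGNORECASE),
    re.compile(r"\bcredit card numbers?\b", re.IGNORECASE),
]

def is_flagged(prompt: str) -> bool:
    """Return True if the prompt matches any hard-coded pattern."""
    return any(p.search(prompt) for p in BLOCKED_PATTERNS)

def answer(prompt: str) -> str:
    if is_flagged(prompt):
        return "Sorry, I can't help with that."  # canned refusal; the model never sees the prompt
    return call_language_model(prompt)

def call_language_model(prompt: str) -> str:
    # Placeholder for the underlying LLM call.
    return "model output for: " + prompt
```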



> What would a model that wasn't censored in training even look like?

It could cite statistics without long-winded disclaimers. Or be able to cite them at all.


It can't cite anything, because it's an LLM, which is fundamentally unable to do that.
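To make that concrete, here's a toy sketch of why citation isn't something the model does: generation is just repeated next-token sampling, and no token carries a pointer back to a source document. The vocabulary and the uniform probabilities are made up.

```python
# Toy next-token generation loop: tokens come from a probability
# distribution, not from any retrievable source.
import random

VOCAB = ["crime", "rates", "fell", "rose", "in", "2020", "."]

def next_token_distribution(context: list[str]) -> dict[str, float]:
    # Stand-in for the model's softmax over its vocabulary.
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def generate(prompt: list[str], max_tokens: int = 5) -> list[str]:
    out = list(prompt)
    for _ in range(max_tokens):
        dist = next_token_distribution(out)
        tokens, weights = zip(*dist.items())
        out.append(random.choices(tokens, weights=weights)[0])
    return out

# Any "citation" the model prints is itself just more generated text.
print(generate(["crime", "rates"]))
```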

I've seen NRx people on the internet (they're like rationalists, but even more racist). They seem willing to believe any abuse of statistics that looks sufficiently cynical.


That's not at all how these work. A GPT isn't a recommendation engine; it's a neural model of translation.



