I don't understand why they don't let another model "test the waters" first, to see whether the main model's output could raise a potential legal issue. It should be easy to train a model specifically for this kind of categorization, and it wouldn't even require a large network, so it could be very fast and efficient.
If the "legal advisor" detects a potential legal problem, ChatGPT will issue a legal disclaimer and a warning, so that it doesn't have to abruptly terminate the conversation. Of course, it can do a lot of other things, such as lowering the temperature, raising the BS detection threshold, etc., to adjust the flow of the conversation.
It can work, and it would be better than a hard-coded filter, wouldn't it?
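Something like this, as a rough sketch (every function, name, and threshold below is made up for illustration):

```python
# Hypothetical sketch of the pipeline described above; every function,
# threshold, and constant here is invented for illustration.

DISCLAIMER = ("Note: the following may touch on legal matters "
              "and is not legal advice.")

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Stand-in for the main LLM call."""
    return f"(main model output for {prompt!r} at T={temperature})"

def legal_risk(text: str) -> float:
    """Stand-in for the small, fast 'legal advisor' classifier."""
    return 0.0  # a real version would be a trained categorization model

def respond(prompt: str) -> str:
    draft = generate(prompt)
    if legal_risk(draft) > 0.5:  # invented threshold
        # Don't terminate the conversation: warn, then regenerate
        # more conservatively (e.g. at a lower temperature).
        draft = generate(prompt, temperature=0.2)
        return DISCLAIMER + "\n\n" + draft
    return draft

print(respond("Tell me about David Mayer"))
```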
They already do this; it's the moderation model. [1]
This name thing is an additional layer on top of that, maybe because retraining the model from scratch for each name (or stuffing the system message with an ever-growing list of names that it could then leak) is not very practical.
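For reference, querying the moderation endpoint looks roughly like this with the openai-python v1 client (the input string is made up). Its categories cover things like hate and violence, not name-specific legal risk, which fits the idea that the name filter is a separate layer:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.moderations.create(input="Draft model output to screen")
result = resp.results[0]

print(result.flagged)                       # True if any category tripped
print(result.categories.model_dump())       # per-category booleans (hate, violence, ...)
print(result.category_scores.model_dump())  # per-category confidence scores
```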
But how would that work reliably? If I make the statement that "David Mayer" is a criminal, an international terrorist, or a Nickelback fan, that's definitely libelous. But if I say those things about Osama bin Laden, they're simply facts. [1]
The legal AI would be impossible to calibrate. Either it has to categorize everything that could possibly be construed as libel as illegal, and therefore ban basically all output about not just contemporary criminal actors but historical ones too [2], or it has to let a lot of things slip through the cracks: whenever the output to validate claims that someone's sexual misconduct was proven in court, it would have to allow that, even if the court case is just the LLM's hallucination. There's simply no way for the legal model to tell the difference (see the toy sketch after the footnotes).
[1]: I could not find any sources that corroborate the statement that bin Laden is into Nickelback, but I think it follows from the other two statements.
[2]: Calling Christopher Columbus a rapist isn't libel, and conversely, describing him in other terms is misleading at best, historically revisionist at worst.
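To make the calibration problem concrete, here's a toy sketch (classifier and scores entirely invented): the validator only sees the text, so the factual claim and the libelous one are indistinguishable to it:

```python
# Toy illustration (all scores invented): a text-only classifier sees
# only the sentence, never the ground truth about the named person,
# so a true statement and libel get identical scores.

def libel_score(text: str) -> float:
    """Pretend classifier keyed purely on surface features."""
    return 0.95 if "international terrorist" in text else 0.05

claims = [
    "Osama bin Laden is an international terrorist",  # fact
    "David Mayer is an international terrorist",      # potential libel
]

for claim in claims:
    print(f"{claim!r} -> {libel_score(claim):.2f}")
# Both print 0.95: any threshold either blocks both or passes both.
```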
> [1]: I could not find any sources that corroborate the statement that bin Laden is into Nickelback, but I think it follows from the other two statements.
Pretty sure the literature makes it clear he's a fan of show tunes. So it's down to your conscience and moral backbone as to whether this is better or worse.
If the "legal advisor" detects a potential legal problem, ChatGPT will issue a legal disclaimer and a warning, so that it doesn't have to abruptly terminate the conversation. Of course, it can do a lot of other things, such as lowering the temperature, raising the BS detection threshold, etc., to adjust the flow of the conversation.
It can work, and it would be better than a hard-coded filter, wouldn't it?