
> It’s completely unreliable because it sounds confident and plausible even when it’s wrong

This describes pretty accurately a non-trivial amount of people I have worked with.



It may describe a number of people you have worked with, but it does not describe the average medical textbook, which will usually reflect the best knowledge we have of a condition. If ChatGPT produces something that looks like it came from a medical textbook, then it is hard to blame many people for believing it. More generally, people started to trust computers more than people as soon as calculators demonstrated their reliability.


I don't know, I get the feeling ChatGPT has also read all the quack books where the condition depends on the alignment of the stars when you were born, or on how the chicken entrails land when the shaman "studies" them. Those books are also written confidently, without giving any sign of being completely fabricated.

In the end, why do people believe what they believe? The answer is that it connects with what they already believe. That's it. If you've had a diet of what we call science, you'll have a foothold in a whole bunch of arenas where you can feel your way forward, going from one little truth to another. If you are a blank slate with a bit of quackery seeded onto it, you end up believing that the stars predict your life and that you can communicate with dead people via a Ouija board.

ChatGPT doesn't have an "already believe". It just has a "humans on the panel give me reward" mechanism, and all it's doing is reflecting what it got rewarded for. Sometimes that's the scientific truth, sometimes it's crap. All the time it's confident, because that's what gets rewarded.
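
As a toy sketch of that reward dynamic (made-up numbers and a hypothetical rater, nothing like the real RLHF pipeline): if raters reward confidence more than correctness, confident answers accumulate the reward whether they're right or wrong.

  # Toy sketch, not OpenAI's actual RLHF pipeline: made-up reward numbers and
  # a hypothetical rater, just to illustrate the incentive.
  import random

  styles = ["hedged-correct", "confident-correct", "confident-wrong"]
  total_reward = {s: 0.0 for s in styles}

  def rater_reward(style):
      # Hypothetical rater: confidence earns most of the reward,
      # being wrong costs only a small penalty.
      reward = 1.0 if style.startswith("confident") else 0.3
      if style.endswith("wrong"):
          reward -= 0.2
      return reward

  for _ in range(1000):
      s = random.choice(styles)
      total_reward[s] += rater_reward(s)

  # Both confident styles accumulate far more reward than the hedged-but-correct
  # one, so that is the style the policy learns to imitate.
  print(sorted(total_reward.items(), key=lambda kv: -kv[1]))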


This is not a new phenomenon.

Have most people invested time in making sure their web search results are accurate?

It is dangerous for sure, but that's what you got.


And hopefully after someone shows you that they can be described that way, you stop trusting what they say (at least about the thing they are known to be unreliable about).


A whole lot of people can be trusted within a reasonably well-identified set of constraints, and we do trust them there, while discounting them in other areas where they're known to spew bullshit. It's very rarely all or nothing.


Obviously. ChatGPT will randomly spew bullshit about nearly any topic, though. So you can really only trust it for things you are already an expert in, or things that are very easy to verify.


It's not random. It's probabilistic. There is a very big difference. The types of errors are predictable which means they can be improved, and empirically so if you follow the academic literature.
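
To make the distinction concrete, here's a minimal sketch of next-token sampling (invented logits and candidates, not ChatGPT's real internals): the output is drawn from a distribution, so the error shows up at a measurable rate rather than as arbitrary noise.

  # Minimal sketch with invented logits, not ChatGPT's real internals: the model
  # samples each next token from a probability distribution, so its mistakes
  # show up at measurable rates rather than as uniform noise.
  import math
  import random

  def softmax(logits, temperature=1.0):
      scaled = [x / temperature for x in logits]
      m = max(scaled)
      exps = [math.exp(x - m) for x in scaled]
      total = sum(exps)
      return [e / total for e in exps]

  # Hypothetical next-token candidates for "The capital of Australia is ..."
  candidates = ["Canberra", "Sydney", "Melbourne"]
  logits = [2.0, 1.4, 0.2]  # invented scores, for illustration only

  probs = softmax(logits, temperature=0.8)
  pick = random.choices(candidates, weights=probs, k=1)[0]
  print([f"{c}: {p:.2f}" for c, p in zip(candidates, probs)], "->", pick)
  # The "Sydney" error appears at a stable, measurable rate (~30% here), which
  # is what makes it probabilistic rather than random, and hence improvable.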


I have read enough of the academic literature to say with some confidence that the majority of the errors I’m talking about are not probabilistic in any meaningful sense. At least not in a way that can be predicted from the amount of readily available training data.


Ironically, this reply is about as useful as if ChatGPT had written it:

- there are no cited facts

- it boldly proclaims its conclusion

- which you’d only be able to verify as an expert

…so I’m having trouble understanding what you’re complaining about with ChatGPT, when that seems to be the standard for discourse.


I don’t trust a random Hacker News comment by someone I don’t know and can’t verify any further than I can throw it, so in that sense they are probably similar.

The comment I was replying to was, in essence, “the errors you’re talking about are probabilistic if you read the literature”, and my response is, “no they aren’t, and I have read the literature.”

Note that I’m talking about a specific class of error, and proving a negative is difficult enough that I’m not diving through papers to find citations for something n levels deep in a Hacker News thread.


Here you go: here's a bunch of papers that you have not read. If you had read them, you would know that the errors are predictable, and that there are therefore many measurable ways to make improvements.

Toolformer: Language Models Can Teach Themselves to Use Tools: https://arxiv.org/abs/2302.04761

PAL: Program-aided Language Models: https://arxiv.org/abs/2211.10435

TALM: Tool Augmented Language Models: https://arxiv.org/abs/2205.12255

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: https://arxiv.org/abs/2201.11903

Survey of Hallucination in Natural Language Generation: https://arxiv.org/abs/2202.03629


I took the time to read through the only one of those that looked like it was peer reviewed and read the abstracts for the rest.

The Survey of Hallucination in Natural Language Generation only provided promising methods for detecting hallucinations in summarization tasks, where they are of course much easier to detect. Searching arXiv for a list of non-peer-reviewed papers that sound like they might be related to the topic at hand is a fun debate strategy. But no one else is reading this far into an old thread, so I'm not sure who you're trying to convince.

None of these papers prove your claims about hallucinations, and most aren't even trying to. Besides, the errors that I'm saying aren't meaningfully probabilistic aren't hallucinations in the first place.


ChatGPT (as in the current release of the OpenAI tool, released 30/Nov/22, so ~3 months ago) has this problem.

This does not mean it is a problem that is impossible to fix.


You shouldn't listen to them either, then.



