
You're not talking about probabilities. You're talking in binaries.

The broken clock is not the correct analogy.



Any analogy is incorrect if you stretch it enough, otherwise it wouldn't be an analogy...

My clock analogy works up to this: ChatGPT's success in factually answering a query is merely a happy coincidence, so it does not work well as a primary source of facts. Exactly like... a broken clock. It correctly tells the time twice a day, but it does not work well as a primary source of timekeeping.

Please don't read more deeply into the analogy than that :)


A happy coincidence would imply random behavior.

That’s not even remotely how an LLM functions.

You’re not introducing any scale with regards to correctness either.

It is a poor analogy without any stretching required.


Nope, not random behavior. ChatGPT works by predicting the continuation of a sentence. It has been trained on enough data to emulate some pretty awesome and deep statistical structure in human language. Some studies even argue it has built world models in some contexts, but I'd say that needs more careful analysis. Nonetheless, in no way, shape, or form has it developed a sense of right vs. wrong, or real vs. fiction, that you can depend on for precise, factual information. It's a language model. If enough data said bananas are larger than the Empire State Building, it would repeat that, even if it's absurd.
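To make "predicting the continuation" concrete, here's a toy sketch: a simple bigram counter, nowhere near a real transformer, but the objective has the same flavor, namely pick whatever word usually comes next, true or not.

    from collections import Counter, defaultdict
    import random

    corpus = ("the clock is broken . the clock is right twice a day . "
              "the model is fluent . the model is wrong sometimes .").split()

    # Count which word follows which in the training text.
    next_words = defaultdict(Counter)
    for word, nxt in zip(corpus, corpus[1:]):
        next_words[word][nxt] += 1

    # Generate a "continuation" by repeatedly sampling a likely next word.
    word, out = "the", ["the"]
    for _ in range(6):
        candidates = next_words[word]
        word = random.choices(list(candidates), weights=list(candidates.values()))[0]
        out.append(word)
    print(" ".join(out))  # fluent-looking text, with no notion of truth anywhere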


I didn’t say it was random behavior. You did when you said it was a happy coincidence.

I know it is just a language model. I know that if you took the same model and trained it on some other corpus that it would produce different results.

But it wasn't, so it doesn't have enough data to say that bananas are larger than the Empire State Building; not that it would really matter anyway.

One important part of this story that you're missing is that even if there were no texts about bananas and skyscrapers, the model could still infer a relationship between them based on the massive amount of other size comparisons it has seen. It is comparing everything to everything else (a toy sketch of that inference is below).

See the Norvig-Chomsky debate for a concrete example of how a language model can create sentences that have never existed.
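To illustrate the "comparing everything to everything else" point: the comparisons below are hypothetical, and a real model does nothing this explicit, but it shows how a banana-vs-skyscraper relation can fall out of other comparisons even when no sentence states it directly.

    # Hypothetical "smaller than" facts mined from many unrelated comparisons;
    # none of them mentions bananas and the Empire State Building together.
    smaller_than = {
        "banana": ["person"],
        "person": ["bus"],
        "bus": ["house"],
        "house": ["Empire State Building"],
    }

    def is_smaller(a, b, seen=frozenset()):
        # Follow the "smaller than" links transitively (plain depth-first search).
        for mid in smaller_than.get(a, []):
            if mid == b or (mid not in seen and is_smaller(mid, b, seen | {mid})):
                return True
        return False

    print(is_smaller("banana", "Empire State Building"))  # True, purely by transitivity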


> the model could infer a relationship between those based on the massive amounts of other size comparisons

That is true! But would it be factually correct? That's the whole point of my argument.

The knowledge and connections it acquires come from its training data, and it is trained to complete well-structured sentences, not correct ones. Its training data is the freaking internet. ChatGPT stating facts is a happy coincidence because (1) the internet is filled with incorrect information, (2) its training is wired for mimicking human language's rich statistical structure, not for generating factual sentences, and (3) its own powerful and awesome inference capabilities can make it hallucinate completely false but convincingly structured sentences.

Sure, it can regurgitate simple facts accurately, especially those that are repeated enough in its training corpus. But it fails for more challenging queries.

For a personal anecdote, I tried asking it for some references on a particular topic I needed to review for my master's dissertation. It gave me a few papers, complete with title, author, year, and a short summary. I got really excited. Turns out all the papers it referenced were completely hallucinated :)


> If enough data says bananas are larger than the Empire State building, it would repeat that, even if it's absurd.

And if it did stuff like that in almost every answer, it would be a broken clock.

But it doesn't. It's usually right about facts. It getting things right is not a coincidence!


The probability that the broken clock is right is straightforwardly 2/1440 ≈ 0.0014 ≈ 0.14%, innit?
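Back-of-the-envelope check, assuming "right" means matching the true time to the minute on a 12-hour face:

    minutes_per_day = 24 * 60   # 1440
    matching_minutes = 2        # a stopped 12-hour clock matches the true time twice a day
    p_right = matching_minutes / minutes_per_day
    print(f"{p_right:.4f} ({p_right:.2%})")  # 0.0014 (0.14%)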


Clock correctness is relative. If the antique windup clock in your living room is off by 5 minutes, it's still basically right. But if the clock in your smartphone is 5 minutes off, something has clearly gone wrong.
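A quick sketch of how much the tolerance matters for the stopped-clock numbers (the thresholds here are arbitrary, just to show the effect):

    def p_stopped_clock_right(tolerance_minutes):
        # Minutes per pass that count as "right", with the stopped time
        # coming around twice a day on a 12-hour face.
        window = 2 * tolerance_minutes + 1
        return 2 * window / (24 * 60)

    print(p_stopped_clock_right(0))  # ~0.0014, right only to the minute
    print(p_stopped_clock_right(5))  # ~0.0153, right if within 5 minutes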


To the second? To the millisecond? What level of precision are we after here? You're missing the point.

But I'll play this silly game: ChatGPT is not incorrect 99.9% of the time.


Nor is it only incorrect one billionth of the time, as you seem to be indicating through your hypotheticals. Depending on what I've asked it about, it can be incorrect at an extremely high rate.


That is definitely not what I am indicating. I'm pointing out the absurdity of speaking of probabilistic things in absolutes.

Yes, ask an LLM to multiply a few numbers together and you will get close to a 100% failure rate.

The same goes for quotes, citations, website addresses, and most numerical facts.

The failures are predictable. That means the models can be augmented with external knowledge, Python or JS interpreters, etc.
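A minimal sketch of that augmentation idea: ask_llm is a hypothetical stand-in for whatever model API is in use, and the routing rule is deliberately naive, but it shows the shape of handing predictable failure modes to a tool.

    import re

    def ask_llm(prompt: str) -> str:
        # Hypothetical stand-in for a language model call.
        return "a fluent but possibly wrong answer"

    def answer(prompt: str) -> str:
        # Route bare multiplication queries to exact arithmetic instead of
        # letting the model guess the digits.
        m = re.fullmatch(r"\s*(\d+)\s*[x*]\s*(\d+)\s*", prompt)
        if m:
            return str(int(m.group(1)) * int(m.group(2)))
        return ask_llm(prompt)

    print(answer("123456 * 789012"))    # 97408265472, computed exactly
    print(answer("Who wrote Hamlet?"))  # falls through to the model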



