
These tiny models in general have really weird failure modes. I tried the TinyStories prompt about asking mom for a dog, who said no, and it output an incredibly dark story about how she asked her dad and they got a dog, but it had pancreatic cancer (paraphrasing; it went into detail about the surgery etc.), and then it started writing an informational PSA about who is at risk of pancreatic cancer, etc.


Lest we forget that this stream-of-consciousness confusion was state of the art just a few years ago.

It makes sense if you think about it: a small model's "internal state" isn't rich enough to keep track of whatever it was supposed to be talking about.

It makes me think that the reason LLMs need to be so large is that the internal state needs to be bigger than a typical human "idea", whatever that might mean.


The way we do LLMs now is that the program and the data are one and the same. The program mutates itself as it "executes". This is probably also how the brain works, since there is no hard separation between "memory" neurons and "data processing" neurons (biology has no hard separation in general).


What I find fascinating is how ML models hallucinate in a way that is sometimes reminiscent of a fever dream.


It makes sense that the failure modes of language prediction look a lot like ADD.


It's precisely because they lack attention.


Don't fall into the trap of applying human psychology to LLMs. Bag-of-chemistry quirks do not translate to matrix-multiplication quirks.


Why not? In both cases the result is losing the thread of thought.


Because analogy can be useful in explaining things, or it can be worse than useless - it ties our thinking up into side quests that have nothing to do with the matter at hand.


...No, no, that's not how ADHD works. It's difficult to sum up how wrong this is concisely, but I invite you to do some serious research into ADHD, how it functions, and the great variety of ways in which it can present in different people. It's quite a poor analogy.


I'm aware that anything to do with the brain has a variety of presentations.

Could you try to put a couple sentences down on how ADHD is an inapt metaphor for failure modes in this case?

It's lazy to claim something is wrong without offering a useful point as to how it's wrong. I trust in your ability to summarize.


For additional context/discussion, I feel this comment[0] elsewhere in the thread put it well.

The reply to that comment also has some information I feel is helpful to show the breakdown here. It mentions that lack of attention presents in only 15-20% of cases. This isn't ADHD; it's something new. The fundamental underpinnings do not relate, and so the analogy/metaphor does not facilitate a better understanding of the situation.

On the contrary, it makes LLM "attention" out to be something entirely different from what it actually is. Without attention, models don't become easily distracted. They are easily distracted regardless. Without attention, LLMs primarily fail to disambiguate between different meanings of identical words; they fail to take the context of the sentence structure into account when assigning meaning.
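
For concreteness, here's a rough sketch of what transformer "attention" actually computes (illustrative NumPy only, not code from any real model; the function name, projection matrices, and toy random inputs are all made up): each token's representation gets re-weighted by its similarity to the other tokens, which is the mechanism that lets the same word end up meaning different things in different sentences.

    import numpy as np

    def self_attention(x, Wq, Wk, Wv):
        # x: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projection matrices
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise token similarities
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)               # softmax: one weight per token pair
        return w @ v                                     # each output row is a context-weighted blend

    rng = np.random.default_rng(0)
    d = 8
    tokens = rng.normal(size=(5, d))                     # 5 toy "token" embeddings
    out = self_attention(tokens, *(rng.normal(size=(d, d)) for _ in range(3)))
    print(out.shape)                                     # (5, 8)

The weights are purely a function of the input sequence. There is no focus to lose or regain, which is why the ADHD framing doesn't map onto it.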

I hopefully don't have to dive into psychological and chemical specifics of ADHD to have demonstrated that this is fundamentally just not at all what ADHD is. Again, there is no underlying harmony between this mechanism and how ADHD affects human attention in 15-20% of cases, and there is no analogy.

The only similarity is that they both use the word "attention". If they'd used a different label, we wouldn't even be having this conversation right now.

[0] https://news.ycombinator.com/item?id=42585600


It’s lazier to claim something is correct without offering a useful point as to how it’s correct. I trust in your ability to theorize.


ADHD is an actively-researched dopaminergic disorder with a host of possible symptoms completely unrelated to attention or hyperactivity.

It is ill-named, and thus one often encounters comments such as yours in the real world, which, while not meant to be negative, can be marginalizing to those with ADHD who see their disorder as misunderstood and the term misused, much like people who say "I'm depressed" or "they're acting schizo again".

LLMs do not have dopamine pathways, and therefore we should avoid comparing them to human-specific brain disorders, or marginalizing ADHD folks by trivializing the disorder or spreading misinformation about how ADHD presents. LLM hallucination does not "look a lot like ADD"; that's a vague and unsupported claim. Furthermore, "lacking attention" doesn't even make sense with respect to attention models. The "attention" in ADHD and the "attention" in transformers share a semantic basis but are two very different phenomena.


For a good overview of ADHD, see

https://www.ncbi.nlm.nih.gov/books/NBK441838/

It is not “a dopaminergic disorder” any more than many other neuropsychiatric disorders are. Nothing much happens in the CNS without some level of modulation by dopaminergic receptors, and to the best of my knowledge variants in these receptors are not known to contribute strongly to ADHD (I just confirmed by reviewing the GWAS Catalog: ebi.ac.uk/gwas/efotraits/EFO_0003888).

Furthermore, lack of attention is considered an important facet of ADHD, common to about 15-20% of cases.

Humans tend to think in terms of metaphors. Similes and metaphors are crucial in learning and thinking. And yes, sometimes problematic.

Explaining what is wrong with a particular metaphor can help.


A fever dream looks nothing like ADD. If anything it's like a very mild mushroom trip. Did you base this on anything or did it just sound good in your head?


Your fever dreams and/or mushroom trips must be a lot more narratively stable and consistent than mine...


As is usually the case, check the data! A lot of the dataset used has fairly morbid scenarios, so the model is working as expected. All the data was synthetically created with GPT-4.



