This note contains four papers for "historical perspective"... which would usually mean "no longer directly relevant", although I'm not sure that's really what the author means.

You might be looking for the author's "Understanding Large Language Models" post [1] instead.

Misspelling "Attention is All Your Need" twice in one paragraph makes for a rough start to the linked post.

[1] https://magazine.sebastianraschka.com/p/understanding-large-...



> which would usually mean "no longer directly relevant"

Or it could mean the lesson from these papers has been assimilated and spread far and wide, so they are no longer "news". Pre-layernorm is one example; a quick sketch of what that means is below.
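For anyone who hasn't seen the two orderings side by side, here's a rough sketch of the difference (PyTorch, made-up dimensions, not taken from any particular paper's code):

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        # One transformer block; pre_ln=True gives the now-standard
        # pre-layernorm ordering, False the original post-layernorm one.
        def __init__(self, d_model=512, n_heads=8, pre_ln=True):
            super().__init__()
            self.pre_ln = pre_ln
            self.ln1 = nn.LayerNorm(d_model)
            self.ln2 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                     nn.Linear(4 * d_model, d_model))

        def forward(self, x):
            if self.pre_ln:
                # Pre-LN: normalize the sublayer input, keep the residual path clean.
                h = self.ln1(x)
                x = x + self.attn(h, h, h, need_weights=False)[0]
                x = x + self.mlp(self.ln2(x))
            else:
                # Post-LN (the original "Attention Is All You Need" ordering):
                # normalize after each residual addition.
                x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
                x = self.ln2(x + self.mlp(x))
            return x

    x = torch.randn(1, 16, 512)          # (batch, seq_len, d_model)
    print(Block(pre_ln=True)(x).shape)   # torch.Size([1, 16, 512])

The pre-LN variant trains stably in deep stacks without the careful learning-rate warmup the post-LN original needs, which is roughly the lesson that got assimilated.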


Also, one of these papers is from Schmidhuber.

And https://news.ycombinator.com/item?id=23649542 gives some context for the line "For instance, in 1991, which is about two-and-a-half decades before the original transformer paper above ('Attention Is All You Need')".


> Misspelling "Attention is All Your Need" twice in one paragraph makes for a rough start to the linked post.

100%! LOL. I was traveling and typing this on a mobile device. Must have been some weird autocorrect/autocomplete. Strange. And I didn't even notice. Thanks!



