This note contains four papers for "historical perspective"... which would usually mean "no longer directly relevant", although I'm not sure that's really what the author means.
You might be looking for the author's "Understanding Large Language Models" post [1] instead.
Misspelling "Attention is All Your Need" twice in one paragraph makes for a rough start to the linked post.
and https://news.ycombinator.com/item?id=23649542 gives some context for the line "For instance, in 1991, which is about two-and-a-half decades before the original transformer paper above ("Attention Is All You Need")".
> Misspelling "Attention is All Your Need" twice in one paragraph makes for a rough start to the linked post.
100%! LOL. I was traveling and typing this on a mobile device. Must have been some weird autocorrect/autocomplete. Strange. And I didn't even notice. Thanks!
[1] https://magazine.sebastianraschka.com/p/understanding-large-...