
A first-order Markov-chain text generator is not complex at all; it's a one-line shell script, reformatted here onto three lines for clarity:

    perl -ane 'push@{$m{$a}},$a=$_ for@F;END{
      print$a=$m{$a}[rand@{$m{$a}}]," "for 1..100_000
    }' <<<'fish fish for fish.'
Given the King James Bible instead of "fish fish for fish." as input, this program produces output like this:

25:13 As the hands upon his disciple; but make great matter wisely in their calamity; 1:14 Sanctify unto the arches thereof unto them: 28:14 Happy is our judges ruled, that believe. 19:36 Thus saith the first being one another's feet.

This leans pretty hard on Perl DWIM; a Python version is many times longer:

    import random, sys

    model, last, word = {None: []}, None, None
    for word in (word for line in sys.stdin for word in line.split()):
        model.setdefault(last, []).append(last := word)

    sys.stdout.writelines(str(word := random.choice(model.get(word) or words)) + " "
                          for words in [list(model.keys())] for i in range(100_000))
(If you write code like that at a job, you might get fired. I would reject your pull request.)
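
A more conventional sketch of the same first-order chain in Python might look like the following; the variable names and the restart-on-dead-end behaviour are my own choices here, not a faithful translation of the one-liner:

    import random
    import sys

    # Build a first-order model: each word maps to the list of words that
    # followed it in the input. Duplicates are kept, so choosing uniformly
    # at random reproduces the observed transition frequencies.
    model = {}
    prev = None
    for line in sys.stdin:
        for word in line.split():
            model.setdefault(prev, []).append(word)
            prev = word

    # Walk the chain for 100,000 words; if we hit a word with no recorded
    # successor (the last word of the input), restart from a random state.
    state = None
    output = []
    for _ in range(100_000):
        successors = model.get(state)
        if not successors:
            state = random.choice(list(model.keys()))
            successors = model[state]
        state = random.choice(successors)
        output.append(state)

    print(" ".join(output))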

It's probably worth mentioning:

- Markov-chain states don't have to be words. For text modeling, a common thing to do is to use N-grams (of either words or characters) instead of single words for your states (see the sketch after this list).

- It's good to assign some nonzero probability to transitions that haven't occurred in the training set; this is especially important if you use a larger universe of states, because each state will occur fewer times in the training set. "Laplacian smoothing" is one way to do this (also shown in the sketch after the list).

- Usually (and in my code examples above) the probability distribution of transitions we take in our text generator is the probability distribution we inferred from the input. Potter's approach of multiplying his next-state-occupancy vector Ms⃗ by a random diagonal matrix R and then taking the index of the highest value in the resulting vector produces somewhat similar results, but they are not the same; for example

    sum(random.random() * 0.75 > random.random() * 0.25 for i in range(1_000_000))
is roughly 833000, not roughly 750000. It's 5× as common instead of 3×. But I imagine that it still produces perfectly cromulent text.
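
To make the first two bullets above concrete, here's a rough sketch of a word-bigram model with add-one (Laplace) smoothing; the helper names and the toy corpus are made up for illustration:

    import random
    from collections import Counter, defaultdict

    def train_bigram_model(words):
        """States are (word, word) pairs; counts[state][next] counts transitions."""
        counts = defaultdict(Counter)
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[a, b][c] += 1
        return counts

    def sample_next(counts, state, vocab, alpha=1.0):
        """Add-alpha smoothing: every word in the vocabulary gets weight
        count + alpha, so transitions never seen in training still have
        nonzero probability."""
        seen = counts.get(state, Counter())
        weights = [seen[w] + alpha for w in vocab]
        return random.choices(vocab, weights=weights)[0]

    words = "fish fish for fish .".split()   # toy corpus
    counts = train_bigram_model(words)
    vocab = sorted(set(words))

    state = tuple(words[:2])
    generated = list(state)
    for _ in range(20):
        nxt = sample_next(counts, state, vocab)
        generated.append(nxt)
        state = (state[1], nxt)
    print(" ".join(generated))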

Interestingly, modern compilers derive from Chomsky's theory of linguistics, which he formulated while working on an AI project in the 01950s in order to demolish the dominant theory of psychology at the time, behaviorism. Behaviorism essentially held that human minds were Markov chains, and Chomsky showed that Markov chains couldn't produce context-free languages.



This is actually sick! I have to admit, my implementation is far more (likely unnecessarily) complex.


Thanks! I think! Your implementation might be easier to understand and modify, too, though perhaps not because it's unnecessarily complex.


I had a brush with MCMC decades ago, and your code is truly beautiful


MCMC is considerably more complicated, though!
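
For contrast, even the simplest MCMC algorithm, random-walk Metropolis, needs a target density, a proposal distribution, and an accept/reject step. A toy sketch, with an arbitrary standard-normal target just for illustration:

    import math
    import random

    def metropolis(log_target, x0, steps, scale=1.0):
        """Random-walk Metropolis: propose x' = x + noise, accept with
        probability min(1, target(x') / target(x))."""
        x, lp = x0, log_target(x0)
        samples = []
        for _ in range(steps):
            proposal = x + random.gauss(0.0, scale)
            lp_new = log_target(proposal)
            if lp_new >= lp or random.random() < math.exp(lp_new - lp):
                x, lp = proposal, lp_new
            samples.append(x)
        return samples

    # Unnormalized log density of a standard normal; the mean of the
    # samples should come out near 0.
    samples = metropolis(lambda x: -0.5 * x * x, x0=0.0, steps=10_000)
    print(sum(samples) / len(samples))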


What????????? Given that, how was it not obvious for decades that this evolves into something like the current LLMs? I gave it only the first few paragraphs of a random man page and it already passes for a human trying to form nonsense sentences, with mostly (but not always) correct grammar. LLMs now feel not very technically advanced. The code isolation in their interface seems more impressive than this.


If there were an array programming language with automatic differentiation and a focus on writing neural networks, I'm sure you could write a transformer-based LLM, training code included, on a napkin.


I think reverse-mode automatic differentiation is significantly harder to implement than a string-indexed hash table and delimiter splitting, but maybe that's not important when those are such a small part of an interpreter for Perl or even Awk?

How big is a Transformer in TensorFlow in Python?


(I guess I should have pointed out that forward-mode automatic differentiation was only about 100–150 lines of code when I implemented it in http://canonical.org/~kragen/sw/81hacks/autodiffgraph/, depending on where you draw the lines. But gradient descent isn't practical with forward-mode autodiff unless it's with a very small number of independent variables. Also even 100–150 lines of code is significantly bigger than a hash table and delimiter splitting.)
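
For a sense of scale, the dual-number core of forward-mode autodiff fits in a couple dozen lines of Python; this is a generic sketch, not the code at the link above:

    class Dual:
        """Dual number a + b*eps with eps**2 == 0; the eps coefficient
        carries the derivative alongside the value."""
        def __init__(self, value, deriv=0.0):
            self.value, self.deriv = value, deriv

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value + other.value, self.deriv + other.deriv)
        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.value * other.value,
                        self.value * other.deriv + self.deriv * other.value)
        __rmul__ = __mul__

    def derivative(f, x):
        """Evaluate f at x + eps and read off the eps coefficient."""
        return f(Dual(x, 1.0)).deriv

    # d/dx (x**3 + 2x) at x = 3 is 3*9 + 2 = 29
    print(derivative(lambda x: x * x * x + 2 * x, 3.0))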


If you only give it a few paragraphs, most words only occur once, so it ends up copying whole phrases from the input until it hits a common word. Computers are very good at sounding like human writing when they're just reproducing things that humans in fact did write. LLMs mostly aren't doing that, though.
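
You can check this directly by counting how many states have only one distinct observed successor. Assuming a model dict shaped like the one in the Python example above (word -> list of observed next words):

    singletons = sum(1 for nexts in model.values() if len(set(nexts)) == 1)
    print(f"{singletons}/{len(model)} states have exactly one distinct successor")

With only a few paragraphs of input that ratio is close to 1, which is why the output reads like spliced-together quotations.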


You're right, I was a bit too enthusiastic. I've fed it random C code (both normal and already preprocessed) and it produces random garbage, not remotely valid as syntax. Also, it seems to really like emitting parts of copyright messages, since those occur verbatim everywhere.

Happy to be part of today's 10,000.


Welcome!



