
> The R1-Zero training process is capable of creating its own internal domain specific language (“DSL”) in token space via RL optimization.

Um, what’s that now? Really?



That is a slight exaggeration and extrapolation on the author's part. What happened was that RL training led to emergent behavior in R1-Zero (chain-of-thought and reflection) without it being prompted or trained for explicitly. I don't see what is so domain-specific about that, though.
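
For what it's worth, the reward in that setup reportedly scored only the output format and the final answer, never the reasoning tokens in between, which is why the chain-of-thought counts as emergent. A rough sketch of what such a rule-based reward might look like (the tags, weights, and names are my illustrative guesses, not DeepSeek's actual code):

    import re

    def reward(completion: str, ground_truth: str) -> float:
        """Rule-based reward in the spirit of R1-Zero: score only the
        output format and the final answer, never the reasoning itself."""
        r = 0.0
        # Format reward: reasoning must be wrapped in <think> tags.
        if re.search(r"<think>.+?</think>", completion, re.DOTALL):
            r += 0.5
        # Accuracy reward: the final boxed answer must match the reference.
        m = re.search(r"\\boxed\{(.+?)\}", completion)
        if m and m.group(1).strip() == ground_truth.strip():
            r += 1.0
        return r

Nothing in a reward like that pushes the model toward any particular reasoning style, so whatever token patterns help it earn the accuracy reward are free to emerge on their own.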


Yeah, if I understand correctly, the AI will create its own internal reasoning language through RL. In R1-Zero it was already a strange mix of languages; they corrected that for R1 to make the thinking useful for humans.


Not trying to be ironic, but it would be interesting to see what the passage below would look like in that strange mixed form:

"If the model's actions involve generating tokens (like in language models), then optimizing these token outputs to maximize reward could lead the model to develop a consistent, efficient way of using tokens that's specific to the problem domain. This might look like a DSL because the tokens are used in a structured, perhaps abbreviated or symbolic way that's efficient for the task, not necessarily human-readable but effective for the model's internal processing."



