
> The R1-Zero training process is capable of creating its own internal domain specific language (“DSL”) in token space via RL optimization.

Um, what’s that now? Really?



That is a slight exaggeration and extrapolation on the author's part. What happened was that RL training led to emergent behavior in R1-Zero (chain-of-thought and reflection) without it being prompted or trained for explicitly. I don't see what is so domain-specific about that, though.
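
For what it's worth, the reward in that setup reportedly scored only the output format and the final answer, never the reasoning tokens in between, which is why the chain-of-thought counts as emergent. A rough sketch of what such a rule-based reward might look like (the tags, weights, and names are my illustrative guesses, not DeepSeek's actual code):

    import re

    def reward(completion: str, ground_truth: str) -> float:
        """Rule-based reward in the spirit of R1-Zero: score only the
        output format and the final answer, never the reasoning itself."""
        r = 0.0
        # Format reward: reasoning must be wrapped in <think> tags.
        if re.search(r"<think>.+?</think>", completion, re.DOTALL):
            r += 0.5
        # Accuracy reward: the final boxed answer must match the reference.
        m = re.search(r"\\boxed\{(.+?)\}", completion)
        if m and m.group(1).strip() == ground_truth.strip():
            r += 1.0
        return r

Nothing in a reward like that pushes the model toward any particular reasoning style, so whatever token patterns help it earn the accuracy reward are free to emerge on their own.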


Yeah, if I understand correctly, the AI will create its own internal reasoning language through RL. In R1-Zero it was already a strange mix of languages; they corrected that for R1 to make the thinking useful for humans.


Not trying to be ironic, but it would be interesting to see what the passage below would look like in that strange mixed form:

"If the model's actions involve generating tokens (like in language models), then optimizing these token outputs to maximize reward could lead the model to develop a consistent, efficient way of using tokens that's specific to the problem domain. This might look like a DSL because the tokens are used in a structured, perhaps abbreviated or symbolic way that's efficient for the task, not necessarily human-readable but effective for the model's internal processing."



