Not trying to be ironic but it would be interesting to see what this below would...

Not trying to be ironic but it would be interesting to see what this below would look like in the strange mix form:

"If the model's actions involve generating tokens (like in language models), then optimizing these token outputs to maximize reward could lead the model to develop a consistent, efficient way of using tokens that's specific to the problem domain. This might look like a DSL because the tokens are used in a structured, perhaps abbreviated or symbolic way that's efficient for the task, not necessarily human-readable but effective for the model's internal processing."