
The authors mention that Jacobi decoding is equivalent to greedy autoregressive decoding, but in practice don't we often want the sampling temperature to be above zero to avoid repetitions and excessively generic responses?

I'm completely unfamiliar with this decoding strategy so maybe I'm just missing a simple way to account for that.
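
For what it's worth, here is my rough mental model of why greedy Jacobi decoding lands on the same output as greedy autoregressive decoding, as a toy sketch (next_token_logits is a stand-in for a real LM forward pass, and all the names here are made up):

    import numpy as np

    VOCAB = 16

    def next_token_logits(context):
        """Stand-in for one causal LM head call given a left context."""
        rng = np.random.default_rng(abs(hash(tuple(context))) % (2**32))
        return rng.standard_normal(VOCAB)

    def jacobi_decode_greedy(prompt, n_tokens, max_iters=64):
        """Greedy Jacobi decoding of an n-token block: start from an
        arbitrary draft and repeatedly replace every position with the
        argmax given the current tokens to its left, until nothing changes."""
        guess = [0] * n_tokens
        for _ in range(max_iters):
            new = [int(np.argmax(next_token_logits(list(prompt) + guess[:i])))
                   for i in range(n_tokens)]
            if new == guess:          # fixed point reached
                break
            guess = new
        return guess

    def greedy_ar_decode(prompt, n_tokens):
        """Plain greedy autoregressive decoding for comparison."""
        out = list(prompt)
        for _ in range(n_tokens):
            out.append(int(np.argmax(next_token_logits(out))))
        return out[len(prompt):]

    prompt = [3, 1, 4]
    print(jacobi_decode_greedy(prompt, 8) == greedy_ar_decode(prompt, 8))  # True

The fixed point is stable because each position only depends on the tokens to its left, so once a prefix matches the greedy answer it never changes. With temperature > 0 there is no single deterministic target to converge to, which I think is exactly the issue.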



Yes, this is a great question! We are actively working on supporting sampling strategies other than greedy sampling. In the context of CLLM training, instead of mapping to a static fixed point obtained from Jacobi decoding as the training objective, we would map to what we term a dynamic fixed point. You can keep an eye on our GitHub repo for progress.
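
Roughly, the current objective pushes every intermediate guess along a Jacobi trajectory toward the converged (static) fixed point. A simplified sketch, with a HuggingFace-style model interface assumed and the exact loss details omitted:

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, prompt_ids, trajectory, fixed_point):
        """Push the model's prediction at every intermediate Jacobi state
        toward the converged tokens (the static fixed point).

        trajectory : list of 1-D LongTensors, each a partially converged n-token guess
        fixed_point: 1-D LongTensor with the n converged tokens
        """
        loss = 0.0
        for guess in trajectory:
            inp = torch.cat([prompt_ids, guess])          # condition on the noisy guess
            logits = model(inp.unsqueeze(0)).logits[0]    # (seq_len, vocab), HF-style output assumed
            block = logits[len(prompt_ids) - 1 : -1]      # positions that predict the n-token block
            loss = loss + F.cross_entropy(block, fixed_point)
        return loss / len(trajectory)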


Agreed. It's straightforward to check that a token was the argmax, but it seems difficult to check that a token appeared with the probability you wanted it to. You could still do the fine-tuning step, I guess, where you train the trajectories to approach n-token completions with the statistics you want, but I can't see how to replace the "check for a fixed point" step. Maybe "check that the result was above a fixed likelihood threshold".
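
To make that concrete, the greedy check is an exact argmax match, while a likelihood-threshold version might look something like this (purely hypothetical, and the accepted tokens would no longer be distributed like true temperature samples):

    import numpy as np

    def accepted_greedy(logits_row, token):
        """Greedy case: the drafted token is 'right' iff it is the argmax."""
        return int(np.argmax(logits_row)) == token

    def accepted_threshold(logits_row, token, tau=0.1):
        """Hypothetical relaxation: accept the drafted token whenever the
        model assigns it probability at least tau. No longer an exact
        fixed-point check, and the output depends on the choice of tau."""
        probs = np.exp(logits_row - logits_row.max())
        probs /= probs.sum()
        return bool(probs[token] >= tau)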



