
The authors mention that Jacobi decoding is equivalent to greedy autoregressive decoding, but in practice don't we often want the sampling temperature to be above zero to avoid repetitions and excessively generic responses?

I'm completely unfamiliar with this decoding strategy so maybe I'm just missing a simple way to account for that.
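
For what it's worth, here is my rough mental model of why greedy Jacobi decoding lands on the same output as greedy autoregressive decoding, as a toy sketch (next_token_logits is a stand-in for a real LM forward pass, and all the names here are made up):

    import numpy as np

    VOCAB = 16

    def next_token_logits(context):
        """Stand-in for one causal LM head call given a left context."""
        rng = np.random.default_rng(abs(hash(tuple(context))) % (2**32))
        return rng.standard_normal(VOCAB)

    def jacobi_decode_greedy(prompt, n_tokens, max_iters=64):
        """Greedy Jacobi decoding of an n-token block: start from an
        arbitrary draft and repeatedly replace every position with the
        argmax given the current tokens to its left, until nothing changes."""
        guess = [0] * n_tokens
        for _ in range(max_iters):
            new = [int(np.argmax(next_token_logits(list(prompt) + guess[:i])))
                   for i in range(n_tokens)]
            if new == guess:          # fixed point reached
                break
            guess = new
        return guess

    def greedy_ar_decode(prompt, n_tokens):
        """Plain greedy autoregressive decoding for comparison."""
        out = list(prompt)
        for _ in range(n_tokens):
            out.append(int(np.argmax(next_token_logits(out))))
        return out[len(prompt):]

    prompt = [3, 1, 4]
    print(jacobi_decode_greedy(prompt, 8) == greedy_ar_decode(prompt, 8))  # True

The fixed point is stable because each position only depends on the tokens to its left, so once a prefix matches the greedy answer it never changes. With temperature > 0 there is no single deterministic target to converge to, which I think is exactly the issue.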



Yes, this is a great question! We are actively working on supporting sampling strategies other than greedy sampling. In the context of CLLM training, instead of mapping to a static fixed point obtained from Jacobi decoding as the training objective, we would map to what we term a dynamic fixed point. You can keep an eye on our GitHub repo for progress.
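
Roughly, the current objective pushes every intermediate guess along a Jacobi trajectory toward the converged (static) fixed point. A simplified sketch, with a HuggingFace-style model interface assumed and the exact loss details omitted:

    import torch
    import torch.nn.functional as F

    def consistency_loss(model, prompt_ids, trajectory, fixed_point):
        """Push the model's prediction at every intermediate Jacobi state
        toward the converged tokens (the static fixed point).

        trajectory : list of 1-D LongTensors, each a partially converged n-token guess
        fixed_point: 1-D LongTensor with the n converged tokens
        """
        loss = 0.0
        for guess in trajectory:
            inp = torch.cat([prompt_ids, guess])          # condition on the noisy guess
            logits = model(inp.unsqueeze(0)).logits[0]    # (seq_len, vocab), HF-style output assumed
            block = logits[len(prompt_ids) - 1 : -1]      # positions that predict the n-token block
            loss = loss + F.cross_entropy(block, fixed_point)
        return loss / len(trajectory)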


Agreed. It's straightforward to check that a token was the argmax, but it seems difficult to check that a token appeared with the probability you wanted it to. You could still do the fine-tuning step, I guess, where you train the trajectories to approach n-token completions with the statistics you want, but I can't see how to replace the "check for a fixed point" step. Maybe "check that the result was above a fixed likelihood threshold".
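
To make that concrete, the greedy check is an exact argmax match, while a likelihood-threshold version might look something like this (purely hypothetical, and the accepted tokens would no longer be distributed like true temperature samples):

    import numpy as np

    def accepted_greedy(logits_row, token):
        """Greedy case: the drafted token is 'right' iff it is the argmax."""
        return int(np.argmax(logits_row)) == token

    def accepted_threshold(logits_row, token, tau=0.1):
        """Hypothetical relaxation: accept the drafted token whenever the
        model assigns it probability at least tau. No longer an exact
        fixed-point check, and the output depends on the choice of tau."""
        probs = np.exp(logits_row - logits_row.max())
        probs /= probs.sum()
        return bool(probs[token] >= tau)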



