Yeah, I'm broadly aware and have seen a few of the papers, though I definitely don't try to track the state of the art here closely.

My impression and experience from trying low-bit quants (which could easily be outdated by now) is that you are, or at least were, better off with a smaller model at a less aggressive quantization, provided you have access to a smaller model that is otherwise equally well trained. If that's changed I'd be interested to hear about it, but I definitely don't want to make work for you digging up papers.