Yes, you can create a writing-focused model through distillation, but it's tricky. *Completely removing* math/coding abilities is hard because a language model's knowledge is interconnected: the same logical reasoning that solves equations also helps structure coherent stories.
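To make the approach concrete, here's a minimal sketch of what a distillation loop might look like, assuming PyTorch and Hugging Face `transformers`; the model names and dataset are placeholders, and this is an illustrative recipe, not a tested one:

```python
# Minimal knowledge-distillation sketch (PyTorch + Hugging Face transformers).
# Model names below are placeholders, not real checkpoints.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("big-teacher-model")    # hypothetical
student = AutoModelForCausalLM.from_pretrained("small-student-model")  # hypothetical
tokenizer = AutoTokenizer.from_pretrained("big-teacher-model")
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
teacher.eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
temperature = 2.0  # softens the teacher's distribution for the student

def distill_step(batch_texts):
    inputs = tokenizer(batch_texts, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits
    student_logits = student(**inputs).logits

    # KL divergence between softened teacher and student token distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Run distill_step over batches drawn ONLY from a writing corpus, e.g.:
# distill_step(["Opening paragraph of a short story...", "..."])
```

The key design choice is the data, not the loss: feeding only writing samples is what steers the student toward prose. The teacher's math and coding knowledge simply isn't reinforced, which is why those skills fade rather than being surgically removed.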
Yes, code is a key ingredient. Open-Llama explicitly listed programming data as one of its seven training data components, though newer models like Llama 3.1 405B have shifted toward synthetic data for part of this. Code helps a model develop structured reasoning patterns, but it isn't the sole foundation: models mix it with general web text, books, and other sources.
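For a rough picture of what "one of seven components" means in practice, here's a toy mixture sampler; the source names and weights below are made up for illustration and are not Open-Llama's or Llama 3.1's actual proportions:

```python
# Toy pretraining data-mixture sampler: weighted choice across corpora.
# Source names and proportions are illustrative only, not any model's real recipe.
import random

MIXTURE = {
    "web_text": 0.60,  # general crawl data
    "books":    0.15,
    "code":     0.15,  # programming data, e.g. permissively licensed repos
    "academic": 0.05,
    "dialogue": 0.05,
}

def sample_source(rng=random):
    """Pick which corpus the next training document is drawn from."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

# Over a long run, ~15% of documents come from the code corpus: enough to
# shape structured reasoning without dominating the mixture.
counts = {s: 0 for s in MIXTURE}
for _ in range(100_000):
    counts[sample_source()] += 1
print(counts)
```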