
Yes, you can create a writing-focused model through distillation, but it's tricky. *Complete removal* of math/coding ability is hard because a language model's knowledge is interconnected: the logical reasoning that helps solve equations also helps structure coherent stories.
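
For reference, the standard distillation setup (in the style of Hinton et al.) trains a smaller student on the teacher's softened output distribution rather than on hard labels. A minimal PyTorch sketch; the function name and temperature value are illustrative, not any particular lab's recipe:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # Soften both distributions with a shared temperature
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
        # KL divergence from teacher to student, scaled by T^2 so
        # gradient magnitudes stay comparable across temperatures
        return F.kl_div(student_log_probs, soft_targets,
                        reduction="batchmean") * temperature ** 2

To bias the student toward writing you'd weight the distillation corpus toward prose, but as noted above, math/code ability tends to survive because it rides on the same underlying representations.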


I understood that at least some of these big models (Llama?) are basically bootstrapped with code. Is there truth to that?


Yes, code is a key training component. Open-Llama explicitly used programming data as one of seven training components, though newer models like Llama 3.1 405B have shifted toward synthetic data. Code helps models develop structured reasoning patterns, but it isn't the sole foundation: it's combined with general web text, books, and other sources.
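
To make the "one component among several" point concrete: pretraining pipelines typically draw each batch from a weighted mix of corpora. A toy sketch with made-up weights (not Llama's or Open-Llama's actual mixture):

    import random

    # Hypothetical mixture weights -- purely illustrative
    MIXTURE = {
        "web_text": 0.60,
        "books": 0.15,
        "code": 0.15,
        "synthetic": 0.10,
    }

    def sample_source(rng=random):
        """Pick the corpus to draw the next training batch from."""
        sources, weights = zip(*MIXTURE.items())
        return rng.choices(sources, weights=weights, k=1)[0]

Dialing the "code" weight up or down is reportedly one of the main levers labs use to trade reasoning ability against other capabilities.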



