Yes, you can create a writing-focused model through distillation, but it's tricky. *Completely removing* math/coding abilities is hard because a language model's knowledge is interconnected: the same logical reasoning that solves equations also helps structure coherent stories.
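To make the approach concrete, here's a minimal sketch of what a distillation loop might look like, assuming PyTorch and Hugging Face `transformers`; the model names and dataset are placeholders, and this is an illustrative recipe, not a tested one:

```python
# Minimal knowledge-distillation sketch (PyTorch + Hugging Face transformers).
# Model names below are placeholders, not real checkpoints.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher = AutoModelForCausalLM.from_pretrained("big-teacher-model")    # hypothetical
student = AutoModelForCausalLM.from_pretrained("small-student-model")  # hypothetical
tokenizer = AutoTokenizer.from_pretrained("big-teacher-model")
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
teacher.eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
temperature = 2.0  # softens the teacher's distribution for the student

def distill_step(batch_texts):
    inputs = tokenizer(batch_texts, return_tensors="pt",
                       padding=True, truncation=True)
    with torch.no_grad():
        teacher_logits = teacher(**inputs).logits
    student_logits = student(**inputs).logits

    # KL divergence between softened teacher and student token distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Run distill_step over batches drawn ONLY from a writing corpus, e.g.:
# distill_step(["Opening paragraph of a short story...", "..."])
```

The key design choice is the data, not the loss: feeding only writing samples is what steers the student toward prose. The teacher's math and coding knowledge simply isn't reinforced, which is why those skills fade rather than being surgically removed.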
Yes, code is a key ingredient. Open-Llama explicitly listed programming data as one of its seven training data components, though newer models like Llama 3.1 405B have shifted toward synthetic data for part of this. Code helps a model develop structured reasoning patterns, but it isn't the sole foundation: models mix it with general web text, books, and other sources.
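For a rough picture of what "one of seven components" means in practice, here's a toy mixture sampler; the source names and weights below are made up for illustration and are not Open-Llama's or Llama 3.1's actual proportions:

```python
# Toy pretraining data-mixture sampler: weighted choice across corpora.
# Source names and proportions are illustrative only, not any model's real recipe.
import random

MIXTURE = {
    "web_text": 0.60,  # general crawl data
    "books":    0.15,
    "code":     0.15,  # programming data, e.g. permissively licensed repos
    "academic": 0.05,
    "dialogue": 0.05,
}

def sample_source(rng=random):
    """Pick which corpus the next training document is drawn from."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

# Over a long run, ~15% of documents come from the code corpus: enough to
# shape structured reasoning without dominating the mixture.
counts = {s: 0 for s in MIXTURE}
for _ in range(100_000):
    counts[sample_source()] += 1
print(counts)
```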