Small correct, I'm not just asking it to convert ARM NEON to SIMD, but for the function handling q6_K_q8_K, I asked it to reinvent a new approach (without giving it any prior examples). The reason I did that was because it failed writing this function 4 times so far.
And a bit of context here, I was doing this during my Sunday and the time budget is 2 days to finish.
I wanted to optimize wllama (wasm wrapper for llama.cpp that I maintain) to run deepseek distill 1.5B faster. Wllama is totally a weekend project and I can never spend more than 2 consecutive days on it.
Between 2 choices: (1) to take time to do it myself then maybe give up, or (2) try prompting LLM to do that and maybe give up (at worst, it just give me hallucinated answer), I choose the second option since I was quite sleepy.
So yeah, turns out it was a great success in the given context. Just does it job, saves my weekend.
Some of you may ask, why not trying ChatGPT or Claude in the first place? Well, short answer is: my input is too long, these platforms straight up refuse to give me the answer :)
Small correct, I'm not just asking it to convert ARM NEON to SIMD, but for the function handling q6_K_q8_K, I asked it to reinvent a new approach (without giving it any prior examples). The reason I did that was because it failed writing this function 4 times so far.
And a bit of context here, I was doing this during my Sunday and the time budget is 2 days to finish.
I wanted to optimize wllama (wasm wrapper for llama.cpp that I maintain) to run deepseek distill 1.5B faster. Wllama is totally a weekend project and I can never spend more than 2 consecutive days on it.
Between 2 choices: (1) to take time to do it myself then maybe give up, or (2) try prompting LLM to do that and maybe give up (at worst, it just give me hallucinated answer), I choose the second option since I was quite sleepy.
So yeah, turns out it was a great success in the given context. Just does it job, saves my weekend.
Some of you may ask, why not trying ChatGPT or Claude in the first place? Well, short answer is: my input is too long, these platforms straight up refuse to give me the answer :)