In reading the comments here I only saw two references to Apple's local system L...

In reading the comments here I only saw two references to Apple's local system LLM. I wrote my own chat app using it and it effectively handles simple queries locally and otherwise sends queries to Apple's secure enclave servers that protect privacy, according to their privacy statement.

For tech people using Ollama and LM Studio for routine tasks works fairly well.

Some of the small Chinese models like Qwen really are good. In my workflows it is usually obvious to me if I want to use a local model or use something like Gemini 3 research with many built in tools. It takes work, but writing custom tools specific to my needs to use with LM Studio increases the fraction of use cases I can run locally.