There's a fairly low ceiling on max context tokens no matter the size of the model. Your hobby/small codebase may work, but for large codebases you will need RAG, and right now it's not great at absorbing a codebase and being able to answer questions about it.
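To make the RAG point concrete: a minimal sketch of what "retrieval over a codebase" looks like at its simplest, using plain term-frequency cosine similarity instead of a real embedding model (all names here are made up for illustration). Real setups use vector embeddings and smarter chunking, which is exactly where the imperfection shows up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # crude tokenizer: identifiers and words, lowercased
    return re.findall(r"[a-zA-Z_]\w*", text.lower())

def cosine(a, b):
    # cosine similarity between two term-count vectors
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(chunks, query, k=2):
    # rank code chunks by similarity to the query, return top k
    qv = Counter(tokenize(query))
    return sorted(chunks, key=lambda c: cosine(Counter(tokenize(c)), qv), reverse=True)[:k]

# toy "codebase" chunks -- in practice these would be parsed functions/classes
chunks = [
    "def parse_config(path): load settings from a config file",
    "class UserRepository: handles database access for users",
    "def render_template(name, ctx): fill an HTML template",
]
print(retrieve(chunks, "where is the user database code?", k=1))
```

The weakness is visible even here: retrieval only surfaces chunks that lexically (or, with embeddings, semantically) resemble the question, so cross-file reasoning over a large codebase still depends on the right chunks landing in the limited context window.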
Thank you, I experimented in that direction as well.
But for my actual codebase, which is sadly not 100% clean code, it would take a lot of work to provide examples with enough of the right context for it to work well.
While working I jump around a lot between contexts and files. Where an LLM will hopefully be helpful one day is refactoring it all. But currently I would need to spend more time setting up context than solving the problem myself.
With a limited scope, like in your example, I do use LLMs regularly.