These things work well on the extremely narrow tasks we give them. Even if we sidestep the question of whether LLMs are actually on the path to AGI, imagine the amount of computing and electrical power that current methods and hardware would need to respond to and process all the input a person handles at every moment of the day. Somewhere between current inputs and the full load the brain handles may lie “AGI”, but it’s not clear anything like that is on the near horizon, if only because of computing power constraints.
Way back, Perl got off the ground largely because, in contrast to the C compilers of the era, code written on one Unix usually ran unmodified on the others. In my first jobs, where we had heterogeneous mixes of commercial Unixes, this was unbeatable. Writing it also felt like writing a higher-level shell, which made it easy to learn for systems people, who were mostly the only ones who cared about running things on multiple platforms anyway.
As environments became more homogeneous, and as other languages picked up that “one weird trick” of cross-platform support, the shortcomings of both Perl and its community came to the fore.
My guess is that you're letting the context get polluted with all the stuff it reads in your repo. Try using subagents to keep the top-level context clean. It mostly only starts to forget rules when the context is so full of other material that the rules make up a small fraction of it.
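In case it's useful as a starting point (this assumes Claude Code, which defines subagents as markdown files under .claude/agents/; the name, tools, and prompt here are purely illustrative):

    ---
    name: repo-reader
    description: Explores repo files and reports back short summaries
    tools: Read, Grep, Glob
    ---
    Summarize only what is relevant to the task you were given.
    Never paste whole files back; quote at most a few lines.

The point is that the subagent burns its own context window reading files, and only the summary returns to the top-level conversation, so your rules stay a large fraction of what the main agent sees.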
To a certain extent, if you're doing that much cleanup work, you're probably still not using it optimally. We, for example, asked the LLM to analyze the codebase for the common patterns we use and to write a document for AI agents to do better work on the codebase. I edited it and had it take a couple of passes. We then provide that doc as part of the requirements we feed to it. That made a big difference. We wrote specific instructions on how to structure tests, where to find common utilities, etc. We wrote pre-commit hooks to help double-check its work. Every time we see it doing something it shouldn't, that goes into the instructions. Now it mostly does 85-90% quality work. Yes, it requires human review and some small changes. Not sure how the thing it built works? Before reviewing the code, have it draw a Mermaid sequence diagram.
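To make that last tip concrete, this is the kind of diagram you'd ask for; the participants and calls here are invented for illustration:

    sequenceDiagram
        participant Client
        participant API
        participant DB
        Client->>API: POST /orders
        API->>DB: INSERT order
        DB-->>API: ok
        API-->>Client: 201 Created

Reading the diagram first tells you what the code claims to do, which makes any divergence much easier to spot during review.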
We found it mostly starts to abandon instructions when the context gets too polluted. Subagents really help address that by not loading the top-level context with the contents of all your files.
Another tip: give it feedback as PR comments and have it read them with the gh CLI. This is often faster than hand-editing the code yourself, and while it cleans up its own work you can be doing something else.
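For anyone who hasn't tried it, both of these pull the comments into the agent's context (the PR number and repo slug are placeholders):

    # top-level conversation comments on the PR
    gh pr view 42 --comments

    # inline review comments, via the REST API
    gh api repos/OWNER/REPO/pulls/42/comments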
Interesting, I actually do have a coding-guidelines.md file for that purpose, but I hadn't thought of having the LLM either generate or maintain it; good idea! :-)
As AI tools become more dominant, businesses are going to want their documents to be fully read by their AI in whatever format they are in. I wouldn’t be surprised to see a fight over all of this brewing in the next couple of years.
Author here. Nearly 30 years as a UNIX sysadmin, so yes, I know POSIX IPC. It's not supported on the BEAM in any sensible way, and even if it were, it would almost certainly violate the failure-domain management that makes Erlang/Elixir apps and OTP so robust. Not really an option worth considering.
As for gRPC: we don't use it anywhere else, so we'd have needed all of the tooling on two stacks, plus calling patterns without native error handling that we'd have had to implement ourselves. Instead we have a thin wrapper that lets the Elixir app call Go like native Elixir code. gRPC would have been worse in almost every way.
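The comment doesn't show the wrapper, but a common way to get "call Go like native Elixir code" on the BEAM is an Erlang Port speaking length-prefixed packets over stdin/stdout. A minimal sketch of the Go side, assuming {packet, 4} framing (the real wrapper may well work differently):

    package main

    import (
        "encoding/binary"
        "io"
        "os"
    )

    // Read 4-byte big-endian length-prefixed requests from stdin and
    // write replies with the same framing, matching Port {packet, 4}.
    func main() {
        for {
            var hdr [4]byte
            if _, err := io.ReadFull(os.Stdin, hdr[:]); err != nil {
                return // Elixir side closed the port; exit cleanly
            }
            req := make([]byte, binary.BigEndian.Uint32(hdr[:]))
            if _, err := io.ReadFull(os.Stdin, req); err != nil {
                return
            }
            reply := handle(req)
            binary.BigEndian.PutUint32(hdr[:], uint32(len(reply)))
            os.Stdout.Write(hdr[:])
            os.Stdout.Write(reply)
        }
    }

    // handle is a hypothetical stand-in for the app-specific dispatch.
    func handle(req []byte) []byte { return req }

On the Elixir side, Port.open({:spawn_executable, path}, [:binary, {:packet, 4}]) behind a small GenServer gives you synchronous-looking calls while keeping the crash isolation OTP expects.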