I had to do that too, in Norway. Writing C++ code with pen and paper and being told even trivial syntax errors like missing semicolons would be penalised was not fun.
This was 30 years ago, though - no idea what it is like now. It didn't feel very meaningful even then.
But there's a vast chasm between that and letting people use AI in an exam setting. Some middle ground would be nice.
You should worry about code quality, but you should also worry about the return on investment.
That includes understanding risk management and knowing what the risks and costs are of failures vs. the costs of delivering higher quality.
Engineering is about making the right tradeoffs given the constraints set, not about building the best possible product separate from the constraints.
Sometimes those constraints require extreme quality, because they include things like "this should never, ever fail", but most of the time they do not.
> A good analogy here is programming in assembler. Manually crafting programs at the machine code level was very common when I got my first computer in the 1980s. Especially for games. By the late 90s that had mostly disappeared.
Indeed, a lot of us looked with suspicion and disdain at people that used those primitive compilers that generated awful, slow code. I once spent ages hand-optimizing a component that had been written in C, and took great pleasure in the fact I could delete about every other line of disassembly...
When I wrote my first compiler a couple of years later, it was written in assembler at first, and supported inline assembler so I could gradually convert the source and bootstrap it that way.
Because I couldn't imagine writing it in C, given the awful code the C compilers I had available generated (and how slow they were)...
These days most programmers don't know assembler, and increasingly don't know languages as low level as C either.
And the world didn't fall apart.
People will insist it is necessary to know the languages that will slowly be eaten away by LLMs, just like my generation argued it was absolutely necessary to know assembler if you wanted to be able to develop anything of substance.
I agree with you that people should understand how things work, though, even if they don't know them well enough to build them from scratch.
If you're using it 24h/day you probably will run into it unless you're very careful about managing context and/or the requests are punctuated by long-running tool use (e.g. time-consuming test suites).
I'm on the $200/month plan, and I do have Claude running unattended for hours at a time. I have hit the weekly limits at times of particularly aggressive use (multiple sessions in parallel for hours at a time), but since it's involved more than one session at a time, I'm not really sure how close I got to the equivalent of one session running 24/7.
I don't know about something this complex, but right this moment I have something similar running in Claude Code in another window, and it is very helpful even with a much simpler setup:
If you have these agents do everything at the "top level" they lose track. The moment you introduce sub-agents, you can have the top level run in a tight loop of "tell agent X to do the next task; tell agent Y to review the work; repeat" or similar (add as many agents as makes sense), and it will take a long time to fill up the context. The agents get fresh context, and you get to manage explicitly what information is allowed to flow between them.

It also tends to make it a lot easier to introduce quality gates - e.g. your testing agent and your code review agent will not decide they can skip testing because they "know" they implemented things correctly, since there is no memory of that in their context.
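To make that concrete, here is roughly what one of those quality gates looks like as a Claude Code sub-agent definition - this would live in .claude/agents/reviewer.md (the front-matter fields are the standard ones; the prompt wording is just illustrative, adjust it for your project):

    ---
    name: reviewer
    description: Reviews and tests the changes made for a single task
    tools: Read, Grep, Bash
    ---
    You are given one task description and a summary of the changes
    made for it. Run the test suite yourself before approving anything;
    you have no memory of the implementation, so do not trust claims
    that the tests already passed. Reply with APPROVED, or with a list
    of concrete problems to fix.

The top level then just loops: hand the next task to the implementer agent, hand its summary to this reviewer, and only tick the task off once it approves.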
Humans seem to be similar. If a real product designer dove into all the technical details and code of a product, they would likely lose track of at least some of the vision behind what the product is actually supposed to be.
It isn't "just" sub agents, but you can achieve most of this just with a few agents that take on generic roles, and a skill or command that just tells claude to orchestrate those agents, and a CLAUDE.md that tells it how to maintain plans and task lists, and how to allow the agents to communicate their progress.
It isn't all that hard to bootstrap. It is, however, something most people don't think about and shouldn't have to learn to cobble together themselves, and I'm sure there will be advantages to more sophisticated implementations.
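For reference, the orchestration part of such a CLAUDE.md can be as short as something like this (the wording and the PLAN.md convention are mine, nothing built in):

    ## Orchestration
    - Keep the plan in PLAN.md as a checklist, one task per item.
    - Never implement or review anything yourself. Dispatch the
      implementer agent on the next unchecked task, then the reviewer
      agent on its output.
    - Only check a task off when the reviewer approves; otherwise pass
      the reviewer's notes back to the implementer and retry.
    - Give each agent only its task description and the notes it needs,
      never the whole history.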
Right, but the model is still: you tell the AI what to do; this is just the AI telling other AIs what to do. The context makes a huge difference because it has to be able to run autonomously. It is possible to do this with the SDK, and the workflow is completely different.
It is very difficult to manage task lists in context. Have you actually tried to do this? i.e. not within a Claude Code chat instance but by one-shot prompting. It is possible that they have worked out some way to do this, but with tens of tasks, merge conflicts, a prompt you are running over months, etc.: at best, it doesn't work; at worst, you are burning a lot of tokens for nothing.
It is hard to bootstrap because this isn't how Claude Code works. If you are just using OpenRouter, it is also not easy because, after setting up tools/rebuilding Claude Code, it is very challenging to set up an environment so the AI can work effectively, errors can be returned, questions can be returned, etc. Afaik, this is basically what Aider does...it is not easy, and it is especially not easy in Claude Code, which has a lot of binding choices baked in from the business strategy Anthropic picked.
> Have you actually tried to do this? i.e. not within a Claude Code chat instance but by one-shot prompting.
You ask if I've tried to do this, and then set constraints that are completely different to what I described.
I have done what I described. Several times for different projects. I have a setup like that running right now in a different window.
> It is hard to bootstrap because this isn't how Claude Code works.
It is how Claude Code works when you give it a number of sub-agents with rules for managing files that effectively work like task queues, or skills/MCP servers to interact with communication tools.
> it is not easy
It is not easy to do in a generic way that works without tweaks for every project and every user. It is reasonably easy to do for specific teams where you can adjust it to the desired workflows.
I can tell you based on your description that you did not do this. Subagents are completely different and cannot be used in this way.
No, it isn't how Claude Code works, because Claude Code is designed to work with limited task queues, and that is not what this feature is. Again, I would suggest you try actually building something like this. Why do you think Anthropic are doing this? Because they just don't understand anything about their product?
No, it doesn't work within that context. Again: sharing context between subagents, single instance running for months...I am not even sure why someone would think this could work. The constraints that I set are the ones that you require to build this...because I have done this. You are talking about having some CLAUDE.md files like you have invented the wheel, lol. HN is great.
> I can tell you based on your description that you did not do this. Subagents are completely different and cannot be used in this way.
And yet I have used them exactly in the way I described. That you assume they can't just demonstrates that you haven't tried very hard.
> No, it isn't how Claude Code works because Claude Code is designed to work with limited task queues, this is not what this feature is.
Claude Code allows your setup to execute arbitrary code whose output gets injected into context. The entire point is that you don't need to rely on the built-in capabilities of Claude Code to do any of this.
> No, it doesn't work within that context. Again: sharing context between subagents, single instance running for months...I am not even sure why someone would think this could work.
I know what I described works because I am doing it. You can achieve what I described in a variety of ways: using skills to tell the agents how to access a shared communications channel; using MCP servers; or just using CLAUDE.md to describe how to use files as a shared communications channel.
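The files-plus-CLAUDE.md version is the simplest of the three. The convention can be as dumb as this (the comms/ layout is just a sketch of what I mean, not anything built in):

    - Each agent has an inbox file at comms/<agent-name>.md.
    - Before starting a task, read your inbox, then empty it.
    - When you finish, append a note to the next agent's inbox: what
      you did, what you verified, and what is still open.
    - Do not read other agents' inboxes or anything outside comms/
      unless your task tells you to.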
This is only difficult if you lack imagination.
> You are talking about having some CLAUDE.md files like you have invented the wheel, lol. HN is great.
No, the exact opposite: I'm saying that this isn't hard, that it isn't anything revolutionary or even special. It's pretty basic usage of the existing facilities. There's no invention there.
You're the one trying to imply this is more revolutionary than it is.
They work for two entirely different things. The problem with these pipelines is that unless the latency is very low they simply aren't suitable replacements for Alexa etc. For that use case, low latency beats smarts.
OpenCode would also be incentivized to do things like having you configure multiple providers and routing requests to cheaper ones where possible.
Controlling the coding tool absolutely is a major asset, and will be an even greater one as the improvements in each model iteration make it matter less which specific model you're using.
It's absolutely a work-around in part, but use sub-agents, have the top level pass in the data, and limit the tool use for the sub-agent (the front matter can specify allowed tools) so it can't read more.
(And once you've done that, also consider whether a given task can be achieved with a dumber model - I've had good luck switching some of my sub-agents to Haiku).
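For example, a read-only sub-agent pinned to a cheaper model looks something like this in .claude/agents/summarizer.md (tools and model are the documented front-matter fields; the name and prompt are illustrative):

    ---
    name: summarizer
    description: Summarizes the files it is explicitly given
    tools: Read
    model: haiku
    ---
    Summarize only the files named in your task. You have read-only
    access and no search tools, so do not try to explore the repository
    beyond what you were given.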