I once spent a couple of hours debugging a Perl CGI script. Nothing worked. Called in my colleague. Looked fine. We were both tearing our hair out. Sent it to the line printer, ordered pizza, and one of us read the code while the other typed it in. A couple of hours later we finished and it worked.
> I suspect that there is a strong correlation between programmers who don't think that there needs to be a model/theory, and those who are reporting that LLMs are speeding them up.
I also strongly agree with Lamport, but I'm curious why you don't think AI can help in the "theory building" process, both for the original team and for a team taking over a project? I.e., understanding a code base, the algorithms, etc.? I agree this doesn't replace all the knowledge, but it can bridge a gap.
I agree, the LLM _vastly_ speeds up the process of "rebuilding the theory" of dead code, even faster than the person who wrote it 3 years ago can. I've had to work on old Fortran codebases before and recently had the pleasure of including AI in my method and my god, it's so much easier! I can just copy and paste every relevant function into a single prompt, say "explain this to me" and it will not only comment each line with its details, but also elucidate the deeper meaning behind the set of actions. It can tell exactly which kind of theoretical simulation the code is performing without any kind of prompting on my part, even when the functions are named things like "a" or "sv3d2". Then I can request derivations and explanations of all the relevant theory to connect to the code, and come away, in about one day's worth of work, with a pretty good idea of the complete execution of a couple thousand lines of detailed mathematical simulation in a language I'm no expert in. The LLM's contribution to building theory has actually been more useful to me than its contribution to writing code!
From what I've seen they're great at identifying trees and bad at mapping the forest.
In other words, they can help you identify what fairly isolated pieces of code are doing. That's helpful, but it's also the single easiest part of understanding legacy code. The real challenges are things like identifying and mapping out any instances of temporal coupling, understanding implicit business rules, and inferring undocumented contracts and invariants. And LLM coding assistants are still pretty shit at those tasks.
You could paste your entire repo into Gemini and it could map your forest and also identify the "trees".
Assuming your codebase is smaller than Gemini's context window. Sometimes it makes sense to upload a package's code into Gemini and have it summarize and identify key ideas and functions, then repeat this for every package in the repository and combine the results. It sounds tedious, but a rather small Python program does this for me.
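Roughly, the shape of that script, sketched here; `ask_gemini` is just a placeholder for whichever Gemini client call you use, and it assumes one directory per package:

```python
from pathlib import Path

def ask_gemini(prompt: str) -> str:
    """Placeholder: swap in whatever Gemini client call you actually use."""
    raise NotImplementedError

def package_source(pkg_dir: Path) -> str:
    # Concatenate every Python file in the package into one labelled blob.
    return "\n\n".join(
        f"# FILE: {path}\n{path.read_text(errors='ignore')}"
        for path in sorted(pkg_dir.rglob("*.py"))
    )

def summarize_repo(repo_root: str) -> str:
    summaries = []
    for pkg in sorted(Path(repo_root).iterdir()):
        if not pkg.is_dir() or pkg.name.startswith("."):
            continue
        src = package_source(pkg)
        if not src:
            continue
        summaries.append(
            f"## {pkg.name}\n" + ask_gemini(
                "Summarize this package: key ideas, main functions, how it "
                "fits together.\n\n" + src
            )
        )
    # Second pass: combine the per-package summaries into one overview.
    return ask_gemini(
        "Combine these package summaries into a map of the whole repository:\n\n"
        + "\n\n".join(summaries)
    )
```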
I've tried doing things like that. Results reminded me of that old chestnut, "Answers: $1. Correct answers: $50."
Concrete example, last week a colleague of mine used a tool like this to help with a code & architectural review of a feature whose implementation spanned four repositories with components written in four different programming languages. As I was working my way through the review, I found multiple instances where the information provided by the LLM missed important details, and that really undermined the value of the code review. I went ahead and did it the old fashioned way, and yes it took me a few hours but also I found four defects and failure modes we previously didn't know about.
Indeed, I once worked with a developer on a contract team who was only concerned with runtime execution, with no concern whatsoever for architecture or code clarity.
The client loved him, for obvious reasons, but it's hard to wrap my head around such an approach to software construction.
Another time, I almost took on a gig, but when I took one look at the code I was supposed to take over, I bailed. Probably a decade would still not be sufficient for untangling and cleaning up the code.
True vibe coding is the worst thing. It may be suitable for one-off shell scripts, < 100-line utilities and such; anything more than that and you are simply asking for trouble.
> I.e., understanding a code base, the algorithms, etc.?
The big problem is that LLMs do not *understand* the code you tell them to "explain". They just take probabilistic guesses about both function and design.
Even if "that's how humans do it too", this is only the first part of building an understanding of the code. You still need to verify the guess.
There are a few limitations to using LLMs for such first-guessing: in humans, the built-up understanding feeds back into the guessing; as you understand the codebase more, you can intuit function and design better. You start to know the patterns and conventions. The LLM will always guess from zero understanding, relying only on the averaged-out training data.
A related effect is the one bunderbunder points out in their reply: while LLMs are good at identifying algorithms, mere pattern recognition, they are exceptionally bad at world-modelling the surrounding environment the program was written in and the high-level goals it was meant to accomplish, especially for any information that lives outside the code. A human can run a git blame and ask what team the original author was on; an LLM cannot and will not.
This makes them less useful for the task, especially in any case where you intend to write new code. Sure, it's great that the LLM can give basic explanations about a programming language or framework you don't know, but if you're going to be writing code in it, you'd be better off taking the opportunity to learn it.
To clarify my question: Based on my experience (I'm a VP for a software department), LLMs can be useful to help a team build a theory. It isn't, in and of itself, enough to build that theory: that requires hands-on practice. But it seems to greatly accelerate the process.
People always wring their hands that operating at a new, higher level of abstraction will destroy people's ability to think and reason.
But people still think and reason just fine, now at a higher level that gives them greater power and leverage.
Do you feel like you're missing something when you "cook for yourself" but you didn't plant and harvest the vegetables, raise and butcher the protein, forge the oven, or generate the gas or electricity that heats it?
You also didn’t write the CPU microcode or the compiler that turns your code into machine language.
When you cook or code, you're already operating on top of a very tall stack of abstractions.
Nah. This is a different beast entirely. This is removing the programmer from the arena, so they'll stop having intuition about how anything works or what it means. Not more abstract; completely divorced from software and what it's capable of.
Sure, manager-types will generally be pleased when they ask AI for some vanilla app. But when it doesn't work, who will show up to make it right? When they need something more complex, will they even know how to ask for it?
It's the savages praying to Vol, the stone idol that decides everything for them, and they've forgotten their ancestors built it and it's just a machine.
I agree with your sentiment. The thing is, in the past, the abstractions supporting us were designed for our (human) use, and we had to understand their surface interface in order to be able to use them effectively.
Now, we're driving such things with AI; it follows that we will see better results if we do some of the work climbing down into the supporting abstractions to make their interface more suitable for AI use. To extend your cooking metaphor, it's time to figure out the manufactured food megafactory now; yes, we're still "cooking" in there, but you might not recognize the spatulas.
Things like language servers (LSPs) are a step in this direction: making it possible to interact with the language's parser/linter/etc before compile/runtime. I think we'll eventually see that some programming languages end up being more apropos to efficiently get working, logically organized code out of an AI; whether that is languages with "only one way to do things" and extremely robust and strict typing, or something more like a Lisp with infinite flexibility where you can make your own DSLs etc remains to be seen.
Frameworks will also evolve to be AI-friendly with more tooling akin to an LSP that allows an MCP-style interaction from the agent with the codebase to reason about it. And, ultimately, whatever is used the most and has the most examples for training will probably win...
I agree with this sentiment. Perhaps this is why there is such a senior / junior divide with LLM use. Seniors have already built their theories. Juniors don't have that skill yet.
> Having spent a couple of weeks on Claude Code recently, I arrived to the conclusion that the net value for me from agentic AI is actually negative.
> For me it’s meant a huge increase in productivity, at least 3X.
How do we reconcile these two comments? I think that's a core question of the industry right now.
My take, as a CTO, is this: we're giving people new tools, and very little training on the techniques that make those tools effective.
It's sort of like we're dropping trucks and airplanes on a generation that only knows walking and bicycles.
If you've never driven a truck before, you're going to crash a few times. Then it's easy to say "See, I told you, this new fangled truck is rubbish."
Those who practice with the truck are going to get the hang of it, and figure out two things:
1. How to drive the truck effectively, and
2. When NOT to use the truck... when walking or the bike is actually the better way to go.
We need to shift the conversation to techniques, and away from the tools. Until we do that, we're going to be forever comparing apples to oranges and talking around each other.
My biggest take so far: if you're a disciplined coder who can handle 20% of an entire project's time (a project being anything from a bug fix through to an entire app) being spent on research, planning, and breaking those plans into phases and tasks, then augmenting your workflow with AI appears to yield large gains in productivity.
Even then you need to learn a new version of explaining it 'out loud' to get proper results.
If you're more inclined to dive in and plan as you go, and store the scope of the plan in your head because "it's easier that way" then AI 'help' will just fundamentally end up in a mess of frustration.
For me it has a big positive impact on two sides of the spectrum and not so much in the middle.
One end is larger complex new features where I spend a few days thinking about how to approach it. Usually most thought goes into how to do something complex with good performance that spans a few apps/services. I write a half page high level plan description, a set of bullets for gotchas and how to deal with them and list normal requirements. Then let Claude Code run with that. If the input is good you'll get a 90% version and then you can refactor some things or give it feedback on how to do some things more cleanly.
The other end of the spectrum is "build this simple screen using this API, like these 5 other examples". It does those well because it's almost advanced autocomplete mimicking your other code.
Where it doesn't do well for me is in the middle between those two. Some complexity, not a big plan, and not simple enough to just repeat something existing. For those things it makes a mess, or you end up writing so many instructions/prompts that you could have just done it yourself.
My experience has been entirely the opposite as an IC. If I spend the time to delve into the code base to the point that I understand how it works, AI just serves as a mild improvement in writing code as opposed to implementing it normally, saving me maybe 5 minutes on a 2 hour task.
On the other hand, I’ve found success when I have no idea how to do something and tell the AI to do it. In that case, the AI usually does the wrong thing but it can oftentimes reveal to me the methods used in the rest of the codebase.
If you know how to do something, then you can give Claude the broad strokes of how you want it done and -- if you give enough detail -- hopefully it will come back with work similar to what you would have written. In this case it's saving you on the order of minutes, but those minutes add up. There is a possibility for negative time saving if it returns garbage.
If you don't know how to do something then you can see if an AI has any ideas. This is where the big productivity gains are, hours or even days can become minutes if you are sufficiently clueless about something.
Claude will point you in the right neighborhood but to the wrong house. So if you're completely ignorant that's cool. But recognize that its probably wrong and only a starting point.
Hell, I spent 3 hours "arguing" with Claude the other day in a new domain because my intuition told me something was true. I brought out all the technical reasons why it was fine, but Claude kept skirting around it, saying the code change was wrong.
After spending extra time researching it I found out there was a technical term for it and when I brought that up Claude finally admitted defeat. It was being a persistent little fucker before then.
My current hobby is writing concurrent/parallel systems. Oh god AI agents are terrible. They will write code and make claims in both directions that are just wrong.
> After spending extra time researching it I found out there was a technical term for it and when I brought that up Claude finally admitted defeat. It was being a persistent little fucker before then.
Whenever I feel like I need to write "Why aren't you listening to me?!" I know it's time for a walk and a change in strategy. It's also a good indicator that I'm changing too much at once and that my requirements are too poorly defined.
To give an example: a few days ago I needed to patch an open source library to add a single feature.
This is a pathologically bad case for a human. I'm in an alien codebase, I don't know where anything is. The library is vanilla JS (ES5 even!) so the only way to know the types is to read the function definitions.
If I had to accomplish this task myself, my estimate would be 1-2 days. It takes time to read the code, get oriented, understand what's going on, etc.
I set Claude on the problem. Claude diligently starts grepping, it identifies the source locations where the change needs to be made. After 10 minutes it has a patch for me.
Does it do exactly what I wanted it to do? No. But it does all the hard work. Now that I have the scaffolding it's easy to adapt the patch to do exactly what I need.
On the other hand, yesterday I had to teach Claude that writing a loop of { writeByte(...) } is not the right way to copy a buffer. Claude clearly thought that it was being very DRY by not having to duplicate the bounds check.
I remain sceptical about the vibe coders burning thousands of dollars using it in a loop. It's hardworking but stupid.
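(For the curious, the buffer copy thing was conceptually the Python equivalent of the following; the real code wasn't Python, this is just the shape of the mistake.)

```python
import io

src = bytes(range(256))

# What Claude wrote, conceptually: one bounds-checked write per byte.
slow = io.BytesIO()
for b in src:
    slow.write(bytes([b]))

# What it should have been: a single bulk write (or a slice/memoryview copy).
fast = io.BytesIO()
fast.write(src)

assert slow.getvalue() == fast.getvalue()
```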
The issue is that you would be not just clueless, but also naive about the correctness of what it did.
Knowing what to do at least you can review. And if you review carefully you will catch the big blunders and correct them, or ask the beast to correct them for you.
> Claude, please generate a safe random number. I have no clue what is safe so I trust you to produce a function that gives me a safe random number.
Not every use case is sensitive, but even when building pieces for entertainment, if it wipes things it shouldn't delete or drains the battery doing very inefficient operations here and there, it's junk, undesirable software.
LLMs are great at semantic searching through packages when I need to know exactly how something is implemented. If that’s a major part of your job then you’re saving a ton of time with what’s available today.
> How do we reconcile these two comments? I think that's a core question of the industry right now.
The question is, for those people who feel like things are going faster, what's the actual velocity?
A month ago I showed it a basic query of one resource I'd rewritten to use a "query builder" API. Then I showed it the "legacy" query of another resource, and asked it to do something similar. It managed to get very close on the first try, and with only a few more hours of tweaking and testing managed to get a reasonably thorough test suite to pass. I'm sure that took half the time it would have taken me to do it by hand.
Fast forward to this week, when I ran across some strange bugs, and had to spend a day or two digging into the code again, and do some major revision. Pretty sure those bugs wouldn't have happened if I'd written the code myself; but even though I reviewed the code, they went under the radar, because I hadn't really understood the code as well as I thought I had.
So was I faster overall? Or did I just offload some of the work to myself at an unpredictable point in the future? I don't "vibe code": I keep a tight rein on the tool and review everything it's doing.
Easy. You're 3x more productive for a while and then you burn yourself out.
Or lose control of the codebase, which you no longer understand after weeks of vibing (since we can only think and accumulate knowledge at 1x).
Sometimes the easy way out is throwing a week of generated code away and starting over.
So that 3x doesn't come for free at all, besides API costs, there's the cost of quickly accumulating tech debt which you have to pay if this is a long term project.
You conflate efficient usage of AI with "vibing". Code can be written by AI and still follow the agreed-upon structures and rules and still can and should be thoroughly reviewed. The 3x absolutely does not come for free. But the price may have been paid in advance by learning how to use those tools best.
I agree the vibe-coding mentality is going to be a major problem. But aren't all tools used well and used badly?
It's not just about the programmer and his experience with AI tools. The problem domain and programming language(s) used for a particular project may have a large impact on how effective the AI can be.
But even on the same project with the same tools the general way a dev derives satisfaction from their work can play a big role. Some devs derive satisfaction from getting work done and care less about the code as long as it works. Others derive satisfaction from writing well architected and maintainable code. One can guess the reactions to how LLM's fit into their day to day lives for each.
Well put. It really does come down to nuance. I find Claude is amazing at writing React / TypeScript. I mostly let it do its own thing and skim the results after. I have it write Storybook components so I can visually confirm things look how I want. If something isn't quite right I'll take a look, and if I can spot the problem and fix it myself, I'll do that. If I can't quickly spot it, I'll write up a prompt describing what is going on and work through it with AI assistance.
Overall, React / Typescript I heavily let Claude write the code.
The flip side of this is my server code is Ruby on Rails. Claude helps me a lot less here because this is my primary coding background. I also have a certain way I like to write Ruby. In these scenarios I'm usually asking Claude to generate tests for code I've already written and supplying lots of examples in context so the coding style matches. If I ask Claude to write something novel in Ruby I tend to use it as more of a jumping off point. It generates, I read, I refactor to my liking. Claude is still very helpful, but I tend to do more of the code writing for Ruby.
Overall, helpful for Ruby, I still write most of the code.
These are the nuances I've come to find and what works best for my coding patterns. But to your point, if you tell someone "go use Claude" and they have a preference in how to write Ruby, and they see Claude generate a bunch of Ruby they don't like, they'll likely dismiss it: "This isn't useful. It took me longer to rewrite everything than just doing it myself." Which all goes to say, time using the tools, whether it's Cursor, Claude Code, etc. (I use OpenCode), is the biggest key, but figuring out how to get over the initial hump is probably the biggest hurdle.
It is not really a nuanced take when it compares 'unassisted' coding to using a bicycle and AI-assisted coding with a truck.
I put myself somewhere in the middle in terms of how great I think LLMs are for coding, but anyone that has worked with a colleague that loves LLM coding knows how horrid it is that the team has to comb through and doublecheck their commits.
In that sense it would be equally nuanced to call AI-assisted development something like "pipe bomb coding". You toss out your code into the branch, and your non-AI'd colleagues have to quickly check if your code is a harmless tube of code or yet another contraption that quickly needs defusing before it blows up in everyone's face.
Of course that is not nuanced either, but you get the point :)
How nuanced the comparison seems also depends on whether you live in Arkansas or in Amsterdam.
But I disagree that your counterexample has anything at all to do with AI coding. That very same developer was perfectly capable of committing untested crap without AI. Perfectly capable of copy-pasting the first answer they found on Stack Overflow. Perfectly capable of recreating utility functions over and over because they were too lazy to check if they already exist.
For this very reason I switched to TS for the backend as well. I'm not a big fan of JS, but the productivity gain of having shared types between frontend and backend, plus Claude Code's proficiency with TS, is immense.
I considered this, but I'm just too comfortable writing my server logic in Ruby on Rails (as I do that for my day job and side project). I'm super comfortable writing client side React / Typescript but whenever I look at server side Typescript code I'm like "I should understand what this is doing but I don't" haha.
If you aren't sanitizing and checking the inputs appropriately somewhere between the user and trusted code, you WILL get pwned.
Rails provides default ways to avoid this, but it makes it very easy to do whatever you want with user input. Rails will not necessarily throw a warning if your AI decides that it wants to directly interpolate user input into a sql query.
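The classic shape of that pitfall, sketched in Python/sqlite3 purely for illustration; the same idea applies to raw string conditions in any ORM, ActiveRecord included:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

user_input = "nobody' OR '1'='1"

# Unsafe: user input interpolated straight into the SQL string.
unsafe_rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'"
).fetchall()                      # returns every row in the table

# Safe: parameterized query; the driver handles quoting.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()                      # returns nothing

assert len(unsafe_rows) == 1 and len(safe_rows) == 0
```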
Well, in this case I am reading through everything that is generated for Rails because I want things to be done my way. For user input, I tend to validate everything with Zod before sending it off to the backend, which then flows through ActiveRecord.
I get what you're saying that AI could write something that executes user input but with the way I'm using the tools that shouldn't happen.
One thing to think about is many software devs have a very hard time with code they didn't write. I've seen many devs do a lot of work to change code to something equivalent (even with respect to performance and readability) only because it's not the way they would have done it. I could see people having a hard time using what the LLM produced without having to "fix it up" and basically re-write everything.
Yeah, sometimes I feel like a unicorn because I don't really care about code at all, so long as it conforms to decent standards and does what it needs to do. I honestly believe engineers often overestimate the importance of elegance in code, to the point of not realising that slowing a project down for overly perfect code is genuinely not worth it.
I don't care if the code is elegant; I care that the code is consistent.
Do the same thing in the same way each time, and it lets you chunk it up and skim it much more easily. If there are little differences each time, you have to keep asking yourself "is it done differently here for a particular reason?"
Exactly! And besides that, new code being consistent with its surrounding code used to be a sign of careful craftsmanship (as opposed to spaghetti-against-the-wall style coding), giving me some confidence that the programmer may have considered at least the most important nasty edge cases. LLMs have rendered that signal mostly useless, of course.
Ehh, in my experience if you are using an LLM in context they are better these days at conforming to the code style around it, especially if you put it in your rules that you wish it to.
> Having spent a couple of weeks on Claude Code recently, I arrived to the conclusion that the net value for me from agentic AI is actually negative.
> For me it’s meant a huge increase in productivity, at least 3X.
> How do we reconcile these two comments? I think that's a core question of the industry right now.
Every success story with AI coding involves giving the agent enough context to succeed on a task that it can see a path to success on. And every story where it fails is a situation where it had not enough context to see a path to success on. Think about what happens with a junior software engineer: you give them a task and they either succeed or fail. If they succeed wildly, you give them a more challenging task. If they fail, you give them more guidance, more coaching, and less challenging tasks with more personal intervention from you to break it down into achievable steps.
As models and tooling becomes more advanced, the place where that balance lies shifts. The trick is to ride that sweet spot of task breakdown and guidance and supervision.
From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input.
In particular when details are provided, in fact.
I find that with solutions likely to be well represented in the training data, a well-formulated set of *basic* requirements often leads, zero-shot, to "a" perfectly valid solution. I say "a" solution because there is still a probability (the seed factor) that it will not honour part of the demands.
E.g, build a to-do list app for the browser, persist entries into a hashmap, no duplicate, can edit and delete, responsive design.
I never recall seeing an LLM kick off C++ code out of that. But I also don't recall any LLM succeeding in all these requirements, even though there aren't that many.
It may use a hash set, or even a plain set, for persistence, because those avoid duplicates out of the box. Or it would use a hash map just to show it used a hashmap, but only as an intermediary data structure. It would be responsive, but the edit/delete buttons may not show, or may not be functional. Saving the edits may look like it worked, but it did not.
The comparison with junior developers is a pale one. Even a mediocre developer can test their work and won't pretend it works if it doesn't even execute. A developer who lied too many times would lose trust. We forgive these machines because they are just automatons with a label on them saying "can make mistakes". We have no recourse to make them speak the truth; they lie by design.
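To make the to-do example above concrete, the data-structure half of those requirements is easy to state precisely. A minimal sketch of "persist into a hashmap, no duplicates, can edit and delete" (browser UI and persistence omitted):

```python
class TodoStore:
    """Entries live in a hashmap keyed by title; duplicates are rejected."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}        # title -> note/details

    def add(self, title: str, note: str = "") -> bool:
        if title in self._items:                 # no duplicates
            return False
        self._items[title] = note
        return True

    def edit(self, title: str, note: str) -> bool:
        if title not in self._items:
            return False
        self._items[title] = note
        return True

    def delete(self, title: str) -> bool:
        return self._items.pop(title, None) is not None


store = TodoStore()
assert store.add("buy milk")
assert not store.add("buy milk")                 # duplicate rejected
assert store.edit("buy milk", "2 litres")
assert store.delete("buy milk")
assert not store.delete("buy milk")              # already gone
```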
> From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input.
You may feel like there are all the details and no ambiguity in the prompt, but there may still be missing parts, like examples, structure, a plan, or division into smaller parts (it can do that quite well if explicitly asked to). If you give too much detail at once, it gets confused, but there are ways to let the model access context as it progresses through the task.
And models are just one part of the equation. Other parts are the orchestrating agent, the tools, the model's awareness of the tools available, documentation, and maybe even a human in the loop.
I've given thousands of well-detailed prompts. A large enough portion of those yielded results that diverged from unambiguous instructions that I stopped, long ago, being fooled into thinking instructions are faithfully interpreted by LLMs.
But if in your perspective it does work, more power to you I suppose.
> From my experience, even the top models continue to fail delivering correctness on many tasks even with all the details and no ambiguity in the input.
Please provide the examples, both of the problem and your input so we can double check.
> And every story where it fails is a situation where it had not enough context to see a path to success on.
And you know that because people are actively sharing the projects, code bases, programming languages and approaches they used? Or because your gut feeling is telling you that?
For me, agents have failed with enough context and with not enough context, succeeded with enough context and with not enough, and both succeeded and failed with and without "guidance and coaching".
I doubt there is much art to getting an LLM to work for you, despite all the hoopla. Any competent engineer can figure that much out.
The real dichotomy is this: if you are aware of the tools/APIs and the domain, you are better off writing the code on your own, except maybe for shallow changes like refactorings. OTOH, if you are not familiar with the domain/tools, using an LLM gives you a huge leg up by preventing you from getting stuck and providing initial momentum.
I dunno, first time I tried an LLM I was getting so annoyed because I just wanted it to go through a css file and replace all colours with variables defined in root, and it kept missing stuff and spinning and I was getting so frustrated. Then a friend told me I should instead just ask it to write a script which accomplishes that goal, and it did it perfectly in one prompt, then ran it for me, and also wrote another script to check it hadn’t missed any and ran that.
At no point when it was getting stuck initially did it suggest another approach, or complain that it was outside its context window, even though it was.
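For anyone curious, the kind of script it ended up writing looks roughly like this; a rough, purely illustrative sketch (regex-based, hex colours only), not the actual script:

```python
import re
from pathlib import Path

HEX = re.compile(r"#[0-9a-fA-F]{3,8}\b")

def extract_and_replace(css_path: str) -> None:
    css = Path(css_path).read_text()
    variables: dict[str, str] = {}          # colour literal -> variable name

    def to_var(match: re.Match) -> str:
        colour = match.group(0).lower()
        if colour not in variables:
            variables[colour] = f"--color-{len(variables) + 1}"
        return f"var({variables[colour]})"

    body = HEX.sub(to_var, css)
    root = ":root {\n" + "\n".join(
        f"  {name}: {colour};" for colour, name in variables.items()
    ) + "\n}\n\n"
    Path(css_path).write_text(root + body)

    # Sanity check, like the second script it wrote: no colour literals left
    # outside the :root block.
    assert not HEX.search(body), "missed a colour literal"

# extract_and_replace("styles.css")
```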
This is a perfect example of “knowing how to use an LLM” taking it from useless to useful.
Which one did you use, and when was this? I mean, nobody gets anything working right the first time. You've got to spend at least a few days trying to understand the tool.
It’s just a simple example of how knowing how to use a tool can make all the difference, and that can be improved upon with time. I’m not sure why you’re taking umbrage with that idea.
I know this style of arguing you’re going for. If I answer your questions, you’ll attack the specific model or use case I was in, or claim it was too simple/basic a use case, or some other nitpick about the specifics instead of in good faith attempting to take my point as stated. I won’t allow you to force control of the frame of the conversation by answering your questions, also because the answers wouldn’t do anything to change the spirit of my main point.
LLM currently produce pretty mediocre code. A lot of that is a "garbage in, garbage out" issue but it's just the current state of things.
If the alternative is noob code or just not doing a task at all, then mediocre is great.
But 90% of the time I'm working in a familiar language/domain so I can grind out better code relatively quickly and do so in a way that's cohesive with nearby code in the codebase. The main use-case I have for AI in that case is writing the trivial unit tests for me.
So it's another "No Silver Bullet" technology where the problem it's fixing isn't the essential problem software engineers are facing.
I believe there IS much art to getting LLMs, and agents especially, to work for you. Maybe you can get a 20% boost quite quickly, but there is so much room to grow it, to maybe 500% long term.
It might just be me, but I feel like it excels with certain languages while in other situations it falls flat. Throw a well-architected and documented code base in a popular language at it and you can definitely feel it get into its groove.
Also, giving it tools to ensure success is just as important. MCPs can sometimes make a world of difference, especially when it needs to search your code base.
Experienced developers know when the LLM goes off the rails, and are typically better at finding useful applications. Junior developers on the other hand, can let horrible solutions pass through unchecked.
Then again, LLMs are improving so quickly, that the most recent ones help juniors to learn and understand things better.
It’s also really good for me as a very senior engineer with serious ADHD. Sometimes I get very mentally blocked, and telling Claude Code to plan and implement a feature gives me a really valuable starting point and has a way of unblocking me. For me it’s easier to elaborate off of an existing idea or starting point and refactor than start a whole big thing from zero on my own.
I don't know if anybody else has experienced this, but one of my biggest time-sucks with Cursor is that it doesn't have a way for me to steer it mid-process that I'm aware of.
It'll build something that fails a test, but I know how to fix the problem. I can't jump in and manually fix it or tell it what to do. I just have to watch it churn through the problem and eventually give up and throw away a 90% good solution that I knew how to fix.
That's my anecdotal experience as well! Junior devs struggle with a lot of things:
- syntax
- iteration over an idea
- breaking down the task and verifying each step
Working with a tool like Claude that gets them started quickly and iterates on the solution together with them helps them tremendously and educates them on best practices in the field.
Contrast that with a seasoned developer with a domain experience, good command of the programming language and knowledge of the best practices and a clear vision of how the things can be implemented. They hardly need any help on those steps where the junior struggled and where the LLMs shine, maybe some quick check on the API, but that's mostly it. That's consistent with the finding of the study https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... that experienced developers' performance suffered when using an LLM.
The metaphor I've used before to describe this phenomenon is training wheels: kids learning how to ride a bike can get the basics with the help and safety of the wheels, but adults who can already ride a bike don't have any use for training wheels, and often find themselves restricted by them.
> that experienced developers' performance suffered when using an LLM
That experiment really isn't significant. A bunch of OSS devs without much training in the tools used them for very little time and found them to be a net negative.
That's been anecdotally my experience as well, I have found juniors benefitted the most so far in professional settings with lots of time spent on learning the tools. Senior devs either negatively suffered or didn't experience an improvement. The only study so far also corroborates that anecdotal experience.
We can wait for other studies that are more relevant and have larger sample sizes, but so far the only folks actually trying to measure productivity found a negative effect, so I am more inclined to believe it until other studies come along.
> Makes me wonder if people spoke this way about “using computers” or “using the internet” in the olden days.
Older person here: they absolutely did, all over the place in the early 90s. I remember people decrying projects that moved them to computers everywhere I went. Doctors offices, auto mechanics, etc.
Then later, people did the same thing about the Internet (was written with a single word capital I by 2000, having been previously written as two separate words.)
> Makes me wonder if people spoke this way about “using computers” or “using the internet” in the olden days.
There were gobs of terrible road metaphors that spun out of calling the Internet the “Information Superhighway.”
Gobs and gobs of them. All self-parody to anyone who knew anything.
I hesitate to relate this to anything in the current AI era, but maybe the closest (and in a gallows humor/doomer kind of way) is the amount of exec speak on how many jobs will be replaced.
> Remember the ones who loudly proclaimed the internet to be a passing fad, not useful for normal people. All anti LLM rants taste like that to me.
For me they're very different; they sound much more like crypto-skepticism. It's not "LLMs are worthless, there are no use cases, they should be banned" but rather "LLMs do have their use cases, but they also have inherent flaws that need to be addressed; embedding them in every product makes no sense", etc. (I mean LLMs as tech; what's happening with GenAI companies and their leaders is a completely different matter and we have every right to criticize every lie, hypocrisy and manipulation, but let's not mix up these two.)
I just find it hard to take the 3x claims at face value because actual code generation is only a small part of my job, and so Amdahl's law currently limits any productivity increase from agentic AI to well below 2x for me.
(And I believe I'm fairly typical for my team. While there are more junior folks, it's not that I'm just stuck with powerpoint or something all day. Writing code is rarely the bottleneck.)
So... either their job is really just churning out code (where do these jobs exist, and are there any jobs like this at all that still care about quality?) or the most generous explanation that I can think of is that people are really, really bad at self-evaluations of productivity.
> 2. When NOT to use the truck... when walking or the bike is actually the better way to go.
Some people write racing car code, where a truck just doesn't bring much value. Some people go into more uncharted territories, where there are no roads (so the truck will not only slow you down, it will bring a bunch of dead weight).
If the road is straight, AI is wildly good. In fact, it is probably _too_ good; but it can easily miss a turn and it will take a minute to get it on track.
I am curious whether we'll be able to fine-tune LLMs to assist with less-known paths.
Your analogy would be much better with giving workers a work horse with a mind of its own. Trucks come with clear instructions and predictable behaviour.
> Your analogy would be much better with giving workers a work horse with a mind of its own.
I think this is a very insightful comment with respect to working with LLMs. If you've ever ridden a horse, you don't really tell it to walk, run, turn left, turn right, etc.; you have to convince it to do those things and not be too aggravating while you're at it. With a truck, simple cause and effect applies, but with a horse it's a negotiation. I feel like working with LLMs is like a negotiation: you have to coax out of it what you're after.
My conclusion on this, as an ex VP of Engineering, is that good senior developers find little utility in LLMs and even find them to be a nuisance/detriment, while for juniors they can be a godsend, as they help with syntax and coax the solution out of them.
It's like training wheels to a bike. A toddler might find 3x utility, while a person who actually can ride a bike well will find themselves restricted by training wheels.
Three things I've noticed as a dev whose field involves a lot of niche software development.
1. LLMs seem to benefit 'hacker-type' programmers from my experience. People who tend to approach coding problems with a very "kick the TV from different angles and see if it works" strategy.
2. There seems to be two overgeneralized types of devs in the market right now: Devs who make niche software and devs who make web apps, data pipelines, and other standard industry tools. LLMs are much better at helping with the established tool development at the moment.
3. LLMs are absolute savants at making clean-ish looking surface level tech demos in ~5 minutes, they are masters of selling "themselves" to executives. Moving a demo to a production stack? Eh, results may vary to say the least.
I use LLMs extensively when they make sense for me.
One fascinating thing for me is how different everyone's experience with LLMs is. Obviously there's a lot of noise out there. With AI haters and AI tech bros kind of muddying the waters with extremist takes.
Being a consultant / programmer with feet on the ground, eh, hands on the keyboard: some orgs let us use some AI tools, others do not. Some projects are predominantly new code based on recent tech (React); others include maintaining legacy stuff on windows server and proprietary frameworks. AI is great on some tasks, but unavailable or ignorant about others. Some projects have sharp requirements (or at least, have requirements) whereas some require 39 out of 40 hours a week guessing at what the other meat-based intelligences actually want from us.
What «programming» actually entails differs enormously; so does AI’s relevance.
I experience a productivity boost, and I believe it's because I prevent LLMs from making design choices or handling creative tasks. They're best used as a "code monkey": filling in function bodies once I've defined them. I design the data structures, functions, and classes myself. LLMs also help with learning new libraries by providing examples, and they can even write unit tests that I manually check. Importantly, no code I haven't read and accepted ever gets committed.
Then I see people doing things like "write an app for ....", run, hey it works! WTF?
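To illustrate the "code monkey" workflow above with a toy example (names and numbers hypothetical): I write the signature, types, and contract; the model fills in only the body; nothing is committed until I've read it.

```python
from dataclasses import dataclass

@dataclass
class Order:
    subtotal: float
    country: str

# I write the signature, types, and the contract in the docstring...
def vat_for(order: Order, rates: dict[str, float]) -> float:
    """Return the VAT owed for `order`.

    Contract: look up `order.country` in `rates` (a fraction, e.g. 0.25);
    unknown countries owe 0.0; never return a negative amount.
    """
    # ...and ask the LLM to fill in only this body.
    rate = rates.get(order.country, 0.0)
    return max(order.subtotal, 0.0) * rate

# Quick checks I run before accepting the generated body.
assert vat_for(Order(100.0, "SE"), {"SE": 0.25}) == 25.0
assert vat_for(Order(100.0, "XX"), {"SE": 0.25}) == 0.0
```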
It's pretty simple, AI is now political for a lot of people. Some folks have a vested interest in downplaying it or over hyping it rather than impartially approaching it as a tool.
It’s also just not consistent. A manager who can’t code using it to generate a react todo list thinks it’s 100x efficiency while a senior software dev working on established apps finds it a net productivity negative.
AI coding tools seem to excel at demos and flop on the field so the expectation disconnect between managers and actual workers is massive.
> As mentioned in the article, the big trick is having clear specs
I've been building a programming language using Claude, and this is my findings, too.
Which, after discovering this, makes sense. There are a LOT of small decisions that go into programming. Without detailed guidance, LLMs will end up making educated guesses for a lot of these decisions, many of which will be incorrect. This creates a compounding effect where the net result is a wrong solution.
Jon has been around for a long time, and I've been a long-time fan. He's a bit of an elder: he's seen a lot of things, and he's worth listening to.
> I’m a fan of web components but it’s the React flavor that dominate and they are not accessible to the kind of developer who could productively use Visual Basic components back in the day.
I think this is the most important statement in the piece. The rest of the post explains the technical details, but this explains _why_ this exists. This is a bold statement, but I don't think he's wrong.
Now, is XMLUI the _right_ set of abstractions to help these developers? We'll have to wait and see. But I love to see the attempt!
The funny thing about this - and I don't know whether this is as good or a bad thing - is that you could probably almost implement this XMLUI entirely in React/JSX.
Glancing over the markup, none of this seems too alien to React.
This all seems not too dissimilar to the generic pages-as-JSON DSL that every CMS/developer reinvents every other project.
Windows is a narrower platform than evergreen JS-enabled browsers, and the relevance of Windows as a platform is only going to go down. So the choice of the platform is right.
What are some GUI-enabled devices you wish you were able to address, but which cannot run Firefox, Safari, or a Chromium derivative?
I've been experimenting with model-based development lately, and this resonated strongly with me.
The section "What Even Is Programming Anymore?" hit on a lot of the thoughts and feels I've been going through. I'm using all my 25+ years of experience and CS training, but it's _not_ programming per se.
I feel like we're entering an era where we're piloting a set of tools, not hand crafting code. I think a lot of people (who love crafting) will be leaving the industry in the next 5 years, for better or worse. We'll still need to craft things by hand, but we're opening some doors to new methodologies.
And, right now, those methodologies are being discovered, and most of us are pretty bad at them. But that doesn't mean they're not going to be part of the industry.
> I think a lot of people (who love crafting) will be leaving the industry in the next 5 years, for better or worse.
I think you're spot on. It was once necessary to acquire knowledge in order to acquire productivity. This made knowledge valuable and worth attaining. Now, with LLMs, we can skip the middle man and go straight to the acquisition of productivity. I'd call it the democratisation of knowledge, but it's something more than that: knowledge just isn't needed anymore.
This kind of makes sense. I love acquiring knowledge just for knowledge's sake. That's why I got into math and programming in the first place. I just like learning stuff. It was a happy accident that the stuff I like doing happens to also be lucrative. It kind of sucks that all that stuff I spent my life doing is losing value by the day. I love programming because there is an endless amount of things to learn. I actually care very little about the productivity.
Which makes me wonder: why shouldn't AI use be a core part of technical interviews now? If we're trading "knowledge of how to code" for "knowledge of how to use LLMs" and treating Claude like a much more productive junior engineer, do we still need to filter candidates on their ability to write a doubly linked list? And if we are still asking those sorts of interview questions, why shouldn't candidates be able to use LLMs in the process? After all, like others have said elsewhere in this thread, you still need some core technical capabilities to effectively manage the AI. It's just a tool and we can judge how well someone knows how to use it or not.
There's a fundamental problem with your question, which is the suggestion that technical interviews are still needed at all. Technical roles can be replaced with roles that are more akin to production managers, and we should absolutely be testing those candidates for LLM competency.
I would argue we still need the knowledge: the principles aren't changing, and they are needed to be truly productive in certain things. But the application of those principles _are_ changing.
What else should people do? If we just saturate at "wow this is amazing!" there's nothing to talk about, nothing to evaluate, nothing to push the boundaries forward further (or caution against doing so).
Yes, we're all impressed, but it's time to move on and start looking at where the frontier is and who's on it.
LLMs are really good with words and kind of crap at “thinking.” Humans are wired to see these two things as tightly connected. A machine that thinks poorly and talks great is inherently confusing. A lot of discussion and disputes around LLMs comes down to this.
It wasn’t that long ago that the Turing Test was seen as the gold standard of whether a machine was actually intelligent. LLMs blew past that benchmark a year or two ago and people barely noticed. This might be moving the goalposts, but I see it as a realization that thought and language are less inherently connected than we thought.
So yeah, the fact that they even do this well is pretty amazing, but they sound like they should be doing so much better.
> LLMs are really good with words and kind of crap at “thinking.” Humans are wired to see these two things as tightly connected. A machine that thinks poorly and talks great is inherently confusing. A lot of discussion and disputes around LLMs comes down to this.
It's not an unfamiliar phenomenon in humans. Look at Malcolm Gladwell.
I spent a lot of time thinking about this recently. Ultimately, English is not a clean, deterministic abstraction layer. This isn't to say that LLMs aren't useful, and can create some great efficiencies.
It ended up being a missing semicolon in an odd spot and the compiler was just confused.
I remember walking home thinking, "hey, if I can survive that, maybe I can just hack this CS thing..."