I use Augment with Claude Opus 4.5 every day at my job. I barely ever write code by hand anymore. I don't blindly accept the code it writes; I iterate with it. We review code at my work. I have absolutely found a lot of benefit from my tools.
I've implemented several medium-scale projects that I estimate would have taken 1-2 weeks manually but took a day or so using agentic tools.
A few very concrete advantages I've found:
* I can spin up several agents in parallel and cycle between them, reviewing the output of one while the others crank away.
* It's greatly improved my ability in languages I'm not expert in. For example, I wrote a Chrome extension which I've maintained for a decade or so. I'm quite weak in JavaScript. I pointed Antigravity at it and gave it a very open-ended prompt (basically, "improve this extension"), and in about five minutes it vastly improved the quality of the extension (better UI, better performance, removed dependencies). The improvements may have been easy for someone expert in JS, but I'm not.
Here's the approach I follow that works pretty well:
1. Tell the agent your spec, as clearly as possible. Tell the agent to analyze the code and make a plan based on your spec. Tell the agent to not make any changes without consulting you.
2. Iterate on the plan with the agent until you think it's a good idea.
3. Have the agent implement your plan step by step. Tell the agent to pause and get your input between each step.
4. Between each step, look at what the agent did and tell it to make any corrections or plan modifications you notice are needed. (I find that it helps to remind them what the overall plan is, because sometimes they forget...)
5. Once the code is complete (or even between steps), I like to run a code-cleanup subagent that preserves the logic but improves style (factoring out magic constants, extracting helper functions, etc.).
This works quite well for me. Since these are text-based interfaces, I find that clarity of prose makes a big difference. Being very careful and explicit about the spec you provide to the agent is crucial.
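To make the "explicit spec" point concrete, the kickoff prompt for step 1 is shaped roughly like this (the wording is illustrative and the project details are made up):

```
Read the spec below. Study the relevant code and produce a step-by-step
implementation plan. Do not make any changes until I approve the plan.

Spec: add a retry policy to the export worker. Retry up to 3 times with
exponential backoff starting at 1s. After the final failure, route the
message to the dead-letter queue. Emit a counter metric per retry.
Out of scope: changing the queue schema.
```

The specifics don't matter; what matters is that the scope, the constraints, and the "plan first, no changes" instruction are all stated explicitly.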
This. I use it for coding in a Rails app even though I'm not a Ruby expert. I can read the code, but writing it is painful, so having the LLM write the code is beneficial. It's definitely faster than if I were writing the code, and it probably produces better code than I would write.
I've been a professional software developer for >30 years, and this is the biggest revolution I've seen in the industry. It is going to change everything we do. There will be winners and losers, and we will make a lot of mistakes, as usual, but I'm optimistic about the outcome.
Agreed. In the domains where I'm an expert, it's a nice productivity boost. In the domains where I'm not, it's transformative.
As a complete aside from the question of productivity, these coding tools have reawakened a love of programming in me. I've been coding for long enough that the nitty-gritty of everyday programming just feels like a slog - deciphering compiler errors, fixing type checking issues, factoring out helper functions, whatever. With these tools, I get to think about code at a much higher level. I create designs and high-level ideas and the AI does all the annoying detail work.
I'm sure there are other people for whom those tasks feel like an interesting and satisfying puzzle, but for me it's been very liberating to escape from them.
No, I'm quite confident that I'm very strong in these languages. Certainly not world-class but I write very good code and I know well-written code when I see it.
If you'd like some evidence, I literally just flipped a feature flag to change how we use queues to orchestrate workflows. The bulk of this new feature was introduced in a 1300-line PR, touching at least four different services, written in Golang and Python. It was very much AI agent driven using the flow I described. Enabling the feature worked the first time without a hiccup.
(To forestall the inevitable quibble, I am aware that very large PRs are against best practice and it's preferable to use smaller, stacked PRs. In this case for clarity purposes and atomicity of rollbacks I judged it preferable to use a single large PR.)
You are welcome to use whatever definition of "small/medium/large" you like. Like you, I've worked on projects far larger than 1-2 weeks. I don't think that's particularly relevant to the point of my post.
The point that I'm trying to emphasize is that I've had success with it on projects of some scale, where you are implementing (e.g.) multiple related PRs in different services. I'm not just using it on very tightly scoped tasks like "implement this function".
The observation I was trying to make is that at the scope of one week, there's very little you actually get done, and it's likely mostly mechanical work. Given that, I suppose I'm unsurprised LLMs are proving useful. Seems like that's the type of thing they're excelling at.
That's not my experience. I agree that a project of any real size takes quite a bit longer than a week. But it's composed of lots of, well, week or two long subprojects. And if the AI coding tool is condensing week long projects into a day, that's a huge benefit.
Concretely speaking (well, as concretely as I feel like being without piercing pseudonymity), at my last job I worked on a multi-year rewrite of one of our core services. Within that rewrite were a ton of much smaller projects that were a few weeks to a month long - refactor this algorithm, improve the load balancing, add a new sharding strategy, etc. An AI tool would definitely not have sped up the whole process. It's not going to, say, speed up figuring out and handling intra-team dependencies or figuring out product design. But speeding up those smaller coding subprojects would have been a huge benefit.
I'm not making any strong claims in my post. I haven't had the experience of AI tools letting me one-shot large projects. But OP asked if anyone has concrete experience with AI coding tools speeding up development, and the answer is yes, I do.
1 and 2, i.e. creating a spec that is the source of truth (spec-driven development), is key to getting anything production-grade, in our experience.
Yes. This was the key thing I learned that let me set the agents loose on larger tasks. Before I started iterating on specs with them, I mostly had them doing very small scale, refactor-this-function style tasks.
The other advice I've read that I haven't yet internalized as much is to use an "adversarial" approach with the LLMs: i.e. give them a rigid framework that they have to code against. So, e.g., generate tests that the code has to work against, or sample output that the code has to perfectly match. My agents do write tests as part of their work, and I use them to verify correctness, but I haven't updated my flow to emphasize that the agents should start with those, and iterate on them before working on the main implementation.
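For illustration, the adversarial setup looks something like this: the tests are written (or at least approved) first and frozen, and the agent is told its implementation must pass them without touching the test file. A minimal sketch in JavaScript - the module and the test cases are made up, and my real projects are in Go/Python:

```javascript
// test/slugify.test.js -- written and frozen before the agent starts.
// The agent may not edit this file; its implementation must pass as-is.
const test = require('node:test');
const assert = require('node:assert');

// Hypothetical module the agent is asked to implement.
const { slugify } = require('../src/slugify');

test('lowercases and hyphenates whitespace', () => {
  assert.strictEqual(slugify('Hello World'), 'hello-world');
});

test('strips punctuation', () => {
  assert.strictEqual(slugify('Rock & Roll!'), 'rock-roll');
});
```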
Interesting. What would make the workflow "agentic" in your mind? The AI implementing the task fully autonomously, never getting any human feedback?
To me "agentic" in this context essentially that the LLM has the ability to operate autonomously, so execute tools on my behalf, etc. So for example my coding agents will often run unit tests, run code generation tools, etc. I've even used my agents to fix issues with git pre-commit hooks, in which case they've operated in a loop, repeatedly trying to check in code and fixing errors they see in the output.
So in that sense they are theoretically capable of one-shot implementing any task I set them to, their quality is just not good enough yet to trust them to. But maybe you mean something different?
IMHO, an agentic workflow is the autonomous execution of a detailed plan. Back-and-forth between LLM and developer is fine in the planning stage; after that, the agent is supposed to overcome any difficulties and devise solutions to unplanned situations on its own. Otherwise, Cursor was already able to develop in a tight loop of writing and running tests, followed by fixing bugs, before "agentic" became a buzzword.
Perhaps “agentic” initially referred to this simple loop, but the milestone was achieved so quickly that the meaning shifted. Regardless, I could be wrong.
Yeah, I have no idea what the consensus definition of the term is, and I suppose I can't say for sure what OP meant. I haven't used Cursor. My understanding was that it exercises IDE functions but does not execute arbitrary shell commands - maybe I'm wrong. I've specifically had good experiences with the tools being able to run arbitrary commands (like the git debugging example I mentioned).
In my experience reading discussions like this, people seem to be saying that they don't believe that Claude Code and similar tools provide much of a productivity boost on relatively open ended domains (i.e. the AI is driving the writing of the code, not just assisting you in writing your own code faster). And that's certainly not my experience.
I agree with you that success with the initial milestone ("agent operates in a self-contained loop and can execute arbitrary commands") was achieved pretty quickly. But in my experience a lot of people don't believe this. :-)
> Tell the agent your spec, as clearly as possible.
I have recently added a step before that when beginning a project with Claude Code: invoke the AskUserQuestionTool and have it ask me questions about what I want to do and what approaches I prefer. It helps to clarify my thinking, and the specs it then produces are much better than if I had written them myself.
I should note, though, that I am a pure vibe coder. I don't understand any programming language well enough to identify problems in code by looking at it. When I want to check whether working code produced by Claude might still contain bugs, I have Gemini and Codex check it as well. They always find problems, which I then ask Claude to fix.
None of what I produce this way is mission-critical or for commercial use. My current hobby project, still in progress, is a Japanese-English dictionary:
Great idea! That's actually the very next improvement I was planning on making to my coding flow: building a sub agent that is purely designed to study the codebase and create a structured implementation plan. Every large project I work on has the same basic initial steps (study the codebase, discuss the plan with me, etc) so it makes sense to formalize this in an agent I specialize for the purpose.
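For what it's worth, in Claude Code this kind of subagent is just a markdown file with YAML frontmatter under .claude/agents/; other tools have their own conventions. A rough sketch - the name, description, and tool list here are my guesses, not a tested config:

```
---
name: planner
description: Studies the codebase and produces a structured implementation plan. Use before making any code changes.
tools: Read, Grep, Glob
---
Study the repository layout and the files relevant to the user's spec.
Produce a numbered implementation plan with one reviewable step per item.
Do not modify any files. End by asking the user to approve or amend the plan.
```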
:-) I feel you. Perhaps I should have ended my post with "Would you like me to construct a good prompt for your planning agent?" to really drive us into the uncanny valley?
(My writing style is very dry and to the point, you may have noticed. I looked at my post and thought, "Huh, I should try and emotionally engage with this poster, we seem like we're having a shared experience." And so I figured, heck, I'll throw in an enthusiastic interjection. When I was in college, my friends told me I had "bonsai emotions" and I suppose that still comes through in my writing style...)
The OP is "quite weak at JavaScript," but their AI "vastly improved the quality of the extension." Like, my dude, how can you tell? Does the code look polished, does it look smart, do the tests pass, or what?! How can you come forward and be the judge of something you're not an expert in?
I mean, at this point, I'm beginning to be skeptical about half the content posted online. Anybody can come up with any damn story and make it credible. Just the other day I found out about reddit engagement bots, and I've seen some in the wild myself.
I'm waiting for the internet bubble to burst already so we can all go back to our normal lives, where we've left it 20 years or so ago.
How can I tell? Yes, the code looks quite a bit more polished. I'm not expert enough in JS to, e.g., know the cleanest method to inspect and modify the DOM, but I can look at code that does and tell if the approach it's using is sensible or not. Surely you've had the experience of a domain where you can evaluate the quality of the end product, even if you can't create a high quality product on your own?
Concretely in this case, I'd implemented an approach that used jQuery listeners to listen for DOM updates. Antigravity rewrote it to an approach that avoided the jQuery dependency entirely, using native MutationObservers. The code is sensible. It's noticeably more performant than the approach I crafted by hand. Antigravity allowed me to easily add a number of new features to my extension that I would have found tricky to add by hand. The UI looks quite a bit nicer than before I used AI tools to update it. Would these enhancements have been hard for an expert in Chrome extensions to implement? Probably not. But I'm not that expert, and AI coding tools allowed me to do them.
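To give a flavor of the change, the before/after was roughly the following - reconstructed from memory, so treat it as a sketch rather than the literal diff, and the selector and handler names are placeholders:

```javascript
// Before (sketch): jQuery listening for deprecated DOM mutation events.
// $(document).on('DOMNodeInserted', '.target', onNewNode);

// After: dependency-free native MutationObserver.
const observer = new MutationObserver((mutations) => {
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      // Only react to element nodes matching what the extension cares about.
      if (node.nodeType === Node.ELEMENT_NODE && node.matches('.target')) {
        onNewNode(node); // placeholder for the extension's real handler
      }
    }
  }
});
observer.observe(document.body, { childList: true, subtree: true });
```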
That was not actually the main thrust of my post, it's just a nice side benefit I've experienced. In the main domain where I use coding tools, at work, I work in languages where I'm quite a bit more proficient (Golang/Python). There, the quality of code that the AI tools generate is not better than I write by hand. The initial revisions are generally worse. But they're quite a bit faster than I write by hand, and if I iterate with the coding tools I can get to implementations that are as good as I would write by hand, and a lot faster.
I understand the bias towards skepticism. I have no particular dog in this fight, it doesn't bother me if you don't use these tools. But OP asked for peoples' experiences so I thought I'd share.
JavaScript isn't the only programming language around. I'm not the strongest around with JS either but I can figure it out as necessary -- knowing C/C++/Java/whatever means you can still grok "this looks better than that" for most cases.
Yep. I have plenty of experience in languages that use C-style syntax, enough to easily understand code written in other languages that occur nearby in the syntactical family tree. I'm not steeped in JS enough to know the weird gotchas of the type system, or know the standard library well, etc. But I can read the code fine.
If I'd asked an AI coding tool to write something up for me in Haskell, I would have no idea if it had done a good job.
This doesn't sound right to me. If someone expert in JS looked at a relatively simple C++ program, I think they could tell reasonably well whether the quality of the code was good or not. They wouldn't be able to, e.g., detect bugs from default value initialization, memory leaks, etc. But so long as the code didn't do any crazy templating stuff, they'd be able to analyze it at a rough "this algorithm seems sensible" level.
Analogously I'm quite proficient at C++, and I can easily look at a small JS program and tell if it's sensible. But if you give me even a simple React app I wouldn't be able to understand it without a lot of effort (I've had this experience...)
I agree with your broad point: C/C++/Java are certainly much more complex than JS and I would expect someone expert in them to have a much easier time picking up JS than the reverse. But given very high overlap in syntax between the four I think anyone who's proficient in one can grok the basics of the others.
I've never had a job where writing Javascript has been the primary language (so far it's been C++/Java/Golang). The JS Chrome Extension is a fun side project. Using Augment in a work context, I'm primarily using it for Golang and Python code, languages where I'm pretty proficient but AI tools give me a decent efficiency boost.
I understand the emotional satisfaction of letting loose an easy snarky comment, of course, but you missed the mark I'm afraid.
> If you are any good with those four languages, you are leagues ahead of anyone who does Javascript full time.
That is a priggish statement, and comes across as ignorant.
I’ve been paid to program in many different languages over the years. Typescript is what I choose for most tasks these days. I haven’t noticed any real difference between my past C#, C++, C, Java, Ruby, etc programming peers and my current JavaScript ones.
A cursory glance at the definition of "prig" shows that what I wrote there is categorically not that. You should at least try to look up that word, and if you look it up and still don't get it, then what you have is a reading comprehension issue.
> Typescript is what I choose for most tasks these days.
So you're smart on this, at least. Cantrill said it really well, Typescript brought "fresh water" to Javascript.
> haven’t noticed any real difference between my past C#, C++, C, Java, Ruby, etc programming peers and my current JavaScript ones.
You might still be on their level. I see that you didn't mention Rust or at least GoLang. Given the totality of your responses, you're certainly not writing any safe C (not ever).
A datapoint for you to consider. I recently started taking Naltrexone, which is an opioid antagonist used to treat alcohol abuse. It reduces your body's endorphin response, making alcohol less pleasant.
I wouldn't call myself alcoholic, but before Naltrexone I would have evenings where I would go out for drinks with friends and have trouble sticking to limits I set myself (I would set myself a limit of three drinks and end up having four or five).
Taking Naltrexone, I have no problem at all. It's trivially easy to regulate my drinking habits, it requires no effort whatsoever.
The experience has very much made me open to the idea that some people are biologically predisposed to alcoholism (even if, like you, it's not always inherited). Very easy to imagine that people with a heightened endorphin response might have more problems with alcoholism.
Interestingly I had an almost identical experience with smoking and Wellbutrin (different mechanism of action). I was smoking one cigarette a day and using willpower to keep myself from smoking more. Immediately after starting Wellbutrin: immediately lost all interest in smoking, haven't had one since.
This matches my experience exactly. #3 is the one I've found most surprising, and it can work outside the context of just analyzing your own code. For example I found a case where an automated system we use started failing due to syntax changes, despite no code changes on our part. I gave Claude the error message and the context that we had made no code changes, and it immediately and correctly identified the root cause as a version bump from an unpinned dependency (whoops) that introduced breaking syntax changes. The version bump had happened four hours prior.
Could I have found this bug as quickly as Claude? Sure; in retrospect the cause seems quite obvious. But I could just as easily have rabbit-holed myself looking somewhere else, or taken a while to figure out exactly which dependency caused the issue.
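The fix for the "whoops" is the standard one, of course: pin exact versions so upstream bumps can't land silently. In npm terms - our actual stack isn't named here, and the library is made up - that means no ^ or ~ ranges:

```json
{
  "dependencies": {
    "some-upstream-lib": "2.4.1"
  }
}
```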
It's definitely the case that you cannot blindly accept the LLM's output, you have to treat it as a partner and often guide it towards better solutions. But it absolutely can improve your productivity.
This is a niche case, but I spent months trying to upgrade one of our services from one LTS version to the next (I forget which). We encountered a weird bug where services running on the latest JRE would mysteriously corrupt fields when deserializing thrift messages, but only after running for a little while.
After an enormously unpleasant debugging cycle, we realized that the JIT compiler was incorrectly eliminating a call to System::arraycopy, which meant that some fields were left uninitialized - but only when JIT-compiled; non-optimized code ran fine.
This left us with four possible upgrade paths:
* Upgrade thrift to a newer version and hope that JIT compilation works well on it. But this is a nightmare since A) thrift is no longer supported, and B) new versions of thrift are not backwards compatible so you have to bump a lot of dependent libraries and update code for a bunch of API changes (in a LARGE number of services in our monorepo...). With no guarantee that the new version would fix the problem.
* File a bug report and wait for a minor version fix to address the issue.
* Skip this LTS release and hope the JIT bug is fixed in the next one.
* Disable JIT compilation for the offending functions and hope the performance hit is negligible.
I ultimately left the company before the fix was made, but I think we were leaning towards the last option (hopefully filing a bug report, too...).
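For anyone in a similar bind, the last option is a supported HotSpot knob: CompileCommand can exclude individual methods from JIT compilation. Something like the following - the method name is hypothetical, ours was in the thrift deserialization path:

```
java -XX:CompileCommand=exclude,com.example.ThriftCodec::read -jar service.jar
```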
There's no way this is the normal reason companies don't bump JRE versions as soon as they come out, but it's happened at least once. :-)
In general there's probably some decent (if misguided) bias towards "things are working fine on the current version, why risk some unexpected issues if we upgrade?"
I encountered a weird bug with deserializing JSON in a JRuby app during an OpenJDK upgrade - it would sporadically throw a parse error for no apparent reason. I was upgrading to OpenJDK 15, but another user experienced the same regression with an LTS upgrade from 8 to 11.
The end result of my own investigation led to this quite satisfying thread on hotspot-compiler-dev, in which an engineer starts with my minimal reproduction of the problem and posts a workaround within 24 hours: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2021...
There's also a tip there: try a fastdebug build and see if you can convert it into an assertion failure you can look up.
My basic mental model is that my body is constantly consuming chemicals that it doesn't use, be it particulates in the air or substances in the foods I eat or drink. My body is very good at incorporating the substances that it finds useful and ignoring (or killing) the ones that it doesn't, so I don't find myself worrying too much about any particular substance it might take up.
To your point, I don't have any particular reason to think that trace quantities of plastics going through my digestive system are going to cause any harm to me. Obviously if I have specific reason for concern I'll be careful (I'm not, like, going to eat off of asbestos plates...) but otherwise I'm not going to think about it much.
Exactly. Few things can cross the intestinal wall. For example, we don't have a problem with the absolutely insane amount of bacteria that process food inside of us and produce, well, shit. But if you were to make a small cut in the intestine somewhere, the chances of getting septic (i.e. the bacteria overpowering your immune system) are high.
So I'm not worried about things larger than a bacterium, unless proven otherwise.
The other issue is quantity. We all have lead or radioactive isotopes embedded in the body, and we can breathe in small quantities of carbon monoxide, for example from a passing car. That doesn't immediately cause problems; the quantity matters.
> My body is very good at incorporating the substances that it finds useful and ignoring (or killing) the ones that it doesn't
What are you basing this off of? From my perspective, the body is decidedly not good at ignoring substances. There are thousands of poisonous or cancerous chemicals. Things like mercury or lead build up in the body to the point of causing severe harm. Anything that can be made small enough will enter your body, and the body was only “designed” to tolerate a narrow band of substances that are encountered in the natural world.
I just ate a dinner off of a plate that's been sitting on my shelf for a few days, using utensils that have been sitting in my drawer for a few days, accumulating whatever residual particles have been floating through the air settling on them.
Like most people I lick my fingers after eating chips and don't sterilize my hands in advance, hands covered with whatever I've recently been touching (including the plastic bag that the chips came in...).
I have an air purifier at home, but there's all sorts of detritus kicked up in the air from materials around the house that my body is constantly inhaling (not to mention the exhaust and tire particulates that I get from cars driving by me or whatever is being produced in the construction site down the street).
I absolutely agree that there are plenty of things out there that are bad for you, at least in sufficient concentration. But every breath you take and every bite you eat is consuming at least some trace amounts of particles that your body doesn't need, from all sorts of sources.
It may well be that nanoparticles of plastics are particularly bad for you (in the concentrations that are normally consumed) but my heuristic is to not worry too much about things like this unless given concrete reason to.
Yeah, I'm always wondering why people are so concerned about what they eat while they're breathing in particulate poisons that have direct access to their blood.
Most things my body consumes (even the ones it doesn't use) don't fall into that category. And most of what you're describing, our bodies are very good at ignoring. I'm bathed in radiation constantly and I'm fine (unless I stay out in the sun too long without sunscreen...). I wouldn't want to live in a house with asbestos or lead paint, but I wouldn't be too bothered visiting a friend who lived in one. And so on.
It may absolutely be that nanoparticles of plastic are so harmful that eating off of plastic plates is a really bad idea, but my heuristic would be to not worry too much about it.
I like Alfred but I'm not a huge power user. It does have a bunch of other functionality. Out of the box it lets you do a bunch of stuff like filesystem manipulation, clipboard history management, a simple calculator, and stuff like that.
It's also extensible. For example there's a very useful add-on workflow [0] that makes it really quick to open up the next Zoom meeting you have scheduled in your calendar.