Building an interactive shell inside their CLI seems like a very odd technical solution. I can’t think of any use case where the same context gathering couldn’t be gleaned by examining the file/system state after the session ended, but maybe I’m missing something.
On the other hand, now that I’ve read this, I can see how having some hooks between the code agent CLIs and ghostty/etc could be extremely powerful.
LLMs in general struggle with numbers. It's easy to see with medium-sized models on line-replacement commands, where the model has to count lines: it usually takes a couple of tries to get right.
I always imagined they'd have an easier time if they could start a vim instance and send search/movement/insert commands: instead of tracking line numbers and doing arithmetic, they could visually confirm the right thing happening.
I haven't tried this new feature yet, but that was the first thing that came to mind when I saw it; doing edits this way might be easier for LLMs.
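The core of the idea can be sketched without even spawning an editor: address the edit by a search anchor instead of a line number, vim-style. This is a hypothetical helper for illustration, not any agent's actual tooling:

```python
def edit_by_anchor(text: str, anchor: str, old: str, new: str) -> str:
    """Replace `old` with `new` on the first line containing `anchor`.

    The caller locates the edit by pattern, so it never has to count
    or keep track of line numbers.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if anchor in line:
            lines[i] = line.replace(old, new)
            return "\n".join(lines)
    raise ValueError(f"anchor {anchor!r} not found")

src = "retries = 3\ntimeout = 30\nverbose = False"
print(edit_by_anchor(src, "timeout", "30", "60"))
# retries = 3
# timeout = 60
# verbose = False
```

Failing loudly when the anchor is missing matters here: a miscounted line number silently edits the wrong place, while a missing pattern is an error the agent can retry on.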
Personally haven't had that happen to me, been using Codex (and lots of other agents) for months now. Anecdote, but still. I wrote up a summary of how I see the current difference between the agents right now: https://news.ycombinator.com/item?id=45680796
Still a toss-up for me which one I use. For deep work Codex (codex-high) is the clear winner, but when you need to knock out something small Claude Code (sonnet) is a workhorse.
Also, CC's tool usage is so much better! Many, many times I've seen Codex write a Python script to edit a file, which bypasses the diff view, so you don't really know what's going on.
I would add to the list of the vibe engineer’s tasks:
Knowing when the agent has failed and it’s time to roll back. After four or five turns of Claude confidently telling you the feature is done, but things are drifting further off course, it’s time to reset and try again.
This “article” is clickbait. Controversial title with no substance, asking “why are companies investing heavily in a technology that works for some (limited but valuable) use cases, when they could invest in pure R&D for something that might be better someday”.
Wow, I thought they would feel some pricing pressure from GPT5 API costs, but they are doubling down on their API being more expensive than everyone else.
I think it's the right approach: the cost of running these things as coding assistants is negligible compared to the benefit of even a slight model improvement.
The GPT5 API uses more tokens for answers of the same quality as previous versions. I fell into that trap myself. I use both Claude and OpenAI right now, but will probably drop OpenAI, since the way they roll out changes shows they're not to be trusted.
https://worksonmymachine.ai/p/solving-amazons-infinite-shelf...