That doesn't make sense. The free part is the marketing: the more people like it, the faster it spreads. I run a freemium business, and all the motivation internally is to increase growth by improving the free product. Once you achieve a good conversion rate to pro, pushing it any higher just slows growth. At that point, all you care about is improving the product for free users to generate word of mouth, and building features that will do so.
I have had a few epic refactoring failures with Gemini relative to Claude.
For example: I asked both to change a bunch of code into functions to pass into a `pipe` type function, and Gemini truly seemed to have no idea what it was supposed to do, and Claude just did it.
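For context, the kind of transformation I mean is roughly this - a hypothetical TypeScript sketch, not the actual code or the real `pipe` helper I was using:

```typescript
// Minimal `pipe` helper: composes unary functions left to right.
const pipe = <T>(...fns: Array<(x: T) => T>) => (x: T): T =>
  fns.reduce((acc, fn) => fn(acc), x);

// Before: one function doing an imperative sequence of steps.
function shoutBefore(s: string): string {
  const trimmed = s.trim();
  const upper = trimmed.toUpperCase();
  return upper + "!";
}

// After: the same steps extracted as small functions passed into pipe.
const trim = (s: string) => s.trim();
const upper = (s: string) => s.toUpperCase();
const exclaim = (s: string) => s + "!";
const shoutAfter = pipe(trim, upper, exclaim);

console.log(shoutBefore("  hello "), shoutAfter("  hello ")); // HELLO! HELLO!
```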
Maybe there was some user error or something, but after that I haven’t really used Gemini.
I’m curious whether people who are using Gemini and loving it are using it mostly for one-shotting, or whether they're working with it more closely, like a pair programmer? I could buy that it might be good at one but bad at the other.
This has been my experience too. Gemini might be better for vibe coding or architecture or whatever, but Claude consistently feels better for serious coding. That is, when I know exactly how I want something implemented in a large existing codebase, and I go through the full cycle of implementation, refinement, bug fixing, and testing, guiding the AI along the way.
It also seems to be better at incorporating knowledge from documentation and existing examples when provided.
My experience has been exactly the opposite - Sonnet did fine on trivial tasks, but couldn't e.g. fix a bug end-to-end (from bug description in the tracker to implementing the fix and adding tests) properly because it couldn't understand how the relevant code worked, whereas Gemini would consistently figure out the root cause and write decent fix & tests.
Perhaps this is down to specific tools and their prompts? In my case, this was Cursor used in agent mode.
Or perhaps it's about the languages involved - my experiments were with TypeScript and C++.
> Gemini would consistently figure out the root cause and write decent fix & tests.
I feel like you might be using it differently to me. I generally don't ask AI to find the cause of a bug, because it's quite bad at that. I use it to identify relevant parts of the code that could be involved in the bug, and then I come up with my own hypotheses for the cause. Then I use AI to help write tests to validate these hypotheses. I mostly use Rust.
I used to use them mostly in "smart code completion" mode myself until very recently. But with all the AI IDEs adding agentic mode, I was curious to see how well that fares if I let it drive.
And we aren't talking about trivial bugs here. For TypeScript, the most impressive bug it handled to date was an async race condition due to a missing await, causing a property to be overwritten with an invalid value. For that one I actually had to do some manual debugging and tell it what I observed, but given that info, it located the problem in the code all by itself, fixed it correctly, and came up with a way to test it as well.
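For anyone curious, here's a minimal TypeScript sketch of that class of bug - not the actual code, just an illustration of how a missing `await` lets a stale write land after a newer one:

```typescript
// A slow async "save" that writes into shared state after `delayMs`.
const save = (state: { value: string }, v: string, delayMs: number) =>
  new Promise<void>(resolve =>
    setTimeout(() => { state.value = v; resolve(); }, delayMs));

async function buggy() {
  const state = { value: "" };
  save(state, "first", 50);        // missing await: this write is still in flight...
  await save(state, "second", 10); // ...when the newer value lands
  await new Promise(r => setTimeout(r, 100));
  console.log(state.value);        // "first" - the stale write overwrote the newer one
}

async function fixed() {
  const state = { value: "" };
  await save(state, "first", 50);  // awaiting keeps the writes ordered
  await save(state, "second", 10);
  console.log(state.value);        // "second"
}

buggy().then(fixed);
```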
For C++, the codebase in question was gdb, the bug was a test issue, and it correctly found problematic code based solely on the test log (but I had to prod it a bit in the right direction for the fix).
I should note that this is Gemini Pro 2.5 specifically. When I tried Google's models previously (for all kinds of tasks), I was very unimpressed - it was noticeably worse than other SOTA models, so I was very skeptical going into this. Indeed, I started with Sonnet precisely because my past experience indicated that it was the best option, and I only tried Gemini after Sonnet fumbled.
I use it for basically everything I can, not just code completion, including end-to-end bug fixes when it makes sense. But most of the time even the current Gemini and Claude models fail with the hard things.
It might be because most bugs that you would encounter in other languages don't occur in the first place in Rust, thanks to the stronger type system. The race condition you mentioned wouldn't be possible, for example. If something like that did occur, it would be a compiler error, and the AI would fix it while still in the initial implementation stage by looking at the linter errors. I also put a lot of effort into using coding patterns that do as much validation as possible within the type system. So in the end all that's left are the more difficult bugs where a human is needed to assist (for now at least; I'm confident the models are only going to get better).
Race conditions can span across processes (think async process communication).
That said I do wonder if the problems you're seeing are simply because there isn't that much Rust in the training set for the models - because, well, there's relatively little of it overall when you compare it to something like C++ or JS.
I've found that I need to point it to the right bit of logs or test output and narrow its attention by selectively adding to its context. Claude 3.7 at least works well this way. If you don't, it'll fumble around. Gemini hasn't worked as well for me, though.
I partly wonder if different people's prompt styles will lead to better results with different models.
Two things interest me about Claude being better than GPT-4:
1) We are all breathless that it is better. But a year has passed since GPT4. It’s like we’re excited that someone beat Usain Bolt’s 100 meter time from when he was 7. Impressive, but … he’s twenty now, has been training like a maniac, and we’ll see what happens when he runs his next race.
2) It’s shown AI chat products have no switching costs right now. I now use mostly Claude and pay them money. Each chat is a universe that starts from scratch, so … very easy to switch. Curious if deeper integrations with my data, or real chat memory, will change that.
The current version of GPT-4 is 3 months old, not 1 year old. Anthropic are legitimately ahead on performance for cost right now, but I don't think their API latency matches OpenAI's.
We’ll see what GPT4.5 looks like in the next 6 months.
I don't think it's just that Claude-3 seems on par with GPT-4, but rather the development timescales involved.
Anthropic as a company was only created, with some of the core LLM team members from OpenAI, around the same time GPT-3 came out (Anthropic CEO Dario Amodei's name is even on the GPT-3 "few-shot learners" paper). So, roughly speaking, in the same time it took OpenAI (a big established company with lots of development momentum) to go from GPT-3 to GPT-4, Anthropic have gone from a start-up with nothing to Claude-3 (via 1 & 2), which BEATS GPT-4. Clearly the pace of development at Anthropic is faster than that at OpenAI, and there is no OpenAI magic moat in play here.
Sure GPT-4 is a year old at this point, and OpenAI's next release (GPT-4.5 or 5) is going to be better than GPT-4 class models, but given Anthropic's momentum, the more interesting question is how long it will take Anthropic to match it or take the lead?
Inference cost is also an interesting issue... OpenAI have bet the farm on Microsoft, and Anthropic have gone with Amazon (AWS), who have built their own ML chips. I'd guess Anthropic's inference cost is cheaper, maybe a lot cheaper. Can OpenAI compete with the cost of Claude-3 Haiku, which is getting rave reviews? Its input tokens are crazy cheap - $300 to input every word you'll ever speak in your entire life!
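Rough back-of-envelope on that figure, assuming ~16,000 spoken words a day over 80 years (~470M words) and ~1.3 tokens per word: that's roughly 600M input tokens, which at Haiku's $0.25 per million input tokens comes to about $150 - so $300 covers it with room to spare.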
Claude may beat GPT-4 right now, but I remember ChatGPT in March 2023 being leagues better. Over the past year it's regressed, but gotten faster.
Claude is also lacking web browsing and a code interpreter. I'm sure those will come, but where will GPT be by then? ChatGPT also offers an extensive free tier with voice. Claude's free plan caps you at a few messages every few hours.
Of course GPT-next should take the lead for a while, but with Anthropic, from a standing start, putting out 3 releases in the same time it took OpenAI to put out 1, how long is this lead going to last?
It'll be interesting to see if Anthropic choose to match OpenAI feature-for-feature or just follow their own path.
Yeah, it's a good point, but I think our intuitions differ on this one. I don't have a horse in this race, but my assumption is that the next OpenAI release will be a massive leap that makes GPT-4/Claude 3 Opus look like toys. Perhaps you're right, though, and Anthropic's curves will bend upward even more quickly, so that they keep catching up faster until eventually they're ahead.
Honestly who knows, but outside of Q-star rumors there's no indication that either company is doing anything much different from the other one, so I'd not expect any long-lasting difference in capability to open up. Maybe it will, though!
FWIW, Sam Altman has fairly recently said that the jump from GPT-4 to GPT-5 will be similar to that from GPT-3 to GPT-4, and also (recent Lex Fridman interview) that their goal is explicitly NOT to have releases that are shocking - but rather they want to have ones of incremental capability to give society time to adapt. Could be misdirection - who knows.
Amodei for his part has said that what Anthropic will release in 2024 will be a "sharper, more refined" (or words to that effect) version of what they have now, and not a "reality bender" (which he seemed to be implying maybe is coming, but not for a year or two).
They're comparing against gpt-4-0125-preview, which was released at the end of January 2024. So they really are beating the market leader for this test.
What matters here is what I can use today. I can either use Claude 3 or GPT-4. If Claude is better, it is the best on the market. Let's see what the story is tomorrow.
Go ahead, no one is saying to stay with GPT-4. But it's disingenuous to compare a gpt-4-march-update to a completely new pretrained model like Claude 3 Opus.
It is not that disingenuous. We can only make claims based on the current data.
There could be even bigger competitors in the market, but because they stay quiet and do not publish results, we do not know about their capabilities. Who knows what Apple has been doing all this time? They certainly have the capability, even if they make the occasional comment about using Gemini.
Until the data and proof has been provided, it is accurate to claim "the best model on the market". Everything else is hypothetical.
So you think whatever process produces a GPT4 update is completely equivalent to pretraining and RLHF'ing a brand new model with new architecture, more data, etc??
ChatGPT does have at least a year's head start, so this doesn't seem surprising. It does show that OpenAI doesn't really have any secret sauce that others can't reproduce.
I suppose size will become the moat eventually but atm it looks like it could become anyone's game.
Size is absolutely not going to become the moat unless there's some hardware revolution that makes running big models very very cheap, but that requires a very large up-front capital cost to deploy. Big models are inefficient, and as smaller models improve there will be very few use cases where the big models are worth the compute.
I imagine that going forward, the typical approach would be a multi-level LLM, such that there's a relatively small and quick model in front of the user, which can in turn decide to consult an "expert" larger model as part of its "system 2".
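Something like this, conceptually - a hypothetical sketch where `small` and `expert` stand in for whatever cheap and expensive models you'd actually call:

```typescript
// Hypothetical model interface: prompt in, completion out.
type Model = (prompt: string) => Promise<string>;

// The small model answers directly, or escalates when it isn't confident.
async function answer(question: string, small: Model, expert: Model): Promise<string> {
  const triage = await small(
    `Answer the question if you are confident, otherwise reply exactly "ESCALATE".\n\nQ: ${question}`
  );
  if (triage.trim() === "ESCALATE") {
    // "System 2": hand the hard case to the larger model.
    return expert(question);
  }
  return triage;
}
```

The interesting design question is how the front model decides to escalate - a sentinel reply like this, a confidence score, or a separate router model.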
Absolutely, that is 100% the way things are going to go. What's going to happen is that eventually there will be an online model directory that a local agent knows how to query to identify other models to call in order to build up an answer. Local agents will be empowered with online learning since it won't be possible to pre-train on the model catalog.
> as smaller models improve there will be very few use cases where the big models are worth the compute
I see very little evidence of this so far. The use cases I'm interested in just barely work on GPT-4, and lesser models give mostly garbage - i.e. function calling and inferring stuff like SQL queries. If there are smaller models that can do passable work on such use cases, I'd be very interested to know.
Claude Haiku can do a LOT of the things you'd think you need GPT4 for. It's not as good at complex code and really tricky language use/abstractions, but it's very close for more superficial things, and you can call haiku like 60 times for each gpt4 call.
I bet you could do multiple prompt variations with haiku and then do answer combining to compete with GPT4-T/Opus at a fraction of the price.
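A minimal sketch of that idea, assuming the `@anthropic-ai/sdk` messages API and simple majority voting over paraphrased prompts (the combining step could just as well be another model call):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Send one prompt to Haiku and return the text of the first content block.
async function ask(prompt: string): Promise<string> {
  const msg = await client.messages.create({
    model: "claude-3-haiku-20240307",
    max_tokens: 256,
    messages: [{ role: "user", content: prompt }],
  });
  const block = msg.content[0];
  return block.type === "text" ? block.text.trim() : "";
}

// Ask the same question phrased a few different ways, then take the most common answer.
async function askWithVoting(variants: string[]): Promise<string> {
  const answers = await Promise.all(variants.map(ask));
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1])[0][0];
}
```

Exact-match voting only really works for short, constrained answers; for free-form output you'd want a judge model or similarity-based merging instead.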
Interesting! I just discovered that Anthropic do indeed officially support commercial API access in (at least) some EU countries. They just don't support GUI access in all of those countries.
> We are all breathless that it is better. But a year has passed since GPT4. It’s like we’re excited that someone beat Usain Bolt’s 100 meter time from when he was 7.
Sounds like some sort of siding with closedAI (OpenAI). When I need to use an LLM, I use whatever performs best at the moment. It doesn't matter to me who's behind it; at the moment it is Claude.
I am not going to stick to ChatGPT just because closedAI have been pioneers or because their product was one of the best.
I hope I didn’t sound too harsh, excuse me in that case.
Is this supposed to be clever? It's like saying M$ back in the 90s. Yeah, OpenAI doesn't deserve its own name, but maybe we can let that dead horse rest.
Claude has way too many safeguards for what it believes is correct to talk about and what isn’t. Not saying ChatGPT is better, it also got dumbed down a lot, but Claude is very heavy on being politically correct on everything.
Ironically the one I find the best for responses currently is Gemini Advanced.
I agree with you that there is no switching cost currently, I bounce between them a lot
If you’re on macOS, give BoltAI[0] a try. Other than supporting multiple AI services and models, BoltAI also allows you to create your own AI tools: highlight the text, press a shortcut key, then run a prompt against that text.
I use an app called MindMac for macOS that works with nearly "all" of the APIs. I currently am using OpenAI, Anthropic and Mistral API keys with it, but it seems to support a ton of others as well.
MSFT trying to hedge their bets makes it seem like there's a decent chance OpenAI might have hit a few roadblocks (either technical or organizational).
I agree with your analogy. Also, there is quite a bit of "standing on the shoulders of giants" going on. Every company's latest release will/should be a bit better than the models released before it. AI enthusiasts are getting a bit annoying - "we got a new leader boys!!!" for each new model released.
GRUB is a common bootloader for Linux systems: it gives you a menu of boot options when you turn on your machine, and boots whichever installed operating system you choose.
So with this theme, that menu for choosing which OS to boot looks like the Minecraft menu!
"The Art of Action" It has changed how I approach everything, not just how I run my business. Is applying approaches used in militaries to organize action around the leader's intent, taking action in the right general direction, and delegating the "what" (the intent) but not the "how" of the way it is actually achieved.
My best friend's dad was high up in Woodstock 99 corporate, which means he was responsible for making much of it run (yeah, I guess there were some issues). We had backstage passes, a house, and a ... weird experience. Did some stupid things. Some highlights:
- We were on stage during the famous closing Red Hot Chili Peppers show (like in the wings, looking out on the audience from behind the band). That was surreal. We could see the fires start in the distance, stuff getting ripped down, things starting to sort of go to hell. All the while, the most amazing band of the moment is playing the most amazing songs 30 feet away from us. Quite a contrast between awesomeness and ... whatever the strange mix of emotions riots give.
- We learned that there was no Mountain Dew in the whole event because Coke got the contract, so we went out and bought many cases, drove it in through our special entrance, and basically auctioned it off. People were willing to pay $5 a can. We made a lot of money. Pretty sketchy.
- Worse/weirder, we had backpacks full of ice to hold the Mountain Dew while we walked around the crowd selling it, and people started offering to buy the ice from us to cool off. So once we ran out of Mountain Dew, we started yelling "Ice is nice! We got ice!" and selling that. That ... well, I feel like that was me experiencing a real microcosm of capitalism and the allure of artificial scarcity ... and not acting the way I would hope. Give away the damn ice, man.
- We kind of just walked around backstage, and there were so many super famous people that none of them felt very special, and they'd just kind of talk to you while you were in line for food. Ice Cube had some cool sneakers that my friend chatted with him about. George Clinton was chill. I think we talked with Erykah Badu for a while at some point. (Stars are not at all like this backstage at a normal concert, btw, which we also went to a lot because of his father. At their normal concerts, these same performers are the center of the universe and don't have time to chat with a bunch of high schoolers running around.)
Anyway, it was a very strange event, and we had a very strange vantage point.
Honestly, no. Your type of attitude is how bad products get made: products that cater to the absolute lowest tech-illiterate user, products that hide and simplify every useful tool to the point of annoyance and dysfunction. GitHub's warnings are plenty. It doesn't matter what GitHub did; this post would still have been made, and you people would still be thinking of even more absurd ways to stop the user from doing something. At a certain point, a tool has to do the function you asked it to do.
It was really fun, but maybe throw in a few combos closer in weight? I played for maybe 5 minutes and didn't get any wrong. I would have played longer if I had gotten some wrong.