This is a fun thought experiment. I believe that we are now at the $5 Uber (2014) phase of LLMs. Where will it go from here?
How much will a synthetic mid-level dev (Opus 4.5) cost in 2028, after the VC subsidies are gone? I would imagine as much as possible? Dynamic pricing?
Will the SOTA model labs even sell API keys to anyone other than partners/whales? Why even that? They are the personalized app devs and hosts!
Man, this is the golden age of building. Not everyone can do it yet, and every project you can imagine is greatly subsidized. How long will that last?
While I remember $5 Ubers fondly, I think this situation is significantly more complex:
- Models will get cheaper, maybe way cheaper
- Model harnesses will get more complex, maybe way more complex
- Local models may become competitive
- Capital-backed access to more tokens may become absurdly advantaged, or not
The only thing I think you can count on is that more money buys more tokens, so the more money you have, the more power you will have ... as always.
But whether some version of the current subsidy, which levels the playing field, will persist seems really hard to model.
All I can say is that the bad scenarios I can imagine are pretty bad indeed, much worse than the Uber outcome, where the downside was merely that owning a car became the cheaper option for me again, which it wasn't 10 years ago.
If the electric grid cannot keep up with the additional demand, inference may not get cheaper. The cost of electricity would go up for LLM providers, and VCs would have to subsidize them more until the price of electricity comes down, which may take longer than they can afford to wait if they have been expecting LLMs to replace many more workers within the next few years.
This is a super interesting dynamic! The CCP is really good at subsidizing and flooding global markets, but in the end, it takes power to generate tokens.
In my Uber comparison, the service depended on physical hardware on location... taxis. That is not the case with token delivery.
This is such a complex situation in that regard. However, once the market settles and monopolies are created, the price will eventually be whatever the market can bear. Will that actually increase gross planet product, or will the SOTA token providers just eat up the existing gross planet product, with no increase?
I suppose whoever has the cheapest electricity will win this race to the bottom? But... will that ever increase global product?
___
Upon reflection, the comment above was likely influenced by this truly amazing quote from Satya Nadella's interview on the Dwarkesh podcast. This might be one of the most enlightened things that I have ever heard in regard to modern times:
> Us self-claiming some AGI milestone, that's just nonsensical benchmark hacking to me. The real benchmark is: the world growing at 10%.
With optimizations and new hardware, power is almost a negligible cost: contrary to popular belief, $5/month would be sufficient for all users. You can get 5.5M tokens/s/MW[1] for Kimi K2 (= 20M tokens/kWh = 181M tokens/$), which is 400x cheaper than current pricing even if you exclude architecture/model improvements. The thing is that Nvidia is currently swallowing up a massive share of the revenue, which China could possibly solve by investing in R&D.
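For anyone who wants to check those numbers, here is roughly how they fall out. The electricity rate and the reference API price below are my own assumptions, chosen only to reproduce the quoted figures:

```ts
// Back-of-the-envelope check of the figures above.
// Assumptions (mine): electricity at ~$0.11/kWh, reference API price ~$2.20 per million tokens.
const tokensPerSecondPerMW = 5.5e6;                    // quoted throughput for Kimi K2 [1]
const tokensPerMWh = tokensPerSecondPerMW * 3600;      // ~1.98e10 tokens per MWh
const tokensPerKWh = tokensPerMWh / 1000;              // ~19.8M, i.e. the "20M/kWh" figure
const usdPerKWh = 0.11;                                // assumed electricity rate
const tokensPerDollar = tokensPerKWh / usdPerKWh;      // ~180M, i.e. the "181M tokens/$" figure
const apiUsdPerMillionTokens = 2.2;                    // assumed current API price
const electricityUsdPerMillionTokens = 1e6 / tokensPerDollar; // ~$0.0055 of power per million tokens
console.log((apiUsdPerMillionTokens / electricityUsdPerMillionTokens).toFixed(0)); // ~400x
```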
I can run Minimax-m2.1 on my M4 MacBook Pro at ~26 tokens/second. It's not Opus, but it can definitely do useful work when kept on a tight leash. If models improve at anything like the rate we have seen over the last 2 years, I would imagine something as good as Opus 4.5 will run on similarly specced new hardware by then.
I appreciate this, however, as a ChatGPT, Claude.ai, Claude Code, and Windsurf user... who has tried nearly every single variation of Claude, GPT, and Gemini in those harnesses, and has tested all of those models via API for LLM integrations into my own apps... I just want SOTA, 99% of the time, for myself and my users.
I have never seen a use case where a "lower" model was useful for me, and especially not for my users.
I am about to get almost the exact MacBook that you have, but I still don't want to inflict non-SOTA models on my code, or my users.
This is not a judgement against you or the downloadable weights; I just don't know when it would be appropriate to use those models.
BTW, I very much wish that I could run Opus 4.5 locally. The best that I can do for my users is the Azure agreement that they will not train on their data. I also have that setting set on my claude.ai sub, but I trust them far less.
Disclaimer: No model is even close to Opus 4.5 for agentic tasks. In my own apps, I process a lot of text/complex context and I use Azure GPT 4.1 for limited LLM tasks... but for my "chat with the data" UX, it's Opus 4.5 all day long. It has consistently tested as far superior.
The last I checked, it is exactly equivalent per token to direct OpenAI model inference.
The one thing I wish for is that Azure Opus 4.5 had JSON structured output. Last I checked, that was in "beta" and only allowed via the direct Anthropic API. However, after many thousands of Opus 4.5 Azure API calls with the correct system and user prompts, not even one API call has returned invalid JSON.
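To make the approach concrete, here is a minimal sketch of the prompt-for-JSON-then-validate pattern I mean. `callOpus` and the `Invoice` shape are hypothetical placeholders for whatever client and schema you actually use; the point is the validate-and-retry loop, not any specific SDK:

```ts
// Hypothetical example: strict system prompt + JSON.parse as the validator, with retries.
type Invoice = { customer: string; total: number };

async function callOpus(system: string, user: string): Promise<string> {
  // Wire up your own Azure/Anthropic client here and return the raw text response.
  throw new Error("not implemented in this sketch");
}

async function getInvoiceJson(userText: string, maxRetries = 2): Promise<Invoice> {
  const system =
    "You are a data extraction service. Respond with a single JSON object " +
    'of the form {"customer": string, "total": number} and nothing else.';
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callOpus(system, userText);
    try {
      return JSON.parse(raw) as Invoice; // throws if the model returned invalid JSON
    } catch {
      // In my experience this branch never fires with the right prompts, but guard anyway.
    }
  }
  throw new Error("Model did not return valid JSON");
}
```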
In my comment history can be found a comment much like yours.
Then Opus 4.5 was released. I already had my CC claude.md and my Windsurf global rules + workspace rules set up. Also, my main money-making project is React/Vite/Refine.dev/antd/Supabase... known patterns.
My point is that given all that, I can now deploy amazing features that "just work" and have excellent UX in a single prompt. I still review all commits, but they are now 95% correct on the front end and ~75% correct on Postgres migrations.
Is it magic? Yes. What's worse is that I believe Dario. In a year or so, many people will just create their own Loom or Monday.com equivalent apps with a one-page request. Will it be production ready? No. Will it have all the features that everyone wants? No. But it will do what they want, which is 5% of most SaaS feature sets. That will kill at least 10% of basic SaaS.
If Sonnet 3.5 (~Nov 2024) to Opus 4.5 (Nov 2025) progress is a thing, then we are slightly fucked.
"May you live in interesting times" - turns out to be a curse. I had no idea. I really thought it was a blessing all this time.
Yeah, same here. Also, I can't recall a time since back when I used to make music that I was actually jealous of someone else's abilities, but here I am. :)
I keep seeing posts about how ~"the volume of AI scrapers is making hosting untenable."
There must be a ton of new full-web datasets out there, right?
What are the major hurdles that prevent the owners of these datasets from providing them to third parties via API? Is it the quality of SERP, or staleness? Otherwise, this seems like a potentially lucrative pivot/side hustle?
> There must be a ton of new full-web datasets out there, right?
Sadly, no. There's CommonCrawl (https://commoncrawl.org/), which is still, sadly, far removed from a "full-web dataset."
So everyone runs their own search instead, hammering the sites, going into gray areas (you either ignore robots.txt or your results suck), etc. It's a tragedy of the commons that keeps Google entrenched: https://senkorasic.com/articles/ai-scraper-tragedy-commons
> the volume of AI scrapers is making hosting untenable
Aside from that potential, it's also not true.
A Pentium Pro or PIII SSE with circa 1998-99 Apache happily delivers a billion hits a month (roughly 385 requests per second on average) without breaking a sweat, unless you think generating pages on every visit is better than generating them when they change.
I think it is true that it is a real problem (EDIT: but doesn't necessarily make "hosting untenable"), but you are correct to point out that modern pages tend to be horribly optimized (and that's the source of the problem). Even "dynamic" pages using React/Next.js etc. could be pre-rendered and/or cached and/or distributed via CDNs. A simple cache or a CDN should be enough to handle pretty much any scraping traffic unless you need to do some crazy logic on every page visit – which should almost never be the case on public-facing sites. As an example, my personal site is technically written in React, but it's fully pre-rendered and doesn't even serve JS – it can handle huge amounts of bot/scraping traffic via its CDN.
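For the curious, the build step for a site like that can be as small as this sketch (assuming `react` and `react-dom` are installed; `HomePage` is a stand-in for your real component tree, not my actual site):

```ts
// Minimal sketch of "React, but fully pre-rendered": run once at build time,
// then let a CDN serve the resulting HTML with no JS at all.
import { mkdirSync, writeFileSync } from "node:fs";
import React from "react";
import { renderToStaticMarkup } from "react-dom/server";

function HomePage(): React.ReactElement {
  return React.createElement(
    "main",
    null,
    React.createElement("h1", null, "My site"),
    React.createElement("p", null, "Rendered once at build time, cached at the edge.")
  );
}

const html = "<!doctype html>" + renderToStaticMarkup(React.createElement(HomePage));
mkdirSync("dist", { recursive: true });
writeFileSync("dist/index.html", html); // upload dist/ to your CDN of choice
```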
OK, I agree with both of you. I am an old who is aware of NGINX and C10k. However, my question is: what are the economic or technical difficulties that prevent one of these new web-scale crawlers from releasing og-pagerank-api.com? We all love to complain about modern Google SERP, but what actually prevents that original Google experience from happening, in 2026? Is it not possible?
Or, is that what orgs like Perplexity are doing, but with an LLM API? Meaning that they have their own indexes, but the original q= SERP API concept is a dead end in the market?
Tone: I am asking genuine questions here, not trying to be snarky.
What prevents it is that the web in 2026 is very different than it was when OG pagerank became popular (because it was good). Back then, many pages linked to many other pages. Now a significant amount of content (newer content, which is often what people want) is either only in video form, or in a walled garden with no links, neither in nor out of the walls. Or locked up in an app, not out on the general/indexable/linkable web. (Yes, of course, a lot of the original web is still there. But it's now a minority at best.)
Also, of course, the amount of spam-for-SEO (pre-slop slop?) as a proportion of what's out there has also grown over time.
IOW: Google has "gotten worse" because the web has gotten worse. Garbage in, garbage out.
Thanks for the reply. I mentioned tech, but forgot about time. Yeah, that makes solid sense.
> Or locked up in an app...
I believe you may have at least partially meant Discord, for which I personally have significant hate. Not really for the owners/devs, but why in the heck would any product owner want to hide the knowledge of how to use their app on a closed platform? No search engine can find it, no LLM can learn from it(?). Lost knowledge. I hate it so much. Yes, user engagement, but knowledge vs. engagement is the battle of our era, and knowledge keeps losing.
r/anything is so much better than a Discord server, especially in the age of "Software 3.0"
This might be the most blunt and significant speech of our time. I am stunned by the intelligent honesty. Also, kudos for the difficult follow-up questions.
Related is the PM of Canada’s speech at Davos today. I don’t think that I have heard such a blunt assessment of the past and future from a politician, ever.
I have been thinking about this speech and your comment for a couple days now.
He should be pissed. Not as a Canadian, but as someone who understands the benefits of a global system of rules that has made everyone who cooperates rich, which has included himself.
What we have witnessed under 47 is that a small group of the world's rich ideologues were so maniacal and myopic that they screwed it up for everyone.
As a relative "poor", this pisses me off as well, as peace and prosperity will likely, at least temporarily, devolve for us all as Pax/Oeconomia Americana unwinds. Utter insanity.
Maybe recheck how exposed your country is. It's really bad for Canada, since ~77% of all Canadian goods exports are headed to the US. It will take a long time for them to diversify. Same with Mexico and Taiwan, and to a lesser extent Vietnam and Ireland.
Honestly as an American I'm just happy to see some open talk and movement by world leaders. This breakdown has been clear for at least 10 years, maybe 20 if you're smart, but nobody did anything. I hope Europe can learn to work with China here. I don't think the whole system is dead, they just need to find other dance partners.
I've been thinking about these broad critiques of Capitalism, and while I sometimes find myself nodding in at least partial agreement, I worry that it's far too blunt a critique.
If you look at Soviet or Chinese Communism, they also stifled innovation, and they also destroyed entire ecosystems. They also had extreme concentrations of power, which allowed psychopathic leaders to commit atrocities.
If we want to come up with real long-term solutions, maybe we need to be honest about underlying human traits, and address those via systematic controls. Otherwise, it feels like we are going to keep bouncing from extreme to extreme. That tendency towards extremes seems like another easily exploited human trait that needs to be identified and addressed.
I guess my point here is that maybe it's not entirely specific systems at fault here, as much as it is universal human traits and group dynamics.
Disclaimer: I thought we had already found the beginnings of an answer, and it was Social Democracy with a regulated market economy. However, this system appears not to be extreme enough for many people to get excited about it.