We seem to be moving the goalposts on AGI, are we not? 5 years ago, the argument that AGI wasn't here yet was that you couldn't take something like AlphaGo and use it to play chess. If you wanted that, you had to do a new training run with new training data.
But now, we have LLMs that can reliably beat video games like Pokemon, without any specialized training for playing video games. And those same LLMs can write code, do math, write poetry, be language tutors, find optimal flight routes from one city to another during the busy Christmas season, etc.
How does that not fit the definition of "General Intelligence"? It's literally as capable as a high school student for almost any general task you throw it at.
I think the games tasks are worth exploring more. If you look at that recent Pokemon post, it's not as capable as a high school student; it took a long, long time. I have a private set of tests that any 8-year-old could easily solve but that any LLM absolutely fails on. I suspect plenty of the people claiming AGI isn't here yet have similar personal tests.
ARC-AGI-3 is coming soon, and I'm very excited for that because it's a true test of multimodality, spatial reasoning, and goal planning. I think there was a preliminary post somewhere showing that current models basically try to brute-force their way through and don't actually "learn the rules of the game" as efficiently as humans do.
How do you think they are training for the spatial part of the tests? It doesn't seem to lend itself well to token-based "reasoning". I wonder if they are just synthetically creating training data and hoping a new emergent spatial reasoning ability appears.
>think they are training for the spatial part of the tests
I'm not sure which party "they" refers to here, since the ARC-AGI-3 dataset isn't released yet and labs probably haven't begun targeting it. For ARC-AGI-2, synthetic data alone might have been enough to saturate the benchmark, since most frontier models do well on it, yet we haven't seen any corresponding jump in multimodal skill use, with maybe the exception of "nano banana".
>lend itself well to token based “reasoning”
One could perhaps do reasoning/CoT with vision tokens instead of just text tokens, or reasoning in latent space, which I'd guess might be even better. There have been papers on both, but I don't know if it's an approach that scales. Regardless, Gemini 3 / Nano Banana have made big gains on visual and spatial reasoning, so they must have done something to get multimodality with cross-domain transfer in a way that 4o/gpt-image wasn't able to.
For ARC-AGI-3, the missing pieces seem to be both "temporal reasoning" and efficient in-context learning. If labs can train for these, it would have benefits for things like tool calling as well, which is why it's an exciting benchmark.
I think we're noticing that our goalposts for AGI were largely "we'll recognize it when we see it", and now as we are getting to some interesting places, it turns out that different people actually understood very different things by that.
> 5 years ago, the argument that AGI wasn't here yet was that you couldn't take something like AlphaGo and use it to play chess.
No; that was one, extremely limited example of a broader idea. If I point out that your machine is not a general calculator because it gives the wrong answer for six times nine, and then you fix the result it gives in that case, you have not refuted me. If I now find that the answer is incorrect in some other case, I am not "moving goalposts" by pointing it out.
(But also, what lxgr said.)
> But now, we have LLMs that can reliably beat video games like Pokemon, without any specialized training for playing video games. And those same LLMs can write code, do math, write poetry, be language tutors, find optimal flight routes from one city to another during the busy Christmas season, etc.
The AI systems that do most of these things are not "LLMs".
> It's literally as capable as a high school student for almost any general task you throw it at.
And yet embarrassing deficiencies are found all the time ("how many r's in strawberry", getting duped by straightforward problems dressed up to resemble classic riddles but without the actual gotcha, etc.).
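Part of what makes the strawberry example so embarrassing is that it's trivially checkable in code, a one-liner in basically any language:

```python
# The famous "how many r's in strawberry" question, answered mechanically.
answer = "strawberry".count("r")
print(answer)  # → 3
```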
> The AI systems that do most of these things are not "LLMs".
Uh, every single example that I listed except for the 'playing video games' example is something that I regularly use frontier models to do for myself. I have ChatGPT and Gemini help me find flight routes, tutor me in Spanish (Gemini 3 is really good at this), write poetry and code, solve professional math problems (usually related to finance and trading), help me fix technical issues with my phone and laptop, etc etc.
If you say to yourself, "hey this thing is a general intelligence, I should try to throw it at problems I have generally", you'll find yourself astonished at the range of tasks with which it can outperform you.
> Uh, every single example that I listed except for the 'playing video games' example is something that I regularly use frontier models to do for myself.
LLMs are at most one component of the systems you refer to. Reasoning models and agents are something larger.
> If you say to yourself, "hey this thing is a general intelligence, I should try to throw it at problems I have generally", you'll find yourself astonished at the range of tasks with which it can outperform you.
Where AI has been thrust at me (search engines and YouTube video and chat summaries) it has been for the sort of thing where I'd expect it to excel, yet I've been underwhelmed. The one time I consciously invoked the "AI assist" on a search query (to do the sort of thing I might otherwise try on Wolfram Alpha) it committed a basic logical error. The project READMEs that Show HN has exposed me to this year have been almost unfailingly abominable. (Curiously, I'm actually okay with AI art a significant amount of the time.)
But none of that experience is even a hundredth as annoying as the constant insinuation from AI proponents that any and all opposition is in some way motivated by ego protection.
I used the $200/mo OpenAI subscription for a while, but cancelled when Gemini 3 came out. It was useful for the deep research credits until the Web search GPT got sufficiently good on its own.
Oh for sure. Why are movies scattered all over oblivion? Because there's no simple marketplace for licensing movies, it's a closed market that requires doing lots of behind-the-scenes deals. Healthcare? Only specific providers can make medical equipment, tons of red tape, opaque billing structures, insurance locked out in weird ways, etc.
To understand how healthy a market is, ask 'how easily could a brand new startup innovate in this area'. If the answer is 'not easy at all' - then that thing is going to be expensive, rent seeking, and actively distorting incentives to make itself more money.
No, ads are not the same thing as free speech at all. "Free speech" is the right to say anything to anyone *who is willing to listen*. You don't have a right to come into my home and tell me your ideas about immigration policy - though you do have a right to talk about immigration policy in other places!
The government has to guarantee that there are places for people to say things. But the government does not have to guarantee that there are places for people to say things *in my own home*. And similarly, I think most public spaces should be free from ads and other 'attention pollution'. If a company wants to write about their own product, that's fine, but they must do so in a place where other people are free to seek them out, as opposed to doing so in a way that forces the writing upon others without consent.
I'm not sure how many people would recognize 524,288 as a power of 2, but probably far fewer than the number who would recognize 512 as one.
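For anyone who doesn't have it memorized, the standard bit tricks confirm it quickly (the single-set-bit test plus `bit_length`):

```python
# Verify that 524,288 is a power of two, and find which one.
n = 524_288
# A positive power of two has exactly one bit set, so n & (n - 1) == 0.
assert n > 0 and n & (n - 1) == 0
exponent = n.bit_length() - 1  # position of that single bit
print(exponent)  # → 19, i.e. 524_288 == 2**19
```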
I recommend having instant recognition of all the powers up to 2^24, this has proven very useful over the years e.g. when skimming quickly through a log searching for anomalies, flag patterns etc. If you recite them in sequence twice a day for a couple of weeks, then they’ll stick in your mind for decades. I can say from experience this method also works for the NATO phonetic alphabet, Hamlet’s soliloquies, ASCII, and mum’s lemon drizzle cake recipe. It fails however for the periodic table, ruined forever by Tom Lehrer.
Ever the quandary: satisfy some people completely, or a larger number but incompletely.
I concur with the suggestion of 2^19, because even though fewer people would recognize it immediately, many of them would question the significance, and their eventual realization would be more satisfying.
> I think you might be overestimating the curiosity of the average person.
Oh absolutely. But I like to optimize for the others. :)
Also, the audience for consideration here is pretty ... rarified. 0.0% of people in the world, to a first approximation, have heard of Zig. Those that have are probably pretty aware of powers-of-two math, and maybe curious enough to wonder about a value that seems odd but is obviously deliberately chosen.
> Is it a woe of modern times? Or has it always been this way?
I suspect it's always been this way. People are busy, math is hard, and little games with numbers are way less engaging than games with physical athleticism or bright lights.
I don't really like the name. When you say 'Hacklore' I think of the hackers at MIT and such. That stuff is really cool and shouldn't be stopped or suppressed!
Might be a mobile app / EU-based account only thing, but I've seen it numerous times and I'm almost certain I've seen it on the web version of Gmail too.
There's a pretty long waiting period before it does this. I'm not sure what the time required is, but I think it's at least 3 months. And it only does this if you don't interact with the newsletter at all.
Just speculation, but it's possible if you also use a non-web/non-gmail-app client it might suppress these notifications.