We seem to be moving the goalposts on AGI, are we not? 5 years ago, the argument that AGI wasn't here yet was that you couldn't take something like AlphaGo and use it to play chess. If you wanted that, you had to do a new training run with new training data.
But now, we have LLMs that can reliably beat video games like Pokemon, without any specialized training for playing video games. And those same LLMs can write code, do math, write poetry, be language tutors, find optimal flight routes from one city to another during the busy Christmas season, etc.
How does that not fit the definition of "General Intelligence"? It's literally as capable as a high school student for almost any general task you throw it at.
I think the games tasks are worth exploring more. If you look at that recent Pokemon post, it's not as capable as a high school student; it took a long, long time. I have a private set of tests that any 8-year-old could easily solve but that any LLM absolutely fails on. I suspect plenty of the people claiming AGI isn't here yet have similar personal tests.
ARC-AGI-3 is coming soon, and I'm very excited for that because it's a true test of multimodality, spatial reasoning, and goal planning. I think there was a preliminary post somewhere showing that current models basically try to brute-force their way through and don't actually "learn the rules of the game" as efficiently as humans do.
How do you think they are training for the spatial part of the tests? It doesn't seem to lend itself well to token-based "reasoning". I wonder if they are just synthetically creating training data and hoping a new emergent spatial reasoning ability appears.
>think they are training for the spatial part of the tests
I'm not sure which party "they" refers to here, since the ARC-AGI-3 dataset isn't released yet and labs probably haven't begun targeting it. For ARC-AGI-2, synthetic data alone might have been enough to saturate the benchmark, since most frontier models do well on it, yet we haven't seen any corresponding jump in multimodal skill use, with maybe the exception of "nano banana".
>lend itself well to token based “reasoning”
One could perhaps do reasoning/CoT with vision tokens instead of just text tokens, or reasoning in latent space, which I'd guess might be even better. There have been papers on both, but I don't know if it's an approach that scales. Regardless, Gemini 3 / Nano Banana have made big gains on visual and spatial reasoning, so they must have done something to get multimodality with cross-domain transfer in a way that 4o/gpt-image wasn't able to.
For ARC-AGI-3, the missing pieces seem to be both "temporal reasoning" and efficient in-context learning. If labs can train for these, it would have benefits for things like tool calling as well, which is why it's an exciting benchmark.
I think we're noticing that our goalposts for AGI were largely "we'll recognize it when we see it", and now as we are getting to some interesting places, it turns out that different people actually understood very different things by that.
> 5 years ago, the argument that AGI wasn't here yet was that you couldn't take something like AlphaGo and use it to play chess.
No; that was one, extremely limited example of a broader idea. If I point out that your machine is not a general calculator because it gives the wrong answer for six times nine, and then you fix the result it gives in that case, you have not refuted me. If I now find that the answer is incorrect in some other case, I am not "moving goalposts" by pointing it out.
(But also, what lxgr said.)
> But now, we have LLMs that can reliably beat video games like Pokemon, without any specialized training for playing video games. And those same LLMs can write code, do math, write poetry, be language tutors, find optimal flight routes from one city to another during the busy Christmas season, etc.
The AI systems that do most of these things are not "LLMs".
> It's literally as capable as a high school student for almost any general task you throw it at.
And yet embarrassing deficiencies are found all the time ("how many r's in strawberry", getting duped by straightforward problems dressed up to resemble classic riddles but without the actual gotcha, etc.).
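Part of what makes the strawberry example so embarrassing is that it's trivially checkable in code, a one-liner in basically any language:

```python
# The famous "how many r's in strawberry" question, answered mechanically.
answer = "strawberry".count("r")
print(answer)  # → 3
```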
> The AI systems that do most of these things are not "LLMs".
Uh, every single example that I listed except for the 'playing video games' example is something that I regularly use frontier models to do for myself. I have ChatGPT and Gemini help me find flight routes, tutor me in Spanish (Gemini 3 is really good at this), write poetry and code, solve professional math problems (usually related to finance and trading), help me fix technical issues with my phone and laptop, etc etc.
If you say to yourself, "hey this thing is a general intelligence, I should try to throw it at problems I have generally", you'll find yourself astonished at the range of tasks with which it can outperform you.
> Uh, every single example that I listed except for the 'playing video games' example is something that I regularly use frontier models to do for myself.
LLMs are at most one component of the systems you refer to. Reasoning models and agents are something larger.
> If you say to yourself, "hey this thing is a general intelligence, I should try to throw it at problems I have generally", you'll find yourself astonished at the range of tasks with which it can outperform you.
Where AI has been thrust at me (search engines and YouTube video and chat summaries) it has been for the sort of thing where I'd expect it to excel, yet I've been underwhelmed. The one time I consciously invoked the "AI assist" on a search query (to do the sort of thing I might otherwise try on Wolfram Alpha) it committed a basic logical error. The project READMEs that Show HN has exposed me to this year have been almost unfailingly abominable. (Curiously, I'm actually okay with AI art a significant amount of the time.)
But none of that experience is even a hundredth as annoying as the constant insinuation from AI proponents that any and all opposition is in some way motivated by ego protection.
I used the $200/mo OpenAI subscription for a while, but cancelled when Gemini 3 came out. It was useful for the deep research credits until the Web search GPT got sufficiently good on its own.
Oh for sure. Why are movies scattered all over oblivion? Because there's no simple marketplace for licensing movies, it's a closed market that requires doing lots of behind-the-scenes deals. Healthcare? Only specific providers can make medical equipment, tons of red tape, opaque billing structures, insurance locked out in weird ways, etc.
To understand how healthy a market is, ask 'how easily could a brand new startup innovate in this area'. If the answer is 'not easy at all' - then that thing is going to be expensive, rent seeking, and actively distorting incentives to make itself more money.
No, ads are not the same thing as free speech at all. "Free speech" is the right to say anything to anyone *who is willing to listen*. You don't have a right to come into my home and tell me your ideas about immigration policy - though you do have a right to talk about immigration policy in other places!
The government has to guarantee that there are places for people to say things. But the government does not have to guarantee that there are places for people to say things *in my own home*. And similarly, I think most public spaces should be free from ads and other 'attention pollution'. If a company wants to write about their own product, that's fine, but they must do so in a place where other people are free to seek them out, as opposed to doing so in a way that forces the writing upon others without consent.
I'm not sure how many people would recognize 524,288 as a power of 2, but probably far fewer than the number who would recognize 512 as one.
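For anyone who doesn't have it memorized, the standard bit tricks confirm it quickly (the single-set-bit test plus `bit_length`):

```python
# Verify that 524,288 is a power of two, and find which one.
n = 524_288
# A positive power of two has exactly one bit set, so n & (n - 1) == 0.
assert n > 0 and n & (n - 1) == 0
exponent = n.bit_length() - 1  # position of that single bit
print(exponent)  # → 19, i.e. 524_288 == 2**19
```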
I recommend having instant recognition of all the powers up to 2^24, this has proven very useful over the years e.g. when skimming quickly through a log searching for anomalies, flag patterns etc. If you recite them in sequence twice a day for a couple of weeks, then they’ll stick in your mind for decades. I can say from experience this method also works for the NATO phonetic alphabet, Hamlet’s soliloquies, ASCII, and mum’s lemon drizzle cake recipe. It fails however for the periodic table, ruined forever by Tom Lehrer.
Ever the quandary: satisfy some people completely, or a larger number but incompletely.
I concur with the suggestion of 2^19, because even though fewer people would recognize it immediately, many of them would question the significance, and their eventual realization would be more satisfying.
> I think you might be overestimating the curiosity of the average person.
Oh absolutely. But I like to optimize for the others. :)
Also, the audience for consideration here is pretty ... rarified. 0.0% of people in the world, to a first approximation, have heard of Zig. Those that have are probably pretty aware of powers-of-two math, and maybe curious enough to wonder about a value that seems odd but is obviously deliberately chosen.
> Is it a woe of modern times? Or has it always been this way?
I suspect it's always been this way. People are busy, math is hard, and little games with numbers are way less engaging than games with physical athleticism or bright lights.
I don't really like the name. When you say 'Hacklore' I think of the hackers at MIT and such. That stuff is really cool and shouldn't be stopped or suppressed!
Might be a mobile app / EU-based account only thing, but I've seen it numerous times and I'm almost certain I've seen it on the web version of Gmail too.
There's a pretty long waiting period before it does this. I'm not sure what the time required is, but I think it's at least 3 months. And it only does this if you don't interact with the newsletter at all.
Just speculation, but it's possible if you also use a non-web/non-gmail-app client it might suppress these notifications.