That may be true, but if you look at the distribution it puts out for this, it definitely smells funny. It looks like a very steep normal distribution, centered at 0 (ish). Seems like it should have two peaks? But maybe those are just getting compressed into one because of the resolution of the buckets?
Why so negative? Given that the served population is conservatively a quarter of either of those areas, doesn't seem like a fair comparison.
More to the point, I've been favorably impressed with the transit options since moving here, and in terms of reliability it's been better than NYC, though obviously there are fewer trains/branches.
I'd love to see BART open later, like NYC, but even Tokyo trains stop at midnight.
It's fair. NYC is 8.8 million, the Bay Area is 7 million. Tokyo is about double that.
Not being negative. Being realistic. It's unfortunate that being realistic often is negative. Transit here is garbage. You either luck out and live and work near transit or you're like most people and have to drive.
A couple million in energy savings doesn't mean anything compared to the amount wasted by cars.
Those are not the right population metrics to compare. If you're talking full Bay Area, you might as well talk NYC metro area (MTA claims to serve 15.3 million [1]). Tokyo's even trickier, but I think 36 million [2] seems closer to right.
It's probably not worth arguing about too much, because ultimately I agree with you that there's a lot more to be done to reduce car ridership. But pointing at those places and saying "copy them" misses a lot of structural differences.
Personally, I think it's fair to call them "very easy". If a person I otherwise thought was intelligent was unable to solve these, I'd be quite surprised.
Thanks! I've analyzed some easy problems that o3 failed at. They involve spatial intelligence including connection and movement. This skill is very hard to learn from textual and still image data.
I believe this sort of core knowledge is learnable through movement and interaction data in a simulated world and it will not present a very difficult barrier to cross.
(OpenAI purchased a company behind a Minecraft clone a while ago. I've wondered if this is the purpose.)
> I believe this sort of core knowledge is learnable through movement and interaction data in a simulated world and it will not present a very difficult barrier to cross.
Maybe! I suppose time will tell. That said, spatial intelligence (connection/movement included) is the whole game in this evaluation set. I think it's revealing that they can't handle these particular examples, and problematic for claims of AGI.
> This skill is very hard to learn from textual and still image data.
I had the same take at first, but thinking about it again, I'm not quite sure?
Take the "blue dots make a cross" example (the second one). The inputs only has four blue dots, which makes it very easy to see a pattern even in text data: two of them have the same x coordinate, two of them have the same y (or the same first-tuple-element and second-tuple-element if you want to taboo any spatial concepts).
Then if you look at the output, you can notice that all the input coordinates are also in the output set, just not always with the same color. If you separate them into "input-and-output" and "output-only", you quickly notice that all of the output-only squares are blue and share a coordinate (tuple-element) with the blue inputs. If you split the "input-and-output" set into "same color" and "color changed", you can notice that the changes only go from red to blue, and that the coordinates that changed are clustered, and at least one element of the cluster shares a coordinate with a blue input.
Of course, it's easy to build this chain of reasoning in retrospect, but it doesn't seem like a complete stretch: each step only requires noticing patterns in the data, and it's how a reasonably puzzle-savvy person might solve this if you didn't let them draw the squares on paper. There are a lot of escape games with much more complex chains of reasoning, and random office workers solve them all the time.
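For what it's worth, here's a minimal sketch of that chain as plain data manipulation, assuming each grid arrives as a mapping from (x, y) to a color string. The helper names and the toy coordinates below are made up for illustration; they're not the actual ARC example.

  # Grids assumed to be dicts mapping (x, y) -> color string (toy data, not the real task).
  def partition_cells(input_grid, output_grid):
      """Split output cells into the categories used in the argument above."""
      in_and_out = {p: (input_grid[p], c)
                    for p, c in output_grid.items() if p in input_grid}
      output_only = {p: c for p, c in output_grid.items() if p not in input_grid}
      same_color = {p for p, (ci, co) in in_and_out.items() if ci == co}
      color_changed = {p: (ci, co) for p, (ci, co) in in_and_out.items() if ci != co}
      return output_only, same_color, color_changed

  def shares_coordinate(p, anchors):
      """True if p shares an x or a y coordinate with any anchor cell."""
      return any(p[0] == a[0] or p[1] == a[1] for a in anchors)

  # Tiny made-up grids just to exercise the helpers:
  blue_inputs = [(3, 1), (3, 5), (1, 3), (6, 3)]
  inp = {(3, 1): "blue", (3, 5): "blue", (1, 3): "blue", (6, 3): "blue", (2, 2): "red"}
  out = dict(inp, **{(3, 3): "blue", (2, 2): "blue"})
  output_only, same_color, color_changed = partition_cells(inp, out)
  print(output_only)    # {(3, 3): 'blue'} -> new cells are blue
  print(color_changed)  # {(2, 2): ('red', 'blue')} -> changes only go red -> blue
  print(shares_coordinate((3, 3), blue_inputs))  # True: lines up with the blue inputs

None of this requires "seeing" the grid; it's all membership tests and coordinate comparisons on tuples.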
The visual aspect makes the patterns jump out at us more, but the fact that o3 couldn't find them at all with thousands of dollars of compute budget still seems meaningful to me.
EDIT: Actually, looking at Twitter discussions[1], o3 did find those patterns, but was stumped by ambiguity in the test input that the examples didn't cover. Its failures on the "cascading rectangles" example[2] look much more interesting.
Senior software engineer with experience all over the stack; special interest in sustainable technology/maps/making web applications that are fun to use, but most of all interested in working for *you*, reader. Currently seeking freelance opportunities, or the right full-time job.
SELECT
date,
wban,
stn,
year,
mo,
da,
temp,
count_temp,
dewp,
count_dewp,
slp,
count_slp,
stp,
count_stp,
visib,
count_visib,
wdsp,
count_wdsp,
mxpsd,
gust,
max,
flag_max,
min,
flag_min,
prcp,
flag_prcp,
sndp,
fog,
rain_drizzle,
snow_ice_pellets,
hail,
thunder,
tornado_funnel_cloud,
usaf,
name,
country,
state,
call,
lat,
lon,
elev,
begin,
`end`,
point_gis,
fake_date
FROM
`fh-bigquery.weather_gsod.all_geoclustered`
WHERE
  lat IS NOT NULL
  AND lon IS NOT NULL
  -- exclude placeholder-looking integer coordinates near (0, 0)
  AND lat NOT IN (0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7, 8, -8)
  AND lon NOT IN (0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5, 6, -6, 7, -7, 8, -8)
Yes, the number is "global", but the actual part of the "globe" that's affected is remarkably small, making its rate far worse than it otherwise appears.