More

lswainemoore · 2026-01-14T23:41:24 1768434084

lswainemoore · 2025-04-16T03:09:06 1744772946

That may be true, but if you look at the distribution it puts out for this, it definitely smells funny. It looks like a very steep normal distribution, centered at 0 (ish). Seems like it should have two peaks? But maybe those are just getting compressed into one because of resolution of buckets?

blackbear_ · 2025-04-16T19:39:33 1744832373

It does indeed have two peaks: https://en.m.wikipedia.org/wiki/Inverse_distribution#Recipro...

As the mean of this distribution does not exist (it's "infinite") the Monte Carlo estimates aren't reliable

lswainemoore · on Jan 25, 2025

Why so negative? Given that the served population is conservatively a quarter of either of those areas, doesn't seem like a fair comparison.

More to the point, I've been favorably impressed with the transit options since moving here, and in terms of reliability it's been better than NYC, though obviously there are fewer trains/branches.

I'd love to see BART open later, like NYC, but even Tokyo trains stop at midnight.

ninetyninenine · on Jan 25, 2025

It's fair. NYC is 8.8 million, bay area is 7 million. Tokyo is about double that.

Not being negative. Being realistic. It's unfortunate that being realistic often is negative. Transit here is garbage. You either luck out and live and work near transit or you're like most people and have to drive.

A couple million in energy savings doesn't mean anything compared to the amount wasted by cars.

lswainemoore · on Jan 27, 2025

Those are not the right population metrics to compare. If you're talking full Bay Area, you might as well talk NYC metro area (MTA claims to serve 15.3 million [1]). Tokyo's even trickier, but I think 36 million [2] seems closer to right.

It's probably not worth arguing about too much, because ultimately I agree with you that there's a lot more to be done to reduce car ridership. But pointing at those places and saying "copy them" misses a lot of structural differences.

[1] https://mta.info/about [2] https://storymaps.arcgis.com/stories/0056b6a98b8b4a48869f822...

verteu · on Jan 25, 2025

Surely the relevant metric is population density?

ninetyninenine · on Jan 25, 2025

You have the causal arrow backwards. A high density train network enables population density and high energy efficiency.

At current population density all of our lives would improve if a network as dense as the tokyo metro appeared in the bay area over night.

astrange · on Jan 25, 2025

NYC is more or less the only system in the world that runs all night like that. It makes it very hard to do maintenance.

frosted-flakes · on Jan 25, 2025

And NYC can do that because most of it's lines have three or four tracks.

lswainemoore · on Dec 20, 2024

They're in the original post. Also here: https://x.com/fchollet/status/1870172872641261979 / https://x.com/fchollet/status/1870173137234727219

Personally, I think it's fair to call them "very easy". If a person I otherwise thought was intelligent was unable to solve these, I'd be quite surprised.

nopinsight · on Dec 20, 2024

Thanks! I've analyzed some easy problems that o3 failed at. They involve spatial intelligence including connection and movement. This skill is very hard to learn from textual and still image data.

I believe this sort of core knowledge is learnable through movement and interaction data in a simulated world and it will not present a very difficult barrier to cross.

(OpenAI purchased a company behind a Minecraft clone a while ago. I've wondered if this is the purpose.)

lswainemoore · on Dec 20, 2024

> I believe this sort of core knowledge is learnable through movement and interaction data in a simulated world and it will not present a very difficult barrier to cross.

Maybe! I suppose time will tell. That said, spatial intelligence (connection/movement included) is the whole game in this evaluation set. I think it's revealing that they can't handle these particular examples, and problematic for claims of AGI.

MVissers · on Dec 21, 2024

Probably just not trained on this kind of data. We could create a benchmark about it, and they'd shatter it within a year or so.

I'm starting to really see no limits on intelligence in these models.

sungho_ · on Dec 21, 2024

Doesn't the fact that it can only accomplish tasks with benchmarks imply that it has limitations in intelligence?

qup · on Dec 21, 2024

> Doesn't the fact that it can only accomplish tasks with benchmarks

That's not a fact

PoignardAzur · on Dec 21, 2024

> This skill is very hard to learn from textual and still image data.

I had the same take at first, but thinking about it again, I'm not quite sure?

Take the "blue dots make a cross" example (the second one). The inputs only has four blue dots, which makes it very easy to see a pattern even in text data: two of them have the same x coordinate, two of them have the same y (or the same first-tuple-element and second-tuple-element if you want to taboo any spatial concepts).

Then if you look into the output, you can notice that all the input coordinates are also in the output set, just not always with the same color. If you separate them into "input-and-output" and "output-only", you quickly notice that all of the output-only squares are blue and share a coordinate (tuple-element) with the blue inputs. If you split the "input-and-output" set into "same color" and "color changed", you can notice that the changes only go from red to blue, and that the coordinates that changed are clustered, and at least one element of the cluster shares a coordinate with a blue input.

Of course, it's easy to build this chain of reasoning in retrospect, but it doesn't seem like a complete stretch: each step only requires noticing patterns in the data, and it's how a reasonably puzzle-savvy person might solve this if you didn't let them draw the squares on papers. There are a lot of escape games with chains of reasoning much more complex and random office workers solve them all the time.

The visual aspect makes the patterns jump to us more, but the fact that o3 couldn't find them at all with thousands of dollars of compute budget still seems meaningful to me.

EDIT: Actually, looking at Twitter discussions[1], o3 did find those patterns, but was stumped by ambiguity in the test input that the examples didn't cover. Its failures on the "cascading rectangles" example[2] looks much more interesting.

[1]: https://x.com/bio_bootloader/status/1870339297594786064

[2]: https://x.com/_AI30_/status/1870407853871419806

lswainemoore · on April 25, 2023

One reason you might not want to run it in production is if it's not a read-only query.

lswainemoore · on April 4, 2023

Location: US (NYC currently)

Remote: sure

Willing to relocate: yes

Technologies: Python (Flask, SQLAlchemy, Django), JS (React, Next, AngularJS, Typescript), SQL (Postgres, PostGIS), HTML/CSS

Resume/CV: https://lincoln.swaine-moore.is/bragging

Email: lswainemoore@gmail.com

About me:

Senior software engineer with experience all over the stack; special interest in sustainable technology/maps/making web applications that are fun to use, but most of all interested in working for you, reader. Currently seeking freelance opportunities, or the right full-time job.

Check out my website: https://lincoln.swaine-moore.is/blue.

lswainemoore · on April 3, 2023

SEEKING WORK | US | remote or in-person (potentially willing to relocate)

Technologies: Python (Flask, SQLAlchemy, Django), JS (React, Next, AngularJS, Typescript), SQL (Postgres, PostGIS), HTML/CSS

Resume/CV: https://lincoln.swaine-moore.is/bragging

Email: lswainemoore@gmail.com

About me:

Senior software engineer with experience all over the stack; special interest in sustainable technology/maps/making web applications that are fun to use, but most of all interested in working for *you*, reader. Currently seeking freelance opportunities, or the right full-time job.

Check out my website: https://lincoln.swaine-moore.is/blue.

lswainemoore · on March 15, 2022

lswainemoore · on Dec 17, 2021

/* When it rains does it pour? */

SELECT date, wban, stn, year, mo, da, temp, count_temp, dewp, count_dewp, slp, count_slp, stp, count_stp, visib, count_visib, wdsp, count_wdsp, mxpsd, gust, max, flag_max, min, flag_min, prcp, flag_prcp, sndp, fog, rain_drizzle, snow_ice_pellets, hail, thunder, tornado_funnel_cloud, usaf, name, country, state, call, lat, lon, elev, begin, end, point_gis, fake_date FROM `fh-bigquery.weather_gsod.all_geoclustered` WHERE lat IS NOT NULL AND lon IS NOT NULL AND lat != 0 AND lon != 0 AND lat != 1 AND lon != 1 AND lat != -1 AND lon != -1 AND lat != 2 AND lon != 2 AND lat != -2 AND lon != -2 AND lat != 3 AND lon != 3 AND lat != -3 AND lon != -3 AND lat != 4 AND lon != 4 AND lat != -4 AND lon != -4 AND lat != 5 AND lon != 5 AND lat != -5 AND lon != -5 AND lat != 6 AND lon != 6 AND lat != -6 AND lon != -6 AND lat != 7 AND lon != 7 AND lat != -7 AND lon != -7 AND lat != 8 AND lon != 8 AND lat != -8 AND lon

lswainemoore · on April 23, 2021

Unless I'm misreading the link, those numbers (~600k/yr) are global. Still, staggeringly high, and concentrated in a region.

warrenm · on May 5, 2021

Yes, the number is "global" - but the actual part of the "globe" that's affected is remarkably small ..making its rate far worse than it otherwise appears