
osti · 2026-02-05T17:50:16 1770313816

Somehow regresses on SWE bench?

lkbm · 2026-02-05T18:01:44 1770314504

I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.
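A rough way to size that noise, as a sketch rather than anything the benchmark authors publish: treat each task as an independent pass/fail trial and compute the binomial standard error of the score. The task count of 500 (the size of SWE-bench Verified) and the 80% pass rate are assumptions for illustration, and `score_standard_error` is a hypothetical helper name.

```python
import math


def score_standard_error(pass_rate: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate measured over n_tasks independent tasks."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)


# Assumptions for illustration only: 500 tasks, a model scoring around 80%.
N_TASKS = 500
PASS_RATE = 0.80

se = score_standard_error(PASS_RATE, N_TASKS)
one_task = 1.0 / N_TASKS  # score change from solving one more (or one fewer) task

print(f"standard error of a single run: {se * 100:.2f} points")
print(f"one task is worth:              {one_task * 100:.2f} points")
```

Under those assumptions a single task moves the score by 0.2 points, so a 0.1-point gap can only appear when scores are averaged over several runs, and it is far smaller than the run-to-run standard error.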

SubiculumCode · 2026-02-05T18:11:04 1770315064

That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.

usaar333 · 2026-02-05T17:59:27 1770314367

i'd interpret that as rounding error. that is unchanged

swe-bench seems really hard once you are above 80%

Squarex · 2026-02-05T18:04:43 1770314683

it's not a great benchmark anymore... starting with it being python / django primarily... the industry should move to something more representative

usaar333 · 2026-02-05T18:17:27 1770315447

OpenAI has; they don't even mention a score for gpt-5.3-codex.

On the other hand, it is their own verified benchmark, which is telling.

osti · 2025-12-27T17:33:21 1766856801

Social "science" be social science.

osti · 2025-11-20T19:00:51 1763665251

I saw many complaints about assetto corsa evo early access about the slow pace of development after EA release. So I'm not sure if I wanna "beta test" this one.

alexachilles90 · 2025-11-20T19:59:21 1763668761

AC Rally is developed by another studio (Supernova) though, not Kunos. From online and YouTube reviews, it seems to be pretty good in terms of visuals, realism and FFB. Might have fewer bugs.

culopatin · 2025-11-20T20:43:34 1763671414

I saw the suspension behavior and I can’t necessarily agree with the realism statement. Some mild bumps that a million dollar rally car would absorb no problem send the car flying as if it was a 1995 Civic DX going down the road.

I own the original AC and I just can’t get over how bad the audio is. Sounds like a simple pitch change on a static mp3 file that’s not even that accurate at idle. An E30 sounds like a synth.

I thought ok, it’s an old game now, I get it, they must have fixed that in ACC, nope. They must have fixed it in AC Evo… based on the videos I’ve seen, nope.

Maybe rally does it better? Does anyone know?

osti · 2025-11-15T19:50:24 1763236224

That's why reading comments about geopolitics on the Internet is largely useless. Big news! A country's population supports its own country on international stage! If you go on Chinese social media, it'll be mostly about how awful the Americans are, and vice versa if you are on Reddit for example. So what is even the point of reading them, anywhere..

mh- · 2025-11-15T20:01:06 1763236866

I think you and I are on very different Reddits, if you're using it as an example of pro-American social media.

Fully agree that reading either for geopolitical opinions is useless.

osti · 2025-11-09T20:37:27 1762720647

That's the biggest problem I have with the recommendation to buy indices, as if indices growing at >8% annually were a natural law.

Many (most) country indices around the world returned far less than 8%. The US has performed exceptionally well for almost a century, so people are starting to take it as a natural law. If I buy a US index, I'm still placing a directional bet on the US stock market continuing to perform at an exceptional rate.
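To see how much rides on that assumed rate, here is a small compounding sketch. The 8% and 4% figures and the 30-year horizon are purely illustrative assumptions, not historical returns for any particular index, and `growth_multiple` is a hypothetical helper name.

```python
def growth_multiple(annual_return: float, years: int) -> float:
    """Multiple of the starting capital after compounding annually for `years` years."""
    return (1.0 + annual_return) ** years


# Illustrative assumptions only: two hypothetical annual rates over 30 years.
YEARS = 30
for rate in (0.08, 0.04):
    multiple = growth_multiple(rate, YEARS)
    print(f"{rate:.0%} per year for {YEARS} years -> {multiple:.1f}x the starting capital")
```

Halving the assumed rate cuts the final multiple by far more than half, which is why treating 8% as a given is itself a bet.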

throw0101a · 2025-11-09T21:38:54 1762724334

One can buy "all-in-one" index-of-index funds that have all US equities, all EU, etc. In Canada (which the sub-thread started with), see VEQT or XEQT (100% equities), VGRO/XGRO (80/20), VBAL/XBAL (60/40), VCNS/XCNS (40/60).

You can probably find an 'asset allocation' fund in most countries; e.g., in the US:

* https://investor.vanguard.com/investment-products/mutual-fun...

There are also (more dynamic) 'target date' funds, where the bond allocation increases over time.

osti · 2025-11-09T22:25:30 1762727130

Yeah, and those have underperformed historically and it's definitely not recommended by most people.

throw0101a · 2025-11-09T22:46:10 1762728370

> Yeah, and those have underperformed historically […]

Huh? Underperformed what, exactly? A globally diversified portfolio of stocks has underperformed …a globally diversified portfolio of stocks? …tech stocks? …consumer staples? …utilities? …Treasuries?

1/3/5/10/20-year annualized returns are available at:

* https://canadianportfoliomanagerblog.com/model-etf-portfolio...

> […] and it's definitely not recommended by most people.

Again: huh? Who is not recommending index funds for most people? And what is recommended "by most people" if not index funds?

osti · 2025-11-12T08:34:09 1762936449

Look at IXUS or VEU, for example: over the past 5-10 years, or even longer, they have significantly underperformed US indices.

osti · 2025-11-06T17:01:08 1762448468

Qwen 3 Max has been getting rather bad reviews around the web (both on Reddit and Chinese social media), and that matches my own experience with it. So I wouldn't expect this to be worse.

SamDc73 · 2025-11-06T17:08:44 1762448924

Also, my experience with it wasn't that good; but it was looking good on benchmarks ..

It seems like benchmark maxing is what you do when you're out of tricks?

Alifatisk · 2025-11-06T17:11:00 1762449060

Ohhh, so Qwen3 235B-A22B-2507 is still better?

osti · 2025-11-06T19:39:52 1762457992

I wouldn't say that, but just that qwen 3 max thinking definitely underperforms relative to its size.

osti · 2025-09-27T22:17:17 1759011437

Apple chips are fastest in single core in most benchmarks, not just passmark.

osti · 2025-09-05T18:35:45 1757097345

What do you mean by imaginary patterns from Codeforces?

coolThingsFirst · 2025-09-06T07:41:09 1757144469

The problems are really low quality.

Topcoder/ICPC/CodeJam have the best problem statements.

Check out this for example: https://codeforces.com/contest/2136/problem/C

Why would I care if some array can be obtained by combining some other patterns?

pxx · 2025-09-06T11:43:15 1757158995

This is a pretty neat problem. Maybe this isn't the activity for you...

coolThingsFirst · 2025-09-06T13:43:41 1757166221

No, sorry. You have bad taste.

osti · 2025-08-28T13:20:56 1756387256

This kinda tracks with the latest estimate of the power usage of LLM inference published by Google https://news.ycombinator.com/item?id=44972808. If inference isn't as power hungry as people thought, they must be able to make good money from those subscriptions.

jeffbee · 2025-08-28T15:09:20 1756393760

> power hungry like people thought

The only people who thought this were non-practitioners.

osti · 2025-08-20T23:23:18 1755732198

In https://blog.google/products/gemini/gemini-2-5-deep-think/, the professor Google worked with also claimed to have proved a previously unproven conjecture.