Hacker News | osti's comments

Somehow regresses on SWE bench?

I don't know how these benchmarks work (do you do a hundred runs? A thousand runs?), but 0.1% seems like noise.

That benchmark is pretty saturated, tbh. A "regression" of such small magnitude could mean many different things or nothing at all.

i'd interpret that as rounding error. that is unchanged

swe-bench seems really hard once you are above 80%
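
For a rough sense of why a 0.1% change reads as noise, here is a minimal sketch that treats each benchmark task as an independent pass/fail trial. The task count of 500 (the size of SWE-bench Verified) and the 80% pass rate are assumptions for illustration, not figures reported in this thread, and real runs are not perfectly independent.

```python
import math


def pass_rate_standard_error(pass_rate: float, num_tasks: int) -> float:
    """Binomial standard error of a pass rate measured on num_tasks tasks."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / num_tasks)


NUM_TASKS = 500   # assumption: SWE-bench Verified task count
PASS_RATE = 0.80  # assumption: a score around the 80% mark
DELTA = 0.001     # the 0.1% "regression" under discussion

se = pass_rate_standard_error(PASS_RATE, NUM_TASKS)
tasks_in_delta = DELTA * NUM_TASKS

print(f"standard error of one run: {se * 100:.2f} percentage points")
print(f"tasks represented by a 0.1% change: {tasks_in_delta:.1f}")
```

Under these assumptions, a 0.1% change corresponds to less than one task, while the sampling error of a single run is on the order of a couple of percentage points.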


it's not a great benchmark anymore... starting with it being python / django primarily... the industry should move to something more representative

OpenAI has; they don't even mention the score for gpt-5.3-codex.

On the other hand, it is their own verified benchmark, which is telling.


Social "science" be social science.


I saw many complaints about the slow pace of development of Assetto Corsa Evo after its early access (EA) release. So I'm not sure if I wanna "beta test" this one.


AC Rally is developed by another studio (Supernova) though, not Kunos. From online and YouTube reviews, it seems to be pretty good in terms of visuals, realism and FFB. Might have fewer bugs.


I saw the suspension behavior and I can’t necessarily agree with the realism statement. Some mild bumps that a million dollar rally car would absorb no problem send the car flying as if it was a 1995 Civic DX going down the road.

I own the original AC and I just can’t get over how bad the audio is. Sounds like a simple pitch change on a static mp3 file that’s not even that accurate at idle. An E30 sounds like a synth.

I thought ok, it’s an old game now, I get it, they must have fixed that in ACC, nope. They must have fixed it in AC Evo… based on the videos I’ve seen, nope.

Maybe rally does it better? Does anyone know?


That's why reading comments about geopolitics on the Internet is largely useless. Big news! A country's population supports its own country on the international stage! If you go on Chinese social media, it'll be mostly about how awful the Americans are, and vice versa if you are on Reddit, for example. So what is even the point of reading them, anywhere?


I think you and I are on very different Reddits, if you're using it as an example of pro-American social media.

Fully agree that reading either for geopolitical opinions is useless.


That's the biggest problem I have with the recommendation to buy indices, as if indices growing at >8% annually were a natural law.

Many (most) countries' indices have returned far less than 8%. The US performed exceptionally well over almost a century, so people are starting to take it as a natural law. If I buy a US index, I'm still putting a directional bet on the US stock market continuing to perform at an exceptional rate.
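
To make the size of that bet concrete, here is a small sketch of how much the assumed annual return matters once it compounds. The 8% figure is the one discussed above; the 5% and 3% rates and the 30-year horizon are hypothetical comparison points, not historical returns of any particular index.

```python
def compound_growth(annual_return: float, years: int, principal: float = 1.0) -> float:
    """Value of `principal` after compounding `annual_return` once per year."""
    return principal * (1.0 + annual_return) ** years


YEARS = 30  # hypothetical investment horizon

# 0.08 is the rate discussed above; 0.05 and 0.03 are hypothetical alternatives.
results = {rate: compound_growth(rate, YEARS) for rate in (0.08, 0.05, 0.03)}

for rate, multiple in results.items():
    print(f"{rate:.0%} per year for {YEARS} years: {multiple:.2f}x")
```

Over 30 years, 8% compounds to roughly 10x while 5% compounds to a bit over 4x, so a plan that quietly assumes the higher rate is very sensitive to that assumption.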


One can buy "all-in-one" index-of-index funds that have all US equities, all EU, etc. In Canada (which this sub-thread started with), see VEQT or XEQT (100% equities), VGRO/XGRO (80/20), VBAL/XBAL (60/40), VCNS/XCNS (40/60).

You can probably find an 'asset allocation' fund in most countries; e.g., in the US:

* https://investor.vanguard.com/investment-products/mutual-fun...

There are also (more dynamic) 'target date' funds, where the bond allocation increases over time.


Yeah, and those have underperformed historically, and they're definitely not recommended by most people.


> Yeah, and those have underperformed historically […]

Huh? Underperformed what, exactly? A globally diversified portfolio of stocks has underperformed …a globally diversified portfolio of stocks? …tech stocks? …consumer staples? …utilities? …Treasuries?

1/3/5/10/20-year annualized returns are available at:

* https://canadianportfoliomanagerblog.com/model-etf-portfolio...

> […] and it's definitely not recommended by most people.

Again: huh? Who is not recommending index funds for most people? And what is recommended "by most people" if not index funds?


Look at IXUS or VEU, for example: over the past 5-10 years, or even longer, they have significantly underperformed US indices.


Qwen 3 Max has been getting rather bad reviews around the web (both on Reddit and Chinese social media), and that matches my own experience with it. So I wouldn't expect this to be worse.


Also, my experience with it wasn't that good, but it was looking good on benchmarks...

It seems like benchmark-maxing. Is that what you do when you're out of tricks?


Ohhh, so Qwen3 235B-A22B-2507 is still better?


I wouldn't say that, just that Qwen 3 Max Thinking definitely underperforms relative to its size.


Apple chips are the fastest in single-core performance in most benchmarks, not just PassMark.


What do you mean by imaginary patterns from Codeforces?


The problems are really low quality.

Topcoder/ICPC/CodeJam have the best problem statements.

Check out this for example: https://codeforces.com/contest/2136/problem/C

Why would I care if some array can be obtained by combining some other patterns?


This is a pretty neat problem. Maybe this isn't the activity for you...


No, sorry. You have bad taste.


This kinda tracks with the latest estimate of the power usage of LLM inference published by Google: https://news.ycombinator.com/item?id=44972808. If inference isn't as power hungry as people thought, they must be able to make good money from those subscriptions.


> power hungry like people thought

The only people who thought this were non-practitioners.


In https://blog.google/products/gemini/gemini-2-5-deep-think/, the professor Google worked with also claimed to have proved a previously unproven conjecture.

