More

tyfighter · 2026-01-04T23:50:20 1767570620

Another amazing piece of gaming and art by an incredible, dedicated community. I'm thoroughly enjoying it so far, and I've still got a long way to go.

tyfighter · 2025-12-18T18:57:08 1766084228

I haven't made a website of any kind since a C&C: Red Alert fan site somewhere on GeoCities in the late 90s.

I work on graphics drivers. They're hard write and even harder to debug. You have to be a huge nerd about graphics to get very far. It's a relatively rare skill set, but new, younger, nerdier people keep on coming. Most people in graphics are quiet and are just keeping the industry functioning (me). It's applied computer architecture in a combination of continuous learning and intuition from experience.

ex-aws-dude · 2025-12-18T19:25:26 1766085926

That is interesting, do you ever find bugs in the hardware itself?

Is there some big spec document or ISA that you follow when implementing the driver?

Also I'm curious is it easier to write a driver for the modern "lower level" APIs like vulkan/dx12?

tyfighter · 2025-12-18T19:42:15 1766086935

Hardware bugs can be found during chip bring-up within the first couple of months back from the fab, but since I've worked in this area I've never actually seen a bug that couldn't be worked around. They happen, but they're rare and I've never experienced a chip needing a respin because of a bug.

There is documentation, but it's not as well organized as you might imagine. Documentation is usually only necessary when implementing new features, and the resulting code doesn't change often. There are also multiple instruction sets as there are a bunch of little processors you need to control.

Vulkan/DX12 aren't really "low-level" APIs. They're "low overhead", and honestly, no. Their code base is just as large and complicated, if not more so, than OpenGL/DX11.

d-lisp · 2025-12-19T21:01:17 1766178077

I wish I was doing your job. How do I do so ?

tyfighter · 2025-10-16T20:30:30 1760646630

This is something I heard through the grape vine years ago, but when you're a very large corporation negotiating CPU purchasing contracts in quantities of millions, you can get customizations that aren't possible outside of gigantic data centers. Things like enabling custom microcode (and development support) for adding new instructions for the benefit of your custom JIT-ed server infrastructure. The corporate entity here is likely a hyperscaler that everyone knows.

jeffbee · 2025-10-16T20:34:30 1760646870

Some of the public x86 ISA extensions were things that hyperscalers specifically requested.

clausecker · 2025-10-17T00:26:39 1760660799

Such as?

kijiki · 2025-10-17T01:08:02 1760663282

Most of the Intel cache partitioning things were driven primarily by Google. The holy grail was to colocate latency-sensitive tasks with bulk background tasks to increase cluster utilization.

jeffbee · 2025-10-17T03:12:55 1760670775

I guess technically CAT and RDT are not ISA extensions because they are managed by MSRs. I was thinking of aspects of BMI, but I am sure that large-scale buyers had input into things like vector extensions, PMU features, and the things you mentioned as well.

dboreham · 2025-10-16T23:13:21 1760656401

Historically the large buyer that could do this was NSA. Men in black would show up and tell you to add a bit population count instruction to your CPU..

jeffbee · 2025-10-16T23:35:37 1760657737

I think it's doubtful that around the time that POPCNT was added to CPUs the NSA was all that influential. Their big scary data center, which is actually tiny, wasn't built until 2014, while players like Google and Meta had much larger data centers years earlier and were undoubtedly larger buyers of AMD Barcelona / Intel Westmere where POPCNT first emerged.

aleph_minus_one · 2025-10-17T11:01:54 1760698914

Here is an article about the popcnt instruction:

https://vaibhavsagar.com/blog/2019/09/08/popcount/

The author of the article believes that while popcnt was indeed used for cryptographical analysis in the 60s, but the fact that popcnt disappeared from instruction sets is seen as evidence that this usage became a lot less important over time. So the author considers the reason for the reappearance of popcnt that there simply exist lots of other useful applications of popcnt that become evident over these decades.

A German article about the same topic:

https://nickyreinert.medium.com/ne-49-popcount-ea62aa304f88

dboreham · 2025-10-17T12:44:55 1760705095

Oh, when I saw this happen first-hand, it was probably 1986.

Note that the first "data center" I know of was built at Bletchley Park in the 1940s.

Maxious · 2025-10-17T00:09:08 1760659748

eg. "custom Intel Xeon 6 processors, available only on AWS." https://aws.amazon.com/blogs/aws/best-performance-and-fastes...

renewiltord · 2025-10-17T00:50:27 1760662227

Oracle Cloud used to boast this as something they had. Tuned for OracleDB with more cache, different core count.

And every homelabber has had one of the 7B13 or 9654-variant processors

tyfighter · 2025-09-08T23:13:54 1757373234

Actually, the reason Transmeta CPUs were so slow was that they didn't have an x86 instruction hardware decoder. Every code cache (IIRC it was only 32 MB) miss resulted in a micro-architectural trap which translated x86 instructions to the underlying uops in software.

tyfighter · 2025-08-10T01:49:31 1754790571

How is anyone just supposed to know that? It's not hard to find vim, but no one says, "You need to be running this extra special vim development branch where people are pushing vim to the limits!" Yes, it's fragmented, and changing fast, but it's not reasonable to expect people just wanting a tool to be following the cutting edge.

CityOfThrowaway · 2025-08-10T03:45:44 1754797544

I agree that it might not be reasonable to expect people to keep up with the latest.

For this specific thing (LLM-assisted coding), we are still in nerd territory where there are tremendous gains to be had from keeping up and tinkering.

There's a lot of billions dollars being invested to give devs who don't want to do this the right tools. We aren't quite there yet, largely because the frontier is moving so fast.

I made my original comment because it was so far from my experience, and I assumed it was because I am using a totally different set of tools.

If somebody really doesn't want to be left behind, the solution is to do the unreasonable: read hacker news everyday and tinker.

Personally, I enjoy that labor. But it's certainly not for everybody.

dnh44 · 2025-08-10T12:26:58 1754828818

I find that X and Discord are more useful than HN for trying to keep up to date. Which is a shame I think but it is what it is.

dbalatero · 2025-08-10T02:08:36 1754791716

I agree with your comment, but I also chuckled a bit, because Neovim _is_ a fast changing ecosystem with plugins coming out to replace previous plugins all the time, and tons of config tweakers pushing things to the limit. That said… one does not have to replace their working Neovim setup just because new stuff came out. (And of course, minimalist vim users don't use any plugins!)

astrange · 2025-08-10T04:20:39 1754799639

That's what people always seemed to say about emacs, that you haven't used it unless you've learned 300 incredibly complicated key bindings and have replaced half the program with a mail reader.

tyfighter · 2025-08-10T01:39:51 1754789991

You're (they're?) not alone. This mirrors every experience I've had trying to give them a chance. I worry that I'm just speaking another language at this point.

EDIT: Just to add context seeing other comments, I almost exclusively work in C++ on GPU drivers.

almostgotcaught · 2025-08-10T02:12:04 1754791924

Same - I work on a cpp GPU compiler. All the LLMs are worthless. Ironically the compiler I work on is used heavily for LLM workloads.

thrown-0825 · 2025-08-10T09:14:47 1754817287

it really only works for problem domains saturated with medium blogspam and youtube tutorials.

notjoemama · 2025-08-10T17:05:05 1754845505

That's a bingo! Christoph Waltz is just a great actor.

I'm building an app in my stack with fairly common requirements. There are a few code examples that cover requirements but none that cover our specific scenario. After searching the web myself, I asked 3 different AI models. All they did was regurgitate the closest public GitHub example, lacking the use case I was trying to do. Solving this problem can only be done by understanding the abstraction of the alteration in design.

These things can't actually think. And now they're allowed to be agentic.

In some ways they're just glorified search engines but there's a geopolitical sprint to see who can get them to mock "thinking" enough to fool everybody.

Out of ego and greed, everything will be turned over to this machine, and that will be the end of humanity; not humans...humanity.

nxobject · 2025-08-10T08:00:12 1754812812

There's the market out there for a consultancy that will fine-tune an LLM for your unique platform, stack, and coding considerations of choice – especially with proprietary platforms. (IBM's probably doing it right now for their legacy mainframe systems.) No doubt Apple is trying to figure out how to get whatever frameworks they have cooking up ASAP into OpenAI etc.'s models.

bobsmooth · 2025-08-10T07:50:34 1754812234

I can't imagine there is a lot of GPU driver code in the training data.

tyfighter · 2025-07-03T02:54:48 1751511288

Modern x86 implementations don't even do the XOR. It just renames the register to "zero".

tyfighter · 2025-06-08T00:33:04 1749342784

I don't know which strains credulity more:

1.) In the 1960s, the Air Force developed a top-secret device power enough to simulate the EMP of a nuclear blast, i.e. some form of Non-nuclear Electromagnetic Pulse device that wouldn't be developed into a weapon for a few more decades. Rather than a controlled scientific test, they then decided to secretly drive it up to an operating missile base. They proceed to set it up on a 60 foot tall portable stand without anyone at the site noticing. I guess people at the site had their AirPods in for hours while giant generators run to charge the banks of capacitors necessary to run something that huge. No one noticed anything of this happening until it was hovering over the gate, they had their rifles pointed at it, and the mad scientists behind this plan were justified in their fear that our missile launch facilities were vulnerable to EMP.

2.) In the 1960s, a UFO disabled a missile at a launch facility.

julienchastang · 2025-06-08T13:28:51 1749389331

Exactly. The laws of physics teaches us that being visited by extraterrestrial space crafts is nearly zero. The likelihood that the government (Pentagon/USAF) is lying to themselves and to us to cover up there own inanities; very high.

tyfighter · on Jan 6, 2025

I must not be the target audience for this kind of thing, but I really don't get it. The few times I've opened HN today I've seen this at the top, and the number of points has been higher. I've opened it 3 times, and clicked the button, and some other icons showed up below. The first time I didn't open even bother to mouse over the icons, so I didn't know you'd be buying things. I closed before I got to 50 stimulations. The second time, I did hover over the icons showing up, and clicked a couple, and nothing seemed to happen other than spending stimulation, so I closed before 100. I just did this a third time as it's now over 1200 points, and I really just don't understand what is going on. What am I missing?

risenshinetech · on Jan 6, 2025

I think this is a scenario where if you have to ask, you'll never know. Perhaps, ironically, there just wasn't enough immediate stimulation for you to continue...

tyfighter · on Jan 6, 2025

I don't even know what a clicker game is or why one would be stimulating.

tyfighter · on Oct 8, 2024

> The fundamental problem here is that no standardized shader ISA or even bytecode exists, and there is no material incentive for any vendor to create or agree upon one.

It's not really a material incentive issue. GPU instructions are "microcode". The instructions can contain timing and dependency information that is specific to the underlying microarchitecture. That information is not standardizable, and even if it was, it just kicks the can down the road by adding yet another shader program translation step from something "standard" to hardware specific. CPUs translate (or fetch) ISA to microcode while running, but that's a chip area cost. If every SM/CU needed a hardware ISA to microcode translator, that would be fewer SMs/CUs per GPU. My opinion is that GPUs need custom microinstruction ISAs to stay competitive with each other, and that will always require some form offline shader translation.