AMD openSIL open source firmware proof of concept

seanw444 · on June 14, 2023

Another solid move by AMD. Due to their attitude towards open-source in the GPU department, I haven't used an Nvidia graphics card in years. Their APUs kill pretty much all the competition. Their CPUs are excellent for price/performance, efficiency, and core-count. And now this? If they can prioritize getting more AI/ML and 3D modeling capability into their cards, going all-red would become the new clear meta.

adrian_b · on June 15, 2023

This is indeed good, but for now it is just a small step towards restoring their previous documentation transparency.

Before Zen, AMD published for each CPU the very useful "BIOS and Kernel Developer’s Guide".

Starting with the first Zen, this previously public information has become secret. Hopefully, with OpenSIL equivalent information will become public again.

userbinator · on June 15, 2023

> equivalent information will become public again.

I'm not so hopeful, because it seems to have become common for companies to replace information with open source code, which amounts to nothing more than "put this constant in this register" with no detailed explanation of the why. I'd much rather they publish actual documentation than OSS.

delusional · on June 15, 2023

Why not both? Good documentation with a reference implementation seems like the optimal choice. They're going to have to write the reference implementation anyway.

alanfranz · on June 14, 2023

If.

If they can prioritize _and_ execute on that vision.

slt2021 · on June 14, 2023

there is also second mover advantage, where they just need to copy-cat the best part of nvidia's ML stack and don't waste time figuring out what works/what doesn't work.

they will get users simply because they will be an alternative to nvidia. Add a good pricing strategy and it can be an immediate success

fatfingerd · on June 14, 2023

I'm actually happier if they deliver a mediocre GPU performance seamlessly. I would be as annoyed as the gamers to be priced out of a great desktop/video setup. If hype more fully occupied Nvidia's fragile race cars I'd have a nicer laptop.

msla · on June 14, 2023

> there is also second mover advantage, where they just need to copy-cat the best part of nvidia's ML stack and don't waste time figuring out what works/what doesn't work.

Seems like patents would stop that.

slt2021 · on June 14, 2023

didn't know nvidia patented matmul and dot product

m00x · on June 14, 2023

There's a lot more involved than matmul and dot products.

bmicraft · on June 14, 2023

Didn't the Google v Oracle lawsuit end up confirming that you can't patent an API?

slt2021 · on June 14, 2023

AMD doesn't have to implement the CUDA API, they just need to make sure their compute framework works well with pytorch/tf/MLIR or whatever high level framework is being used.

CUDA itself will change over time, so there's no reason for AMD to pick CUDA, because nobody writes CUDA kernels by hand; they use high level frameworks
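A minimal sketch of the idea in this comment, assuming a framework that routes one user-facing call to whichever vendor backend is registered. Every name here (`register_backend`, `matmul`, `vendor_a`, `vendor_b`) is hypothetical and is not the API of pytorch, tf, MLIR, CUDA, or ROCm; the "kernels" are plain Python stand-ins.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]

# Hypothetical registry: device name -> matmul kernel for that device.
_BACKENDS: Dict[str, Callable[[Matrix, Matrix], Matrix]] = {}


def register_backend(device: str):
    """Decorator that registers a kernel under a (made-up) device name."""
    def decorator(kernel: Callable[[Matrix, Matrix], Matrix]):
        _BACKENDS[device] = kernel
        return kernel
    return decorator


def _reference_matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain row-by-column product; stands in for a tuned vendor kernel.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


@register_backend("vendor_a")
def vendor_a_matmul(a: Matrix, b: Matrix) -> Matrix:
    return _reference_matmul(a, b)


@register_backend("vendor_b")
def vendor_b_matmul(a: Matrix, b: Matrix) -> Matrix:
    return _reference_matmul(a, b)


def matmul(a: Matrix, b: Matrix, device: str) -> Matrix:
    """User-facing call: the caller names a device, never a kernel."""
    try:
        kernel = _BACKENDS[device]
    except KeyError:
        raise ValueError(f"no backend registered for {device!r}") from None
    return kernel(a, b)
```

User code calls `matmul(a, b, device="vendor_b")` and never touches a vendor kernel directly. The reply below about hand-written CUDA kernels is about exactly the code that bypasses a layer like this.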

woodson · on June 14, 2023

But CUDA kernels are everywhere, not just in high-level frameworks. Look at DeepSpeed, for example, which is used in training LLMs.

slt2021 · on June 15, 2023

So what? They can be replaced by amd kernels if there will be adequate tooling support

orra · on June 14, 2023

No, it confirmed reimplementing an API is not copyright infringement.

The patent claims were rejected simply because the Google implementations were written in such a way the patents did not apply.

justinjlynn · on June 14, 2023

With Ryzen and their other recent products and their rapid pace of improvement, they certainly have developed a decent track record of that.

m00x · on June 14, 2023

They're crushing it on hardware, but being left behind in the firmware/software.

justinjlynn · on June 14, 2023

Getting Intel proprietary advantage in widely used open source packages and systems is something Intel has been very, very good at doing. I use POWER9 systems and... Yeah, the software porting situation has been rather annoying (especially in media processing libraries that rely on things like embree, opencolorio and the like)

gdevenyi · on June 14, 2023

One word.

CUDA

justinjlynn · on June 14, 2023

Valve's Proton/wine... Apple's M1 CPU. Software/hardware layers, especially system software and hardware interfaces that must remain stable and compatible essentially forever, aren't the strong lock-in mechanisms everyone thinks they are.

m00x · on June 14, 2023

That absolutely doesn't work in ML training.

jjoonathan · on June 14, 2023

It absolutely does, but AMD's execution on this front has been unmitigated dogshit for the last decade and now every engineer in this niche has a scar or five from giving AMD chance after chance after chance and then limping back to NVDA when the slog is just too much.

Now that they have money, I'm hoping they can turn this around. I hear that they have... but I've always heard that and it has never been true. I need to see to believe, now.

aunty_helen · on June 14, 2023

AMD's valuation is 1/5th of Nvidia now. They've done amazing in the last 5 years in the CPU market but have absolutely missed the boat on a _trillion_ dollar valuation because they didn't put enough effort into the GPGPU market. This is all the while being 1 of 2 companies in the world that was best positioned to capitalise.

I like Dr. Su, she's an amazing CEO and has stuck it to Intel. But Intel wasn't the one they should've been spending all their effort on.

Hopefully they realise this because Nvidia selling GPUs for 40k can only go on for so long.

Drybones · on June 15, 2023

I disagree. The market ready for disruption was the CPU market stagnated by Intel. It was also the perfect opportunity to put the limited financial resources behind to come out with a viable product to compete that didn't need a ton of software ecosystem to work. Datacenters, gamers, Cloud hyperscalers, Supercomputers, they all needed something to replace their aging and inefficient Intel Skylake Xeons and Core CPUs. And their chiplet tech made it possible to make this CPU product span the entire portfolio for cheap.

If they decided to compete with GPU, they would have just lost in gaming GPU sales to NVIDIA due to mindshare and they wouldn't have had the resources to develop the software to compete. And datacenter GPUs were still a niche and rare product.

AI/ML was not realistically something they could predict or bet the house on 7 years ago. And just like Lisa said in the recent presentation, AI is in its early stages and will be a growing and long term business venture.

Additionally, their CPU chiplet developments were critical in producing the talent and experience that would translate to GPU chiplets that AMD is now utilizing on RDNA 3 and CDNA 3, providing a strategic advantage over NVIDIA.

They still have the time to enter it with their MI300s AND now they have the money and resources to develop their software ecosystems more.

AMD absolutely made the right move to focus on Zen and HPC. It's not their fault that investors are blindly overhyped about AI.

AMD's greatest threat in datacenter AI hardware isn't even NVIDIA. It's the biggest tech companies producing their own AI hardware (Google, Meta, Tesla, Amazon) effectively and eventually eliminating the need for AMD or NVIDIA GPU/AI hardware.

aunty_helen · on June 15, 2023

I appreciate your points but you're being generous to AMD. They bought ATI in 2006. I remember in 2007 seeing CUDA for the first time in the password cracking scene and thinking wow they've done something amazing. OpenCL was there too, the "one framework heterogeneous processing" sounded amazing but quickly became the ugly cousin.

Then in 2011 with crypto, once again Nvidia was always supported but ATI was that other case that required the different install with only some support.

Then 6 years ago when I started working professionally in AI, it was CUDA only for most of the applications. AMD had some stuff but had pretty much given up on OpenCL and at this point was a distant second. If you chose AMD you were quickly going to be locked out while the cool kids played with CUDA and TF. This was in a time when there may have only been one framework or library to do a particular algo. So it really was a lockout.

So to your point, 16 years ago when I first saw GPGPU, you could bet your house on it becoming something massive. The scientific applications alone were obvious to anyone with a copy of BOINC.

Nvidia have shown a masterclass of building something as a corporate over many years and really dominating all competition. AMD should have jumped onboard with TF and made sure any CUDA enabled algo had a _insert whatever AMD would have used_ equivalent. But they didn't, they couldn't even get linux drivers to work.

beebeepka · on June 15, 2023

Absolutely. I wouldn't say they made all the right choices, whatever that means, but I don't remember Nvidia being stagnant.

Attacking the x86 servers was the best, most obvious thing they could've done. As it turns out, it saved the company.

Going straight against Nvidia, as people suggest (for completely selfish reasons), would've killed them

Pet_Ant · on June 14, 2023

Are you referring to OpenCL?

imtringued · on June 15, 2023

https://www.phoronix.com/news/OpenCL-On-Mesa-Matrix

wongarsu · on June 14, 2023

ML training sounds like the easiest part. All that would be needed on that front would be AMD engineers writing good AMD backends for pytorch and tensorflow. A much simpler task than offering an optimized general-purpose interface and getting people to use it (whether that's optimized OpenCL, a CUDA-compatible API or something else).

Of course so far AMD has done a laughably bad job at these kinds of things. I guess there's hope

imtringued · on June 15, 2023

It would work if AMD focused on ROCm support for their consumer GPUs. The reason for that is quite simple. Angry consumers using your GPUs produces a lot of data that helps you design future GPUs and improve the drivers.

Fnoord · on June 15, 2023

CUDA is proprietary, you're forced to use Nvidia GPUs/TPUs on it (would Coral work?). I bought a couple of Jetson for this purpose. I'm forced to run Ubuntu 20.04 LTS on them. But it works. My workstation however can run on FOSS AMD64 and AMD/ATI GPU. Of course, the CPU isn't FOSS (RISC-V could be, but GPU not). But at least the drivers are, and they're a better netizen than Intel, and Raptor Engineering is too expensive for me.

gary_0 · on June 14, 2023

This sounds relevant to what the Oxide Computer guys are doing (the Cantrill Crew, the Keepers of the Solaris Flame, etc). They're building AMD EPYC servers open-source style, and it sounded like they were making some headway in getting AMD to allow alternatives to the "proprietary vendor blobs" way of doing hardware.

bcantrill · on June 15, 2023

We are not using OpenSIL directly, but as a strong believer in the need for open source software at the lowest layers of the stack (i.e., silicon initialization and platform enablement), we are hugely supportive of this effort. Providers of silicon have embraced open source to varying degrees; AMD is showing that they understand the importance of open source (unlike several other vendors one might be able to name!) -- and we believe that customers will be the winners.

panick21_ · on June 15, 2023

Would you eventually switch to OpenSIL, or do you think staying with the minimal custom solution is better for your use case?

bcantrill · on June 15, 2023

No, we won't: OpenSIL is trying to solve a different problem in that it's still providing for systems that are ultimately booting yet other systems (and therefore have need for interfaces like UEFI, ACPI, etc.) By contrast, we are building a holistic system in which hardware and software are co-designed[0][1] -- we are not seeking to boot arbitrary systems and therefore don't need the abstractions that have been invented for that boundary. We remain strongly supportive of OpenSIL because we believe that this is the kind of innovation that open source platform enablement allows: when the lowest layers of the system are open, we allow for different approaches -- we need not all be confined by PC-era abstractions.

[0] https://www.osfc.io/2022/talks/i-have-come-to-bury-the-bios-...

[1] https://oxide-and-friends.transistor.fm/episodes/holistic-bo...

unnah · on June 15, 2023

This is good news. However, the repo doesn't seem to contain any ARM code for the platform security processor (PSP), so I suppose that part would remain an encrypted binary blob?

JonChesterfield · on June 14, 2023

Sounds great! I don't know what this is but there's a linked page at https://community.amd.com/t5/business/empowering-the-industr... which looks like a reasonable place to find out. Edit: I still don't know what this is. Could someone who does leave a comment to enlighten the rest of us?

wmf · on June 14, 2023

This is code to initialize an Epyc server processor. If you combine openSIL and EDK2 you can create a completely open source "BIOS" for a server.

tester756 · on June 14, 2023

What is the difference between this and https://github.com/tianocore/edk2

panick21_ · on June 15, 2023

This is the UEFI boot flow:

https://raw.githubusercontent.com/tianocore/tianocore.github...

Currently AMD provides AGESA, and that is basically the PEI part of the diagram. EDK2 is more like the DXE and so on.

OpenSIL will now replace AGESA in all AMD products and make that low level open source and serve as the PEI part.

OpenSIL will also enable not just UEFI flow, but also Coreboot/Linuxboot and other open source boot alternatives.

wmf · on June 14, 2023

This is low-level code specifically to initialize AMD Epyc processors. EDK2 is higher-level processor-independent firmware code. You need both to build usable firmware.
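The split described in the last few comments can be sketched roughly as below. The phase names are the standard UEFI Platform Initialization phases. The mapping of components to phases only follows the simplified description given in this thread (silicon init in PEI, EDK2 from DXE onward); `firmware_component` and the "not covered in this thread" label are illustrative assumptions, not part of openSIL or EDK2.

```python
from enum import Enum


class Phase(Enum):
    """UEFI Platform Initialization boot phases, in boot order."""
    SEC = "Security"
    PEI = "Pre-EFI Initialization"
    DXE = "Driver Execution Environment"
    BDS = "Boot Device Selection"
    TSL = "Transient System Load"
    RT = "Runtime"
    AL = "Afterlife"


# Enum members iterate in definition order, which matches the boot flow.
BOOT_ORDER = list(Phase)


def firmware_component(phase: Phase, silicon_init: str = "openSIL") -> str:
    """Which piece handles a phase, per the thread's simplified picture.

    silicon_init is "AGESA" for today's firmware or "openSIL" for the
    open source replacement discussed above.
    """
    if phase is Phase.PEI:
        return silicon_init   # low-level, AMD-specific initialization
    if phase in (Phase.DXE, Phase.BDS):
        return "EDK2"         # higher-level, processor-independent code
    return "not covered in this thread"


if __name__ == "__main__":
    for phase in BOOT_ORDER:
        print(f"{phase.name:4} {phase.value:30} {firmware_component(phase)}")
```

Passing `silicon_init="AGESA"` shows today's arrangement. Either way, the sketch makes the same point as the comments: the silicon-specific piece and the generic piece cover different phases, so usable firmware needs both.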