Hacker News | stygiansonic's comments

That paper doesn't seem to be about security vulnerabilities in MiG, but rather about using it to improve workload efficiency


Wonder why they haven't gotten an in-house pizzeria yet to reduce the signal on this side-channel leak


I have actually had pizza at the Pentagon. True, it was almost 30 years ago, and it tasted like federal-cafeteria pizza, but it was edible and I'm still alive.


I had that pizza every Friday throughout elementary school


I think the signal itself is pretty much just noise. If you're scheming against the Pentagon, you'd assume they're always working hard anyway.


Because with their budget they can afford to induce artificial demand and thus exert control over the signal, fooling adversaries in the process.


I doubt that's it. Artificial demand just adds noise to the signal, it doesn't eliminate it. It seems more likely that they've just decided that knowing the Pentagon is working on something without additional details isn't a very useful signal for adversaries.


I was about to make this same comment - this data might've been more useful for, say, the Soviets, when they were the only major threat the US was actively dealing with: if they spotted a ton of pizzas being ordered to the Pentagon, they could be fairly sure it was something relevant to them.


One of the examples was the night before Desert Storm started in '91. For those who weren't around, there was a huge build-up operation called Desert Shield, which only became Desert Storm when they started shooting. It's not like that was a secret, and the Iraqis could have seen this data and not been surprised when the bombs started falling immediately after the pizza surge.

If you were bin Laden, maybe you might not have been caught unawares when helicopters were about to crash in your garden. It's not like you didn't know they were looking for you.

I can't think of any adversary of the Pentagon, or of the other agencies this applies to, who wouldn't already know they were the adversary. This might be more relevant than you think.


Sounds about right.

Plus when it's time to go Mutually Assured Destruction, i.e. against those capable of attacking the US mainland, I'm going to assume they would be doing all this from whatever the successor to Mt. Weather is.


You have an awful lot of faith in an institution that seems to put its ass out in the wind on a regular basis


why not both


They do[1], but it's only open on weekdays and closes at 1600.

[1] https://www.mosaicpizzacompany.com/washington-dc/


Sbarro closed


From the article it appears to be something they invented:

> Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage.

Like you, I'm also interested in the architectural details. We can speculate, but we'll probably need to wait for some sort of paper to get the details.


Great article and nice explanation. I believe this describes “Algorithm R” in this paper from Vitter, who was probably the first to describe it: https://www.cs.umd.edu/~samir/498/vitter.pdf
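For concreteness, here is a minimal sketch of Algorithm R (my own Java, not code from the article or from Vitter's paper; the class and method names are made up): keep the first k items, then when the i-th item arrives let it replace a uniformly random slot with probability k/i.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Random;

    // Minimal sketch of Algorithm R: draw a uniform sample of k items from a
    // stream of unknown length in one pass using O(k) memory.
    final class ReservoirSampler {
        static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
            List<T> reservoir = new ArrayList<>(k);
            long seen = 0;
            while (stream.hasNext()) {
                T item = stream.next();
                seen++;
                if (reservoir.size() < k) {
                    reservoir.add(item); // keep the first k items unconditionally
                } else {
                    // the i-th item (i = seen) replaces a random slot with probability k/i
                    long j = (long) (rng.nextDouble() * seen); // uniform in [0, seen)
                    if (j < k) {
                        reservoir.set((int) j, item);
                    }
                }
            }
            return reservoir;
        }
    }

By induction every item ends up in the reservoir with probability k/n; the coin-flip version in the article is the k = 1 special case, where the i-th item is kept with probability 1/i.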


That paper says “Algorithm R (which is a reservoir algorithm due to Alan Waterman)” but it doesn’t have a citation. Vitter’s previous paper https://dl.acm.org/doi/10.1145/358105.893 cites Knuth TAOCP vol 2. Knuth doesn’t have a citation.


Knuth also says that "Algorithm R is due to Alan G. Waterman", on TAOCP vol 2 page 144, just below "Algorithm R (Reservoir sampling)". This blog post seems to be a good history of the algorithm: https://markkm.com/blog/reservoir-sampling/ (it was given by Waterman in a letter to Knuth, as an improvement of Knuth's earlier "reservoir sampling" from the first edition).

> All in all, Algorithm R was known to Knuth and Waterman by 1975, and to a wider audience by 1981, when the second edition of The Art of Computer Programming volume 2 was published.


Interesting! If Knuth is not the original author then they’ve been lost to the sands of time


The article mentions this union, not sure if it meets your definition of success: https://www.alphabetworkersunion.org/our-wins


Elseforum, I've debated this. I consider it to be more of an inside-company lobbying group than a union. In particular, they have no ability to collectively bargain for a contract. None of their wins are things they have been able to get put into a contract.

Furthermore, some of the issues they've brought up have been things that are... not contractual but rather political. For example https://www.alphabetworkersunion.org/press/ceasefire-demand ... while it is ok for an organization to have opinions, things that are not about the contract the worker has with the company get into... well... political issues, and that can hinder the group's ability to get majority representation and be able to do things with contracts.


All that union managed to do is get various Google contractors fired for unionizing.


When subtlety proves too constraining, competitors may escalate to overt cyberattacks, targeting datacenter chip-cooling systems or nearby power plants in a way that directly—if visibly—disrupts development. Should these measures falter, some leaders may contemplate kinetic attacks on datacenters, arguing that allowing one actor to risk dominating or destroying the world are graver dangers, though kinetic attacks are likely unnecessary. Finally, under dire circumstances, states may resort to broader hostilities by climbing up existing escalation ladders or threatening non-AI assets. We refer to attacks against rival AI projects as "maiming attacks."


Sorry to hear this

A lot of my teenage years were spent building and playing with PCs, and a lot of the knowledge and interest came from reading each and every issue of boot and Maximum PC


+1

Jumping into an unknown codebase (which may be a library you depend on) and being able to quickly investigate, debug, and root-cause an issue is an invaluable skill in my experience

Acting as if this isn’t useful in the real world won’t help. The real world is messy, documentation is often missing or unreliable, and the person/team who wrote the original code might not be around anymore.


I wrote about something similar, motivated by an issue I saw caused by an (incorrect) expectation that a Java HashMap's iteration order would be random: https://peterchng.com/blog/2022/06/17/what-iteration-order-c...
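For anyone curious about the pitfall without reading the post, a tiny illustration (my own example, not taken from the linked article; class name is made up): HashMap iteration order is unspecified, but it is deterministic for a given set of keys, so it is useless as a source of randomness.

    import java.util.HashMap;
    import java.util.Map;

    // Iteration order of a HashMap is unspecified but repeatable: for the same
    // keys it comes out the same way on every run (on a given JVM version), so
    // "pick the first entry" is not a random choice.
    public class HashMapOrderDemo {
        public static void main(String[] args) {
            Map<String, Integer> m = new HashMap<>();
            m.put("alpha", 1);
            m.put("bravo", 2);
            m.put("charlie", 3);

            // Always prints the same key, run after run.
            String pick = m.keySet().iterator().next();
            System.out.println("\"Random\" pick is always: " + pick);
        }
    }

If randomness is actually wanted, shuffle a copy of the key list with java.util.Collections.shuffle instead of relying on iteration order.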


Yeah, OP's comment makes it seem like they are building racks of RTX 4090s, when this isn't remotely true. Tensor Core performance is far different on data-center-class devices vs. consumer ones.


They are building racks of 4090s. Nobody can get H100s in any reasonable volume.

Hell, Microsoft is renting GPUs from Oracle Cloud to get enough capacity to run Bing.


There are apparently some 400 H100s sitting idle somewhere, per a comment upthread. Yes, I'm having a hard time imagining how that's possible too.


Who is "they"?

RTX 4090s are terrible for this task. Off the top of my head:

- VRAM (obviously). Isn't that where the racks come in? Not really. Nvidia famously removed something as basic as NVLink between two cards from the 3090 to the 4090. When it comes to bandwidth between cards (crucial) even 16 lanes of PCIe 4 isn't fast enough. When you start talking about "racks" unless you're running on server grade CPUs (contributing to cost vs power vs density vs perf) you're not going to have nearly enough PCIe lanes to get very far. Even P2P over PCIe requires a hack geohot developed[0] and needless to say that's umm, less than confidence inspiring for what you would lay out ($$$) in terms of hardware, space, cooling, and power. The lack of ECC is a real issue as well.

- Form factor. Remember PCIe lanes, etc? The RTX 4090 is a ~three slot beast when using air cooling and needless to say rigging up something like the dual slot water cooled 4090s I have at scale is another challenge altogether... How are people going to wire this up? What do the enclosures/racks/etc look like? This isn't like crypto mining where cheap 1x PCIe risers can be used without dramatically limiting performance to the point of useless.

- Performance. As the grandparent comment noted, 4090s are not designed for this workload. In typical usage for training I see them as 10-20% faster than an RTX 3090 at much higher cost. Compared to my H100 with SXM it's ridiculously slow.

- Market segmentation. Nvidia really knows what they're doing here... There are all kinds of limitations you run into with how the hardware is designed (like Tensor Core performance for inference especially).

- Issues at scale. Look at the Meta post - their biggest issues are things that are dramatically worse with consumer cards like the RTX 4090, especially when you're running with some kind of goofy PCIe cabling issue (like risers).

- Power. No matter what power limiting you employ, an RTX 4090 is pretty bad for power/performance ratio. The card isn't fundamentally designed for these tasks - it's designed to run screaming for a few hours a day so gamers can push as many FPS at high res as possible. Training, inference, etc is a different beast, and the performance vs power ratio for these tasks is terrible compared to A/H100. Now let's talk about the physical cabling, PSU, etc issues. Yes, miners had hacks for this as well, but it's yet another issue.

- Fan design. There isn't a single "blower" style RTX 4090 on the market. There was a dual-slot RTX 3090 at one point (I have a bunch of them) but Nvidia made Gigabyte pull them from the market because people were using them for this. Figuring out some kind of air-cooling setup with the fan and cooling design of the available RTX 4090 cards sounds like a complete nightmare...

- Licensing issues. Again, laying out the $$$ for this with a deployment that almost certainly violates the Nvidia EULA is a risky investment.

Three RTX 4090s (at 9 slots) to get "only" 72GB of VRAM, talking over PCIe, using 48 PCIe lanes, multi-node over sloooow ethernet (hitting CPU - slower and yet more power), using what likely ends up at ~900 watts (power limited) for significantly reduced throughput and less VRAM is ridiculous. Scaling the kind of ethernet you need for this (100 gig) comes at a very high per-port cost and due to all of these issues the performance would still be terrible.

I'm all for creativity but deploying "racks" of 4090s for AI tasks is (frankly) flat-out stupid.

[0] - https://github.com/tinygrad/open-gpu-kernel-modules


> The RTX 4090 is a ~three slot beast when using air cooling and needless to say rigging up something like the dual slot water cooled 4090s I have at scale is another challenge altogether... How are people going to wire this up? What do the enclosures/racks/etc look like?

A few years ago, if you wanted a lot of GPU power you would buy something like [1] - a 4/5U server with space for ten dual-slot PCIe x16 cards and quadruple power supplies for 2000W of fully redundant power. And not a PCIe riser in sight.

I share your scepticism about whether it's common to run >2 4090s because nvidia have indeed sought to make it difficult.

But if there was some sort of supply chain issue that meant you had to, and you had plenty of cash to make it happen? It could probably be done.

Some of the more value-oriented GPU cloud suppliers like RunPod offer servers with multiple 4090s and I assume those do something along these lines. With 21 slots in the backplane, you could probably fit 6 air-cooled three-slot GPUs, even if you weren't resorting to water cooling.

[1] https://www.supermicro.com/en/products/system/4U/4028/SYS-40...


> but deploying "racks" of 4090s for AI tasks is (frankly) flat-out stupid.

You seem to be trapped in the delusion that this was anyone's first, second, or third choice.

There is workload demand, you can't get H100s, and if you don't start racking up the cards you can get, the company will replace you with someone less opinionated.

