I once bought a far larger supercomputer: roughly 1/8 of ASCI Blue Mountain, 72 racks. It was commissioned in 1998 as #1 or #2 on the TOP500 and officially decommissioned in 2004; I purchased my 1/8 for $7k in ~2005.
Moving 72 racks was NOT easy. After paying substantial storage fees, I rented a 1,500 sq ft warehouse after selling off a few of them, and they filled it up. It took a while to get 220V/30A service in there to run just one of them for testing purposes. Installing IRIX was 10x worse than any other OS: imagine 8 CDs, each of which had to go in twice during the process. Luckily somebody listed a set on eBay. SGI was either already defunct or just very unfriendly to second-hand owners like myself.
The racks ran SGI Origin 2000s with CrayLink interconnects. Sold 'em off 1-8 at a time, mainly to render farms. Toy Story had been made on similar hardware. The original NFL broadcasts with that magic yellow first-down line were synthesized with similar hardware. One customer did the opening credits for a movie with one of my units.
I remember still having half of them around when Bitcoin first came out. It never occurred to me to try to mine with them, though I suspect if I'd been able to provide sufficient electrical service for the remainder, Satoshi and I would've been neck-and-neck for number of bitcoins in our respective wallets.
The whole exercise was probably worthwhile. I learned a lot, even if it does feel like seven lifetimes ago.
Wow, that's ridiculous. I bought two racks of Origin2000 with a friend in high school and that was enough logistic overhead for me! I can't imagine 72 racks!!
Installing IRIX doesn't require CDs; it's much, much easier done over the network. Back in the day it required some gymnastics to set up with a non-IRIX host, now Reanimator and LOVE exist to make IRIX net install easy. There are huge SGI-fan forums still active with a wealth of hardware and software knowledge - SGIUG and SGInet managed to take over from nekochan when it went defunct a few years ago.
I have two Origin 350s with 1GHz R16Ks (the last and fastest of the SGI big-MIPS CPUs) which I shoehorned V12 graphics into for a sort of rack-Tezro. I boot them up every so often to mess with video editing stuff - Smoke/Flame/Fire/Inferno and the old IRIX builds of Final Cut.
I think that by the time Bitcoin came out, Origin2000s would have been pretty majorly outgunned for Bitcoin mining or any kind of compute task. They were interesting machines but weren't even particularly fast compared to their contemporaries; the places they differentiated were big OpenGL hardware (InfiniteReality) with a lot of texture memory (for large-scale rendering and visualization) and single-system-image multiprocessor computing (NUMAlink), neither of which would help for coin mining.
Nodes: 4,032 dual socket units configured as quad-node blades
Processors: 8,064 units of E5-2697v4 (18-core, 2.3 GHz base frequency, Turbo up to 3.6GHz, 145W TDP)
Total Cores: 145,152
Memory: DDR4-2400 ECC single-rank, 64 GB per node, with 3 High Memory E-Cells having 128GB per node, totaling 313,344 GB
Topology: EDR Enhanced Hypercube
IB Switches: 224 units
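A quick consistency check of those listed totals (my own arithmetic; the 144-nodes-per-rack figure comes from a comment further down, and the 2-racks-per-E-Cell assumption is mine):

    # Sanity-checking the listed Cheyenne totals against each other
    nodes = 4032
    cpus  = nodes * 2                     # dual-socket nodes -> 8,064 E5-2697 v4
    cores = cpus * 18                     # 145,152 cores
    racks = nodes // 144                  # 28 compute racks at 144 nodes/rack
    # 64 GB/node baseline; solve for the number of 128 GB nodes implied
    # by the listed 313,344 GB total:
    high_mem_nodes = (313_344 - 64 * nodes) // 64
    print(cpus, cores, racks, high_mem_nodes)   # 8064 145152 28 864

864 high-memory nodes is 6 racks, i.e. 3 E-Cells if an E-Cell is a pair of racks, which matches the "3 High Memory E-Cells" line above.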
Moving this system necessitates the engagement of a professional moving company. Please note that the four (4) attached documents detail the facility requirements and specifications. Due to their considerable weight, the racks require experienced movers equipped with proper personal protective equipment (PPE) to ensure safe handling. The purchaser assumes responsibility for transferring the racks from the facility onto trucks using their own equipment.
Given that the individual nodes are just x86_64 Xeons and run Linux... it would be interesting to part it out and sell individual, but functional, nodes to people. There are a lot of people who would like to have a ~2016-era watercooled 1U server from a supercomputer that was once near the top of the Top500, just to show to people.
Get little commemorative plaques for each one and sell for $200 each or so.
edit:
it seems each motherboard is a dual-CPU board and so there are 4,032 nodes, but the nodes are in blades that likely need their rack for power. But I think individual cabinets would be cool to own.
There are 144 nodes per cabinet... so 28 cabinets.
I'd pay a fair amount just to own a cabinet to stick in my garage if I was near there.
The individual servers are not watercooled. The compute racks are air-cooled; the adjacent cooling racks then exchange that heat using the building's chilled water. It's the rack as a whole that is watercooled. If you extract a single node, you won't get any of that. As the other commenters also point out, these are blades; you can't run an individual node by itself.
Going off one listing for an E5-2697v4: $50 with free shipping, 386 already sold.
If you figure after the double-dipping of eBay/Paypal and then shipping fees, that's ~$30 profit per CPU.
8,064 x 30 = 241,920 USD. Not too shabby for what's got to be some weeks/months of work. You could probably assume that they can sell or scrap the rest of it for a bit more as well, minus the fees for storage and the moving company.
I have a couple servers with this exact CPU that I run for a mixture of practical and sentimental reasons. I bought them off eBay and only after purchase discovered they were a piece of history. They have a second life testing GPU libraries for Debian from a rack in my basement.
For privacy reasons, I won't say who originally owned the servers, but they had a cool custom paint job and were labelled YETI1 and YETI2. If the original owner is on HN, perhaps they will recognize the machines and provide more information.
If somebody has the money, and the resources required to house the system, it seems like decent value at the reserve (apparently $100k). It's in the range of "I'd buy it if I had the money and a valid use case," even for partial resale like suggested. There's an argument about being behind the "most" advanced, yet it's a petascale system from 2016 that was good enough for the NSF. It's not exactly an old dog.
The stuff I could quickly find was $1M+ in individual sales (not sure about auction sites and used).
DDR4 RAM $734,400 @ $75 / 32 GB for 313,344 GB
CPUs $484,000 @ $60 / (1) E5-2697v4 for 8,064 units
Figure the storage must also be enormous, though I'm not sure of the quantity, and the drives may not be sold with the system.
They're not kidding about the moving company; that's tens of tons of computer to haul somewhere: 26 racks @ 2,500 lbs each. A couple of semi-trucks, since the most they can usually do is 44,000 lbs on US highways. Not sure if the E-Cell weight (14 @ 1,500 lbs) is in addition to the 26 @ 2,500 lbs? A rough check below suggests it is.
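Back-of-the-envelope on that (my arithmetic; the per-unit weights are the listing figures quoted above, and the ~95,000 lbs total is quoted in another comment further down):

    racks_lbs   = 26 * 2500      # 65,000 lbs
    e_cells_lbs = 14 * 1500      # 21,000 lbs
    print(racks_lbs, racks_lbs + e_cells_lbs)   # 65000 86000

~86,000 lbs is much closer to the ~95,000 lbs total than 65,000 lbs alone, so the E-Cell weight is most likely additive.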
Can you imagine the RAMDisk? Yes, you can. Especially in 20 years when it will be the norm. And also the Windows version that will require half of it in order to run /s
For those curious, Cheyenne is a supercomputer from 2016/2017 that debuted at the 20th spot on the TOP500. It was decommissioned in 2023, after the pandemic led to a two-year operational extension.
It has a peak compute of 5.34 petaflops, 313TB of memory, and gobbles 1.7MW.
In comparison, 18 A100 GPUs would have 5.6 petaflops and 1.4 TB of VRAM, consuming 5.6 kW.
The speed of processing and interconnect is orders of magnitude faster for an A100 cluster. One 8-GPU pod server will cost around $200k, so around $600k more or less beats the supercomputer's performance (the prices I'm finding seem wildly variable, please correct me if I'm wrong).
Thanks, glad you guys caught that - to be generous we could allow the tensor-core TFLOPS, since you'd more than likely be using A100 pods for something CUDA-optimized. In that case, at 19.5 TFLOPS FP64 peak per GPU, roughly 267 GPUs would be needed, or 34 pods, at $6.8 million, with 21.76 TB of VRAM and 81 kW of power consumption.
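Re-deriving that (my arithmetic; the 19.5 TFLOPS FP64 tensor-core peak and ~$200k per 8-GPU pod are the figures assumed in this thread, and the ~300 W per GPU is my own assumption):

    import math
    cheyenne_pf  = 5.34                                   # Cheyenne peak petaflops
    a100_fp64_tf = 19.5                                   # FP64 tensor-core TFLOPS per A100
    gpus = math.ceil(cheyenne_pf * 1000 / a100_fp64_tf)   # GPUs to match peak
    pods = math.ceil(gpus / 8)                            # 8-GPU pods
    print(gpus, pods, pods * 200_000,                     # 274 35 7000000
          pods * 8 * 80 / 1000,                           # 22.4 TB of VRAM (80 GB/GPU)
          round(pods * 8 * 0.3, 1))                       # 84.0 kW at ~300 W/GPU

That lands slightly above the 267 GPUs / 34 pods / $6.8M in the comment (which presumably used a somewhat lower sustained figure), but it's the same ballpark.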
I'm an amateur, but I have code that I think could probably dispatch threads pretty efficiently on Cheyenne through its management system, simply because it's all distributed Xeons. If I can run it on my personal 80-core cluster, I could have gotten it to run on Cheyenne back then.
But hitting the roofline on those AMD GPGPUs? I'd probably get nowhere fucking close.
That is the thing that Cheyenne was built for: people doing CFD research with x86 code that was already nicely parallelized via OpenMPI or what have you.
I used to build small clusters and use supercomputers, and I can't imagine it's fun to build a supercomputer. It requires massive infrastructure and a significant employee base, and individual component failures can take down entire jobs. Finding enough jobs to keep the system loaded 24/7 while also keeping the interconnect (which was 15-20% of the total system cost) busy, and finding the folks who can write such jobs, is not easy. Even then, other systems will be constantly nipping at your heels with newer/cheaper/smaller/faster/cooler hardware.
Thanks for the feedback. You make a lot of good points. I've built a 150,000 GPU system previously, but it was lower end hardware. It was a lot of fun to make it run smoothly with its own challenges.
It doesn't take a lot of employees; we did the above with essentially two technical people. Those same two are working on this business.
Finding workloads/jobs is definitely going to be an interesting adventure. That said, the need for compute isn't going away. By offering hard-to-get hardware at reasonable rates and contract lengths, I believe we are in a good position on that front, but time will tell.
We are only buying the best of the best that we can get today. The plan is to continuously cycle out older hardware, as well as not pick sides of one vendor over another. This should help us keep pace with other systems.
150K GPUs with two people... presumably at 8 GPUs/host, you had close to 20K servers.
I can't really see how that's achievable with only two people, given the time to install hardware, maintain it, deal with outages and planned maintenance and testing, etc. Note: I worked at Google and interfaced with hwops, so I have some real-world experience to compare to.
Building a 150K GPU system without a well-understood customer base seems a bit crazy to me. You will either become a hyperscaler, serve a niche, or go out of business, I fear.
The ASRock BC-250s we deployed were 12 individual blades to a chassis, and those were all PXE booted. We deployed 20,000 of those blades across 2 data centers. This was a massive feat of engineering, especially during covid, when I couldn't even access the machines directly. Built a whole dashboard to monitor it all too.
I know, I can't believe we did it either, but we did. Software automation was king. I built a single binary that ran on each individual host and knew how to self configure / optimize everything. Idempotently. Even distributing upgrades to the binary was a neat challenge that I solved perfectly, in very creative ways.
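For anyone curious what "self configure, idempotently" can look like in practice, here's a purely hypothetical sketch of that pattern (check desired state, change only what drifted); it's not the actual binary's logic, and the sysctl key/value are just example settings:

    import subprocess

    def read_sysctl(key: str) -> str:
        # Read the current value of a kernel setting
        return subprocess.run(["sysctl", "-n", key],
                              capture_output=True, text=True).stdout.strip()

    def ensure_sysctl(key: str, want: str) -> bool:
        # Apply a setting only if it differs from the desired state,
        # so the agent is safe to run on every boot / every cycle.
        if read_sysctl(key) == want:
            return False                  # already converged, nothing to do
        subprocess.run(["sysctl", "-w", f"{key}={want}"], check=True)
        return True

    if __name__ == "__main__":
        changed = ensure_sysctl("vm.swappiness", "10")   # hypothetical example tweak
        print("fixed drift" if changed else "converged")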
Today, we are starting much smaller. Literally from zero/scratch. Given the cost of MI300x, I doubt we will ever get to 150k GPUs, that's an absurd amount of money, but who knows.
But who did the wiring? Even with blades, which consolidate much of the cabling, there's still a tremendous amount of work to build the interconnect. On typical large systems I've seen a small team of 3-5 guys working for weeks or more to wire a modest DC.
We'd hire the initial deployment out to temporary contractors. It just took a few weeks to get a large deployment out. The hard part was that the 12 GPUs needed to be inserted at the DC, which took a bunch of effort. Once it was done, we generally had 1-2 people on the ground in the data centers to deal with breakfixes, either contractors or supplied by the DC.
For this venture, again, we are starting small, so we are just flying to the DC and doing it ourselves. There are also staff there that are technical enough to swap stuff out when we need it. The plan will be to just hire one of their staff as our own.
I don't think we will make it for this next deployment due to time constraints, but ideally in our near future, we will go full L11. Assemble and ship out full racks at the manufacturer/VAR, bolt em down, wire them up and ready to go. That is my dream... we will see if we get there. L11 is hard cause a single missing cable can hold up an entire shipment.
I just realized we had this same conversation on HN before. IIRC I said last time and I'll repeat: if you say that you set up 15K GPUs with 2 people, and I ask who did the wiring, and you say an external company came in and spent a few weeks wiring the network for you, then you can't say that 2 people set up 15K GPUs. You're trying to externalize real costs (both time and money).
I understand your dream (having pursued similar ideas), but I think you have to be realistic about the effort required, especially when you add picky customers to the mix.
Also, supercomputers usually use general-purpose nodes supported by many standard tools, multiple methods of parallelization, and (for open standards) maybe multiple vendors. I imagine this one is much more flexible than A100s.
Weather forecasting is actually moving to reduced precision. None of the input data is known to more than a few digits, and it's a chaotic system, so the numerical error is usually dominated by the modeling and spatial discretization error.
It's really hard to find a home for large, old, high-maintenance technology. What do you do with a locomotive, or a Linotype? They need a support facility and staff to be more than scrap. So they're really cheap when available.
The Pacific Locomotive Association is an organization with that problem. About 20 locomotives, stored at Brightside near Sunol. They've been able to get about half of them working. It's all volunteer. Jobs that took days in major shops take years in a volunteer operation.
At the ill-fated Portland TechShop I took woodworking classes from a retired gentleman who had professionally been a pattern maker for molding cast metal parts. This made his approach to woodworking really interesting. He had a huge array of freestanding sander machines, including a disc sander more than a yard in diameter.
For anyone unfamiliar, pattern makers would make wooden model versions of parts that were to be cast in metal. The pattern would be used to make the mold. He could use these various sanding machines to get 1/64" precision for complex geometries. It was fascinating to watch how he approached things, especially in comparison to modern CNC.
His major project outside of teaching the classes? Making patterns for a local steam locomotive restoration project. He had all these wooden versions of various parts of a locomotive sitting around.
Does 1/64" precision really mean anything in wood, where small fluctuations in air moisture can cause > 1/64" distortion? I guess it's OK if you stay within a climate controlled area.
So he would build parts by first making an oversize rough blank of bonded layers of marine grade plywood in a big press. Then he'd rough cut it various ways on a big band saw. Then he'd work his way through using all the sanders to slowly approach the net shape. He used precision squares to measure bigger stuff and calipers for smaller stuff.
I can't tell you the exact stability of marine grade plywood, but I know it's about as good as you can get for a wooden material, and I doubt he'd go to the effort of such precise measurements if it didn't matter.
Plywood is good for dimensional stability, but I'm pretty sure all this work must have been done in and around a toolroom with stable moisture content, or the part was used immediately and then consumed/destroyed before it "moved" too much. He sounds pretty knowledgeable, though, so I'm going to guess this isn't just garage woodworking, where 1/32" doesn't really matter when the wood is going to shrink/expand by 5-10% over the course of a year.
So the patterns once done would be taken to a foundry where they'd be used to make molds. I'm not totally up to speed on that process but I know it involves surrounding the pattern with a combination of refractory sand and binder. Where it gets tricky is complex parts that have multiple cores and so on.
And yeah, this guy was retired at the time but he'd been doing it for like 50 years. I'm very sure he knew what did and did not matter.
Funny, just yesterday I saw here in Germany a train where the locomotive was labeled 'rent me'. Apparently there's an organization which bought old locomotives (this particular one looked like 60s, maybe early 70s vintage) from the state-run former monopoly to rent them out (large industrial customers I'd think).
I was surprised to see such an old locomotive in operation, but apparently it's still good (i.e. economic) enough to shuttle cars from factory to port. Guess the air pollution restrictions aren't all that tough for diesel trains.
> the system is currently experiencing maintenance limitations due to faulty quick disconnects causing water spray. Given the expense and downtime associated with rectifying this issue in the last six months of operation, it's deemed more detrimental than the anticipated failure rate of compute nodes.
Even the RAM has aged out...
Very hard to justify running any of this; newer kit pays for itself quickly in reduced power and maintenance by comparison.
Nah, I already have far more junk computers than I need.
I lusted after a Cray T3E once that I coulda had for $1k plus trucking it across TN and NC; but even then I couldn't have run it. I'm two miles away from 3-phase power, and even then couldn't have justified the power budget. At the time a slightly less scrap UltraSPARC 6k beat it up on running costs, even with higher initial costs, so I went with that instead. I did find a bunch of Alphas to do the byte swizzling tho. Ex-"Titanic" render farm nodes.
I've been away from needing low-budget big compute for a while, but having spent a few years watching the space I still can't help but go "ooo neat" and wonder what I could do with it.
Never understood why you can bid below the reserve price, or rather why the reserve price is hidden, given the whole point is that they (the seller) have a price in mind they are not willing to go below.
> a price in mind they are not willing to go below
I worked for an auction, and sellers accepted bids below the reserve price all the time. They just want to avoid a situation where an item sells at a “below market” price due to not having enough bidders in attendance - e.g. a single bidder is able to win the auction with a single lowball bid. If they see healthy bidding activity that’s often sufficient to convince them to part with the item below reserve.
Reserve prices are annoying for buyers, but below-reserve bids can provide really useful feedback for sellers.
We even had full-time staff whose job was to contact sellers after the auction ended and try to convince them to accept a below-reserve bid, or try to get the buyer and seller to meet somewhere in the middle. This worked frequently enough to make this the highest ROI group in our call center.
It is playing on the psychology of the bidders. You want the bidders to be invested, to want to win the auction. To compete to win the prize.
Also, consider this: if the reserve is too high, and no one bids on it, then everyone looking at it is going to wonder what it is really worth. If there are several other bidders, then that gives reassurance to the rest for the price they each are bidding.
It's entirely because of human nature - you want people to get invested in it, which having them bid any amount does.
It's the same reason an auction can go above the price/value of the thing, because you get invested in your $x bid, so $x+5 doesn't seem like paying $x+5, but instead "only $5 more to preserve your win" type of thing.
Man, if I had the space, the money, and the means of powering it I would bid on this immediately. It's so damn cool and will likely end up selling for a lot less than it's worth due to its size.
I've always been fascinated by the supercomputer space, in no small part because I've been sadly somewhat removed from it; the SGI and Cray machines are a bit before my time, but I've always looked back in wonder, thinking of how cool they might have been to play with back in the 80s and 90s.
The closest I get to that now is occasionally getting to spin up some kind of HPC cluster on a cloud provider, which is fun in its own right, but I don't know, there's just something insanely cool about the giant racks of servers whose sole purpose is to crunch numbers [1].
[1] To the pedants: I know all computers' job is to crunch numbers in some capacity, but a lot of computers and their respective operating systems like to pretend that they don't.
> The Cheyenne supercomputer at the NCAR-Wyoming Supercomputing Center (NWSC) in Cheyenne, Wyoming began operation as one of the world’s most powerful and energy-efficient computers. Ranked in November 2016 as the 20th most powerful computer in the world[1] by Top500, the 5.34-petaflops system[2] is capable of more than triple the amount of scientific computing[3] performed by NCAR’s previous supercomputer, Yellowstone. It also is three times more energy efficient[4] than Yellowstone, with a peak computation rate of more than 3 billion calculations per second for every watt of energy consumed.[5]
My favorite part of SGI computers, like the Altix and UV lines, was the NUMA memory with a flexible interconnect. NUMA let you program a pile of CPUs more like a single-node, multithreaded system. The flexibility then let you plug CPUs, graphics cards, or FPGAs right into the low-latency, high-speed memory bus.
There was a company that made a card that connected AMD servers like that. I don't know if such tech ever got down to commodity price points. If you had InfiniBand, there were also distributed shared memory (DSM) libraries that simulated such machines on clusters. Data locality was even more important then, though.
Cray XT/SeaStar? IIRC the interconnect ASIC pretends to be another peer CPU connected via HyperTransport. HPE Flex is similar, but works via QPI/UPI for Intel CPUs.
Despite being SGI-branded hardware, this was after the name was bought by Rackable, and IIRC SGI hardware became essentially rebadged Rackable hardware. Still cool, but not as cool as some custom MIPS hardware from old SGI.
It's just not economical to run these given how power inefficient they are in comparison to modern processors.
This uses 1.7MW, or $6k per day of electricity. It would take only about four months of powering this thing to pay for 2000 5950X processors. Those would have a similar compute power to the 8000 Xeons in Cheyenne but they'd cost 1/4 the power consumption.
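The math behind that, for anyone checking (the ~$0.15/kWh rate and ~$360 street price per 5950X are my assumptions, not figures from the listing):

    power_mw     = 1.7
    kwh_per_day  = power_mw * 1000 * 24        # 40,800 kWh per day
    cost_per_day = kwh_per_day * 0.15          # ~$6,120/day at an assumed $0.15/kWh
    ryzen_fleet  = 2000 * 360                  # ~$720,000 for 2,000 5950Xs at ~$360 each
    print(round(cost_per_day), round(ryzen_fleet / cost_per_day))   # 6120 118 (days)

118 days is right around the four months claimed.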
If you can get 1.7MW service, then you are paying utility rates, or around $100-150 per MWh, or as you quoted $4-6k per day. In Seattle, running this off-peak would cost about $104/hr before the other fees.
It would be neat if a subset of this could be made operational and booted once in a while at a computer history museum. I agree that it doesn't make sense to actually run it.
There's a whole sub-industry of people bidding on government auctions in order to part out the stuff. I'd be pretty surprised if the whole cluster got reassembled. But people on a budget will buy those compute nodes, someone trying to keep their legacy IB network will snap up those switches, the racks, etc.
r/homelab will have a field day getting those nodes up; some people will want just one for practicality, and some will want at least a couple and an IB switch just for the novelty of it.
I can imagine it, some people on there are ridiculous, but yeah in my experience these supercomputer nodes are a lot more integrated/proprietary than most standard server hardware. It's not straightforward to just boot one up without all the support infrastructure. I'd assume they'd mostly be torn down and parted out.
You might be surprised - because they're pretty custom, they are often "more open" than you might expect; as long as you have the connectors you can often get something running. Sometimes they have bog-standard features present on the boards, just not enabled, etc.
It's the commoditized blade servers, etc that are stripped down to what they need to run and nothing more.
Oh I'm speaking from experience with the SGI supercomputer blades. They're pretty wacky: 4x independent dual-CPU boards per blade and all sorts of weird connectors and cooling and management interfaces. Custom, centralized liquid cooling that requires a separate dedicated cooling rack unit and heat exchanger, funky power delivery with 3-phase, odd networking topologies, highly integrated cluster management software to run them, etc. I'm not sure if they have any sort of software locks on top of that, but I would bet they do, and presumably NCAR wipes all of them, so you likely won't have the software/licenses.
I dug up a link to some of the technical documentation https://irix7.com/techpubs/007-6399-001.pdf . Probably someone can get it working, but I imagine whoever is going to go through the hassle of buying this whole many-ton supercomputer is planning to just strip it down and sell the parts.
Yeah the licensing is often the stumbling block, unless you can just run some bog-standard linux on it. It sounds like this might be custom enough that it would be difficult (but I daresay we'll see a post in 5 years from someone getting part of it running after finding it on the side of the road).
Ultimately SGI was running Linux and AFAIK the actual hardware isn't using any secret sauce driver code, so yeah if you can get it powered on without it bursting in flames and get past the management locks you can probably get it working. It's definitely not impossible if you can somehow assemble the pieces.
Crypto is no longer mined commercially with GPU type compute. When ETH switched to PoS, it decimated the entire GPU mining industry. It is no longer profitable. The only people doing it now are hobbyists.
Sure, but you can get (much) better price-performance-power out of a CPU which isn't approaching a decade old when mining ASIC- and GPU-resistant cryptocurrencies like Monero. I don't know that it'd be worth the effort to buy E5-2697 v4 CPUs running in such a specialized configuration over AMD Ryzen or EPYC CPUs in commodity, inexpensive mainboards.
Any I've dealt with definitely wouldn't touch the 'you need to hire professional movers costing you tens of thousands of dollars to get it out of the facility' stipulation - they seem to prefer the 'where's the location of the storage shed' situation.
There's 8,064 E5-2697v4's in this. Those go on ebay for ~$50/ea. That's $400,000 of just CPUs to sell.
If the winning bid is $100k, you spend $40k to move it out of there, another $10k warehousing it while selling everything on ebay, and you're still up $250k on the processors alone.
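Spelled out (the eBay price and the moving/warehousing figures are the estimates from this comment, not quotes):

    cpus  = 8064
    gross = cpus * 50                     # ~$403,200 gross on eBay
    costs = 100_000 + 40_000 + 10_000     # winning bid + movers + warehousing
    print(gross, gross - costs)           # 403200 253200

Note that an earlier comment pegs the net at closer to ~$30/CPU after eBay/PayPal and shipping fees, which would bring that down to roughly $90k.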
I presume no one is building new motherboards for those processors either. While there is old stock lying around, you really need to run those systems close to as-is for them to be useful.
These are high-spec CPUs for the socket, though. Lots of room for people with compatible boards that want to upgrade.
There's a lot of low budget hosting with old Xeon systems (I'm paying $30/month for a dual westmere system; but I've seen plenty of offers on newer gear); you can still do a lot with an 18 core Broadwell, if the power is cheap.
That's for the processors alone - a scrapping company dedicated to this stuff would be able to realize more from the other components - and they often have warehouse space available that they already own.
Let's come back and see if the auction failed; I doubt it will.
I ended up reading all the documentation provided alongside the auction because I was debating making a bid. I decided not to in the end.
In total, Cheyenne weighs in at around ~95,000 lbs, and they require it be broken down in <= 3 business days on site, with only 8AM-4PM access. Therefore you'll need to support unbolting and palletizing all the cabinets for loading into a dry trailer, get a truck to show up, load, then FTL to the storage (or reinstall) location of your choosing for whatever's next.
Requirements for getting facility access seem to mean you bring both labor and any/all specialized equipment including protective equipment to not damage their floors while hauling away those 2400lb racks. Liability insurance of $1-2M across a wide range of categories (this isn't particularly unusual, or expensive, but it does mean you can't just show up with some mates and a truck you rented at Home Depot).
I'd guess you're looking at more like $25k just to move Cheyenne via truck, plus whatever it sells for at auction, unless you are located almost next door to them in Wyoming. Going from WY to WA the FTL costs alone would be $6-7k USD just for the freight cost (driver, dry trailer and fuel surcharges), add a bit if your destination doesn't have a loading dock and needs transloading for the last mile to >1 truck (probably) equipped with a tail lift. All the rest is finding a local contractor with semi-skilled labor and equipment to break it down and get it ready to load on a truck and covering the "job insurance".
Warehouse costs if you're breaking it for parts won't be trivial either, and you'd be flooding the used market (unfavorably to you) were you to list 8,000 CPUs or 300TB of DDR4; it could take months or even longer to clear the parts without selling at a substantially depressed price.
It will probably take several thousand hours of labor to break this down for parts, assuming you need to sell them "Tested and Working" to achieve the $50/CPU unit pricing another commenter noticed on eBay; "Untested; Sold as Seen" won't get anywhere near the same $/unit (for CPUs, DRAM, or anything else). So even assuming $25/hr for fully burdened, relatively unskilled labor, you could well be talking up to $100k in labor to break, test, and list Cheyenne's guts on eBay, or even to sell them onward in lots to other recyclers / used-parts resellers. I don't think I could even find $25/hr labor in WA capable of and willing to undertake this work; I fear it'd be more like $45-60/hr in this market in 2024 (and this alone makes the idea of bidding unviable).
A lot of the components like the Cooling Distribution Units are, in my opinion, worth little more than scrap metal unless you're really going to try and make Cheyenne live again (which makes no sense, it's a 1.7MW supercomputer costing thousands of dollars per day to power, and which has similar TFLOPs to something needing 100x less power if you'd bought 2024-era hardware instead).
Anything you ultimately can't sell as parts or shift as scrap metal (the leftovers) you are potentially going to have to junk or send to specialist electronics recycling, both at cost; your obligations here are going to vary by location, and this could also be expensive.
If anyone does take the plunge, please do write up your experiences!
The heat's going to be leaving the building in liquid-filled pipes, however you architect it. And with 1.7MW of peak power consumption, a nontrivial amount of liquid.
It's just a question of whether you want to add air and refrigerant into the mix.
It seems they're decommissioning it partly due to "faulty quick disconnects causing water spray" though, so an air cooling stage would have had its benefits...
> The heat's going to be leaving the building in liquid-filled pipes
In the right climate, and at the right power density, you can use outside air for cooling, at least part of the time. Unlikely at this scale of machine, but there was a lot of work on datacenter siting in the 2010s to find places where ambient cooling would significantly reduce the power needed to operate.
It has been really common in HPC for quite a while. I presume the denser interconnect/networking of HPC favours the higher density that liquid cooling allows. Hardware utilization is also higher compared to normal datacenters, so the additional efficiency vs. air cooling is more useful.
For large machines the air setup is really less efficient and takes up a lot of space. You end up building a big room with a pressurized floor that is completely ringed by large AC units. You have to move a lot of air through the floor, bringing it up through the cabinets and back to the ACs. It's also a big control-systems problem: you need to get the air through the cabinets evenly, so you need variable-speed fans or controlled ducts, and those need to be adaptive but not oscillate.
With a water-cooled setup you can move a lot more heat through your pipes just by increasing the flow rate. So you need pumps instead of fans, your machine room isn't a mini-hurricane, and you can deal with the waste heat more flexibly.
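Rough numbers on why that works, using textbook property values (nothing specific to Cheyenne's cooling design): the heat a coolant stream carries per unit of volume flow and per degree of temperature rise is its density times its specific heat.

    # Volumetric heat capacity, J / (m^3 * K) = density * specific heat
    air   = 1.2  * 1005     # ~1.2e3  J/(m^3*K)
    water = 1000 * 4186     # ~4.2e6  J/(m^3*K)
    print(round(water / air))   # ~3471x more heat per unit flow per kelvin

So a modest bump in pump flow rate moves far more heat than any realistic increase in airflow through a raised floor.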
It doesn't matter what medium you use to move the heat; the same amount of energy has to be dissipated. And watercooling just implies more intermediate steps between the heat spreader on the die and the air. So I don't believe the overall heat output can really be helped.
The listing says that 1% of the nodes have RAM with memory errors. I assume this means hard errors since soft errors would just be corrected. Is this typical? Does RAM deteriorate over time?
I dunno, I still think the 2011-v3 platform that these Xeons run in is a great setup for a homelab. A bit power hungry, but if you can build a dual-socket workstation with 36 cores and 256GB of RAM for <$1000, that is a solid server, and it would still make for a hell of an app/db server. That's basically an m4.16xlarge, same CPU generation, platform, etc. (yes, without all the surrounding infra), that will cost you something like $0.10 per hour worth of energy to run.
Take the Dell T7910 for example (I use one of these for my homelab), you can pick up a basic one with low end CPU/RAM for sometimes as little as $300. Dumping all these 18 core E5s and DDR4 ECC on the market should make it even cheaper to spec out. Currently they go for about $100-150 each on the CPUs, and ~$150-200 for the RAM. Not bad IMO.
My answer was a bit tongue-in-cheek, but the reality is that it depends on what you mean by "market value."
For the market of the auction, the selling price is the actual market value. Likewise, it's typically not too far off the value of the item in the wider market, assuming you are comparing it to a similar item in similar condition. The problem is that for most items purchased at auction, there's no similar item, readily available, to compare it to.
I've won multiple items at machine-shop auctions for a small fraction of their "new" price. The problem with the comparison is that e.g., the Starrett dial test indicator that I got for $10, and the new one that retails for around $200 are hard to compare because there's no liquid market for 30-year-old measuring equipment. While it's adequate for my hobby machinist use, it wouldn't be acceptable in a precision shop since it has no calibration history.
If you find an item where you can reasonably compare apples to apples, e.g., a car, you see that the final price of a car at auction is usually pretty close to the price of the same make/model being sold on the open used market. The slightly lower price of the auction car reflects the risk of the repairs that might be needed.
It's exactly in this "now vs later" that resellers and other brokers sit. If they know that X will sell for $Y "eventually" and how long that eventually is, they can work out how much they can pay for it now and still come out ahead.
Cars are very liquid and move quickly, so the now vs later price is close; weird things that nobody has heard of (but when they need it, they need it NOW) will have a much wider variance.
Depends on the terms of the auction. If we take the California legal definition of Fair Market Value for real estate:
> The fair market value of the property taken is the highest price on the date of valuation that would be agreed to by a seller, being willing to sell but under no particular or urgent necessity for so doing, nor obliged to sell, and a buyer, being ready, willing, and able to buy but under no particular necessity for so doing, each dealing with the other with full knowledge of all the uses and purposes for which the property is reasonably adaptable and available.
A 7 day auction on a complex product like this may be a little short to qualify with the necessity clauses, IMHO; there's a bit too much time pressure, and not enough time for a buyer to inspect and research.
I think auctions exist explicitly to potentially buy or sell an item with a delta on its market value. I.e., buyers want the chance to buy below and sellers to sell above. Neither really wants to engage in the transaction at all in the reverse situation, or even in the "market value" case; you would just make a direct sale and avoid the hassle of an auction.