The Design of the Connection Machine (tamikothiel.com)
74 points by voxadam on Sept 28, 2023 | 35 comments


I'm working on a CM-2 emulator that I plan on releasing next year. I have some Paris (CM-2 'asm') executables and the 'coldboot' program running.

The CM-2 is probably my favorite arch I've come across among 1980s architectures that offer a truly different experience from what you can get with hardware you can buy today. It's the closest thing I've found to a Zachtronics game that someone had the cojones to actually build. Really, really, really fun to code for.

Each of the four execution units has a 16,384-bit-wide, single-cycle-access data path to RAM. Put that in your HBM and smoke it.

If anyone has any questions I'd love to challenge my understanding and attempt to answer (just take some of my answers with a grain of salt, as I'm still reverse engineering).


I read Hillis' book on the Connection Machine, back when he first published it.

I was studying AI at university at the time, and had just discovered the back-propagation method for training neural nets. Thought CM might be good at that.

But any burly dot-product accelerator would be way more appropriate; that's why GPUs are a good fit for training and inference.

Connection Machine might be ideal for a Gelernter tuple-space memory.

Later, I worked at a biotech company that actually had a real Connection Machine; they used it as a huge sliding-window pattern matcher, to search patent filings for prior art: searching for gene sequences in published literature. Holy moly... These days, a CPU would be fast enough; memory sizes are much greater now.
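Roughly that kind of search, as a toy Python sketch (illustrative only, not the company's actual code or parameters):

    # Count mismatches of a short query at every offset of a longer sequence.
    def window_matches(text, pattern, max_mismatches=1):
        """Yield offsets where pattern matches text within max_mismatches."""
        m = len(pattern)
        for i in range(len(text) - m + 1):
            mismatches = sum(a != b for a, b in zip(text[i:i + m], pattern))
            if mismatches <= max_mismatches:
                yield i

    print(list(window_matches("ACGTACGGACGT", "ACGG")))  # -> [0, 4, 8]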


I was pleased that she called her write-up on the industrial design for the computer a case study.


mail me at gmail if you want to talk Paris. I spent a few years mostly writing Paris and a little CMIS.

edit: oh, and I recently found the super secret CMIS programming manual online at https://www.rcsri.org/library/80s/CMIS-Reference-Manual.pdf


As a practical matter, you also have to include the floating point chips -- each attached to a 32-bit slice of that data path with a custom glue chip (the "SPRINT chip", which also allowed a form of indirect addressing); this hardware may have been nominally optional, but I don't think many machines were sold without it.

There was also lower-level programming than Paris -- CMIS was given to customers and was a compiler target, and below that there was microcode.


Obviously, please share when you do. Is the technical information needed readily available? Just curious.

I took a parallel computing class decades ago in college and we programmed a Connection Machine. I think a CM-2. Maybe CM-1? IIRC, we used Fortran with some extensions.

Was so cool and interesting. But never did touch one since…


> Obviously, please share when you do.

Totes, I've time-boxed my work to the end of next year for release. I basically want to get a good conference talk out of it for the ol' CV.

> Is the technical information needed readily available?

It's available if you're willing to dig for it. There are tape images on bitsavers for a sun4 frontend. They're partially failed reads, so I wrote some custom tooling to extract what I could. It looks like the vast majority of the failures were in a couple of starlisp images, for which I have a newer copy. Beyond that, man pages, header files, prelinked static libraries with a bunch of symbols, microcode images, administration utilities, config files, and source to the kernel driver (which exposes the control signals and command FIFOs to user space) are all available.

That's in addition to quite a bit of PDF documentation.

So the problem is more than tractable; there's just work involved at this point.

The biggest piece missing is probably pictures of the processor, microcontroller, and backplane boards to better answer questions about the microcode. Everyone I've contacted with a CM-2 these days really only has the chassis and the blinkenlights. : /


Somewhere out there is video footage of Hillis and some MIT bods wiring up the CM-1. Pretty sure it appears in Adam Curtis' "All watched over" doco. Looks like they're using wirewrap tools and literally doing point-to-point.


Put me on the mailing list!


I'm curious, what was the typical high-level language used for programming the CM2? I've seen Fortran 90 referenced in mentions of programming the machine, but is there a surviving Fortran compiler for the CM2 architecture that you have come across?


In general it was C*, but there was CM Fortran as well, which C* could call and which could call C* subroutines.

https://en.wikipedia.org/wiki/C%2A

http://people.csail.mit.edu/bradley/cm5docs/CM-5CStarUsersGu...

http://bitsavers.informatik.uni-stuttgart.de/pdf/thinkingMac...


Did it use Star Lisp as well?


What language are you using? I think about doing the same thing every now and then, not actually reading about it, just daydreaming. I would think a BEAM language like Elixir might do the trick: a BEAM process for each processor, passing messages around.
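For what it's worth, a very rough sketch of that shape of design, in Python asyncio rather than Elixir just to make it concrete: one lightweight task per simulated processor, each with its own inbox, passing a message down the line. All names here are invented for illustration.

    import asyncio

    N = 8  # a full CM-2 had up to 65,536 processors

    async def processor(pid, inboxes):
        # each simulated processor waits for one message in its inbox
        msg = await inboxes[pid].get()
        print(f"processor {pid} got {msg}")
        if pid + 1 < N:
            await inboxes[pid + 1].put(msg + 1)  # pass it along to the neighbor

    async def main():
        inboxes = [asyncio.Queue() for _ in range(N)]
        tasks = [asyncio.create_task(processor(p, inboxes)) for p in range(N)]
        await inboxes[0].put(0)  # kick things off
        await asyncio.gather(*tasks)

    asyncio.run(main())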


Do you have starlisp running?


I haven't attempted to yet. Most of my efforts have focused on running things from the Cstar environment on a sun4 frontend, since those binaries and headers are a bit more conducive to reverse engineering with modern tooling like ghidra. Or at least that's what fits my RE skill set better.

That being said, I have a couple of starlisp images for sun4 that should eventually run, given the layer I'm targeting for the emulator. It's a goal to get them running, but it hasn't reached the top of the priority queue yet.

FWIW, tensorflow/pytorch are remarkably close to the vague vibe of starlisp.
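To make that comparison a bit more concrete (my own gloss, not anything from the *Lisp manuals): the shared vibe is keeping one value per processor and operating on all of them at once, which in NumPy/PyTorch terms is just whole-array arithmetic.

    import numpy as np

    # one element per (virtual) processor, all updated in a single
    # data-parallel operation
    a = np.arange(16384)
    b = np.ones(16384)
    result = np.where(a % 2 == 0, a + b, a - b)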


I agree on the “vibe” there. In the mid-90s when I was in high school I got access to a cm-2 to figure out how to transform sparse matrices using genetic algorithms on that massively parallel architecture. Good times!


Given Peter Norvig's opinion on Python vs Lisp, and the way Python has been taken up in AI/ML stuff, it is in a way a kind of revenge for Lisp.

If only Python had the same machine code generation capabilities as Lisp based languages (in the reference implementation).


That was one of the strange highly-parallel machines of the 1980s-1990s. There were the NCube, the BBN Butterfly, the Transputer, and some other machines.

The NCube was another power-of-2 machine, and could have up to 1024 CPUs, 64 per meter-square card. Each CPU was conventional, about 1 MIPS, with, I think, 128KB RAM. Stanford had one with one card of 64 CPUs. Well, 63; one was broken. It was a donation from an oil company that found it wasn't useful for seismic data processing. Nobody found a good use for it at Stanford, and it was donated to some other school.

Each CPU had a 0..1023 address. Message sending XORed the addresses of the source and destination, and the 1 bits told which dimensions would take the message closer to its destination. In theory there was supposed to be a path for each bit difference, like the Connection Machine, but actually there were some shared bus-type paths, I think. This allowed doing the whole thing with printed circuit cards and a printed circuit backplane, rather than a huge number of discrete wires. So the whole thing fit in roughly a one-meter cube.
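A tiny sketch of that routing rule as I understand it (dimension-order hypercube routing; purely illustrative, not NCube's actual implementation):

    def route(src, dst):
        """Return the nodes visited, flipping one differing address bit per hop."""
        path = [src]
        node = src
        diff = src ^ dst          # set bits mark the dimensions that differ
        bit = 0
        while diff:
            if diff & 1:          # this dimension still differs: cross it
                node ^= (1 << bit)
                path.append(node)
            diff >>= 1
            bit += 1
        return path

    # 10-bit addresses for a 1024-node machine; two differing bits -> two hops
    print(route(0b0000000011, 0b1000000001))  # [3, 1, 513]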

With all these strange machines, the problem has to fit the machine, or you don't get the parallelism. Trying to fit existing problems into those forms was mostly a flop. Which is why general purpose shared memory multiprocessors with caches won out.

Massive parallelism seems to be problem-first. Graphics has a mostly-forward pipeline, so GPUs have a mostly-forward pipeline. Bitcoin miners have barely any intercommunication needs at all, so they're standalone things with minimal intercommunication. Back-propagation has a very standard form, and now we're seeing hardware that's sort of like a GPU but with lots of short multiply/add hardware.

What's striking is how much compute people have been able to get out of GPUs. They are sometimes the wrong tool for the job, but they're mass produced and affordable. The multi-million dollar one-off parallel machines, though, were mostly dead ends.


Yup, getting the problem to fit the machine was hard. I thought NCUBE was better than the CM1 for doing actual work.

Some people resorted to just running each node separately; after the initial conditions were set up there was no need for communication. That was one way around having to be clever and adapt your algorithm. They would run for days, sometimes because nobody else was on the machines :)

At one point I got my 2D code running on a CM2 (IIRC); one of the nodes was bad and the Fortran log function always returned 0. So when I made the movie of my simulation there was one tile/rectangle that was just blank. Afterwards they asked me if they could use my program as a diagnostic tool to verify all the nodes were working correctly. :))

I moved on from parallel machines and supercomputers when I graduated. Don't miss them that much.


It is interesting that you mention the log function being what tripped up the broken node. On the CM1 and CM2, log was implemented via Feynman's algorithm for logarithms, which finds factors via shifts and subtracts and then adds up precomputed logarithms of those factors from a lookup table shared by all the processors. If one of the nodes had a bad route to that lookup table (however that was actually done), it would result in just that node failing to return a result for a log instruction.
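For the curious, a small Python sketch of a shift-and-subtract logarithm in that style (the table size and bit count are my own choices, not the actual CM microcode):

    import math

    # precomputed table, shared by all processors on the real machine:
    # -log(1 - 2**-k) for each shift amount k
    N_BITS = 24
    LOG_TABLE = [-math.log(1.0 - 2.0**-k) for k in range(1, N_BITS + 1)]

    def feynman_style_log(x):
        """Approximate ln(x) for x in [1, 2) with shifts, subtracts, and table lookups."""
        r = x      # residual, driven toward 1 by multiplying by (1 - 2**-k)
        s = 0.0    # accumulated logarithm
        for k in range(1, N_BITS + 1):
            t = r - r * 2.0**-k        # r * (1 - 2**-k): a shift and a subtract
            if t >= 1.0:
                r = t
                s += LOG_TABLE[k - 1]  # table hit for this accepted factor
        return s

    print(feynman_style_log(1.5), math.log(1.5))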


Although there are already a couple of options, e.g. Futhark, GPGPUs are still too focused on low-level programming practices compared with those machines.

I think we need more expressive approaches, like what StarLisp was being researched for.

Some of the current CUDA work in C++ libraries, or having Python/Julia JITs that target PTX, are already nice steps in that direction as well.


There is a great video where Danny Hillis describes their thinking behind the design of the CM-2, and how it influenced the design of the then still in development CM-5.

What's crazy is when he talked about "bits in the machine" that are not stored in RAM, but are stored in wires "in flight" across the room-sized machine with one big synchronous clock.

https://www.youtube.com/watch?v=Ua-swPZTeX4


I've seen this before, long ago, and it was fun to revisit it.

He talks about how they are working on Teraflop-scale machines in their near future, and a Tflop-capable CM5 would take up 'about the size of this room'. In reality, the CM5/1024 only hit 131 gigaflops, and even that was enough to be the top of the TOP500 list at the time.

Wild that I have something capable of 35+ teraflops of FP32 sitting right here on my desk now.


Tamiko Thiel worked with Feynman and Hillis at Thinking Machines and is responsible, amongst many other things, for how cool the CM-1 and CM-2 looked.

Also responsible for the very cool CM t-shirts that are an essential for any geek's wardrobe.

https://www.tamikothiel.com/cm/cm-tshirt.html


I thought it (CM1) was a turd when I worked on it as a grad student. Looked cool though.

I would go to conferences where people would proudly present their algorithm getting 1MFLOP. Parallel programming was hard then and it is hard now. SIMD made it even harder.

CM2 was a little better since it was easier to get better performance, but we had some of the first machines and stability was not good. I think multiple sites were told they would have the first one, and TM shipped to them all in pieces simultaneously, so technically they all had the "first" machine. Smart guys.

Still CM-2 was better than the ETA-10 we had. Cray-2 was great.


It's fun reading this article about Richard Feynman and the connection machine:

https://longnow.org/essays/richard-feynman-connection-machin...


CM-2 was my favorite computer ever


Would you care to elaborate on why?

I had to work on one and didn't really see what the fuss was about. Performance still sucked compared to a Cray2. I mean it was supposed to be a Supercomputer...

I/O was painfully slow, so it may have been OK only if you weren't planning on looking at your results, or your results were just a few numbers. We were doing 2d and 3d simulations and periodically outputting large amounts of data to make movies from the simulations. It would run for a few minutes and then stop for several times that long to get the data out.


It couldn't keep pace with a Cray for any tasks that weren't optimized for its architecture. If your problem fit the SIMD model, like the genetic algorithms I was implementing at the time, it was just amazing.


Yeah, ours was a fluid dynamics problem that needed information from neighboring cells. So at some point there needed to be data moving between cells, and that really killed performance. There was also a global value that needed to be computed at each iteration, which controlled the simulation speed to ensure the step sizes were not too big.

Also the file I/O was horribly slow. We were scaling and dumping raw data to files so that we could do post-processing to make movies of the simulations.
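The shape of the computation was roughly this (a toy NumPy sketch of the nearest-neighbor exchange plus the one global reduction per step described above; the constants and "physics" here are invented, not the original code):

    import numpy as np

    def step(u, dt):
        # nearest-neighbor exchange: on the CM this was NEWS-grid communication
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        return u + dt * lap

    u = np.random.rand(256, 256)
    for _ in range(100):
        # the global value: one max-reduction over every cell, every iteration
        dt = 0.1 / max(float(np.abs(u).max()), 1e-12)
        u = step(u, dt)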


Plus, beyond the fascinating architecture, it was a beautiful machine. The Cray-1 and its successors may occupy more historical memory (see: Sneakers (1992)) but the CM-1, CM-2, and to a lesser degree (in my ever so humble opinion), the CM-5 blow Cray's most iconic machines out of the water on the sexiness axis.


Physical appearance or architecturally?


Sadly I never even saw the CM-2 I used. Just got in via telnet.


If you're in the American Southeast, the Computer Museum of America in Roswell, GA has one on display. I've visited and seen it, along with other supercomputers.


Labor of love, masterpiece, historic, iconic work



