A lot of this era of FPGAs was about how to configure the chip: nobody wanted to use Xilinx's one-time-programmable serial ROMs.
I used to use some one-shots to generate a CCLK pulse delayed from an RS-232 start bit. The idea is you would load one bit at a time into the FPGA, one bit for each RS-232 byte. Once the FPGA was configured, a UART implemented in FPGA logic would come alive and allow it to communicate with a PC.
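For flavor, here is a minimal host-side sketch of that idea in Python (not the poster's actual code). It assumes pyserial, a hypothetical port name, and a guess at the encoding: that sending 0x00 or 0xFF leaves the TX line low or high at the moment the delayed CCLK pulse fires.

    # Hypothetical sketch of the "one config bit per RS-232 byte" trick.
    # Assumes external one-shots derive a delayed CCLK pulse from each start bit,
    # and that the TX line level at that moment supplies DIN (0x00 -> low, 0xFF -> high).
    import serial  # pyserial

    def load_bitstream(port_name, bitstream_bytes, baud=9600):
        with serial.Serial(port_name, baud) as port:
            for byte in bitstream_bytes:
                for i in range(7, -1, -1):                   # bit ordering is a guess
                    bit = (byte >> i) & 1
                    port.write(b'\xff' if bit else b'\x00')  # one config bit per byte sent

    # load_bitstream('COM1', open('design.bit', 'rb').read())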
Anyway, I used XC2064s in the mid-90s for a project once (they were obsolete then, but I had a tube of them). I modified a point-of-sale terminal (a little credit-card machine) so that it could be automated. This was for a methadone clinic: they had to check each recipient's Social Security number every day to verify their Medicaid eligibility. So the FPGA would simulate keypresses and read the vacuum fluorescent display back to the PC. (My consulting fee was much cheaper than the Medicaid software interface.)
Later I used an XC4010E for a PCI-bus interface for a video capture card. Again, how do you avoid the serial PROM? In this case, I used timed-out PCI configuration accesses to load the serial data. Even if the PCI device does not respond, the address lines of the configuration access make it through to the target, so you just need a fast flip-flop to latch DIN and CCLK. Once the FPGA was loaded, the card would appear on the PCI bus (but long after enumeration). This worked until PC BIOSes started to turn off the PCI clock on slots with no cards in them to save power.
Both of these were in the pre-Verilog days. To design the FPGA you drew a schematic in (in my case) OrCAD. Xilinx had a meta-design capability in those days (called X-BLOX, I think?), where one wire in the schematic was equivalent to an entire bus. The actual width was a parameter on a primitive. Yes, I made a PCI interface in schematics; it was not fun.
The thing I find frustrating about FPGAs is how they are stuck in the "ASIC-replacement" mind-set.
In principle, these things could change their programming on the fly, as fast as everything happening in the regular logic, but everybody thinks they have to be programmed at power-up and left that way.
If the programming bit-stream could be fed, separately, to each section of the die, under control of logic on other parts of the die, a system of totally adaptable logic would become possible. Different programming could be installed according to the needs of the moment, replaced with other logic the moment needs change.
It would be a project to figure out how to usefully program such a system, but we are up to it. Our phones take on a completely different use each time we tap an app icon, but the performance of apps running on CPUs is strictly limited.
Active partial reconfiguration is a thing and works pretty much as you say: some parts of the chip continue running while others are swapped out. An example use case is video encoding: just swap in the encoder engine optimised for a given standard, rather than have every standard in the same design, which would require a much larger device.
Thank you. Which devices support active partial reconfiguration? I have not encountered any mention of it before. The "Three Ages of FPGAs" paper in IEEE has "reconfiguration" in the title of a single reference, and no mention in the text.
The DFX apparatus all looks distressingly proprietary, not to say clumsy. It seems like it will be a long time before this capability leaks out to mainstream use.
After it becomes more accessible, one could hope to compile programmatically generated logic on the fly, and immediately configure some chip area to execute it, perhaps with on-demand / JIT swapping between software and hardware implementation. O Brave New World!
Even for moderately complex designs the synthesis times are way too long for that to be feasible. Unfortunately, a lot of it is inherent to the problems/algorithms involved. So getting rid of the clumsiness of proprietary synthesis tools won't be sufficient, you are going to need another breakthrough.
It is always easy to invent ways for ambitions to be impractical. But, surprise, those are not the things people actually do with new capabilities. Instead they do clever, sensible things that work for them.
Recently @tubetime reverse engineered a Snappy Play video capture device. The mysterious PLAY “HD-1500” at the heart of the device is actually an XC2064.
The 1999 Pinnacle Studio MP10 is a newer, more sophisticated variation on the same design. Inside: an EPF6016QC208, 2.5 MB of 25 MHz EDO DRAM (5 x GLT44016-40j4), 32 KB of 66 MHz SRAM (GLT725608-15j3), 256 KB of 33 MHz VRAM (NEC uPD42280GU-30), a clock chip (Chrontel CH9081A-S: 14.318, 3.579, 13.5, 27, 40.5, 8-16, 2-40, and 4-80 MHz), an SAA7127H, an SAA7112H, and an audio DAC (TDA1311AT).
I don't know a whole lot about modern FPGA bitstreams, so I'd be interested if anyone has more details. Is the bitstream still essentially a pile of raw bits controlling things, or is there more structure inside?
I would also like to add: if you're curious about more modern FPGA bitstreams, there are these two projects which reverse-engineered bitstreams for open-source synthesis.
The bitstream is probably decrypted as it enters the chip.
After that, at best you might have some simple XORing at each CLB to make life harder for someone probing with needles.
Also, keeping each data line off the top or bottom metal layer makes non-destructive access harder; at that point you're going to need careful application of acid or focused-ion-beam ablation.
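To make the idea concrete, here is a toy Python sketch of the kind of per-frame XOR "whitening" described above. It is purely illustrative: the whitening value and frame structure are made up, not any vendor's real bitstream format.

    # Toy illustration only: after the stream is decrypted at the chip boundary,
    # a simple per-frame XOR could keep raw configuration bits from being
    # directly visible to someone probing internal lines.
    def descramble_frame(frame_bytes, frame_index):
        key_byte = (0x5A ^ frame_index) & 0xFF          # made-up whitening value
        return bytes(b ^ key_byte for b in frame_bytes)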
I was not saying anything about that, but if you look at that paper or video you can see the bitstream is passed through an AES decryption stage before programming the logic fabric. This break found a way to read the data after that AES stage.
However, I was just explaining how encryption generally works for these things, whether or not breaks exist for some chips.
Not an expert but, in modern FPGAs, the CLBs are more complex, there are dedicated resources like multipliers and block RAMs scattered around, and routing resources are much more complex because of the vastly increased size.
Unfortunately, the Spartan-6 is way beyond my capabilities. Among other things, it uses 45 nm technology, while the wavelength of visible light only goes down to about 400 nm, so my microscope won't show the features.
Not the author, but I recently started down the FPGA rabbit hole myself.
I started with HDLBits [0], which has a bunch of problems that you solve by writing verilog that is then run in a simulator. It starts with simple gates and such, and then builds up from there.
I also purchased a dev board from NANDLand [1] and have been going through the tutorials written for their board. A lot of code is provided for you in the tutorials, but I've been reimplementing it all from scratch as part of the learning process. The later tutorials cover things like UART send/receive and simple VGA.
I'm not affiliated with either site, but have found both to be helpful. Good luck!
Not the author, but my advice would be to grab a cheap devboard and try doing something, anything.
My approach was to implement very simple interfaces without using libraries.
Simulators are nice and all, but you won't feel the magic with these.
There are many cheap options[0] these days, but I'd suggest something based on the iCE40[1], as they are cheap and have the most mature support in the open stack thanks to icestorm[2].
Excellent, dirt-cheap options include icestick, tinyfpga bx and blackice mx. These are all <$70.
Honestly, I haven't done a whole lot with FPGAs so others can probably give you a better answer. One thing I did which was fun was generate VGA video signals with an FPGA. This was much easier than I expected and it's rewarding to get immediate visual feedback.
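If anyone wants a feel for why VGA generation is so approachable, here is the standard 640x480@60 Hz timing (roughly a 25.175 MHz pixel clock) as a small Python sketch. On an FPGA this is just two counters and a few comparators; the function below shows the same decisions in software for clarity.

    # Standard 640x480@60Hz VGA timing; on hardware, x and y are free-running counters.
    H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48   # pixels
    V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33    # lines
    H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK        # 800 pixels per line
    V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK        # 525 lines per frame

    def vga_signals(x, y):
        """Return (hsync, vsync, visible) for pixel counter x and line counter y."""
        hsync = not (H_VISIBLE + H_FRONT <= x < H_VISIBLE + H_FRONT + H_SYNC)  # active low
        vsync = not (V_VISIBLE + V_FRONT <= y < V_VISIBLE + V_FRONT + V_SYNC)  # active low
        visible = x < H_VISIBLE and y < V_VISIBLE
        return hsync, vsync, visible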
I am pretty sure that I paid something like $300 for the XACT software in 1986. It would be really educational to have students play around with it today. You could edit the bits in the LUTs directly and see the equations change in the bottom pane, or you could edit the equations and see the LUT bits change.
One fun thing was the mode where as you routed a signal it would show the delay so far, so you could backtrack and try a different path to see if it would be faster.
The 2064 had 8x8 = 64 configurable blocks, but many people would assume it was equivalent to a 64-NAND-gate gate array. So for the next chip they decided that it would be better to use "equivalent gates" in the name, so the 2018 would be able to replace an 1800-gate gate array.
Since the bitstream maps directly to the connections, and wrong connections can create short circuits, I wonder how often a corrupted bitstream actually destroys a chip?
Not an expert but from what I recall reading, it used to be possible on the early ones; people would put a finger on the FPGA when testing a new configuration and immediately cut power to the board if it started heating up abnormally. In modern FPGAs, there's thermal protection built in.
Only the most modern FPGAs have thermal protection, but before that they added a CRC check to the bitstream. This leaves open the question of malicious bitstreams...
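For anyone curious what that check amounts to, here is a conceptual Python sketch: append a checksum to the configuration data and refuse to load if it doesn't match. The real devices use their own CRC scheme and bitstream framing; zlib.crc32 here is just a stand-in for the idea.

    # Conceptual sketch only -- real bitstream formats and CRC polynomials differ.
    import zlib

    def make_stream(config_bytes):
        return config_bytes + zlib.crc32(config_bytes).to_bytes(4, 'big')

    def load_stream(stream):
        body, crc = stream[:-4], int.from_bytes(stream[-4:], 'big')
        if zlib.crc32(body) != crc:
            raise ValueError("CRC mismatch: refuse to configure the fabric")
        return body  # only now shift the data into the device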
Interesting question. The currents involved are very small, so it might not be enough to destroy the chip.
One unexpected thing I noticed in the chip is a control signal to pull lines low during startup. CMOS doesn't like floating signals (since you can end up with the pull-up and pull-down transistors both on). But before the configuration is loaded, the routing lines may not be connected to anything. So the FPGA has to pull these lines low until everything is configured, and then release them.
Older parts could be damaged somewhat easily by putting multiple drivers onto an internal tristate bus. If one driver is high and another is low, a current flows and one of the drivers is damaged.
Newer parts (Spartan-3 and later, maybe even earlier) dropped that feature -- it's no longer possible to create internal bus contention. You can still cause issues by driving an external pin that conflicts with an external driver, but that's pretty standard; even microcontrollers have that issue.
Thanks for another cool article! This one has a lot of meat so it will take some time to go through it.
One question, how useful are the patents linked to in the footnotes for your work? (I hesitate to even ask this in fear of stirring yet another patent flame war.)
P.S. Do you have a patreon, or equivalent, page? If not, you really need one. I would happily subscribe.
I would rate those patents as moderately helpful. They go into a lot of detail, but not exactly the details I wanted :-)
For the broader patent question, there is a huge range in patent usefulness. Some patents explain everything clearly and at the end I'm like "Oh, now I understand the problem in detail and how they solved it." Most patents, however, are lawyer-edited mush where I learn nothing from them and it's not clear what they are even describing.
Some Texas Instruments patents are super-detailed, to the point that I could build a calculator simulator from the schematics and source code. The Intel 8086 patents were also extremely helpful (although they covered only half of what I wanted to know).
My opinion on patents is that the examination process should be much different and force patents to clearly describe the specific problem and solution, giving useful background information, more like a conference paper. If a patent doesn't contribute anything useful to the reader, it should be rejected.
As for a Patreon, I don't have one. But CuriousMarc (who I worked on the AGC and other projects with) has one at https://www.patreon.com/curiousmarc