I'm not surprised that going back to 3Gbps works; modern high frequency signaling is so close to the boundaries of what works that even things like the length of the SATA cable and how it's oriented, and what temperature components are at, can mean the difference between a working link and one that fails when the right sequence of bits gets sent. DTLE is a way of tuning the transmitter output so that when the signal gets received at the other end, it's as "clear" as possible.
Even early SDRAM controllers had similar tunable settings for clock delays and such, because different DIMMs may vary slightly --- upon POST, the BIOS would set all the settings to nominal values, then nudge each one in one direction while reading/writing pathological data until errors occurred; then nudge them in the other direction until errors occurred, and finally settle on the average of the two extremes. I suspect a similar process needs to be done here, and if you reverse-engineered the BIOS further you would find the algorithm to do it.
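(For illustration only, that sweep looks roughly like this in C, with hypothetical set_timing()/run_pattern_test() helpers standing in for the real memory-controller interface and pattern test:)

```c
/* Hedged sketch: find the working window for one tunable timing parameter
 * by nudging down and up from a nominal value until the pattern test fails,
 * then settling on the midpoint. set_timing() and run_pattern_test() are
 * hypothetical placeholders for the real controller interface. */
#include <stdbool.h>

extern void set_timing(int param, int value);   /* hypothetical */
extern bool run_pattern_test(void);             /* hypothetical: walking 1s/0s etc. */

int calibrate_timing(int param, int nominal, int min, int max)
{
    int lo = nominal, hi = nominal;

    /* Nudge downward until the pathological pattern starts failing. */
    while (lo > min) {
        set_timing(param, lo - 1);
        if (!run_pattern_test())
            break;
        lo--;
    }

    /* Nudge upward until it fails in the other direction. */
    while (hi < max) {
        set_timing(param, hi + 1);
        if (!run_pattern_test())
            break;
        hi++;
    }

    /* Settle on the middle of the passing window. */
    int best = (lo + hi) / 2;
    set_timing(param, best);
    return best;
}
```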
Big kudos to the Purism folks for trying to figure these things out. And big fuckings to Intel for keeping this stuff NDA'd --- it only makes it harder for those trying to buy and use your products (I hope someone leaks it all eventually...)
> Even early SDRAM controllers had similar tunable settings for clock delays and such,
Define "early". F00F-era Pentium, 486, 8086...?
> upon POST, the BIOS would set all the settings to nominal values, then nudge each one in one direction while reading/writing pathological data until errors occurred; then nudge them in the other direction until errors occurred, and finally settle on the average of the two extremes.
Is this the seemingly-pointless "memory test" all computers do?
I always thought that just zeroed RAM. TIL it's doing similar things to modem line training. Wow.
Now I remember - I have an old 400MHz Celeron-based system I used years ago with an AMI BIOS that would occasionally recommend a different "RAS-to-CAS delay" on startup. I'd go and change it and reboot, and then a little while (days/weeks) later it would recommend a slightly different setting. It would always alternate between the two delays. I was pretty sure the memory in the system was on its way out, and I'd be sad whenever I saw the message, haha.
> Is this the seemingly-pointless "memory test" all computers do?
I'm not sure what your parent is referring to, but it is called "DDR memory training and calibration". I used to struggle with this when I was bringing up LPDDR2 on an i.MX6 embedded board.
Wow. Really, really impressive. How far back did BIOSes do this - has it always been done?(!)
I can completely understand it being a struggle now. At some point I was thinking of getting into tinkering with DDR3/4 on FPGAs (to play with some video capture ideas), but I'm beginning to not look forward to it... heheh
Also remember that the training also depends on the environmental temperature.
> I can completely understand it being a struggle now. At some point I was thinking of getting into tinkering with DDR3/4 on FPGAs
Understanding the different register values the tool recommends, and validating them, is a real pain, although some vendors' datasheets (Micron's, for example) explain things nicely.
Xilinx MIG has a really dense guide and you really need to understand your timing etc. It is really not for the faint-hearted.
In the case of the i.MX6 SoC, I do the training and calibration and the recommended register values are provided to U-Boot. When U-Boot starts up, it runs from the SoC's built-in SRAM, which is very small (a few kilobytes). Once the first-stage U-Boot is running, it sets up the memory controller registers and then boots using this RAM.
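(For illustration, a minimal sketch of that SPL step in C, assuming a generic memory controller --- the base address, register offsets, and calibration values below are hypothetical placeholders, not the real i.MX6 MMDC map:)

```c
/* Hedged sketch of the SPL step described above: run from on-chip SRAM,
 * program the DDR controller with the values produced by the training tool,
 * then continue controller init and load the next stage into DRAM.
 * Base address, offsets, and values here are hypothetical placeholders. */
#include <stdint.h>

#define DDRC_BASE 0x40000000u          /* assumed controller base address */

struct ddr_calibration {
    uint32_t write_leveling;
    uint32_t dqs_gating;
    uint32_t read_delay;
    uint32_t write_delay;
};

/* Values copied from the vendor's training/calibration tool output. */
static const struct ddr_calibration cal = {
    .write_leveling = 0x001F001F,
    .dqs_gating     = 0x42350235,
    .read_delay     = 0x40404040,
    .write_delay    = 0x40404040,
};

static inline void write32(uint32_t addr, uint32_t val)
{
    *(volatile uint32_t *)(uintptr_t)addr = val;
}

void spl_dram_init(void)
{
    /* Program the calibration registers before any DRAM access. */
    write32(DDRC_BASE + 0x10, cal.write_leveling);
    write32(DDRC_BASE + 0x14, cal.dqs_gating);
    write32(DDRC_BASE + 0x18, cal.read_delay);
    write32(DDRC_BASE + 0x1C, cal.write_delay);
    /* ...remaining controller init, then load full U-Boot into DRAM. */
}
```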
When the DDR/LPDDR training happens, it is supposed to be left running overnight for multiple iterations. Temperature also matters for getting proper values, which is one of the reasons the multiple-iteration testing is done overnight.
In my case I had some really incompetent board designers who just deflected blame. In the end, since there may have been layout issues, I was forced to lower the clock speeds (halve the clocks) for the DDR interface.
The original Xbox from Microsoft was basically a slightly modified x86 computer (complete with actually-USB-ports-with-a-proprietary-plug controller ports)[0], where the 64 MB of DDR SDRAM was soldered directly onto the motherboard.
Now, it's widely known within the Xbox "scene" that the quality of the RAM varies a lot between different machines. The speculation was that Microsoft bought the cheapest bottom-of-the-barrel RAM it could find in bulk, which meant they couldn't hardcode the memory timings.
Instead, on boot, it clocks the RAM at the highest speed and does a quick write-and-read test. If it fails, it clocks it down a step and repeats the test until it finds a stable frequency.
This explains why sometimes the same game would run perfectly on one Xbox while stuttering on the other one, even when swapping DVD drives.
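A minimal sketch of that loop in C, assuming hypothetical clock-setting and test helpers (the real Xbox boot code isn't public, and the step table below is made up):

```c
/* Hedged sketch of the downclock-and-retest loop described above. The clock
 * and test helpers are hypothetical, and the step table is made up; the real
 * Xbox boot code is not public. */
#include <stdbool.h>
#include <stddef.h>

extern void set_ram_clock_mhz(int mhz);   /* hypothetical */
extern bool quick_ram_test(void);         /* hypothetical write-and-read check */

static const int clock_steps_mhz[] = { 200, 183, 166, 150 };   /* made-up steps */

int pick_stable_ram_clock(void)
{
    for (size_t i = 0; i < sizeof(clock_steps_mhz) / sizeof(clock_steps_mhz[0]); i++) {
        set_ram_clock_mhz(clock_steps_mhz[i]);
        if (quick_ram_test())
            return clock_steps_mhz[i];    /* first speed that passes wins */
    }
    return -1;    /* nothing stable: report a hardware fault */
}
```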
> Is this the seemingly-pointless "memory test" all computers do?
I think that the classic case was that the computer wrote and read to/from every memory address to check that the memory reported present was actually there. I'm not positive, but I think that most machines these days do a cut-down version of that to save time.
I also think that auto-adjusting the timing is probably a separate process.
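For illustration, a minimal sketch of that classic exhaustive check in C --- real firmware also alternates more patterns and, as noted, often tests only a subset to save time:

```c
/* Hedged sketch of the classic exhaustive check: write a pattern to every
 * word, read it back, and flag anything that doesn't match. Real firmware
 * uses more patterns (walking bits, etc.) and often covers only part of
 * the address space to save time. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool memory_test(volatile uint32_t *base, size_t words)
{
    const uint32_t patterns[] = { 0x55555555u, 0xAAAAAAAAu };

    for (size_t p = 0; p < 2; p++) {
        for (size_t i = 0; i < words; i++)
            base[i] = patterns[p] ^ (uint32_t)i;            /* mix in the address */
        for (size_t i = 0; i < words; i++)
            if (base[i] != (patterns[p] ^ (uint32_t)i))
                return false;                               /* bad or missing memory */
    }
    return true;
}
```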
> I have an old 400MHz Celeron-based system I used years ago with an AMI BIOS that would occasionally recommend a different "RAS-to-CAS delay" on startup.
As far as I know, that's roughly the era when this was introduced --- late Pentium/early Pentium II, mid to late-90s.
The "memory test" happens after the tuning (which in my experience only touches limited portions of the whole address space) but if you watch carefully and have multiple, slightly differing (or even failing) memory modules, you may see it repeat once or twice as it encounters an error and retries the tuning.
Regarding the testing only touching limited portions of the address space, that reminds me of my early 90s vintage 486 (which I still have! :D). It would make the PC speaker tick as it counted out RAM. The first few ticks would be a little slow, then the rest would run quicker. It took me a few years to realize that it was slowly counting out the first 640K, then racing to the rest of the installed 8MB.
I remember very well (this was the first computer I owned that I've been able to keep around... somewhere, lol) that there were exactly four "slow" ticks and then the rest were faster. I definitely have to find all the bits for that machine sometime - I just tried to reproduce the ticking sound with `beep` but failed, and Googling to figure out what frequency and duration the ticks might have had was perhaps predictably useless.
On one of our boards, we had intermittent USB failures (sometimes after 5 minutes, sometimes after 13 days of continuous operation) that were ultimately caused by slightly misaligned/misspaced USB D+/D- traces. We ended up having to force USB high speed devices into full speed mode via a debug register (not unlike the article's case). Makes me feel slightly better that even people much more competent than I end up having to use such workarounds.
If I were to guess, given the timing tolerances for USB 2.0, one line was slightly longer than the other on the board. That would mean the transition on D- happened at a different time than the one on D+, causing corrupted data.
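Back-of-the-envelope, assuming a typical ~7 ps/mm propagation delay on FR-4 and a hypothetical few-millimetre mismatch (both assumptions for illustration, not measurements of that board), compared against the ~2.08 ns bit time of 480 Mb/s high speed signaling:

```c
/* Back-of-the-envelope intra-pair skew estimate for the scenario above.
 * The propagation delay and the length mismatch are assumptions for
 * illustration, not measurements of that board. */
#include <stdio.h>

int main(void)
{
    double ps_per_mm   = 7.0;           /* assumed FR-4 propagation delay */
    double mismatch_mm = 5.0;           /* hypothetical D+/D- length mismatch */
    double skew_ps     = ps_per_mm * mismatch_mm;
    double hs_bit_ps   = 1e6 / 480.0;   /* USB high speed: 480 Mb/s ~ 2083 ps/bit */

    printf("skew: %.0f ps (%.1f%% of a high-speed bit time)\n",
           skew_ps, 100.0 * skew_ps / hs_bit_ps);
    return 0;
}
```

The skew itself is small next to a whole bit time; the trouble is that the intra-pair skew budget a high speed receiver tolerates is much tighter than a full bit, which is why a few millimetres of mismatch can matter.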
Every time I've had to deal with troubleshooting weird firmware or hardware (and good luck sometimes knowing which it is), I've been continually amazed at the giant mess that's being pushed as commercial or "enterprise" equipment. It appears every product enters a special phase of a few weeks to a few months right before shipping where time constraints force whoever is responsible for the code to wildly shovel crap fixes into place until the hardware "works".
An alternate explanation is that most of this code is written by electrical engineers who, while smart, don't quite have enough experience managing larger code projects yet and are still prone to some newbie software-developer mistakes --- and probably without good mentoring on that front, because those who have already weathered this storm and learned enough to sail through smoothly have either jumped ship for greener pastures or been promoted enough to be out of the trenches, but not enough to have the sway to fix the problem.
It is definitely not lack of experience. Some of the best coders I've known have been hardcore low level EE guys. The issue is organizational priorities and available bandwidth. As soon as the product ships the core people are moved to the next project. Maintenance is done by a few guys whose job it is not to break anything.
If there is something they can't fix, it goes back to the core guys who are already busy working on the new project and can only dedicate so much time to addressing this issue.
About the open phone: knowing that KaKaRoTo is involved, I'm starting to believe this will not be vaporware. If I had enough money I would surely put some into this project. Every time somebody talked about an open phone I just thought "another failure"; now I know this one has a real possibility of happening.
As you can see from his writing, he is ready to solve this puzzle. Anyway, good job Youness, keep going strong.
I was skeptical about those Purism products that shipped with a proprietary BIOS and the claim that they are working on coreboot and will disable the Intel ME in the future.
At least they are trying to keep the coreboot part of the promise (it's coreboot, not libreboot or librecore, so it must contain binary firmware blobs... still better than nothing).
You don't need to use Intel's FSP if Intel will tell you how to do what the FSP does. FSP doesn't do anything magical; it just happens to set up a whole lot of early boot things for you, like SDRAM configuration, microcode loading (FSP knows which instructions it's not allowed to use before microcode gets loaded), and starting the ME.
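Roughly, the early-boot sequence FSP wraps looks like this (a hedged outline written as a coreboot-style romstage sketch; the function names are hypothetical wrappers, not the actual FSP entry-point signatures):

```c
/* Hedged outline of the early-boot work FSP bundles up. The function names
 * are hypothetical wrappers, not the actual FSP entry points. */

extern void setup_cache_as_ram(void);      /* temporary "RAM" before DRAM exists */
extern void load_microcode_update(void);   /* needed before certain instructions are safe */
extern void init_memory_controller(void);  /* SDRAM training/configuration */
extern void start_management_engine(void); /* bring up the ME */

void early_boot(void)
{
    setup_cache_as_ram();
    load_microcode_update();
    init_memory_controller();
    start_management_engine();
    /* ...then tear down cache-as-RAM and jump to the next stage in DRAM. */
}
```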
For Intel Bay Trail parts (which Intel publicly says you have to use FSP for), I was once told by a BIOS/firmware vendor that they didn't use Intel's FSP because it was too slow and their customers wanted very fast boot times. No idea if they were pulling my leg or not, but Intel's FSP on Bay Trail wasn't exactly super speedy.
It would be nice having a centralized list of the hardware that uses "liberated" CPUs, just in case one pops in front of us at discounted prices, flea markets etc.
Second this! Anyone tried the trackpad? I have hopes it's not as completely terrible as most PCs'. Between that and this (cool) work on open boot processes, it might become my next laptop once my MacBook Pro ages some more.
That was a very disappointing article. The issue was not solved, and the workaround has no reason to work other than it does, meaning that it could also stop working anytime.
i am not sure how anyone with a clear conscience could ship a device which works by accident
> i am not sure how anyone with a clear conscience could ship a device which works by accident
LOL, welcome to enterprise-level "Ship now, the service contract gives us at least a few weeks to work out the kinks" hardware, where understanding how the whole system actually works is a luxury most of the engineers don't even have.
Kudos to them for giving it a good shot and continuing to look. A lot of times, unless you have money or clout to throw at some of the component originators, you're out of luck if you want real answers. The few engineers who have them are probably so busy that devoting them to figuring out what's really going on isn't worthwhile for the companies without some higher-level intervention.
More interesting information on DTLE "discrete time linear equalization" here: http://cc.ee.ntu.edu.tw/~rbwu/rapid_content/course/highspeed...