Beginner PyTorch user here... it looks like it is using only one CPU on my machine. Is it feasible to use more than one? If so, what options/env vars/code changes are necessary?
Perhaps try setting `OMP_NUM_THREADS`, for example `OMP_NUM_THREADS=4 torchrun ...`.
But on my machine, it automatically used all 12 available physical cores. Setting OMP_NUM_THREADS=2, for example, lets me decrease the number of cores being used, but increasing it to try to use all 24 logical threads has no effect. YMMV.
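For what it's worth, PyTorch's intra-op CPU parallelism typically goes through OpenMP (depending on how it was built), so `OMP_NUM_THREADS` is just the standard OpenMP knob. A minimal standalone C/OpenMP sketch (not PyTorch code, purely an illustration) of how the runtime picks that variable up:

```c
/* Minimal OpenMP sketch (not PyTorch code): the default parallel team size
 * comes from OMP_NUM_THREADS.
 * Build: gcc -fopenmp omp_demo.c -o omp_demo
 * Run:   OMP_NUM_THREADS=4 ./omp_demo
 */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    printf("max threads reported by the runtime: %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        printf("threads in this parallel region: %d\n", omp_get_num_threads());
    }
    return 0;
}
```

Within PyTorch itself, `torch.get_num_threads()` / `torch.set_num_threads()` expose the same intra-op thread count from Python.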
Yeah, I have (maybe slightly fond?) memories of using Forth to develop a tape drive reader and writer for an undergraduate lab project. It is wonderful for some things, although in this case, where the problem was literally which addresses the instructions got assigned to, it is unclear if it would have made anything better.
Here "allocation" is all fixed size things pulled from a fixed size buffer at startup. Technically malloc is compiled in the firmware, but it isn't used for anything but some C++ runtime initialization confirmed with debugger breakpoints. The only dynamic use of memory is the call stack, which has only fixed size local variables and limited depth recursion.
Similarly, "interrupts" may not mean what you are thinking. The highest priority interrupt is one attached to the PWM timer that operates the primary control loop that operates in interrupt context. As of a few months ago this is slightly more complicated to accommodate some "soft" quadrature decoding, but the principle is still the same that all motor control is performed in an interrupt context and nearly nothing else is.
Everything else, like CAN communication, is handled by polling in the "main" loop.
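In sketch form (all names made up, not the real code):

```c
/* Hypothetical sketch of that split between interrupt context and the main loop. */

/* Stubs standing in for the real firmware. */
static void read_current_sensors(void)     { /* sample the shunt ADCs */ }
static void run_current_control_loop(void) { /* FOC / PI update */ }
static void update_pwm_duty_cycles(void)   { /* write timer compare registers */ }
static void hardware_init(void)            { /* clocks, timers, ADCs, CAN */ }
static void poll_can_bus(void)             { /* service CAN frames */ }
static void poll_housekeeping(void)        { /* temperatures, fault flags, telemetry */ }

/* Interrupt context: highest-priority ISR, fired by the PWM timer.  All motor
 * control happens here; it must never block. */
void PWM_Timer_IRQHandler(void)
{
    read_current_sensors();
    run_current_control_loop();
    update_pwm_duty_cycles();
}

/* Everything that is not hard real-time is polled from the main loop. */
int main(void)
{
    hardware_init();
    for (;;) {
        poll_can_bus();
        poll_housekeeping();
    }
}
```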
That is possible, although here the consecutive writes were to different ADC peripherals. The ADC peripherals do share some common configuration and triggering, but I believe they are otherwise largely independent.
But they're going over the same peripheral bus from the CPU, and being synchronised to a subsystem that likely shares the same ADC clock domain - I'd design that hardware once. (Metastability is notorious for being hard to get right, especially when you are trying to transfer multiple related bits across a clock boundary at the same time and you want them all to arrive together.)
Yep, there is probably one clock domain for all the ADCs, although there are two different prescalers (one for ADC1/2 and another for 3/4/5).
I could see one of the writes getting lost. In this case though, the ADC enable is what seems to be timing-sensitive; however, the ADCs always end up enabled properly. It is just that a write from significantly earlier (the one that sets the prescaler) seems to be lost, despite the register reading back as having been written correctly.
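For concreteness, the write in question is roughly the following (a sketch using CMSIS-style names from the G4 device header; assumes a part that defines `ADC345_COMMON`, i.e. one with ADC3/4/5). The readback check shown is exactly the kind that looked fine here, yet the later behavior suggested the setting hadn't stuck:

```c
#include <stdbool.h>
#include "stm32g4xx.h"  /* device header; assumes the specific part (e.g. one with ADC3/4/5) is defined */

/* Set the common prescaler for the ADC3/4/5 group and verify it by reading
 * it back.  presc_bits is a value already shifted to ADC_CCR_PRESC_Pos.
 * PRESC may only be changed while all ADCs in the group are disabled. */
static bool set_adc345_prescaler(uint32_t presc_bits)
{
    MODIFY_REG(ADC345_COMMON->CCR, ADC_CCR_PRESC, presc_bits);

    /* This readback was correct in the failing case, which is what made it
     * so puzzling. */
    return READ_BIT(ADC345_COMMON->CCR, ADC_CCR_PRESC) == presc_bits;
}
```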
I would expect that if the synchronization failed, the read-back would return the wrong value?
In this case, the firmware did wait for the ADRDY flag. It just enabled all 5 ADCs simultaneously and then waited for all 5 flags to be set before moving on. The easy fix was to just do those serially instead.
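i.e. roughly this shape, with CMSIS-style register names for the G4 (a sketch, not the actual firmware; assumes the voltage regulator startup and calibration are already done):

```c
#include <stddef.h>
#include "stm32g4xx.h"  /* device header; assumes a part with ADC1..ADC5 */

/* Enable one ADC and block until its ADRDY flag is set.  Assumes the ADC
 * voltage regulator is already up and calibration has completed. */
static void adc_enable_and_wait(ADC_TypeDef *adc)
{
    adc->ISR = ADC_ISR_ADRDY;               /* clear any stale ready flag (write-1-to-clear) */
    adc->CR |= ADC_CR_ADEN;                 /* request enable */
    while ((adc->ISR & ADC_ISR_ADRDY) == 0) {
        /* spin until this ADC reports ready */
    }
}

/* The workaround: bring the ADCs up one at a time instead of setting all
 * five ADEN bits and then waiting on all five ADRDY flags together. */
static void adc_enable_all_serially(void)
{
    ADC_TypeDef *const adcs[] = { ADC1, ADC2, ADC3, ADC4, ADC5 };
    for (size_t i = 0; i < sizeof(adcs) / sizeof(adcs[0]); ++i) {
        adc_enable_and_wait(adcs[i]);
    }
}
```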
Actually, if you look at the firmware at the time, the proper procedure was followed as far as I can tell. All of the necessary bits were set and checked with the appropriate delays where required.
For what it is worth, the G0/4 family is relatively new. I'm pretty sure it has unique ADC IP too, since the published errata (which I'm very familiar with) are different from any other ST chip I know of.
The clock should of course have been suspect (as noted in the writeup). The "bad state" in this problem was basically indistinguishable from running the ADC at too high a clock rate. In fact, the default rate when I first encountered this problem does ever so slightly overclock the ADC: it is rated for 60 MHz in single-ADC operation, but only 26 MHz when multiple ADCs are in use. The firmware used to run the ADCs at ~28 MHz, purposefully going a tiny bit above that limit.
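To put rough numbers on that (these are assumptions for illustration, not figures from the writeup):

```c
/* Hypothetical arithmetic only (the real clock tree may differ): assume a
 * 170 MHz system clock, the G4 maximum, feeding the ADCs through the common
 * prescaler, whose divider steps are coarse (1, 2, 4, 6, 8, ...):
 *
 *   170 MHz / 4 = 42.5  MHz  -> far above the 26 MHz multi-ADC rating
 *   170 MHz / 6 ~ 28.3  MHz  -> the "tiny bit above" rate described here
 *   170 MHz / 8 = 21.25 MHz  -> first divider that is in spec for all ADCs
 */
```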
I didn't include it in the writeup since it was somewhat of a diversion, but this particular problem occurred even with the ADCs configured to be clocked slower. As mentioned, I think that their clock configuration became mis-set as a result of the underlying problem.
And while poor decoupling is also a likely suspect, I'm 95% sure it is about as good as it can get. A high-quality cap of appropriate size sits immediately next to the chip on every supply pin, with vias directly to the ground plane. This is a low-pin-count QFN part, so the only ground on the chip is the center pad, which is also via'd directly to the ground plane.
I wonder if it would be possible to create a test jig that turns on the ADCs all at once and then samples data through them (perhaps just from a function generator)?
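Something like this loop, maybe (a sketch; every name here is hypothetical, the target-facing calls are just stand-ins):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_TRIALS   1000
#define NUM_SAMPLES  256

/* Stand-ins for whatever the jig would actually talk to (hypothetical). */
bool target_reset_and_enable_all_adcs_at_once(void);
bool target_read_samples(uint16_t *buf, int count);
bool samples_within_expected_window(const uint16_t *buf, int count);

/* Hypothetical jig loop: repeatedly reset the target, take the suspect
 * "enable everything in parallel" path, then sample a known waveform from a
 * function generator and count runs whose readings look wrong. */
int main(void)
{
    uint16_t samples[NUM_SAMPLES];
    int failures = 0;

    for (int trial = 0; trial < NUM_TRIALS; ++trial) {
        if (!target_reset_and_enable_all_adcs_at_once() ||
            !target_read_samples(samples, NUM_SAMPLES) ||
            !samples_within_expected_window(samples, NUM_SAMPLES)) {
            ++failures;
        }
    }
    printf("%d/%d trials produced bad ADC data\n", failures, NUM_TRIALS);
    return failures != 0;
}
```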
Surprisingly, it is almost certainly genuine. These particular chips likely came from a batch delivered in April 2021 from Mouser, which isn't known for shoddy sourcing practices.