Beginner PyTorch user here... it looks like it is using only one CPU on my machine. Is it feasible to use more than one? If so, what options/env vars/code changes are necessary?
Perhaps try setting `OMP_NUM_THREADS`, for example `OMP_NUM_THREADS=4 torchrun ...`.
But on my machine, it automatically used all 12 available physical cores. Setting OMP_NUM_THREADS=2, for example, lets me decrease the number of cores being used, but increasing it to try to use all 24 logical threads has no effect. YMMV.
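For what it's worth, PyTorch's intra-op CPU parallelism typically goes through OpenMP (depending on how it was built), so `OMP_NUM_THREADS` is just the standard OpenMP knob. A minimal standalone C/OpenMP sketch (not PyTorch code, purely an illustration) of how the runtime picks that variable up:

```c
/* Minimal OpenMP sketch (not PyTorch code): the default parallel team size
 * comes from OMP_NUM_THREADS.
 * Build: gcc -fopenmp omp_demo.c -o omp_demo
 * Run:   OMP_NUM_THREADS=4 ./omp_demo
 */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    printf("max threads reported by the runtime: %d\n", omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        printf("threads in this parallel region: %d\n", omp_get_num_threads());
    }
    return 0;
}
```

Within PyTorch itself, `torch.get_num_threads()` / `torch.set_num_threads()` expose the same intra-op thread count from Python.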
Yeah, I have (maybe slightly fond?) memories of using Forth to develop a tape drive reader and writer for an undergraduate lab project. It is wonderful for some things, although in this case, where the problem was literally which addresses the instructions got assigned to, it is unclear if it would have made anything better.
Here "allocation" is all fixed size things pulled from a fixed size buffer at startup. Technically malloc is compiled in the firmware, but it isn't used for anything but some C++ runtime initialization confirmed with debugger breakpoints. The only dynamic use of memory is the call stack, which has only fixed size local variables and limited depth recursion.
Similarly, "interrupts" may not mean what you are thinking. The highest priority interrupt is one attached to the PWM timer that operates the primary control loop that operates in interrupt context. As of a few months ago this is slightly more complicated to accommodate some "soft" quadrature decoding, but the principle is still the same that all motor control is performed in an interrupt context and nearly nothing else is.
Everything else, like CAN communication, is handled by polling in the "main" loop.
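In sketch form (all names made up, not the real code):

```c
/* Hypothetical sketch of that split between interrupt context and the main loop. */

/* Stubs standing in for the real firmware. */
static void read_current_sensors(void)     { /* sample the shunt ADCs */ }
static void run_current_control_loop(void) { /* FOC / PI update */ }
static void update_pwm_duty_cycles(void)   { /* write timer compare registers */ }
static void hardware_init(void)            { /* clocks, timers, ADCs, CAN */ }
static void poll_can_bus(void)             { /* service CAN frames */ }
static void poll_housekeeping(void)        { /* temperatures, fault flags, telemetry */ }

/* Interrupt context: highest-priority ISR, fired by the PWM timer.  All motor
 * control happens here; it must never block. */
void PWM_Timer_IRQHandler(void)
{
    read_current_sensors();
    run_current_control_loop();
    update_pwm_duty_cycles();
}

/* Everything that is not hard real-time is polled from the main loop. */
int main(void)
{
    hardware_init();
    for (;;) {
        poll_can_bus();
        poll_housekeeping();
    }
}
```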
That is possible, although here the consecutive writes were to different ADC peripherals. The ADC peripherals do share some common configuration and triggering, but I believe they are otherwise largely independent.
But they're going over the same peripheral bus from the CPU, and being synchronised to a subsystem that likely shares the same ADC clock domain - I'd design that hardware once. (Metastability is notorious for being hard to get right, especially when you are trying to transfer multiple related bits across a clock boundary at the same time and you want them all to arrive together.)
Yep, there is probably one clock domain for all the ADCs, although there are two different prescalers (one for ADC1/2 and another for 3/4/5).
I could see one of the writes getting lost. In this case though, the ADC enable is what seems to be timing-sensitive; however, the ADCs always end up enabled properly. It is just that a write from significantly earlier (the one that sets the prescaler) seems to be lost, despite the register reading back as having been written correctly.
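For concreteness, the write in question is roughly the following (a sketch using CMSIS-style names from the G4 device header; assumes a part that defines `ADC345_COMMON`, i.e. one with ADC3/4/5). The readback check shown is exactly the kind that looked fine here, yet the later behavior suggested the setting hadn't stuck:

```c
#include <stdbool.h>
#include "stm32g4xx.h"  /* device header; assumes the specific part (e.g. one with ADC3/4/5) is defined */

/* Set the common prescaler for the ADC3/4/5 group and verify it by reading
 * it back.  presc_bits is a value already shifted to ADC_CCR_PRESC_Pos.
 * PRESC may only be changed while all ADCs in the group are disabled. */
static bool set_adc345_prescaler(uint32_t presc_bits)
{
    MODIFY_REG(ADC345_COMMON->CCR, ADC_CCR_PRESC, presc_bits);

    /* This readback was correct in the failing case, which is what made it
     * so puzzling. */
    return READ_BIT(ADC345_COMMON->CCR, ADC_CCR_PRESC) == presc_bits;
}
```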
I would expect that if the synchronization failed, the read-back would return the wrong value?
In this case, the firmware did wait for the ADRDY flag. It just enabled all 5 ADCs simultaneously and then waited for all 5 flags to be set before moving on. The easy fix was to just do those serially instead.
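i.e. roughly this shape, with CMSIS-style register names for the G4 (a sketch, not the actual firmware; assumes the voltage regulator startup and calibration are already done):

```c
#include <stddef.h>
#include "stm32g4xx.h"  /* device header; assumes a part with ADC1..ADC5 */

/* Enable one ADC and block until its ADRDY flag is set.  Assumes the ADC
 * voltage regulator is already up and calibration has completed. */
static void adc_enable_and_wait(ADC_TypeDef *adc)
{
    adc->ISR = ADC_ISR_ADRDY;               /* clear any stale ready flag (write-1-to-clear) */
    adc->CR |= ADC_CR_ADEN;                 /* request enable */
    while ((adc->ISR & ADC_ISR_ADRDY) == 0) {
        /* spin until this ADC reports ready */
    }
}

/* The workaround: bring the ADCs up one at a time instead of setting all
 * five ADEN bits and then waiting on all five ADRDY flags together. */
static void adc_enable_all_serially(void)
{
    ADC_TypeDef *const adcs[] = { ADC1, ADC2, ADC3, ADC4, ADC5 };
    for (size_t i = 0; i < sizeof(adcs) / sizeof(adcs[0]); ++i) {
        adc_enable_and_wait(adcs[i]);
    }
}
```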
Actually, if you look at the firmware at the time, the proper procedure was followed as far as I can tell. All of the necessary bits were set and checked with the appropriate delays where required.
For what it is worth, the G0/4 family is relatively new. I'm pretty sure it has unique ADC IP too, since the published errata (which I'm very familiar with) are different from any other ST chip I know of.
The clock should of course have been suspect (as noted in the writeup). The "bad state" in this problem was basically indistinguishable from running the ADC at too high a clock rate. In fact, the default rate when I first encountered this problem does ever so slightly overclock the ADC: it is rated for 60 MHz in single-ADC operation, but only 26 MHz when multiple ADCs are in use. The firmware used to run the ADCs at ~28 MHz, purposefully going a tiny bit above that limit.
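To put rough numbers on that (these are assumptions for illustration, not figures from the writeup):

```c
/* Hypothetical arithmetic only (the real clock tree may differ): assume a
 * 170 MHz system clock, the G4 maximum, feeding the ADCs through the common
 * prescaler, whose divider steps are coarse (1, 2, 4, 6, 8, ...):
 *
 *   170 MHz / 4 = 42.5  MHz  -> far above the 26 MHz multi-ADC rating
 *   170 MHz / 6 ~ 28.3  MHz  -> the "tiny bit above" rate described here
 *   170 MHz / 8 = 21.25 MHz  -> first divider that is in spec for all ADCs
 */
```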
I didn't include it in the writeup since it was somewhat of a diversion, but this particular problem occurred even with the ADCs configured to be clocked slower. As mentioned, I think that their clock configuration became mis-set as a result of the underlying problem.
And while poor decoupling is also a likely suspect, I'm 95% sure it is about as good as it can get. A high-quality cap of appropriate size sits immediately next to the chip on every supply pin, with vias directly to the ground plane. This is a low-pin-count QFN part, so the only ground on the chip is the center pad, which is also via'd directly to the ground plane.
I wonder if it would be possible to create a test jig that turns on the ADCs all at once and then samples data through them (perhaps just from a function generator)?
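Something like this loop, maybe (a sketch; every name here is hypothetical, the target-facing calls are just stand-ins):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_TRIALS   1000
#define NUM_SAMPLES  256

/* Stand-ins for whatever the jig would actually talk to (hypothetical). */
bool target_reset_and_enable_all_adcs_at_once(void);
bool target_read_samples(uint16_t *buf, int count);
bool samples_within_expected_window(const uint16_t *buf, int count);

/* Hypothetical jig loop: repeatedly reset the target, take the suspect
 * "enable everything in parallel" path, then sample a known waveform from a
 * function generator and count runs whose readings look wrong. */
int main(void)
{
    uint16_t samples[NUM_SAMPLES];
    int failures = 0;

    for (int trial = 0; trial < NUM_TRIALS; ++trial) {
        if (!target_reset_and_enable_all_adcs_at_once() ||
            !target_read_samples(samples, NUM_SAMPLES) ||
            !samples_within_expected_window(samples, NUM_SAMPLES)) {
            ++failures;
        }
    }
    printf("%d/%d trials produced bad ADC data\n", failures, NUM_TRIALS);
    return failures != 0;
}
```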
Surprisingly, it is almost certainly genuine. These particular chips likely came from a batch delivered in April 2021 from Mouser, which isn't known for shoddy sourcing practices.