For comparison, Google's Edge TPU (found in the Coral USB Accelerator, for example) does 4 INT8 TOPS [0], an Nvidia T4 does 130 [1], and an A100 or A6000 does roughly 620 [2]. Fully utilized, it could be expected to be radically faster and more efficient than a CPU for these operations, but of course still much slower than workstation/server hardware.
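To put those numbers side by side, here is a quick sketch of the arithmetic. It only uses the peak INT8 TOPS figures quoted above; the dictionary and the function name are my own, and peak TOPS says nothing about what you actually get on a real model.

```python
# Peak INT8 TOPS as quoted in this comment (vendor-stated, not measured).
INT8_TOPS = {
    "Edge TPU": 4,
    "Nvidia T4": 130,
    "A100 / A6000": 620,
}


def relative_throughput(tops, baseline):
    """Return each entry's peak TOPS as a multiple of the baseline entry."""
    base = tops[baseline]
    return {name: value / base for name, value in tops.items()}


ratios = relative_throughput(INT8_TOPS, "Edge TPU")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Edge TPU")
```

On paper that puts the T4 at about 32x the Edge TPU and the top-end cards at about 155x, which is the gap I mean by "much slower than workstation/server hardware".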
[0] https://coral.ai/products/accelerator
[1] https://semiengineering.com/tops-memory-throughput-and-infer...
[2] https://www.microway.com/knowledge-center-articles/in-depth-...