Similar or greater inference wins are already achieved with speculative decoding, which is widely used. So while this is really interesting (and was tried before with less success, AFAIK), it's not yet clear how impactful it would be.
I don’t see where similar wins have ever been achieved.
Speculative decoding can reduce latency, but at the cost of using a lot more compute. The amazing thing here is that both latency and global throughput improvements would be realized, because of an increase in efficiency.
From what I understand, speculative decoding can also make it harder to maintain overall output quality.
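For anyone unfamiliar with the mechanism being discussed: a toy sketch of the draft-then-verify loop at the heart of speculative decoding, using a simplified greedy-verification variant. The `draft_next` and `target_next` functions below are hypothetical stand-ins for a small draft model and a large target model; real implementations verify the drafted tokens probabilistically and in a single batched forward pass, which is where the extra compute comes from.

```python
def draft_next(ctx):
    # Hypothetical cheap "draft model": a deterministic toy rule.
    return (sum(ctx) * 3 + 1) % 10

def target_next(ctx):
    # Hypothetical expensive "target model": usually agrees with the draft,
    # occasionally disagrees (forcing a rejection).
    t = (sum(ctx) * 3 + 1) % 10
    return t if sum(ctx) % 4 else (t + 1) % 10

def speculative_decode(ctx, steps, k=4):
    """Generate `steps` tokens; with greedy verification the output is
    identical to decoding with the target model alone."""
    out = list(ctx)
    while len(out) - len(ctx) < steps:
        # 1) Draft k tokens cheaply, one at a time.
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2) Verify them against the target (notionally one parallel pass);
        #    keep the longest prefix the target agrees with.
        accepted = 0
        for i in range(k):
            if target_next(out + proposal[:i]) == proposal[i]:
                accepted += 1
            else:
                break
        out += proposal[:accepted]
        # 3) The target emits one token itself (the correction on a
        #    mismatch), so every target pass yields at least one token.
        out.append(target_next(out))
    return out[len(ctx):len(ctx) + steps]

print(speculative_decode([1, 2], 8))
```

The latency win comes from step 2: when the draft model guesses well, one target-model pass validates several tokens at once. The compute cost mentioned above is also visible here: rejected draft tokens (and the draft model's own work) are thrown away.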