- Ensemble of a weak (draft) model and a strong (target) model
- The weak model takes a quick first pass at inference, cheaply proposing a short run of candidate tokens
- The strong but slow model then catches up: it scores the weak model's proposals in a single pass, accepting or rejecting each one to produce the final output
- Overall this makes LLM inference faster and cheaper, without changing the output distribution
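The accept/reject loop above can be sketched with toy distributions in place of real models (the `draft_dist`/`target_dist` functions and the tiny vocabulary are hypothetical stand-ins, purely for illustration):

```python
import random

random.seed(0)

VOCAB = [0, 1, 2, 3]

# Toy stand-ins for the two models: each maps a context to a
# next-token distribution over VOCAB. (Hypothetical, for illustration.)
def draft_dist(context):
    # Cheap "weak" model: a slightly skewed distribution.
    return [0.4, 0.3, 0.2, 0.1]

def target_dist(context):
    # Expensive "strong" model: the distribution we want to match.
    return [0.25, 0.35, 0.25, 0.15]

def sample(dist):
    return random.choices(VOCAB, weights=dist, k=1)[0]

def speculative_step(context, k=4):
    """One round: the draft proposes k tokens, the target accepts/rejects."""
    # 1. Draft model proposes k tokens autoregressively (cheap).
    proposed, ctx = [], list(context)
    for _ in range(k):
        t = sample(draft_dist(ctx))
        proposed.append(t)
        ctx.append(t)

    # 2. Target model scores those same positions (in parallel in practice).
    out, ctx = list(context), list(context)
    for t in proposed:
        p, q = target_dist(ctx), draft_dist(ctx)
        if random.random() < min(1.0, p[t] / q[t]):
            out.append(t)   # accept the draft token
            ctx.append(t)
        else:
            # Reject: resample from the residual max(0, p - q), normalized.
            # This correction keeps the output distributed exactly as the target.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            total = sum(residual)
            t_new = sample([r / total for r in residual]) if total > 0 else sample(p)
            out.append(t_new)
            break           # everything after a rejection is discarded
    return out

print(speculative_step([0], k=4))
```

Each round thus costs one (batched) pass of the strong model but can emit up to k tokens, which is where the speedup comes from.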
More to follow..
- https://pytorch.org/blog/hitchhikers-guide-speculative-decoding/
- https://www.baseten.co/blog/a-quick-introduction-to-speculative-decoding/
- https://research.google/blog/looking-back-at-speculative-decoding/
- https://medium.com/ai-science/speculative-decoding-make-llm-inference-faster-c004501af120