DSpark: Speculative decoding accelerates LLM inference [pdf]

(github.com)

666 points | by aurenvale 10 hours ago

255 comments