GitHub: https://github.com/pHequals7/muesli
Looking to add on-device CUA and support more models (MSFT VibeVoice, IBM Granite, etc.)
How are you handling the on-device speech pipeline, especially around model size, latency, and accuracy tradeoffs on consumer hardware?
Currently the on-device models such as Parakeet and Whisper are great for English: faster than cloud-hosted models, though a little less accurate. If you switch on post-processing, the ASR output goes through a fine-tuned Qwen 3.5 model that improves accuracy, formatting, etc. All of the code is open source - feel free to inspect it and suggest perf improvements as a PR!
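If it helps, here's a rough Python sketch of what that two-stage flow looks like - local Whisper for the ASR pass, then a small instruct LLM for cleanup. The checkpoints, prompt, and file name below are illustrative placeholders, not the exact code in the repo:

    # Rough sketch of the two-stage pipeline described above; the model
    # checkpoints, prompt, and audio file are placeholders, not muesli's code.
    import whisper                      # pip install openai-whisper
    from transformers import pipeline   # pip install transformers

    # Stage 1: on-device ASR (Whisper "base" as a stand-in model).
    asr = whisper.load_model("base")
    raw_text = asr.transcribe("recording.wav")["text"]

    # Stage 2: optional post-processing with a small instruct LLM
    # (a generic Qwen checkpoint here as a placeholder).
    fixer = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
    prompt = (
        "Fix punctuation, casing, and obvious transcription errors in this "
        "transcript without changing its meaning:\n\n" + raw_text
    )
    cleaned = fixer(prompt, max_new_tokens=512, return_full_text=False)
    print(cleaned[0]["generated_text"])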
Finally, something that works locally and feels polished!
Let me know if you face any issues - and I'm always looking for more collaborators!
I've been looking for something like this: an OSS, on-device version where I can store transcripts as markdown files in my file system.
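The markdown side seems simple enough - a hypothetical sketch of what I mean, with a made-up directory and naming scheme:

    # Hypothetical sketch of storing transcripts as markdown files;
    # the directory and naming scheme are made up, not a muesli feature.
    from datetime import datetime
    from pathlib import Path

    def save_transcript(text: str, notes_dir: str = "~/notes/transcripts") -> Path:
        out_dir = Path(notes_dir).expanduser()
        out_dir.mkdir(parents=True, exist_ok=True)
        stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
        out_path = out_dir / f"transcript_{stamp}.md"
        out_path.write_text(f"# Transcript {stamp}\n\n{text}\n", encoding="utf-8")
        return out_path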
Very cool - going to try this out today.
Love the sly name!
This is now making me hungry.