Since many are asking about apps with similar capabilities: I’m very happy with MacWhisper. It has Parakeet and near-instant transcription of my lengthy monologues. All local.
Edit: Ah, but I think Parakeet isn’t available for free. Still a very worthwhile single-purchase app nonetheless!
Is there a tool that preserves the audio? I want both the transcript and the audio.
Is it possible to customise the key binding? Most of these services let you customise the binding and also support a toggle for push-to-talk mode.
There's also an offline tool called VoiceInk for macOS. No need for Groq or any external AI.
https://github.com/Beingpax/VoiceInk
+1, my experience improved quite a bit when I switched to the Parakeet model; they should definitely make it the default.
My favorite too. I use the Parakeet model.
Was searching for this this morning and settled on https://handy.computer/
I use handy as well, and love it.
I just learned about Handy in this thread and it looks great!
I think the biggest difference between FreeFlow and Handy is that FreeFlow implements what Monologue calls "deep context", where it post-processes the raw transcription with context from your currently open window.
This fixes misspelled names if you're replying to an email / makes sure technical terms are spelled right / etc.
The original hope for FreeFlow was for it to use all local models like Handy does, but with the post-processing step the pipeline took 5-10 seconds instead of <1 second with Groq.
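The two-stage pipeline described above can be sketched roughly as follows. This is a hypothetical illustration, not FreeFlow's actual code: the function names are made up, and the ASR and LLM steps are stubbed out (a real setup would call a speech model like Whisper or Parakeet, and a local or Groq-hosted LLM).

```python
# Hypothetical sketch of a "deep context" dictation pipeline:
# raw ASR output is post-processed with context from the active window.
# All names and behaviors here are illustrative stubs.

def transcribe(audio_bytes: bytes) -> str:
    """Stub ASR step: returns a raw transcript with a misheard name."""
    return "hi gorge, thanks for the update"

def window_context() -> str:
    """Stub for grabbing context from the currently open window (e.g. an email)."""
    return "Email thread with George Smith <george@example.com>"

def post_process(raw: str, context: str) -> str:
    """Stub LLM step: fix spellings using the on-screen context.

    A real implementation would prompt an LLM with both `raw` and
    `context`; here the correction the LLM would make is hard-coded.
    """
    if "gorge" in raw and "George" in context:
        raw = raw.replace("gorge", "George")
    return raw

def dictate(audio_bytes: bytes) -> str:
    return post_process(transcribe(audio_bytes), window_context())

print(dictate(b""))  # -> "hi George, thanks for the update"
```

The key design point is that the second stage sees the screen context, which is how a name like "George" gets spelled correctly even when the acoustic model mishears it.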
As a very happy Handy user, I can confirm it doesn't do that. It will be interesting to see if FreeFlow works better; I'll give it a shot, thanks!
Yes, I also use Handy. It supports local transcription via NVIDIA Parakeet TDT2, which is extremely fast and accurate. I also use Gemini 2.5 Flash Lite for post-processing via the free AI Studio API (post-processing is optional and can also use a locally hosted model).
Does anyone know of an effective alternative for Android?
Check out the FUTO Keyboard or FUTO Voice Input apps. They only use the Whisper models so far, though.
Sounds like there's plenty of interest in these kinds of tools. I'm not a huge fan of API transcription given how good local models are.
I built https://github.com/bwarzecha/Axii to keep EVERYTHING local and fully open source - it can easily be used at any company. No data is sent anywhere.
Nice! I vibe-coded the same thing this weekend, though for OpenAI and less polished: https://github.com/sonu27/voicebardictate
Also look into Voxtral; their new model is good and half the price if you can live without streaming.
Could you make it use Parakeet? That's an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.
I love this idea, and originally planned to build it using local models, but to have post-processing (that's where you get correctly spelled names when replying to emails / etc), you need to have a local LLM too.
If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of <1s). I also had concerns around battery life.
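The "5-10 seconds instead of <1s" tradeoff comes down to simple latency-budget arithmetic over the two pipeline stages. The per-stage numbers below are assumptions for illustration only, not measurements of any specific app:

```python
# Rough latency budget for ASR + LLM post-processing.
# Stage times are illustrative assumptions, not benchmarks.

local = {"asr": 1.5, "llm_post_process": 5.0}    # local models on a typical laptop
hosted = {"asr": 0.3, "llm_post_process": 0.4}   # fast hosted inference (e.g. Groq)

local_total = sum(local.values())
hosted_total = sum(hosted.values())
print(f"local: {local_total:.1f}s, hosted: {hosted_total:.1f}s")
```

Under these assumptions the local pipeline lands well above the roughly one-second threshold where dictation stops feeling instant, which is the UX concern described above.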
Some day!
https://github.com/cjpais/Handy
It’s free and offline
Wow, Handy looks really great and super polished. Demo at https://handy.computer/
macOS only, in case that saves you a click.
Whispering [0] is Windows-compatible and has gotten a lot better on Windows, despite being extremely rough around the edges at first.
[0] https://github.com/EpicenterHQ/epicenter