Voice Typing on macOS: The Complete Guide
The average person types about 40 words per minute. The average person speaks at 130–150 words per minute. That gap is why voice typing exists — and why more Mac users are adopting it every year.
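The gap is easy to quantify. A back-of-envelope sketch, using the figures above (the 140 WPM midpoint is our assumption, not a measured value):

```python
# Back-of-envelope comparison of typing vs. speaking throughput.
# Figures from the text: ~40 WPM typing, 130-150 WPM speaking.
typing_wpm = 40
speaking_wpm = 140  # assumed midpoint of the 130-150 range

speedup = speaking_wpm / typing_wpm
extra_words_per_hour = (speaking_wpm - typing_wpm) * 60

print(f"Speaking is ~{speedup:.1f}x faster than typing")       # ~3.5x
print(f"That's {extra_words_per_hour:,} more words per hour")  # 6,000
```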
But "voice typing on Mac" can mean very different things. Apple ships its own dictation. Third-party tools range from cloud transcription services to fully on-device speech engines. Some work in every app, some only in specific ones. Some send your audio to servers, some keep everything local.
This guide covers all of it — what's available, how each approach works, and how to decide which one fits your needs.
Apple's built-in dictation
Every Mac has a dictation feature in System Settings → Keyboard → Dictation. Turn it on, press the microphone key (or double-tap Fn), and start speaking. It works in most native text fields.
The limitations become clear quickly:
- Inconsistent app support. Dictation relies on the standard macOS text input system. Electron apps, web-based editors, and many developer tools either don't support it or support it only partially.
- No rewriting or cleanup. What you say is what you get — filler words, false starts, and all.
- Cloud dependency. Enhanced Dictation (the on-device option) was removed in macOS Ventura. Current dictation sends audio to Apple's servers by default.
- No real-time feedback. You speak into a void and wait for the result. There's no streaming transcription display.
For quick notes in Apple's own apps, built-in dictation works fine. For anything more demanding, you'll hit its ceiling fast.
What to look for in a voice typing tool
If you're evaluating third-party options, these are the dimensions that actually matter:
- Where it works. System-wide support means you can dictate into Slack, VS Code, your browser, a terminal — anywhere you'd normally type. Some tools only work in specific apps or their own window.
- Where audio is processed. Cloud processing means your voice leaves your machine. On-device processing keeps everything local. This affects privacy, latency, and offline availability.
- Latency. The delay between speaking and seeing text. Sub-second feels instant. Anything over two seconds breaks your train of thought.
- Text cleanup. Raw transcription includes every "um" and half-finished sentence. Advanced tools offer AI rewriting that turns messy speech into clean text.
- Language support. Can you switch between English and Chinese mid-sentence? Is CJK text normalization handled correctly — numbers, currencies, punctuation?
The third-party landscape
Cloud-first tools
Services like Otter.ai and Wispr Flow send audio to cloud servers for processing. They often deliver high accuracy thanks to large server-side models, but require an internet connection, introduce network latency, and route your audio through third-party infrastructure.
File-based transcription
Tools like MacWhisper are designed for transcribing recorded audio — meetings, podcasts, interviews. They're excellent at what they do, but they're not real-time voice input tools. You can't hold a key, speak, and have text appear at your cursor.
On-device, real-time voice input
This is the newest category. OnType runs speech recognition locally on your Mac's Apple Silicon chip via MLX. Audio never leaves the device. Text appears in real time as you speak, in whatever app has focus — system-wide.
The historical trade-off was accuracy — on-device models used to be notably worse than cloud models. That gap has narrowed dramatically. Optimized inference frameworks now run Whisper-class models on the Neural Engine at hardware-accelerated speeds, delivering accuracy that rivals cloud services with zero network latency.
Setting up voice typing for the best results
Microphone choice
The built-in MacBook microphone is serviceable. An external microphone — even a basic USB one — reduces background noise and improves recognition accuracy. If you use AirPods or Bluetooth headphones, note that Bluetooth's HFP profile switches audio to a lower-quality codec during recording. Selecting a non-Bluetooth input device avoids this.
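To make the Bluetooth pitfall concrete, here is a toy sketch of how a tool might prefer a non-Bluetooth input automatically. The `pick_input_device` helper and the device names are hypothetical, for illustration only — this is not OnType's (or macOS's) actual API:

```python
# Hypothetical sketch: prefer non-Bluetooth microphones so recording
# doesn't drop to Bluetooth's low-quality HFP codec.
BLUETOOTH_HINTS = ("airpods", "bluetooth", "hands-free")

def pick_input_device(device_names):
    """Return the first input device that doesn't look like a Bluetooth mic."""
    for name in device_names:
        if not any(hint in name.lower() for hint in BLUETOOTH_HINTS):
            return name
    # Everything looks Bluetooth: fall back to the first device.
    return device_names[0] if device_names else None

devices = ["AirPods Pro", "MacBook Pro Microphone", "USB Audio Device"]
print(pick_input_device(devices))  # MacBook Pro Microphone
```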
Natural speech
Modern speech recognition works best with natural speech patterns. You don't need to enunciate robotically or slow down. Talk as you would to a colleague. Good voice typing tools handle punctuation, numbers, and formatting automatically — "three thousand dollars" becomes "$3,000", and "new line" inserts an actual line break.
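The kind of normalization described above can be sketched in a few lines. This toy version covers only the two examples from the text — real engines use full inverse-text-normalization models, and the mappings here are illustrative, not how any particular product works:

```python
import re

# Toy spoken-text normalizer. Only handles the two patterns
# mentioned above; real inverse text normalization is far broader.
NUMBER_WORDS = {"three thousand": "3,000"}

def normalize(spoken: str) -> str:
    text = spoken
    # "three thousand dollars" -> "$3,000"
    for words, digits in NUMBER_WORDS.items():
        text = re.sub(rf"{words} dollars", f"${digits}", text, flags=re.IGNORECASE)
    # "new line" -> an actual line break
    text = re.sub(r"\bnew line\b", "\n", text, flags=re.IGNORECASE)
    return text

print(normalize("the budget is three thousand dollars"))
```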
Match the mode to the task
Advanced tools offer multiple input modes. Quick dictation for short messages. Compose or rewrite mode for long-form content where AI cleans up your speech. Translation mode for bilingual workflows. Matching the right mode to the task is the fastest way to improve your results.
Common use cases
Voice typing is not limited to people who can't type fast. Developers use it to dictate code comments, AI prompts, and documentation without switching mental contexts. Writers draft at 3x their typing speed. Legal professionals use it for case notes and contract drafting where on-device processing satisfies client confidentiality requirements. And for users with accessibility needs, voice input is the primary way they interact with their computer.
Getting started
If you want to try on-device voice typing with zero cloud dependency, download OnType. It's free to use with the on-device engine on any Apple Silicon Mac running macOS 15 or later. Our getting started guide walks through setup and your first dictation.