Apr 21, 2026 · Product

OnType 1.0 — Your Voice, Your Keyboard

We started building OnType on January 31, 2026. Eighty-one days and nearly 900 commits later, we are shipping version 1.0 — a macOS voice input tool that turns your voice into text exactly where your cursor is. No app-switching. No waiting. No cloud required by default.

OnType is not just a dictation app. It is three different ways to speak, each designed for a specific kind of moment. Here is how they work.

Push-to-Talk — Think it, say it, send it

The simplest mode. Hold your hotkey (Fn by default), speak, and release. While you speak, real-time streaming shows every word in the HUD as it is recognized, not after a delay. The moment you let go, the text appears at your cursor.

Behind the scenes, we have done extensive latency engineering. Audio recording automatically trims the first 120 milliseconds — the gap between pressing the key and actually speaking — so the ASR engine never wastes time on silence or key-press noise. The result is that transcription starts effectively the moment your first syllable hits the microphone.
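The trim itself is simple arithmetic: at an assumed 16 kHz mono capture rate, 120 ms is 16,000 × 0.120 = 1,920 samples to drop. The Python sketch below illustrates only that idea. The sample rate, the list-based buffer, and the `trim_leading` helper are assumptions, not OnType's code.

```python
# Hypothetical sketch: drop the first 120 ms of captured audio before it reaches ASR.
# Assumes 16 kHz mono PCM samples held in a list; OnType's real buffers are not shown.

SAMPLE_RATE = 16_000  # assumed ASR input rate, in samples per second
TRIM_MS = 120         # key-press-to-speech gap cited in the post


def trim_leading(samples: list[int], sample_rate: int = SAMPLE_RATE,
                 trim_ms: int = TRIM_MS) -> list[int]:
    """Return what remains after removing the first `trim_ms` milliseconds."""
    n = sample_rate * trim_ms // 1000  # 16_000 * 120 // 1000 == 1_920 samples
    return samples[n:]  # a capture shorter than the trim window becomes empty
```

A one-second capture at 16 kHz keeps 14,080 of its 16,000 samples.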

Say something short, like "see you at three, same place", in Slack. OnType transcribes it locally with MLX-optimized speech recognition running on the GPU of your Apple Silicon Mac. Sub-200ms latency. Your audio never leaves the machine.

see you at three same place

Compose — Turn messy speech into clean writing

Real speech is messy. We say "um" and "like." We start a sentence, correct ourselves mid-stream, and trail off. Traditional dictation transcribes every false start faithfully. OnType Compose does not.

Tap your hotkey once to start recording. Speak freely — fillers, self-corrections, half-formed thoughts and all. Tap again to finish. OnType passes your raw transcript through an on-device rewriting engine that understands what you actually meant.

Here is what it looks like in practice. You speak something like this:

um, so, about the launch next week, first we need to update the docs, and then, like, the test cases aren't done yet, wait no, tests are done, it's the deploy scripts that need checking. and then performance needs some optimization too, oh right, most importantly client compatibility, that's top priority. uh, the docs thing is mainly about syncing the API changes.

The teleprompter in the OnType HUD visualizes your speech in real time — filler words get a subtle wavy underline, self-corrections appear with a strikethrough, and voice commands (like "wait, no" to indicate a correction) are highlighted in blue. You can see the raw chaos as it happens.

Then, after you tap to finish, the AI rewrites it into something actually usable:

Launch prep for next week:

Client compatibility testing (top priority)

Deploy script verification

API change documentation sync

Performance optimization
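The teleprompter highlighting can be approximated with a simple word tagger. This sketch is hypothetical. The filler list, the single "wait no" command, and the label names are assumptions. It also naively treats every "like" as a filler, which a real system would have to disambiguate from context. Strikethrough for retracted text is not covered here.

```python
# Hypothetical teleprompter tagger: labels each word as filler, command, or plain text.
# The vocabularies are illustrative assumptions, not OnType's lists.
import re

FILLERS = {"um", "uh", "like"}   # naive: every "like" is treated as a filler
COMMANDS = {("wait", "no")}      # two-word correction signal, shown in blue


def tag_words(text: str) -> list[tuple[str, str]]:
    words = re.findall(r"[a-z']+", text.lower())  # punctuation is discarded
    tagged: list[tuple[str, str]] = []
    i = 0
    while i < len(words):
        if tuple(words[i:i + 2]) in COMMANDS:
            tagged.append((" ".join(words[i:i + 2]), "command"))
            i += 2
        elif words[i] in FILLERS:
            tagged.append((words[i], "filler"))  # rendered with a wavy underline
            i += 1
        else:
            tagged.append((words[i], "text"))
            i += 1
    return tagged
```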

How the rewrite engine understands you

The AI does not just remove filler words. It recognizes three distinct kinds of self-correction and handles each one differently:

Explicit retraction — when you say "wait, no" or "it should be", the engine discards the phrase being retracted and keeps only the corrected version that follows the signal.
Override by repetition — when you restart a phrase and say it again with modifications, the second version supersedes the first.
Inline annotation — when you clarify a term ("here, 'pie' refers to PI"), the engine replaces the original with the corrected version and removes the meta-explanation.
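A rule-based approximation of the first pattern, explicit retraction, is sketched below. It is not how OnType works, since the post describes a model-based rewrite engine. The comma-based clause splitting and the signal list are assumptions chosen so the behavior is easy to follow on the launch example from earlier.

```python
# Rule-based stand-in for "explicit retraction" only; OnType's engine is model-based.
# Assumes clauses are separated by commas or periods and the signal is its own clause.
import re

RETRACTION_SIGNALS = {"wait no", "no wait"}


def apply_retractions(transcript: str) -> str:
    clauses = [c.strip() for c in re.split(r"[,.]", transcript) if c.strip()]
    kept: list[str] = []
    for clause in clauses:
        signal = re.sub(r"[^a-z ]", "", clause.lower()).strip()
        if signal in RETRACTION_SIGNALS:
            if kept:
                kept.pop()  # discard the clause being retracted
            continue        # and drop the signal itself
        kept.append(clause)
    return ", ".join(kept)
```

On "the test cases aren't done yet, wait no, tests are done, it's the deploy scripts that need checking", this keeps only "tests are done, it's the deploy scripts that need checking".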

The engine also fixes ASR recognition errors by reasoning about context. Words that sound right but make no sense in context are corrected automatically. For example, when you are discussing AI models, a transcript that says "refrigeration" where you meant "intelligence" gets fixed (in Chinese the two are near-homophones: 制冷 zhìlěng and 智能 zhìnéng). Brand names misrecognized as similar-sounding words are restored when the context supports it.
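To make context-gated correction concrete, here is a deliberately simple stand-in: a hand-written table that rewrites a misheard word only when topic words appear in the same text. OnType reasons with a model rather than a lookup table. The `CORRECTIONS` entries and topic sets are hypothetical.

```python
# Context-gated correction with a hand-written table: a simplified stand-in for the
# model-based reasoning the post describes. Table entries are hypothetical.
import re

# (misheard word, intended word, topic words that must appear somewhere in the text)
CORRECTIONS = [
    ("refrigeration", "intelligence", {"ai", "model", "models", "llm"}),
]


def correct_in_context(text: str) -> str:
    words = set(re.findall(r"[a-z]+", text.lower()))
    for wrong, right, topic in CORRECTIONS:
        if words & topic:  # only rewrite when the context supports it
            pattern = rf"\b{re.escape(wrong)}\b"
            text = re.sub(pattern, right, text, flags=re.IGNORECASE)
    return text
```

The gating matters: "the refrigeration unit is broken" contains no AI topic words, so it passes through unchanged.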

Scene-aware rewriting

OnType detects which app you are currently using and adjusts its rewrite strategy accordingly. A quick message in WeChat gets minimal intervention — just filler removal and error correction. A long-form thought in Notion gets actively restructured into logical paragraphs with the main point leading. Meeting notes in Linear get bulleted lists and topic grouping. An AI prompt in Claude gets optimized for prompt quality — clarifying intent, separating context from instructions, making constraints explicit.

Scene-aware rewriting works in Chinese, English, Japanese, Korean, French, Spanish, German, and Italian. The prompts know whether you are drafting an email, taking meeting notes, or writing code comments, and adjust tone and structure accordingly.
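One way to picture scene-aware rewriting is a lookup from the frontmost app to a rewrite policy. In this sketch the app-name keys, the `RewriteStrategy` fields, and the fallback to minimal intervention for unknown apps are all assumptions. The post does not describe how OnType detects the app or encodes its policies.

```python
# Hypothetical scene-aware policy table. App names, fields, and the fallback
# behavior are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class RewriteStrategy:
    remove_fillers: bool = True
    fix_asr_errors: bool = True
    restructure: bool = False      # reorder into paragraphs, main point first
    bullet_lists: bool = False     # group topics into bulleted lists
    optimize_prompt: bool = False  # separate context, instructions, constraints


MINIMAL = RewriteStrategy()  # filler removal and error correction only

STRATEGIES = {
    "WeChat": MINIMAL,
    "Notion": RewriteStrategy(restructure=True),
    "Linear": RewriteStrategy(bullet_lists=True),
    "Claude": RewriteStrategy(optimize_prompt=True),
}


def strategy_for(app_name: str) -> RewriteStrategy:
    # Unknown apps fall back to the lightest touch (an assumption).
    return STRATEGIES.get(app_name, MINIMAL)
```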

Translate — Speak in one language, write in another

Sometimes you need to write in a language you are not currently thinking in. OnType Translate lets you speak naturally in one language and outputs polished text in another.

Tap to start and speak your sentence, then hold Shift while you tap the hotkey to finish. The HUD dot turns blue to show that translation mode is active. Release, and the translated result appears at your cursor.

我想订明天上午到北京的航班，不对，是改签，不是订新的

(Literally: "I want to book a flight to Beijing tomorrow morning, no wait, it's a change to my booking, not a new one.")

→ I'd like to reschedule my flight to Beijing tomorrow morning.
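The three gestures (hold for Push-to-Talk, tap to start and tap to finish for Compose, Shift on the finishing tap for Translate) can share one hotkey if press duration decides the mode. The state machine below is a hypothetical sketch. The 0.3-second hold threshold, the `HotkeyModes` class, and its event methods are assumptions, because the post does not say how OnType tells a hold from a tap.

```python
# Hypothetical state machine letting one hotkey drive all three modes.
# The 0.3 s threshold and the key_down/key_up event API are assumptions.
from typing import Optional

HOLD_THRESHOLD_S = 0.3


class HotkeyModes:
    def __init__(self) -> None:
        self.state = "idle"  # idle | pressed | tapped_recording
        self.pressed_at = 0.0

    def key_down(self, t: float, shift: bool) -> Optional[str]:
        if self.state == "idle":
            self.state, self.pressed_at = "pressed", t  # start recording at once
            return None
        if self.state == "tapped_recording":
            self.state = "idle"  # second tap ends the recording
            return "translate" if shift else "compose"
        return None

    def key_up(self, t: float) -> Optional[str]:
        if self.state != "pressed":
            return None  # e.g. the release that follows a finishing tap
        if t - self.pressed_at >= HOLD_THRESHOLD_S:
            self.state = "idle"
            return "push_to_talk"  # held, then released: insert the transcript
        self.state = "tapped_recording"  # short tap: keep recording hands-free
        return None
```

Recording starts on the first key-down in every case, so a hold never loses audio while the machine waits to learn which mode it is in.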

Engineered for speed

OnType is obsessed with low latency. The streaming HUD shows confirmed text — words the ASR engine is confident about — alongside provisional text that may still change. As you speak, the HUD pill grows and scrolls smoothly, always keeping the newest words visible. You are not staring at a static "Listening..." indicator. You are watching your words appear in real time.
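The split between confirmed and provisional text maps onto a small data model: a prefix that only grows, plus a tail that each partial result may overwrite. The event shape below is an assumption about how a streaming ASR engine might report results, not OnType's actual interface. Each partial carries the full hypothesis for the unconfirmed region plus a count of words that just became stable.

```python
# Hypothetical HUD text model: confirmed words never change; provisional words
# are replaced by every new partial result. The event shape is an assumption.
from dataclasses import dataclass, field


@dataclass
class HudText:
    confirmed: list[str] = field(default_factory=list)
    provisional: list[str] = field(default_factory=list)

    def on_partial(self, words: list[str], stable_count: int) -> None:
        """`words` covers the unconfirmed region; its first `stable_count` are final."""
        self.confirmed.extend(words[:stable_count])
        self.provisional = words[stable_count:]

    def visible(self, max_words: int) -> str:
        # Show only the newest words, like the pill scrolling forward.
        all_words = self.confirmed + self.provisional
        return " ".join(all_words[-max_words:])
```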

In Push-to-Talk mode, the gap between releasing the hotkey and seeing text at your cursor is typically sub-second. In Compose mode, even with the full AI rewrite pipeline, the end-to-end delay is usually under two seconds. We achieve this through a chunked rewrite runtime that processes transcript segments as they arrive, rather than waiting for the entire recording to finish.
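Here is a minimal sketch of the chunked idea. It assumes sentence boundaries are marked by a period followed by a space, and it uses a hypothetical `rewrite` callable as a stand-in for the on-device model call. Finished sentences go to a background worker while recording continues, so only the last chunk is still pending when you tap to finish.

```python
# Hypothetical chunked rewrite runtime. `rewrite` stands in for the model call;
# splitting on ". " is an assumption. One instance handles one recording.
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class ChunkedRewriter:
    def __init__(self, rewrite: Callable[[str], str]) -> None:
        self._rewrite = rewrite
        self._pool = ThreadPoolExecutor(max_workers=1)  # serializes model calls
        self._pending: list[Future] = []
        self._buffer = ""

    def feed(self, segment: str) -> None:
        """Called for each transcript segment as the ASR engine emits it."""
        self._buffer += segment
        while ". " in self._buffer:
            sentence, self._buffer = self._buffer.split(". ", 1)
            self._pending.append(self._pool.submit(self._rewrite, sentence + "."))

    def finish(self) -> str:
        """Flush the trailing chunk and join results in their original order."""
        if self._buffer.strip():
            self._pending.append(self._pool.submit(self._rewrite, self._buffer.strip()))
            self._buffer = ""
        results = [f.result() for f in self._pending]
        self._pool.shutdown()
        return " ".join(results)
```

The trade-off is context. A correction such as "wait, no" that reaches back across a chunk boundary needs more than one sentence of history, so a production runtime would likely pass some preceding text along with each chunk.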

Privacy by design

The default path is fully offline. We ship quantized MLX models that run on the GPU of M1 and later Macs. Whisper-class accuracy at hardware-accelerated speed. For users who need the heaviest models or work on Intel Macs, Cloud Engine providers are available as an option, but the on-device path is always there, always private, always instant.

OnType also includes a custom-built inverse text normalization (ITN) engine: "three thousand dollars" becomes "$3,000". Dates, currencies, and numbers are formatted automatically for your locale, and voice commands like "new line" or "colon" convert to actual keyboard actions in real time.
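A toy version of ITN shows the shape of the problem. This sketch assumes lowercase en-US ASR output, whole numbers below one million, and only the two voice commands named above. OnType's own engine is custom-built, and its rules and locale handling are not shown in this post.

```python
# Toy ITN: number words to digits, "<number> dollars" to "$<number>", and two
# voice commands. Assumes lowercase en-US text and whole numbers under one million.
import re

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"hundred", "thousand"}
# Longest alternatives first so "seventeen" is preferred over "seven".
NUM = "|".join(sorted([*UNITS, *TENS, *SCALES], key=len, reverse=True))
NUMBER_RUN = re.compile(rf"\b(?:{NUM})\b(?:\s+(?:{NUM})\b)*(?:\s+dollars\b)?")


def words_to_int(words: list[str]) -> int:
    total, current = 0, 0
    for w in words:
        if w in UNITS:
            current += UNITS[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = max(current, 1) * 100
        else:  # "thousand" closes the thousands group
            total += max(current, 1) * 1000
            current = 0
    return total + current


def _format(match: re.Match) -> str:
    words = match.group(0).split()
    if words[-1] == "dollars":
        return f"${words_to_int(words[:-1]):,}"
    return f"{words_to_int(words):,}"


def normalize(text: str) -> str:
    text = re.sub(r"\s*\bnew line\b\s*", "\n", text)  # voice command: line break
    text = re.sub(r"\s*\bcolon\b", ":", text)         # voice command: punctuation
    return NUMBER_RUN.sub(_format, text)
```

For example, "budget colon three thousand dollars new line done" becomes "budget: $3,000" followed by a line break and "done". Word boundaries keep words like "someone" or "often" from being read as numbers.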

How we got here

The project started as a Swift prototype focused on one hard problem: getting transcribed text reliably inserted at the cursor across every macOS app. We built a three-tier insertion pipeline — Accessibility API, keyboard simulation, clipboard fallback — and tested it in browsers, terminals, design tools, and IDEs.
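The fallback logic of a tiered pipeline like this fits in a few lines. In the sketch below the tiers are injected as plain functions, so the control flow can be shown without macOS APIs. The tier names mirror the order in the post (Accessibility API, keyboard simulation, clipboard). The convention that each tier returns `True` on success is an assumption.

```python
# Tiered insertion with fallback: each tier is tried in order until one succeeds.
# Tiers are injected as plain functions; real tiers would call macOS APIs.
from typing import Callable, Sequence, Tuple

Inserter = Callable[[str], bool]  # returns True if the text landed at the cursor


def insert_text(text: str, tiers: Sequence[Tuple[str, Inserter]]) -> str:
    """Return the name of the tier that inserted `text`."""
    for name, insert in tiers:
        try:
            if insert(text):
                return name
        except Exception:
            pass  # a crashing tier must not block the next fallback
    raise RuntimeError("all insertion tiers failed")
```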

From there, the scope grew. Real-time streaming via WebSocket. Multiple ASR providers (DashScope, Volcengine, OpenAI) for Cloud Engine. An IME bundle for apps that do not support direct insertion. A WebView-based settings UI with the interactive overview demos you see above. Onboarding flows that guide users through permission grants. Sparkle auto-updates. Sentry error reporting. A custom Zig-compiled finite-state transducer library for text replacement.

Version 1.0 is a foundation, not a finish line. We are already working on better mixed-language speech handling, richer compose modes, and deeper integrations with the tools developers use most.