Why Your Voice Data Should Never Leave Your Device


When you use a cloud dictation service, your voice leaves your computer. It travels across the internet to a data center, gets processed by a server you don't control, and the text comes back. The audio itself — your actual voice — may be stored, logged, or used for model training. You have no way to verify what happens to it once it's gone.

This is not hypothetical. Major tech companies have confirmed that human reviewers listen to voice assistant recordings for quality assurance. Cloud ASR providers routinely retain audio for model improvement unless you explicitly opt out — and even then, the retention policies are buried in terms of service that change without notice.

We built OnType to make this problem disappear entirely.

Voice is biometric data

Your voice is not like a text message or a search query. It carries biometric information — vocal patterns unique to you, emotional state, accent, speech cadence. It's identifiable in a way that typed text simply isn't.

When a cloud dictation service processes your audio, it receives not just the words you said, but a biometric signature that can be used to identify, profile, and track you. Aggregated voice data across sessions builds an increasingly detailed fingerprint.

For individuals, this is a privacy concern. For professionals handling client-confidential information — lawyers, doctors, financial advisors — it's a compliance risk.

The three problems with cloud processing

1. You lose control of your data

Once audio leaves your device, you're trusting the provider's infrastructure, employees, and policies. Data breaches affect even the most security-conscious companies. Subpoenas can compel disclosure of stored audio. And corporate acquisitions can transfer your data to entities whose privacy standards differ from the original provider's.

2. Latency is physics

Cloud processing introduces an irreducible network round-trip. Even on a fast connection, you're adding 100–300ms of latency on top of processing time. On slower connections or behind VPNs, the delay is worse. And if you're offline — on a plane, in a poor-signal area, or simply disconnected by choice — cloud dictation doesn't work at all.

On-device processing eliminates the network entirely. OnType's speech recognition runs on the Neural Engine of Apple Silicon Macs with sub-200ms latency. It works identically whether you're connected to the internet or not.
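The arithmetic behind that claim can be made concrete. A rough latency-budget sketch (all figures below are illustrative assumptions, not measurements, except the 100–300ms round-trip range and the sub-200ms on-device figure quoted above):

```python
# Dictation latency budget: on-device processing removes the network
# round-trip term entirely; cloud processing cannot.

def total_latency_ms(processing_ms: float, network_rtt_ms: float = 0.0) -> float:
    """End-to-end latency: inference time plus any network round-trip."""
    return processing_ms + network_rtt_ms

# Assumed cloud inference time of 150 ms, plus the 100-300 ms round-trip
# range from the post:
cloud_best = total_latency_ms(150, 100)   # 250 ms
cloud_worst = total_latency_ms(150, 300)  # 450 ms

# On-device: no network term at all, sub-200 ms total.
local = total_latency_ms(190)             # 190 ms
```

Even under these generous assumptions for the cloud path, the local path wins, and the gap only widens on slow connections or behind VPNs.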

3. You're paying for someone else's compute

Cloud ASR is expensive to run. Providers pass that cost to users through subscription tiers, per-minute pricing, or usage caps. The more you use it, the more you pay — and the more audio you're sending to their servers.

On-device processing uses hardware you already own. Your Mac's Neural Engine is sitting there, purpose-built for machine learning inference, waiting to be used. OnType's on-device engine is free forever — no usage limits, no subscription required for basic voice typing.

How OnType keeps everything local

OnType ships with quantized MLX models optimized for Apple Silicon. When you hold your hotkey and speak, the audio is captured by your Mac's microphone, processed by the on-device speech recognition engine, and inserted as text at your cursor. At no point does audio or transcript data leave your machine.

The technical architecture is straightforward: audio buffer → MLX inference on Neural Engine → text normalization → cursor insertion. There is no network stack in this path. No telemetry on your speech content. No server to breach.
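That pipeline can be sketched in a few lines. This is an illustrative model only, with stand-in function bodies and hypothetical names (OnType's internal API is not public); the point it demonstrates is structural: every stage is a local function call, so there is simply no place for a network request to occur.

```python
# Sketch of a fully local dictation pipeline. Each stage is a plain
# in-process call; no stage has access to a network socket.

def capture_audio() -> bytes:
    # Stand-in for reading PCM samples from the microphone buffer.
    return b"\x00\x01" * 1600  # pretend: 100 ms of 16 kHz mono audio

def transcribe_on_device(audio: bytes) -> str:
    # Stand-in for quantized-model inference on the Neural Engine.
    return "hello world"

def normalize(text: str) -> str:
    # Capitalization, punctuation, number formatting: still all local.
    return text.capitalize() + "."

def insert_at_cursor(text: str) -> str:
    # Stand-in for inserting the result at the active text cursor.
    return text

def dictate() -> str:
    # audio buffer -> inference -> text normalization -> cursor insertion
    return insert_at_cursor(normalize(transcribe_on_device(capture_audio())))

print(dictate())  # Hello world.
```

Because the composition is just `insert_at_cursor(normalize(transcribe_on_device(capture_audio())))`, auditing for data exfiltration reduces to auditing four local functions, not a distributed system.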

For users who want access to the most powerful cloud models — for example, when working in extremely noisy environments or with specialized vocabulary — OnType offers an optional Cloud Engine. But the default path is always local, always private, and always available offline.

Privacy as architecture, not policy

Most cloud services promise privacy through policy: "we won't look at your data." That's a legal guarantee, not a technical one. It can be changed, breached, or overridden.

On-device processing provides privacy through architecture. There is no data to breach because the data never leaves. There is no policy to change because there is no server-side collection to govern. The guarantee is structural — it's enforced by the absence of a network path, not by a promise in a terms-of-service document.

This is why we built OnType the way we did. Not because cloud processing is inherently bad — it has legitimate advantages in accuracy and model size. But because voice is too personal, too identifying, and too sensitive to trust to infrastructure you don't control.

Your voice should stay on your device. That shouldn't be an option you have to opt into. It should be the default.

Try OnType — on-device voice typing for macOS, free forever for local processing.