Why Developers Stopped Typing Design Docs

The bottleneck in developer workflows shifted. It wasn't gradual. It happened when tools like Claude and Cursor became the default working environment, not the auxiliary tool. You've stopped typing code and started typing intent, which means you're spending 15 minutes explaining what you want the model to build instead of 15 minutes building it yourself.

That's changed what voice is actually good for.

The 15-minute design doc problem

Marcus, a backend engineer at a Series B fintech in Stockholm, dictates design docs at 11pm when the problem's still clear in his head. He's using Cursor, which has tab-complete for intent explanation, so the voice-to-text latency matters less. What matters is uninterrupted flow. Ten minutes straight, no stopping to correct, no thinking about punctuation. Just speaking the shape of the system.

Then he hits a word limit. Not because of a technical constraint. Because the dictation tool metered him. Wispr Flow caps its free tier at 5,000 words a month. Willow at 3,000. Otter at 2,000. The limit didn't matter when voice was a novelty. It matters now because it's part of the routine. And when the routine breaks mid-flow, you lose the thread. Marcus stops, switches to typing, fragments the prose into pieces he's got to reassemble the next morning.

That's the actual problem the word cap creates. Not "I've used up my allocation." It's "I was in the zone and the tool ejected me."

Why cloud transcription drives these limits

Most commercial dictation tools transcribe in the cloud, because large models on server hardware deliver the accuracy and speed users expect, and that compute costs money. Send 5,000 words of audio to a cloud service and process it on their hardware, and you're paying per request or per minute. The variable cost is real. So they meter it: the free tier is capped at, say, 3,000 words, and if you want more you're paying $12 a month.

Totally logical business. Totally wrong for the developer workflow.

Local Whisper changes that equation. Whisper is OpenAI's speech-to-text model, released as open source under the MIT license in September 2022. It runs on your machine. Zero variable cost per word transcribed. The engineering cost to serve you is identical whether you dictate 1,000 words a month or 100,000. There's no financial incentive to meter you.
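To make the cost argument concrete, here is a rough sketch of the arithmetic in Python. The speaking rate and the per-minute cloud price are illustrative assumptions, not figures from any vendor. Only the shape of the two curves matters: one grows with usage, the other stays flat.

```python
# Illustrative assumptions, not vendor figures.
WORDS_PER_MINUTE = 150          # assumed average dictation pace
CLOUD_PRICE_PER_MINUTE = 0.006  # hypothetical cloud price, USD per audio minute


def cloud_cost(words_per_month: int) -> float:
    """Variable cost of cloud transcription: grows with every word dictated."""
    minutes = words_per_month / WORDS_PER_MINUTE
    return minutes * CLOUD_PRICE_PER_MINUTE


def local_cost(words_per_month: int) -> float:
    """Per-word cost to the vendor when transcription runs on the user's device."""
    return 0.0


for words in (1_000, 100_000):
    print(f"{words:>7} words: cloud ${cloud_cost(words):.2f}, local ${local_cost(words):.2f}")
```

Under these assumptions the cloud bill scales linearly with how much you talk, while the local figure stays at zero whether you dictate one design doc a month or one a night.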

The privacy angle

Marcus also refuses cloud transcription because the design docs contain code snippets, variable names, database schemas. None of it's public IP, all of it's sensitive. He trusts Wispr Flow. But "I trust them" is different from "the code never leaves my machine." The difference matters when you're in fintech. He uses Cursor, not VS Code, specifically because Cursor's tab-complete reduces voice rewrites and also because he can self-host the Cursor instance. He's thought through this.

Recitey runs Whisper locally. The audio is processed on your device. No audio file ever leaves your machine. No transcript sent to a third party. This is technically true of the local Whisper variant of other tools too, but Recitey builds the entire free tier on local Whisper, not as a low-grade option.
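For a sense of what on-device transcription looks like, here is a minimal sketch using the open-source openai-whisper Python package. It is not Recitey's implementation, which this post does not describe. The function names, the "base" model size, and the file name in the usage comment are placeholders chosen for illustration.

```python
def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on this machine with the open-source Whisper model.

    Requires `pip install openai-whisper` and ffmpeg on the PATH. The model
    weights are downloaded once; after that, inference runs on-device.
    """
    import whisper  # imported lazily so this module loads without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def word_count(text: str) -> int:
    """Count whitespace-separated words, the unit the caps above are stated in."""
    return len(text.split())


# Example usage (placeholder file name):
#   transcript = transcribe_locally("design_doc.wav")
#   print(word_count(transcript), "words transcribed locally")
```

Nothing in this path makes a network request once the weights are cached, which is the property the privacy argument rests on.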

What changes when there's no cap

Three things:

First, you stop optimizing for word economy. You don't think about which phrases to include. You just dictate the thought. Which is faster.

Second, you're not running a mental counter. You don't subconsciously edit as you speak to stay under the limit. Which means more natural, spoken prose that actually sounds like you explaining the problem, not you writing carefully to conserve words.

Third, you're not context-switching between the tool and a text editor. The entire design doc fits in one flow. One session. No "I'm at the cap so I'll finish this tomorrow."

The trade-off

Local Whisper can be less accurate than cloud transcription on certain accents, in background noise, and on technical jargon. It's 96.3% accurate on LibriSpeech, which is a standard benchmark of clean, read English speech. That's good enough for a first draft, which is what voice is for. The rewrite, turning spoken prose into polished documentation, still happens. It's just no longer interrupted by an arbitrary word limit.
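Benchmark accuracy figures like this are usually derived from word error rate (WER): the word-level edit distance between the transcript and a reference text, divided by the number of reference words. A minimal sketch, assuming accuracy is read as 1 minus WER (the post does not say how its figure was computed), with made-up example sentences:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Technical jargon is where spoken drafts tend to go wrong.
spoken = "store the ledger rows in postgres"
heard = "store the leisure rows in post grass"
print(f"WER: {word_error_rate(spoken, heard):.2f}")
```

Read that way, 96.3% accuracy corresponds to a WER of 3.7%, so a 1,500-word draft would carry roughly 55 word-level errors to fix in the rewrite.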

Recitey's Pro tier handles the rewrite part: it polishes the rough draft into a clean sentence in under 2 seconds. But the free tier is the dictation layer, uncapped and local. You're not locked into paying for both.

Who this is for and who it isn't

This is for developers who dictate. That's a specific group. Not every developer speaks at the pace they think. Not every problem is best explained aloud. Cursor users, yes. Developers who pair-code and are used to talking through code, yes. Backend engineers documenting 11pm realizations in Notion, yes. People building in Claude Code and talking to the chat while their hands are on a keyboard, yes.

If you type faster than you talk, or if you prefer written thinking, local Whisper doesn't change your math. Use what you're using.

But if you're the person who opens Notion at 11pm with the system architecture clear in your head, and you want to capture it before you lose the thread, the word cap is theft. It's taking the tool that's supposed to help you think and breaking it mid-thought.

Local Whisper is designed for that specific moment. No cap, no counter, no cloud hop. Just you and the explanation.
