The Design Spec That Got Cut in Half

It was 11pm when Marcus started dictating the payment settlement architecture. Three minutes in, his dictation tool hit its word limit for the month. He switched to typing, and the next morning he barely recognized the notes. They didn't flow.

Marcus works at a Series B fintech in Stockholm. Like most engineers on LLM-heavy teams, he's stopped typing code and started typing intent. The bottleneck has shifted from implementation speed to clarity of specification, and voice is faster for that. But every dictation tool he'd tried capped the free tier at 2,000 words per month, then charged $14 to uncap. Wispr Flow, Willow, Superwhisper. Same pattern.

The bottleneck moved upstream

Your coding speed doesn't matter anymore. What matters is how clearly you can explain what you want the model to build. Cursor's tab-complete feels faster than plain VS Code because it needs fewer rewrites: there is less friction in the loop from speech to intent to code. But that loop is now longer. You're not dictating the function. You're dictating the rule the function should implement, the context it needs, and the assumptions it should validate.

That's more words. Much more. And it's the work that actually matters.

The economics of word caps

Most dictation tools are cloud-based. Your audio travels to a server, where Whisper or another model transcribes it. Then a polish model cleans it up. That's the pitch. But each minute of audio costs the vendor money to process, so vendors cap the free tier to control costs.

Wispr Flow caps the free tier at 2,000 words per month. At a typical speaking pace of about 150 words per minute, that's roughly 13 minutes of dictation. For a design doc, you're done.
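
Converting a word cap into dictation time is a single division. A minimal sketch, where the function name and the speaking paces are illustrative assumptions rather than measured figures:

```python
def minutes_until_cap(word_cap: int, words_per_minute: float) -> float:
    """Return how many minutes of continuous dictation fit under a word cap."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_cap / words_per_minute


if __name__ == "__main__":
    # Paces are assumptions for illustration: slower, typical, and faster speech.
    for pace in (130, 150, 180):
        minutes = minutes_until_cap(2000, pace)
        print(f"{pace} wpm: {minutes:.1f} minutes before the cap")
```

The faster you speak, the sooner the cap arrives.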

The cap isn't a technical limit. It's a pricing lever that protects the vendor from cloud compute costs. So when a tool's free tier is uncapped, its architecture has to be different.

What changes when the model runs locally

Recitey runs Whisper on your device. No audio leaves the machine. No cloud inference, no variable cost per word. Your speech-to-text is free, now and forever, with no word counter.

This matters for two reasons:

First, developers don't trust cloud dictation with code. Even if the vendor's data handling is clean, sending code snippets and architectural decisions to a third-party API feels like a compliance risk. Marcus refused to try Wispr because code IP would leave the device.
Second, local inference has no per-word cost for the vendor, so uncapping the free tier is economically viable. The only cost is your device's CPU time, which you're already paying for.
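
The cost argument can be sketched as a toy model. Everything here is hypothetical: the `FreeTier` class, the 150 words-per-minute pace, and the placeholder price per audio minute are assumptions for illustration, not figures from any vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeTier:
    """Hypothetical model of what one free user costs the vendor."""

    # Vendor's inference cost in dollars per audio minute.
    # 0.0 when transcription runs on the user's own device.
    cost_per_audio_minute: float
    # Assumed speaking pace, used to convert words into audio minutes.
    words_per_minute: float = 150.0

    def vendor_cost(self, words_dictated: int) -> float:
        """Return the vendor's cost in dollars for the given word count."""
        minutes = words_dictated / self.words_per_minute
        return minutes * self.cost_per_audio_minute


# Placeholder price for cloud inference; not a real vendor's figure.
cloud = FreeTier(cost_per_audio_minute=0.01)
# On-device inference: the vendor pays nothing per minute.
local = FreeTier(cost_per_audio_minute=0.0)

if __name__ == "__main__":
    for words in (2_000, 20_000, 200_000):
        print(
            f"{words:>7} words: "
            f"cloud ${cloud.vendor_cost(words):.2f}, "
            f"local ${local.vendor_cost(words):.2f}"
        )
```

In this model the cloud vendor's cost grows with every word a free user dictates, so a cap bounds the spend. The local vendor's cost stays at zero, so there is nothing for a cap to protect.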

What you give up on free tier

Recitey's free tier is transcription only. The polish pass, the rewrite that turns 'uh so the function should like, iterate over' into clean prose, lives in Pro.

That's the trade-off. Wispr Flow and its competitors charge for transcription because it's expensive for them to run. Recitey charges for the rewrite because transcription costs it nothing.

For most developers, this is the better design. You want dictation to be frictionless. You want the rewrite to be optional. The moment your transcription tool starts throttling your thinking, the economics are wrong.

Who this is for

If you're dictating design docs, architectural decisions, PR descriptions, or Slack explanations (anything longer than a quick note) and your current tool caps you mid-month, this removes the constraint.

You're free to think out loud as long as you need to.