Whisper without the cap: why free dictation just changed for developers

The shift happened quietly enough that most of us didn't notice. A few years ago, developers typed code. Now, developers write intent.

The keyboard is the same. The bottleneck moved. You're not waiting for your fingers to catch up to your brain. You're waiting for your brain to articulate what the model should build. That's a different kind of writing. Longer form. Messier first draft. Higher cognitive load.

And voice, suddenly, makes sense.

The problem with metered voice

Most voice dictation tools (Wispr Flow, Willow, Superwhisper) work the same way: the free tier is capped. Wispr caps at 2000 words per month. Willow caps at 1000. Superwhisper gives you 500 words per month free, then charges $8.49 per month to unlock more.

The reason is simple: cloud-based transcription costs money. Wispr outsources to a paid transcription API. Willow uses a metered backend. To keep a free tier sustainable, they meter the user.

The logic makes sense from a business standpoint. The problem: you don't hit the meter at a convenient moment. You hit it mid-flow.

Marcus is a backend engineer at a Series B fintech in Stockholm. He writes design docs at 11pm when the office is quiet. Not because he loves late nights, but because that's when deep thought happens. Three weeks ago, he was halfway through dictating a critical design doc: payment settlement architecture, nuanced decisions, the kind of thinking that takes coherence to explain. The word cap hit. The tool cut him off. He spent the next morning trying to reconstruct the second half, patching fragments together.

This happens to anyone doing long-form voice writing. Drafting a substantial PR description. Explaining a complex bug investigation in a Slack thread. Outlining architecture changes before a design review. One moment you're in flow, articulating reasoning. The next, you're paused, the word counter shows red, and you've lost your train of thought.

The friction is real. The interruption costs more than the words the meter withheld. It costs coherence.

Why local changes the game

Recitey runs Whisper locally on your device. Whisper is OpenAI's speech-to-text model, trained on 680,000 hours of multilingual audio (per OpenAI's paper "Robust Speech Recognition via Large-Scale Weak Supervision"). It's not cutting-edge by 2026 standards, but it's stable, accurate enough for code contexts, and crucially, it runs on your hardware. No API call. No cloud dependency. Nothing to meter.
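To make "runs on your hardware" concrete, here is a minimal sketch of local transcription using the open-source openai-whisper Python package. This is an illustration, not Recitey's implementation: the function names, the "base" model size, and the file name in the usage note are assumptions made for the example. It assumes the package and ffmpeg are installed.

```python
from pathlib import Path


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on this machine (hypothetical helper)."""
    # Fail early if the recording is missing, before loading any model.
    if not Path(audio_path).is_file():
        raise FileNotFoundError(audio_path)

    # Imported lazily because the package and its weights are heavy.
    # Assumes: pip install openai-whisper, plus ffmpeg on the PATH.
    import whisper

    model = whisper.load_model(model_name)  # weights download once, then load from cache
    result = model.transcribe(audio_path)   # inference runs on the local CPU or GPU
    return result["text"].strip()


def word_count(text: str) -> int:
    """Count whitespace-separated words in a transcript."""
    return len(text.split())
```

Calling transcribe_locally("design_doc.wav") on a recording of your own returns the raw transcript as a string. After the one-time model download, no audio leaves the machine and no per-word charge is incurred, which is the property the rest of this piece leans on.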

Running locally means zero variable cost for the provider: your hardware does the compute. No per-word charge. No quota. No cap.

The difference is structural, not just a change in pricing. When transcription costs nothing per word, the service model changes. You're not paying for the transcription itself. You're paying (if you pay at all) for the next step: the cloud-based rewrite that turns rough dictation into polished prose.

For Marcus, this matters because of what he refuses to do. His company handles regulated financial data. He won't send code snippets, design rationale, or architectural details to a cloud service that doesn't have explicit security guarantees and clear data policies. Some of that is compliance. Some of it is principle: if a tool doesn't show its privacy practices in granular detail, and if there's a device-local alternative, he picks the device-local one.

The Recitey free tier gives you dictation, local, no cap. You can dictate as much as you want. The free tier doesn't include the cloud rewrite step (that's in Pro), but for many workflows (design docs, PR descriptions, Slack threads, GitHub issue comments, Notion updates), the raw Whisper output works. You want the thinking captured first.

A new meter: the workflow one

Here's what surprised Marcus: he expected voice dictation to be faster. Instead, it was better. Not because it was faster in terms of words-per-minute, but because it didn't interrupt.

He dictated a 1400-word design doc about payment settlement latency and transaction finality in one continuous session. No meter. No pause to check the word counter. No anxiety about approaching a cap. Just unbroken flow from voice to text.
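For a sense of scale, here is a rough sketch of how that same 1400-word session would fare against the free-tier caps quoted earlier in this piece. The caps are the article's figures; the function names and the already_used parameter are hypothetical, and the model assumes a tool simply stops capturing once the monthly quota is spent.

```python
# Free-tier caps in words per month, as quoted earlier in this article.
FREE_CAPS = {"Wispr Flow": 2000, "Willow": 1000, "Superwhisper": 500}


def words_before_cutoff(session_words: int, monthly_cap: int, already_used: int = 0) -> int:
    """Words captured before the quota runs out (assumes a hard stop at the cap)."""
    remaining = max(monthly_cap - already_used, 0)
    return min(session_words, remaining)


def cutoff_report(session_words: int, already_used: int = 0) -> dict:
    """Captured and lost words per tool for one dictation session."""
    report = {}
    for tool, cap in FREE_CAPS.items():
        captured = words_before_cutoff(session_words, cap, already_used)
        report[tool] = {"captured": captured, "lost": session_words - captured}
    return report


if __name__ == "__main__":
    # A fresh month: nothing dictated yet.
    for tool, row in cutoff_report(1400).items():
        print(f"{tool}: captured {row['captured']}, lost {row['lost']}")
```

On a fresh month, only the 2000-word cap would let the whole document through; the 1000-word cap would lose the last 400 words and the 500-word cap the last 900. If 800 words had already been used that month, every capped tier would cut the session short.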

The prose was rough. Whisper caught most of the words, but he had to clean up maybe five sentences the next morning. A few misheard words, a couple of run-ons. That's expected. What was different: he didn't have to reconstruct half the document because a meter had interrupted mid-thought and forced him to continue in a second session.

When you're writing intent or reasoning, rough is acceptable. Fragmented because the tool ran out of quota? That breaks the workflow. It breaks the continuity of thought. You lose the context of what you were building toward.

The trade-off

The Recitey free tier does one thing: speech-to-text, locally on your device, uncapped. It doesn't rewrite. It doesn't grammar-check. It doesn't integrate with a Slack backend or anything else. It just dictates to your clipboard, and that works everywhere on Windows: Slack, email, browsers, Linear comments, the terminal, anywhere you can paste.

It's not a writing assistant. It's a dictation engine.

If you need the rewrite step (the cloud-based polishing that turns rough speech into publication-ready prose), that's available in Pro. Some workflows benefit from that. Editorial work. Customer-facing documentation. Anything where the first draft needs heavy cleanup. For those, the Pro tier adds value.

But for the workflows where you're dumping intent onto the page (architecture rationale, code review observations, bug reports, design docs), the local dictation alone often solves the entire problem. You want it captured. You want it uninterrupted. You don't need it rewritten by a model.

For Marcus, this shifted his workflow. Design docs are now dictated first, edited second. Slack threads that explain bugs, same. He's not faster (voice isn't faster than typed prose for him), but he's more coherent. And he's not losing his thinking mid-flow because of an arbitrary meter.

Why this matters

The tools we build for developers should understand how developers work now, not how they worked five years ago. Metering dictation made sense when the only users were people transcribing meeting recordings. It doesn't make sense when the users are engineers articulating complex intent to language models, writing design docs, explaining technical decisions.

The bottleneck shifted. Developers need voice for the thinking-capture phase, not the transcription phase. Tools that meter that phase at the free tier are locking out exactly the people who need voice most: the ones whose work is long-form, intent-heavy, and requires unbroken flow.

Local-first transcription isn't new. But free, uncapped, local transcription (running Whisper on your device with zero variable cost) while the alternatives cap at 500 to 2000 words per month? That's a structural difference. That's a different offer.

The 11pm design doc should not be interrupted by a word counter. The moment you're in flow should not be metered. That's the principle. Recitey's free tier honors it.