You're drafting the design doc at 11pm. Thirty minutes in, midway through explaining the tricky part, the word cap hits. Nothing after that point gets transcribed. The next morning, you're rebuilding the argument from fragments because the context is gone. This is the new shape of technical work, and every commercial dictation tool prices around it.
The bottleneck moved
The work shifted. You used to type code. Now you're typing intent.
Long prompts for Claude to understand the architecture you're designing. "Here's what I want to build, here's why the naive approach breaks, here's what I care about." Long specifications for Cursor to implement a feature from scratch. Detailed code reviews that explain the reasoning behind every change, not just the mechanics. Design documents that walk through security assumptions and trade-offs.
That's where voice actually wins. Your mouth moves faster than your fingers. You can explain context while it's fresh in your head. The thinking stays intact when you're speaking instead of typing. You catch yourself mid-thought, correct course, and keep building the idea. You don't have to stop and rewrite.
But every commercial dictation tool caps the free tier. Wispr's free tier comes with a word limit. Superwhisper's too. Willow also meters. All are paid-first models; the free tier is designed as a trial, a taste of what paid unlocks. The message is clear: use voice enough to want more, then pay.
Why the cap exists
Cloud-based transcription has a simple cost structure: the provider pays per second of audio. Every second of audio that reaches a server has to be run through a model. That costs compute. That costs money. Variable money, scaled with usage.
Wispr charges $14/month for its paid tier. Superwhisper charges $8.49/month. The pricing model reflects that infrastructure reality. When your audio is transcribed on a server, the company is burning GPU time. So they meter it. The free tier cap isn't a product decision; it's a cost-recovery requirement.
It's not arbitrary. It's the only way to offer anything free while protecting margin. If they didn't cap, and someone dictated 50 hours of audio a month, the infrastructure bill for that one free user would exceed whatever revenue they might eventually bring in. So the cap exists. And it breaks your flow at 1,200 words, or 1,500, or 2,000. Pick your limit.
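To make that arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The per-second cost and the speaking rate are illustrative assumptions, not figures from any vendor. Only the 50-hour scenario, the 2,000-word cap, and the $14/month price come from the text above.

```python
# Back-of-the-envelope model of cloud transcription cost.
# ASSUMPTIONS (hypothetical, for illustration only, not vendor figures):
ASSUMED_COST_PER_AUDIO_SECOND = 0.0001  # USD the provider pays per second of audio
ASSUMED_WORDS_PER_MINUTE = 150          # rough conversational speaking rate


def monthly_cost(audio_hours: float) -> float:
    """Provider-side cost in USD for a month of transcribed audio."""
    return audio_hours * 3600 * ASSUMED_COST_PER_AUDIO_SECOND


def cap_cost(word_cap: int) -> float:
    """Provider-side cost in USD for a user who speaks exactly up to a word cap."""
    audio_seconds = word_cap / ASSUMED_WORDS_PER_MINUTE * 60
    return audio_seconds * ASSUMED_COST_PER_AUDIO_SECOND


if __name__ == "__main__":
    print(f"Uncapped user, 50 hours of audio: ${monthly_cost(50):.2f}/month")
    print(f"User held to a 2,000-word cap:    ${cap_cost(2000):.2f}/month")
    print("Paid tier price from the text:    $14.00/month")
```

Under these assumed numbers, one uncapped heavy user costs more per month than a $14 subscription brings in, while a capped user costs pennies. Change the two constants and the conclusion shifts in size but not in direction: cost grows linearly with audio, so something has to bound it.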
Marcus's moment
Marcus is a payment systems engineer at a Series B fintech in Stockholm. He's been using voice for design docs for six months because Cursor's tab-complete works so well. Rough dictation comes in. Cursor's autocomplete refines it into structured prose. It's genuinely faster than his old typing workflow.
But he refuses to use cloud-based transcription. His code stays local. His bank's specs stay local. He's not paranoid, just reasonable: a 2,400-word design doc that explains a new payment settlement algorithm has IP value. It's his company's competitive edge.
A session like that, the kind that clarifies a tricky algorithm or explains security assumptions, would exhaust the free tier cap on Wispr or Willow before he finished. So he's been stuck. Voice is objectively faster for the kind of thinking he does. Every commercial tool that meters by the word feels like a step backward from the typing workflow he left behind. He's tried recording to a local voice memo, but then he has to transcribe it separately. Friction everywhere.
The structural difference
Whisper runs locally on your device. Zero variable cost per word. No audio streaming to a server. No infrastructure cost scaled with usage. No data leaving your machine.
That's the technical foundation for an uncapped free tier. Not as a trial. As the actual product.
Recitey uses Whisper. No word counter. No surprise cutoff at 1,500 words mid-thought. No "you've used your monthly allotment, upgrade to continue." The product difference isn't a feature. It's a cost structure. Local compute means no metering. No metering means no cap.
Pro tier (paid) adds cloud rewrite: Recitey polishes rough voice output into grammar-checked, punctuated prose. But the dictation itself is free and uncapped. That's the actual inversion of how every other tool structures pricing.
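The local path described above can be sketched with the open-source openai-whisper Python package. This is an assumption for illustration: the text does not say how Recitey invokes Whisper, and the helper names and the example file name here are hypothetical. The point is only that transcription becomes a local function call, with no word counter anywhere in the loop.

```python
# Sketch of on-device transcription with the open-source `openai-whisper`
# package (pip install openai-whisper; it also needs ffmpeg on the PATH).
# Not Recitey's actual implementation; function names are hypothetical.
import sys


def transcribe_locally(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file on this machine and return the text."""
    import whisper  # imported lazily so word_count() works without the package

    model = whisper.load_model(model_name)  # weights download once, then load from cache
    result = model.transcribe(audio_path)   # inference runs on the local CPU or GPU
    return result["text"].strip()


def word_count(text: str) -> int:
    """Count whitespace-separated words; for reporting only, never for limiting."""
    return len(text.split())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python dictate.py design_doc.wav  (hypothetical file name)
    text = transcribe_locally(sys.argv[1])
    print(f"{word_count(text)} words transcribed locally")
    print(text)
```

Nothing in this sketch scales a bill with usage: a 2,400-word session costs the same as a 200-word one, which is local electricity and time.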
What changes
When the word cap is gone, the flow doesn't break. You reach the moment where you're explaining something complex, and you stay in that moment. Fifteen minutes. Twenty minutes. No interruption. The rough draft is messy. Contractions. Fragments. False starts. But the ideas are intact. The reasoning is there, even if the grammar isn't.
The next day, the cloud rewrite pass (optional) turns it into readable prose. But you didn't lose the thought mid-draft. That's the difference between "voice helps with speed" and "voice changes how you think about explaining things."
It's the difference between reaching for voice as a tactical tool and reaching for voice because the tool fits your brain. Because the constraint is gone.
The tradeoff
What runs locally is Whisper's base model, not the latest and largest version from OpenAI. It's less accurate on heavy accents. Less accurate on background noise. Less accurate on technical jargon at the edges. It hallucinates less than cloud models, actually, but it misses more context.
It's also completely sufficient for what Marcus actually does: an engineer dictating design decisions and architecture notes, not transcribing customer calls or legal recordings or conference talks. There's no free lunch. But there's also no artificial cap in the way. No choice between "use voice properly" and "stay under the word limit." Just: use it.
The word cap isn't a technical constraint. It's a pricing structure that doesn't fit how developers actually work now. Uncapped free doesn't mean you'll never hit friction. It means the friction isn't artificial. It means you can test an idea at full speed, instead of rationing your words per month.