Uncapped voice, local-first: the only thing developers actually need from dictation

It's 11pm and you're deep in a design doc in Cursor, explaining how your payment settlement engine should handle retry logic. The words are flowing. You're thinking out loud, and it's faster than typing.

Then the dictation tool hits its word cap.

You switch back to typing. The thinking breaks. By morning, your spec reads like it was written in fragments, because it was. You clean it up, but the momentum is gone, and so is the precision you had at voice speed.

This is the problem with every mainstream voice tool today. They're priced like they're solving a problem from 2015: making typing faster. But the bottleneck shifted. The real friction for developers isn't input speed. It's intent clarity.

The real work moved: from code to intent

AI development changed the shape of the problem. You don't spend your time typing code anymore. You spend it writing prompts. Specs. Design docs. Linear ticket descriptions. GitHub PR templates. The output is longer, more detailed, more frequent.

Typing is fine for short messages. But for a 15-minute design doc that explains why you're building something a certain way, voice is fundamentally better. It's the difference between composing and transcribing. When you're composing, you think in sentences and reasoning. Your mouth naturally produces the structure you need.

The problem is that every paid voice tool wraps the free tier in a cage. Wispr Flow charges $14 per month for unlimited use. Willow charges $12 per month. Superwhisper costs $8.49. All of them cap the free tier, often at 600 to 1,000 words per month.

That's not a technical constraint. That's a business model. And for developers, it breaks the workflow at the exact moment it starts to work.
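To put those caps in minutes, here is a rough sketch. It assumes a speaking pace of about 150 words per minute, which is an illustrative assumption rather than a measured figure, and it uses the 600 to 1,000 word caps mentioned above. The function name is made up for this example.

```python
def minutes_until_cap(cap_words: int, words_per_minute: int = 150) -> float:
    """Minutes of continuous speech before a monthly word cap runs out.

    words_per_minute defaults to an assumed conversational pace.
    """
    return cap_words / words_per_minute


# The two ends of the free-tier range quoted in the text.
for cap in (600, 1000):
    print(cap, "words ->", round(minutes_until_cap(cap), 1), "minutes")
```

At that pace, a month's allowance is gone a few minutes into a single 15-minute design doc.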

Why cloud tools gate the free tier (and why it matters to you)

Cloud-based speech-to-text has a structural cost: every word that comes through the API costs money. You're charged per API call, per hour of audio processed, sometimes per word transcribed. Amazon Transcribe bills you. Google Cloud Speech-to-Text bills you. Whisper API, if you use OpenAI's hosted version, bills you.

That cost is real, but it's not huge: a megabyte of audio costs fractions of a cent. SaaS companies don't price based on cost, though; they price based on perceived value and willingness to pay. Cloud-first voice tools cap the free tier because the business model requires you to upgrade to pro for serious use.
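A back-of-envelope sketch of that cost. The per-minute rate below is a placeholder assumption, not a quoted price from any provider, so swap in the current figure from your provider's pricing page. The usage pattern (one 15-minute doc per working day, 22 working days a month) is also assumed.

```python
# Placeholder rate in USD per minute of audio. This is an assumption for
# illustration only; check your provider's current pricing.
ASSUMED_RATE_USD_PER_MIN = 0.006


def cloud_cost_usd(minutes_of_audio: float, rate_per_minute: float) -> float:
    """Raw transcription cost for a given amount of audio."""
    return minutes_of_audio * rate_per_minute


# Assumed usage: a 15-minute design doc, every working day for a month.
minutes_per_month = 15 * 22
monthly_cost = cloud_cost_usd(minutes_per_month, ASSUMED_RATE_USD_PER_MIN)
print(f"{minutes_per_month} minutes -> ${monthly_cost:.2f}")
```

Under these assumptions the raw transcription bill is a small fraction of the subscription prices listed earlier, which is the gap between cost and price that the paragraph above describes.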

For code IP, there's a second problem: your voice data, your audio, goes to someone else's server. Some companies promise deletion after processing, but you're still sending the raw audio over the internet and trusting that the deletion happens. For payment settlement code, for financial engineering, for anything proprietary, that's a risk you have to weigh for yourself.

What changes if dictation runs locally

Recitey runs Whisper locally on your device. The speech-to-text happens on your machine, with zero variable cost per word. No API calls. No data leaving your device. No per-word billing. No free-tier caps.

That's not marketing hyperbole. It's a structural difference in the tech stack.

Locally run Whisper has a computational cost on your device, but it's cheap. Whisper-large costs a few gigabytes of disk space and a few seconds of latency per audio chunk. That's it. No metering. No word counter. No "you've hit your limit for this month" dialog.
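For a sense of how little code local transcription takes, here is a minimal sketch using the open-source openai-whisper Python package. This is not Recitey's implementation, which isn't described here. The helper names load_local_model and dictate are hypothetical, and the package also needs ffmpeg installed.

```python
from typing import Any, Protocol


class SpeechModel(Protocol):
    """Anything with a Whisper-style transcribe() method."""

    def transcribe(self, audio: str, **kwargs: Any) -> dict: ...


def load_local_model(name: str = "base") -> SpeechModel:
    """Load a Whisper model onto this machine (hypothetical helper).

    Imported lazily so the rest of the sketch runs without the package.
    Requires: pip install openai-whisper, plus ffmpeg on the PATH.
    """
    import whisper

    return whisper.load_model(name)


def dictate(audio_path: str, model: SpeechModel) -> str:
    """Transcribe one audio file on-device and return the rough text."""
    result = model.transcribe(audio_path)
    return result["text"].strip()


# Usage (assumes a local recording named memo.wav exists):
#   model = load_local_model("base")
#   print(dictate("memo.wav", model))
```

Nothing in this path counts words or calls a remote API, so there is nothing to meter.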

The pro tier isn't for dictation. The pro tier is for the rewrite: the cloud-based grammar polish that happens after the local transcription, turning rough voice notes into polished prose in about 2 seconds.

For a developer, this means: voice your way through a design doc at 11pm without watching the word meter. Finish the thought. Get it clean.
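The split between free local dictation and the paid cloud rewrite can be sketched as a two-stage pipeline. Everything here is illustrative: dictate_to_draft, the callable types, and the pro_enabled flag are hypothetical names, and the sketch assumes the rewrite step receives only the transcribed text, which this article does not spell out.

```python
from typing import Callable, Optional

# Audio path -> rough text. Assumed to run entirely on-device.
Transcriber = Callable[[str], str]
# Rough text -> polished text. Assumed to be the cloud step.
Rewriter = Callable[[str], str]


def dictate_to_draft(
    audio_path: str,
    transcribe: Transcriber,
    rewrite: Optional[Rewriter] = None,
    pro_enabled: bool = False,
) -> str:
    """Always transcribe locally; polish only when the pro toggle is on."""
    rough = transcribe(audio_path)
    if pro_enabled and rewrite is not None:
        # Only this branch talks to a remote service, and in this sketch
        # it passes text, never the audio itself.
        return rewrite(rough)
    return rough
```

With the toggle off, the function never touches the rewriter, so the free path has no variable cost and nothing to cap.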

How this works in the Marcus workflow

Marcus is a backend engineer at a Series B fintech in Stockholm. He uses Cursor because its tab-complete reduces the number of voice rewrites he has to do. He's skeptical of cloud transcription because his code is proprietary, and he refuses to send audio of his design reasoning to someone else's server.

On Monday at 11pm, Marcus is writing the design doc for a new settlement batching algorithm. He switches to voice.

With a capped free tier: at 600 to 1,000 words a month, he gets a few minutes of speech, the free tier maxes out, and he either upgrades or stops.

With Recitey: he talks for as long as he needs. The local Whisper does the transcription. It's rough. He hits the pro toggle. The cloud rewrite polishes it in about 2 seconds. He pastes it into the Notion doc. He's done thinking, so the thinking-to-draft cycle closes.

He also uses voice in Slack threads explaining investigations, in Linear ticket descriptions, in PR review comments. Every place he used to type an explanation, he now voices it. Cursor tab-complete suggests code from the intent he voiced. It's the same keyboard, but he gets more words per minute, because the bottleneck was never his fingers; it was his ability to compose while thinking.

The uncapped free tier is not a loss leader, a strategy to gain users and hope they upgrade. It's a feature. Developers who need limits can use the capped tools. Developers who need the freedom to think should have it.

Uncapped means you own the choice

Most premium voice tools assume you'll hit the limit eventually and upgrade. Recitey assumes that when you hit a limit, you'll shrug and switch to a different tool rather than pay to have the constraint lifted.

That's backwards, but it's also honest about the market. If you're pricing dictation, you're not pricing the transcription technology. You're pricing the distribution, the UI, the server bills, the support. All of that lives in the cloud rewrite and the polish, not in the dictation itself.

Developers get that. You choose your tools based on what they do and what they cost, not based on what level of paternalism you'll accept. If a tool won't show you what it's running locally, what model it's using, or what data leaves your device, you don't trust it.

Recitey does the opposite. It runs Whisper locally. It tells you what it's doing. It doesn't meter you.

That's the differentiator. Not speed, not accuracy. Freedom. The freedom to think out loud without watching a word counter, and without sending your design reasoning to the cloud.