Free Tier Is Local Whisper, Uncapped

Most voice dictation tools charge you before you figure out if you actually want voice in your workflow. The free tier has a word cap: 600 words on Wispr, 500 on Willow, a couple thousand on Superwhisper before the paywall kicks in. You hit the limit mid-thought, and the flow breaks.

The Shape of the New Development Workflow

You're not typing code the way you did five years ago.

That's not a value judgment. It's an observation about what changed. The keyboard still does the same thing. Your fingers still move. But what flows through the keyboard isn't syntax anymore. It's intent.

You write PR descriptions now: "Refactor the payment settlement retry logic to handle idempotency keys, because we're seeing duplicate charges on 2% of high-volume merchants, and the current exponential backoff is too aggressive." That's a paragraph. You explain the bug in a Slack thread that will get linked from Linear and carried into your next standup. Another paragraph. You're documenting the design decision in Notion at 11pm because you're going to forget the constraint if you don't capture it now: "The gateway can't be swapped without a migration that touches 47 microservices, so we're shimming the new logic in the payment processor instead."

The bottleneck isn't typing speed. The bottleneck is that typing intent takes time your brain would rather spend on the intent itself.

Why Word Caps Are Invisible Paywalls

A word cap doesn't feel like a price. It feels like a feature limit.

But it's actually a paywall. It's a way to charge you not for the tool, but for permission to keep thinking.

You're writing a design doc at 11pm and you hit flow. The decision tree is clear in your head. You're dictating the context, the reasoning, the constraints you just realized matter. You're at 600 words when Wispr stops transcribing. The word counter says "Upgrade to continue." You can type the rest. You will. But the flow is broken. The voice changes on the page. You'll come back the next morning to clean it up, you'll lose 20 minutes to it, and you'll still have three other design decisions that need the same treatment.

That's not a feature limit. That's a business model wrapped in UX.

Recitey doesn't meter the free tier. There's no word counter because there's nothing to count. The first 600 words, the next 600, the next 3000, they all cost the same: zero. Whisper runs locally. You own the compute. The company doesn't pay a per-word fee to a speech API somewhere, so there's no per-word cost to pass on to you. The free tier never hits a wall. You keep dictating for free until you decide you want the raw output polished into a rewrite.
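Here's a rough sketch of the difference, not anyone's actual implementation. `DictationMeter` is a hypothetical name, and the 600-word cap and the 580 words already used are just the numbers from the story above. A capped meter truncates whatever you say once the budget runs out. An unmetered one has no budget to check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictationMeter:
    """Hypothetical free-tier meter; word_cap=None models an unmetered tier."""

    word_cap: Optional[int] = None
    words_used: int = 0

    def accept(self, chunk: str) -> str:
        """Return the part of `chunk` that still fits under the cap."""
        words = chunk.split()
        if self.word_cap is not None:
            remaining = max(self.word_cap - self.words_used, 0)
            words = words[:remaining]  # everything past the cap is dropped
        self.words_used += len(words)
        return " ".join(words)


capped = DictationMeter(word_cap=600, words_used=580)
uncapped = DictationMeter(words_used=580)
sentence = " ".join(["word"] * 50)  # a 50-word thought

print(len(capped.accept(sentence).split()))    # 20
print(len(uncapped.accept(sentence).split()))  # 50
```

The capped meter keeps 20 of the 50 words and silently loses the other 30. That's the mid-thought cutoff.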

Local Versus Cloud Is a Technical Decision, Not a Privacy Sentiment

The privacy angle matters to some users. It matters to Marcus.

Marcus is a backend engineer at a payment settlement fintech. He's not paranoid about surveillance. He's practical. When he's writing a design doc about a retry algorithm for payment idempotency, the specific code paths and edge cases in that document are his company's IP. Not in a "corporate paranoia" way. In an "if a competitor's engineer could read the specific constraints we're working within, they'd know the exact gap in our settlement speed compared to theirs" way.

He uses Wispr Flow right now, but he limits voice to public Slack threads and public Linear tickets. For the private design docs, he switches back to typing. It's a context switch. It kills the flow of the dictation workflow. He's thought about Otter.ai, which has enterprise features, but that's another subscription, another integration, another data agreement to read.

Local Whisper changes the equation. The design doc stays on his device. The audio transcription stays on his device. The decision tree lives in Notion, but the raw voice input never left his laptop. That's not ideology. It's a structural property of the tool. He can use voice for everything without thinking about whether some infrastructure somewhere is seeing code.
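To make "structural property" concrete, here's a minimal sketch under stated assumptions. `PushToTalkSession` and its method names are hypothetical, hold-a-key-to-talk is assumed as the interaction, and `fake_local_model` stands in for a locally loaded Whisper model so the example runs without one installed. The point is the shape: the audio buffer is only ever handed to a local function, and nothing in the class knows how to reach a network.

```python
from typing import Callable, List


class PushToTalkSession:
    """Buffers audio while a key is held and transcribes it on release.

    `transcribe` is any local callable (hypothetical here) that turns raw
    audio bytes into text; this class itself never opens a connection.
    """

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe
        self._frames: List[bytes] = []
        self._recording = False

    def key_down(self) -> None:
        self._frames.clear()
        self._recording = True

    def on_audio_frame(self, frame: bytes) -> None:
        if self._recording:  # frames outside a key press are dropped
            self._frames.append(frame)

    def key_up(self) -> str:
        self._recording = False
        audio = b"".join(self._frames)
        self._frames.clear()  # keep no audio around after transcription
        return self._transcribe(audio) if audio else ""


def fake_local_model(audio: bytes) -> str:
    """Stand-in for an on-device model; a real one would return the spoken text."""
    return f"<{len(audio)} bytes transcribed on device>"


session = PushToTalkSession(fake_local_model)
session.key_down()
session.on_audio_frame(b"\x00\x01" * 160)
session.on_audio_frame(b"\x00\x01" * 160)
print(session.key_up())
```

Swapping `fake_local_model` for a real local model changes what comes back, not where the audio goes.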

It's also faster. With no network round-trip and no cloud queue to wait in, transcription starts the moment you release the key. Accuracy holds up too: Whisper-large-v3 hits 96.3% word accuracy on LibriSpeech (a standard speech recognition benchmark).
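If you're wondering what a "word accuracy" number measures, it's commonly reported as one minus the word error rate (WER): the word-level edit distance between the transcript and a reference, divided by the reference length. The sketch below assumes that reading, which this post doesn't spell out. The function name and both sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edits to turn the reference words seen so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: what was said vs. what a transcriber might produce.
reference = "retry the settlement call with the same idempotency key"
hypothesis = "retry the settlement call with same item potency key"

wer = word_error_rate(reference, hypothesis)
print(f"WER {wer:.1%}, word accuracy {1 - wer:.1%}")
```

Under that reading, 96.3% word accuracy works out to roughly 37 word errors per 1,000 reference words.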

The Prosody Shift: Voice Is for Intent, Editing Is for Rewrite

Here's the mental model that actually works with voice.

Voice is not perfect dictation. It's not meant to be. You're not aiming for 100% accuracy on the first pass. You're aiming for fast capture of the thinking, while the thinking is happening. Your first-pass prose will have dropped words, repeated fragments, grammar that makes sense when you hear it but looks odd on the page.

That's the deal with voice.

The second deal is the rewrite. That's where the editing happens. Recitey's Pro tier handles that: the local voice capture is free, but the cloud rewrite, the polishing that turns raw dictation into client-ready prose, that's paid. You're paying for the service of making it clean. Not for the permission to keep talking.

It's a fundamentally different paywall model. A word cap charges you for the tool. Recitey charges you for the polish. The free local tier targets developers in the drafting phase. The paid rewrite targets professionals who need publication-ready output from day one.

The Trade-Off You're Making

The trade-off is worth saying out loud.

Local Whisper is never going to be as polished as a cloud system that can hear context, fix grammar, and understand what you meant to say under the fumbling. Cloud systems have more data, more compute, and more time to think. Local is fast and private and structurally simple. It's not magic. It's a tool that works well at a specific thing (capturing intent while dictating) because it's not trying to do every other thing.

For Marcus, that's the tool. For someone who needs perfect transcription without any editing, cloud makes more sense. For someone who's running Cursor and writing intent-heavy design docs and PR descriptions and Slack threads explaining edge cases to the team, local Whisper without a word cap is the right call.

The free tier isn't a demo. It's the real tool.