The 11pm Design Doc: Why Local Dictation Changes How Engineers Write

Backend engineers spend their days writing code, but they spend their nights writing specifications. A design doc at 11pm. A Slack thread explaining a settlement bug. A PR description that connects three different conversations. The work shifted from typing to talking through complexity, and most engineers haven't noticed they're now fighting their tools.

Marcus, a payment settlement engineer at a Series B fintech in Stockholm, hit this wall three months ago. He'd move to a design doc, start explaining system architecture out loud, and hit the word limit of his cloud dictation tool halfway through his thinking. The service would cut off. He'd lose the thread. By morning, the prose was fragmented, and he'd spend twenty minutes cleaning up what his own voice had produced.

The bottleneck isn't typing speed anymore. It's coherence.

The Shift From Code to Prose

Backend engineers used to have a clear division of labor: code in the afternoon, code review in the evening, maybe a brief Slack message. The new workflow flips that. You're in Cursor, you're describing to Claude what the settlement logic should do, you're explaining edge cases in a design doc, you're writing a follow-up thread that connects three different incidents.

Speech-to-text made sense as an accessibility tool. But for developers working with LLMs, it's become infrastructure. The problem is that most voice tools were designed for podcasters, not engineers explaining systems at midnight.

Developers have different constraints. If you're dictating an incident postmortem, you need to capture forty minutes of explanation without interruption. If you're drafting a design doc before review, you need the raw thought, not a polished sentence (you'll edit that later). If you're explaining settlement logic to a junior engineer, you need the recording, not just the transcript.

And there's a constraint nobody talks about: IP. Cloud-based transcription means your explanation of proprietary settlement logic, your exact error messages, and your specific system architecture all travel to a server you don't control.

Word Caps as a Workflow Killer

This is where commercial cloud dictation tools hit their limit. Wispr Flow charges $14/month for uncapped access. Willow charges $12/month. Superwhisper charges $8.49/month. All three meter free-tier access heavily.

The theory sounds fine: upgrade to unlimited. But the reality of a design doc at 11pm is different. You start explaining the system, you're deep in the logic, and the service stops listening. You have to restart the tool, re-establish context, and re-dictate what you just finished.

This breaks something subtle but critical: narrative momentum. The thinking is connected. By the time the service stops you, you've lost the thread that connected the concepts you were explaining.

For Marcus, this happened during a design doc. He was forty minutes in, mid-explanation of reconciliation logic, when the cap triggered. He restarted the tool and lost the coherence of the complete system explanation. What should have been one connected argument became two fragmented pieces he had to stitch together the next morning.

The Case for Local-First Dictation

This is where architecture matters. If dictation runs on your device, there's no variable cost per word. Whisper, the speech-to-text model released by OpenAI in 2022 with support for 99 languages, doesn't charge you for each transcription. The model runs locally and returns text.
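To make the "runs locally" point concrete, here is a minimal sketch using the open-source `openai-whisper` Python package. It is not how Recitey is implemented; the function names `transcribe_locally` and `format_segments` are illustrative, and the sketch assumes the package and `ffmpeg` are installed on the machine.

```python
from typing import Iterable, Mapping


def transcribe_locally(audio_path: str, model_name: str = "base") -> dict:
    """Run Whisper on this machine and return its result dictionary."""
    # Imported lazily so the helper below works even without the package.
    # Assumption: installed with `pip install openai-whisper`, ffmpeg on PATH.
    import whisper

    model = whisper.load_model(model_name)  # weights download once, then load from cache
    return model.transcribe(audio_path)     # inference runs on the local device


def format_segments(segments: Iterable[Mapping]) -> str:
    """Render Whisper-style segments as '[mm:ss] text' lines for later editing."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg["start"]), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg['text'].strip()}")
    return "\n".join(lines)
```

A call such as `format_segments(transcribe_locally("design_doc.wav")["segments"])` would produce a timestamped draft, where `design_doc.wav` is a placeholder file name. Nothing in this path is billed per word, so the length of the recording does not change the cost.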

Recitey uses Whisper locally on your device. No caps. No word metering. No counting. You can dictate your entire three-hour incident investigation, your entire design doc, your entire customer explanation. The tool doesn't interrupt you because there's no cost structure that incentivizes interruption.

The trade-off is honest: local transcription is raw. Whisper will capture filler words, repeated phrases, maybe a misheard name. Cloud tools polish that transcript into something that reads like you meant it. But that polish comes at a cost, literally and structurally. You're now paying a subscription to avoid interruption, and you're also sending your company's proprietary systems explanations to a third-party service.

For developers, the local-first approach flips the value curve. The transcription is raw, but Recitey can then offer an optional cloud rewrite (using a more capable model) if you want to polish the draft. You control when data leaves your device. There is no word limit to manage. You control whether your IP explanation needs a trip upstream.

What Changes When the Cap Disappears

Marcus switched to local dictation three weeks ago. The first change was psychological: he stopped planning around limits. With a metered tool, you self-throttle. You think about how much time you've got left before the cap, you rush through the explanation, you hit record and race.

With no cap, you slow down. You explain the edge case you would have skipped before. You think through the full system architecture instead of the abbreviated version. The document is longer, but it's more complete.

The second change is in the editing workflow. Raw transcription means you're rewriting anyway. But you're rewriting from a complete thought, not a fragmented one. An engineer on our team described this moment: "I used to lose six hours the next morning reconstructing what I was explaining. Now I lose forty minutes cleaning up the transcription. That's actually sustainable."

The third change, the one Marcus noticed most: you don't stop talking. You don't check the metering. You don't have anxiety about how much text you've generated. It's transparent. The tool disappears. You're just thinking out loud into text.

Why IP Matters More Than You Think

For Marcus, the IP constraint was decisive. At a fintech, some systems explanations are sensitive. Settlement logic, error handling for specific regulatory scenarios, the exact way reconciliation systems approach edge cases. These aren't "trade secrets" in a legal sense, but they're also not something you want traveling to OpenAI's servers or any third-party provider.

Local-first dictation means Whisper runs on your device. Your device, your data. The explanation never travels upstream unless you ask it to (and if you do use the cloud rewrite option, you're doing it deliberately, not invisibly).
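The "local first, opt-in upstream" pattern can be sketched in a few lines. This is an illustration of the design idea only, not Recitey's code: `Draft`, `finalize`, `allow_upload`, and the `cloud_rewrite` callable are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    text: str
    left_device: bool = False  # True only if the text was sent upstream


def finalize(
    raw_transcript: str,
    cloud_rewrite: Optional[Callable[[str], str]] = None,
    allow_upload: bool = False,
) -> Draft:
    """Return the local transcript unless the user explicitly opts in to a rewrite."""
    # The default path never touches the network: both conditions must hold
    # before the transcript is handed to the (hypothetical) cloud rewriter.
    if allow_upload and cloud_rewrite is not None:
        return Draft(text=cloud_rewrite(raw_transcript), left_device=True)
    return Draft(text=raw_transcript)
```

The design choice worth noting is the default: `allow_upload` is `False`, so forgetting to set a flag keeps the data on the device rather than sending it.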

This isn't paranoia. It's a real architectural constraint that backend engineers live with. Most SaaS dictation tools make a conscious design choice to send everything to the cloud. Recitey makes the opposite choice: local first, opt-in upstream.

The Trade-Offs Are Real

The honest version: local dictation doesn't give you a polished sentence. Whisper will mishear some words. It won't perfectly distinguish between speakers. It won't flawlessly parse every acronym.
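Part of that editing pass can be scripted. The sketch below is one possible first pass, assuming a small hand-built glossary of terms the model tends to mishear; the glossary entries and the `clean_transcript` name are made up for illustration, and the repeat-collapsing rule will also merge intentional repeats such as "that that".

```python
import re

# Hypothetical glossary: misheard form -> intended term.
# Build a real one from your own transcripts.
GLOSSARY = {
    "item potent": "idempotent",
    "post gress": "Postgres",
}

FILLERS = re.compile(r"\b(?:um+|uh+|erm)\b[,.]?\s*", re.IGNORECASE)
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_transcript(raw: str) -> str:
    """Strip fillers, collapse stuttered words, and apply glossary fixes."""
    text = FILLERS.sub("", raw)
    text = REPEATS.sub(r"\1", text)  # "the the" -> "the"
    for heard, meant in GLOSSARY.items():
        text = re.sub(rf"\b{re.escape(heard)}\b", meant, text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()
```

A pass like this handles the mechanical noise; misheard names and speaker attribution still need a human read.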

What it does give you: no metering, no caps, no delays, and no surveillance of your proprietary work. And for developers building in the age of LLMs, where voice has become the fastest way to explain complex systems, that's worth the editing pass the next morning.