108 days, then 54 more · 2026

I audited
my own hype.

A reproducible measurement of AI's effect on my own output, with the method and every number in the open.

I'd been throwing around 20 to 50 times for what Claude did to my output. A friend said prove it, so I measured it instead of guessing, and it came in under that: 12 to 20 times against a real peer. Five months in, that number turned out to be the least interesting thing I found. The interesting part is where the work moved. Claude made making cheap, so it took that. What's left is the part it can't do for me: deciding whether what it built is right.

For fifteen years I designed on an instinct I couldn't explain: people decide in the first second, before they know why. It worked, but I couldn't have told you the rule I was using. Writing the methodology down is when I finally could. It turned the instinct into rules, ones someone who isn't me can run and get the same answer. This audit is that same move pointed at myself. I'd been making a claim, so I checked it.

At a glance

The work moved from making to verifying.
That's where the next multiplier is.

AI multiplied my output, then my pace fell 40% as I came off a sprint. The multiple is real, and I only ever corrected it down. Speed turned out to be the boring part.

Jan 1 – Apr 18 · the 108-day audit

12–20× against a real AI-equipped peer, corrected down from my 20–50× brag. 4,861 subagent dispatches, a category of work that didn't exist before the tool.

Apr 19 – Jun 11 · 54 days later

Commit velocity fell 40% per active day, then climbed back inside brand-new tooling repos. Output held; the work moved up the stack to verification.

Every number is from git log, the filesystem, and the session transcripts. Swap any weight and re-run.

The audit window · three measures

The brag,
measured three ways.

Start with the brag, measured honestly. Every number here comes from git log, the filesystem, and the JSONL session transcripts; the Python script and raw CSVs sit in the linked folder, so swap any weight and re-run. Three ways of asking the same question.

Artifact-weighted

12×

range: 8× – 18×

What I shipped vs. a top-decile solo peer using AI tooling other than Claude Code (Cursor, Copilot, Sonnet) in the same 108 days.

Activity-volume

20×

range: 10× – 30×

All the work (my labor + parallel agents + tool-call savings, dedup'd against double-count) crammed into 108 days.

Per keyboard-hour

19×

range: 8× – 35×

Human-work-equivalent hours produced per hour I was at the keyboard, with stricter per-tool-call savings (1 min avg, not 5).

Plain version: every hour I was at the keyboard, about 19 human-work-equivalent hours came out. Over 108 days, that's roughly 2,156 workdays of throughput against my 728–1,092 keyboard-hours (91 active days × 8–12 hrs each). The 20–50× rhetoric holds only against a "no-AI 2026" baseline, which doesn't describe anyone actually working. Against real peers with AI, ~12–20× is the honest answer.

Method

The three formulas.

1. Artifact-weighted Equivalent Human-Days

Each ship category gets a weight: how many workdays a solo principal designer-dev without AI would need for one of them. I calibrated against my own pre-2025 pace. Simply Smart Home: $1.5M to $5M. Tantum: 3 startups a year. iO Theater: online ticket sales, 50% to 75% of volume.

// Artifact EHD
Stefan_EHD = Σᵢ (count_i × weight_i)
Baseline_EHD = same formula on solo-designer baseline output
Multiplier_artifact = Stefan_EHD / Baseline_EHD
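A minimal Python sketch of the artifact formula, for anyone who wants to swap weights and re-run. The counts are the ones in the shipped-artifacts table of this audit. The weights are illustrative placeholders, not the audit's values, with one exception: Cognograph's 250-workday cap is stated in the disclosures. The open-ended "75+" docs row is left out because it has no exact count.

```python
# Sketch of the artifact-weighted EHD formula.
# COUNTS come from the audit's shipped-artifacts table.
# WEIGHTS are HYPOTHETICAL placeholders (workdays per artifact), except
# cognograph_mvp = 250, which the audit states as its capped weight.

COUNTS = {
    "enterprise_sites": 7,
    "portfolio_sites": 2,
    "sites_in_build": 3,
    "provisional_patents": 4,
    "trademarks": 1,
    "book_chapters": 12,
    "oss_plugins": 8,
    "cognograph_mvp": 1,
}

WEIGHTS = {
    "enterprise_sites": 30,      # placeholder
    "portfolio_sites": 10,       # placeholder
    "sites_in_build": 15,        # placeholder
    "provisional_patents": 10,   # placeholder
    "trademarks": 5,             # placeholder
    "book_chapters": 5,          # placeholder
    "oss_plugins": 10,           # placeholder
    "cognograph_mvp": 250,       # stated in the audit
}


def ehd(counts, weights):
    """Equivalent Human-Days: sum of count_i * weight_i over the counted categories."""
    return sum(n * weights[name] for name, n in counts.items())


def artifact_multiplier(counts, baseline_counts, weights):
    """Ratio of two EHD totals computed with the same weights."""
    return ehd(counts, weights) / ehd(baseline_counts, weights)
```

To compare against a peer, pass that peer's counts for the same window as `baseline_counts`. Both sides share one weight table, so changing a weight moves both totals and only the ratio is reported.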

2. Activity-volume (compressed parallel work)

Counts what a single human physically cannot do serially. Parallel subagent dispatches, tool-call time savings, my own labor. All summed into one equivalent-human-workdays number, divided by 108 calendar days.

// Activity EHD
total_work_hrs = (subagents × hrs_parallel) + (tool_calls × min_saved/60) + Stefan_labor_hrs
Multiplier_activity = (total_work_hrs / 8) / 108
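The activity formula can be checked against the published counts. This is a reconstruction under stated assumptions, not the audit script. Hours per dispatch are back-derived from the "~1,800 workdays" figure for 4,861 dispatches at 8 hours per workday, and labor is taken as the midpoint of the 728–1,092 keyboard-hour range. The strict read counts half the tool calls at 1 minute each; the loose read counts all of them at 5 minutes.

```python
# Reconstruction of the activity-volume formula from counts stated in the audit.
# ASSUMPTIONS (not from the audit script): HRS_PER_DISPATCH is back-derived
# from "~1,800 workdays" of subagent work; LABOR_HRS is the midpoint of the
# stated 728-1,092 keyboard-hour range.

WORKDAY_HRS = 8
WINDOW_DAYS = 108
SUBAGENTS = 4_861
TOOL_CALLS = 220_269
HRS_PER_DISPATCH = 1_800 * WORKDAY_HRS / SUBAGENTS  # assumed, about 2.96 hours
LABOR_HRS = (728 + 1_092) / 2                        # assumed midpoint, 910 hours


def activity_multiplier(min_saved_per_call, counted_share):
    """counted_share: fraction of tool calls kept after dedup against subagent work."""
    agent_hrs = SUBAGENTS * HRS_PER_DISPATCH
    tool_hrs = TOOL_CALLS * counted_share * min_saved_per_call / 60
    total_work_hrs = agent_hrs + tool_hrs + LABOR_HRS
    return (total_work_hrs / WORKDAY_HRS) / WINDOW_DAYS


strict = activity_multiplier(min_saved_per_call=1, counted_share=0.5)
loose = activity_multiplier(min_saved_per_call=5, counted_share=1.0)
print(f"strict {strict:.1f}x, loose {loose:.1f}x")
```

Under these assumptions the two reads land near the reported 20× (corrected) and 39× (uncorrected). The gap between them comes from the tool-call term alone.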

3. Per keyboard-hour output

What most people mean by "output multiplier." Per hour I'm at the keyboard, how many human-work-equivalent hours come out, including the agents running in parallel. This is the one I trust most.

// Per keyboard-hour
total_output_hrs = Stefan_labor + agent_parallel_work + tool_call_savings
Stefan_kb_hrs = active_days × hrs_per_active_day
Multiplier_per_hour = total_output_hrs / Stefan_kb_hrs
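A quick arithmetic check of the per-hour figure, using only numbers stated in this audit: roughly 2,156 workdays of throughput, 91 active days, and 8 to 12 hours per active day. The one assumption is that a "workday" of throughput means 8 hours.

```python
# Per keyboard-hour check from figures stated in the audit.
# ASSUMPTION: one workday of throughput equals 8 hours.

WORKDAY_HRS = 8
ACTIVE_DAYS = 91
TOTAL_OUTPUT_WORKDAYS = 2_156


def per_hour_multiplier(hrs_per_active_day):
    """Output hours produced per hour at the keyboard."""
    kb_hrs = ACTIVE_DAYS * hrs_per_active_day
    return TOTAL_OUTPUT_WORKDAYS * WORKDAY_HRS / kb_hrs


low = per_hour_multiplier(12)   # longest days, most keyboard-hours
mid = per_hour_multiplier(10)
high = per_hour_multiplier(8)   # shortest days, fewest keyboard-hours
print(f"{low:.1f}x to {high:.1f}x, midpoint {mid:.1f}x")
```

With 10-hour days the result rounds to the headline 19×. The reported 8×–35× range is wider than this sketch produces, which suggests the full audit also varies inputs held fixed here.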

Data inputs

What I shipped,
what I measured.

Every number comes from three sources: git log across all my repos; a filesystem scan of clients, plugins, docs, and chapters; and JSONL parsing of 7,189 Claude Code session transcripts across 15 project directories (I run Claude Code from multiple cwds; the original single-dir scan missed ~30% of my activity).
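The multi-directory scan is the step that is easy to get wrong, so here is a minimal sketch of a streaming parser that walks several project directories. It is not the audit's session-audit.py. The record layout is an assumption: one `.jsonl` file per session, with tool calls appearing as content blocks whose `type` is `"tool_use"`. Adjust `count_tool_uses` to match the real transcripts.

```python
import json
from pathlib import Path


def count_tool_uses(record):
    """Count tool-call blocks in one record.

    ASSUMED schema: record["message"]["content"] is a list of blocks and a
    tool call is a block with type == "tool_use".
    """
    if not isinstance(record, dict):
        return 0
    message = record.get("message")
    if not isinstance(message, dict):
        return 0
    content = message.get("content")
    if not isinstance(content, list):
        return 0
    return sum(
        1 for block in content
        if isinstance(block, dict) and block.get("type") == "tool_use"
    )


def audit_transcripts(project_dirs):
    """Stream every *.jsonl file under each directory, one line at a time."""
    totals = {"sessions": 0, "tool_calls": 0, "bad_lines": 0}
    for root in project_dirs:
        for path in sorted(Path(root).glob("*.jsonl")):
            totals["sessions"] += 1  # assumption: one file per session
            with path.open(encoding="utf-8") as fh:
                for line in fh:  # never loads the whole file into memory
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        totals["bad_lines"] += 1
                        continue
                    totals["tool_calls"] += count_tool_uses(record)
    return totals
```

Passing every project directory in one call is the point: a scan of a single directory silently reports a smaller total with no error to flag the gap.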

Shipped artifacts (in window)

Category	Count
Enterprise-tier sites / engagements	7
Portfolio-tier sites shipped	2
Sites in active build (≥50%)	3
Provisional patents filed (77 claims drafted)	4
Trademarks filed with methodology framework	1
Book chapters drafted (publication-quality)	12
OSS plugins / skills / framework	8
Cognograph MVP (Electron + React + WebGL, 1,508 tests, 47 state stores)	1
Plans / substantive docs written	75+

Activity volume

Signal	Count
Git commits (Stefan-authored, 9 repos)	1,578
Lines of code added (gross)	3,018,688
Active commit days / 108	80
Active days across git + Claude Code / 108	91
Claude Code sessions (main + subagent, 15 project dirs)	7,189
Tool calls	220,269
Subagent dispatches (parallel work units)	4,861
Input + output tokens	128.3M
Cache-read tokens (context reuse)	40.4B

2024 same-window baseline on every activity axis: 0. Claude Code did not exist in this form.

Data nugget: cognograph-02 cwd contributed 1,799 sessions in window but zero subagent dispatches. That project pre-dated my Agent-tool adoption. Subagents only show up in the workspace cwd starting March. Cache-read / I+O ratio is ~315×, which you'd expect from heavy long-session context persistence via the prompt cache.

Disclosures

What I checked,
what I skipped.

The "20–50×" phrase was mine, not Claude's. It was a casual brag, not a number anyone measured. Claude flagged that line as marketing filler and cut it from a draft, which is what sent me to the logs. This audit is the first time anyone ran the math on it. I didn't build it to back the brag up. I built it to find out if it held.
Every weight is in the open. Counts are objective (git, filesystem, JSONL). Weights are subjective (workdays a principal solo would need per artifact). If you don't like my weights, swap in your own. Artifacts in the companion folder: session-audit.py (streaming parser, per-line JSON, handles 30M+ rows), session-audit-raw-multiproject.csv (per-session rows), audit-totals-multiproject.json (aggregates), plus 01-git-audit.md / 02b-session-audit-multiproject.md / 03-ship-artifacts.md.
Four rounds of self-correction on the inputs. Every time I pushed back on the first pass, the number went up, not down. First-pass weights were at median-senior instead of principal-tier. The per-hour measure initially left out agent work (structurally wrong). The ship list got expanded three times because I kept remembering clients Claude had missed. When every input correction moves the same direction, the first pass was just too conservative to start with. The corrections that pulled the headline down are separate: the stricter baseline and the double-count fix, both disclosed below.
Cognograph's weight is a lowball on purpose. A solo designer-frontend without AI isn't shipping Cognograph (Electron + React + custom WebGL shader pipeline + SaaS backend, ~50K LOC source) in 108 days. Realistically impossible. I capped the weight at 250 workdays as a proxy, because "impossible" doesn't compute.
Baseline choice drives half the number. Against "solo designer-frontend without AI, 2026" (a hypothetical nobody actually is), the artifact multiplier is 19–41×. Against "top-decile solo peer with Cursor + Copilot + Sonnet" (what your competitors actually use), it's 8–18×. I default to the stricter baseline in the headline because the no-AI comparison is a ghost. Calibration anchor for both: my own pre-2025 pace. Simply Smart Home drove 233% YOY, Tantum shipped 16 startups on broadcast timelines, iO Theater online ticket sales moved from 50% to 75% of volume.
Confounders I can't control for. Some output credit belongs to (a) 15 years of prior expertise (PFD is assembled from stuff I already knew, not invented in window); (b) Cognograph's pre-2026 foundations; (c) urgent runway, which changes working hours and focus regardless of tooling. "Claude's multiplier" is entangled with "Stefan's 2026 conditions." The numbers don't disaggregate that.
Activity-volume has a double-count risk I corrected for. Subagent dispatches execute their own tool calls, so those tool calls appear both in the 220,269 total AND as subagent parallel work. The stricter read dedup's ~50% of tool calls against the subagent bucket and cuts per-call time savings from 5 min to 1 min. Without that correction, the activity multiplier is 39× (inflated); with it, 20×.
The fact that's hardest to argue with: 4,861 subagent dispatches in 108 days. Work that literally didn't exist as a human-possible category before the tool shipped. That alone is ~1,800 workdays of parallel execution, nearly 17× one person's 108-day calendar capacity. Before I count anything I did myself.
This was a snapshot, and I went back to check it. Twice. See the postscript below: 46 days later, raw velocity was down about 40% per active day and the work had moved from execution to verification; 8 days after that it was climbing back, almost entirely inside new tooling repos. The headline number is unchanged.

The turn · 54 days later

Then I
slowed down.

The audit above closed on April 18. By June 3 I had another 46 days of logs, so I ran the same queries again, and on June 11 I ran them a third time. A number you report once and never check again is just PR. The re-runs showed the part the first audit was too early to catch.

Raw velocity dropped, coming off a sprint.

In the 108-day audit window I committed at about 19.7 commits per active day and 14.6 per calendar day (that's in 01-git-audit.md, not new math). Across the 46 days since, the same git query across 15 repos gives 370 commits over 32 active days: about 11.6 per active day and 8.0 per calendar day. That's roughly 40% fewer commits per day I actually worked, and my active-day density fell from 74% to 70%. By the crude "how fast is he going" reading, I slowed down by about 40%.
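The velocity comparison is plain division, sketched here so the percentages can be re-derived. All inputs are stated in this audit: 1,578 commits over 80 active days in the 108-day window, then 370 commits over 32 active days in the following 46.

```python
# Velocity comparison from commit counts stated in the audit.

def velocity(commits, active_days, calendar_days):
    """Commit rates and active-day density for one window."""
    return {
        "per_active_day": commits / active_days,
        "per_calendar_day": commits / calendar_days,
        "density": active_days / calendar_days,
    }


audit = velocity(commits=1_578, active_days=80, calendar_days=108)
gap = velocity(commits=370, active_days=32, calendar_days=46)

# Relative drop in commits per active day between the two windows.
drop = 1 - gap["per_active_day"] / audit["per_active_day"]
print(f"{audit['per_active_day']:.1f} -> {gap['per_active_day']:.1f} per active day ({drop:.0%} drop)")
# prints: 19.7 -> 11.6 per active day (41% drop)
```

Per calendar day the drop is a little steeper, because active-day density also slipped from about 74% to about 70%.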

I'm putting that up front, and I'll be straight about why it dropped. Most of it is me coming off a sprint. The first four months were 8 to 14 hour days, back to back, to launch Cognograph, and I ran hot because I was enjoying it, the flow and the learning and the challenge. That pace doesn't hold, so I let it come back to normal. The work changed too: the sprint was heads-down app-building, which throws off commits by the hundred, and once Cognograph launched I moved to client work, consulting, and keeping production sites running, which doesn't. Commit count was never the point anyway.

Commits / active day

11.6

audit: 19.7 · last 8 days: 15.3

Down ~40% per day worked, then up a third from that low. Slower hands, by the raw count.

What shipped in the gap

6 client sites rebuilt or advanced

Plus a public QA pipeline, a four-plugin agency toolchain, and a multi-tenant platform. Volume held; scope widened.

Where the work moved

execution → orchestration

Fewer one-off commits, more compounding tools that do the next ten jobs for me.

Output didn't fall. It moved up the stack.

Fewer commits, more leverage. In the same 46 days I rebuilt or advanced six client sites, moved several long-standing clients onto a new production stack, and spent a real chunk of the time building tools instead of pages: aurochs-qa-pipeline (open-sourced, public on GitHub), a four-plugin toolchain that runs my voice, scrub, SEO/GEO and recall workflows, and a multi-tenant platform to host client sites. None of that throws off many commits for the hours it eats. So some of the drop is the work moving to tooling. Most of it is just fewer hours.

Then it climbed back, as tooling.

Eight more days, third run, June 11: 92 commits across 6 active days, about 15.3 per active day, up a third from the 46-day low. Before extending the window I re-ran the published one as a check; it reproduced within 1.5 percent (375 commits across 16 repos against the 370 across 15 I reported; one small repo joined the walk). The interesting part is where the rebound lives: 75 of the 92 commits sit in repos that did not exist on June 3. An operations dashboard whose plan went through adversarial review before a line of code existed. The review caught three real defects on paper. My operating-rules kit, the corrective record this essay keeps pointing at, turned into an installable plugin that verifies its own install. A fresh agent with no memory of me ran it cold and passed every check. And a client portfolio site that went from sit-down to live on the client's own domain in a single working day, because the scaffolding already existed. The velocity that came back isn't the sprint returning. It's the spec-and-verify layer paying for itself.
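The reproduction check and the rebound can be expressed as a few lines of Python. The numbers are the ones reported above. Treating 1.5 percent as a pass/fail tolerance is an assumption made for this sketch; the essay reports it as the observed deviation.

```python
# Re-run check and rebound arithmetic from figures stated in the essay.

def reproduces(published, rerun, tolerance=0.015):
    """True when a re-run lands within a relative tolerance of the published count.

    The 1.5% default is an ASSUMED threshold, taken from the reported deviation.
    """
    return abs(rerun - published) / published <= tolerance


published_window_ok = reproduces(published=370, rerun=375)

low_rate = 370 / 32        # commits per active day, 46-day window
third_run_rate = 92 / 6    # commits per active day, June 4 through 11
rebound = third_run_rate / low_rate - 1

new_repo_share = 75 / 92   # commits in repos created after June 3

print(f"reproduced: {published_window_ok}")
print(f"rebound: {rebound:.0%}, share in new repos: {new_repo_share:.0%}")
```

Running the reproduction check before extending the window matters: if the published window had drifted by more than the tolerance, the new eight days would be compared against a baseline that no longer matches what was reported.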

The bottleneck moved.

The making got fast. The finishing didn't. The part where "plausible" has to become "correct" is exactly as slow as it always was. The hard part of every pipeline I run now isn't building the thing, it's checking the thing: the QA and error-correction layer, the polish, the last mile. On a recent enterprise WooCommerce maintenance pass the build was quick, because the pipelines work, and then three days of refinement followed anyway. The win there isn't killing the iteration, it's cutting it in half. That gap, fast to make and slow to check, is where my week goes now.

Here's a concrete one. A few weeks back I built a site for a brand I know inside out, one I'd had in my head for a long time but never had the budget or the say to make. I handed my pipeline the context I'd kept, the brand, the products, the photography, and said build what I always pictured. It came back in under an hour, close enough to production that it just needed polish. The three days went to the polish, mostly rebuilding the image pipeline until the lifestyle shots looked real. Six months ago, before I'd built the scaffolding, the same prompt wouldn't have come back like that. The generation was the fast part. The finishing was the work.

The car essay argued this. The logs are the receipt.

In "You Wouldn't Vibe Code a Car" I argued the skill moves out of execution and into quality control on both ends: knowing what good looks like before anything gets built, and telling whether the output hit that mark after it's rendered. The 54 days bear that out. Not in the commit count (that dropped mostly because I stopped overworking) but in where the remaining hours went: building tools and checking, the spec-and-verify ends. When the middle gets cheap, the value moves there.

So the recurring hard problem is automating the check, not the build.

Which raises the real question: if checking is the bottleneck, can the check itself be derived and automated, the way the building already is? I think yes, more often than people expect. The week of the third run kept answering it: a plan reviewed adversarially before any code existed, an install that checks itself, a nav-and-footer cleanup proven by file hash instead of eyeballing, a certificate watcher whose alarm path I failure-tested end to end.
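One way a check like the file-hash one might look, as a sketch only. The essay does not describe the actual implementation, so the function names and the "allowed files" idea here are hypothetical: hash every file before and after a change, then confirm that nothing outside the intended set moved.

```python
import hashlib
from pathlib import Path


def tree_hashes(root):
    """Map each file's relative path to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """Paths that were added, removed, or whose contents differ."""
    return sorted(
        name for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )


def unexpected_changes(before, after, allowed):
    """Changed paths that are NOT in the set the edit was supposed to touch."""
    return sorted(set(changed_files(before, after)) - set(allowed))
```

Take `tree_hashes` of the site before the cleanup and again after it. An empty result from `unexpected_changes` means only the intended files changed, which is a stronger claim than a visual pass over the pages.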

Where it nets out: velocity fell about 40% per active day coming off the sprint, then climbed back inside new tooling repos. Output held, and the audit-window 12 to 20 times still stands, a number I only ever corrected down. The thing worth keeping, though, is where the remaining hours went: out of making, into judgment. And judgment is the thing I've spent fifteen years learning to write down. Make the part people call taste explicit, and checking it gets as cheap as making it. The human stops rendering and starts specifying, checking, orchestrating. That's a bigger job than building ever was, and it's the one those fifteen years were getting me ready for.

Method note: post-audit figures from git log across the 15 repos I committed to between April 19 and June 3 (six carry over from the original audit's nine; the other nine are new since, mostly the tooling and platform I built in the gap): 370 commits, 32 active days, every commit my own. Audit-window baselines (19.7 per active day, 14.6 per calendar day, 80 active commit-days) are unchanged from 01-git-audit.md. Third run 2026-06-11: the published window reproduced within 1.5 percent (375 commits, 16 repos; one small repo joined the walk), then June 4 through 11 added 92 commits over 6 active days, 15.3 per active day, across three repos new since June 3 plus four carryovers. Same as before: swap any input, re-run.
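For readers who want to re-run the commit counts, a hedged sketch of the git walk. The exact query used for the audit is not published in this essay, so the flags below are one reasonable choice, and the repo paths and author string are placeholders. It uses committer dates for both filtering and output so the two stay consistent.

```python
import subprocess


def commit_dates(repo, author, since, until):
    """Return one YYYY-MM-DD committer date per matching commit in a repo.

    Pass full timestamps (for example "2026-04-19 00:00:00"), since git may
    fill in the current time of day when only a date is given.
    """
    result = subprocess.run(
        [
            "git", "-C", str(repo), "log",
            f"--author={author}",
            f"--since={since}",
            f"--until={until}",
            "--date=short",
            "--pretty=format:%cd",
        ],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def summarize(dates_by_repo):
    """Total commits and distinct active days across all repos."""
    all_dates = [d for dates in dates_by_repo.values() for d in dates]
    commits = len(all_dates)
    active_days = len(set(all_dates))  # a day counts once even if several repos saw commits
    per_active_day = commits / active_days if active_days else 0.0
    return {"commits": commits, "active_days": active_days, "per_active_day": per_active_day}


# Usage (placeholder paths and author):
# dates = {r: commit_dates(r, "Stefan", "2026-04-19 00:00:00", "2026-06-03 23:59:59")
#          for r in ["/path/to/repo-a", "/path/to/repo-b"]}
# print(summarize(dates))
```

Active days are counted on the union of dates across repos. That is why adding a repo can raise the commit total without raising the active-day count.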

Full audit · 00-FINAL-multiplier-report.md · companion receipts in the same folder.
