What we measure — and why.

ConductorScore is measure of agentic engineering skill. It ranges from 0-100. Higher is better. It is computed from your last 30 days of Claude Code transcripts plus your GitHub commit history. The score is a sum of five components, described below.

The source code is at https://github.com/flatironconsulting/conductorscore

Leverage/ 60 pts

How hard the agents work for you. The dominant component — 60% of the score.

Sub-metric	Raw unit	Threshold	Ceiling	Max pts
Agent leverage Total agent time divided by your keyboard time. Includes parallel subagent minutes — running three agents at once for an hour counts as three agent-hours.	agent-minutes per human-minute (×)	1× (no leverage baseline)	7× total leverage	+45
Longest agent run Longest contiguous AFK streak in the window. Rewards trusting the agent to run without you.	hours	0 hr	4 hr	+15

Formula: maxPts × log(1 + raw) / log(1 + ceiling), clamped to [0, maxPts]. Applied independently to agent leverage and longest run, then summed. Log scaling means early gains are worth more per unit than late ones — going from 1× → 3× leverage earns more than 18× → 20× total.

Craft/ 18 pts

Absence of anti-patterns. Each anti-pattern contributes zero penalty below its penalty threshold and saturates at its penalty ceiling.

Sub-metric	Raw unit	Threshold	Ceiling
Coding without a plan Significant edits dispatched without a prior plan signal.	fraction of significant edits	15%	100%
Revert rate Edits the user immediately undoes after the agent applies them.	reverts per session	0.1	1
Repetitive prompts Pairs of user prompts whose content overlaps materially — usually a sign the first attempt didn't land.	fraction of long prompts	10%	50%
Rage quits Sessions ending in abrupt termination after a failed tool sequence.	per 100 sessions	1	20
Auto-compaction rate Times Claude Code had to auto-compact context — proxy for letting sessions sprawl past the context window.	events per 100k tokens	0.1	2
Redundant approvals Times the user re-approved an already-trusted tool signature in the same session.	per session	0.5	10

Formula: 18 × (1 − mean_penalty / 100). Mean is taken across the six anti-patterns above. Each anti-pattern maps linearly from 0 penalty at the threshold to 100 penalty at the ceiling, clamped above.

Customization/ 7 pts

Skill / MCP / plugin invocations — both how much you use them (volume) and how often you actually reuse what you install (depth).

Sub-metric	Raw unit	Threshold	Ceiling
Volume Absolute scale gate — prevents a user with 5 invocations across 1 skill from saturating.	total invocations across skills / MCP / plugins / custom commands	0	200
Depth Subtracting 1 floors single-use installs at zero — installing 50 skills and using each once gets zero depth.	average invocations per distinct surface − 1	0×	4×

Formula: 7 × volume × depth. Both factors required — gates tiny-volume saturation and window-dressing.

Output/ 10 pts

Shipped code over the 30-day window, counted from your CLI transcripts — commits and lines changed.

Sub-metric	Raw unit	Threshold	Ceiling	Max pts
Commits `git commit` calls counted from your transcripts (summed per provider).	git commit invocations	0	50	+5
Lines changed Lines edited, counted from your transcripts (summed per provider).	lines edited	0	3,000	+5

Formula: maxPts × log(1 + raw) / log(1 + ceiling), clamped to [0, maxPts]. Applied independently to commits and lines. Log scaling means early progress is worth more per unit than late progress — going from 0 → 10 commits earns more than 40 → 50.

Efficiency/ 5 pts

Cost per commit — the only money signal that survives the cache. Target $20/commit, saturates at $2/commit.

Sub-metric	Raw unit	Threshold	Ceiling	Max pts
Cost per commit Below the threshold you earn partial credit; at or below the ceiling you saturate at full points. Lower is better, so the threshold is the higher dollar value.	USD per authored commit	$35	$10	+5

Formula: 5 × clamp(0, 1, log(target/cpc) / log(target/floor))