ConductorScore_

What we measure — and why.

ConductorScore is measure of agentic engineering skill. It ranges from 0-100. Higher is better. It is computed from your last 30 days of Claude Code transcripts plus your GitHub commit history. The score is a sum of five components, described below.

The source code is at https://github.com/flatironconsulting/conductorscore

Leverage/ 60 pts

How hard the agents work for you. The dominant component — 60% of the score.

Sub-metricRaw unitThresholdCeilingMax pts
Agent leverage
Total agent time divided by your keyboard time. Includes parallel subagent minutes — running three agents at once for an hour counts as three agent-hours.
agent-minutes per human-minute (×)1× (no leverage baseline)7× total leverage+45
Longest agent run
Longest contiguous AFK streak in the window. Rewards trusting the agent to run without you.
hours0 hr4 hr+15

Formula: maxPts × log(1 + raw) / log(1 + ceiling), clamped to [0, maxPts]. Applied independently to agent leverage and longest run, then summed. Log scaling means early gains are worth more per unit than late ones — going from 1× → 3× leverage earns more than 18× → 20× total.

Craft/ 18 pts

Absence of anti-patterns. Each anti-pattern contributes zero penalty below its penalty threshold and saturates at its penalty ceiling.

Sub-metricRaw unitThresholdCeiling
Coding without a plan
Significant edits dispatched without a prior plan signal.
fraction of significant edits15%100%
Revert rate
Edits the user immediately undoes after the agent applies them.
reverts per session0.11
Repetitive prompts
Pairs of user prompts whose content overlaps materially — usually a sign the first attempt didn't land.
fraction of long prompts10%50%
Rage quits
Sessions ending in abrupt termination after a failed tool sequence.
per 100 sessions120
Auto-compaction rate
Times Claude Code had to auto-compact context — proxy for letting sessions sprawl past the context window.
events per 100k tokens0.12
Redundant approvals
Times the user re-approved an already-trusted tool signature in the same session.
per session0.510

Formula: 18 × (1 − mean_penalty / 100). Mean is taken across the six anti-patterns above. Each anti-pattern maps linearly from 0 penalty at the threshold to 100 penalty at the ceiling, clamped above.

Customization/ 7 pts

Skill / MCP / plugin invocations — both how much you use them (volume) and how often you actually reuse what you install (depth).

Sub-metricRaw unitThresholdCeiling
Volume
Absolute scale gate — prevents a user with 5 invocations across 1 skill from saturating.
total invocations across skills / MCP / plugins / custom commands0200
Depth
Subtracting 1 floors single-use installs at zero — installing 50 skills and using each once gets zero depth.
average invocations per distinct surface − 1

Formula: 7 × volume × depth. Both factors required — gates tiny-volume saturation and window-dressing.

Output/ 10 pts

Shipped code over the 30-day window, counted from your CLI transcripts — commits and lines changed.

Sub-metricRaw unitThresholdCeilingMax pts
Commits
`git commit` calls counted from your transcripts (summed per provider).
git commit invocations050+5
Lines changed
Lines edited, counted from your transcripts (summed per provider).
lines edited03,000+5

Formula: maxPts × log(1 + raw) / log(1 + ceiling), clamped to [0, maxPts]. Applied independently to commits and lines. Log scaling means early progress is worth more per unit than late progress — going from 0 → 10 commits earns more than 40 → 50.

Efficiency/ 5 pts

Cost per commit — the only money signal that survives the cache. Target $20/commit, saturates at $2/commit.

Sub-metricRaw unitThresholdCeilingMax pts
Cost per commit
Below the threshold you earn partial credit; at or below the ceiling you saturate at full points. Lower is better, so the threshold is the higher dollar value.
USD per authored commit$35$10+5

Formula: 5 × clamp(0, 1, log(target/cpc) / log(target/floor))