What we measure — and why.
ConductorScore is measure of agentic engineering skill. It ranges from 0-100. Higher is better. It is computed from your last 30 days of Claude Code transcripts plus your GitHub commit history. The score is a sum of five components, described below.
The source code is at https://github.com/flatironconsulting/conductorscore
Leverage/ 60 pts
How hard the agents work for you. The dominant component — 60% of the score.
| Sub-metric | Raw unit | Threshold | Ceiling | Max pts |
|---|---|---|---|---|
Agent leverage Total agent time divided by your keyboard time. Includes parallel subagent minutes — running three agents at once for an hour counts as three agent-hours. | agent-minutes per human-minute (×) | 1× (no leverage baseline) | 7× total leverage | +45 |
Longest agent run Longest contiguous AFK streak in the window. Rewards trusting the agent to run without you. | hours | 0 hr | 4 hr | +15 |
Formula: maxPts × log(1 + raw) / log(1 + ceiling), clamped to [0, maxPts]. Applied independently to agent leverage and longest run, then summed. Log scaling means early gains are worth more per unit than late ones — going from 1× → 3× leverage earns more than 18× → 20× total.
Craft/ 18 pts
Absence of anti-patterns. Each anti-pattern contributes zero penalty below its penalty threshold and saturates at its penalty ceiling.
| Sub-metric | Raw unit | Threshold | Ceiling |
|---|---|---|---|
Coding without a plan Significant edits dispatched without a prior plan signal. | fraction of significant edits | 15% | 100% |
Revert rate Edits the user immediately undoes after the agent applies them. | reverts per session | 0.1 | 1 |
Repetitive prompts Pairs of user prompts whose content overlaps materially — usually a sign the first attempt didn't land. | fraction of long prompts | 10% | 50% |
Rage quits Sessions ending in abrupt termination after a failed tool sequence. | per 100 sessions | 1 | 20 |
Auto-compaction rate Times Claude Code had to auto-compact context — proxy for letting sessions sprawl past the context window. | events per 100k tokens | 0.1 | 2 |
Redundant approvals Times the user re-approved an already-trusted tool signature in the same session. | per session | 0.5 | 10 |
Formula: 18 × (1 − mean_penalty / 100). Mean is taken across the six anti-patterns above. Each anti-pattern maps linearly from 0 penalty at the threshold to 100 penalty at the ceiling, clamped above.
Customization/ 7 pts
Skill / MCP / plugin invocations — both how much you use them (volume) and how often you actually reuse what you install (depth).
| Sub-metric | Raw unit | Threshold | Ceiling |
|---|---|---|---|
Volume Absolute scale gate — prevents a user with 5 invocations across 1 skill from saturating. | total invocations across skills / MCP / plugins / custom commands | 0 | 200 |
Depth Subtracting 1 floors single-use installs at zero — installing 50 skills and using each once gets zero depth. | average invocations per distinct surface − 1 | 0× | 4× |
Formula: 7 × volume × depth. Both factors required — gates tiny-volume saturation and window-dressing.
Output/ 10 pts
Shipped code over the 30-day window, counted from your CLI transcripts — commits and lines changed.
| Sub-metric | Raw unit | Threshold | Ceiling | Max pts |
|---|---|---|---|---|
Commits `git commit` calls counted from your transcripts (summed per provider). | git commit invocations | 0 | 50 | +5 |
Lines changed Lines edited, counted from your transcripts (summed per provider). | lines edited | 0 | 3,000 | +5 |
Formula: maxPts × log(1 + raw) / log(1 + ceiling), clamped to [0, maxPts]. Applied independently to commits and lines. Log scaling means early progress is worth more per unit than late progress — going from 0 → 10 commits earns more than 40 → 50.
Efficiency/ 5 pts
Cost per commit — the only money signal that survives the cache. Target $20/commit, saturates at $2/commit.
| Sub-metric | Raw unit | Threshold | Ceiling | Max pts |
|---|---|---|---|---|
Cost per commit Below the threshold you earn partial credit; at or below the ceiling you saturate at full points. Lower is better, so the threshold is the higher dollar value. | USD per authored commit | $35 | $10 | +5 |
Formula: 5 × clamp(0, 1, log(target/cpc) / log(target/floor))