Bus Factor Report
Analyzes git history to calculate which parts of the codebase only one person has ever touched, generating an HTML heatmap of knowledge-concentration risk.
SKILL.md
---
description: Calculate bus factor per file/directory and generate an HTML risk heatmap
allowed-tools: Bash(git *), Write
---
# Bus Factor Report
Analyze git log to identify knowledge silos — files and directories where only one or two people have ever made changes. Generate an HTML heatmap report showing risk areas where the departure of a single person would leave the team unable to maintain the code.
## Steps
1. **Gather author data per file.** Run:
```
git log --format="%an" --name-only --no-merges
```
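Each commit prints its author name followed by the paths it changed. Parse the output to build a mapping of `file → Set<author>`, keeping a per-author commit count for each file so the minimum-commit threshold in Rules can be applied.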
2. **Aggregate by directory.** For each directory (first two levels, e.g., `src/auth`, `src/components`), combine the author sets of all files within it to get the directory-level bus factor.
3. **Calculate bus factor scores.** For each file and directory:
- **Bus factor** = number of unique authors who have made non-trivial commits (exclude commits that only change whitespace or formatting if detectable)
- **Risk level**:
  - Bus factor 1 = `CRITICAL` (one person knows this code)
  - Bus factor 2 = `HIGH` (fragile: one departure away from critical)
  - Bus factor 3 = `MEDIUM` (manageable but worth monitoring)
  - Bus factor 4+ = `LOW` (healthy knowledge distribution)
4. **Identify top contributors per area.** For each directory, calculate what percentage of commits each author made. Flag "dominant authors" who account for >70% of commits in an area.
5. **Detect recent risk changes.** Compute the bus factor again using only commits from the last 3 months (for example, by adding `--since="3 months ago"` to the `git log` command) and compare it with the all-time value. Flag areas where the bus factor has *decreased* (e.g., only one person has worked on it recently even though multiple people contributed historically).
6. **Generate HTML report.** Create `bus-factor-report.html` with:
- **Heatmap table**: directories as rows, columns for bus factor score, dominant author, total commits, risk level. Color-code rows: red (critical), orange (high), yellow (medium), green (low).
- **Treemap visualization**: nested rectangles sized by number of files, colored by risk level.
- **Top 10 single-author files**: the most critical individual files with only one contributor.
- **Author concentration chart**: bar chart showing how many directories each author "owns" (>70% of commits).
- **Recommendations section**: suggest pairing sessions, code review assignments, or documentation priorities based on the findings.
7. **Open the report.** Run `open bus-factor-report.html` (macOS; use `xdg-open` on Linux or `start` on Windows).
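As a rough illustration of steps 1 to 3, the following Python sketch parses log text, counts commits per author per file, and derives bus factor and risk level. It is not part of the skill itself. It assumes the log was produced with a sentinel prefix on author lines (for example `git log --format="@@%an" --name-only --no-merges`), a small variation on the step 1 command that keeps author names from being confused with paths. The function names, the `@@` prefix, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict

AUTHOR_PREFIX = "@@"  # assumed sentinel added via --format="@@%an"
MIN_COMMITS = 3       # from Rules: ignore authors with fewer than 3 commits to a file


def parse_log(text):
    """Map each file path to a Counter of {author: commits touching that file}."""
    per_file = defaultdict(Counter)
    author = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank separator lines carry no information
        if line.startswith(AUTHOR_PREFIX):
            author = line[len(AUTHOR_PREFIX):]
        elif author is not None:
            per_file[line][author] += 1
    return per_file


def bus_factor(counts, min_commits=MIN_COMMITS):
    """Number of authors with at least min_commits commits."""
    return sum(1 for n in counts.values() if n >= min_commits)


def risk_level(factor):
    """Translate a bus factor into the risk labels from step 3."""
    if factor <= 1:
        return "CRITICAL"  # assumption: 0 qualifying authors is treated like 1
    if factor == 2:
        return "HIGH"
    if factor == 3:
        return "MEDIUM"
    return "LOW"


def directory_bus_factors(per_file, depth=2, min_commits=MIN_COMMITS):
    """Union qualifying authors over all files in each directory (first `depth` levels)."""
    authors_by_dir = defaultdict(set)
    for path, counts in per_file.items():
        parts = path.split("/")[:-1]          # drop the file name
        key = "/".join(parts[:depth]) or "."  # "." stands for the repo root
        authors_by_dir[key].update(a for a, n in counts.items() if n >= min_commits)
    return {d: len(authors) for d, authors in authors_by_dir.items()}


# Hypothetical log text: alice commits three times, bob once.
sample = """\
@@alice

src/auth/login.py
@@alice

src/auth/login.py
src/auth/token.py
@@alice

src/auth/login.py
@@bob

src/auth/login.py
"""

per_file = parse_log(sample)
for path, counts in sorted(per_file.items()):
    factor = bus_factor(counts)
    print(path, factor, risk_level(factor))
print(directory_bus_factors(per_file))
```

In this sample, bob's single commit to `src/auth/login.py` falls below the three-commit threshold, so the file counts only alice and is rated `CRITICAL`. The sketch ignores renames and paths that git quotes because of special characters; a fuller implementation would need to handle both.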
## Rules
- Exclude files that are auto-generated (e.g., `package-lock.json`, `.lock` files, build output).
- Exclude authors with fewer than 3 commits to a file (trivial drive-by fixes should not inflate the bus factor).
- Handle author name variations gracefully (e.g., different git configs for the same person) — note if this might be skewing results.
- If the repo has fewer than 3 contributors total, note that bus factor analysis has limited value and focus on file-level knowledge concentration instead.

How It Works
The "bus factor" is a well-known concept in engineering management, but it is rarely quantified. This skill turns it from a vague worry into a concrete, color-coded report. By mining git history, it identifies exactly which directories and files represent knowledge silos — areas where a single departure could leave the team stranded.
The "dominant author" analysis adds nuance beyond simple contributor counts. A directory might have five contributors, but if one person wrote 90% of the code, the effective bus factor is still close to one. The 70% threshold for flagging dominance surfaces these hidden risks that a raw unique-author count would miss.
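The dominance check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: commit counts stand in for "share of the code", and the function name and sample contributors are hypothetical.

```python
from collections import Counter

DOMINANCE_THRESHOLD = 0.70  # from the text: flag authors with >70% of commits


def dominant_author(commit_counts, threshold=DOMINANCE_THRESHOLD):
    """Return (author, share) if one author exceeds the threshold, else None."""
    total = sum(commit_counts.values())
    if total == 0:
        return None
    author, count = max(commit_counts.items(), key=lambda kv: kv[1])
    share = count / total
    return (author, share) if share > threshold else None


# Hypothetical directory: five contributors, one of whom made 90 of 100 commits.
counts = Counter({"alice": 90, "bob": 4, "carol": 3, "dave": 2, "erin": 1})
result = dominant_author(counts)
print(len(counts), result)
```

Here a raw unique-author count reports five contributors, while the dominance check flags alice with a 0.9 share, which is the hidden concentration the threshold is meant to surface.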
The recommendations section transforms data into action. Rather than just presenting a scary heatmap, it suggests concrete next steps: pair programming sessions to spread knowledge, targeted code reviews to build familiarity, and documentation priorities for the riskiest areas. This makes the report useful for engineering managers planning team resilience, not just interesting for developers.