Chaos Auditor
Identifies fragile code paths — unhandled errors, env vars without defaults, untested functions, and single points of failure — and outputs an HTML resilience report.
SKILL.md
---
description: Audit codebase resilience and generate an HTML fragility report
allowed-tools: Bash, Read, Write, Glob, Grep
---
# Chaos Auditor
Systematically identify fragile code paths that would break under adverse conditions: missing error handling, unsafe environment variable access, untested critical paths, and single points of failure. Generate an HTML report with findings and severity ratings.
## Steps
1. **Scan error handling patterns.** Search the codebase for:
- `try/catch` blocks — check if catch blocks are empty or only log without rethrowing or handling
- `.catch(` on promises — check for swallowed errors
- Async functions without `try/catch` or `.catch` — unhandled promise rejections
- Callback-style error handling — check if error parameter is actually used
- `throw` statements — ensure thrown errors are caught somewhere in the call chain
2. **Audit environment variables.** Find all `process.env.*` or `import.meta.env.*` usage:
- Flag any usage without a default value or fallback (e.g., `process.env.DB_HOST` without `??` or `||`)
- Check if there is a validation layer (like `envalid` or a Zod schema) that catches missing vars at startup
- Cross-reference with `.env.example` to find undocumented variables
- Flag env vars used in critical paths (database connections, auth secrets) vs. non-critical (feature flags, log levels)
3. **Map test coverage.** For each source file in `src/`:
- Check if a corresponding test file exists (`*.test.*`, `*.spec.*`, or file in `__tests__/`)
- For files without tests, check the function length — functions over 50 lines with no tests are high-risk
- Identify critical-path files (auth, payment, database) that lack tests
4. **Detect single points of failure.**
- Find modules imported by >10 other files — a bug here cascades everywhere
- Find functions with no type safety on inputs (`any`-typed parameters)
- Find hardcoded URLs, IPs, or connection strings (instead of config)
- Find retry-less network calls (`fetch`, `axios`, database queries without retry logic)
5. **Check graceful degradation patterns.**
- Are there circuit breakers or timeout wrappers around external service calls?
- Is there fallback behavior when a dependency is unavailable?
- Are there health check endpoints?
6. **Generate HTML report.** Create `chaos-audit-report.html` with:
- Executive summary with overall resilience score (0-100)
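- Five sections matching the audit areas above (error handling, environment variables, test coverage, single points of failure, graceful degradation)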
- Each finding has: file location, code snippet, severity (Critical/High/Medium/Low), remediation suggestion
- Color-coded severity badges
- Collapsible sections for large finding lists
- Summary statistics and a donut chart showing finding distribution by severity
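7. **Open the report.** Run `open chaos-audit-report.html` on macOS; on Linux use `xdg-open chaos-audit-report.html`, and on Windows use `start chaos-audit-report.html`.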
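As an illustration of the check in step 2, here is a minimal sketch of a line-based scan for env var access with no `??` or `||` fallback. The `EnvFinding` shape and the function name are hypothetical, and the regex is only a heuristic: it misses destructuring, multi-line expressions, and values validated elsewhere, so each hit still needs the manual verification the rules call for.

```typescript
// Hypothetical shape for one hit; not a format the skill prescribes.
interface EnvFinding {
  variable: string;
  line: number; // 1-based line number within the scanned source
}

// Reports process.env.NAME / import.meta.env.NAME accesses that are not
// immediately followed by `??` or `||` on the same line.
function findUnguardedEnvAccess(source: string): EnvFinding[] {
  const findings: EnvFinding[] = [];
  source.split("\n").forEach((text, index) => {
    // A fresh regex per line keeps the global flag's lastIndex from leaking.
    const access = /(?:process\.env|import\.meta\.env)\.([A-Za-z_][A-Za-z0-9_]*)/g;
    let match: RegExpExecArray | null;
    while ((match = access.exec(text)) !== null) {
      const rest = text.slice(access.lastIndex);
      if (!/^\s*(\?\?|\|\|)/.test(rest)) {
        findings.push({ variable: match[1], line: index + 1 });
      }
    }
  });
  return findings;
}

const sample = [
  "const host = process.env.DB_HOST;",
  'const port = process.env.PORT ?? "3000";',
].join("\n");
const unguarded = findUnguardedEnvAccess(sample);
// unguarded holds one entry: DB_HOST on line 1. PORT is skipped because of `??`.
```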
## Severity rating guide
- **Critical**: Unhandled errors in auth/payment paths, missing env vars for database/secrets with no startup validation
- **High**: Untested functions >50 lines in business logic, retry-less calls to external services
- **Medium**: Empty catch blocks, env vars without defaults in non-critical paths
- **Low**: Missing tests for utility functions, hardcoded non-secret values
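One way to make these four levels sortable is to give each a numeric rank. The sketch below assumes a hypothetical `Finding` shape (the field names are illustrative, not something this skill prescribes) and orders findings by severity, breaking ties by file path.

```typescript
type Severity = "Critical" | "High" | "Medium" | "Low";

// Hypothetical finding shape; field names are assumptions for illustration.
interface Finding {
  file: string;
  severity: Severity;
  summary: string;
}

// Lower rank sorts first, so Critical findings lead the report.
const SEVERITY_RANK: Record<Severity, number> = {
  Critical: 0,
  High: 1,
  Medium: 2,
  Low: 3,
};

// Returns a sorted copy: by severity rank, then by file path.
function sortFindings(findings: Finding[]): Finding[] {
  return findings.slice().sort((a, b) => {
    const bySeverity = SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity];
    if (bySeverity !== 0) return bySeverity;
    // Plain code-unit comparison keeps the order deterministic across locales.
    return a.file < b.file ? -1 : a.file > b.file ? 1 : 0;
  });
}
```

Sorting a copy leaves the caller's array untouched, which helps if the same findings feed both the main listing and a separate summary.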
## Rules
- Do not flag patterns that are intentionally lenient (e.g., optional feature flags).
- Always verify findings — read the surrounding code to confirm the issue is real before including it.
- Sort findings by severity, then by file path.
- Include a "Quick Wins" section at the top with the 5 easiest-to-fix high-impact findings.
How It Works
This skill applies chaos engineering principles without actually breaking anything. Instead of injecting failures into a running system (which requires infrastructure), it statically analyzes the code to predict where failures *would* cascade. It answers the question: "If this external service went down right now, what would happen?"
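The cascade prediction can be illustrated with import fan-in, the measure behind the "imported by >10 other files" check in step 4. This is a minimal sketch, assuming the import graph has already been extracted into a plain object; the `ImportGraph` shape and both function names are hypothetical.

```typescript
// Maps each file path to the module specifiers it imports.
// Hypothetical input; a real audit would build it by parsing source files.
type ImportGraph = Record<string, string[]>;

// Counts how many distinct files import each module (its fan-in).
function countFanIn(graph: ImportGraph): Record<string, number> {
  // A null-prototype object avoids collisions with inherited keys
  // such as "constructor".
  const fanIn: Record<string, number> = Object.create(null);
  Object.keys(graph).forEach((file) => {
    // De-duplicate so a file importing the same module twice counts once.
    const unique = graph[file].filter((dep, i, all) => all.indexOf(dep) === i);
    unique.forEach((dep) => {
      fanIn[dep] = (fanIn[dep] ?? 0) + 1;
    });
  });
  return fanIn;
}

// Returns modules imported by more than `threshold` files, most-imported first.
function findHotspots(graph: ImportGraph, threshold = 10): string[] {
  const fanIn = countFanIn(graph);
  return Object.keys(fanIn)
    .filter((dep) => fanIn[dep] > threshold)
    .sort((a, b) => fanIn[b] - fanIn[a]);
}
```

A high fan-in does not mean a module is buggy; it only marks where an unhandled failure would reach the most callers, which is why such modules deserve the closest reading.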
The env var audit is deceptively powerful. In many Node.js applications, a missing environment variable does not crash at startup — it silently becomes `undefined` and causes cryptic failures later. By cross-referencing `process.env` usage with `.env.example` and checking for validation layers, this skill catches configuration landmines before they detonate in production.
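The sketch below shows both the silent `undefined` failure and a fail-fast check of the kind the audit looks for. It is a hand-rolled stand-in written under stated assumptions, not the API of `envalid` or Zod; the `Env` type, the function names, and the variable names are hypothetical.

```typescript
// A plain string map stands in for process.env so the sketch has no Node dependency.
type Env = Record<string, string | undefined>;

// The landmine: a missing variable turns into the text "undefined".
const exampleEnv: Env = { DB_PORT: "5432" };
const connectionString = "postgres://" + exampleEnv.DB_HOST + ":" + exampleEnv.DB_PORT;
// connectionString is "postgres://undefined:5432", and nothing has thrown yet.

// Returns the required variable names that are missing or blank.
function findMissingEnv(env: Env, required: string[]): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Fails fast with a single message that lists every missing variable.
function assertEnv(env: Env, required: string[]): void {
  const missing = findMissingEnv(env, required);
  if (missing.length > 0) {
    throw new Error("Missing required environment variables: " + missing.join(", "));
  }
}
```

In a Node.js service, a call such as `assertEnv(process.env, ["DB_HOST", "DB_PORT"])` at the top of the entry point would turn the cryptic later failure into an immediate, readable startup error.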
The HTML report format with severity ratings and remediation suggestions transforms a daunting audit into an actionable checklist. The "Quick Wins" section is a psychological nudge — by surfacing easy fixes first, it builds momentum and makes the overall resilience improvement feel achievable rather than overwhelming.