
Quality Assessment

Estimated time: ~30 minutes

Prerequisite: Module 8.3 (Context Confusion)

Outcome: After this module, you will have a systematic approach to assessing Claude’s output quality and a personal checklist of acceptance criteria, and you will know when to push for better versus accept “good enough.”


Claude produces code. It runs. Tests pass. You merge it. Next week, a colleague asks “Why is this function 200 lines?” You realize you accepted Claude’s first output without EVALUATING it.

The code worked but wasn’t good. Quality assessment bridges the gap between “it works” and “it’s good.” Without it, you’re trading development speed for technical debt.


Assess output along five dimensions:

| Dimension | Questions to Ask | Red Flags |
| --- | --- | --- |
| Correctness | Does it do what was asked? | Missing requirements, wrong behavior |
| Completeness | Is anything left TODO? | // TODO comments, incomplete handlers, missing edge cases |
| Consistency | Does it match codebase patterns? | Different naming or patterns than existing code |
| Cleanliness | Is the code maintainable? | 200-line functions, no comments, duplication |
| Appropriateness | Is the solution suitable? | Over-engineered for a simple task, under-engineered for a complex one |
The assessment workflow forms a loop:

graph LR
    A[Claude Output] --> B[Quick Scan]
    B --> C{Pass?}
    C -->|Yes| D[Dimension Check]
    C -->|No| E[Request Revision]
    D --> F{Accept?}
    F -->|Yes| G[Merge]
    F -->|No| E
    E --> A

Before any deep analysis, do a 30-second gut check:

  • Reasonable length? (not suspiciously short or absurdly long)
  • Familiar patterns? (looks like rest of codebase)
  • No obvious TODOs or placeholders?
  • No commented-out code?
  • Makes sense at first glance?

If Quick Scan fails, request revision immediately. Don’t waste time on deep assessment.
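If you want to automate the mechanical parts of the Quick Scan (TODO markers, commented-out code, unusually long files), a minimal sketch might look like this. It assumes a git-based TypeScript project; the filename, thresholds, and regexes are illustrative assumptions, not standard tooling.

// quick-scan.ts — hypothetical helper that flags mechanical Quick Scan issues
// in staged changes. Thresholds and patterns below are illustrative assumptions.
import { execSync } from "node:child_process";

const MAX_FILE_LINES = 400; // assumption: anything longer deserves a closer look

// List staged TypeScript files (the code you just accepted, after `git add`).
function stagedFiles(): string[] {
  return execSync("git diff --cached --name-only", { encoding: "utf8" })
    .split("\n")
    .filter((file) => file.endsWith(".ts"));
}

// Return human-readable issues for one staged file.
function scan(file: string): string[] {
  const text = execSync(`git show :${file}`, { encoding: "utf8" });
  const issues: string[] = [];
  if (/\b(TODO|FIXME|XXX)\b/.test(text)) issues.push("TODO/placeholder present");
  if (/^\s*\/\/.*[;{}]\s*$/m.test(text)) issues.push("possible commented-out code");
  if (text.split("\n").length > MAX_FILE_LINES) issues.push("unusually long file");
  return issues;
}

for (const file of stagedFiles()) {
  const issues = scan(file);
  if (issues.length > 0) console.log(`${file}: ${issues.join(", ")}`);
}

This only covers the mechanical items; the “makes sense at first glance” judgment still has to be yours.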

Run automated checks:

npm run lint # Style and basic issues
tsc --noEmit # Type errors
npm test # Functionality
git diff # Review actual changes

Then ask Claude for self-review:

  • “What would a senior dev criticize about this?”
  • “What edge cases might this miss?”
  • “Is there a simpler approach?”

Not all code needs to be perfect.

| Risk Level | Standard | Example |
| --- | --- | --- |
| High | Every issue fixed | Payments, authentication, data handling |
| Medium | Critical issues fixed | User-facing features |
| Low | Works correctly, tech debt noted | Internal tools, prototypes |

Decision factors: Risk level, time pressure, and permanence.


Scenario: Claude implemented a user search feature. Let’s assess it systematically.

Claude produces 100 lines of search code. Quick scan:

  • Length: ~100 lines — reasonable for search feature
  • Patterns: Uses async/await like rest of codebase
  • TODOs: None visible
  • First glance: Looks coherent

Quick Scan passes. Move to deeper check.

$ npm run lint

Expected output:

No errors found.
$ tsc --noEmit

Expected output:

No errors.
$ npm test

Expected output:

PASS src/services/userService.test.ts
✓ should search users by name
✓ should return empty array for no matches

Automated checks pass. But let’s go deeper.

$ git diff --stat

Expected output:

src/services/userService.ts | 100 +++++++++++++++++++++++++++++
src/routes/users.ts | 20 ++++++
2 files changed, 120 insertions(+)

Review the actual code. Notice: Large function, no pagination, no rate limiting.
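To make those red flags concrete, the code this step catches might look like the sketch below. It is a hypothetical excerpt, not the actual diff; `User` and `db.query` stand in for whatever data-access layer the project uses.

// Hypothetical excerpt illustrating the red flags: the query string is interpolated
// directly (no sanitization) and every match is returned (no pagination).
interface User {
  id: string;
  name: string;
}

// Stand-in for the project's data-access layer.
declare const db: {
  query<T>(sql: string, params?: unknown[]): Promise<T[]>;
};

export async function searchUsers(query: string): Promise<User[]> {
  // Unparameterized and unbounded: a broad query can pull the whole table into memory.
  return db.query<User>(`SELECT id, name FROM users WHERE name ILIKE '%${query}%'`);
}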

You: Review the search implementation you just wrote.
What would a senior dev criticize?
Claude: A senior dev would likely criticize:
1. No pagination — could return 10,000 results
2. No rate limiting — could be abused
3. Search query not sanitized — potential injection
4. No caching — same search hits DB every time

Self-review surfaces real issues that tests and linters missed.

Categorize the issues:

| Issue | Priority | Action |
| --- | --- | --- |
| No pagination | HIGH | Fix now |
| No sanitization | HIGH | Fix now |
| No rate limiting | MEDIUM | Fix later |
| No caching | LOW | Premature optimization |

You: Good feedback. Please add:
1. Pagination (limit 50 per page)
2. Query sanitization
We'll add rate limiting and caching later.
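A revision along the lines requested above might look like the following sketch. It is illustrative only: the 50-row page size comes from the request, while `db.query`, the SQL dialect, and the sanitization rule are assumptions rather than the actual implementation.

// Hypothetical revision: bounded result sets plus a parameterized, sanitized query.
interface User {
  id: string;
  name: string;
}

interface SearchResult {
  users: User[];
  page: number;
  pageSize: number;
}

// Stand-in for the project's data-access layer.
declare const db: {
  query<T>(sql: string, params?: unknown[]): Promise<T[]>;
};

const PAGE_SIZE = 50; // limit agreed on in the revision request

export async function searchUsers(query: string, page = 1): Promise<SearchResult> {
  // Sanitize: strip LIKE wildcards and escape characters, cap the query length.
  const sanitized = query.replace(/[%_\\]/g, "").trim().slice(0, 100);
  const safePage = Math.max(1, Math.floor(page));
  const offset = (safePage - 1) * PAGE_SIZE;

  // Parameterized query with LIMIT/OFFSET keeps both injection and result size in check.
  const users = await db.query<User>(
    "SELECT id, name FROM users WHERE name ILIKE $1 LIMIT $2 OFFSET $3",
    [`%${sanitized}%`, PAGE_SIZE, offset],
  );

  return { users, page: safePage, pageSize: PAGE_SIZE };
}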

Result: Caught real issues before merge through systematic assessment.


Exercise 1

Goal: Build your personal Quick Scan checklist.

Instructions:

  1. Ask Claude to implement a feature (e.g., “Add email validation to signup”)
  2. Time yourself doing Quick Scan (target: <30 seconds)
  3. Note what you instinctively check for
  4. Write down your personal Quick Scan checklist

Expected result: A personalized 5-7 item checklist you can use consistently.

💡 Hint

Good Quick Scan items:

  • File length vs. expected complexity
  • Imports (are they familiar packages?)
  • Function names (do they match codebase conventions?)
  • Error handling (any try/catch visible?)
  • Magic numbers or hardcoded values

Exercise 2

Goal: Find which self-review prompts work best.

Instructions:

  1. Get code from Claude
  2. Ask Claude to review its own code with different prompts:
    • “What’s wrong with this code?”
    • “What would a senior dev change?”
    • “What edge cases might fail?”
    • “Is there a simpler approach?”
  3. Note which prompts produce the most actionable feedback
✅ Solution

Most effective prompts (in order):

  1. “What would a senior dev criticize?” — Gets architectural and style feedback
  2. “What edge cases might fail?” — Surfaces missing error handling
  3. “Is there a simpler approach?” — Catches over-engineering

Less effective:

  • “What’s wrong?” — Too vague, gets generic responses
  • “Review this code” — No direction, unfocused feedback

Exercise 3

Goal: Practice categorizing issues by priority.

Instructions:

  1. Get code from Claude for a medium-complexity feature
  2. List all issues found
  3. Categorize each: Must Fix Now / Fix Later / Acceptable

Expected result: A prioritized list with clear reasoning.


Quick reference

Quick Scan checklist:

  • Reasonable length
  • Familiar patterns
  • No TODOs/placeholders
  • No commented-out code
  • Makes sense at first glance

Automated checks:

npm run lint # Style issues
tsc --noEmit # Type errors
npm test # Functionality
git diff # Review changes

Self-review prompts:

"What would a senior dev criticize about this?"
"What edge cases might fail?"
"Is there a simpler approach?"
"What happens if this input is very large?"

Acceptance criteria by risk:

| Risk Level | Standard | Action |
| --- | --- | --- |
| High (payments, auth) | Every issue fixed | Thorough review required |
| Medium (user features) | Critical issues fixed | Quick Scan + automated checks |
| Low (internal/prototype) | Works correctly | Quick Scan, note tech debt |

Common mistakes:

| ❌ Mistake | ✅ Correct Approach |
| --- | --- |
| Accepting first output without any review | At minimum: Quick Scan every time |
| Running only automated checks | Linters miss design issues; human review is required |
| Perfectionism on low-risk code | Good enough IS good enough for prototypes |
| Accepting “works” as sufficient for high-risk code | High-risk code needs thorough review |
| Not using Claude to review Claude’s code | Self-review prompts catch real issues |
| Checking quality only at the end | Assess during development, not just after |
| Ignoring the gut feeling that “this seems wrong” | If it feels off, investigate before accepting |

Scenario: A Vietnamese fintech team building a transaction history feature. Claude produced working code, the tests passed, and everything looked fine at first glance.

What happened: Code went to production. A user with 50,000+ transactions triggered the endpoint. No pagination. No date range limits. The service tried to load all transactions into memory. Memory spike. Service crashed. 30 minutes of downtime.

What was missed:

  • No pagination (returned ALL transactions)
  • No date range validation (could query 10 years of data)
  • No limit on response size

What should have happened:

  1. Quick Scan would flag “100 lines seems short for large data handling”
  2. Self-review prompt: “What happens with 50,000 transactions?”
  3. Claude would respond: “This loads all into memory. Add pagination.”

Result: The team added an assessment workflow. Quick Scan plus a self-review prompt became standard, and the PR checklist now includes edge-case review. The change prevented an estimated $5,000 in lost transactions.


Next: Module 8.5: Emergency Procedures