Data Analysis
Module 13.1: Data Analysis
Section titled “Module 13.1: Data Analysis”Estimated time: ~35 minutes
Prerequisite: Phase 3 (Core Workflows)
Outcome: After this module, you will know how to use Claude Code for exploratory data analysis, understand prompting patterns for data work, and be able to extract insights from various data formats.
1. WHY — Why This Matters
Section titled “1. WHY — Why This Matters”You have a CSV with 10,000 rows of sales data. Traditional approach: open Excel, manually create pivot tables, write formulas, make charts. Takes hours. You might miss insights.
Claude Code approach: “Analyze this sales data. Find patterns, anomalies, and give me actionable insights.” Claude reads the file, writes Python scripts, generates visualizations, and explains findings — in minutes. Data analysis is where Claude Code shines for non-coding tasks.
2. CONCEPT — Core Ideas
Section titled “2. CONCEPT — Core Ideas”Claude Code as Data Analyst
Section titled “Claude Code as Data Analyst”[Your Question] → [Claude reads data] → [Writes analysis code] → [Runs code] → [Interprets results] → [Answers question]Data Analysis Workflow
Section titled “Data Analysis Workflow”| Stage | What Claude Does | Example Prompt |
|---|---|---|
| Load | Read file into memory | ”Read sales.csv” |
| Explore | Understand structure | ”What columns exist? Show sample rows” |
| Clean | Handle missing/bad data | ”Clean this data, handle nulls” |
| Analyze | Find patterns | ”What are the top trends?” |
| Visualize | Create charts | ”Create a chart showing monthly trends” |
| Insight | Business interpretation | ”What should we do based on this?” |
Supported Data Formats
Section titled “Supported Data Formats”- CSV/TSV:
pandas.read_csv() - JSON:
pandas.read_json()orjsonmodule - Excel:
pandas.read_excel()with openpyxl - SQL: Database connections via SQLAlchemy
- Log files: Custom parsing scripts
Analysis Types
Section titled “Analysis Types”- Descriptive: What happened? (sums, averages, counts)
- Diagnostic: Why did it happen? (correlations, drill-downs)
- Predictive: What might happen? (trends, forecasting)
- Prescriptive: What should we do? (recommendations)
3. DEMO — Step by Step
Section titled “3. DEMO — Step by Step”Scenario: Analyze e-commerce sales data (sales.csv).
Step 1: Load and Explore
Section titled “Step 1: Load and Explore”You: I have sales.csv in the current directory. Read it and tell me what'sin this dataset. Show me the structure and a few sample rows.
Claude: [Runs Python code to read CSV]
Dataset Overview:- Rows: 10,423- Columns: 8 (order_id, date, customer_id, product, category, quantity, price, region)- Date range: 2023-01-01 to 2023-12-31- Regions: North, South, East, West- Categories: Electronics, Clothing, Home, Food
Data quality: No missing values. Dates are consistent format.Step 2: Ask Specific Questions
Section titled “Step 2: Ask Specific Questions”You: What is the total revenue by category? Which category performs best?
Claude: [Writes and runs pandas groupby]
Revenue by Category:- Electronics: $2,450,000 (45%)- Home: $1,200,000 (22%)- Clothing: $1,050,000 (19%)- Food: $780,000 (14%)
Electronics is the top performer, contributing nearly half of total revenue.Step 3: Find Patterns
Section titled “Step 3: Find Patterns”You: Find any interesting patterns or anomalies in this data.
Claude: [Runs comprehensive analysis]
Key Patterns Found:1. Seasonality: December revenue 40% higher than average (holiday effect)2. Regional anomaly: West has 20% lower revenue but 30% higher order count — suggests lower average order value3. Product concentration: Top 10 products = 60% of revenue (risk)4. Day-of-week: Mondays have 15% lower sales than FridaysStep 4: Visualize
Section titled “Step 4: Visualize”You: Create a visualization showing monthly revenue trend by category.
Claude: [Writes matplotlib code, saves chart]
Created: monthly_revenue_by_category.png
Key insight: Electronics dominates all months, but Clothing showsstrongest growth trajectory (up 35% from Jan to Dec).Step 5: Get Recommendations
Section titled “Step 5: Get Recommendations”You: Based on this analysis, what are 3 actionable recommendations?
Claude:1. Investigate West region pricing — lower AOV suggests bundling opportunity2. Diversify product portfolio — 60% from 10 products is risky3. Run Monday promotions — could add 5-10% weekly revenue4. PRACTICE — Try It Yourself
Section titled “4. PRACTICE — Try It Yourself”Exercise 1: Explore Any Dataset
Section titled “Exercise 1: Explore Any Dataset”Goal: Practice the exploration workflow.
Instructions:
- Download a public CSV (Kaggle, government open data)
- Ask Claude to read and describe the structure
- Ask 5 specific questions about the data
- Request anomaly detection
💡 Hint
Start with “Read [filename] and describe the structure, data types, and any quality issues.”
✅ Solution
Prompt sequence:1. "Read data.csv and describe structure"2. "What is the distribution of [column]?"3. "What is [metric] by [dimension]?"4. "Find correlations between columns"5. "Identify any outliers or anomalies"Exercise 2: Business Question Analysis
Section titled “Exercise 2: Business Question Analysis”Goal: Let Claude determine the analysis approach.
Instructions:
- Use any sales or transaction data
- Ask: “Which customers should we focus on for retention?”
- Let Claude choose the methodology
- Review and critique the approach
💡 Hint
Claude will likely use RFM (Recency, Frequency, Monetary) analysis or cohort analysis for customer retention questions.
✅ Solution
Claude typically responds with:
- RFM segmentation to identify high-value at-risk customers
- Cohort analysis to spot retention trends
- Churn prediction based on activity patterns
Review: Check if the methodology matches your business context.
Exercise 3: Full Analysis Pipeline
Section titled “Exercise 3: Full Analysis Pipeline”Goal: Complete end-to-end analysis.
Instructions:
- Load data and explore structure
- Ask Claude to clean any issues found
- Analyze for top 3 trends
- Generate 3 visualizations
- Get actionable recommendations
💡 Hint
Use explicit prompts for each stage. Don’t combine too many requests in one prompt.
✅ Solution
Sequential prompts:
- “Read [file], describe structure and data quality”
- “Clean: handle missing values, fix data types”
- “What are the top 3 trends in this data?”
- “Create: bar chart of [metric], line chart of [trend], scatter of [correlation]”
- “Based on this analysis, give 3 actionable recommendations”
5. CHEAT SHEET
Section titled “5. CHEAT SHEET”Analysis Workflow
Section titled “Analysis Workflow”Load → Explore → Clean → Analyze → Visualize → InsightPrompting Patterns
Section titled “Prompting Patterns”| Pattern | Prompt Template |
|---|---|
| Exploration | ”Read [file] and describe the structure” |
| Specific question | ”What is [metric] by [dimension]?” |
| Pattern finding | ”Find top patterns or anomalies” |
| Visualization | ”Create a [chart type] showing [metric] over [dimension]“ |
| Recommendation | ”Based on this analysis, what should we do?” |
Supported Formats
Section titled “Supported Formats”CSV, JSON, Excel, SQL databases, log files
Analysis Types
Section titled “Analysis Types”| Type | Question |
|---|---|
| Descriptive | What happened? |
| Diagnostic | Why did it happen? |
| Predictive | What will happen? |
| Prescriptive | What should we do? |
6. PITFALLS — Common Mistakes
Section titled “6. PITFALLS — Common Mistakes”| ❌ Mistake | ✅ Correct Approach |
|---|---|
| Vague prompts (“analyze this data”) | Specific questions (“What is revenue by region?”) |
| Skipping exploration | Always start with “describe the structure” |
| Assuming data is clean | Ask about data quality, missing values first |
| Not verifying Claude’s code | Review the analysis methodology |
| Accepting first result | Ask follow-up questions, drill deeper |
| Huge files overwhelming context | Sample first, then full analysis |
| No visualization | Charts reveal patterns numbers don’t |
7. REAL CASE — Production Story
Section titled “7. REAL CASE — Production Story”Scenario: Vietnamese e-commerce startup had 6 months of transaction data. Founder wanted insights but had no data analyst. Traditional option: hire consultant (expensive, slow).
Claude Code Approach:
Session 1 (30 min): “Read transactions.csv, describe data, find top patterns”
- Discovered: 80% revenue from 15% of customers
- Found: Specific product bundles with high correlation
Session 2 (20 min): “Which customers are at risk of churning?”
- Claude built RFM analysis automatically
- Identified 200 high-value customers with declining activity
Session 3 (15 min): “Create customer segment visualization”
- Generated scatter plot of customer segments
- Clear visual of VIP vs at-risk customers
Results:
- 1 hour total analysis time (vs 1 week with consultant)
- Actionable segments for marketing campaign
- ₫50M campaign targeted at at-risk customers → 40% retained
Quote: “Claude Code is like having a data scientist who works at conversation speed.”
Next: Module 13.2: Report Generation →