Practice: Data Analysis
Practice: Data Analysis
📋 Project Requirements
This lesson covers two practical scenarios — Sales Report Analysis and Employee Performance Statistics — applying aggregate functions, grouping queries, conditional expressions, JOINs, and subqueries learned in previous lessons.
Database Table Structure
We use a unified 4-table database:
-- Departments table
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(50) NOT NULL,
city VARCHAR(50),
budget DECIMAL(12,2)
);
-- Employees table
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100),
salary DECIMAL(10,2),
department_id INT,
hire_date DATE,
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
-- Orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
employee_id INT,
order_date DATE,
total_amount DECIMAL(12,2),
status VARCHAR(20),
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
-- Products table
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100) NOT NULL,
price DECIMAL(10,2),
stock INT,
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
Test Data
-- Department data
INSERT INTO departments VALUES (1, 'Technology', 'Beijing', 500000);
INSERT INTO departments VALUES (2, 'Sales', 'Shanghai', 300000);
INSERT INTO departments VALUES (3, 'Marketing', 'Guangzhou', 200000);
INSERT INTO departments VALUES (4, 'Finance', 'Shenzhen', 150000);
-- Employee data
INSERT INTO employees VALUES (1, 'Zhang', 'Wei', 'zhangwei@co.com', 12000, 1, '2023-03-15');
INSERT INTO employees VALUES (2, 'Li', 'Na', 'lina@co.com', 9500, 1, '2023-06-20');
INSERT INTO employees VALUES (3, 'Wang', 'Qiang', 'wangqiang@co.com', 8000, 2, '2022-11-10');
INSERT INTO employees VALUES (4, 'Zhao', 'Min', 'zhaomin@co.com', 11000, 2, '2024-01-05');
INSERT INTO employees VALUES (5, 'Liu', 'Yang', 'liuyang@co.com', 7500, 3, '2024-04-18');
INSERT INTO employees VALUES (6, 'Chen', 'Jing', 'chenjing@co.com', 9000, NULL, '2025-02-28');
-- Order data
INSERT INTO orders VALUES (1001, 101, 3, '2026-01-10', 2500.00, 'completed');
INSERT INTO orders VALUES (1002, 102, 3, '2026-02-15', 1800.00, 'completed');
INSERT INTO orders VALUES (1003, 103, 4, '2026-02-20', 3200.00, 'shipped');
INSERT INTO orders VALUES (1004, 101, 1, '2026-03-05', 950.00, 'pending');
INSERT INTO orders VALUES (1005, 104, 4, '2026-03-10', 4100.00, 'completed');
INSERT INTO orders VALUES (1006, 105, 3, '2026-04-01', 2800.00, 'cancelled');
INSERT INTO orders VALUES (1007, 101, 1, '2026-04-15', 5600.00, 'completed');
INSERT INTO orders VALUES (1008, 106, 4, '2026-05-02', 1500.00, 'completed');
INSERT INTO orders VALUES (1009, 102, 3, '2026-05-20', 3800.00, 'shipped');
INSERT INTO orders VALUES (1010, 107, 1, '2026-06-01', 2200.00, 'pending');
INSERT INTO orders VALUES (1011, 103, 4, '2026-06-10', 6300.00, 'completed');
INSERT INTO orders VALUES (1012, 108, 3, '2026-06-15', 1900.00, 'completed');
-- Product data
INSERT INTO products VALUES (1, 'Laptop', 6999.00, 50, 1);
INSERT INTO products VALUES (2, 'Wireless Mouse', 129.00, 200, 1);
INSERT INTO products VALUES (3, 'Office Desk', 899.00, 30, 2);
INSERT INTO products VALUES (4, 'Projector', 3500.00, 15, 3);
INSERT INTO products VALUES (5, 'Printer', 2200.00, 25, 1);
INSERT INTO products VALUES (6, 'Meeting Chair', 599.00, 40, 2);
INSERT INTO products VALUES (7, 'Whiteboard', 450.00, 60, 3);
🏗️ Query Practice
Query 1: Employee Sales Performance Report
Requirement: Generate a sales performance report for each employee, including order count, total sales, average order amount, orders by status, and a performance rating.
SELECT
CONCAT(e.first_name, e.last_name) AS employee_name,
d.department_name AS department,
COUNT(o.order_id) AS total_orders,
COALESCE(SUM(o.total_amount), 0) AS total_sales,
COALESCE(ROUND(AVG(o.total_amount), 2), 0) AS avg_order_amount,
SUM(CASE WHEN o.status = 'completed' THEN 1 ELSE 0 END) AS completed,
SUM(CASE WHEN o.status = 'shipped' THEN 1 ELSE 0 END) AS shipped,
SUM(CASE WHEN o.status = 'pending' THEN 1 ELSE 0 END) AS pending,
SUM(CASE WHEN o.status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled,
CASE
WHEN COALESCE(SUM(o.total_amount), 0) >= 10000 THEN 'S - Outstanding'
WHEN COALESCE(SUM(o.total_amount), 0) >= 5000 THEN 'A - Excellent'
WHEN COALESCE(SUM(o.total_amount), 0) >= 2000 THEN 'B - Good'
WHEN COUNT(o.order_id) > 0 THEN 'C - Needs Improvement'
ELSE 'No Performance'
END AS performance_rating
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
LEFT JOIN orders o ON e.employee_id = o.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name, d.department_name
ORDER BY total_sales DESC;
Result:
employee_name | department | total_orders | total_sales | avg_order_amount | completed | shipped | pending | cancelled | performance_rating
--------------+------------+--------------+-------------+------------------+-----------+---------+---------+-----------+-------------------
ZhaoMin | Sales | 4 | 15100.00 | 3775.00 | 3 | 1 | 0 | 0 | S - Outstanding
WangQiang | Sales | 4 | 10900.00 | 2725.00 | 2 | 1 | 0 | 1 | S - Outstanding
ZhangWei | Technology | 3 | 8750.00 | 2916.67 | 1 | 0 | 2 | 0 | A - Excellent
LiNa | Technology | 0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | No Performance
LiuYang | Marketing | 0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | No Performance
ChenJing | NULL | 0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | No Performance
Code Explanation:
LEFT JOIN— Retains all employees, even those without ordersCOALESCE(SUM(...), 0)— Displays 0 for employees with no ordersSUM(CASE WHEN ... THEN 1 ELSE 0 END)— Conditional counting of orders by status (row-to-column pivoting)CASE— Assigns performance rating based on total salesROUND(AVG(...), 2)— Rounds the average to two decimal places
Query 2: Department Quarterly Sales Comparison
Requirement: Aggregate sales by department and quarter, calculate each department's sales percentage, and find the top-selling department for each quarter.
SELECT
d.department_name AS department,
CONCAT('Q', QUARTER(o.order_date)) AS quarter,
COUNT(o.order_id) AS order_count,
SUM(o.total_amount) AS quarterly_sales,
ROUND(
SUM(o.total_amount) /
SUM(SUM(o.total_amount)) OVER (PARTITION BY QUARTER(o.order_date)) * 100,
2) AS percentage,
RANK() OVER (
PARTITION BY QUARTER(o.order_date)
ORDER BY SUM(o.total_amount) DESC
) AS quarter_rank
FROM orders o
INNER JOIN employees e ON o.employee_id = e.employee_id
INNER JOIN departments d ON e.department_id = d.department_id
WHERE o.status != 'cancelled'
GROUP BY d.department_name, QUARTER(o.order_date)
ORDER BY quarter, quarterly_sales DESC;
Result:
department | quarter | order_count | quarterly_sales | percentage | quarter_rank
-----------+---------+-------------+-----------------+------------+-------------
Sales | Q1 | 2 | 4300.00 | 69.35 | 1
Technology | Q1 | 1 | 950.00 | 15.32 | 2
Sales | Q2 | 2 | 5600.00 | 26.54 | 1
Technology | Q2 | 1 | 5600.00 | 26.54 | 1
Marketing | Q2 | 1 | 3800.00 | 18.01 | 3
Code Explanation:
QUARTER(o.order_date)— Extracts the quarter from the order date (Q1-Q4)SUM(o.total_amount) OVER (PARTITION BY ...)— Window function calculating total sales per quarterpercentage— Each department's quarterly sales ÷ total quarterly sales × 100RANK() OVER (PARTITION BY ... ORDER BY ...)— Ranks within each quarter
Query 3: Customer Value Segmentation (Simplified RFM Model)
Requirement: Segment customers based on Recency (last purchase date), Frequency (purchase count), and Monetary (total spend).
WITH customer_stats AS (
SELECT
customer_id,
MAX(order_date) AS last_purchase_date,
COUNT(order_id) AS purchase_count,
SUM(total_amount) AS total_spend
FROM orders
WHERE status != 'cancelled'
GROUP BY customer_id
)
SELECT
customer_id,
last_purchase_date,
purchase_count,
total_spend,
CASE
WHEN purchase_count >= 3 AND total_spend >= 5000 THEN 'VIP Customer'
WHEN purchase_count >= 2 AND total_spend >= 3000 THEN 'Key Customer'
WHEN purchase_count >= 2 THEN 'Active Customer'
WHEN total_spend >= 2000 THEN 'Potential Customer'
ELSE 'Regular Customer'
END AS customer_tier
FROM customer_stats
ORDER BY total_spend DESC;
Result:
customer_id | last_purchase_date | purchase_count | total_spend | customer_tier
------------+-------------------+----------------+-------------+---------------
103 | 2026-06-10 | 2 | 9500.00 | VIP Customer
101 | 2026-04-15 | 2 | 3450.00 | Key Customer
104 | 2026-03-10 | 1 | 4100.00 | Potential Customer
102 | 2026-05-20 | 2 | 5600.00 | VIP Customer
106 | 2026-05-02 | 1 | 1500.00 | Regular Customer
105 | 2026-04-01 | 0 | 0.00 | Regular Customer
107 | 2026-06-01 | 1 | 2200.00 | Potential Customer
108 | 2026-06-15 | 1 | 1900.00 | Regular Customer
Code Explanation:
WITH customer_stats AS (...)— CTE (Common Table Expression) first calculates statistics for each customer- Inside CTE:
MAX(order_date)for last purchase,COUNTfor purchase count,SUMfor total spend - Outer
CASEsegments customers based on multi-dimensional conditions - Excludes cancelled orders (
status != 'cancelled')
Query 4: Product Inventory and Sales Correlation Analysis
Requirement: Analyze product inventory by department, combined with sales data to evaluate inventory turnover.
SELECT
d.department_name AS department,
COUNT(DISTINCT p.product_id) AS product_count,
SUM(p.stock) AS total_stock,
COALESCE(SUM(p.price * p.stock), 0) AS inventory_value,
COUNT(DISTINCT o.order_id) AS related_orders,
COALESCE(SUM(o.total_amount), 0) AS order_total,
CASE
WHEN SUM(p.price * p.stock) > 0
THEN ROUND(COALESCE(SUM(o.total_amount), 0) / SUM(p.price * p.stock) * 100, 2)
ELSE 0
END AS turnover_rate_pct
FROM departments d
LEFT JOIN products p ON d.department_id = p.department_id
LEFT JOIN employees e ON d.department_id = e.department_id
LEFT JOIN orders o ON e.employee_id = o.employee_id AND o.status != 'cancelled'
GROUP BY d.department_id, d.department_name
ORDER BY inventory_value DESC;
Result:
department | product_count | total_stock | inventory_value | related_orders | order_total | turnover_rate_pct
-----------+---------------+-------------+-----------------+----------------+-------------+------------------
Technology | 3 | 275 | 369945.00 | 4 | 8750.00 | 2.36
Sales | 2 | 70 | 68930.00 | 8 | 26000.00 | 37.71
Marketing | 2 | 75 | 86250.00 | 0 | 0.00 | 0.00
Code Explanation:
- Multi-table
LEFT JOIN: departments → products + employees → orders COUNT(DISTINCT ...)— Deduplicated count to avoid duplicatesp.price * p.stock— Calculates product inventory value- Turnover rate = order total ÷ inventory value × 100%
Query 5: Employee Performance Ranking and Department Comparison
Requirement: Rank employees by overall performance and compare with department averages.
WITH emp_performance AS (
SELECT
e.employee_id,
CONCAT(e.first_name, e.last_name) AS name,
d.department_name AS department,
e.salary AS base_salary,
COUNT(o.order_id) AS order_count,
COALESCE(SUM(o.total_amount), 0) AS sales_amount,
COALESCE(AVG(o.total_amount), 0) AS avg_order_amount
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
LEFT JOIN orders o ON e.employee_id = o.employee_id AND o.status != 'cancelled'
GROUP BY e.employee_id, e.first_name, e.last_name, d.department_name, e.salary
)
SELECT
name,
department,
base_salary,
order_count,
sales_amount,
ROUND(avg_order_amount, 2) AS avg_order_amount,
RANK() OVER (ORDER BY sales_amount DESC) AS company_rank,
RANK() OVER (PARTITION BY department ORDER BY sales_amount DESC) AS dept_rank,
ROUND(sales_amount - AVG(sales_amount) OVER (PARTITION BY department), 2) AS vs_dept_avg
FROM emp_performance
ORDER BY sales_amount DESC;
Result:
name | department | base_salary | order_count | sales_amount | avg_order_amount | company_rank | dept_rank | vs_dept_avg
----------+------------+-------------+-------------+--------------+------------------+--------------+-----------+------------
ZhaoMin | Sales | 11000.00 | 4 | 15100.00 | 3775.00 | 1 | 1 | 3550.00
WangQiang | Sales | 8000.00 | 3 | 10900.00 | 3633.33 | 2 | 2 | -650.00
ZhangWei | Technology | 12000.00 | 2 | 8750.00 | 4375.00 | 3 | 1 | 8750.00
LiNa | Technology | 9500.00 | 0 | 0.00 | 0.00 | 4 | 2 | -8750.00
LiuYang | Marketing | 7500.00 | 0 | 0.00 | 0.00 | 4 | 1 | 0.00
ChenJing | NULL | 9000.00 | 0 | 0.00 | 0.00 | 4 | 1 | 0.00
Code Explanation:
- CTE
emp_performance— First calculates base performance data for each employee RANK() OVER (ORDER BY ...)— Company-wide rankingRANK() OVER (PARTITION BY ... ORDER BY ...)— Ranking within departmentAVG(sales_amount) OVER (PARTITION BY department)— Calculates department average salesvs_dept_avg— Difference between individual sales and department average
❓ FAQ
Q: What's the difference between CTE (WITH statement) and subqueries? A: CTEs have better readability and can be referenced multiple times within the same query; subqueries re-execute each time they're referenced. CTEs also support recursive queries. For complex queries, CTEs are recommended.
Q: What's the difference between window functions and GROUP BY? A:
GROUP BYmerges multiple rows into one (reducing row count after aggregation); window functions retain all original rows and add aggregate results to each row. If you need both aggregation and detail, use window functions.
Q: Is there a limit to CASE WHEN nesting depth? A: Theoretically, no, but excessive nesting indicates overly complex logic. Consider using CTEs to break down steps, or handle complex logic in the application layer.
Q: How to optimize aggregate queries with large data volumes? A: Create indexes on columns used in GROUP BY and WHERE; use CTEs to break down complex queries; avoid using functions on aggregated columns (e.g., change
WHERE YEAR(date) = 2026toWHERE date >= '2026-01-01' AND date < '2027-01-01'); use EXPLAIN to analyze execution plans.
📖 Summary
| Skill | Application in This Lesson |
|---|---|
| Aggregate Functions | COUNT, SUM, AVG, MAX, MIN |
| GROUP BY | Group statistics by dimension |
| HAVING | Filter aggregate results |
| CASE WHEN | Conditional statistics (row-to-column), rating assignment |
| JOIN | Multi-table queries |
| Subquery / CTE | Build complex queries step by step |
| Window Functions | RANK() ranking, AVG() OVER() comparison analysis |
| COALESCE | Handle NULL values |
Key Takeaways:
- Use CTEs to build complex queries step by step for better readability
- CASE WHEN + aggregate functions = conditional statistics (row-to-column pivoting)
- Window functions retain detail while adding aggregate information
- Always use COALESCE to handle NULL and avoid empty values in reports
- Use EXPLAIN to analyze query performance and create indexes for key columns
📝 Exercises
Exercise 1: Write a query to generate a "Monthly Sales Trend Report" — aggregate order count, total sales, and average order amount by month, and calculate the month-over-month growth rate: (current month - previous month) / previous month × 100%.
Exercise 2: Write a query to find "the highest-paid employee in each department" along with their "number of processed orders," using CTE + window functions.
Exercise 3: Write a query to segment customers by spending amount (≥5000 as VIP, ≥3000 as Key, ≥1000 as Regular, <1000 as New Customer), and count the number of customers and total spend for each tier.
Exercise 4: Create a view v_employee_dashboard containing employee name, department, salary, order count, total sales, and performance rating. Use the view to query performance of "Sales" department employees.
Next Lesson
👉 19-window-functions - Window Functions: Deep dive into ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and other window functions to master advanced data analysis techniques!



