Practice: Data Analysis

Practice: Data Analysis

📋 Project Requirements

This lesson covers two practical scenarios — Sales Report Analysis and Employee Performance Statistics — applying aggregate functions, grouping queries, conditional expressions, JOINs, and subqueries learned in previous lessons.

Database Table Structure

We use a unified 4-table database:

SQL
-- Departments table
CREATE TABLE departments (
    department_id   INT PRIMARY KEY,
    department_name VARCHAR(50) NOT NULL,
    city            VARCHAR(50),
    budget          DECIMAL(12,2)
);

-- Employees table
CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,
    first_name    VARCHAR(50) NOT NULL,
    last_name     VARCHAR(50) NOT NULL,
    email         VARCHAR(100),
    salary        DECIMAL(10,2),
    department_id INT,
    hire_date     DATE,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

-- Orders table
CREATE TABLE orders (
    order_id     INT PRIMARY KEY,
    customer_id  INT,
    employee_id  INT,
    order_date   DATE,
    total_amount DECIMAL(12,2),
    status       VARCHAR(20),
    FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);

-- Products table
CREATE TABLE products (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    price        DECIMAL(10,2),
    stock        INT,
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

Test Data

SQL
-- Department data
INSERT INTO departments VALUES (1, 'Technology', 'Beijing', 500000);
INSERT INTO departments VALUES (2, 'Sales', 'Shanghai', 300000);
INSERT INTO departments VALUES (3, 'Marketing', 'Guangzhou', 200000);
INSERT INTO departments VALUES (4, 'Finance', 'Shenzhen', 150000);

-- Employee data
INSERT INTO employees VALUES (1, 'Zhang', 'Wei', 'zhangwei@co.com', 12000, 1, '2023-03-15');
INSERT INTO employees VALUES (2, 'Li', 'Na', 'lina@co.com', 9500, 1, '2023-06-20');
INSERT INTO employees VALUES (3, 'Wang', 'Qiang', 'wangqiang@co.com', 8000, 2, '2022-11-10');
INSERT INTO employees VALUES (4, 'Zhao', 'Min', 'zhaomin@co.com', 11000, 2, '2024-01-05');
INSERT INTO employees VALUES (5, 'Liu', 'Yang', 'liuyang@co.com', 7500, 3, '2024-04-18');
INSERT INTO employees VALUES (6, 'Chen', 'Jing', 'chenjing@co.com', 9000, NULL, '2025-02-28');

-- Order data
INSERT INTO orders VALUES (1001, 101, 3, '2026-01-10', 2500.00, 'completed');
INSERT INTO orders VALUES (1002, 102, 3, '2026-02-15', 1800.00, 'completed');
INSERT INTO orders VALUES (1003, 103, 4, '2026-02-20', 3200.00, 'shipped');
INSERT INTO orders VALUES (1004, 101, 1, '2026-03-05', 950.00, 'pending');
INSERT INTO orders VALUES (1005, 104, 4, '2026-03-10', 4100.00, 'completed');
INSERT INTO orders VALUES (1006, 105, 3, '2026-04-01', 2800.00, 'cancelled');
INSERT INTO orders VALUES (1007, 101, 1, '2026-04-15', 5600.00, 'completed');
INSERT INTO orders VALUES (1008, 106, 4, '2026-05-02', 1500.00, 'completed');
INSERT INTO orders VALUES (1009, 102, 3, '2026-05-20', 3800.00, 'shipped');
INSERT INTO orders VALUES (1010, 107, 1, '2026-06-01', 2200.00, 'pending');
INSERT INTO orders VALUES (1011, 103, 4, '2026-06-10', 6300.00, 'completed');
INSERT INTO orders VALUES (1012, 108, 3, '2026-06-15', 1900.00, 'completed');

-- Product data
INSERT INTO products VALUES (1, 'Laptop', 6999.00, 50, 1);
INSERT INTO products VALUES (2, 'Wireless Mouse', 129.00, 200, 1);
INSERT INTO products VALUES (3, 'Office Desk', 899.00, 30, 2);
INSERT INTO products VALUES (4, 'Projector', 3500.00, 15, 3);
INSERT INTO products VALUES (5, 'Printer', 2200.00, 25, 1);
INSERT INTO products VALUES (6, 'Meeting Chair', 599.00, 40, 2);
INSERT INTO products VALUES (7, 'Whiteboard', 450.00, 60, 3);

🏗️ Query Practice

Query 1: Employee Sales Performance Report

Requirement: Generate a sales performance report for each employee, including order count, total sales, average order amount, orders by status, and a performance rating.

SQL
SELECT 
    CONCAT(e.first_name, e.last_name) AS employee_name,
    d.department_name AS department,
    COUNT(o.order_id) AS total_orders,
    COALESCE(SUM(o.total_amount), 0) AS total_sales,
    COALESCE(ROUND(AVG(o.total_amount), 2), 0) AS avg_order_amount,
    SUM(CASE WHEN o.status = 'completed' THEN 1 ELSE 0 END) AS completed,
    SUM(CASE WHEN o.status = 'shipped'   THEN 1 ELSE 0 END) AS shipped,
    SUM(CASE WHEN o.status = 'pending'   THEN 1 ELSE 0 END) AS pending,
    SUM(CASE WHEN o.status = 'cancelled' THEN 1 ELSE 0 END) AS cancelled,
    CASE
        WHEN COALESCE(SUM(o.total_amount), 0) >= 10000 THEN 'S - Outstanding'
        WHEN COALESCE(SUM(o.total_amount), 0) >= 5000  THEN 'A - Excellent'
        WHEN COALESCE(SUM(o.total_amount), 0) >= 2000  THEN 'B - Good'
        WHEN COUNT(o.order_id) > 0                      THEN 'C - Needs Improvement'
        ELSE 'No Performance'
    END AS performance_rating
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
LEFT JOIN orders o ON e.employee_id = o.employee_id
GROUP BY e.employee_id, e.first_name, e.last_name, d.department_name
ORDER BY total_sales DESC;

Result:

TEXT
employee_name | department | total_orders | total_sales | avg_order_amount | completed | shipped | pending | cancelled | performance_rating
--------------+------------+--------------+-------------+------------------+-----------+---------+---------+-----------+-------------------
ZhaoMin       | Sales      |            4 |    15100.00 |          3775.00 |         3 |       1 |       0 |         0 | S - Outstanding
WangQiang     | Sales      |            4 |    10900.00 |          2725.00 |         2 |       1 |       0 |         1 | S - Outstanding
ZhangWei      | Technology |            3 |     8750.00 |          2916.67 |         1 |       0 |       2 |         0 | A - Excellent
LiNa          | Technology |            0 |        0.00 |             0.00 |         0 |       0 |       0 |         0 | No Performance
LiuYang       | Marketing  |            0 |        0.00 |             0.00 |         0 |       0 |       0 |         0 | No Performance
ChenJing      | NULL       |            0 |        0.00 |             0.00 |         0 |       0 |       0 |         0 | No Performance

Code Explanation:

  1. LEFT JOIN — Retains all employees, even those without orders
  2. COALESCE(SUM(...), 0) — Displays 0 for employees with no orders
  3. SUM(CASE WHEN ... THEN 1 ELSE 0 END) — Conditional counting of orders by status (row-to-column pivoting)
  4. CASE — Assigns performance rating based on total sales
  5. ROUND(AVG(...), 2) — Rounds the average to two decimal places

Query 2: Department Quarterly Sales Comparison

Requirement: Aggregate sales by department and quarter, calculate each department's sales percentage, and find the top-selling department for each quarter.

SQL
SELECT 
    d.department_name AS department,
    CONCAT('Q', QUARTER(o.order_date)) AS quarter,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS quarterly_sales,
    ROUND(
        SUM(o.total_amount) / 
        SUM(SUM(o.total_amount)) OVER (PARTITION BY QUARTER(o.order_date)) * 100, 
    2) AS percentage,
    RANK() OVER (
        PARTITION BY QUARTER(o.order_date) 
        ORDER BY SUM(o.total_amount) DESC
    ) AS quarter_rank
FROM orders o
INNER JOIN employees e ON o.employee_id = e.employee_id
INNER JOIN departments d ON e.department_id = d.department_id
WHERE o.status != 'cancelled'
GROUP BY d.department_name, QUARTER(o.order_date)
ORDER BY quarter, quarterly_sales DESC;

Result:

TEXT
department | quarter | order_count | quarterly_sales | percentage | quarter_rank
-----------+---------+-------------+-----------------+------------+-------------
Sales      | Q1      |           2 |         4300.00 |      69.35 |            1
Technology | Q1      |           1 |          950.00 |      15.32 |            2
Sales      | Q2      |           2 |         5600.00 |      26.54 |            1
Technology | Q2      |           1 |         5600.00 |      26.54 |            1
Marketing  | Q2      |           1 |         3800.00 |      18.01 |            3

Code Explanation:

  1. QUARTER(o.order_date) — Extracts the quarter from the order date (Q1-Q4)
  2. SUM(o.total_amount) OVER (PARTITION BY ...) — Window function calculating total sales per quarter
  3. percentage — Each department's quarterly sales ÷ total quarterly sales × 100
  4. RANK() OVER (PARTITION BY ... ORDER BY ...) — Ranks within each quarter

Query 3: Customer Value Segmentation (Simplified RFM Model)

Requirement: Segment customers based on Recency (last purchase date), Frequency (purchase count), and Monetary (total spend).

SQL
WITH customer_stats AS (
    SELECT 
        customer_id,
        MAX(order_date) AS last_purchase_date,
        COUNT(order_id) AS purchase_count,
        SUM(total_amount) AS total_spend
    FROM orders
    WHERE status != 'cancelled'
    GROUP BY customer_id
)
SELECT 
    customer_id,
    last_purchase_date,
    purchase_count,
    total_spend,
    CASE
        WHEN purchase_count >= 3 AND total_spend >= 5000 THEN 'VIP Customer'
        WHEN purchase_count >= 2 AND total_spend >= 3000 THEN 'Key Customer'
        WHEN purchase_count >= 2                       THEN 'Active Customer'
        WHEN total_spend >= 2000                       THEN 'Potential Customer'
        ELSE 'Regular Customer'
    END AS customer_tier
FROM customer_stats
ORDER BY total_spend DESC;

Result:

TEXT
customer_id | last_purchase_date | purchase_count | total_spend | customer_tier
------------+-------------------+----------------+-------------+---------------
        103 | 2026-06-10        |              2 |     9500.00 | VIP Customer
        101 | 2026-04-15        |              2 |     3450.00 | Key Customer
        104 | 2026-03-10        |              1 |     4100.00 | Potential Customer
        102 | 2026-05-20        |              2 |     5600.00 | VIP Customer
        106 | 2026-05-02        |              1 |     1500.00 | Regular Customer
        105 | 2026-04-01        |              0 |        0.00 | Regular Customer
        107 | 2026-06-01        |              1 |     2200.00 | Potential Customer
        108 | 2026-06-15        |              1 |     1900.00 | Regular Customer

Code Explanation:

  1. WITH customer_stats AS (...) — CTE (Common Table Expression) first calculates statistics for each customer
  2. Inside CTE: MAX(order_date) for last purchase, COUNT for purchase count, SUM for total spend
  3. Outer CASE segments customers based on multi-dimensional conditions
  4. Excludes cancelled orders (status != 'cancelled')

Query 4: Product Inventory and Sales Correlation Analysis

Requirement: Analyze product inventory by department, combined with sales data to evaluate inventory turnover.

SQL
SELECT 
    d.department_name AS department,
    COUNT(DISTINCT p.product_id) AS product_count,
    SUM(p.stock) AS total_stock,
    COALESCE(SUM(p.price * p.stock), 0) AS inventory_value,
    COUNT(DISTINCT o.order_id) AS related_orders,
    COALESCE(SUM(o.total_amount), 0) AS order_total,
    CASE 
        WHEN SUM(p.price * p.stock) > 0 
        THEN ROUND(COALESCE(SUM(o.total_amount), 0) / SUM(p.price * p.stock) * 100, 2)
        ELSE 0
    END AS turnover_rate_pct
FROM departments d
LEFT JOIN products p ON d.department_id = p.department_id
LEFT JOIN employees e ON d.department_id = e.department_id
LEFT JOIN orders o ON e.employee_id = o.employee_id AND o.status != 'cancelled'
GROUP BY d.department_id, d.department_name
ORDER BY inventory_value DESC;

Result:

TEXT
department | product_count | total_stock | inventory_value | related_orders | order_total | turnover_rate_pct
-----------+---------------+-------------+-----------------+----------------+-------------+------------------
Technology |             3 |         275 |       369945.00 |              4 |      8750.00 |             2.36
Sales      |             2 |          70 |        68930.00 |              8 |     26000.00 |            37.71
Marketing  |             2 |          75 |        86250.00 |              0 |         0.00 |             0.00

Code Explanation:

  1. Multi-table LEFT JOIN: departments → products + employees → orders
  2. COUNT(DISTINCT ...) — Deduplicated count to avoid duplicates
  3. p.price * p.stock — Calculates product inventory value
  4. Turnover rate = order total ÷ inventory value × 100%

Query 5: Employee Performance Ranking and Department Comparison

Requirement: Rank employees by overall performance and compare with department averages.

SQL
WITH emp_performance AS (
    SELECT 
        e.employee_id,
        CONCAT(e.first_name, e.last_name) AS name,
        d.department_name AS department,
        e.salary AS base_salary,
        COUNT(o.order_id) AS order_count,
        COALESCE(SUM(o.total_amount), 0) AS sales_amount,
        COALESCE(AVG(o.total_amount), 0) AS avg_order_amount
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    LEFT JOIN orders o ON e.employee_id = o.employee_id AND o.status != 'cancelled'
    GROUP BY e.employee_id, e.first_name, e.last_name, d.department_name, e.salary
)
SELECT 
    name,
    department,
    base_salary,
    order_count,
    sales_amount,
    ROUND(avg_order_amount, 2) AS avg_order_amount,
    RANK() OVER (ORDER BY sales_amount DESC) AS company_rank,
    RANK() OVER (PARTITION BY department ORDER BY sales_amount DESC) AS dept_rank,
    ROUND(sales_amount - AVG(sales_amount) OVER (PARTITION BY department), 2) AS vs_dept_avg
FROM emp_performance
ORDER BY sales_amount DESC;

Result:

TEXT
name      | department | base_salary | order_count | sales_amount | avg_order_amount | company_rank | dept_rank | vs_dept_avg
----------+------------+-------------+-------------+--------------+------------------+--------------+-----------+------------
ZhaoMin   | Sales      |    11000.00 |           4 |     15100.00 |          3775.00 |            1 |         1 |     3550.00
WangQiang | Sales      |     8000.00 |           3 |     10900.00 |          3633.33 |            2 |         2 |     -650.00
ZhangWei  | Technology |    12000.00 |           2 |      8750.00 |          4375.00 |            3 |         1 |     8750.00
LiNa      | Technology |     9500.00 |           0 |         0.00 |             0.00 |            4 |         2 |    -8750.00
LiuYang   | Marketing  |     7500.00 |           0 |         0.00 |             0.00 |            4 |         1 |        0.00
ChenJing  | NULL       |     9000.00 |           0 |         0.00 |             0.00 |            4 |         1 |        0.00

Code Explanation:

  1. CTE emp_performance — First calculates base performance data for each employee
  2. RANK() OVER (ORDER BY ...) — Company-wide ranking
  3. RANK() OVER (PARTITION BY ... ORDER BY ...) — Ranking within department
  4. AVG(sales_amount) OVER (PARTITION BY department) — Calculates department average sales
  5. vs_dept_avg — Difference between individual sales and department average

❓ FAQ

Q: What's the difference between CTE (WITH statement) and subqueries? A: CTEs have better readability and can be referenced multiple times within the same query; subqueries re-execute each time they're referenced. CTEs also support recursive queries. For complex queries, CTEs are recommended.

Q: What's the difference between window functions and GROUP BY? A: GROUP BY merges multiple rows into one (reducing row count after aggregation); window functions retain all original rows and add aggregate results to each row. If you need both aggregation and detail, use window functions.

Q: Is there a limit to CASE WHEN nesting depth? A: Theoretically, no, but excessive nesting indicates overly complex logic. Consider using CTEs to break down steps, or handle complex logic in the application layer.

Q: How to optimize aggregate queries with large data volumes? A: Create indexes on columns used in GROUP BY and WHERE; use CTEs to break down complex queries; avoid using functions on aggregated columns (e.g., change WHERE YEAR(date) = 2026 to WHERE date >= '2026-01-01' AND date < '2027-01-01'); use EXPLAIN to analyze execution plans.


📖 Summary

Skill Application in This Lesson
Aggregate Functions COUNT, SUM, AVG, MAX, MIN
GROUP BY Group statistics by dimension
HAVING Filter aggregate results
CASE WHEN Conditional statistics (row-to-column), rating assignment
JOIN Multi-table queries
Subquery / CTE Build complex queries step by step
Window Functions RANK() ranking, AVG() OVER() comparison analysis
COALESCE Handle NULL values

Key Takeaways:


📝 Exercises

Exercise 1: Write a query to generate a "Monthly Sales Trend Report" — aggregate order count, total sales, and average order amount by month, and calculate the month-over-month growth rate: (current month - previous month) / previous month × 100%.

Exercise 2: Write a query to find "the highest-paid employee in each department" along with their "number of processed orders," using CTE + window functions.

Exercise 3: Write a query to segment customers by spending amount (≥5000 as VIP, ≥3000 as Key, ≥1000 as Regular, <1000 as New Customer), and count the number of customers and total spend for each tier.

Exercise 4: Create a view v_employee_dashboard containing employee name, department, salary, order count, total sales, and performance rating. Use the view to query performance of "Sales" department employees.


Next Lesson

👉 19-window-functions - Window Functions: Deep dive into ROW_NUMBER, RANK, DENSE_RANK, LAG, LEAD, and other window functions to master advanced data analysis techniques!

Web-Tutorial.com

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

100%

🙏 帮我们做得更好

我们是刚上线的编程教程站,几个人的小团队,精力有限。页面虽经检查,难免还有疏漏——链接失效、排版错乱、内容有误、语言生硬……

如果您发现了,麻烦告诉我们,我们会在收到反馈后第一时间进行修复,再次感谢您的光临 🙏