Grouping Queries

Grouping Queries

🌍 Real-World Analogy

Imagine you are a teacher with a stack of report cards. You want to know:

This is what GROUP BY does: first split the data into groups, then perform statistics on each group. The aggregate functions learned in the previous lesson calculate a single value for the entire table, while GROUP BY calculates values by category.


🎯 Core Concepts

GROUP BY Basic Syntax

SQL
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name;

Execution logic: First group by the GROUP BY column, then apply the aggregate function to each group.

SQL
-- How many employees are in each department?
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id;

Output:

TEXT
department_id  headcount
-------------  ---------
1              3
2              2
3              2
NULL           1
💡 Rule: Columns in SELECT must either appear in GROUP BY or be wrapped in an aggregate function. Otherwise, the semantics are ambiguous and the database will report an error.

Post-Group Aggregation

The core pattern of GROUP BY is: Group → Aggregate.

SQL
-- Average salary per department
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

Common aggregate combinations:

SQL
SELECT department_id,
       COUNT(*) AS headcount,
       SUM(salary) AS total_salary,
       AVG(salary) AS avg_salary,
       MAX(salary) AS highest_salary,
       MIN(salary) AS lowest_salary
FROM employees
GROUP BY department_id;

Multi-Column Grouping

You can group by multiple columns — rows are considered in the same group only when the values of all specified columns match:

SQL
-- Employee count by department and status
SELECT department_id, status, COUNT(*) AS headcount
FROM employees
GROUP BY department_id, status;

Output:

TEXT
department_id  status    headcount
-------------  --------  ---------
1              active    3
2              active    1
2              inactive  1
3              active    2
💡 Understanding: Multi-column grouping is similar to multi-level subtotals in Excel. First group by department, then by status — each department + status combination forms a group.

GROUP BY + ORDER BY

You can sort after grouping:

SQL
-- Average salary by department, sorted from highest to lowest
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
ORDER BY avg_salary DESC;

WHERE vs HAVING — Filtering Rows vs Filtering Groups

This is one of the most important concepts in this lesson:

Feature WHERE HAVING
Filters Rows (before grouping) Groups (after grouping)
Execution timing Before GROUP BY After GROUP BY
Can use aggregate functions ❌ No ✅ Yes
Usage Can be used alone Must be used with GROUP BY
SQL
-- WHERE: Filter rows first, then group
-- Find departments with more than 3 employees
SELECT department_id, COUNT(*) AS headcount
FROM employees
WHERE salary > 5000       -- First exclude rows with salary below 5000
GROUP BY department_id;

-- HAVING: Group first, then filter groups
-- Find departments with 3 or more employees
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id
HAVING COUNT(*) >= 3;     -- Filter out groups with fewer than 3 members
SQL
-- Combining WHERE + HAVING
-- Find departments with employees hired in 2024 and average salary > 10000
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2024-01-01'   -- Filter rows first
GROUP BY department_id
HAVING AVG(salary) > 10000;      -- Then filter groups

Execution Order

The writing order and execution order of SQL queries differ:

TEXT
Writing order:  SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

Understanding execution order is crucial for writing correct SQL:

  1. FROM — Determine the data source
  2. WHERE — Filter row by row
  3. GROUP BY — Group
  4. HAVING — Filter groups
  5. SELECT — Select columns, compute expressions
  6. ORDER BY — Sort
💡 Key: This is why aggregate functions cannot be used in WHERE — because when WHERE executes, GROUP BY hasn't executed yet, so the aggregate results don't exist.


📝 Basic Syntax

SQL
-- Basic grouping
SELECT column_name, aggregate_function(column_name)
FROM table_name
GROUP BY column_name;

-- Multi-column grouping
SELECT column1, column2, aggregate_function(column_name)
FROM table_name
GROUP BY column1, column2;

-- WHERE + GROUP BY + HAVING
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE row_level_condition
GROUP BY column_name
HAVING aggregate_level_condition
ORDER BY sort_column;
💡 Tip:

  • Columns in GROUP BY don't need to match the order of columns in SELECT, but must include all non-aggregated columns
  • HAVING can use aggregate function aliases (supported in MySQL and PostgreSQL), but standard SQL requires writing the full expression
  • NULL values are grouped together in GROUP BY

📌 Examples

Example: Employee Statistics by Department

SQL
SELECT d.department_name,
       COUNT(e.employee_id) AS headcount,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS total_salary
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY total_salary DESC;
▶ Try it Yourself

Output:

TEXT
department_name  headcount  avg_salary  total_salary
---------------  ---------  ----------  ------------
Technology       3          17666.67    53000.00
Finance          2          13500.00    27000.00
Marketing        2          11500.00    23000.00
Administration   1          9000.00     9000.00
💡 Interpretation: Using LEFT JOIN ensures that departments without employees are also displayed (with a headcount of 0). GROUP BY department_name groups by department name.


Example: WHERE + GROUP BY + HAVING Combined

SQL
-- Among employees hired in 2024, find departments with average salary > 12000
SELECT department_id,
       COUNT(*) AS headcount,
       AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2024-01-01'
GROUP BY department_id
HAVING AVG(salary) > 12000
ORDER BY avg_salary DESC;
▶ Try it Yourself

Output:

TEXT
department_id  headcount  avg_salary
-------------  ---------  ----------
1              2          18500.00
3              1          14000.00
💡 Execution process:

  1. WHERE hire_date >= '2024-01-01' — First filter for employees hired in 2024
  2. GROUP BY department_id — Group by department
  3. HAVING AVG(salary) > 12000 — Keep only groups with average salary > 12000
  4. ORDER BY avg_salary DESC — Sort by average salary in descending order

Example: Multi-Column Grouping + Complex Conditions

SQL
-- Employee statistics by department and status, only showing groups with 2+ members
SELECT d.department_name,
       e.status,
       COUNT(*) AS headcount,
       MIN(e.salary) AS lowest_salary,
       MAX(e.salary) AS highest_salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > 0
GROUP BY d.department_name, e.status
HAVING COUNT(*) >= 2
ORDER BY d.department_name, headcount DESC;
▶ Try it Yourself

Output:

TEXT
department_name  status  headcount  lowest_salary  highest_salary
---------------  ------  ---------  -------------  --------------
Finance          active  2          12000.00       14000.00
Marketing        active  2          10000.00       12000.00
Technology       active  3          15000.00       20000.00
💡 Interpretation: First use WHERE to exclude rows with zero salary, then group by department + status, HAVING filters out groups with fewer than 2 members, and finally sort.


🎬 Practice Scenarios

Scenario 1: Sales Report — Monthly Order Statistics

Generate a monthly sales report, showing only months with 3 or more orders.

SQL
SELECT 
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(MONTH FROM order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_amount,
    AVG(total_amount) AS avg_order_amount
FROM orders
WHERE status IN ('completed', 'shipped')
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
HAVING COUNT(*) >= 3
ORDER BY year, month;
💡 Approach: WHERE filters for completed and shipped orders, group by year and month for statistics, HAVING filters out months with fewer than 3 orders.

Scenario 2: HR Analysis — Finding High-Paying Departments

Find departments with average salary higher than the company-wide average.

SQL
SELECT d.department_name,
       COUNT(e.employee_id) AS headcount,
       AVG(e.salary) AS dept_avg_salary
FROM departments d
JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name
HAVING AVG(e.salary) > (SELECT AVG(salary) FROM employees)
ORDER BY dept_avg_salary DESC;
💡 Approach: Subqueries can be used in HAVING. First calculate each department's average salary, then compare it with the company-wide average. This is a combined application of aggregate functions and subqueries.


❓ FAQ

Q: Must columns in SELECT appear in GROUP BY? A: Yes, all non-aggregated columns must appear in GROUP BY. This is a SQL standard requirement. For example, SELECT a, b, COUNT(*) FROM t GROUP BY a will cause an error because b is neither in GROUP BY nor wrapped in an aggregate function.

Q: Can HAVING use aliases from SELECT? A: MySQL and PostgreSQL support using aliases in HAVING, but SQL Server does not. For compatibility, it's recommended to write the full expression: HAVING AVG(salary) > 10000 instead of HAVING avg_salary > 10000.

Q: How are NULL values handled in GROUP BY? A: NULL values are grouped together. If department_id has NULL values, all rows with NULL will be placed in the same group. This is usually useful, but be careful to distinguish between "no department" and "department ID is 0."

Q: Can WHERE and HAVING be used together? A: Yes, and it's very common. WHERE filters rows before grouping, HAVING filters groups after grouping. Execution order: WHEREGROUP BYHAVING.


📖 Summary


📝 Exercises

Exercise 1 (⭐): Query the headcount and average salary for each department, sorted by average salary from highest to lowest.

Exercise 2 (⭐⭐): Query the number of employees hired after 2024 in each department, showing only departments with 2 or more employees.

Exercise 3 (⭐⭐⭐): Query the highest-paid employee in each department. Display department name, employee name, and salary. Hint: this can be achieved using a subquery or the ROW_NUMBER() window function (covered later).


Next Lesson

👉 15-advanced-functions - Advanced Functions: Learn SQL advanced functions, master string processing, numeric calculations, date operations, and other practical skills!

Web-Tutorial.com

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

100%

🙏 帮我们做得更好

我们是刚上线的编程教程站,几个人的小团队,精力有限。页面虽经检查,难免还有疏漏——链接失效、排版错乱、内容有误、语言生硬……

如果您发现了,麻烦告诉我们,我们会在收到反馈后第一时间进行修复,再次感谢您的光临 🙏