Grouping Queries
Grouping Queries
🌍 Real-World Analogy
Imagine you are a teacher with a stack of report cards. You want to know:
- Each student's average score → Group by student, calculate average
- The highest score for each subject → Group by subject, find the maximum
- The pass rate for each class → Group by class, calculate the ratio
This is what GROUP BY does: first split the data into groups, then perform statistics on each group. The aggregate functions learned in the previous lesson calculate a single value for the entire table, while GROUP BY calculates values by category.
🎯 Core Concepts
GROUP BY Basic Syntax
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name;
Execution logic: First group by the GROUP BY column, then apply the aggregate function to each group.
-- How many employees are in each department?
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id;
Output:
department_id headcount
------------- ---------
1 3
2 2
3 2
NULL 1
SELECT must either appear in GROUP BY or be wrapped in an aggregate function. Otherwise, the semantics are ambiguous and the database will report an error.
Post-Group Aggregation
The core pattern of GROUP BY is: Group → Aggregate.
-- Average salary per department
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
Common aggregate combinations:
SELECT department_id,
COUNT(*) AS headcount,
SUM(salary) AS total_salary,
AVG(salary) AS avg_salary,
MAX(salary) AS highest_salary,
MIN(salary) AS lowest_salary
FROM employees
GROUP BY department_id;
Multi-Column Grouping
You can group by multiple columns — rows are considered in the same group only when the values of all specified columns match:
-- Employee count by department and status
SELECT department_id, status, COUNT(*) AS headcount
FROM employees
GROUP BY department_id, status;
Output:
department_id status headcount
------------- -------- ---------
1 active 3
2 active 1
2 inactive 1
3 active 2
GROUP BY + ORDER BY
You can sort after grouping:
-- Average salary by department, sorted from highest to lowest
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
ORDER BY avg_salary DESC;
WHERE vs HAVING — Filtering Rows vs Filtering Groups
This is one of the most important concepts in this lesson:
| Feature | WHERE | HAVING |
|---|---|---|
| Filters | Rows (before grouping) | Groups (after grouping) |
| Execution timing | Before GROUP BY |
After GROUP BY |
| Can use aggregate functions | ❌ No | ✅ Yes |
| Usage | Can be used alone | Must be used with GROUP BY |
-- WHERE: Filter rows first, then group
-- Find departments with more than 3 employees
SELECT department_id, COUNT(*) AS headcount
FROM employees
WHERE salary > 5000 -- First exclude rows with salary below 5000
GROUP BY department_id;
-- HAVING: Group first, then filter groups
-- Find departments with 3 or more employees
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id
HAVING COUNT(*) >= 3; -- Filter out groups with fewer than 3 members
-- Combining WHERE + HAVING
-- Find departments with employees hired in 2024 and average salary > 10000
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2024-01-01' -- Filter rows first
GROUP BY department_id
HAVING AVG(salary) > 10000; -- Then filter groups
Execution Order
The writing order and execution order of SQL queries differ:
Writing order: SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
Understanding execution order is crucial for writing correct SQL:
FROM— Determine the data sourceWHERE— Filter row by rowGROUP BY— GroupHAVING— Filter groupsSELECT— Select columns, compute expressionsORDER BY— Sort
WHERE — because when WHERE executes, GROUP BY hasn't executed yet, so the aggregate results don't exist.
📝 Basic Syntax
-- Basic grouping
SELECT column_name, aggregate_function(column_name)
FROM table_name
GROUP BY column_name;
-- Multi-column grouping
SELECT column1, column2, aggregate_function(column_name)
FROM table_name
GROUP BY column1, column2;
-- WHERE + GROUP BY + HAVING
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE row_level_condition
GROUP BY column_name
HAVING aggregate_level_condition
ORDER BY sort_column;
- Columns in
GROUP BYdon't need to match the order of columns inSELECT, but must include all non-aggregated columns HAVINGcan use aggregate function aliases (supported in MySQL and PostgreSQL), but standard SQL requires writing the full expressionNULLvalues are grouped together inGROUP BY
📌 Examples
Example: Employee Statistics by Department
SELECT d.department_name,
COUNT(e.employee_id) AS headcount,
AVG(e.salary) AS avg_salary,
SUM(e.salary) AS total_salary
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY total_salary DESC;
Output:
department_name headcount avg_salary total_salary
--------------- --------- ---------- ------------
Technology 3 17666.67 53000.00
Finance 2 13500.00 27000.00
Marketing 2 11500.00 23000.00
Administration 1 9000.00 9000.00
LEFT JOIN ensures that departments without employees are also displayed (with a headcount of 0). GROUP BY department_name groups by department name.
Example: WHERE + GROUP BY + HAVING Combined
-- Among employees hired in 2024, find departments with average salary > 12000
SELECT department_id,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2024-01-01'
GROUP BY department_id
HAVING AVG(salary) > 12000
ORDER BY avg_salary DESC;
Output:
department_id headcount avg_salary
------------- --------- ----------
1 2 18500.00
3 1 14000.00
WHERE hire_date >= '2024-01-01'— First filter for employees hired in 2024GROUP BY department_id— Group by departmentHAVING AVG(salary) > 12000— Keep only groups with average salary > 12000ORDER BY avg_salary DESC— Sort by average salary in descending order
Example: Multi-Column Grouping + Complex Conditions
-- Employee statistics by department and status, only showing groups with 2+ members
SELECT d.department_name,
e.status,
COUNT(*) AS headcount,
MIN(e.salary) AS lowest_salary,
MAX(e.salary) AS highest_salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > 0
GROUP BY d.department_name, e.status
HAVING COUNT(*) >= 2
ORDER BY d.department_name, headcount DESC;
Output:
department_name status headcount lowest_salary highest_salary
--------------- ------ --------- ------------- --------------
Finance active 2 12000.00 14000.00
Marketing active 2 10000.00 12000.00
Technology active 3 15000.00 20000.00
WHERE to exclude rows with zero salary, then group by department + status, HAVING filters out groups with fewer than 2 members, and finally sort.
🎬 Practice Scenarios
Scenario 1: Sales Report — Monthly Order Statistics
Generate a monthly sales report, showing only months with 3 or more orders.
SELECT
EXTRACT(YEAR FROM order_date) AS year,
EXTRACT(MONTH FROM order_date) AS month,
COUNT(*) AS order_count,
SUM(total_amount) AS total_amount,
AVG(total_amount) AS avg_order_amount
FROM orders
WHERE status IN ('completed', 'shipped')
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
HAVING COUNT(*) >= 3
ORDER BY year, month;
WHERE filters for completed and shipped orders, group by year and month for statistics, HAVING filters out months with fewer than 3 orders.
Scenario 2: HR Analysis — Finding High-Paying Departments
Find departments with average salary higher than the company-wide average.
SELECT d.department_name,
COUNT(e.employee_id) AS headcount,
AVG(e.salary) AS dept_avg_salary
FROM departments d
JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name
HAVING AVG(e.salary) > (SELECT AVG(salary) FROM employees)
ORDER BY dept_avg_salary DESC;
HAVING. First calculate each department's average salary, then compare it with the company-wide average. This is a combined application of aggregate functions and subqueries.
❓ FAQ
Q: Must columns in SELECT appear in GROUP BY? A: Yes, all non-aggregated columns must appear in
GROUP BY. This is a SQL standard requirement. For example,SELECT a, b, COUNT(*) FROM t GROUP BY awill cause an error becausebis neither inGROUP BYnor wrapped in an aggregate function.
Q: Can HAVING use aliases from SELECT? A: MySQL and PostgreSQL support using aliases in
HAVING, but SQL Server does not. For compatibility, it's recommended to write the full expression:HAVING AVG(salary) > 10000instead ofHAVING avg_salary > 10000.
Q: How are NULL values handled in GROUP BY? A:
NULLvalues are grouped together. Ifdepartment_idhas NULL values, all rows with NULL will be placed in the same group. This is usually useful, but be careful to distinguish between "no department" and "department ID is 0."
Q: Can WHERE and HAVING be used together? A: Yes, and it's very common.
WHEREfilters rows before grouping,HAVINGfilters groups after grouping. Execution order:WHERE→GROUP BY→HAVING.
📖 Summary
GROUP BYsplits data into groups and applies aggregate functions to each group- Non-aggregated columns in
SELECTmust appear inGROUP BY - Multi-column grouping: group by combinations of multiple column values
WHERE: filters rows before grouping, cannot use aggregate functionsHAVING: filters groups after grouping, can use aggregate functions- Execution order:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY NULLvalues are grouped together
📝 Exercises
Exercise 1 (⭐): Query the headcount and average salary for each department, sorted by average salary from highest to lowest.
Exercise 2 (⭐⭐): Query the number of employees hired after 2024 in each department, showing only departments with 2 or more employees.
Exercise 3 (⭐⭐⭐): Query the highest-paid employee in each department. Display department name, employee name, and salary. Hint: this can be achieved using a subquery or the ROW_NUMBER() window function (covered later).
Next Lesson
👉 15-advanced-functions - Advanced Functions: Learn SQL advanced functions, master string processing, numeric calculations, date operations, and other practical skills!



