Grouping Queries

Q: Q: Must columns in SELECT appear in GROUP BY?

A: Yes, all non-aggregated columns must appear in GROUP BY. This is a SQL standard requirement. For example, SELECT a, b, COUNT(*) FROM t GROUP BY a will cause an error because b is neither in GROUP BY nor wrapped in an aggregate function.

Q: Q: Can HAVING use aliases from SELECT?

A: MySQL and PostgreSQL support using aliases in HAVING, but SQL Server does not. For compatibility, it's recommended to write the full expression: HAVING AVG(salary) > 10000 instead of HAVING avg_salary > 10000.

Q: Q: How are NULL values handled in GROUP BY?

A: NULL values are grouped together. If department_id has NULL values, all rows with NULL will be placed in the same group. This is usually useful, but be careful to distinguish between "no department" and "department ID is 0."

Q: Q: Can WHERE and HAVING be used together?

A: Yes, and it's very common. WHERE filters rows before grouping, HAVING filters groups after grouping. Execution order: WHERE → GROUP BY → HAVING.

🌍 Real-World Analogy

Imagine you are a teacher with a stack of report cards. You want to know:

Each student's average score → Group by student, calculate average
The highest score for each subject → Group by subject, find the maximum
The pass rate for each class → Group by class, calculate the ratio

This is what GROUP BY does: first split the data into groups, then perform statistics on each group. The aggregate functions learned in the previous lesson calculate a single value for the entire table, while GROUP BY calculates values by category.

🎯 Core Concepts

GROUP BY Basic Syntax

SQL

SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name;

Execution logic: First group by the GROUP BY column, then apply the aggregate function to each group.

SQL

-- How many employees are in each department?
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id;

Output:

TEXT

department_id  headcount
-------------  ---------
1              3
2              2
3              2
NULL           1

💡 Rule: Columns in SELECT must either appear in GROUP BY or be wrapped in an aggregate function. Otherwise, the semantics are ambiguous and the database will report an error.

Post-Group Aggregation

The core pattern of GROUP BY is: Group → Aggregate.

SQL

-- Average salary per department
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;

Common aggregate combinations:

SQL

SELECT department_id,
       COUNT(*) AS headcount,
       SUM(salary) AS total_salary,
       AVG(salary) AS avg_salary,
       MAX(salary) AS highest_salary,
       MIN(salary) AS lowest_salary
FROM employees
GROUP BY department_id;

Multi-Column Grouping

You can group by multiple columns — rows are considered in the same group only when the values of all specified columns match:

SQL

-- Employee count by department and status
SELECT department_id, status, COUNT(*) AS headcount
FROM employees
GROUP BY department_id, status;

Output:

TEXT

department_id  status    headcount
-------------  --------  ---------
1              active    3
2              active    1
2              inactive  1
3              active    2

💡 Understanding: Multi-column grouping is similar to multi-level subtotals in Excel. First group by department, then by status — each department + status combination forms a group.

GROUP BY + ORDER BY

You can sort after grouping:

SQL

-- Average salary by department, sorted from highest to lowest
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
ORDER BY avg_salary DESC;

WHERE vs HAVING — Filtering Rows vs Filtering Groups

This is one of the most important concepts in this lesson:

Feature	WHERE	HAVING
Filters	Rows (before grouping)	Groups (after grouping)
Execution timing	Before `GROUP BY`	After `GROUP BY`
Can use aggregate functions	❌ No	✅ Yes
Usage	Can be used alone	Must be used with `GROUP BY`

SQL

-- WHERE: Filter rows first, then group
-- Find departments with more than 3 employees
SELECT department_id, COUNT(*) AS headcount
FROM employees
WHERE salary > 5000       -- First exclude rows with salary below 5000
GROUP BY department_id;

-- HAVING: Group first, then filter groups
-- Find departments with 3 or more employees
SELECT department_id, COUNT(*) AS headcount
FROM employees
GROUP BY department_id
HAVING COUNT(*) >= 3;     -- Filter out groups with fewer than 3 members

SQL

-- Combining WHERE + HAVING
-- Find departments with employees hired in 2024 and average salary > 10000
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2024-01-01'   -- Filter rows first
GROUP BY department_id
HAVING AVG(salary) > 10000;      -- Then filter groups

Execution Order

The writing order and execution order of SQL queries differ:

TEXT

Writing order:  SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY

Understanding execution order is crucial for writing correct SQL:

FROM — Determine the data source
WHERE — Filter row by row
GROUP BY — Group
HAVING — Filter groups
SELECT — Select columns, compute expressions
ORDER BY — Sort

💡 Key: This is why aggregate functions cannot be used in WHERE — because when WHERE executes, GROUP BY hasn't executed yet, so the aggregate results don't exist.

📝 Basic Syntax

SQL

-- Basic grouping
SELECT column_name, aggregate_function(column_name)
FROM table_name
GROUP BY column_name;

-- Multi-column grouping
SELECT column1, column2, aggregate_function(column_name)
FROM table_name
GROUP BY column1, column2;

-- WHERE + GROUP BY + HAVING
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE row_level_condition
GROUP BY column_name
HAVING aggregate_level_condition
ORDER BY sort_column;

💡 Tip:

Columns in GROUP BY don't need to match the order of columns in SELECT, but must include all non-aggregated columns

HAVING can use aggregate function aliases (supported in MySQL and PostgreSQL), but standard SQL requires writing the full expression

NULL values are grouped together in GROUP BY

📌 Examples

Example: Employee Statistics by Department

SQL

SELECT d.department_name,
       COUNT(e.employee_id) AS headcount,
       AVG(e.salary) AS avg_salary,
       SUM(e.salary) AS total_salary
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_name
ORDER BY total_salary DESC;

▶ Try it Yourself

Output:

TEXT

department_name  headcount  avg_salary  total_salary
---------------  ---------  ----------  ------------
Technology       3          17666.67    53000.00
Finance          2          13500.00    27000.00
Marketing        2          11500.00    23000.00
Administration   1          9000.00     9000.00

💡 Interpretation: Using LEFT JOIN ensures that departments without employees are also displayed (with a headcount of 0). GROUP BY department_name groups by department name.

Example: WHERE + GROUP BY + HAVING Combined

SQL

-- Among employees hired in 2024, find departments with average salary > 12000
SELECT department_id,
       COUNT(*) AS headcount,
       AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2024-01-01'
GROUP BY department_id
HAVING AVG(salary) > 12000
ORDER BY avg_salary DESC;

▶ Try it Yourself

Output:

TEXT

department_id  headcount  avg_salary
-------------  ---------  ----------
1              2          18500.00
3              1          14000.00

💡 Execution process:

WHERE hire_date >= '2024-01-01' — First filter for employees hired in 2024

GROUP BY department_id — Group by department

HAVING AVG(salary) > 12000 — Keep only groups with average salary > 12000

ORDER BY avg_salary DESC — Sort by average salary in descending order

Example: Multi-Column Grouping + Complex Conditions

SQL

-- Employee statistics by department and status, only showing groups with 2+ members
SELECT d.department_name,
       e.status,
       COUNT(*) AS headcount,
       MIN(e.salary) AS lowest_salary,
       MAX(e.salary) AS highest_salary
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > 0
GROUP BY d.department_name, e.status
HAVING COUNT(*) >= 2
ORDER BY d.department_name, headcount DESC;

▶ Try it Yourself

Output:

TEXT

department_name  status  headcount  lowest_salary  highest_salary
---------------  ------  ---------  -------------  --------------
Finance          active  2          12000.00       14000.00
Marketing        active  2          10000.00       12000.00
Technology       active  3          15000.00       20000.00

💡 Interpretation: First use WHERE to exclude rows with zero salary, then group by department + status, HAVING filters out groups with fewer than 2 members, and finally sort.

🎬 Practice Scenarios

Scenario 1: Sales Report — Monthly Order Statistics

Generate a monthly sales report, showing only months with 3 or more orders.

SQL

SELECT 
    EXTRACT(YEAR FROM order_date) AS year,
    EXTRACT(MONTH FROM order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_amount,
    AVG(total_amount) AS avg_order_amount
FROM orders
WHERE status IN ('completed', 'shipped')
GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date)
HAVING COUNT(*) >= 3
ORDER BY year, month;

💡 Approach: WHERE filters for completed and shipped orders, group by year and month for statistics, HAVING filters out months with fewer than 3 orders.

Scenario 2: HR Analysis — Finding High-Paying Departments

Find departments with average salary higher than the company-wide average.

SQL

SELECT d.department_name,
       COUNT(e.employee_id) AS headcount,
       AVG(e.salary) AS dept_avg_salary
FROM departments d
JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name
HAVING AVG(e.salary) > (SELECT AVG(salary) FROM employees)
ORDER BY dept_avg_salary DESC;

💡 Approach: Subqueries can be used in HAVING. First calculate each department's average salary, then compare it with the company-wide average. This is a combined application of aggregate functions and subqueries.

❓ FAQ

Q: Must columns in SELECT appear in GROUP BY? A: Yes, all non-aggregated columns must appear in GROUP BY. This is a SQL standard requirement. For example, SELECT a, b, COUNT(*) FROM t GROUP BY a will cause an error because b is neither in GROUP BY nor wrapped in an aggregate function.

Q: Can HAVING use aliases from SELECT? A: MySQL and PostgreSQL support using aliases in HAVING, but SQL Server does not. For compatibility, it's recommended to write the full expression: HAVING AVG(salary) > 10000 instead of HAVING avg_salary > 10000.

Q: How are NULL values handled in GROUP BY? A: NULL values are grouped together. If department_id has NULL values, all rows with NULL will be placed in the same group. This is usually useful, but be careful to distinguish between "no department" and "department ID is 0."

Q: Can WHERE and HAVING be used together? A: Yes, and it's very common. WHERE filters rows before grouping, HAVING filters groups after grouping. Execution order: WHERE → GROUP BY → HAVING.

📖 Summary

GROUP BY splits data into groups and applies aggregate functions to each group
Non-aggregated columns in SELECT must appear in GROUP BY
Multi-column grouping: group by combinations of multiple column values
WHERE: filters rows before grouping, cannot use aggregate functions
HAVING: filters groups after grouping, can use aggregate functions
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
NULL values are grouped together

📝 Exercises

Exercise 1 (⭐): Query the headcount and average salary for each department, sorted by average salary from highest to lowest.

Exercise 2 (⭐⭐): Query the number of employees hired after 2024 in each department, showing only departments with 2 or more employees.

Exercise 3 (⭐⭐⭐): Query the highest-paid employee in each department. Display department name, employee name, and salary. Hint: this can be achieved using a subquery or the ROW_NUMBER() window function (covered later).

Next Lesson

👉 15-advanced-functions - Advanced Functions: Learn SQL advanced functions, master string processing, numeric calculations, date operations, and other practical skills!

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

About Us

Grouping Queries

Grouping Queries

🌍 Real-World Analogy

🎯 Core Concepts

GROUP BY Basic Syntax

Post-Group Aggregation

Multi-Column Grouping

GROUP BY + ORDER BY

WHERE vs HAVING — Filtering Rows vs Filtering Groups

Execution Order

📝 Basic Syntax

📌 Examples

Example: Employee Statistics by Department

Example: WHERE + GROUP BY + HAVING Combined

Example: Multi-Column Grouping + Complex Conditions

🎬 Practice Scenarios

Scenario 1: Sales Report — Monthly Order Statistics

Scenario 2: HR Analysis — Finding High-Paying Departments

❓ FAQ

📖 Summary

📝 Exercises

Next Lesson

🙏 帮我们做得更好