Aggregate Functions
Aggregate Functions
🌍 Real-World Analogy
Imagine you are a supermarket manager looking at a pile of sales receipts:
- COUNT = Count how many receipts there are today (counting)
- SUM = Add up the amounts from all receipts (summing)
- AVG = Calculate the average amount per receipt (averaging)
- MAX = Find the receipt with the highest amount (maximum)
- MIN = Find the receipt with the lowest amount (minimum)
Aggregate functions are the "calculators" in SQL, helping you quickly compute summary results from large amounts of data.
🎯 Core Concepts
COUNT — Counting
COUNT is used to count rows. There are two common forms:
-- COUNT(*): Count all rows (including rows with NULL)
SELECT COUNT(*) AS total_employees FROM employees;
-- COUNT(column_name): Count rows where the column is not NULL
SELECT COUNT(email) AS has_email FROM employees;
| Syntax | Description |
|---|---|
COUNT(*) |
Count all rows, regardless of NULL |
COUNT(column_name) |
Count only rows where the column is not NULL |
COUNT(DISTINCT column_name) |
Count distinct non-NULL values in the column |
SUM — Summing
Calculates the total of a numeric column, automatically ignoring NULL values.
SELECT SUM(salary) AS total_salary FROM employees;
AVG — Average
Calculates the average of a numeric column, automatically ignoring NULL values (NULLs are not included in the calculation).
SELECT AVG(salary) AS avg_salary FROM employees;
MAX / MIN — Maximum / Minimum
Finds the maximum or minimum value in a column, automatically ignoring NULL values.
SELECT MAX(salary) AS highest, MIN(salary) AS lowest FROM employees;
Aggregate Functions and NULL Behavior
This is the most common pitfall for beginners:
| Function | How it handles NULL |
|---|---|
COUNT(*) |
NULL is counted (counts rows) |
COUNT(column_name) |
Ignores NULL |
SUM(column_name) |
Ignores NULL |
AVG(column_name) |
Ignores NULL (denominator excludes NULL rows) |
MAX(column_name) |
Ignores NULL |
MIN(column_name) |
Ignores NULL |
-- Suppose the employees table has 10 rows, with 3 NULLs in the email column
SELECT COUNT(*) AS total, -- 10
COUNT(email) AS has_email -- 7
FROM employees;
AVG, the denominator is the number of non-NULL rows, not the total number of rows. If 2 out of 5 employees have NULL salaries, AVG(salary) calculates the average using only the 3 non-NULL salaries.
DISTINCT Deduplication in Aggregation
Using DISTINCT within aggregate functions deduplicates before aggregating:
-- How many distinct departments have employees?
SELECT COUNT(DISTINCT department_id) AS dept_count FROM employees;
-- Sum of all distinct salaries (deduplicated sum)
SELECT SUM(DISTINCT salary) AS unique_salary_sum FROM employees;
📝 Basic Syntax
-- Basic aggregate function syntax
SELECT aggregate_function(column_name) AS alias
FROM table_name
WHERE condition;
-- Multiple aggregate functions can be used simultaneously
SELECT COUNT(*) AS total_count,
SUM(column_name) AS total_sum,
AVG(column_name) AS average,
MAX(column_name) AS maximum,
MIN(column_name) AS minimum
FROM table_name;
-- DISTINCT deduplication in aggregation
SELECT COUNT(DISTINCT column_name) AS distinct_count
FROM table_name;
- Aggregate functions are typically used with
GROUP BY(covered in the next lesson), but can also be used alone (aggregating the entire table) - Aggregate functions cannot be used directly in
WHEREconditions — useHAVINGinstead (covered later) NULLis automatically skipped in aggregate operations — this is both a feature and a pitfall, so be mindful
📌 Examples
Example: Employee Basic Statistics
SELECT COUNT(*) AS total_employees,
COUNT(email) AS has_email,
SUM(salary) AS total_salary,
AVG(salary) AS avg_salary,
MAX(salary) AS highest_salary,
MIN(salary) AS lowest_salary
FROM employees;
Output:
total_employees has_email total_salary avg_salary highest_salary lowest_salary
--------------- --------- ----------- ---------- -------------- -------------
8 7 112000.00 14000.00 20000.00 9000.00
Example: Conditional Aggregation + DISTINCT
-- How many employees are in the Technology department? How many distinct salaries?
SELECT COUNT(*) AS tech_dept_count,
COUNT(DISTINCT salary) AS distinct_salary_count,
AVG(salary) AS tech_dept_avg_salary
FROM employees
WHERE department_id = 1;
Output:
tech_dept_count distinct_salary_count tech_dept_avg_salary
--------------- --------------------- --------------------
3 3 17666.67
WHERE first filters for Technology department employees, then the aggregation is performed on the filtered results.
Example: Multiple Aggregate Functions + Conditional Filtering
-- Order amount statistics: only completed orders
SELECT COUNT(*) AS order_count,
SUM(total_amount) AS total_amount,
AVG(total_amount) AS avg_order_amount,
MAX(total_amount) AS largest_order,
MIN(total_amount) AS smallest_order
FROM orders
WHERE status = 'completed';
Output:
order_count total_amount avg_order_amount largest_order smallest_order
----------- ----------- ---------------- ------------- --------------
5 42500.00 8500.00 15000.00 2800.00
WHERE first filters for completed orders, then performs aggregate calculations on those orders. Note that WHERE executes before aggregation.
🎬 Practice Scenarios
Scenario 1: HR Report — Department Salary Overview
HR needs a report to understand the company's overall salary situation and how many employees have contact information.
SELECT
COUNT(*) AS total_headcount,
COUNT(email) + COUNT(phone) AS total_contacts,
SUM(salary) AS annual_salary_expense,
AVG(salary) AS avg_monthly_salary,
MAX(salary) - MIN(salary) AS max_salary_gap
FROM employees
WHERE hire_date >= '2024-01-01';
WHERE to filter for employees hired after 2024, then use multiple aggregate functions to summarize data from different dimensions.
Scenario 2: Sales Analysis — Order Value Distribution
Analyze the value distribution of all shipped orders.
SELECT
COUNT(*) AS shipped_order_count,
COUNT(DISTINCT customer_id) AS unique_customer_count,
SUM(total_amount) AS total_sales,
AVG(total_amount) AS avg_order_value,
MAX(total_amount) AS largest_order,
MIN(total_amount) AS smallest_order
FROM orders
WHERE status = 'shipped';
COUNT(DISTINCT customer_id) helps us understand how many unique customers have placed orders, rather than just the order count.
❓ FAQ
Q: Is there a difference between COUNT(*) and COUNT(1)? A: In modern databases there is virtually no difference — the optimizer treats them as the same execution plan.
COUNT(*)is the more standard syntax and is recommended.
Q: How does AVG handle NULL? A:
AVGignores NULL values. If 2 out of 5 employees have NULL salaries,AVG(salary)calculates using only the 3 non-NULL salaries, not dividing by 5. To treat NULL as 0, useAVG(COALESCE(salary, 0)).
Q: Can aggregate functions be nested? A: Not directly. For example,
MAX(AVG(salary))will cause an error. To find "the department with the highest average salary," useGROUP BY+ORDER BY AVG(salary) DESC LIMIT 1, or use a subquery.
Q: What does SUM return for an empty table? A:
SUM,AVG,MAX, andMINreturnNULLfor an empty table or when all values are NULL, not 0.COUNTreturns 0. To return 0 instead, useCOALESCE(SUM(column_name), 0).
📖 Summary
| Function | Purpose | NULL Handling | Works with DISTINCT |
|---|---|---|---|
| COUNT(*) | Count all rows | Counted | N/A |
| COUNT(col) | Count non-NULL rows | Ignored | ✅ |
| SUM(col) | Sum | Ignored | ✅ |
| AVG(col) | Average | Ignored (excluded from denominator) | ✅ |
| MAX(col) | Maximum | Ignored | ✅ |
| MIN(col) | Minimum | Ignored | ✅ |
- Aggregate functions are SQL's "statistical tools" that compress multiple rows into a single summary value
- Aggregate functions automatically ignore NULL, except
COUNT(*) DISTINCTcan deduplicate before aggregation- Aggregate functions cannot be used in
WHERE— useHAVINGinstead (covered in the next lesson)
📝 Exercises
Exercise 1 (⭐): Query the employees table to find the total headcount, highest salary, lowest salary, and average salary.
Exercise 2 (⭐⭐): Query the orders table to find the number of completed orders, total amount, and average amount for 2025. Requirement: only count orders with status = 'completed'.
Exercise 3 (⭐⭐⭐): Query the employees table to count employees in each salary tier. Rules: salary >= 15000 is "High", >= 10000 is "Medium", and the rest are "Junior". Hint: use CASE WHEN with aggregate functions.
Next Lesson
👉 14-group-by - Grouping Queries: Learn GROUP BY grouping queries, master category-based statistics, HAVING group filtering, and more!



