Aggregate Functions

Aggregate Functions

🌍 Real-World Analogy

Imagine you are a supermarket manager looking at a pile of sales receipts:

Aggregate functions are the "calculators" in SQL, helping you quickly compute summary results from large amounts of data.


🎯 Core Concepts

COUNT — Counting

COUNT is used to count rows. There are two common forms:

SQL
-- COUNT(*): Count all rows (including rows with NULL)
SELECT COUNT(*) AS total_employees FROM employees;

-- COUNT(column_name): Count rows where the column is not NULL
SELECT COUNT(email) AS has_email FROM employees;
Syntax Description
COUNT(*) Count all rows, regardless of NULL
COUNT(column_name) Count only rows where the column is not NULL
COUNT(DISTINCT column_name) Count distinct non-NULL values in the column

SUM — Summing

Calculates the total of a numeric column, automatically ignoring NULL values.

SQL
SELECT SUM(salary) AS total_salary FROM employees;

AVG — Average

Calculates the average of a numeric column, automatically ignoring NULL values (NULLs are not included in the calculation).

SQL
SELECT AVG(salary) AS avg_salary FROM employees;

MAX / MIN — Maximum / Minimum

Finds the maximum or minimum value in a column, automatically ignoring NULL values.

SQL
SELECT MAX(salary) AS highest, MIN(salary) AS lowest FROM employees;

Aggregate Functions and NULL Behavior

This is the most common pitfall for beginners:

Function How it handles NULL
COUNT(*) NULL is counted (counts rows)
COUNT(column_name) Ignores NULL
SUM(column_name) Ignores NULL
AVG(column_name) Ignores NULL (denominator excludes NULL rows)
MAX(column_name) Ignores NULL
MIN(column_name) Ignores NULL
SQL
-- Suppose the employees table has 10 rows, with 3 NULLs in the email column
SELECT COUNT(*) AS total,        -- 10
       COUNT(email) AS has_email -- 7
FROM employees;
⚠️ Note: When calculating AVG, the denominator is the number of non-NULL rows, not the total number of rows. If 2 out of 5 employees have NULL salaries, AVG(salary) calculates the average using only the 3 non-NULL salaries.

DISTINCT Deduplication in Aggregation

Using DISTINCT within aggregate functions deduplicates before aggregating:

SQL
-- How many distinct departments have employees?
SELECT COUNT(DISTINCT department_id) AS dept_count FROM employees;

-- Sum of all distinct salaries (deduplicated sum)
SELECT SUM(DISTINCT salary) AS unique_salary_sum FROM employees;

📝 Basic Syntax

SQL
-- Basic aggregate function syntax
SELECT aggregate_function(column_name) AS alias
FROM table_name
WHERE condition;

-- Multiple aggregate functions can be used simultaneously
SELECT COUNT(*) AS total_count,
       SUM(column_name) AS total_sum,
       AVG(column_name) AS average,
       MAX(column_name) AS maximum,
       MIN(column_name) AS minimum
FROM table_name;

-- DISTINCT deduplication in aggregation
SELECT COUNT(DISTINCT column_name) AS distinct_count
FROM table_name;
💡 Tip:

  • Aggregate functions are typically used with GROUP BY (covered in the next lesson), but can also be used alone (aggregating the entire table)
  • Aggregate functions cannot be used directly in WHERE conditions — use HAVING instead (covered later)
  • NULL is automatically skipped in aggregate operations — this is both a feature and a pitfall, so be mindful

📌 Examples

Example: Employee Basic Statistics

SQL
SELECT COUNT(*) AS total_employees,
       COUNT(email) AS has_email,
       SUM(salary) AS total_salary,
       AVG(salary) AS avg_salary,
       MAX(salary) AS highest_salary,
       MIN(salary) AS lowest_salary
FROM employees;
▶ Try it Yourself

Output:

TEXT
total_employees  has_email  total_salary  avg_salary  highest_salary  lowest_salary
---------------  ---------  -----------  ----------  --------------  -------------
8                7          112000.00    14000.00    20000.00        9000.00
💡 Interpretation: Out of 8 employees, 7 have email addresses (1 has a NULL email). The average salary is 14000.


Example: Conditional Aggregation + DISTINCT

SQL
-- How many employees are in the Technology department? How many distinct salaries?
SELECT COUNT(*) AS tech_dept_count,
       COUNT(DISTINCT salary) AS distinct_salary_count,
       AVG(salary) AS tech_dept_avg_salary
FROM employees
WHERE department_id = 1;
▶ Try it Yourself

Output:

TEXT
tech_dept_count  distinct_salary_count  tech_dept_avg_salary
---------------  ---------------------  --------------------
3                3                      17666.67
💡 Interpretation: The Technology department has 3 employees with 3 distinct salaries, and an average salary of 17666.67. WHERE first filters for Technology department employees, then the aggregation is performed on the filtered results.


Example: Multiple Aggregate Functions + Conditional Filtering

SQL
-- Order amount statistics: only completed orders
SELECT COUNT(*) AS order_count,
       SUM(total_amount) AS total_amount,
       AVG(total_amount) AS avg_order_amount,
       MAX(total_amount) AS largest_order,
       MIN(total_amount) AS smallest_order
FROM orders
WHERE status = 'completed';
▶ Try it Yourself

Output:

TEXT
order_count  total_amount  avg_order_amount  largest_order  smallest_order
-----------  -----------  ----------------  -------------  --------------
5            42500.00     8500.00           15000.00       2800.00
💡 Interpretation: WHERE first filters for completed orders, then performs aggregate calculations on those orders. Note that WHERE executes before aggregation.


🎬 Practice Scenarios

Scenario 1: HR Report — Department Salary Overview

HR needs a report to understand the company's overall salary situation and how many employees have contact information.

SQL
SELECT 
    COUNT(*) AS total_headcount,
    COUNT(email) + COUNT(phone) AS total_contacts,
    SUM(salary) AS annual_salary_expense,
    AVG(salary) AS avg_monthly_salary,
    MAX(salary) - MIN(salary) AS max_salary_gap
FROM employees
WHERE hire_date >= '2024-01-01';
💡 Approach: Use WHERE to filter for employees hired after 2024, then use multiple aggregate functions to summarize data from different dimensions.

Scenario 2: Sales Analysis — Order Value Distribution

Analyze the value distribution of all shipped orders.

SQL
SELECT 
    COUNT(*) AS shipped_order_count,
    COUNT(DISTINCT customer_id) AS unique_customer_count,
    SUM(total_amount) AS total_sales,
    AVG(total_amount) AS avg_order_value,
    MAX(total_amount) AS largest_order,
    MIN(total_amount) AS smallest_order
FROM orders
WHERE status = 'shipped';
💡 Approach: COUNT(DISTINCT customer_id) helps us understand how many unique customers have placed orders, rather than just the order count.


❓ FAQ

Q: Is there a difference between COUNT(*) and COUNT(1)? A: In modern databases there is virtually no difference — the optimizer treats them as the same execution plan. COUNT(*) is the more standard syntax and is recommended.

Q: How does AVG handle NULL? A: AVG ignores NULL values. If 2 out of 5 employees have NULL salaries, AVG(salary) calculates using only the 3 non-NULL salaries, not dividing by 5. To treat NULL as 0, use AVG(COALESCE(salary, 0)).

Q: Can aggregate functions be nested? A: Not directly. For example, MAX(AVG(salary)) will cause an error. To find "the department with the highest average salary," use GROUP BY + ORDER BY AVG(salary) DESC LIMIT 1, or use a subquery.

Q: What does SUM return for an empty table? A: SUM, AVG, MAX, and MIN return NULL for an empty table or when all values are NULL, not 0. COUNT returns 0. To return 0 instead, use COALESCE(SUM(column_name), 0).


📖 Summary

Function Purpose NULL Handling Works with DISTINCT
COUNT(*) Count all rows Counted N/A
COUNT(col) Count non-NULL rows Ignored
SUM(col) Sum Ignored
AVG(col) Average Ignored (excluded from denominator)
MAX(col) Maximum Ignored
MIN(col) Minimum Ignored

📝 Exercises

Exercise 1 (⭐): Query the employees table to find the total headcount, highest salary, lowest salary, and average salary.

Exercise 2 (⭐⭐): Query the orders table to find the number of completed orders, total amount, and average amount for 2025. Requirement: only count orders with status = 'completed'.

Exercise 3 (⭐⭐⭐): Query the employees table to count employees in each salary tier. Rules: salary >= 15000 is "High", >= 10000 is "Medium", and the rest are "Junior". Hint: use CASE WHEN with aggregate functions.


Next Lesson

👉 14-group-by - Grouping Queries: Learn GROUP BY grouping queries, master category-based statistics, HAVING group filtering, and more!

Web-Tutorial.com

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

100%

🙏 帮我们做得更好

我们是刚上线的编程教程站,几个人的小团队,精力有限。页面虽经检查,难免还有疏漏——链接失效、排版错乱、内容有误、语言生硬……

如果您发现了,麻烦告诉我们,我们会在收到反馈后第一时间进行修复,再次感谢您的光临 🙏