Window Functions

Q: Q: Can window functions be used in the WHERE clause?

A: No. WHERE is executed before window functions, so the window function hasn't been calculated yet. You need to first compute the window function using a subquery or CTE, then filter with WHERE in the outer query.

Q: Q: When should I use ROW_NUMBER vs RANK?

A: Use ROW_NUMBER when you need "unique numbering, no ties" (e.g., pagination, deduplication); use RANK when you need "rank by value, allowing ties" (e.g., grade ranking, sales ranking).

Q: Q: What's the difference between PARTITION BY and GROUP BY?

A: GROUP BY merges multiple rows into one (reducing row count); PARTITION BY only defines the window scope without reducing rows (each row is preserved). Both can be used together: first aggregate with GROUP BY, then use window functions to append rankings.

🌍 Real-World Analogy

Imagine a class report card:

GROUP BY — Group the entire class by gender, calculate the average score for boys and girls, resulting in only two rows
Window function — On each student's report card, additionally write "your rank among boys" and "how much higher than the boys' average," while retaining each person's information

Window functions are like a "magnifying glass" — they don't reduce the number of rows, yet let you see aggregate statistics on every row.

🎯 Core Concepts

What Is a Window Function

Window functions perform aggregation or ranking calculations over a "window" of data without reducing the row count. Each row can see its own details as well as the summary information within its group.

The OVER Clause

OVER() is the core of window functions — it defines the scope of the "window":

SQL

function() OVER (
    [PARTITION BY column]    -- Which field to partition (group) by
    [ORDER BY column]        -- How to sort within the partition
    [ROWS/RANGE frame]       -- Optional: further restrict the row range
)

ROW_NUMBER / RANK / DENSE_RANK — Ranking Functions

SQL

-- ROW_NUMBER: strictly increasing sequence, no ties (1, 2, 3, 4)
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- RANK: allows ties, with gaps (1, 2, 2, 4)
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- DENSE_RANK: allows ties, no gaps (1, 2, 2, 3)
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)

Function	Ties Allowed	Gaps	Example Result
ROW_NUMBER	No	—	1, 2, 3, 4
RANK	Yes	Yes	1, 2, 2, 4
DENSE_RANK	Yes	No	1, 2, 2, 3

NTILE — Bucketing

Evenly divide data into N groups:

SQL

-- Divide employees into 4 tiers by salary (quartiles)
NTILE(4) OVER (ORDER BY salary DESC)

LAG / LEAD — Accessing Adjacent Rows

SQL

-- LAG: access the previous row
LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY hire_date)

-- LEAD: access the next row
LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY hire_date)

Function	Direction	Use Case
LAG(col, n)	Look back n rows	Calculate period-over-period change, compare with previous record
LEAD(col, n)	Look forward n rows	Predict the next record

SUM / AVG / COUNT OVER — Aggregate Window Functions

SQL

-- Each row shows the department's total salary
SUM(salary) OVER (PARTITION BY department_id)

-- Each row shows the department's average salary
AVG(salary) OVER (PARTITION BY department_id)

-- Each row shows the department's headcount
COUNT(*) OVER (PARTITION BY department_id)

PARTITION BY and ORDER BY in Window

SQL

-- PARTITION BY: define the window scope (similar to GROUP BY, but doesn't collapse rows)
-- ORDER BY: sort within the window, affects ranking and running calculations

-- Example: group by department, rank by salary descending within each department
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- Example: running sum (ordered by hire date, accumulating salary row by row)
SUM(salary) OVER (ORDER BY hire_date)

Window Functions vs GROUP BY

Feature	GROUP BY	Window Function
Row count	Reduced (merged into one row)	Unchanged (all rows preserved)
Detail data	Lost	Preserved
Use case	Summary statistics	Ranking, comparison, running calculations
View detail and summary simultaneously	❌ No	✅ Yes

💡 Rule of thumb: Use window functions when you need to "keep details after aggregation"; use GROUP BY when you only need summary results.

📝 Basic Syntax

SQL

-- Window function general syntax
function() OVER (
    [PARTITION BY col1, col2]
    [ORDER BY col3 [ASC|DESC]]
    [ROWS BETWEEN ... AND ...]
)

-- Ranking
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC)
RANK()       OVER (PARTITION BY department_id ORDER BY salary DESC)
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- Adjacent rows
LAG(column, offset, default)  OVER (ORDER BY column)
LEAD(column, offset, default) OVER (ORDER BY column)

-- Aggregation
SUM(column) OVER (PARTITION BY col)
AVG(column) OVER (PARTITION BY col)
COUNT(*)    OVER (PARTITION BY col)

-- Bucketing
NTILE(n) OVER (ORDER BY column)

💡 Tip:

PARTITION BY can be omitted, meaning the window spans the entire result set

ORDER BY is required for ranking functions, optional for aggregate functions

Window functions cannot be used in WHERE or HAVING; wrap them in a subquery or CTE

📌 Examples

Example: Employee Salary Ranking Within Department

SQL

SELECT 
    CONCAT(e.first_name, e.last_name) AS name,
    d.department_name AS department,
    e.salary,
    ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS row_num,
    RANK()       OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS dense_rank
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, salary DESC;

▶ Try it Yourself

Result:

TEXT

name   | department | salary   | row_num | rank | dense_rank
-------+------------+----------+---------+------+-----------
Zhang Wei | Tech    | 12000.00 |       1 |    1 |         1
Li Na     | Tech    |  9500.00 |       2 |    2 |         2
Wang Qiang| Sales   |  8000.00 |       1 |    1 |         1
Zhao Min  | Sales   | 11000.00 |       2 |    2 |         2
Liu Yang  | Marketing| 7500.00 |       1 |    1 |         1
Chen Jing | NULL    |  9000.00 |       1 |    1 |         1

Explanation: The difference between the three ranking functions is most apparent when there are tied values. ROW_NUMBER assigns strict sequential numbers, RANK allows gaps after ties, and DENSE_RANK does not.

Example: Calculate Difference from Department Average Salary

SQL

SELECT 
    CONCAT(e.first_name, e.last_name) AS name,
    d.department_name AS department,
    e.salary,
    ROUND(AVG(e.salary) OVER (PARTITION BY e.department_id), 2) AS dept_avg,
    ROUND(e.salary - AVG(e.salary) OVER (PARTITION BY e.department_id), 2) AS diff_from_avg
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, salary DESC;

▶ Try it Yourself

Result:

TEXT

name   | department | salary   | dept_avg | diff_from_avg
-------+------------+----------+----------+--------------
Zhang Wei | Tech    | 12000.00 | 10750.00 |      1250.00
Li Na     | Tech    |  9500.00 | 10750.00 |     -1250.00
Zhao Min  | Sales   | 11000.00 |  9500.00 |      1500.00
Wang Qiang| Sales   |  8000.00 |  9500.00 |     -1500.00
Liu Yang  | Marketing|  7500.00|  7500.00 |         0.00
Chen Jing | NULL    |  9000.00 |  9000.00 |         0.00

Explanation: AVG() OVER() computes the department average on every row without collapsing rows, so each employee can directly see the gap between their salary and the average.

Example: LAG for Order Period-over-Period Growth

SQL

SELECT 
    o.order_id,
    o.order_date,
    o.total_amount,
    LAG(o.total_amount, 1) OVER (ORDER BY o.order_date) AS prev_order_amount,
    ROUND(
        (o.total_amount - LAG(o.total_amount, 1) OVER (ORDER BY o.order_date)) 
        / LAG(o.total_amount, 1) OVER (ORDER BY o.order_date) * 100, 
    2) AS growth_pct
FROM orders o
WHERE o.status != 'cancelled'
ORDER BY o.order_date;

▶ Try it Yourself

Result:

TEXT

order_id | order_date | total_amount | prev_order_amount | growth_pct
---------+------------+--------------+-------------------+-----------
    1001 | 2026-01-10 |      2500.00 |              NULL |       NULL
    1002 | 2026-02-15 |      1800.00 |           2500.00 |     -28.00
    1003 | 2026-02-20 |      3200.00 |           1800.00 |      77.78
    1004 | 2026-03-05 |       950.00 |           3200.00 |     -70.31
    1005 | 2026-03-10 |      4100.00 |            950.00 |     331.58
    ...

Explanation: LAG() accesses the value of the "previous row," making it ideal for period-over-period, year-over-year, and other trend analyses. The first row has no previous row, so it returns NULL.

🎬 Scenario Practice

Scenario 1: Finding the Top Salesperson in Each Department

Find the employee with the highest sales in each department.

SQL

WITH emp_sales AS (
    SELECT 
        CONCAT(e.first_name, e.last_name) AS name,
        d.department_name AS department,
        COALESCE(SUM(o.total_amount), 0) AS total_sales,
        RANK() OVER (PARTITION BY d.department_id ORDER BY COALESCE(SUM(o.total_amount), 0) DESC) AS dept_rank
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    LEFT JOIN orders o ON e.employee_id = o.employee_id AND o.status != 'cancelled'
    GROUP BY e.employee_id, e.first_name, e.last_name, d.department_id, d.department_name
)
SELECT name, department, total_sales, dept_rank
FROM emp_sales
WHERE dept_rank = 1
ORDER BY total_sales DESC;

Key Point: First use a CTE + RANK() to rank, then use WHERE to filter for the top rank. Window functions cannot be used directly in WHERE; they need to be wrapped in a subquery or CTE.

Scenario 2: Tracking Employee Hire Order and Salary Changes

SQL

SELECT 
    CONCAT(e.first_name, e.last_name) AS name,
    d.department_name AS department,
    e.hire_date,
    e.salary,
    LAG(e.salary, 1) OVER (PARTITION BY e.department_id ORDER BY e.hire_date) AS prev_salary,
    ROUND(e.salary - LAG(e.salary, 1) OVER (PARTITION BY e.department_id ORDER BY e.hire_date), 2) AS salary_change,
    ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.hire_date) AS hire_order
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, e.hire_date;

Key Point: LAG() + PARTITION BY can track sequential changes within a group, making it suitable for trend analysis.

❓ FAQ

Q: Can window functions be used in the WHERE clause? A: No. WHERE is executed before window functions, so the window function hasn't been calculated yet. You need to first compute the window function using a subquery or CTE, then filter with WHERE in the outer query.

Q: When should I use ROW_NUMBER vs RANK? A: Use ROW_NUMBER when you need "unique numbering, no ties" (e.g., pagination, deduplication); use RANK when you need "rank by value, allowing ties" (e.g., grade ranking, sales ranking).

Q: What's the difference between PARTITION BY and GROUP BY? A: GROUP BY merges multiple rows into one (reducing row count); PARTITION BY only defines the window scope without reducing rows (each row is preserved). Both can be used together: first aggregate with GROUP BY, then use window functions to append rankings.

Q: How is the performance of window functions? A: Window functions typically perform better than self-join subqueries. For large datasets, create indexes on columns used in PARTITION BY and ORDER BY.

📖 Summary

Function	Use Case	Example
ROW_NUMBER	Unique sequence number	Pagination, deduplication
RANK	Ranking with ties (gaps)	Grade ranking
DENSE_RANK	Ranking with ties (no gaps)	Tier classification
NTILE	Even bucketing	Percentile analysis
LAG	Access previous row	Period-over-period calculation
LEAD	Access next row	Trend prediction
SUM/AVG/COUNT OVER	Running/group aggregation	Comparative analysis

OVER() defines the window scope, PARTITION BY partitions, ORDER BY sorts
Window functions don't reduce row count — they append aggregate info while preserving details
Window functions cannot be placed in WHERE; wrap them in a subquery or CTE

📝 Exercises

Use ROW_NUMBER() to number employees by hire date and find the most recently hired employee.
Use RANK() to rank employees by salary within each department and find the top two highest-paid employees in each department.
Use LAG() to calculate the salary change for employees within each department, ordered by hire date.
Use SUM() OVER() to calculate the running total of sales for each order (ordered by order date).

Next Lesson

👉 20-cte-temp-table - CTEs and Temporary Tables: Learn how to use WITH clauses (CTEs), including recursive CTEs, CTEs vs subqueries, and the creation and use cases of temporary tables.

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

About Us

Window Functions

Window Functions

🌍 Real-World Analogy

🎯 Core Concepts

What Is a Window Function

The OVER Clause

ROW_NUMBER / RANK / DENSE_RANK — Ranking Functions

NTILE — Bucketing

LAG / LEAD — Accessing Adjacent Rows

SUM / AVG / COUNT OVER — Aggregate Window Functions

PARTITION BY and ORDER BY in Window

Window Functions vs GROUP BY

📝 Basic Syntax

📌 Examples

Example: Employee Salary Ranking Within Department

Example: Calculate Difference from Department Average Salary

Example: LAG for Order Period-over-Period Growth

🎬 Scenario Practice

Scenario 1: Finding the Top Salesperson in Each Department

Scenario 2: Tracking Employee Hire Order and Salary Changes

❓ FAQ

📖 Summary

📝 Exercises

Next Lesson

🙏 帮我们做得更好