Window Functions
Window Functions
🌍 Real-World Analogy
Imagine a class report card:
- GROUP BY — Group the entire class by gender, calculate the average score for boys and girls, resulting in only two rows
- Window function — On each student's report card, additionally write "your rank among boys" and "how much higher than the boys' average," while retaining each person's information
Window functions are like a "magnifying glass" — they don't reduce the number of rows, yet let you see aggregate statistics on every row.
🎯 Core Concepts
What Is a Window Function
Window functions perform aggregation or ranking calculations over a "window" of data without reducing the row count. Each row can see its own details as well as the summary information within its group.
The OVER Clause
OVER() is the core of window functions — it defines the scope of the "window":
function() OVER (
[PARTITION BY column] -- Which field to partition (group) by
[ORDER BY column] -- How to sort within the partition
[ROWS/RANGE frame] -- Optional: further restrict the row range
)
ROW_NUMBER / RANK / DENSE_RANK — Ranking Functions
-- ROW_NUMBER: strictly increasing sequence, no ties (1, 2, 3, 4)
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC)
-- RANK: allows ties, with gaps (1, 2, 2, 4)
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
-- DENSE_RANK: allows ties, no gaps (1, 2, 2, 3)
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
| Function | Ties Allowed | Gaps | Example Result |
|---|---|---|---|
| ROW_NUMBER | No | — | 1, 2, 3, 4 |
| RANK | Yes | Yes | 1, 2, 2, 4 |
| DENSE_RANK | Yes | No | 1, 2, 2, 3 |
NTILE — Bucketing
Evenly divide data into N groups:
-- Divide employees into 4 tiers by salary (quartiles)
NTILE(4) OVER (ORDER BY salary DESC)
LAG / LEAD — Accessing Adjacent Rows
-- LAG: access the previous row
LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY hire_date)
-- LEAD: access the next row
LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY hire_date)
| Function | Direction | Use Case |
|---|---|---|
| LAG(col, n) | Look back n rows | Calculate period-over-period change, compare with previous record |
| LEAD(col, n) | Look forward n rows | Predict the next record |
SUM / AVG / COUNT OVER — Aggregate Window Functions
-- Each row shows the department's total salary
SUM(salary) OVER (PARTITION BY department_id)
-- Each row shows the department's average salary
AVG(salary) OVER (PARTITION BY department_id)
-- Each row shows the department's headcount
COUNT(*) OVER (PARTITION BY department_id)
PARTITION BY and ORDER BY in Window
-- PARTITION BY: define the window scope (similar to GROUP BY, but doesn't collapse rows)
-- ORDER BY: sort within the window, affects ranking and running calculations
-- Example: group by department, rank by salary descending within each department
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
-- Example: running sum (ordered by hire date, accumulating salary row by row)
SUM(salary) OVER (ORDER BY hire_date)
Window Functions vs GROUP BY
| Feature | GROUP BY | Window Function |
|---|---|---|
| Row count | Reduced (merged into one row) | Unchanged (all rows preserved) |
| Detail data | Lost | Preserved |
| Use case | Summary statistics | Ranking, comparison, running calculations |
| View detail and summary simultaneously | ❌ No | ✅ Yes |
GROUP BY when you only need summary results.
📝 Basic Syntax
-- Window function general syntax
function() OVER (
[PARTITION BY col1, col2]
[ORDER BY col3 [ASC|DESC]]
[ROWS BETWEEN ... AND ...]
)
-- Ranking
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC)
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
-- Adjacent rows
LAG(column, offset, default) OVER (ORDER BY column)
LEAD(column, offset, default) OVER (ORDER BY column)
-- Aggregation
SUM(column) OVER (PARTITION BY col)
AVG(column) OVER (PARTITION BY col)
COUNT(*) OVER (PARTITION BY col)
-- Bucketing
NTILE(n) OVER (ORDER BY column)
PARTITION BYcan be omitted, meaning the window spans the entire result setORDER BYis required for ranking functions, optional for aggregate functions- Window functions cannot be used in
WHEREorHAVING; wrap them in a subquery or CTE
📌 Examples
Example: Employee Salary Ranking Within Department
SELECT
CONCAT(e.first_name, e.last_name) AS name,
d.department_name AS department,
e.salary,
ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS row_num,
RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS dense_rank
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, salary DESC;
Result:
name | department | salary | row_num | rank | dense_rank
-------+------------+----------+---------+------+-----------
Zhang Wei | Tech | 12000.00 | 1 | 1 | 1
Li Na | Tech | 9500.00 | 2 | 2 | 2
Wang Qiang| Sales | 8000.00 | 1 | 1 | 1
Zhao Min | Sales | 11000.00 | 2 | 2 | 2
Liu Yang | Marketing| 7500.00 | 1 | 1 | 1
Chen Jing | NULL | 9000.00 | 1 | 1 | 1
Explanation: The difference between the three ranking functions is most apparent when there are tied values. ROW_NUMBER assigns strict sequential numbers, RANK allows gaps after ties, and DENSE_RANK does not.
Example: Calculate Difference from Department Average Salary
SELECT
CONCAT(e.first_name, e.last_name) AS name,
d.department_name AS department,
e.salary,
ROUND(AVG(e.salary) OVER (PARTITION BY e.department_id), 2) AS dept_avg,
ROUND(e.salary - AVG(e.salary) OVER (PARTITION BY e.department_id), 2) AS diff_from_avg
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, salary DESC;
Result:
name | department | salary | dept_avg | diff_from_avg
-------+------------+----------+----------+--------------
Zhang Wei | Tech | 12000.00 | 10750.00 | 1250.00
Li Na | Tech | 9500.00 | 10750.00 | -1250.00
Zhao Min | Sales | 11000.00 | 9500.00 | 1500.00
Wang Qiang| Sales | 8000.00 | 9500.00 | -1500.00
Liu Yang | Marketing| 7500.00| 7500.00 | 0.00
Chen Jing | NULL | 9000.00 | 9000.00 | 0.00
Explanation: AVG() OVER() computes the department average on every row without collapsing rows, so each employee can directly see the gap between their salary and the average.
Example: LAG for Order Period-over-Period Growth
SELECT
o.order_id,
o.order_date,
o.total_amount,
LAG(o.total_amount, 1) OVER (ORDER BY o.order_date) AS prev_order_amount,
ROUND(
(o.total_amount - LAG(o.total_amount, 1) OVER (ORDER BY o.order_date))
/ LAG(o.total_amount, 1) OVER (ORDER BY o.order_date) * 100,
2) AS growth_pct
FROM orders o
WHERE o.status != 'cancelled'
ORDER BY o.order_date;
Result:
order_id | order_date | total_amount | prev_order_amount | growth_pct
---------+------------+--------------+-------------------+-----------
1001 | 2026-01-10 | 2500.00 | NULL | NULL
1002 | 2026-02-15 | 1800.00 | 2500.00 | -28.00
1003 | 2026-02-20 | 3200.00 | 1800.00 | 77.78
1004 | 2026-03-05 | 950.00 | 3200.00 | -70.31
1005 | 2026-03-10 | 4100.00 | 950.00 | 331.58
...
Explanation: LAG() accesses the value of the "previous row," making it ideal for period-over-period, year-over-year, and other trend analyses. The first row has no previous row, so it returns NULL.
🎬 Scenario Practice
Scenario 1: Finding the Top Salesperson in Each Department
Find the employee with the highest sales in each department.
WITH emp_sales AS (
SELECT
CONCAT(e.first_name, e.last_name) AS name,
d.department_name AS department,
COALESCE(SUM(o.total_amount), 0) AS total_sales,
RANK() OVER (PARTITION BY d.department_id ORDER BY COALESCE(SUM(o.total_amount), 0) DESC) AS dept_rank
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
LEFT JOIN orders o ON e.employee_id = o.employee_id AND o.status != 'cancelled'
GROUP BY e.employee_id, e.first_name, e.last_name, d.department_id, d.department_name
)
SELECT name, department, total_sales, dept_rank
FROM emp_sales
WHERE dept_rank = 1
ORDER BY total_sales DESC;
Key Point: First use a CTE + RANK() to rank, then use WHERE to filter for the top rank. Window functions cannot be used directly in WHERE; they need to be wrapped in a subquery or CTE.
Scenario 2: Tracking Employee Hire Order and Salary Changes
SELECT
CONCAT(e.first_name, e.last_name) AS name,
d.department_name AS department,
e.hire_date,
e.salary,
LAG(e.salary, 1) OVER (PARTITION BY e.department_id ORDER BY e.hire_date) AS prev_salary,
ROUND(e.salary - LAG(e.salary, 1) OVER (PARTITION BY e.department_id ORDER BY e.hire_date), 2) AS salary_change,
ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.hire_date) AS hire_order
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, e.hire_date;
Key Point: LAG() + PARTITION BY can track sequential changes within a group, making it suitable for trend analysis.
❓ FAQ
Q: Can window functions be used in the WHERE clause? A: No.
WHEREis executed before window functions, so the window function hasn't been calculated yet. You need to first compute the window function using a subquery or CTE, then filter withWHEREin the outer query.
Q: When should I use ROW_NUMBER vs RANK? A: Use
ROW_NUMBERwhen you need "unique numbering, no ties" (e.g., pagination, deduplication); useRANKwhen you need "rank by value, allowing ties" (e.g., grade ranking, sales ranking).
Q: What's the difference between PARTITION BY and GROUP BY? A:
GROUP BYmerges multiple rows into one (reducing row count);PARTITION BYonly defines the window scope without reducing rows (each row is preserved). Both can be used together: first aggregate withGROUP BY, then use window functions to append rankings.
Q: How is the performance of window functions? A: Window functions typically perform better than self-join subqueries. For large datasets, create indexes on columns used in
PARTITION BYandORDER BY.
📖 Summary
| Function | Use Case | Example |
|---|---|---|
| ROW_NUMBER | Unique sequence number | Pagination, deduplication |
| RANK | Ranking with ties (gaps) | Grade ranking |
| DENSE_RANK | Ranking with ties (no gaps) | Tier classification |
| NTILE | Even bucketing | Percentile analysis |
| LAG | Access previous row | Period-over-period calculation |
| LEAD | Access next row | Trend prediction |
| SUM/AVG/COUNT OVER | Running/group aggregation | Comparative analysis |
OVER()defines the window scope,PARTITION BYpartitions,ORDER BYsorts- Window functions don't reduce row count — they append aggregate info while preserving details
- Window functions cannot be placed in
WHERE; wrap them in a subquery or CTE
📝 Exercises
- Use
ROW_NUMBER()to number employees by hire date and find the most recently hired employee. - Use
RANK()to rank employees by salary within each department and find the top two highest-paid employees in each department. - Use
LAG()to calculate the salary change for employees within each department, ordered by hire date. - Use
SUM() OVER()to calculate the running total of sales for each order (ordered by order date).
Next Lesson
👉 20-cte-temp-table - CTEs and Temporary Tables: Learn how to use WITH clauses (CTEs), including recursive CTEs, CTEs vs subqueries, and the creation and use cases of temporary tables.



