Window Functions

Window Functions

🌍 Real-World Analogy

Imagine a class report card:

Window functions are like a "magnifying glass" — they don't reduce the number of rows, yet let you see aggregate statistics on every row.


🎯 Core Concepts

What Is a Window Function

Window functions perform aggregation or ranking calculations over a "window" of data without reducing the row count. Each row can see its own details as well as the summary information within its group.

The OVER Clause

OVER() is the core of window functions — it defines the scope of the "window":

SQL
function() OVER (
    [PARTITION BY column]    -- Which field to partition (group) by
    [ORDER BY column]        -- How to sort within the partition
    [ROWS/RANGE frame]       -- Optional: further restrict the row range
)

ROW_NUMBER / RANK / DENSE_RANK — Ranking Functions

SQL
-- ROW_NUMBER: strictly increasing sequence, no ties (1, 2, 3, 4)
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- RANK: allows ties, with gaps (1, 2, 2, 4)
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- DENSE_RANK: allows ties, no gaps (1, 2, 2, 3)
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)
Function Ties Allowed Gaps Example Result
ROW_NUMBER No 1, 2, 3, 4
RANK Yes Yes 1, 2, 2, 4
DENSE_RANK Yes No 1, 2, 2, 3

NTILE — Bucketing

Evenly divide data into N groups:

SQL
-- Divide employees into 4 tiers by salary (quartiles)
NTILE(4) OVER (ORDER BY salary DESC)

LAG / LEAD — Accessing Adjacent Rows

SQL
-- LAG: access the previous row
LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY hire_date)

-- LEAD: access the next row
LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY hire_date)
Function Direction Use Case
LAG(col, n) Look back n rows Calculate period-over-period change, compare with previous record
LEAD(col, n) Look forward n rows Predict the next record

SUM / AVG / COUNT OVER — Aggregate Window Functions

SQL
-- Each row shows the department's total salary
SUM(salary) OVER (PARTITION BY department_id)

-- Each row shows the department's average salary
AVG(salary) OVER (PARTITION BY department_id)

-- Each row shows the department's headcount
COUNT(*) OVER (PARTITION BY department_id)

PARTITION BY and ORDER BY in Window

SQL
-- PARTITION BY: define the window scope (similar to GROUP BY, but doesn't collapse rows)
-- ORDER BY: sort within the window, affects ranking and running calculations

-- Example: group by department, rank by salary descending within each department
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- Example: running sum (ordered by hire date, accumulating salary row by row)
SUM(salary) OVER (ORDER BY hire_date)

Window Functions vs GROUP BY

Feature GROUP BY Window Function
Row count Reduced (merged into one row) Unchanged (all rows preserved)
Detail data Lost Preserved
Use case Summary statistics Ranking, comparison, running calculations
View detail and summary simultaneously ❌ No ✅ Yes
💡 Rule of thumb: Use window functions when you need to "keep details after aggregation"; use GROUP BY when you only need summary results.


📝 Basic Syntax

SQL
-- Window function general syntax
function() OVER (
    [PARTITION BY col1, col2]
    [ORDER BY col3 [ASC|DESC]]
    [ROWS BETWEEN ... AND ...]
)

-- Ranking
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC)
RANK()       OVER (PARTITION BY department_id ORDER BY salary DESC)
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC)

-- Adjacent rows
LAG(column, offset, default)  OVER (ORDER BY column)
LEAD(column, offset, default) OVER (ORDER BY column)

-- Aggregation
SUM(column) OVER (PARTITION BY col)
AVG(column) OVER (PARTITION BY col)
COUNT(*)    OVER (PARTITION BY col)

-- Bucketing
NTILE(n) OVER (ORDER BY column)
💡 Tip:

  • PARTITION BY can be omitted, meaning the window spans the entire result set
  • ORDER BY is required for ranking functions, optional for aggregate functions
  • Window functions cannot be used in WHERE or HAVING; wrap them in a subquery or CTE

📌 Examples

Example: Employee Salary Ranking Within Department

SQL
SELECT 
    CONCAT(e.first_name, e.last_name) AS name,
    d.department_name AS department,
    e.salary,
    ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS row_num,
    RANK()       OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY e.department_id ORDER BY e.salary DESC) AS dense_rank
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, salary DESC;
▶ Try it Yourself

Result:

TEXT
name   | department | salary   | row_num | rank | dense_rank
-------+------------+----------+---------+------+-----------
Zhang Wei | Tech    | 12000.00 |       1 |    1 |         1
Li Na     | Tech    |  9500.00 |       2 |    2 |         2
Wang Qiang| Sales   |  8000.00 |       1 |    1 |         1
Zhao Min  | Sales   | 11000.00 |       2 |    2 |         2
Liu Yang  | Marketing| 7500.00 |       1 |    1 |         1
Chen Jing | NULL    |  9000.00 |       1 |    1 |         1

Explanation: The difference between the three ranking functions is most apparent when there are tied values. ROW_NUMBER assigns strict sequential numbers, RANK allows gaps after ties, and DENSE_RANK does not.

Example: Calculate Difference from Department Average Salary

SQL
SELECT 
    CONCAT(e.first_name, e.last_name) AS name,
    d.department_name AS department,
    e.salary,
    ROUND(AVG(e.salary) OVER (PARTITION BY e.department_id), 2) AS dept_avg,
    ROUND(e.salary - AVG(e.salary) OVER (PARTITION BY e.department_id), 2) AS diff_from_avg
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, salary DESC;
▶ Try it Yourself

Result:

TEXT
name   | department | salary   | dept_avg | diff_from_avg
-------+------------+----------+----------+--------------
Zhang Wei | Tech    | 12000.00 | 10750.00 |      1250.00
Li Na     | Tech    |  9500.00 | 10750.00 |     -1250.00
Zhao Min  | Sales   | 11000.00 |  9500.00 |      1500.00
Wang Qiang| Sales   |  8000.00 |  9500.00 |     -1500.00
Liu Yang  | Marketing|  7500.00|  7500.00 |         0.00
Chen Jing | NULL    |  9000.00 |  9000.00 |         0.00

Explanation: AVG() OVER() computes the department average on every row without collapsing rows, so each employee can directly see the gap between their salary and the average.

Example: LAG for Order Period-over-Period Growth

SQL
SELECT 
    o.order_id,
    o.order_date,
    o.total_amount,
    LAG(o.total_amount, 1) OVER (ORDER BY o.order_date) AS prev_order_amount,
    ROUND(
        (o.total_amount - LAG(o.total_amount, 1) OVER (ORDER BY o.order_date)) 
        / LAG(o.total_amount, 1) OVER (ORDER BY o.order_date) * 100, 
    2) AS growth_pct
FROM orders o
WHERE o.status != 'cancelled'
ORDER BY o.order_date;
▶ Try it Yourself

Result:

TEXT
order_id | order_date | total_amount | prev_order_amount | growth_pct
---------+------------+--------------+-------------------+-----------
    1001 | 2026-01-10 |      2500.00 |              NULL |       NULL
    1002 | 2026-02-15 |      1800.00 |           2500.00 |     -28.00
    1003 | 2026-02-20 |      3200.00 |           1800.00 |      77.78
    1004 | 2026-03-05 |       950.00 |           3200.00 |     -70.31
    1005 | 2026-03-10 |      4100.00 |            950.00 |     331.58
    ...

Explanation: LAG() accesses the value of the "previous row," making it ideal for period-over-period, year-over-year, and other trend analyses. The first row has no previous row, so it returns NULL.


🎬 Scenario Practice

Scenario 1: Finding the Top Salesperson in Each Department

Find the employee with the highest sales in each department.

SQL
WITH emp_sales AS (
    SELECT 
        CONCAT(e.first_name, e.last_name) AS name,
        d.department_name AS department,
        COALESCE(SUM(o.total_amount), 0) AS total_sales,
        RANK() OVER (PARTITION BY d.department_id ORDER BY COALESCE(SUM(o.total_amount), 0) DESC) AS dept_rank
    FROM employees e
    LEFT JOIN departments d ON e.department_id = d.department_id
    LEFT JOIN orders o ON e.employee_id = o.employee_id AND o.status != 'cancelled'
    GROUP BY e.employee_id, e.first_name, e.last_name, d.department_id, d.department_name
)
SELECT name, department, total_sales, dept_rank
FROM emp_sales
WHERE dept_rank = 1
ORDER BY total_sales DESC;

Key Point: First use a CTE + RANK() to rank, then use WHERE to filter for the top rank. Window functions cannot be used directly in WHERE; they need to be wrapped in a subquery or CTE.

Scenario 2: Tracking Employee Hire Order and Salary Changes

SQL
SELECT 
    CONCAT(e.first_name, e.last_name) AS name,
    d.department_name AS department,
    e.hire_date,
    e.salary,
    LAG(e.salary, 1) OVER (PARTITION BY e.department_id ORDER BY e.hire_date) AS prev_salary,
    ROUND(e.salary - LAG(e.salary, 1) OVER (PARTITION BY e.department_id ORDER BY e.hire_date), 2) AS salary_change,
    ROW_NUMBER() OVER (PARTITION BY e.department_id ORDER BY e.hire_date) AS hire_order
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
ORDER BY d.department_name, e.hire_date;

Key Point: LAG() + PARTITION BY can track sequential changes within a group, making it suitable for trend analysis.


❓ FAQ

Q: Can window functions be used in the WHERE clause? A: No. WHERE is executed before window functions, so the window function hasn't been calculated yet. You need to first compute the window function using a subquery or CTE, then filter with WHERE in the outer query.

Q: When should I use ROW_NUMBER vs RANK? A: Use ROW_NUMBER when you need "unique numbering, no ties" (e.g., pagination, deduplication); use RANK when you need "rank by value, allowing ties" (e.g., grade ranking, sales ranking).

Q: What's the difference between PARTITION BY and GROUP BY? A: GROUP BY merges multiple rows into one (reducing row count); PARTITION BY only defines the window scope without reducing rows (each row is preserved). Both can be used together: first aggregate with GROUP BY, then use window functions to append rankings.

Q: How is the performance of window functions? A: Window functions typically perform better than self-join subqueries. For large datasets, create indexes on columns used in PARTITION BY and ORDER BY.


📖 Summary

Function Use Case Example
ROW_NUMBER Unique sequence number Pagination, deduplication
RANK Ranking with ties (gaps) Grade ranking
DENSE_RANK Ranking with ties (no gaps) Tier classification
NTILE Even bucketing Percentile analysis
LAG Access previous row Period-over-period calculation
LEAD Access next row Trend prediction
SUM/AVG/COUNT OVER Running/group aggregation Comparative analysis

📝 Exercises

  1. Use ROW_NUMBER() to number employees by hire date and find the most recently hired employee.
  2. Use RANK() to rank employees by salary within each department and find the top two highest-paid employees in each department.
  3. Use LAG() to calculate the salary change for employees within each department, ordered by hire date.
  4. Use SUM() OVER() to calculate the running total of sales for each order (ordered by order date).

Next Lesson

👉 20-cte-temp-table - CTEs and Temporary Tables: Learn how to use WITH clauses (CTEs), including recursive CTEs, CTEs vs subqueries, and the creation and use cases of temporary tables.

Web-Tutorial.com

Web-Tutorial Tech Team

A team of developers maintaining programming tutorials. Each tutorial is written and reviewed by developers with expertise in that field. We work to keep our content accurate and reliable — if you spot an issue, please let us know.

100%

🙏 帮我们做得更好

我们是刚上线的编程教程站,几个人的小团队,精力有限。页面虽经检查,难免还有疏漏——链接失效、排版错乱、内容有误、语言生硬……

如果您发现了,麻烦告诉我们,我们会在收到反馈后第一时间进行修复,再次感谢您的光临 🙏