Window functions compute a value for each row from a set of related rows, without collapsing the result set. The OVER clause defines that window. Available since MySQL 8.0. They enable row-by-row calculations while keeping the original row count.
-- Basic syntax
SELECT column, SUM(value) OVER (PARTITION BY category) as category_total
FROM products;
-- OVER clause components
SELECT id, name, salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) as running_total
FROM employees;
-- Without PARTITION BY (applies to all rows)
SELECT id, salary,
AVG(salary) OVER () as company_avg
FROM employees;
-- Window function types (select-list expressions)
ROW_NUMBER() OVER (ORDER BY salary) as row_num
RANK() OVER (ORDER BY salary) as salary_rank -- RANK is a reserved word in MySQL 8.0; an unquoted alias "rank" is a syntax error
DENSE_RANK() OVER (ORDER BY salary) as salary_dense_rank
LAG(salary) OVER (ORDER BY hire_date) as prev_salary
LEAD(salary) OVER (ORDER BY hire_date) as next_salary
Why it matters: Advanced analytics without losing detail.
Real applications: Running totals, rankings, comparisons with previous/next rows.
Common mistakes: Confusing window functions with GROUP BY (which collapses rows); running these queries on MySQL versions before 8.0, which do not support window functions.
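The contrast with GROUP BY is easiest to see side by side. The following is a minimal sketch, assuming a hypothetical `products_demo` table; the table name, columns, and rows are invented for illustration and are not part of the original examples.

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE products_demo (
    id INT PRIMARY KEY,
    category VARCHAR(20),
    price INT
);
INSERT INTO products_demo (id, category, price) VALUES
    (1, 'books', 10),
    (2, 'books', 30),
    (3, 'toys', 5);

-- GROUP BY collapses rows: one output row per category (2 rows)
SELECT category, SUM(price) AS category_total
FROM products_demo
GROUP BY category;

-- Window function keeps all 3 rows and repeats the category total on each
SELECT id, category, price,
       SUM(price) OVER (PARTITION BY category) AS category_total
FROM products_demo;
-- ids 1 and 2 both show category_total = 40; id 3 shows 5
```

The second query is the one to reach for when the detail rows and the aggregate are needed together.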
ROW_NUMBER assigns unique sequential numbers. RANK assigns rank with gaps after ties. DENSE_RANK assigns rank without gaps. Used for leaderboards and competitions.
-- Example data
salary | employee
3000 | Alice
3000 | Bob
2500 | Charlie
2000 | Dave
-- ROW_NUMBER (unique sequential)
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
-- Result: 1, 2, 3, 4
-- RANK (gaps for ties)
RANK() OVER (ORDER BY salary DESC) as salary_rank
-- Result: 1, 1, 3, 4
-- DENSE_RANK (no gaps for ties)
DENSE_RANK() OVER (ORDER BY salary DESC) as salary_dense_rank
-- Result: 1, 1, 2, 3
-- Practical: Top salary earner(s) per department
-- Window functions are not allowed in WHERE, so rank in a subquery and filter outside
SELECT * FROM (
SELECT id, name, salary, department,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees
) AS ranked WHERE dept_rank = 1;
Why it matters: The three functions differ in how they handle ties, gaps, and uniqueness.
Real applications: Leaderboards, competitions, performance rankings.
Common mistakes: Mixing up the three functions, not using PARTITION BY correctly.
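To compare the three functions on the example data in one query, a sketch like the following may help. The `salaries_demo` table name is hypothetical. The `employee` tie-breaker in ROW_NUMBER is an added assumption so that the numbering of tied rows is deterministic.

```sql
-- Hypothetical table holding the example data shown above
CREATE TABLE salaries_demo (
    employee VARCHAR(20) PRIMARY KEY,
    salary INT
);
INSERT INTO salaries_demo (employee, salary) VALUES
    ('Alice', 3000),
    ('Bob', 3000),
    ('Charlie', 2500),
    ('Dave', 2000);

SELECT employee, salary,
       -- Tie-breaker on employee makes the numbering of tied rows stable
       ROW_NUMBER() OVER (ORDER BY salary DESC, employee) AS row_num,
       RANK()       OVER (ORDER BY salary DESC) AS salary_rank,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank
FROM salaries_demo
ORDER BY salary DESC, employee;
-- employee | salary | row_num | salary_rank | salary_dense_rank
-- Alice    | 3000   | 1       | 1           | 1
-- Bob      | 3000   | 2       | 1           | 1
-- Charlie  | 2500   | 3       | 3           | 2
-- Dave     | 2000   | 4       | 4           | 3
```

Without a tie-breaker, ROW_NUMBER still returns 1 to 4, but which of Alice and Bob gets 1 is not guaranteed.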
LAG accesses previous row value. LEAD accesses next row value. Essential for row-to-row comparisons and changes.
-- LAG: Previous row value
SELECT id, hire_date, salary,
LAG(salary) OVER (ORDER BY hire_date) as prev_salary,
salary - LAG(salary) OVER (ORDER BY hire_date) as salary_change
FROM employees;
-- LEAD: Next row value
SELECT id, order_date, amount,
LEAD(amount) OVER (ORDER BY order_date) as next_amount,
LEAD(order_date) OVER (ORDER BY order_date) as next_date
FROM orders;
-- With default value
SELECT id, salary,
LAG(salary, 1, 0) OVER (ORDER BY hire_date) as prev_salary
FROM employees; -- Default to 0 if no previous row
-- Day-over-day sales comparison
SELECT DATE(order_date) as date, SUM(amount) as daily_sales,
LAG(SUM(amount)) OVER (ORDER BY DATE(order_date)) as prev_day_sales,
SUM(amount) - LAG(SUM(amount)) OVER (ORDER BY DATE(order_date)) as sales_change -- CHANGE is a reserved word in MySQL, so it cannot be an unquoted alias
FROM orders GROUP BY DATE(order_date);
Why it matters: Compare values across consecutive rows.
Real applications: Change detection, trend analysis, day-over-day metrics.
Common mistakes: Not ordering correctly, misunderstanding offset parameter.
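The offset parameter is the second argument of LAG and LEAD (default 1), and the third argument replaces NULL when no such row exists. A minimal sketch, assuming a hypothetical `orders_demo` table with invented rows, with PARTITION BY added so each customer is compared only with its own earlier orders:

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE orders_demo (
    id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    amount INT
);
INSERT INTO orders_demo (id, customer_id, order_date, amount) VALUES
    (1, 1, '2024-01-01', 100),
    (2, 1, '2024-01-02', 150),
    (3, 1, '2024-01-03', 120),
    (4, 2, '2024-01-01', 80);

SELECT id, customer_id, order_date, amount,
       -- Offset 1 (the default): the previous order of the same customer
       LAG(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount,
       -- Offset 2 with default 0: two orders back, or 0 if there is none
       LAG(amount, 2, 0) OVER (PARTITION BY customer_id ORDER BY order_date) AS two_back
FROM orders_demo
ORDER BY customer_id, order_date;
-- prev_amount by id: NULL, 100, 150, NULL
-- two_back by id:    0, 0, 100, 0
```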
FIRST_VALUE returns first value in window. LAST_VALUE returns last value. NTH_VALUE returns nth value. Used for accessing specific positions in windows.
-- FIRST_VALUE: First value in window
SELECT id, salary, hire_date,
FIRST_VALUE(salary) OVER (ORDER BY hire_date) as first_salary
FROM employees;
-- LAST_VALUE: Last in window (requires frame specification)
SELECT id, salary,
LAST_VALUE(salary) OVER (ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM employees;
-- NTH_VALUE: Nth value in window
SELECT id, salary,
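NTH_VALUE(salary, 2) OVER (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as second_highest -- full frame; with the default frame the first row returns NULL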
FROM employees;
-- Practical: Compare to highest and lowest in department
SELECT id, name, salary, department,
FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) as highest_in_dept,
LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lowest_in_dept
FROM employees;
Why it matters: Access boundary values in windows.
Real applications: Benchmarking, comparisons within groups.
Common mistakes: Not specifying frame for LAST_VALUE, wrong frame boundaries.
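The LAST_VALUE pitfall can be reproduced with a few rows. The sketch below assumes a hypothetical `emp_demo` table with invented, distinct salaries and puts the default frame next to an explicit full-partition frame.

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE emp_demo (
    id INT PRIMARY KEY,
    salary INT
);
INSERT INTO emp_demo (id, salary) VALUES
    (1, 2000),
    (2, 2500),
    (3, 3000);

SELECT id, salary,
       -- Default frame ends at the current row, so "last" is the current row
       LAST_VALUE(salary) OVER (ORDER BY salary) AS last_default,
       -- Explicit frame covers the whole partition
       LAST_VALUE(salary) OVER (
           ORDER BY salary
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_full
FROM emp_demo
ORDER BY salary;
-- last_default: 2000, 2500, 3000
-- last_full:    3000, 3000, 3000
```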
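PARTITION BY divides data into independent windows. ORDER BY orders rows within each window. Both are optional: ranking and offset functions (ROW_NUMBER, RANK, LAG, LEAD) need ORDER BY to give meaningful results, and omitting PARTITION BY treats all rows as one partition.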
-- PARTITION BY: Separate windows by category
SELECT id, name, salary, department,
AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;
-- ORDER BY: Order rows within window
SELECT id, amount, order_date,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;
-- Both: Partition and order within partition
SELECT id, salary, department, hire_date,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
-- Multiple partition columns
SELECT id, sales, region, product,
SUM(sales) OVER (PARTITION BY region, product ORDER BY `date`) as product_region_sales -- running total per region and product
FROM sales;
-- NULL handling in PARTITION BY
-- NULLs are grouped together
SELECT category, COUNT(*) OVER (PARTITION BY category)
FROM products; -- Rows with a NULL category form a single partition
Why it matters: Control scope and order of window calculations.
Real applications: Department-level analytics, running calculations.
Common mistakes: Missing ORDER BY when needed, wrong partition granularity.
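Whether ORDER BY is present changes what an aggregate window returns: without it the aggregate covers the whole partition, with it the aggregate becomes a running value. A minimal sketch, assuming a hypothetical `region_sales_demo` table with invented rows (including NULL regions):

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE region_sales_demo (
    id INT PRIMARY KEY,
    region VARCHAR(10),
    amount INT
);
INSERT INTO region_sales_demo (id, region, amount) VALUES
    (1, 'east', 10),
    (2, 'east', 20),
    (3, 'west', 5),
    (4, NULL, 7),
    (5, NULL, 3);

SELECT id, region, amount,
       -- No ORDER BY: total over the whole partition
       SUM(amount) OVER (PARTITION BY region) AS region_total,
       -- With ORDER BY: running total within the partition
       SUM(amount) OVER (PARTITION BY region ORDER BY id) AS region_running
FROM region_sales_demo
ORDER BY id;
-- region_total by id:   30, 30, 5, 10, 10
-- region_running by id: 10, 30, 5, 7, 10
```

The two NULL-region rows share one partition, which is why both show a region_total of 10.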
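Frame specifications define which rows are included in a calculation. Options: ROWS or RANGE units, with UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, and UNBOUNDED FOLLOWING as boundaries. With ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; without ORDER BY, it is the whole partition.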
-- Default frame (UNBOUNDED PRECEDING to CURRENT ROW)
SELECT amount,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders; -- Running sum
-- ROWS: Physical row positions
SELECT amount,
AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) as moving_avg_5
FROM orders;
-- RANGE: Value-based ranges
SELECT amount,
SUM(amount) OVER (ORDER BY price RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING)
FROM products;
-- Frame types
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- All rows from start to current
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- All rows in window
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING -- Current ± 2 rows
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Default when ORDER BY is present (includes peers of the current row)
-- Practical: Moving average
SELECT DATE(order_date) as date, SUM(amount) as daily_sales,
AVG(SUM(amount)) OVER (ORDER BY DATE(order_date) ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7day
FROM orders GROUP BY DATE(order_date);
Why it matters: Control which rows are included in calculations.
Real applications: Moving averages, running totals, windowed aggregates.
Common mistakes: Wrong frame specification affects results, RANGE vs ROWS confusion.
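The ROWS vs RANGE confusion usually shows up when the ORDER BY column has ties. The sketch below assumes a hypothetical `payments_demo` table with two invented payments on the same date. The ROWS window adds `id` as a tie-breaker (an assumption made here so its output is deterministic).

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE payments_demo (
    id INT PRIMARY KEY,
    pay_date DATE,
    amount INT
);
INSERT INTO payments_demo (id, pay_date, amount) VALUES
    (1, '2024-01-01', 10),
    (2, '2024-01-02', 20),
    (3, '2024-01-02', 30),
    (4, '2024-01-03', 40);

SELECT id, pay_date, amount,
       -- ROWS counts physical rows: each row adds only itself
       SUM(amount) OVER (
           ORDER BY pay_date, id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS sum_rows,
       -- RANGE works on values: rows with the same pay_date are peers
       -- and are all included at once
       SUM(amount) OVER (
           ORDER BY pay_date
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS sum_range
FROM payments_demo
ORDER BY pay_date, id;
-- sum_rows:  10, 30, 60, 100
-- sum_range: 10, 60, 60, 100
```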
Best practices: Window functions require MySQL 8.0+. Consider indexes on PARTITION BY/ORDER BY columns and check the effect with EXPLAIN. Complex frames can impact performance. Window results appear on every row (the result set is not collapsed).
-- Performance: Index OVER columns
CREATE INDEX idx_dept_salary ON employees(department, salary);
SELECT id, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
-- Don't repeat window functions
SELECT id,
SUM(salary) OVER (PARTITION BY department) as dept_total,
SUM(salary) OVER (PARTITION BY department) as dept_total2 -- Repeated!
FROM employees;
-- Store in temporary table if using multiple times
CREATE TEMPORARY TABLE temp_rankings AS
SELECT id, salary, department,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;
-- Window function + WHERE
SELECT * FROM (
SELECT id, salary, RANK() OVER (ORDER BY salary DESC) as salary_rank
FROM employees
) AS ranked WHERE salary_rank <= 10;
-- Memory usage
-- Window functions may need to buffer and sort the rows of each partition
-- Large partitions can increase memory and temporary-storage use
Why it matters: Production performance and resource usage.
Real applications: Analytics queries, reporting, ranking.
Common mistakes: Not indexing, repeating functions, memory issues on large datasets.
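One way to avoid repeating the same window definition is MySQL's named WINDOW clause, which defines the window once and lets several functions refer to it. A minimal sketch, assuming a hypothetical `staff_demo` table; the window name `dept` and the rows are invented for illustration.

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE staff_demo (
    id INT PRIMARY KEY,
    department VARCHAR(20),
    salary INT
);
INSERT INTO staff_demo (id, department, salary) VALUES
    (1, 'eng', 3000),
    (2, 'eng', 2500),
    (3, 'ops', 2000);

SELECT id, department, salary,
       SUM(salary) OVER dept AS dept_total,
       AVG(salary) OVER dept AS dept_avg,
       -- A named window can be extended with ORDER BY in the OVER clause
       RANK() OVER (dept ORDER BY salary DESC) AS dept_rank
FROM staff_demo
WINDOW dept AS (PARTITION BY department);
```

Keeping the definition in one place also means a change to the partitioning applies to every function that uses it.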
Common patterns: Running totals, year-to-date calculations, rankings per category, comparisons with averages, change detection.
-- Running total pattern
SELECT month, sales,
SUM(sales) OVER (ORDER BY month) as ytd_sales
FROM monthly_sales;
-- Year-over-year comparison
SELECT month, sales, year,
LAG(sales) OVER (PARTITION BY month ORDER BY year) as prior_year_sales
FROM annual_sales;
-- Percentile ranking
SELECT id, salary,
PERCENT_RANK() OVER (ORDER BY salary) * 100 as percentile
FROM employees;
-- Top N per group
SELECT * FROM (
SELECT id, name, salary, department,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees
) t WHERE dept_rank <= 3; -- Top 3 per department
-- Cumulative percentage
SELECT category, sales,
SUM(sales) OVER (ORDER BY sales DESC) as cumulative_sales,
ROUND(SUM(sales) OVER (ORDER BY sales DESC) /
SUM(sales) OVER () * 100, 2) as cumulative_percent
FROM sales;
Why it matters: Reusable analytical patterns.
Real applications: Revenue reports, rankings, YTD calculations.
Common mistakes: Complex queries can be hard to debug, ordering impacts results.
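For year-to-date figures over data that spans several years, the running total typically needs to restart each year. A minimal sketch, assuming a hypothetical `monthly_sales_demo` table with separate year and month columns (invented names and rows):

```sql
-- Hypothetical demo table; names and values are illustrative only
CREATE TABLE monthly_sales_demo (
    sales_year INT,
    sales_month INT,
    sales INT,
    PRIMARY KEY (sales_year, sales_month)
);
INSERT INTO monthly_sales_demo (sales_year, sales_month, sales) VALUES
    (2023, 11, 100),
    (2023, 12, 200),
    (2024, 1, 50),
    (2024, 2, 70);

SELECT sales_year, sales_month, sales,
       -- PARTITION BY restarts the running total at each new year
       SUM(sales) OVER (PARTITION BY sales_year ORDER BY sales_month) AS ytd_sales
FROM monthly_sales_demo
ORDER BY sales_year, sales_month;
-- ytd_sales: 100, 300, 50, 120
```

Without the PARTITION BY, the same query would keep accumulating across the year boundary.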