MySQL Window Functions — MySQL Interview Questions

Window functions compute a value for each row from a set of related rows without collapsing the result set. The OVER clause defines that set of rows (the window). They are available since MySQL 8.0 and enable row-by-row calculations while keeping the original row count.

-- Basic syntax
SELECT column, SUM(value) OVER (PARTITION BY category) as category_total
FROM products;

-- OVER clause components
SELECT id, name, salary,
       SUM(salary) OVER (PARTITION BY department ORDER BY salary) as running_total
FROM employees;

-- Without PARTITION BY (applies to all rows)
SELECT id, salary,
       AVG(salary) OVER () as company_avg
FROM employees;

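-- Window function types
-- RANK and DENSE_RANK are reserved words in MySQL 8.0, so the aliases avoid them
SELECT id, salary, hire_date,
       ROW_NUMBER() OVER (ORDER BY salary) as row_num,
       RANK() OVER (ORDER BY salary) as salary_rank,
       DENSE_RANK() OVER (ORDER BY salary) as salary_dense_rank,
       LAG(salary) OVER (ORDER BY hire_date) as prev_salary,
       LEAD(salary) OVER (ORDER BY hire_date) as next_salary
FROM employees;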

Why it matters: Advanced analytics without losing detail.

Real applications: Running totals, rankings, comparisons with previous/next rows.

Common mistakes: Confusing window functions with GROUP BY, running them on MySQL versions before 8.0 (not supported).
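
To make the GROUP BY confusion concrete, here is a minimal sketch that runs both forms against the same data. The table demo_employees and its rows are hypothetical and exist only for this illustration.

```sql
-- Hypothetical demo table (not part of the original examples)
CREATE TEMPORARY TABLE demo_employees (
    id INT PRIMARY KEY,
    department VARCHAR(20),
    salary INT
);
INSERT INTO demo_employees VALUES
    (1, 'Sales', 3000),
    (2, 'Sales', 2000),
    (3, 'IT', 4000);

-- GROUP BY collapses the rows: one row per department (2 rows)
SELECT department, SUM(salary) AS dept_total
FROM demo_employees
GROUP BY department;

-- The window function keeps the detail: one row per employee (3 rows),
-- each carrying the total of its own department (5000 for Sales, 4000 for IT)
SELECT id, department, salary,
       SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM demo_employees;
```

The second query is the one to reach for when the report needs both the individual row and the group-level figure side by side.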

ROW_NUMBER assigns unique sequential numbers. RANK gives tied rows the same rank and leaves gaps after them. DENSE_RANK gives tied rows the same rank without gaps. Used for leaderboards and competitions.

-- Example data
salary | employee
3000   | Alice
3000   | Bob
2500   | Charlie
2000   | Dave

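-- ROW_NUMBER (unique sequential; the order of tied rows is arbitrary unless ORDER BY has a tiebreaker)
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num
-- Result: 1, 2, 3, 4

-- RANK (gaps for ties; the alias avoids the reserved word RANK)
RANK() OVER (ORDER BY salary DESC) as salary_rank
-- Result: 1, 1, 3, 4

-- DENSE_RANK (no gaps for ties; the alias avoids the reserved word DENSE_RANK)
DENSE_RANK() OVER (ORDER BY salary DESC) as salary_dense_rank
-- Result: 1, 1, 2, 3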

-- Practical: Top salary earner per department (window functions cannot appear in WHERE, so filter in an outer query)
SELECT id, name, salary, department, dept_rank
FROM (
    SELECT id, name, salary, department,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
    FROM employees
) AS ranked
WHERE dept_rank = 1;

Why it matters: Ranking requirements (ties, gaps, uniqueness).

Real applications: Leaderboards, competitions, performance rankings.

Common mistakes: Mixing up the three functions, not using PARTITION BY correctly.
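
The difference between the three ranking functions is easiest to see side by side. The sketch below loads the four example rows from this section into a hypothetical table named demo_salaries; the employee column is added to ORDER BY for ROW_NUMBER only so that the tie between Alice and Bob is broken deterministically.

```sql
-- Hypothetical demo table holding the example data from this section
CREATE TEMPORARY TABLE demo_salaries (
    employee VARCHAR(20),
    salary INT
);
INSERT INTO demo_salaries VALUES
    ('Alice', 3000), ('Bob', 3000), ('Charlie', 2500), ('Dave', 2000);

SELECT employee, salary,
       ROW_NUMBER() OVER (ORDER BY salary DESC, employee) AS row_num,
       RANK()       OVER (ORDER BY salary DESC) AS salary_rank,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_dense_rank
FROM demo_salaries
ORDER BY row_num;
-- employee  salary  row_num  salary_rank  salary_dense_rank
-- Alice     3000    1        1            1
-- Bob       3000    2        1            1
-- Charlie   2500    3        3            2
-- Dave      2000    4        4            3
```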

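LAG reads a value from a previous row; LEAD reads a value from a following row. Both take an optional offset (default 1) and an optional default value, and both return NULL when the requested row does not exist and no default is given. Essential for row-to-row comparisons and change calculations.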

-- LAG: Previous row value
SELECT id, hire_date, salary,
       LAG(salary) OVER (ORDER BY hire_date) as prev_salary,
       salary - LAG(salary) OVER (ORDER BY hire_date) as salary_change
FROM employees;

-- LEAD: Next row value
SELECT id, order_date, amount,
       LEAD(amount) OVER (ORDER BY order_date) as next_amount,
       LEAD(order_date) OVER (ORDER BY order_date) as next_date
FROM orders;

-- With default value
SELECT id, salary,
       LAG(salary, 1, 0) OVER (ORDER BY hire_date) as prev_salary
FROM employees;  -- Default to 0 if no previous row

-- Day-over-day sales comparison
SELECT DATE(order_date) as date, SUM(amount) as daily_sales,
       LAG(SUM(amount)) OVER (ORDER BY DATE(order_date)) as prev_day_sales,
       SUM(amount) - LAG(SUM(amount)) OVER (ORDER BY DATE(order_date)) as sales_change  -- CHANGE is a reserved word, so it cannot be used as a bare alias
FROM orders GROUP BY DATE(order_date);

Why it matters: Compare values across consecutive rows.

Real applications: Change detection, trend analysis, day-over-day metrics.

Common mistakes: Not ordering correctly, misunderstanding offset parameter.
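
Since the offset parameter is a common source of confusion, the following sketch shows what each argument form returns. The table demo_daily_sales and its three rows are hypothetical.

```sql
-- Hypothetical demo table with one row per day
CREATE TEMPORARY TABLE demo_daily_sales (
    sale_date DATE PRIMARY KEY,
    amount INT
);
INSERT INTO demo_daily_sales VALUES
    ('2024-01-01', 100), ('2024-01-02', 150), ('2024-01-03', 120);

SELECT sale_date, amount,
       LAG(amount)       OVER (ORDER BY sale_date) AS prev_1,        -- offset defaults to 1
       LAG(amount, 2)    OVER (ORDER BY sale_date) AS prev_2,        -- two rows back
       LAG(amount, 1, 0) OVER (ORDER BY sale_date) AS prev_or_zero,  -- 0 instead of NULL
       LEAD(amount)      OVER (ORDER BY sale_date) AS next_1
FROM demo_daily_sales
ORDER BY sale_date;
-- sale_date   amount  prev_1  prev_2  prev_or_zero  next_1
-- 2024-01-01  100     NULL    NULL    0             150
-- 2024-01-02  150     100     NULL    100           120
-- 2024-01-03  120     150     100     150           NULL
```

Note that a default of 0 makes a difference such as amount - prev_or_zero look like a large jump on the first row, so NULL is often the safer choice for change calculations.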

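FIRST_VALUE returns the first value in the window frame, LAST_VALUE the last, and NTH_VALUE the nth. All three are evaluated over the frame rather than the whole partition, so the frame specification determines what they return. Used for accessing specific positions in windows.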

-- FIRST_VALUE: First value in window
SELECT id, salary, hire_date,
       FIRST_VALUE(salary) OVER (ORDER BY hire_date) as first_salary
FROM employees;

-- LAST_VALUE: Last in window (requires frame specification)
SELECT id, salary,
       LAST_VALUE(salary) OVER (ORDER BY salary ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as highest_salary
FROM employees;

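-- NTH_VALUE: Nth value in window (full frame, otherwise the first row gets NULL because its default frame holds only one row)
SELECT id, salary,
       NTH_VALUE(salary, 2) OVER (ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as second_highest
FROM employees;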

-- Practical: Compare to highest and lowest in department
SELECT id, name, salary, department,
       FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) as highest_in_dept,
       LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lowest_in_dept
FROM employees;

Why it matters: Access boundary values in windows.

Real applications: Benchmarking, comparisons within groups.

Common mistakes: Not specifying frame for LAST_VALUE, wrong frame boundaries.
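
The LAST_VALUE frame mistake can be reproduced with a few rows. In this sketch the table demo_scores is hypothetical; the two columns differ only in the frame.

```sql
-- Hypothetical demo table with distinct scores
CREATE TEMPORARY TABLE demo_scores (
    player VARCHAR(20),
    score INT
);
INSERT INTO demo_scores VALUES ('Ann', 10), ('Ben', 20), ('Cal', 30);

SELECT player, score,
       -- Default frame ends at the current row, so the "last" value is the row itself
       LAST_VALUE(score) OVER (ORDER BY score) AS last_default_frame,
       -- Explicit full frame reaches the true last row of the window
       LAST_VALUE(score) OVER (ORDER BY score
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM demo_scores
ORDER BY score;
-- player  score  last_default_frame  last_full_frame
-- Ann     10     10                  30
-- Ben     20     20                  30
-- Cal     30     30                  30
```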

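PARTITION BY divides the rows into independent windows. ORDER BY orders the rows within each window. Both are optional: ranking functions and LAG/LEAD need ORDER BY to give meaningful results, and adding ORDER BY to an aggregate turns it into a running calculation.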

-- PARTITION BY: Separate windows by category
SELECT id, name, salary, department,
       AVG(salary) OVER (PARTITION BY department) as dept_avg
FROM employees;

-- ORDER BY: Order rows within window
SELECT id, amount, order_date,
       SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;

-- Both: Partition and order within partition
SELECT id, salary, department, hire_date,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

-- Multiple partition columns (assumes the sales table has a column named `date`)
SELECT id, sales, region, product,
       SUM(sales) OVER (PARTITION BY region, product ORDER BY `date`) as product_region_sales
FROM sales;

-- NULL handling in PARTITION BY
-- All rows with a NULL partition value fall into one shared partition
SELECT category, COUNT(*) OVER (PARTITION BY category) as category_count
FROM products;  -- NULL categories are counted together as their own group

Why it matters: Control scope and order of window calculations.

Real applications: Department-level analytics, running calculations.

Common mistakes: Missing ORDER BY when needed, wrong partition granularity.
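
How PARTITION BY and ORDER BY change the scope of the same aggregate can be sketched with one query. The table demo_orders is hypothetical; only the OVER clause differs between the three computed columns.

```sql
-- Hypothetical demo table
CREATE TEMPORARY TABLE demo_orders (
    id INT PRIMARY KEY,
    region VARCHAR(10),
    amount INT
);
INSERT INTO demo_orders VALUES
    (1, 'East', 10), (2, 'East', 20), (3, 'West', 5), (4, 'West', 15);

SELECT id, region, amount,
       SUM(amount) OVER () AS grand_total,                               -- all rows
       SUM(amount) OVER (PARTITION BY region) AS region_total,           -- whole partition
       SUM(amount) OVER (PARTITION BY region ORDER BY id) AS region_running  -- running sum
FROM demo_orders
ORDER BY id;
-- id  region  amount  grand_total  region_total  region_running
-- 1   East    10      50           30            10
-- 2   East    20      50           30            30
-- 3   West    5       50           20            5
-- 4   West    15      50           20            20
```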

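Frame specifications define which rows of the partition are included in the calculation for the current row. A frame uses ROWS or RANGE units with the boundaries UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW, n FOLLOWING, or UNBOUNDED FOLLOWING. With ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; without ORDER BY, it is the whole partition.

-- Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)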
SELECT amount,
       SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;  -- Running sum

-- ROWS: Physical row positions
SELECT amount,
       AVG(amount) OVER (ORDER BY order_date ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) as moving_avg_5
FROM orders;

-- RANGE: Value-based ranges (rows whose price is within 10 of the current row's price)
SELECT amount,
       SUM(amount) OVER (ORDER BY price RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING) as nearby_price_total
FROM products;

-- Frame types
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- All rows from start to current
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING  -- All rows in window
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING  -- Current ± 2 rows
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  -- Default when ORDER BY is present (includes peers of the current row)

-- Practical: Moving average (ROWS counts rows, so this is a 7-day average only if every day has at least one order)
SELECT DATE(order_date) as date, SUM(amount) as daily_sales,
       AVG(SUM(amount)) OVER (ORDER BY DATE(order_date) ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7day
FROM orders GROUP BY DATE(order_date);

Why it matters: Control which rows are included in calculations.

Real applications: Moving averages, running totals, windowed aggregates.

Common mistakes: Wrong frame specification affects results, RANGE vs ROWS confusion.
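
The RANGE versus ROWS confusion shows up as soon as the ORDER BY column has duplicate values. The sketch below uses a hypothetical table demo_payments in which two payments share a date.

```sql
-- Hypothetical demo table; rows 2 and 3 share the same paid_on value
CREATE TEMPORARY TABLE demo_payments (
    id INT PRIMARY KEY,
    paid_on DATE,
    amount INT
);
INSERT INTO demo_payments VALUES
    (1, '2024-03-01', 10),
    (2, '2024-03-02', 20),
    (3, '2024-03-02', 30),
    (4, '2024-03-03', 40);

SELECT id, paid_on, amount,
       -- RANGE treats rows with equal paid_on as peers and includes all of them
       SUM(amount) OVER (ORDER BY paid_on
           RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum,
       -- ROWS stops at the current physical row; id makes the order deterministic
       SUM(amount) OVER (ORDER BY paid_on, id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
FROM demo_payments
ORDER BY paid_on, id;
-- id  paid_on     amount  range_sum  rows_sum
-- 1   2024-03-01  10      10         10
-- 2   2024-03-02  20      60         30
-- 3   2024-03-02  30      60         60
-- 4   2024-03-03  40      100        100
```

Because RANGE is the default when ORDER BY is present, a running total over a non-unique column behaves like range_sum unless ROWS is requested explicitly.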

Best practices: Window functions require MySQL 8.0 or later. Indexes on the PARTITION BY and ORDER BY columns may help, complex frames can hurt performance, and window results appear on every row because the result set is not collapsed.

-- Performance: Index the columns used in OVER (check EXPLAIN to confirm the index is used)
CREATE INDEX idx_dept_salary ON employees(department, salary);
SELECT id, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

-- Don't repeat window functions
SELECT id,
       SUM(salary) OVER (PARTITION BY department) as dept_total,
       SUM(salary) OVER (PARTITION BY department) as dept_total2  -- Repeated!
FROM employees;

-- Store in temporary table if using multiple times
CREATE TEMPORARY TABLE temp_rankings AS
SELECT id, salary, department,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

-- Window function + WHERE (filter on the result in an outer query)
SELECT * FROM (
    SELECT id, salary, RANK() OVER (ORDER BY salary DESC) as salary_rank
    FROM employees
) AS ranked WHERE salary_rank <= 10;

-- Memory usage
-- Window functions usually require sorting and buffering the rows of each partition
-- Large partitions can exhaust memory buffers, spill to disk, and slow the query

Why it matters: Production performance and resource usage.

Real applications: Analytics queries, reporting, ranking.

Common mistakes: Not indexing, repeating functions, memory issues on large datasets.
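
One way to avoid repeating the same OVER definition is the named WINDOW clause, which MySQL 8.0 supports. The sketch below assumes a hypothetical table demo_staff; the window name dept is arbitrary.

```sql
-- Hypothetical demo table
CREATE TEMPORARY TABLE demo_staff (
    id INT PRIMARY KEY,
    department VARCHAR(20),
    salary INT
);
INSERT INTO demo_staff VALUES
    (1, 'Sales', 3000), (2, 'Sales', 2000), (3, 'IT', 4000);

SELECT id, department, salary,
       SUM(salary) OVER dept AS dept_total,                      -- reuses the named window
       AVG(salary) OVER dept AS dept_avg,                        -- same window, no repetition
       RANK() OVER (dept ORDER BY salary DESC) AS dept_rank      -- extends it with ORDER BY
FROM demo_staff
WINDOW dept AS (PARTITION BY department)
ORDER BY department, dept_rank;
```

Defining the partition once keeps the functions consistent with each other and makes later changes to the window a one-line edit.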

Common patterns: Running totals, year-to-date calculations, rankings per category, comparisons with averages, change detection.

-- Running total pattern (equals year-to-date only if monthly_sales holds a single year)
SELECT month, sales,
       SUM(sales) OVER (ORDER BY month) as ytd_sales
FROM monthly_sales;

-- Year-over-year comparison
SELECT month, sales, year,
       LAG(sales) OVER (PARTITION BY month ORDER BY year) as prior_year_sales
FROM annual_sales;

-- Percentile ranking
SELECT id, salary,
       PERCENT_RANK() OVER (ORDER BY salary) * 100 as percentile
FROM employees;

-- Top N per group
SELECT * FROM (
    SELECT id, name, salary, department,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as dept_row_num
    FROM employees
) t WHERE dept_row_num <= 3;  -- Top 3 per department

-- Cumulative percentage
SELECT category, sales,
       SUM(sales) OVER (ORDER BY sales DESC) as cumulative_sales,
       ROUND(SUM(sales) OVER (ORDER BY sales DESC) /
             SUM(sales) OVER () * 100, 2) as cumulative_percent
FROM sales;

Why it matters: Reusable analytical patterns.

Real applications: Revenue reports, rankings, YTD calculations.

Common mistakes: Complex queries can be hard to debug, ordering impacts results.
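
For a year-to-date figure that restarts every January, the running total pattern can be partitioned by year. This is a minimal sketch assuming a hypothetical table demo_monthly_sales with separate year and month columns.

```sql
-- Hypothetical demo table spanning two years
CREATE TEMPORARY TABLE demo_monthly_sales (
    sales_year INT,
    sales_month INT,
    sales INT,
    PRIMARY KEY (sales_year, sales_month)
);
INSERT INTO demo_monthly_sales VALUES
    (2023, 11, 100), (2023, 12, 200), (2024, 1, 50), (2024, 2, 70);

-- PARTITION BY restarts the running sum at the first month of each year
SELECT sales_year, sales_month, sales,
       SUM(sales) OVER (PARTITION BY sales_year ORDER BY sales_month) AS ytd_sales
FROM demo_monthly_sales
ORDER BY sales_year, sales_month;
-- sales_year  sales_month  sales  ytd_sales
-- 2023        11           100    100
-- 2023        12           200    300
-- 2024        1            50     50
-- 2024        2            70     120
```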

MySQL Window Functions Interview Questions

1 What are window functions and the OVER clause?

2 What are ROW_NUMBER, RANK, and DENSE_RANK?

3 What are LAG and LEAD functions?

4 What are FIRST_VALUE, LAST_VALUE, NTH_VALUE?

5 What do PARTITION BY and ORDER BY do in window functions?

6 What are frame specifications in window functions?

7 How do you use window functions in production?

8 What are common window function patterns?