MySQL Aggregate Functions Interview Questions

8 Questions

COUNT counts rows, SUM adds values, AVG calculates the mean, and MIN/MAX find the smallest and largest values. Without GROUP BY, an aggregate collapses the whole result set into a single row; with GROUP BY, it returns one row per group.

-- Basic aggregates
SELECT COUNT(*) FROM orders;  -- Total rows
SELECT COUNT(customer_id) FROM orders;  -- Non-NULL values
SELECT SUM(amount) FROM orders;  -- Total amount
SELECT AVG(price) FROM products;  -- Average price
SELECT MIN(created_at), MAX(created_at) FROM orders;  -- Date range

-- Multiple aggregates
SELECT COUNT(*) as total_orders, SUM(amount) as total_revenue, 
       AVG(amount) as avg_order FROM orders;

-- With GROUP BY
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total_spent
FROM orders GROUP BY customer_id;

Why it matters: Summarize large datasets into meaningful metrics.

Real applications: Dashboards, reporting, business analytics.

Common mistakes: Confusing COUNT(*) with COUNT(column), forgetting that SUM, AVG, MIN, and MAX skip NULL values, using GROUP BY without aggregates.
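
The NULL-handling mistake is easiest to see on a tiny table. The following is a minimal sketch, assuming a hypothetical `order_demo` table that is not part of the schema used elsewhere on this page. SUM and AVG skip NULL values, so AVG divides by the number of non-NULL rows rather than by the total row count.

```sql
-- Hypothetical demo table; the name and columns are illustrative only
CREATE TABLE order_demo (
    id INT PRIMARY KEY,
    amount DECIMAL(10,2) NULL
);

INSERT INTO order_demo (id, amount) VALUES
    (1, 100.00),
    (2, NULL),
    (3, 200.00);

SELECT COUNT(*)                 AS total_rows,        -- 3
       COUNT(amount)            AS non_null_amounts,  -- 2
       SUM(amount)              AS total_amount,      -- 300 (NULL skipped)
       AVG(amount)              AS avg_non_null,      -- 150 (300 / 2)
       AVG(COALESCE(amount, 0)) AS avg_all_rows       -- 100 (300 / 3)
FROM order_demo;
```

Whether a missing amount should count as zero is a business decision; COALESCE only makes that choice explicit.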

COUNT(*) counts every row, including rows that contain NULLs. COUNT(column) counts only the rows where that column is non-NULL. The two results differ only when the column contains NULLs.

-- Example data (users table)
id | name    | email
1  | Alice   | alice@ex.com
2  | Bob     | NULL
3  | Charlie | charlie@ex.com

-- Comparison
SELECT COUNT(*) FROM users;  -- 3 (all rows)
SELECT COUNT(email) FROM users;  -- 2 (non-NULL emails)
SELECT COUNT(email) as valid_emails FROM users;  -- 2

-- Practical scenario
SELECT COUNT(*) as total_rows,
       COUNT(email) as verified_emails,
       COUNT(*) - COUNT(email) as unverified_emails
FROM users;

Why it matters: Understand NULL impact on aggregates.

Real applications: Data quality reports, completeness checks.

Common mistakes: Not accounting for NULLs, using COUNT(*) when column count is needed.

GROUP BY divides rows into groups based on column values. Aggregates are calculated per group. Essential for breakdowns by category, region, date, etc.

-- Group by single column
SELECT product_category, COUNT(*) as product_count, AVG(price) as avg_price
FROM products GROUP BY product_category;

-- Group by multiple columns
SELECT YEAR(order_date) as year, MONTH(order_date) as month, 
       COUNT(*) as monthly_orders, SUM(amount) as monthly_revenue
FROM orders GROUP BY YEAR(order_date), MONTH(order_date);

-- Group by with ORDER BY
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total_spent
FROM orders GROUP BY customer_id 
ORDER BY total_spent DESC LIMIT 10;  -- Top 10 customers

-- NULLs in GROUP BY
SELECT category, COUNT(*) FROM products 
GROUP BY category;  -- NULL appears as separate group

Why it matters: Break down aggregates by dimensions.

Real applications: Sales by region, products by category, traffic by source.

Common mistakes: Selecting non-aggregated columns that are not in GROUP BY, choosing the wrong aggregate function.
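
Selecting a non-aggregated column that is not in GROUP BY is rejected by default in MySQL 5.7.5 and later, because the ONLY_FULL_GROUP_BY SQL mode is enabled out of the box. The sketch below assumes a hypothetical `product_demo` table and shows two possible fixes; which one fits depends on what the extra column is supposed to mean.

```sql
-- Hypothetical demo table; the name and columns are illustrative only
CREATE TABLE product_demo (
    id INT PRIMARY KEY,
    product_category VARCHAR(50),
    name VARCHAR(100),
    price DECIMAL(10,2)
);

-- Rejected under ONLY_FULL_GROUP_BY: name is neither grouped nor aggregated
-- SELECT product_category, name, AVG(price)
-- FROM product_demo GROUP BY product_category;

-- Fix 1: aggregate the extra column so its value is well defined
SELECT product_category,
       MIN(name)  AS first_name_alphabetically,
       AVG(price) AS avg_price
FROM product_demo
GROUP BY product_category;

-- Fix 2: ANY_VALUE() when an arbitrary value from the group is acceptable
SELECT product_category,
       ANY_VALUE(name) AS sample_name,
       AVG(price)      AS avg_price
FROM product_demo
GROUP BY product_category;
```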

WHERE filters rows before grouping. HAVING filters groups after aggregation. HAVING can use aggregate functions.

-- WHERE: Filter before grouping
SELECT product_category, COUNT(*) FROM products 
WHERE price > 100 GROUP BY product_category;

-- HAVING: Filter after grouping
SELECT product_category, COUNT(*) as count FROM products 
GROUP BY product_category 
HAVING count > 5;  -- Only categories with more than 5 products

-- Both WHERE and HAVING
SELECT product_category, AVG(price) as avg_price
FROM products 
WHERE stock > 0  -- Filter before grouping
GROUP BY product_category 
HAVING AVG(price) > 50;  -- Filter after grouping

-- Common aggregation pattern
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total
FROM orders WHERE created_at >= '2024-01-01'
GROUP BY customer_id 
HAVING SUM(amount) > 1000;

Why it matters: Filtering rows in WHERE reduces the data that has to be grouped; HAVING is for conditions on the aggregated results.

Real applications: Segment customers by spending, filter categories by count.

Common mistakes: Using aggregate functions in WHERE, unclear filter placement.
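
The "aggregate in WHERE" mistake can be illustrated with a small, self-contained sketch. The `orders_demo` table and its rows are hypothetical and exist only for this example.

```sql
-- Hypothetical demo table; the name and data are illustrative only
CREATE TABLE orders_demo (
    id INT PRIMARY KEY,
    customer_id INT,
    amount DECIMAL(10,2)
);

INSERT INTO orders_demo (id, customer_id, amount) VALUES
    (1, 10, 600.00),
    (2, 10, 700.00),
    (3, 20, 300.00);

-- Invalid: WHERE is evaluated before groups exist, so SUM() is not allowed
-- (MySQL reports "Invalid use of group function")
-- SELECT customer_id FROM orders_demo
-- WHERE SUM(amount) > 1000 GROUP BY customer_id;

-- Valid: move the aggregate condition to HAVING
SELECT customer_id, SUM(amount) AS total
FROM orders_demo
GROUP BY customer_id
HAVING SUM(amount) > 1000;  -- returns only customer 10 (600 + 700 = 1300)
```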

GROUP_CONCAT concatenates values within groups into comma-separated strings. Useful for combining related values without extra joins.

-- Basic GROUP_CONCAT
SELECT order_id, GROUP_CONCAT(item_name) as items
FROM order_items GROUP BY order_id;

-- With separator (SEPARATOR keyword; a second argument would be concatenated onto each value instead)
SELECT order_id, GROUP_CONCAT(item_name SEPARATOR ', ') as items
FROM order_items GROUP BY order_id;

-- With ORDER BY within group
SELECT order_id, GROUP_CONCAT(item_name ORDER BY price DESC)
FROM order_items GROUP BY order_id;

-- Distinct values with a custom separator, on filtered rows
SELECT order_id, GROUP_CONCAT(DISTINCT item_name SEPARATOR '; ')
FROM order_items WHERE price > 50 GROUP BY order_id;

-- Control length (session variable; default is 1024 bytes, longer results are truncated)
SET SESSION group_concat_max_len = 10000;
SELECT user_id, GROUP_CONCAT(tag_name) FROM user_tags GROUP BY user_id;

Why it matters: Combine related values without complex JOINs.

Real applications: Order items display, tag aggregation, skill listing.

Common mistakes: Results truncated at the default group_concat_max_len (1024 bytes), performance with large result sets.

WITH ROLLUP adds subtotal rows and a grand total row to a grouped result. It produces several grouping levels in a single result set, which is useful for hierarchical summaries.

-- Basic ROLLUP
SELECT year, quarter, SUM(amount) FROM sales 
GROUP BY year, quarter WITH ROLLUP;
-- Results: per quarter, per year (subtotals), grand total

-- Example output
2024 | Q1 | 100000
2024 | Q2 | 150000
2024 | NULL | 250000  -- Q1+Q2 subtotal
NULL | NULL | 250000  -- Grand total

-- Multiple levels ROLLUP
SELECT category, subcategory, SUM(sales) FROM products
GROUP BY category, subcategory WITH ROLLUP;

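-- Label the rollup row (COALESCE also relabels a genuinely NULL category;
-- GROUPING(category), available from MySQL 8.0.1, tells the two apart)
SELECT COALESCE(category, 'Total') as category, 
       SUM(amount) FROM sales 
GROUP BY category WITH ROLLUP;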

-- Performance: ROLLUP adds super-aggregate rows on top of the normal grouping work
-- For large datasets, compare it against separate queries using EXPLAIN

Why it matters: Generate hierarchical summaries efficiently.

Real applications: Financial reports, sales rollups, hierarchical totals.

Common mistakes: Performance with large groups, misunderstanding NULL meaning.
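
The NULL confusion arises because a rollup row and a real NULL group look identical in the output. A minimal sketch of telling them apart with GROUPING(), assuming MySQL 8.0.1 or later and a hypothetical `sales_demo` table:

```sql
-- Hypothetical demo table; the name and data are illustrative only
CREATE TABLE sales_demo (
    category VARCHAR(50) NULL,
    amount DECIMAL(10,2)
);

INSERT INTO sales_demo (category, amount) VALUES
    ('Books', 100.00),
    (NULL,     40.00),   -- a sale with no category assigned
    ('Toys',   60.00);

SELECT category,
       GROUPING(category) AS is_rollup_row,  -- 1 only on the super-aggregate row
       SUM(amount)        AS total
FROM sales_demo
GROUP BY category WITH ROLLUP;
-- The uncategorized group has category NULL and is_rollup_row = 0 (total 40).
-- The grand-total row has category NULL and is_rollup_row = 1 (total 200).
```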

COUNT(DISTINCT column) counts unique non-NULL values. DISTINCT can also be used inside other aggregates, such as SUM and AVG, to exclude duplicates before aggregating.

-- Unique counts
SELECT COUNT(DISTINCT customer_id) as unique_customers FROM orders;
SELECT COUNT(DISTINCT category) as category_count FROM products;

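-- Multiple column distinct (counts distinct combinations where neither value is NULL;
-- CONCAT(city, state) would merge different pairs that happen to concatenate identically)
SELECT COUNT(DISTINCT city, state) as unique_locations 
FROM customers;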

-- Distinct with GROUP BY
SELECT store_id, COUNT(DISTINCT customer_id) as unique_customers,
       COUNT(*) as total_transactions
FROM orders GROUP BY store_id;

-- DISTINCT in other aggregates
-- Supported: SUM(DISTINCT column), AVG(DISTINCT column)
-- Accepted but no effect: MIN(DISTINCT column), MAX(DISTINCT column)
SELECT AVG(DISTINCT salary) FROM employees WHERE salary > 50000;

-- Equivalent: Use subquery
SELECT AVG(salary) FROM (
    SELECT DISTINCT salary FROM employees WHERE salary > 50000
) as distinct_salaries;

Why it matters: Count unique values without duplicates.

Real applications: Unique customer counts, category analysis.

Common mistakes: Expecting DISTINCT to change MIN/MAX results, forgetting that COUNT(DISTINCT) skips NULLs, performance impact on large tables.
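
To show how DISTINCT changes SUM and AVG, here is a small sketch on a hypothetical `payments_demo` table. The duplicate value 50 is counted once by the DISTINCT variants.

```sql
-- Hypothetical demo table; the name and data are illustrative only
CREATE TABLE payments_demo (
    id INT PRIMARY KEY,
    amount INT
);

INSERT INTO payments_demo (id, amount) VALUES
    (1, 50),
    (2, 50),
    (3, 70);

SELECT SUM(amount)            AS sum_all,         -- 170 (50 + 50 + 70)
       SUM(DISTINCT amount)   AS sum_distinct,    -- 120 (50 + 70)
       AVG(amount)            AS avg_all,         -- about 56.67 (170 / 3)
       AVG(DISTINCT amount)   AS avg_distinct,    -- 60 (120 / 2)
       COUNT(DISTINCT amount) AS distinct_values  -- 2
FROM payments_demo;
```

SUM(DISTINCT) is rarely what a revenue report wants, since two separate payments of the same amount are both real money.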

Performance tips: Index the GROUP BY columns, avoid wrapping grouped columns in functions, filter rows with WHERE before grouping instead of relying on HAVING where possible, and check the execution plan with EXPLAIN.

-- Index GROUP BY columns
CREATE INDEX idx_order_customer ON orders(customer_id);
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;  -- Uses index

-- Avoid functions on GROUP BY columns
SELECT YEAR(order_date), SUM(amount) FROM orders 
GROUP BY YEAR(order_date);  -- A plain index on order_date cannot serve this grouping

-- Better: Index a generated column (or use a functional index in MySQL 8.0.13+)
ALTER TABLE orders ADD COLUMN order_year INT GENERATED ALWAYS AS (YEAR(order_date));
CREATE INDEX idx_order_year ON orders(order_year);
SELECT order_year, SUM(amount) FROM orders GROUP BY order_year;

-- EXPLAIN to verify
EXPLAIN SELECT product_id, COUNT(*) FROM order_items GROUP BY product_id;

-- Multiple aggregates on a large dataset
SELECT category, COUNT(*), SUM(price), AVG(price), MIN(price), MAX(price)
FROM products GROUP BY category;  -- Computed in one pass over the rows; keep only the aggregates the report uses

Why it matters: Aggregates on large tables can be slow.

Real applications: Optimizing reports and dashboards.

Common mistakes: Not indexing GROUP BY columns, redundant aggregates.
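
One common way to act on the indexing advice is a composite index that covers both the grouping column and the aggregated column. The sketch below uses a hypothetical `orders_perf_demo` table; whether the optimizer actually chooses the index depends on the data and the MySQL version, so treat the EXPLAIN output as the source of truth.

```sql
-- Hypothetical demo table; the name and columns are illustrative only
CREATE TABLE orders_perf_demo (
    id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    created_at DATETIME NOT NULL
);

-- Composite index: grouping column first, aggregated column second,
-- so the query below can be answered from the index alone
CREATE INDEX idx_customer_amount ON orders_perf_demo (customer_id, amount);

EXPLAIN
SELECT customer_id, SUM(amount) AS total_spent
FROM orders_perf_demo
GROUP BY customer_id;
-- In the Extra column, "Using index" indicates a covering index was used;
-- "Using temporary" suggests the grouping needed a temporary table.
```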