COUNT counts rows. SUM adds values. AVG calculates the average. MIN/MAX find the minimum/maximum. Without GROUP BY, an aggregate collapses the whole result set into a single row; with GROUP BY, it returns one row per group.
-- Basic aggregates
SELECT COUNT(*) FROM orders; -- Total rows
SELECT COUNT(customer_id) FROM orders; -- Non-NULL values
SELECT SUM(amount) FROM orders; -- Total amount
SELECT AVG(price) FROM products; -- Average price
SELECT MIN(created_at), MAX(created_at) FROM orders; -- Date range
-- Multiple aggregates
SELECT COUNT(*) as total_orders, SUM(amount) as total_revenue,
AVG(amount) as avg_order FROM orders;
-- With GROUP BY
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total_spent
FROM orders GROUP BY customer_id;
Why it matters: Summarize large datasets into meaningful metrics.
Real applications: Dashboards, reporting, business analytics.
Common mistakes: COUNT(*) vs COUNT(column), NULL handling, GROUP BY without aggregates.
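The NULL handling mentioned above can be sketched as follows. This is a minimal illustration, assuming MySQL and a hypothetical table demo_orders that is not part of the original examples.

```sql
-- Hypothetical demo table; a NULL amount means "not yet known"
CREATE TABLE demo_orders (
  id INT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL(10,2)
);
INSERT INTO demo_orders VALUES (1, 10, 100.00), (2, 10, NULL), (3, 20, 50.00);

-- SUM and AVG skip NULLs: AVG divides by the 2 known amounts, not by 3 rows
SELECT COUNT(*)                 AS row_count,      -- 3
       COUNT(amount)            AS known_amounts,  -- 2
       SUM(amount)              AS total,          -- 150
       AVG(amount)              AS avg_known,      -- 75
       AVG(COALESCE(amount, 0)) AS avg_all_rows    -- 50 (NULL treated as 0)
FROM demo_orders;

-- On an empty input, COUNT returns 0 but SUM returns NULL
SELECT COUNT(*)                 AS n,              -- 0
       SUM(amount)              AS total,          -- NULL
       COALESCE(SUM(amount), 0) AS total_or_zero   -- 0
FROM demo_orders
WHERE customer_id = 999;
```

Whether a NULL should count as zero is a business decision; the COALESCE variants only make that choice explicit.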
COUNT(*) includes NULLs and counts all rows. COUNT(column) counts non-NULL values only. Difference appears when column has NULLs.
-- Data example
id | name | email
1 | Alice | alice@ex.com
2 | Bob | NULL
3 | Charlie | charlie@ex.com
-- Comparison
SELECT COUNT(*) FROM users; -- 3 (all rows)
SELECT COUNT(email) FROM users; -- 2 (non-NULL emails)
SELECT COUNT(email) as valid_emails FROM users; -- 2
-- Practical scenario
SELECT COUNT(*) as total_rows,
COUNT(email) as verified_emails,
COUNT(*) - COUNT(email) as unverified_emails
FROM users;
Why it matters: Understand NULL impact on aggregates.
Real applications: Data quality reports, completeness checks.
Common mistakes: Not accounting for NULLs, using COUNT(*) when column count is needed.
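A completeness check of the kind listed under real applications might look like the sketch below. It assumes MySQL and a hypothetical table demo_users that mirrors the three-row example above.

```sql
-- Hypothetical copy of the example data
CREATE TABLE demo_users (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  email VARCHAR(100)
);
INSERT INTO demo_users VALUES
  (1, 'Alice', 'alice@ex.com'),
  (2, 'Bob', NULL),
  (3, 'Charlie', 'charlie@ex.com');

SELECT COUNT(*)           AS total_rows,     -- 3
       COUNT(email)       AS with_email,     -- 2
       SUM(email IS NULL) AS missing_email,  -- 1 (MySQL evaluates the test as 0 or 1)
       ROUND(100 * COUNT(email) / COUNT(*), 1) AS pct_complete  -- 66.7
FROM demo_users;
```

SUM(email IS NULL) relies on MySQL treating boolean expressions as integers; in other dialects, COUNT(*) - COUNT(email) is the portable form.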
GROUP BY divides rows into groups based on column values. Aggregates are calculated per group. Essential for breakdowns by category, region, date, etc.
-- Group by single column
SELECT product_category, COUNT(*) as product_count, AVG(price) as avg_price
FROM products GROUP BY product_category;
-- Group by multiple columns
SELECT YEAR(order_date) as year, MONTH(order_date) as month,
COUNT(*) as monthly_orders, SUM(amount) as monthly_revenue
FROM orders GROUP BY YEAR(order_date), MONTH(order_date);
-- Group by with ORDER BY
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total_spent
FROM orders GROUP BY customer_id
ORDER BY total_spent DESC LIMIT 10; -- Top 10 customers
-- NULLs in GROUP BY
SELECT category, COUNT(*) FROM products
GROUP BY category; -- NULL appears as separate group
Why it matters: Break down aggregates by dimensions.
Real applications: Sales by region, products by category, traffic by source.
Common mistakes: SELECT non-aggregate columns not in GROUP BY, wrong aggregate function.
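The first mistake above, selecting a column that is neither grouped nor aggregated, can be illustrated with a small sketch. It assumes MySQL 5.7.5 or later, where ONLY_FULL_GROUP_BY is enabled by default, and a hypothetical table demo_products.

```sql
-- Hypothetical demo table
CREATE TABLE demo_products (
  id INT PRIMARY KEY,
  name VARCHAR(50),
  category VARCHAR(50),
  price DECIMAL(10,2)
);
INSERT INTO demo_products VALUES
  (1, 'Pen', 'Office', 2.00),
  (2, 'Desk', 'Office', 200.00),
  (3, 'Lamp', NULL, 30.00);

-- Rejected under ONLY_FULL_GROUP_BY: name is neither grouped nor aggregated
-- SELECT category, name, COUNT(*) FROM demo_products GROUP BY category;

-- Fix 1: aggregate the extra column
SELECT category, MIN(name) AS first_name, COUNT(*) AS product_count
FROM demo_products
GROUP BY category;
-- Office | Desk | 2
-- NULL   | Lamp | 1   (NULL category forms its own group)

-- Fix 2: add the column to GROUP BY (one row per category/name pair)
SELECT category, name, COUNT(*) AS product_count
FROM demo_products
GROUP BY category, name;
```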
WHERE filters rows before grouping. HAVING filters groups after aggregation. HAVING can use aggregate functions.
-- WHERE: Filter before grouping
SELECT product_category, COUNT(*) FROM products
WHERE price > 100 GROUP BY product_category;
-- HAVING: Filter after grouping
SELECT product_category, COUNT(*) as count FROM products
GROUP BY product_category
HAVING count > 5; -- Only categories with more than 5 products (alias in HAVING is a MySQL extension)
-- Both WHERE and HAVING
SELECT product_category, AVG(price) as avg_price
FROM products
WHERE stock > 0 -- Filter before grouping
GROUP BY product_category
HAVING AVG(price) > 50; -- Filter after grouping
-- Common aggregation pattern
SELECT customer_id, COUNT(*) as order_count, SUM(amount) as total
FROM orders WHERE created_at >= '2024-01-01'
GROUP BY customer_id
HAVING SUM(amount) > 1000;
Why it matters: Filter results at different stages for efficiency.
Real applications: Segment customers by spending, filter categories by count.
Common mistakes: Using aggregate functions in WHERE, unclear filter placement.
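To make the filter order concrete, here is a small sketch of the aggregate-in-WHERE mistake and its fix. It assumes MySQL and a hypothetical table demo_sales.

```sql
-- Hypothetical demo table
CREATE TABLE demo_sales (
  id INT PRIMARY KEY,
  customer_id INT,
  amount DECIMAL(10,2),
  created_at DATE
);
INSERT INTO demo_sales VALUES
  (1, 1, 600.00, '2024-02-01'),
  (2, 1, 700.00, '2024-03-01'),
  (3, 2, 900.00, '2024-02-15'),
  (4, 2, 500.00, '2023-12-31');

-- Invalid: WHERE runs before aggregation, so MySQL reports
-- "Invalid use of group function"
-- SELECT customer_id FROM demo_sales
-- WHERE SUM(amount) > 1000 GROUP BY customer_id;

-- Valid: row filter in WHERE, group filter in HAVING
SELECT customer_id, SUM(amount) AS total
FROM demo_sales
WHERE created_at >= '2024-01-01'  -- drops row 4 before grouping
GROUP BY customer_id
HAVING SUM(amount) > 1000;        -- keeps customer 1 (1300.00); customer 2 has only 900.00
```

Note that customer 2 would pass the HAVING test if the 2023 row were not removed first, which is why the placement of each filter changes the result and not just the speed.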
GROUP_CONCAT concatenates values within groups into comma-separated strings. Useful for combining related values without extra joins.
-- Basic GROUP_CONCAT
SELECT order_id, GROUP_CONCAT(item_name) as items
FROM order_items GROUP BY order_id;
-- With separator
SELECT order_id, GROUP_CONCAT(item_name SEPARATOR ', ') as items
FROM order_items GROUP BY order_id;
-- With ORDER BY within group
SELECT order_id, GROUP_CONCAT(item_name ORDER BY price DESC)
FROM order_items GROUP BY order_id;
-- Distinct values with a custom separator, filtered rows
SELECT order_id, GROUP_CONCAT(DISTINCT item_name SEPARATOR '; ')
FROM order_items WHERE price > 50 GROUP BY order_id;
-- Control length (session variable)
SET SESSION group_concat_max_len = 10000;
SELECT user_id, GROUP_CONCAT(tag_name) FROM user_tags GROUP BY user_id;
Why it matters: Combine related values without complex JOINs.
Real applications: Order items display, tag aggregation, skill listing.
Common mistakes: Exceeding default length, performance with large result sets.
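The options above can be combined in one call. The sketch below assumes MySQL and a hypothetical table demo_items; it also shows how to inspect the length limit behind the first mistake.

```sql
-- Hypothetical demo table
CREATE TABLE demo_items (
  order_id INT,
  item_name VARCHAR(50),
  price DECIMAL(10,2)
);
INSERT INTO demo_items VALUES
  (1, 'Mouse', 25.00),
  (1, 'Keyboard', 80.00),
  (1, 'Mouse', 25.00),
  (2, 'Monitor', 300.00);

-- DISTINCT, ORDER BY and SEPARATOR must appear in this order
SELECT order_id,
       GROUP_CONCAT(DISTINCT item_name ORDER BY item_name SEPARATOR ', ') AS items
FROM demo_items
GROUP BY order_id;
-- 1 | Keyboard, Mouse
-- 2 | Monitor

-- Results longer than this limit are truncated (1024 by default in MySQL)
SELECT @@group_concat_max_len;
```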
WITH ROLLUP adds grand total and subtotal rows. Generates multiple grouping levels in single result. Useful for hierarchical summaries.
-- Basic ROLLUP
SELECT year, quarter, SUM(amount) FROM sales
GROUP BY year, quarter WITH ROLLUP;
-- Results: per quarter, per year (subtotals), grand total
-- Example result (only 2024 has data here, so the grand total equals the 2024 subtotal)
2024 | Q1 | 100000
2024 | Q2 | 150000
2024 | NULL | 250000 -- Q1+Q2 subtotal
NULL | NULL | 250000 -- Grand total
-- Multiple levels ROLLUP
SELECT category, subcategory, SUM(sales) FROM products
GROUP BY category, subcategory WITH ROLLUP;
-- Label the rollup row with COALESCE (assumes category itself is never NULL)
SELECT COALESCE(category, 'Total') as category,
SUM(amount) FROM sales
GROUP BY category WITH ROLLUP;
-- Performance: ROLLUP is memory-intensive
-- For large datasets, consider separate queries
Why it matters: Generate hierarchical summaries efficiently.
Real applications: Financial reports, sales rollups, hierarchical totals.
Common mistakes: Performance with large groups, misunderstanding NULL meaning.
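The NULL ambiguity mentioned above shows up when the grouped column can itself be NULL. The following sketch assumes MySQL 8.0.1 or later, where the GROUPING() function is available, and a hypothetical table demo_rollup.

```sql
-- Hypothetical demo table with one genuinely NULL category
CREATE TABLE demo_rollup (
  category VARCHAR(20),
  amount INT
);
INSERT INTO demo_rollup VALUES ('Books', 100), ('Books', 50), (NULL, 30);

SELECT COALESCE(category, 'Total')                  AS coalesce_label,
       IF(GROUPING(category) = 1, 'Total', category) AS grouping_label,
       SUM(amount)                                   AS total
FROM demo_rollup
GROUP BY category WITH ROLLUP;
-- Rows (order may vary):
-- Total | NULL  | 30    <- real NULL category, mislabeled by COALESCE
-- Books | Books | 150
-- Total | Total | 180   <- rollup row
```

GROUPING(category) returns 1 only for rows generated by ROLLUP, so it separates super-aggregate rows from data that happens to be NULL.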
COUNT(DISTINCT column) counts unique values. Works with aggregates to exclude duplicates. Useful for unique counts.
-- Unique counts
SELECT COUNT(DISTINCT customer_id) as unique_customers FROM orders;
SELECT COUNT(DISTINCT category) as category_count FROM products;
-- Multiple column distinct
SELECT COUNT(DISTINCT city, state) as unique_locations
FROM customers;
-- Distinct with GROUP BY
SELECT store_id, COUNT(DISTINCT customer_id) as unique_customers,
COUNT(*) as total_transactions
FROM orders GROUP BY store_id;
-- DISTINCT in other aggregates
-- Supported in MySQL: SUM(DISTINCT column), AVG(DISTINCT column)
-- Accepted but redundant: MIN(DISTINCT column), MAX(DISTINCT column) (same result as without DISTINCT)
-- Alternative: deduplicate in a subquery
SELECT AVG(salary) FROM (
SELECT DISTINCT salary FROM employees WHERE salary > 50000
) as distinct_salaries;
Why it matters: Count unique values without duplicates.
Real applications: Unique customer counts, category analysis.
Common mistakes: Assuming all aggregates support DISTINCT, performance impact.
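A short sketch of how DISTINCT changes results, assuming MySQL and two hypothetical tables, demo_customers and demo_employees:

```sql
-- Hypothetical demo tables
CREATE TABLE demo_customers (
  id INT PRIMARY KEY,
  city VARCHAR(50),
  state VARCHAR(50)
);
INSERT INTO demo_customers VALUES
  (1, 'Springfield', 'IL'),
  (2, 'Springfield', 'MO'),
  (3, 'Springfield', 'IL'),
  (4, 'Portland', NULL);

SELECT COUNT(DISTINCT city)        AS cities,     -- 2
       COUNT(DISTINCT city, state) AS locations   -- 2: row 4 is skipped because state is NULL
FROM demo_customers;

CREATE TABLE demo_employees (
  id INT PRIMARY KEY,
  salary INT
);
INSERT INTO demo_employees VALUES (1, 100), (2, 100), (3, 200);

SELECT AVG(salary)          AS avg_all,       -- about 133.33
       AVG(DISTINCT salary) AS avg_distinct,  -- 150
       SUM(DISTINCT salary) AS sum_distinct   -- 300
FROM demo_employees;
```

The multi-column form of COUNT(DISTINCT ...) is a MySQL feature; it ignores any row where one of the listed expressions is NULL.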
Performance tips: Index the GROUP BY columns, filter rows with WHERE before grouping instead of with HAVING afterwards where possible, avoid wrapping grouped columns in functions, and request only the aggregates you need.
-- Index GROUP BY columns
CREATE INDEX idx_order_customer ON orders(customer_id);
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id; -- Uses index
-- Avoid functions on GROUP BY columns
SELECT YEAR(order_date), SUM(amount) FROM orders
GROUP BY YEAR(order_date); -- Function prevents index use
-- Better: index a generated column (MySQL 8.0.13+ also supports functional indexes)
ALTER TABLE orders ADD COLUMN year INT GENERATED ALWAYS AS (YEAR(order_date));
CREATE INDEX idx_year ON orders(year);
-- EXPLAIN to verify
EXPLAIN SELECT product_id, COUNT(*) FROM order_items GROUP BY product_id;
-- Several aggregates over a large table: all are computed in the same scan,
-- so the scan itself usually dominates; still, select only the aggregates you need
SELECT category, COUNT(*), SUM(price), AVG(price), MIN(price), MAX(price)
FROM products GROUP BY category;
Why it matters: Aggregates on large tables can be slow.
Real applications: Optimizing reports and dashboards.
Common mistakes: Not indexing GROUP BY columns, redundant aggregates.
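The indexing advice can be tried out with a sketch like the one below. It assumes MySQL 8.0.13 or later (for the functional index) and a hypothetical table demo_perf_orders; the plan the optimizer actually picks depends on version, data volume, and statistics, so treat the comments as things to look for rather than guaranteed output.

```sql
-- Hypothetical demo table
CREATE TABLE demo_perf_orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  customer_id INT NOT NULL,
  amount DECIMAL(10,2) NOT NULL,
  order_date DATE NOT NULL
);

-- Composite index: grouping column first, aggregated column second,
-- so the query below can be answered from the index alone
CREATE INDEX idx_customer_amount ON demo_perf_orders (customer_id, amount);

EXPLAIN SELECT customer_id, SUM(amount)
FROM demo_perf_orders
GROUP BY customer_id;
-- Look for "Using index" in the Extra column and the absence of "Using temporary"

-- Functional index on the grouping expression (note the double parentheses)
CREATE INDEX idx_order_year ON demo_perf_orders ((YEAR(order_date)));

EXPLAIN SELECT YEAR(order_date), COUNT(*)
FROM demo_perf_orders
GROUP BY YEAR(order_date);
```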