MySQL

MySQL Performance Tuning Interview Questions

8 Questions

Key metrics: Query time, CPU/memory usage, disk I/O, lock contention, replication lag, connection count. Use SHOW STATUS, PERFORMANCE_SCHEMA, and monitoring tools.

-- Check server status
SHOW STATUS LIKE 'Threads%';
-- Threads_connected: Current connections
-- Threads_running: Active queries

-- Check running queries
SHOW PROCESSLIST;
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST WHERE TIME > 300;  -- Long-running

-- Performance Schema statistics
SELECT * FROM performance_schema.events_statements_summary_by_digest 
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;  -- Top slow queries

-- InnoDB status
SHOW ENGINE INNODB STATUS;  -- I/O, locks, transactions

-- Slow query log queries
SELECT * FROM mysql.slow_log 
WHERE query_time > SEC_TO_TIME(1)  -- query_time is a TIME column
ORDER BY start_time DESC;

-- Check connections per host
SELECT host, COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST GROUP BY host;

-- Query count by type
SELECT command, COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST GROUP BY command;

-- Database size
SELECT SUM(data_length + index_length) / 1024 / 1024 as size_mb 
FROM INFORMATION_SCHEMA.TABLES 
WHERE table_schema = 'mydb';

Why it matters: Identify bottlenecks and optimization opportunities.

Real applications: Production monitoring, capacity planning, alerting.

Common mistakes: Not monitoring until problems occur, ignoring disk I/O.

The slow query log captures queries exceeding a time threshold. Use EXPLAIN to analyze query plans; optimize with indexes, query rewrites, and join restructuring.

-- Enable slow query log
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;  -- Queries > 1 second
SET GLOBAL log_queries_not_using_indexes = ON;  -- Log full table scans

-- View slow query log
SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10;

-- Analyze query with EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 5 AND order_date > '2024-01-01'\G

-- Output analysis
-- type: ALL (full table scan, worst), range/ref (good), const/eq_ref (best)
-- rows: Estimated rows scanned
-- Extra: "Using where", "Using index", "Using filesort"

-- Explain with format=JSON
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 5\G

-- Add missing index
CREATE INDEX idx_customer_date ON orders(customer_id, order_date);

-- Query rewrite for optimization
-- Slow: SELECT * FROM orders WHERE YEAR(order_date) = 2024
-- Better: SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- Use pt-query-digest to analyze log
pt-query-digest /var/log/mysql/slow-query.log

-- Simple slow query aggregation
SELECT SUM(query_time) AS total_time, COUNT(*) AS count, sql_text
FROM mysql.slow_log 
GROUP BY sql_text 
ORDER BY total_time DESC LIMIT 10;

Why it matters: Find and fix performance problems efficiently.

Real applications: Production troubleshooting, capacity planning.

Common mistakes: Adding indexes blindly, not considering query rewrite.

Index optimization: Use selective columns, cover queries, avoid full scans, monitor unused indexes, balance read/write performance.

-- Find queries doing full table scans (missing-index candidates)
SELECT query, exec_count, no_index_used_count 
FROM sys.statements_with_full_table_scans 
ORDER BY no_index_used_count DESC LIMIT 10;

-- Check for unused indexes (never read since server start)
SELECT object_schema, object_name, index_name 
FROM sys.schema_unused_indexes;

-- I/O hotspots by file
SELECT file, count_read, count_write 
FROM sys.io_global_by_file_by_bytes LIMIT 10;

-- Analyze index statistics
ANALYZE TABLE orders;  -- Update statistics
SELECT * FROM INFORMATION_SCHEMA.STATISTICS 
WHERE table_name = 'orders' AND seq_in_index = 1;

-- Covering index (SELECT can be fully satisfied)
SELECT id, customer_id, amount FROM orders WHERE customer_id = 5 AND order_date > '2024-01-01';
-- Index: (customer_id, order_date, id, amount) - covers all columns

-- Index for ORDER BY
SELECT * FROM orders WHERE customer_id = 5 ORDER BY order_date;
-- Index: (customer_id, order_date) - efficient sort

-- Composite index order matters
-- Query: WHERE a = 1 AND b = 2 ORDER BY c
-- Index: (a, b, c) - optimal
-- Index: (b, a, c) - still works but less efficient
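The leftmost-prefix rule can be simulated outside MySQL: a composite index behaves like a list kept sorted by (a, b, c), so a binary search works only when the query constrains a leading prefix of the key. A minimal Python sketch (illustrative only, not InnoDB internals):

```python
from bisect import bisect_left, bisect_right

# Rows kept sorted by the composite key (a, b, c), like an index on (a, b, c)
index = sorted((a, b, c) for a in range(3) for b in range(3) for c in range(3))

def prefix_lookup(idx, prefix):
    """Binary-search a leading key prefix -- cheap, like a ref/range access."""
    lo = bisect_left(idx, prefix)
    hi = bisect_right(idx, prefix + (float("inf"),) * (3 - len(prefix)))
    return idx[lo:hi]

# WHERE a = 1 AND b = 2: leading prefix -> binary search finds the run
print(len(prefix_lookup(index, (1, 2))))      # 3 matching rows

# WHERE b = 2 only: not a leading prefix -> every entry must be scanned
print(len([r for r in index if r[1] == 2]))   # 9 rows, full scan
```

An index on (b, a, c) would make the second query a cheap prefix lookup instead, which is why column order should follow the most common filters.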

-- Drop unused indexes
DROP INDEX idx_unused ON table_name;

-- Check for redundant indexes
SELECT * FROM sys.schema_redundant_indexes;

-- Disable index maintenance during bulk loads
-- (MyISAM only; InnoDB ignores DISABLE KEYS)
ALTER TABLE orders DISABLE KEYS;
-- [Bulk operations]
ALTER TABLE orders ENABLE KEYS;

Why it matters: Maximum query performance with minimal overhead.

Real applications: Query optimization, cleanup of unused indexes.

Common mistakes: Too many indexes, not monitoring usage.

Buffer pool caches data and indexes. Should be 50-80% of server RAM. Larger pool reduces disk I/O. Monitor hit ratio and LRU list.

-- Configure buffer pool size
SET GLOBAL innodb_buffer_pool_size = 8589934592;  -- 8GB
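The 50-80% guideline can be turned into a quick sizing check. A hedged Python sketch (the round-down step reflects that innodb_buffer_pool_size must be a multiple of innodb_buffer_pool_chunk_size, 128MB by default, times the instance count):

```python
CHUNK = 128 * 1024**2  # default innodb_buffer_pool_chunk_size (128 MB)

def buffer_pool_range(ram_bytes, low=0.5, high=0.8, instances=1):
    """Suggest a buffer pool size range, rounded down to valid chunk multiples."""
    unit = CHUNK * instances  # size must be a multiple of chunk_size * instances
    lo = int(ram_bytes * low) // unit * unit
    hi = int(ram_bytes * high) // unit * unit
    return lo, hi

lo, hi = buffer_pool_range(16 * 1024**3)  # a 16 GB server
print(lo / 1024**3, hi / 1024**3)         # 8.0 to 12.75 GB
```

Leave headroom for per-connection buffers and the OS; on a box that also runs an application, lean toward the low end.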

-- Check buffer pool metrics
SHOW STATUS WHERE variable_name LIKE 'Innodb_buffer_pool%';
-- Innodb_buffer_pool_pages_total: Total pages
-- Innodb_buffer_pool_pages_data: Data pages currently cached
-- Innodb_buffer_pool_pages_free: Free pages

-- Calculate hit ratio from the read counters
-- Hit ratio = 1 - Innodb_buffer_pool_reads (pages read from disk)
--                 / Innodb_buffer_pool_read_requests (logical reads)

-- Query for hit ratio (MySQL 5.7+)
SELECT 
  (1 - (SELECT variable_value FROM performance_schema.global_status 
        WHERE variable_name = 'Innodb_buffer_pool_reads') / 
       (SELECT variable_value FROM performance_schema.global_status 
        WHERE variable_name = 'Innodb_buffer_pool_read_requests')) * 100 
  AS buffer_pool_hit_ratio_percent;
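The hit ratio can also be computed client-side from sampled counters: misses are Innodb_buffer_pool_reads (pages fetched from disk) against Innodb_buffer_pool_read_requests (logical reads). A sketch with made-up sample values:

```python
def hit_ratio(read_requests, disk_reads):
    """Buffer pool hit ratio: fraction of logical reads served from memory."""
    if read_requests == 0:
        return 1.0  # no traffic yet; treat as fully cached
    return 1.0 - disk_reads / read_requests

# Sample counters (made up): 1,000,000 logical reads, 5,000 hit disk
print(round(hit_ratio(1_000_000, 5_000) * 100, 2))  # 99.5
```

In practice sample the counters twice and compute the ratio over the delta, so old history doesn't mask a recent regression.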

-- Monitor LRU list activity (new vs old sublist)
SHOW STATUS LIKE 'Innodb_buffer_pool_pages_made%';
-- pages_made_young: Pages promoted to the new (young) sublist
-- pages_made_not_young: Pages that stayed in the old sublist

-- Avoid buffer pool thrashing from large scans
-- InnoDB inserts scanned pages into the old LRU sublist; raising
-- innodb_old_blocks_time keeps one-off scans from evicting hot pages
SET GLOBAL innodb_old_blocks_time = 1000;  -- ms before promotion to young

-- Warm up buffer pool across restarts (MySQL 5.6+)
SET GLOBAL innodb_buffer_pool_dump_at_shutdown = ON;
-- In my.cnf (read-only at runtime): innodb_buffer_pool_load_at_startup = ON
SET GLOBAL innodb_buffer_pool_load_now = ON;  -- Load a saved dump immediately

-- Which tables occupy the buffer pool
SELECT object_schema, object_name, allocated, pages
FROM sys.innodb_buffer_stats_by_table 
ORDER BY allocated DESC LIMIT 10;

Why it matters: Most important MySQL tuning parameter.

Real applications: Production optimization, new server setup.

Common mistakes: Too small buffer pool, not monitoring hit ratio.

Lock contention causes slowdowns. Reduce with shorter transactions, row-level locking, partitioning, and avoiding hot tables.

-- Check lock waits
SHOW ENGINE INNODB STATUS;  -- Look for WAITING FOR lock

-- Query lock information (MySQL 5.7)
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS;
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCK_WAITS;
-- MySQL 8.0: these tables were removed, use performance_schema instead
SELECT * FROM performance_schema.data_locks;
SELECT * FROM performance_schema.data_lock_waits;

-- Visualize locks with sys schema
SELECT * FROM sys.innodb_lock_waits;

-- Reduce lock time: Keep transactions short
-- BAD: Long transaction locks for long time
START TRANSACTION;
UPDATE orders SET status = 'processing' WHERE id = 1;
-- [Long-running operation outside DB]
COMMIT;

-- GOOD: Minimize DB lock time
UPDATE orders SET status = 'processing' WHERE id = 1;
-- [Long-running operation]

-- Use appropriate isolation levels
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- Lower lock overhead
-- vs
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;  -- More locks

-- Partition hot tables to spread I/O and lock contention
-- (note: partitioning does not create per-partition AUTO_INCREMENT sequences)
ALTER TABLE events PARTITION BY RANGE (id) (
    PARTITION p1 VALUES LESS THAN (10000),
    PARTITION p2 VALUES LESS THAN (20000)
);

-- Use gap locks carefully
-- Under REPEATABLE READ, a range predicate takes next-key (gap) locks
SELECT * FROM orders WHERE id > 100 FOR UPDATE;

-- Instead: Lock specific rows
SELECT * FROM orders WHERE id = 100 FOR UPDATE;

-- Check lock statistics
SELECT * FROM performance_schema.table_io_waits_summary_by_table 
ORDER BY sum_timer_wait DESC;

Why it matters: Concurrent query performance.

Real applications: High-concurrency systems, batch operations.

Common mistakes: Long transactions, unnecessary FOR UPDATE.

Table optimization: Choose appropriate data types, normalize schema, partition large tables, archive old data, use views to hide complexity.

-- Use efficient data types
-- BAD: VARCHAR(255) for phone number, email
-- GOOD: VARCHAR(20) phone, VARCHAR(100) email

-- Fixed vs variable length
-- CHAR(10): Always 10 characters of storage
-- VARCHAR(10): Only what's used, plus 1-2 length bytes
-- Use VARCHAR for variable-length content

-- Numeric efficiency
-- INT (4 bytes) vs BIGINT (8 bytes) - use smallest needed
-- DECIMAL(10,2) for money, not FLOAT
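The DECIMAL-for-money advice has a quick demonstration: binary floats cannot represent most decimal fractions exactly, so FLOAT sums drift. Python's decimal module mirrors the fixed-point behavior of DECIMAL(10,2):

```python
from decimal import Decimal

# Binary float: 0.10 + 0.20 is not exactly 0.30
print(0.1 + 0.2 == 0.3)                # False

# Summing a 0.10 price 100 times drifts with floats...
print(sum([0.10] * 100) == 10.0)       # False

# ...but stays exact with fixed-point decimals, as DECIMAL(10,2) would
print(sum([Decimal("0.10")] * 100))    # 10.00
```

The same reasoning applies to any value compared for equality or aggregated at scale: use DECIMAL for money, FLOAT/DOUBLE only for measurements where rounding error is acceptable.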

-- Archive old data
CREATE TABLE orders_archive LIKE orders;
INSERT INTO orders_archive 
SELECT * FROM orders WHERE created_at < '2023-01-01';
DELETE FROM orders WHERE created_at < '2023-01-01';

-- Partition table for range queries
ALTER TABLE events PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION y2022 VALUES LESS THAN (2023),
    PARTITION y2023 VALUES LESS THAN (2024),
    PARTITION y2024 VALUES LESS THAN (2025),
    PARTITION y_future VALUES LESS THAN MAXVALUE
);

-- Check table sizes
SELECT table_name, 
       ROUND(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB'
FROM INFORMATION_SCHEMA.TABLES 
WHERE table_schema = 'mydb' 
ORDER BY data_length + index_length DESC;

-- Optimize/rebuild table
OPTIMIZE TABLE orders;  -- Defragment, update statistics
ANALYZE TABLE orders;  -- Update statistics only

Why it matters: Storage efficiency, query performance.

Real applications: Large datasets, long-term data retention.

Common mistakes: Over-sized data types, no archiving.

JOIN optimization: Index join columns, join in optimal order, avoid Cartesian products, use EXPLAIN to verify plans.

-- Create indexes on join columns
CREATE INDEX idx_order_customer ON orders(customer_id);
CREATE INDEX idx_customer_id ON customers(id);

-- JOIN optimization with EXPLAIN
EXPLAIN SELECT o.*, c.name FROM orders o 
JOIN customers c ON o.customer_id = c.id 
WHERE o.order_date > '2024-01-01'\G

-- Optimal join order: Filter first
-- Join smaller result set (customers in USA)
SELECT COUNT(*) FROM customers WHERE country = 'USA';  -- 100K
SELECT COUNT(*) FROM orders;  -- 10M rows

-- Optimal query
SELECT o.*, c.name FROM orders o 
JOIN customers c ON o.customer_id = c.id 
WHERE c.country = 'USA' AND o.order_date > '2024-01-01';
-- Filters customers first, then joins

-- Avoid Cartesian product
-- BAD: JOIN without ON clause
SELECT * FROM orders, customers;  -- 10M * 100K rows!

-- Use STRAIGHT_JOIN to force join order (use sparingly; trust the optimizer first)
SELECT STRAIGHT_JOIN o.*, c.name FROM orders o 
JOIN customers c ON o.customer_id = c.id;

-- Multiple joins optimization
SELECT o.*, c.name, p.category 
FROM orders o 
JOIN customers c ON o.customer_id = c.id 
JOIN products p ON o.product_id = p.id 
WHERE o.order_date > '2024-01-01';

-- Use EXPLAIN to verify all joins are indexed
EXPLAIN FORMAT=JSON SELECT * FROM ...\G

Why it matters: Multi-table query performance.

Real applications: Complex reports, aggregations.

Common mistakes: Missing indexes, inefficient join order.

Best practices: Monitor continuously, measure before optimizing, use EXPLAIN, avoid premature optimization, test changes, maintain statistics.

-- ✅ Good: Use EXPLAIN first
EXPLAIN SELECT * FROM orders WHERE customer_id = 5;
-- Check type (ref/const good, ALL bad), rows, Extra (Using index, Using filesort)

-- ✅ Good: Add indexes on WHERE/JOIN columns
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_order_date ON orders(order_date);

-- ✅ Good: Monitor metrics regularly
SELECT COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE time > 300;  -- Long queries
SELECT sql_text, SUM(query_time) FROM mysql.slow_log GROUP BY sql_text;  -- Find slowest

-- ✅ Good: Keep statistics updated
ANALYZE TABLE orders;
ANALYZE TABLE customers;

-- ❌ Bad: Premature optimization without measurement
CREATE INDEX idx_price ON products(price);  -- If nobody filters by price

-- ❌ Bad: Ignore query plans
SELECT * FROM large_table WHERE column LIKE '%text%';  -- Leading wildcard blocks index use: full scan

-- ✅ Good: Test on staging first
-- Run on staging to verify the improvement
EXPLAIN SELECT * FROM orders WHERE customer_id = 5;  -- Before
CREATE INDEX idx_customer_id ON orders(customer_id);
EXPLAIN SELECT * FROM orders WHERE customer_id = 5;  -- After: type changes to ref

-- ✅ Good: Document decisions
-- Add comments explaining why indexes exist
-- Track performance metrics before/after
-- Document configuration rationale

-- Configuration optimization (my.cnf; comments there use #)
[mysqld]
innodb_buffer_pool_size = 8G  # 50-80% of server RAM
max_connections = 500  # Sized for the workload
thread_cache_size = 100  # Reduce thread-creation overhead
# query_cache_size: query cache removed in MySQL 8.0 (keep 0 on 5.7)
slow_query_log = 1
long_query_time = 1

Why it matters: Sustainable high performance systems.

Real applications: Production optimization, continuous improvement.

Common mistakes: Optimizing wrong bottleneck, no metrics, not testing.