Indexes are data structures that allow the database engine to quickly find rows in a table without scanning every row. They work similarly to a book's index, enabling efficient searching and retrieval of data. Proper indexing is crucial for performance as it can dramatically reduce query execution time.
-- Create a simple index
CREATE INDEX idx_email ON users(email);
-- Query using index is much faster
SELECT * FROM users WHERE email = 'john@example.com';
Why it matters: Indexes are fundamental to database performance. Without proper indexes, queries must perform full table scans, which become increasingly slow as data grows.
Real applications: E-commerce sites use indexes on product IDs and user emails for instant search results. Banking systems index account numbers for rapid lookups during transactions.
Common mistakes: Creating too many indexes slows down INSERT, UPDATE, and DELETE operations, because every index must be maintained on each write. Beginners often index every column without analyzing query patterns.
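To make the cost difference concrete, here is a minimal Python sketch (not MySQL code) that models a table as a list of rows and an index as a sorted list of (email, row position) pairs. The table contents and the helper names full_scan and index_lookup are made up for illustration, and a real engine stores B-tree pages rather than one flat sorted list.

```python
import bisect

# Hypothetical table: 100,000 rows of (id, email).
rows = [(i, f"user{i}@example.com") for i in range(100_000)]

def full_scan(rows, email):
    """Check every row in order; returns (row, rows_examined)."""
    examined = 0
    for row in rows:
        examined += 1
        if row[1] == email:
            return row, examined
    return None, examined

# Model of an index on email: sorted (email, position) pairs.
email_index = sorted((row[1], pos) for pos, row in enumerate(rows))

def index_lookup(rows, index, email):
    """Binary-search the sorted index, then fetch the row by position."""
    i = bisect.bisect_left(index, (email,))
    if i < len(index) and index[i][0] == email:
        return rows[index[i][1]]
    return None

target = "user99999@example.com"
row, examined = full_scan(rows, target)
print(examined)  # number of rows the scan had to touch
print(index_lookup(rows, email_index, target))
```

The scan touches every row before it reaches the last one, while the binary search needs only about 17 comparisons for 100,000 entries.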
MySQL supports several index types: ordinary (non-unique) INDEX, PRIMARY KEY (unique, non-NULL row identifier), UNIQUE INDEX (enforces uniqueness), FULLTEXT INDEX (for text searching), and SPATIAL INDEX (for geographic data). Each type serves different query patterns and has its own performance characteristics.
-- Different index types
CREATE TABLE products (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE,
description TEXT,
FULLTEXT INDEX idx_description (description)
);
-- SPATIAL index for geographic queries (the indexed column must be a NOT NULL geometry type)
ALTER TABLE locations ADD SPATIAL INDEX idx_geometry (coordinates);
Why it matters: Choosing the right index type ensures optimal query performance for specific access patterns and data types.
Real applications: E-commerce sites use FULLTEXT indexes for product searches, mapping applications use SPATIAL indexes for location queries.
Common mistakes: Using FULLTEXT for simple string matching leads to poor performance. Not understanding when each index type is appropriate.
A PRIMARY KEY uniquely identifies each row and cannot contain NULL values, and a table can have only one. A UNIQUE INDEX also enforces uniqueness but allows NULLs, and several rows may hold NULL in the same unique column because NULL is never considered equal to another NULL. PRIMARY KEY is typically used as the main identifier, whereas UNIQUE INDEX is used for other fields that must be unique.
CREATE TABLE users (
user_id INT PRIMARY KEY, -- No nulls allowed
email VARCHAR(255) UNIQUE, -- Can have NULL values
phone VARCHAR(20) UNIQUE -- Multiple NULLs allowed
);
-- This is valid - multiple NULLs in unique column
INSERT INTO users VALUES (1, NULL, NULL);
INSERT INTO users VALUES (2, NULL, '123-456');
Why it matters: Understanding this distinction helps design proper database constraints and query logic.
Real applications: User IDs use PRIMARY KEY, email uses UNIQUE INDEX (optional registration field), social media handles use UNIQUE INDEX.
Common mistakes: Trying to define more than one PRIMARY KEY on a table (use one composite primary key or additional UNIQUE indexes instead), not understanding NULL handling in UNIQUE indexes.
CREATE INDEX creates an index on one or more columns while DROP INDEX removes an existing index. The syntax varies slightly for PRIMARY KEY and UNIQUE constraints which are typically managed through ALTER TABLE statements.
-- Create single-column index
CREATE INDEX idx_lastname ON employees(last_name);
-- Create multi-column index
CREATE INDEX idx_name ON employees(last_name, first_name);
-- Create unique index
CREATE UNIQUE INDEX idx_ssn ON employees(ssn);
-- Drop index
DROP INDEX idx_lastname ON employees;
-- Alter table to drop primary key
ALTER TABLE employees DROP PRIMARY KEY;
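-- MySQL has no CREATE INDEX IF NOT EXISTS (MariaDB does), so check the catalog first
SELECT COUNT(*) FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'employees' AND INDEX_NAME = 'idx_dept';
CREATE INDEX idx_dept ON employees(department_id); -- Run only if the count above is 0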
Why it matters: Proper index management is essential for database maintenance and optimization as requirements change.
Real applications: DBA tasks include adding indexes after production issues and removing unused indexes to free up space.
Common mistakes: Not checking if index exists before dropping, creating duplicate indexes, not considering lock time on large tables.
Composite indexes (also called multi-column indexes) index multiple columns together, optimizing queries that filter on several columns. Column order matters because of the leftmost prefix principle: MySQL can use the index for lookups when the query filters on the leftmost column, or on an unbroken run of columns starting from the left. A query that skips the leading column generally cannot seek into the index.
-- Composite index on last_name, first_name
CREATE INDEX idx_name ON employees(last_name, first_name);
-- This query uses the index efficiently
SELECT * FROM employees WHERE last_name='Smith' AND first_name='John';
-- This also uses the index (first column match)
SELECT * FROM employees WHERE last_name='Smith';
-- This query cannot use the index effectively
SELECT * FROM employees WHERE first_name='John';
Why it matters: Composite indexes reduce the number of indexes needed and improve queries with multiple WHERE conditions.
Real applications: Address lookups (country, state, city), user searches (department, role, status).
Common mistakes: Creating composite indexes in wrong column order, not understanding leftmost prefix principle, creating redundant single-column indexes when composite exists.
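The leftmost prefix principle follows directly from sort order. The Python sketch below is an illustrative model, not MySQL internals: it keeps a composite index as a sorted list of tuples and shows that a search on the leading column can binary-search, while a search on first_name alone has to inspect every entry. The sample rows and the helper names seek_prefix and scan_first_name are hypothetical.

```python
import bisect

# Hypothetical rows: (last_name, first_name, employee_id).
# A composite index on (last_name, first_name) keeps entries in this sort order.
index = sorted([
    ("Jones", "John", 4),
    ("Smith", "Alice", 1),
    ("Smith", "John", 2),
    ("Taylor", "John", 3),
])

def seek_prefix(index, prefix):
    """Return entries whose leading columns equal `prefix`, via binary search."""
    lo = bisect.bisect_left(index, prefix)
    matches = []
    for entry in index[lo:]:
        if entry[:len(prefix)] != prefix:
            break  # sorted order guarantees no later match
        matches.append(entry)
    return matches

def scan_first_name(index, first_name):
    """No leading column given: every entry must be inspected."""
    return [e for e in index if e[1] == first_name]

print(seek_prefix(index, ("Smith",)))         # leftmost column only
print(seek_prefix(index, ("Smith", "John")))  # both columns
print(scan_first_name(index, "John"))         # cannot seek, full pass
```

Entries for first_name = 'John' are scattered across the sorted list, which is why the index gives no shortcut for that predicate on its own.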
EXPLAIN (DESCRIBE is a synonym) shows how MySQL plans to execute a query, including which indexes it considers and chooses, estimated row counts, and join order. It is essential for performance optimization and for understanding query execution plans. From MySQL 8.0.18, EXPLAIN ANALYZE also runs the query and reports actual timings. The type column indicates the join (access) type, with ALL (full table scan) being the worst.
-- Analyze query performance
EXPLAIN SELECT * FROM employees WHERE last_name = 'Smith';
-- Output columns: id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
-- Type values from best to worst:
-- system -> const -> eq_ref -> ref -> fulltext -> ref_or_null -> index_merge -> unique_subquery -> index_subquery -> range -> index -> ALL
-- Check if index is used
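EXPLAIN SELECT * FROM employees WHERE employee_id = 123; -- type: const when employee_id is the primary key (best for a single-row lookup)
EXPLAIN SELECT * FROM employees WHERE last_name = 'Smith'; -- type: ALL if last_name has no index (worst); ref once it is indexed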
Why it matters: EXPLAIN reveals whether indexes are being used and identifies slow queries before production issues occur.
Real applications: Performance troubleshooting, query optimization during development, capacity planning.
Common mistakes: Ignoring rows column which indicates estimated rows examined, not checking type column for full scans.
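Checking the type and rows columns can be automated once EXPLAIN output has been fetched into application code. The sketch below assumes each plan row has already been converted to a Python dict; the sample rows are invented for illustration, and flag_full_scans and its row_threshold parameter are hypothetical names, not part of any MySQL client library.

```python
# Hypothetical EXPLAIN output, one dict per plan row (values invented for illustration).
plan_rows = [
    {"table": "employees", "type": "ALL", "key": None, "rows": 48000},
    {"table": "departments", "type": "eq_ref", "key": "PRIMARY", "rows": 1},
]

def flag_full_scans(plan_rows, row_threshold=1000):
    """Return tables read with a full scan (type ALL) over at least row_threshold estimated rows."""
    return [
        row["table"]
        for row in plan_rows
        if row["type"] == "ALL" and row["rows"] >= row_threshold
    ]

print(flag_full_scans(plan_rows))
```

The threshold exists because a full scan of a tiny lookup table is usually harmless; only scans over many estimated rows are worth flagging.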
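Columns used to join tables should be indexed. When performing JOINs, MySQL needs to quickly find matching rows in the related table; without an index on the join column, it may scan the whole table for each joined row, which significantly hurts performance. InnoDB creates an index on the referencing column automatically when you declare a FOREIGN KEY constraint and no usable index exists, but a column that merely stores another table's ID without a declared constraint gets no index unless you add one.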
-- Without index on foreign key - slow joins
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT, -- Used as a foreign key, but no constraint and no index
order_date DATE
);
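-- With index on foreign key - fast joins (alternative definition of the same table)
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
INDEX idx_customer (customer_id) -- Explicit and named; InnoDB would otherwise create one for the constraint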
);
-- JOIN benefits from customer_id index
SELECT o.*, c.name FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.country = 'USA';
Why it matters: Foreign key indexes dramatically improve JOIN performance, which is critical in relational databases.
Real applications: Online stores (orders to customers), banking (transactions to accounts), social media (posts to users).
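Common mistakes: Assuming a column that references another table is indexed even though no FOREIGN KEY constraint was declared (InnoDB only adds the index when the constraint exists), not considering JOIN patterns when designing schema.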
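The effect of an index on the join column can be sketched by counting work in a toy nested-loop join. This is a simplified Python model under stated assumptions: the customers and orders data are hypothetical, and a dict stands in for the index on orders.customer_id (MySQL would use a B-tree, but the idea of probing instead of scanning is the same).

```python
from collections import defaultdict

# Hypothetical data: customers as (customer_id, name), orders as (order_id, customer_id).
customers = [(1, "Ana"), (2, "Ben")]
orders = [(10, 1), (11, 2), (12, 1), (13, 2), (14, 1)]

def join_without_index(customers, orders):
    """For each customer, scan every order row."""
    comparisons, result = 0, []
    for cust_id, name in customers:
        for order_id, order_cust in orders:
            comparisons += 1
            if order_cust == cust_id:
                result.append((order_id, name))
    return result, comparisons

def join_with_index(customers, orders):
    """Build a customer_id -> order_ids lookup once, then probe it per customer."""
    by_customer = defaultdict(list)
    for order_id, order_cust in orders:
        by_customer[order_cust].append(order_id)
    result, probes = [], 0
    for cust_id, name in customers:
        probes += 1
        result.extend((order_id, name) for order_id in by_customer.get(cust_id, []))
    return result, probes

print(join_without_index(customers, orders)[1])  # comparisons grow with customers x orders
print(join_with_index(customers, orders)[1])     # one probe per customer
```

Both functions return the same joined rows; only the amount of work differs, and the gap widens as either table grows.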
Covering indexes include all columns needed for a query result, allowing MySQL to satisfy the entire query from the index without accessing the main table. This eliminates the lookup step and significantly improves performance, especially for large result sets or frequently run queries.
-- Covering index example
CREATE INDEX idx_covering ON employees(last_name, first_name, salary);
-- Query that uses covering index - no table lookup needed
SELECT last_name, first_name, salary FROM employees
WHERE last_name = 'Smith' AND first_name = 'John';
-- Query that is NOT covered: the index finds the rows, but SELECT * still has to read the table
SELECT * FROM employees WHERE last_name = 'Smith';
-- EXPLAIN shows 'Using index' for covering index
EXPLAIN SELECT last_name, first_name, salary FROM employees
WHERE last_name = 'Smith'; -- Extra: Using index
Why it matters: Covering indexes can reduce I/O operations significantly, improving query speed especially for frequently executed queries.
Real applications: Leaderboard queries (user_id, score, rank), product listings (product_id, price, rating).
Common mistakes: Creating overly large covering indexes that slow down writes, not considering maintenance overhead.
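Whether an index covers a query comes down to a set comparison: every column the query reads must be stored in the index. The following Python sketch expresses that rule. The function name is_covering is hypothetical, and treating employee_id as the primary key is an assumption; the sketch also relies on the fact that InnoDB secondary indexes carry the primary key columns.

```python
def is_covering(index_columns, primary_key_columns, query_columns):
    """True when every column the query touches is stored in the index.

    InnoDB secondary indexes also carry the primary key columns, so those
    count as available too.
    """
    available = set(index_columns) | set(primary_key_columns)
    return set(query_columns) <= available

idx_covering = ["last_name", "first_name", "salary"]
pk = ["employee_id"]  # assumed primary key of employees

print(is_covering(idx_covering, pk, ["last_name", "first_name", "salary"]))
print(is_covering(idx_covering, pk, ["last_name", "hire_date"]))
```

Remember to pass every column the query mentions, including those in WHERE, ORDER BY, and GROUP BY, not just the SELECT list.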
MySQL's default index uses a B-tree structure, a self-balancing tree where data is sorted and organized hierarchically. B-trees ensure logarithmic time complexity for searches, inserts, and deletes. The structure automatically rebalances to maintain optimal height, ensuring consistent performance regardless of data distribution.
-- B-tree structure visualization:
--              [50]
--            /      \
--        [20]        [80]
--       /    \      /    \
--    [10]   [35] [65]   [90]
-- All operations have O(log n) complexity
-- Find: navigate from root to leaf - logarithmic
-- Insert: find position and rebalance - logarithmic
-- Delete: remove and rebalance - logarithmic
-- B-tree properties enable efficient range queries
SELECT * FROM employees
WHERE salary BETWEEN 50000 AND 100000; -- Uses B-tree range scan
Why it matters: Understanding B-tree structure explains why indexes are so effective and ensures proper index design.
Real applications: Any database with indexed data uses B-tree or similar structures for efficiency.
Common mistakes: Not understanding why range queries still benefit from indexes, assuming linear scan pattern.
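Logarithmic complexity is easier to appreciate with numbers. The Python sketch below estimates how many levels a tree needs to address a given number of rows. The fanout of 100 keys per node is an arbitrary illustrative figure, since the real number of keys per InnoDB page depends on key size, and btree_levels is a hypothetical helper.

```python
def btree_levels(row_count, fanout):
    """Smallest number of levels such that fanout ** levels >= row_count."""
    levels, capacity = 1, fanout
    while capacity < row_count:
        capacity *= fanout
        levels += 1
    return levels

for n in (100, 1_000_000, 1_000_000_000):
    print(n, btree_levels(n, fanout=100))
```

Under this assumption, going from one million to one billion rows adds only two levels, which is why lookups stay fast as tables grow.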
Hash indexes use hash functions to locate data, optimizing exact-value lookups but not range queries. Unlike B-tree indexes which maintain sorted order, hash indexes scatter data based on hash values. Memory storage engine supports hash indexes; InnoDB primarily uses B-tree for reliability and range query support.
-- Hash index characteristics
-- Pros: O(1) average lookup for exact matches
-- Cons: Cannot do range queries, no ordering, no prefix matching
-- Explicit hash indexes with USING HASH (MEMORY engine shown here)
CREATE TABLE hash_example (
id INT PRIMARY KEY USING HASH,
email VARCHAR(255) UNIQUE USING HASH
) ENGINE=MEMORY;
-- Hash index - excellent for
SELECT * FROM users WHERE user_id = 12345;
-- Hash index - poor for
SELECT * FROM users WHERE salary > 50000; -- Must scan all entries
SELECT * FROM users WHERE email LIKE 'john%'; -- Cannot use hash index
-- InnoDB (default) uses B-tree even for PRIMARY KEY
CREATE TABLE innodb_example (
id INT PRIMARY KEY -- B-tree, not hash
);
Why it matters: Choosing between hash and B-tree indexes depends on query patterns and storage engine capabilities.
Real applications: In-memory session stores, cache lookups in memory storage engine.
Common mistakes: Trying to use hash indexes for range queries, not understanding storage engine limitations.
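The trade-off between the two structures can be modelled with a Python dict (hash-style) and a sorted list (B-tree-style). This is a conceptual sketch only: the salary data and function names are hypothetical, and the MEMORY engine's actual hash implementation is not shown.

```python
import bisect

# Hypothetical data: user_id -> salary.
salaries = {101: 42000, 102: 58000, 103: 75000, 104: 51000}

# Hash-style index: salary -> user_ids; no ordering between keys.
hash_index = {}
for user_id, salary in salaries.items():
    hash_index.setdefault(salary, []).append(user_id)

# B-tree-style index: (salary, user_id) pairs kept sorted.
sorted_index = sorted((salary, user_id) for user_id, salary in salaries.items())

def hash_exact(salary):
    """One hash probe answers an equality lookup."""
    return hash_index.get(salary, [])

def hash_range(min_salary):
    """A range predicate has to test every key in the hash index."""
    return sorted(uid for sal, uids in hash_index.items() if sal > min_salary for uid in uids)

def sorted_range(min_salary):
    """Binary-search to the first qualifying entry, then read forward."""
    start = bisect.bisect_right(sorted_index, (min_salary, float("inf")))
    return sorted(uid for _, uid in sorted_index[start:])

print(hash_exact(58000))
print(hash_range(50000))
print(sorted_range(50000))
```

Both range functions return the same users, but hash_range visits every key while sorted_range jumps straight to the first salary above the threshold.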
Partial indexes index only a subset of table rows based on a WHERE condition, reducing index size and improving query performance. MySQL doesn't support partial indexes natively like PostgreSQL, but you can achieve similar benefits through composite indexes with conditional queries or by partitioning data strategically.
-- Partial index as PostgreSQL would write it (NOT valid MySQL, so it is commented out)
-- Index only active employees, excluding inactive
-- CREATE INDEX idx_active_salary ON employees(salary)
-- WHERE active = 1;
-- Optimal approach in MySQL: include filtering columns
CREATE INDEX idx_emp_status ON employees(active, salary);
-- Query benefits from limited index scope
SELECT * FROM employees WHERE active = 1 AND salary > 50000;
-- Different approach: separate table for active employees (a one-time copy that must be kept in sync with employees)
CREATE TABLE active_employees AS
SELECT * FROM employees WHERE active = 1;
CREATE INDEX idx_salary ON active_employees(salary);
-- Queries on active employees hit smaller, faster index
Why it matters: Partial indexes reduce storage requirements and improve cache efficiency for frequently filtered queries.
Real applications: Archived data (index only non-archived), soft deletes (index only non-deleted).
Common mistakes: Forgetting to include filter columns in index, creating redundant full-table indexes.
FULLTEXT indexes are specialized for text searching, enabling natural language searches beyond simple LIKE patterns. They support boolean search operators, relevance ranking, and are much faster than LIKE for large text documents. FULLTEXT is ideal for article searches, product descriptions, and content management systems.
-- Create FULLTEXT index
CREATE TABLE articles (
id INT PRIMARY KEY,
title VARCHAR(255),
content TEXT,
FULLTEXT INDEX idx_content (title, content)
);
-- FULLTEXT search queries (the MATCH() column list must match the index definition: title, content)
SELECT * FROM articles WHERE MATCH(title, content) AGAINST('database' IN BOOLEAN MODE);
-- Boolean search operators
SELECT * FROM articles WHERE MATCH(title, content)
AGAINST('+MySQL -PostgreSQL' IN BOOLEAN MODE); -- Must have MySQL, exclude PostgreSQL
-- Natural language search (default)
SELECT * FROM articles WHERE MATCH(title, content)
AGAINST('database optimization techniques');
-- Relevance ranking
SELECT *, MATCH(title, content) AGAINST('database' IN NATURAL LANGUAGE MODE) AS relevance
FROM articles ORDER BY relevance DESC;
Why it matters: FULLTEXT indexes enable efficient text search functionality critical for content-heavy applications.
Real applications: Article databases, product searches, job postings, documentation systems.
Common mistakes: Using LIKE for large text searches, not understanding boolean mode, ignoring stopwords.
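Full-text indexes are built on an inverted index, which maps each word to the documents that contain it. The Python sketch below models only that core idea together with the + (required) and - (excluded) boolean operators. The article texts and the boolean_search helper are hypothetical, and stopwords, minimum token length, and relevance scoring are deliberately left out.

```python
from collections import defaultdict

# Hypothetical documents keyed by article id.
articles = {
    1: "MySQL index tuning guide",
    2: "PostgreSQL and MySQL compared",
    3: "PostgreSQL partial indexes",
}

# Inverted index: lowercase word -> set of article ids containing it.
inverted = defaultdict(set)
for article_id, text in articles.items():
    for word in text.lower().split():
        inverted[word].add(article_id)

def boolean_search(required=(), excluded=()):
    """Ids containing every required word and none of the excluded words."""
    result = set(articles)
    for word in required:
        result &= inverted.get(word.lower(), set())
    for word in excluded:
        result -= inverted.get(word.lower(), set())
    return sorted(result)

# Rough equivalent of AGAINST('+MySQL -PostgreSQL' IN BOOLEAN MODE)
print(boolean_search(required=["MySQL"], excluded=["PostgreSQL"]))
```

Because each word lookup is a direct probe into the inverted index, the search never has to read the article text itself, which is what a LIKE '%word%' pattern would force.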
Effective index design involves analyzing query patterns first, indexing columns used in WHERE clauses and JOINs, considering column cardinality (the number of distinct values), ordering composite index columns to match the most frequent queries, and regularly monitoring index usage. Avoid over-indexing, because every additional index adds write overhead.
-- Best practices for index design
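-- 1. Order composite index columns to match your predicates; a selective leading column is a useful rule of thumb
CREATE INDEX idx_gender_name ON users(gender, last_name); -- Weak for lookups by last_name alone: gender has low cardinality
CREATE INDEX idx_name_gender ON users(last_name, gender); -- Better: last_name has high cardinality and can be used on its own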
-- 2. Avoid indexing low-cardinality columns
-- Indexing is_active (only true/false) rarely helps, MySQL may ignore it
-- 3. Index columns in WHERE clause and JOIN conditions
SELECT * FROM employees e JOIN departments d ON e.dept_id = d.dept_id
WHERE e.hire_date > '2020-01-01';
-- Index: employees(hire_date), employees(dept_id), departments(dept_id)
-- 4. Check for unused indexes
-- Monitor with Performance Schema or INFORMATION_SCHEMA
-- 5. Regular maintenance
ANALYZE TABLE employees; -- Update index statistics
OPTIMIZE TABLE employees; -- Reclaim space, rebuild indexes
-- 6. Consider insert/update cost
-- More indexes = slower writes, optimize for read-heavy workloads
Why it matters: Poor index strategy can harm overall database performance, while good design significantly improves it.
Real applications: Production database tuning, schema design for new systems.
Common mistakes: Creating indexes without understanding query patterns, not removing unused indexes, over-indexing.
Index prefixes allow indexing only the first N characters of a column, reducing index size while maintaining usefulness for searches. Key length indicates the space used by index, affecting memory usage and cache efficiency. Shorter keys use less memory and cache, enabling more index data in memory.
-- Index prefix on long string columns
CREATE TABLE products (
id INT PRIMARY KEY,
description VARCHAR(500),
INDEX idx_desc_prefix (description(20)) -- Index only first 20 characters
);
-- Prefix works for exact matches and LIKE starting with value
SELECT * FROM products WHERE description LIKE 'High quality%'; -- Uses prefix index
-- But doesn't work for LIKE in middle
SELECT * FROM products WHERE description LIKE '%durable%'; -- Doesn't use prefix index
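-- Check the prefix length in SHOW INDEX
SHOW INDEX FROM products;
-- Sub_part shows the indexed prefix length in characters (20 here; NULL means the whole column is indexed)
-- EXPLAIN's key_len is reported in bytes, so it depends on the character set (up to 4 bytes per character in utf8mb4)
-- Size impact example
-- Full column index on VARCHAR(255): up to 255 characters per index entry
-- Prefix index on the first 20 characters: up to 20 characters per entry (about 92% smaller at full length)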
Why it matters: Prefix indexes reduce memory usage and improve cache hit rates while maintaining search functionality.
Real applications: Long URL columns, TEXT and BLOB columns (which MySQL can only index with a prefix length), product descriptions.
Common mistakes: Choosing a prefix too short to stay selective, forgetting that a prefix index cannot cover a query or be used for ORDER BY and GROUP BY.
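A common way to choose a prefix length is to measure how many distinct prefixes each length produces relative to the row count; in MySQL this is usually done with COUNT(DISTINCT LEFT(col, n)) / COUNT(*). The Python sketch below applies the same calculation to a hypothetical list of descriptions. The helper names and the 0.9 target ratio are illustrative assumptions, not recommended defaults.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by total values."""
    return len({v[:length] for v in values}) / len(values)

def shortest_useful_prefix(values, target_ratio=0.9):
    """Smallest prefix length whose selectivity reaches target_ratio of the full column's."""
    full = len(set(values)) / len(values)
    longest = max(len(v) for v in values)
    for length in range(1, longest + 1):
        if prefix_selectivity(values, length) >= target_ratio * full:
            return length
    return longest

# Hypothetical product descriptions that share long leading text.
descriptions = [
    "High quality steel bolt",
    "High quality steel nut",
    "High quality brass bolt",
    "Low cost plastic cap",
]

print(prefix_selectivity(descriptions, 1))
print(prefix_selectivity(descriptions, 14))
print(shortest_useful_prefix(descriptions))
```

Values that share a long common beginning need a longer prefix before the index can tell them apart, so the right length depends on the actual data rather than on a fixed number.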
Avoid indexing small tables (full scan is faster), low-cardinality columns (many duplicates make index less effective), frequently updated columns (index maintenance overhead), and rarely accessed columns. Monitor unused indexes and remove them to reduce write overhead and storage requirements.
-- When NOT to use indexes
-- 1. Small tables - full scan is faster
-- Size < 1000 rows: Usually no benefit from indexing
SELECT * FROM lookup_values; -- Better as full scan
-- 2. Low-cardinality columns
CREATE TABLE products (
id INT,
in_stock BOOLEAN -- Only 2 values (true/false)
);
-- Indexing in_stock doesn't help much, MySQL may ignore it
-- 3. Columns always used together with very selective column
-- Bad: Index on category if always queried with specific product_id
CREATE INDEX idx_category ON products(category); -- Redundant if product_id always used first
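-- List existing secondary indexes
SELECT TABLE_NAME, INDEX_NAME, COLUMN_NAME FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'database_name' AND INDEX_NAME != 'PRIMARY';
-- Identify unused indexes: no recorded I/O since server start (Performance Schema must be enabled)
SELECT OBJECT_NAME, INDEX_NAME FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA = 'database_name' AND INDEX_NAME IS NOT NULL AND INDEX_NAME != 'PRIMARY' AND COUNT_STAR = 0;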
-- Remove unused indexes to speed up writes
DROP INDEX idx_rarely_used ON table_name;
Why it matters: Removing unnecessary indexes improves INSERT/UPDATE performance and reduces storage overhead.
Real applications: Database cleanup, performance optimization after query pattern analysis.
Common mistakes: Creating indexes on small tables, not regularly reviewing index usage, keeping indexes from old features.