MySQL

CRUD Operations

15 Questions

CRUD represents the four fundamental database operations: Create (INSERT new data), Read (SELECT to retrieve data), Update (UPDATE to modify existing data), and Delete (DELETE to remove data). Every database application performs these operations repeatedly. Understanding CRUD is essential because most database work revolves around these four operations, and optimizing them improves application performance.
-- CREATE: Insert new data
INSERT INTO employees (name, salary) VALUES ('John', 50000);

-- READ: Retrieve data
SELECT * FROM employees WHERE id = 1;

-- UPDATE: Modify existing data
UPDATE employees SET salary = 55000 WHERE id = 1;

-- DELETE: Remove data
DELETE FROM employees WHERE id = 1;

Why it matters: CRUD is the foundation of database work. Demonstrating solid CRUD knowledge shows proficiency in fundamental database operations and SQL understanding.

Real applications: Every web application performs CRUD operations—user registration (CREATE), viewing profiles (READ), changing settings (UPDATE), and account deletion (DELETE).

Common mistakes: Forgetting WHERE clauses causing UPDATE or DELETE to affect all rows, using SELECT without proper filtering returning unnecessary data, or not understanding transaction context of CRUD operations.

INSERT adds new rows to a table. The basic syntax inserts a single row with specified values, while INSERT...SELECT copies data from another table. INSERT IGNORE downgrades errors such as duplicate-key violations to warnings and skips the offending rows, REPLACE deletes an existing row with the same key before inserting the new one, and INSERT...ON DUPLICATE KEY UPDATE updates the row if the key exists and inserts otherwise. Multiple rows can be inserted in one statement for efficiency.
-- Single row INSERT
INSERT INTO users (name, email) VALUES ('Alice', 'alice@example.com');

-- Multiple rows in one INSERT
INSERT INTO users (name, email) VALUES 
  ('Bob', 'bob@example.com'),
  ('Charlie', 'charlie@example.com');

-- INSERT...SELECT copies from another table
INSERT INTO users_backup SELECT * FROM users WHERE status = 'inactive';

-- INSERT IGNORE: skip on constraint violation
INSERT IGNORE INTO users (id, name) VALUES (1, 'Duplicate');

-- REPLACE: delete & insert if key exists
REPLACE INTO users (id, name) VALUES (1, 'NewName');
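The upsert variant mentioned above, INSERT...ON DUPLICATE KEY UPDATE, can be sketched as follows (against the same hypothetical users table with a unique id):

-- INSERT...ON DUPLICATE KEY UPDATE: update if key exists, otherwise insert
INSERT INTO users (id, name) VALUES (1, 'Alice')
ON DUPLICATE KEY UPDATE name = 'Alice';
-- If id=1 exists, its name becomes 'Alice'; otherwise a new row is inserted

Unlike REPLACE, this never deletes the existing row, so other columns and foreign-key relationships are preserved.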

Why it matters: Mastering INSERT variations allows efficient bulk operations and data migration. Understanding different INSERT modes prevents transaction issues in production.

Real applications: User registration uses single INSERT, data imports use bulk INSERT, archiving uses INSERT...SELECT, and deduplication uses REPLACE.

Common mistakes: Omitting the column list, causing positional mismatches; inserting bulk data row-by-row instead of with multi-row INSERT (much slower); or using REPLACE when UPDATE was intended, causing unnecessary deletions.

INSERT IGNORE skips rows that violate unique-key constraints, downgrading other errors to warnings and continuing silently with the remaining rows. REPLACE deletes the existing row with the conflicting key and inserts the new row, effectively replacing it. INSERT IGNORE preserves existing data; REPLACE overwrites it. Choose INSERT IGNORE to avoid duplicates; choose REPLACE when new data should override old data. Because REPLACE performs a DELETE, it can trigger ON DELETE CASCADE actions on foreign keys and fires DELETE triggers.
-- INSERT IGNORE: Skip row on constraint violation
INSERT IGNORE INTO users (id, name) VALUES (1, 'NewName');
-- If id=1 exists, this row is skipped, existing data untouched

-- REPLACE: Delete old & insert new
REPLACE INTO users (id, name) VALUES (1, 'NewName');
-- If id=1 exists, delete it then insert new row

-- Practical example:
-- INSERT IGNORE for API uploads with potential duplicates
INSERT IGNORE INTO import_logs (id, status) VALUES (100, 'processed');

-- REPLACE for configuration updates where last-write-wins
REPLACE INTO config (setting, value) VALUES ('theme', 'dark');

Why it matters: Choosing the right insertion method prevents data loss or unwanted duplicates. This shows understanding of constraint handling in production scenarios.

Real applications: User imports use INSERT IGNORE to skip duplicates. Configuration tables use REPLACE for last-write-wins updates. Cache invalidation uses REPLACE.

Common mistakes: Using REPLACE unintentionally losing data, using INSERT IGNORE thinking it's like upsert, or not understanding REPLACE triggers cascade deletes on dependent tables.

SELECT retrieves data from tables following a logical execution order: FROM (tables), WHERE (filter rows), GROUP BY (aggregate), HAVING (filter groups), SELECT (project columns), ORDER BY (sort), LIMIT (restrict results). You can select specific columns or * for all, with optional aliases for readability. SELECT can aggregate data, join multiple tables, and perform calculations. The LIMIT clause is critical for preventing huge result sets in production.
-- Basic SELECT with filtering
SELECT name, email FROM users WHERE status = 'active';

-- SELECT with calculations
SELECT name, salary * 1.1 as salary_with_bonus FROM employees;

-- SELECT with sorting and limit
SELECT name FROM customers ORDER BY registration_date DESC LIMIT 10;

-- SELECT with aliases
SELECT u.name as customer_name, COUNT(o.id) as total_orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
HAVING COUNT(o.id) > 5
ORDER BY total_orders DESC;

Why it matters: SELECT is the most frequently used SQL statement. Query performance depends on proper WHERE clauses, appropriate joins, and appropriate LIMIT usage.

Real applications: Every dashboard query, report, and search feature uses SELECT. Pagination uses OFFSET LIMIT. Analytics use GROUP BY aggregations.

Common mistakes: SELECT * without LIMIT returning millions of rows, missing WHERE clauses so unnecessary data is scanned, or filtering on non-indexed columns causing full table scans.

UPDATE modifies existing rows, with WHERE clause specifying which rows to change. Omitting WHERE clause updates all rows—a critical mistake. SET clause contains column=value pairs separated by commas. Multiple columns can be updated in one statement. UPDATE often has performance implications if no index on WHERE columns, potentially locking the entire table.
-- Update single column
UPDATE users SET status = 'inactive' WHERE id = 5;

-- Update multiple columns atomically
UPDATE orders 
SET status = 'shipped', shipped_date = NOW() 
WHERE id = 100;

-- Update based on calculation
UPDATE employees SET salary = salary * 1.05 WHERE department = 'Sales';

-- Update from another table
UPDATE users u
SET u.status = 'verified'
WHERE u.id IN (SELECT user_id FROM email_confirmations);
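For risky updates, a common safeguard (sketched here against the same hypothetical tables) is to wrap the statement in a transaction and verify the result before committing:

-- Wrap risky updates in a transaction so they can be verified
START TRANSACTION;
UPDATE employees SET salary = salary * 1.05 WHERE department = 'Sales';
-- Inspect the affected-row count and spot-check the data
SELECT name, salary FROM employees WHERE department = 'Sales' LIMIT 5;
COMMIT;    -- or ROLLBACK; if the change looks wrong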

Why it matters: Proper WHERE usage prevents accidentally updating all records. Understanding UPDATE performance impact shows database design maturity.

Real applications: Status changes (order shipped), bulk updates (annual raises), verification marking (email confirmed), and field calculations (age increments) all use UPDATE.

Common mistakes: Forgetting the WHERE clause and updating the entire table, using string comparison for numeric IDs, or running multi-statement updates outside a transaction so a mid-batch failure leaves data half-changed.

DELETE removes rows from a table. Like UPDATE, WHERE clause specifies which rows to delete. Omitting WHERE deletes all rows permanently. DELETE triggers foreign key constraints potentially cascading deletes to related tables. DELETE can be slow for large tables without proper indexes. Many production systems use soft deletes (status='deleted') instead of actual deletion for audit trails.
-- Delete specific row
DELETE FROM users WHERE id = 5;

-- Delete multiple rows with condition
DELETE FROM orders WHERE status = 'cancelled' AND amount < 10;

-- Delete all rows (use carefully!)
DELETE FROM temporary_data;

-- Multi-table DELETE syntax (alias form)
DELETE u FROM users u
WHERE u.status = 'inactive' 
AND u.last_login < DATE_SUB(NOW(), INTERVAL 1 YEAR);
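The soft-delete pattern mentioned above can be sketched as follows (assuming a hypothetical deleted_at column on users):

-- Soft delete: mark the row instead of removing it
UPDATE users SET deleted_at = NOW() WHERE id = 5;

-- Normal queries then exclude soft-deleted rows
SELECT * FROM users WHERE deleted_at IS NULL;

This preserves an audit trail and makes "deletion" reversible, at the cost of every query needing the extra filter.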

Why it matters: Understanding DELETE consequences prevents accidental data loss. Production environments often avoid hard deletes due to audit and recovery needs.

Real applications: Cleanup tasks delete expired sessions. Soft deletes mark records as deleted without actual removal. Archive operations move old data before deletion.

Common mistakes: Forgetting WHERE clause deleting entire tables, not understanding DELETE triggers cascade deletes, or deleting production data without backups causing permanent loss.

TRUNCATE removes all rows from a table by dropping and recreating it, so it does not fire DELETE triggers, runs much faster than row-by-row deletion, and cannot be filtered with WHERE. TRUNCATE resets the AUTO_INCREMENT counter, while DELETE preserves it. DELETE removes rows one at a time, firing row triggers, and can be rolled back inside a transaction; TRUNCATE causes an implicit commit and cannot be rolled back. Both should be used carefully in production, but TRUNCATE is the faster choice for clearing entire tables and is typically reserved for test and maintenance contexts.
-- TRUNCATE: Fast removal of all rows, resets AUTO_INCREMENT
TRUNCATE TABLE sessions;
-- Next insert will have id = 1 again

-- DELETE: Slower row-by-row, preserves AUTO_INCREMENT
DELETE FROM sessions;
-- Next insert will continue from last id

-- Comparison in transaction context:
-- TRUNCATE causes an implicit commit and cannot be rolled back
-- DELETE can be rolled back within a transaction

-- Practical: TRUNCATE for development, DELETE for production
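The rollback difference can be demonstrated with a quick sketch against the same sessions table:

-- DELETE participates in transactions
START TRANSACTION;
DELETE FROM sessions;
ROLLBACK;   -- rows are restored

-- TRUNCATE causes an implicit commit
START TRANSACTION;
TRUNCATE TABLE sessions;
ROLLBACK;   -- has no effect; the rows are already gone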

Why it matters: Understanding TRUNCATE vs DELETE impacts production cleanup scripts and transaction safety. Misuse causes unexpected behavior in sequential operations.

Real applications: Test cleanup uses TRUNCATE. Production uses DELETE with WHERE for targeted removal. Cache tables use TRUNCATE on invalidation.

Common mistakes: Using TRUNCATE in transactions relying on rollback, using TRUNCATE without confirming intent, or using DELETE for entire table clearing when TRUNCATE is faster.

LIMIT restricts the number of rows returned by SELECT, essential for pagination and preventing huge result sets. LIMIT 10 returns first 10 rows, LIMIT 10 OFFSET 20 returns 10 rows starting from position 20 (skipping first 20). MySQL also supports LIMIT 10, 20 syntax (offset, count). Large OFFSET values are slow as MySQL must skip all previous rows; keyset pagination is more efficient for large datasets.
-- Get top 10 recent posts
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;

-- Pagination: Page 2 with 10 per page (OFFSET 10)
SELECT * FROM users ORDER BY id LIMIT 10 OFFSET 10;

-- Alternative syntax: LIMIT offset, count
SELECT * FROM users ORDER BY id LIMIT 10, 10;

-- Common pagination pattern: LIMIT accepts only integer literals,
-- so compute the offset in application code or a prepared statement
SET @offset = 20;      -- (page 2 - 1) * 20 per page
SET @per_page = 20;
PREPARE page_stmt FROM 'SELECT * FROM users ORDER BY id LIMIT ?, ?';
EXECUTE page_stmt USING @offset, @per_page;
DEALLOCATE PREPARE page_stmt;
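The keyset (seek) pagination mentioned above avoids the large-OFFSET penalty by filtering on the last key seen instead of skipping rows; a sketch:

-- Keyset pagination: remember the last id from the previous page
SELECT * FROM users WHERE id > 20 ORDER BY id LIMIT 10;
-- The index seeks directly past id 20 instead of scanning and
-- discarding the first 20 rows, so cost stays constant per page

The trade-off is that keyset pagination only supports "next page" navigation efficiently; jumping to an arbitrary page number still requires OFFSET.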

Why it matters: LIMIT prevents memory exhaustion and improves response times. Pagination is essential for user-facing interfaces handling large datasets.

Real applications: Search results show 10-50 items per page using LIMIT. APIs return paginated data. Reports limit output for readability.

Common mistakes: SELECT without LIMIT on large tables returning millions of rows, using large OFFSET causing performance degradation, or not using ORDER BY making LIMIT results unpredictable.

DISTINCT removes duplicate rows from the result set, returning only unique combinations of the selected columns. DISTINCT slows queries because it requires deduplication, so apply filters with WHERE first to minimize the data processed. Counting distinct values uses COUNT(DISTINCT column). With multiple columns, DISTINCT applies to the entire select list; for example, SELECT DISTINCT name, city returns unique name-city pairs.
-- Get unique customer states
SELECT DISTINCT state FROM customers;

-- Get unique product categories
SELECT DISTINCT category FROM products;

-- Count distinct emails
SELECT COUNT(DISTINCT email) as unique_customers FROM users;

-- Multiple columns: unique combinations
SELECT DISTINCT city, state FROM customers;
-- Returns one row per unique city-state combination

-- DISTINCT impacts performance
SELECT DISTINCT name FROM customers;  -- Slow on large tables
-- Better: Use GROUP BY or index for query optimization
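The GROUP BY alternative mentioned in the comment looks like this; with an index on name, either form can often be satisfied by an index scan:

-- Equivalent deduplication expressed with GROUP BY
SELECT name FROM customers GROUP BY name;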

Why it matters: DISTINCT is useful for data exploration but can cause performance issues. Understanding when DISTINCT is necessary vs using GROUP BY shows query optimization knowledge.

Real applications: Reporting unique visitors, finding all states where customers exist, counting unique email subscribers, generating distinct category lists.

Common mistakes: Using DISTINCT without proper indexing causing full scans, not filtering with WHERE before DISTINCT, or using DISTINCT on entire rows instead of specific columns unnecessarily.

Aggregate functions compute summary statistics across multiple rows: COUNT counts rows, SUM totals numeric values, AVG calculates average, MIN/MAX find minimum/maximum values. These functions return single values per group. NULL values are ignored in aggregates except COUNT(*) which counts all rows. GROUP BY combines rows into groups for per-group aggregation. Aggregate functions are essential for reporting and analytics.
-- Aggregate functions
SELECT COUNT(*) as total_users FROM users;
SELECT SUM(salary) as total_payroll FROM employees;
SELECT AVG(price) as average_product_price FROM products;
SELECT MIN(salary) as lowest_salary, MAX(salary) as highest_salary FROM employees;

-- COUNT(DISTINCT...) counts unique values
SELECT COUNT(DISTINCT city) as unique_cities FROM customers;

-- GROUP BY with aggregates
SELECT department, COUNT(*) as emp_count, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
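The NULL-handling rule is worth seeing concretely; assuming some employees have a NULL salary:

-- COUNT(*) counts all rows; COUNT(column) and AVG ignore NULLs
SELECT COUNT(*) AS all_rows,
       COUNT(salary) AS rows_with_salary,
       AVG(salary) AS avg_of_non_null_salaries
FROM employees;
-- AVG divides by the non-NULL count, not the total row count,
-- so NULL salaries do not drag the average down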

Why it matters: Aggregate functions are foundational for reporting and analytics. Many interview problems involve correctly using aggregates with GROUP BY and HAVING.

Real applications: Dashboards show user counts, revenue totals, and average order values using aggregates. Analytics compute metrics per segment using GROUP BY aggregates.

Common mistakes: Mixing aggregated and non-aggregated columns without a matching GROUP BY, forgetting that NULLs are excluded from aggregates, or using the wrong aggregate function (AVG instead of SUM).

GROUP BY organizes rows into groups based on specified columns, enabling aggregate calculations per group. HAVING filters groups (like WHERE filters rows), operating after GROUP BY. All non-aggregated columns in SELECT must be in GROUP BY. HAVING conditions use aggregate functions, while WHERE conditions use column values. Execution order: WHERE (filter rows) → GROUP BY (create groups) → HAVING (filter groups) → SELECT.
-- GROUP BY with aggregates
SELECT department, COUNT(*) as emp_count, AVG(salary) as avg_salary
FROM employees
GROUP BY department;

-- HAVING to filter groups
SELECT city, COUNT(*) as cust_count
FROM customers
GROUP BY city
HAVING COUNT(*) > 5;  -- Only cities with more than 5 customers

-- Wrong: Non-aggregated column not in GROUP BY
-- SELECT department, salary, COUNT(*) FROM employees GROUP BY department;

-- Correct: All non-aggregated columns in GROUP BY
SELECT department, COUNT(*) FROM employees GROUP BY department;
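A query combining both filters makes the execution order concrete (assuming a hypothetical status column on employees):

-- WHERE filters rows first, HAVING filters the resulting groups
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE status = 'active'          -- row filter, applied before grouping
GROUP BY department
HAVING AVG(salary) > 50000;      -- group filter, applied after grouping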

Why it matters: GROUP BY with HAVING is essential for complex reporting. Interviewers frequently test understanding of GROUP BY constraints and HAVING vs WHERE.

Real applications: Sales reports breakdown by region, customer segmentation by purchase count, user demographics by age group all use GROUP BY with HAVING.

Common mistakes: Forgetting to include all non-aggregated columns in GROUP BY, using aggregate functions in WHERE instead of HAVING, or confusing HAVING filter order with WHERE.

ORDER BY sorts the result set by specified columns in ascending (ASC, default) or descending (DESC) order. Multiple columns can be listed, giving primary then secondary sort keys. Without ORDER BY, row order is undefined, even if results happen to come back sorted because of index access. Sorting is expensive for large result sets, especially without indexes, so apply filters with WHERE first. Column aliases can be used in ORDER BY for readability.
-- Sort by single column, ascending (default)
SELECT * FROM users ORDER BY registration_date;

-- Sort descending
SELECT * FROM products ORDER BY price DESC;

-- Multiple columns: sort by department, then by salary
SELECT * FROM employees ORDER BY department, salary DESC;

-- Sort by calculated column
SELECT name, salary * 1.1 as updated_salary 
FROM employees 
ORDER BY updated_salary DESC;

-- Sort by column position (not recommended)
SELECT name, salary FROM employees ORDER BY 2 DESC;
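One classic pitfall: numbers stored as strings sort lexically, so '10' comes before '2'. Casting restores numeric order (assuming a hypothetical releases table with a VARCHAR version column):

-- Strings sort lexically: '10' sorts before '2'
SELECT version FROM releases ORDER BY version;
-- Cast to sort numerically
SELECT version FROM releases ORDER BY CAST(version AS UNSIGNED);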

Why it matters: Correct sorting is crucial for user-facing queries showing results in expected order. ORDER BY performance varies significantly with and without indexes.

Real applications: Recent posts sorted by date, products sorted by price, leaderboards sorted by score, search results sorted by relevance.

Common mistakes: ORDER BY on non-indexed columns causing full table scans, sorting text as numbers causing "10" before "2", or forgetting LIMIT with ORDER BY causing massive result processing.

Aliases provide alternative names for columns (column aliases with AS) and tables (table aliases for joins) for readability. Column aliases rename calculated fields or long expressions. Table aliases shorten table references in joins and enable self-joins. Aliases don't persist; they only exist for the query result. Aliases are optional but highly recommended for code clarity and maintenance.
-- Column alias
SELECT salary * 1.1 AS salary_with_bonus FROM employees;

-- Table alias for clarity
SELECT u.name, u.email FROM users AS u WHERE u.status = 'active';

-- Multiple table aliases
SELECT u.name, o.order_date, o.total
FROM users AS u
JOIN orders AS o ON u.id = o.user_id;

-- Alias for calculated fields
SELECT 
  name,
  YEAR(registration_date) as registration_year,
  MONTH(registration_date) as registration_month
FROM users;

Why it matters: Aliases greatly improve code readability and maintainability. Well-aliased queries are easier to understand and debug in complex scenarios.

Real applications: Complex joins use table aliases to disambiguate columns. Calculated fields use column aliases for clarity. Reports use meaningful alias names for export headers.

Common mistakes: Overusing unclear single-letter aliases like 'a', 'b', not aliasing calculated fields making output confusing, or forgetting alias context in complex joins causing ambiguity.

WHERE filters rows based on boolean conditions using operators: = (equals), <>, <, >, <=, >= for comparisons, AND/OR for logical combinations, IN for value lists, BETWEEN for ranges, and LIKE for pattern matching. Complex conditions use parentheses for precedence. WHERE is applied before GROUP BY, ensuring efficient filtering. Conditions on indexed columns are faster than non-indexed columns.
-- Comparison operators
WHERE age > 18
WHERE status = 'active'
WHERE salary <> 40000

-- Logical operators
WHERE status = 'active' AND age > 18
WHERE country = 'USA' OR country = 'Canada'
WHERE NOT status = 'deleted'

-- IN for value lists
WHERE status IN ('active', 'pending', 'verified')

-- BETWEEN for ranges
WHERE salary BETWEEN 40000 AND 60000

-- LIKE for patterns
WHERE name LIKE 'J%'  -- Starts with J
WHERE email LIKE '%@gmail.com'  -- Ends with @gmail.com
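Because AND binds tighter than OR, mixed conditions need parentheses to express the intended logic:

-- Without parentheses: AND binds tighter than OR
WHERE country = 'USA' OR country = 'Canada' AND status = 'active'
-- Parsed as: country = 'USA' OR (country = 'Canada' AND status = 'active'),
-- so inactive USA rows slip through

-- With parentheses: the intended filter
WHERE (country = 'USA' OR country = 'Canada') AND status = 'active'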

Why it matters: WHERE clause efficiency directly impacts query performance. Using indexed columns and efficient operators prevents full table scans in production.

Real applications: Filter active users, find orders in date range, search users by name pattern, locate customers by country all rely on proper WHERE clauses.

Common mistakes: Filtering on calculated columns (e.g. WHERE YEAR(created_at) = 2024), which prevents index use; leading-wildcard patterns like LIKE '%value%', which cannot use an index; or mixing AND and OR without parentheses, causing logic errors.

MySQL combines data retrieval with modification through multi-table UPDATE (UPDATE with JOIN) and subqueries, updating rows based on the results of a SELECT. This enables mass updates from related tables or calculated values. The join or subquery can reference multiple tables, apply filters, and use aggregate functions. These patterns are powerful for data synchronization and bulk corrections but can lock significant table ranges, affecting concurrent operations.
-- Update from another table
UPDATE employees e
SET e.salary = e.salary * 1.05
WHERE e.department_id IN (SELECT id FROM departments WHERE name = 'Sales');

-- Update with JOIN
UPDATE orders o
JOIN customers c ON o.customer_id = c.id
SET o.customer_level = c.membership_level;

-- Bulk reset based on condition
UPDATE orders
SET status = 'expired'
WHERE order_date < DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND status = 'pending';
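The aggregate case mentioned above can be sketched with a correlated subquery (assuming a hypothetical order_count column on customers):

-- Refresh a denormalized count from an aggregate
UPDATE customers c
SET c.order_count = (
  SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id
);

Each row's subquery is evaluated against its own id, so this keeps the cached count in sync with the orders table in a single statement.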

Why it matters: UPDATE...SELECT enables sophisticated data management patterns. Understanding its implications for locking and performance shows production database experience.

Real applications: Employee salary updates from department budgets, inventory synchronization from orders, data corrections based on multiple criteria all use UPDATE...SELECT.

Common mistakes: Forgetting the WHERE clause and updating all rows unintentionally, misreading the affected-row count (by default MySQL reports only rows actually changed), or running bulk updates without a transaction, risking partial updates on error.