Normalization is a database design technique that reduces data redundancy and improves data integrity by organizing data into well-structured, related tables. Normal forms (1NF through 5NF) guide design decisions. Benefits include fewer anomalies, better consistency, easier maintenance, and less duplicated storage.
-- Un-normalized data: Redundancy and anomalies
CREATE TABLE employees_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
job_titles VARCHAR(255), -- Multiple jobs comma-separated
department_names VARCHAR(255), -- Multiple departments
project_assignments VARCHAR(500) -- Projects as text
);
-- Normalized design: Separate concerns into tables
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments(id)
);
CREATE TABLE job_titles (
id INT PRIMARY KEY,
employee_id INT,
title VARCHAR(100),
FOREIGN KEY (employee_id) REFERENCES employees(id)
);
CREATE TABLE project_assignments (
id INT PRIMARY KEY,
employee_id INT,
project_id INT,
FOREIGN KEY (employee_id) REFERENCES employees(id),
FOREIGN KEY (project_id) REFERENCES projects(id)
);
Why it matters: Normalization prevents data anomalies and maintains consistency.
Real applications: All databases benefit from proper normalization.
Common mistakes: Over-normalization causing excessive joins, under-normalization causing redundancy.
First Normal Form (1NF) eliminates repeating groups. Second Normal Form (2NF) removes partial dependencies. Third Normal Form (3NF) removes transitive dependencies. Most practical databases target 3NF, balancing normalization with query complexity.
-- First Normal Form (1NF): Eliminate repeating groups
-- Bad: Multiple values in one column
CREATE TABLE orders_1nf_bad (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100),
product_names VARCHAR(500) -- Multiple products: "Laptop, Mouse, Monitor"
);
-- Good: One value per field
CREATE TABLE orders_1nf (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);
CREATE TABLE order_items_1nf (
item_id INT PRIMARY KEY,
order_id INT,
product_name VARCHAR(100),
FOREIGN KEY (order_id) REFERENCES orders_1nf(order_id)
);
-- Second Normal Form (2NF): 1NF + Remove partial dependencies
-- Bad: Non-key attributes depend on part of composite key
CREATE TABLE student_courses_2nf_bad (
student_id INT,
course_id INT,
instructor_name VARCHAR(100), -- Depends on course_id, not student_id
PRIMARY KEY (student_id, course_id)
);
-- Good: Separate concerns
CREATE TABLE courses_2nf (
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
instructor_name VARCHAR(100)
);
CREATE TABLE student_courses_2nf (
student_id INT,
course_id INT,
grade CHAR(1),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (course_id) REFERENCES courses_2nf(course_id)
);
-- Third Normal Form (3NF): 2NF + Remove transitive dependencies
-- Bad: Non-key attribute depends on another non-key attribute
CREATE TABLE employees_3nf_bad (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(100),
department_id INT,
department_name VARCHAR(100) -- Depends on department_id, not emp_id (transitive)
);
-- Good: Separate tables
CREATE TABLE departments_3nf (
department_id INT PRIMARY KEY,
department_name VARCHAR(100)
);
CREATE TABLE employees_3nf (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(100),
department_id INT,
FOREIGN KEY (department_id) REFERENCES departments_3nf(department_id)
);
Why it matters: Understanding normal forms ensures logical database design.
Real applications: Most enterprise transactional systems target 3NF; data warehouses often use deliberately denormalized designs.
Common mistakes: Not understanding differences between forms, overusing high normal forms.
Denormalization intentionally duplicates data to improve query performance by reducing joins. Trade-offs include potential inconsistency, update complexity, and increased storage. Use denormalization strategically for read-heavy systems or after profiling shows join overhead.
-- Normalized: Multiple joins for summary data
SELECT o.id AS order_id, o.order_date, c.customer_name,
SUM(oi.quantity) AS total_items, SUM(oi.quantity * oi.price) AS total_value
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, o.order_date, c.customer_name;
-- Denormalized: Redundant data for performance
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(100);
ALTER TABLE orders ADD COLUMN total_items INT NOT NULL DEFAULT 0; -- Default 0 so the trigger's arithmetic never hits NULL
ALTER TABLE orders ADD COLUMN total_value DECIMAL(10,2) NOT NULL DEFAULT 0;
-- Now the query reads a single table: no joins, no aggregation
SELECT id AS order_id, order_date, customer_name, total_items, total_value
FROM orders;
-- But must maintain consistency with triggers
DELIMITER //
CREATE TRIGGER update_order_stats AFTER INSERT ON order_items
FOR EACH ROW
BEGIN
UPDATE orders SET total_items = total_items + NEW.quantity,
total_value = total_value + (NEW.quantity * NEW.price)
WHERE id = NEW.order_id;
END //
DELIMITER ;
-- Denormalization strategies
-- 1. Summary tables: Aggregate results stored separately
-- 2. Redundant columns: Store frequently joined values
-- 3. Materialized views: Pre-computed complex queries
-- 4. Calculated fields: Store results of expensive calculations
Why it matters: Strategic denormalization significantly improves performance for reporting systems.
Real applications: Data warehouses, analytics systems, caching layers.
Common mistakes: Denormalizing without profiling, not maintaining consistency.
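The trigger approach can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module instead of MySQL, so the trigger syntax differs slightly (no DELIMITER), and the sample rows are invented for illustration. It compares the stored (denormalized) totals with totals recomputed from the detail rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    total_items INTEGER NOT NULL DEFAULT 0,
    total_value REAL NOT NULL DEFAULT 0
);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    quantity INTEGER NOT NULL,
    price REAL NOT NULL
);
-- Keep the redundant totals in step with every inserted item.
CREATE TRIGGER update_order_stats AFTER INSERT ON order_items
FOR EACH ROW
BEGIN
    UPDATE orders
    SET total_items = total_items + NEW.quantity,
        total_value = total_value + NEW.quantity * NEW.price
    WHERE id = NEW.order_id;
END;
""")

conn.execute("INSERT INTO orders (id) VALUES (1)")
conn.executemany(
    "INSERT INTO order_items (order_id, quantity, price) VALUES (?, ?, ?)",
    [(1, 2, 10.0), (1, 1, 5.5)],
)

# Denormalized read: no join, no aggregation.
stored = conn.execute(
    "SELECT total_items, total_value FROM orders WHERE id = 1"
).fetchone()

# Normalized read: recompute the same totals from the detail rows.
computed = conn.execute(
    "SELECT SUM(quantity), SUM(quantity * price) "
    "FROM order_items WHERE order_id = 1"
).fetchone()

print(stored, computed)  # (3, 25.5) (3, 25.5)
```

This sketch only covers inserts. Updates and deletes on order_items would need their own triggers, which is part of the maintenance cost of denormalizing.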
Boyce-Codd Normal Form (BCNF) strengthens 3NF to handle edge cases involving overlapping candidate keys. Fourth Normal Form (4NF) eliminates independent multivalued dependencies stored in one table. Fifth Normal Form (5NF) handles join dependencies. Most practical systems use 3NF; higher forms are rare.
-- Boyce-Codd Normal Form (BCNF): Stricter than 3NF
-- Requires every determinant to be a candidate key; handles cases where a non-key column determines part of a key
-- Practical impact: Rarely needed, 3NF usually sufficient
-- Fourth Normal Form (4NF): Independent multivalued dependencies
-- Bad: Mixing independent multivalued dependencies
CREATE TABLE course_teacher_book_4nf_bad (
course_id INT,
teacher_id INT,
book_id INT,
PRIMARY KEY (course_id, teacher_id, book_id)
);
-- Good: Separate tables for independent relationships
CREATE TABLE course_teacher_4nf (
course_id INT,
teacher_id INT,
PRIMARY KEY (course_id, teacher_id)
);
CREATE TABLE course_book_4nf (
course_id INT,
book_id INT,
PRIMARY KEY (course_id, book_id)
);
-- 5NF: Rarely used in practice
-- Handles join dependencies not covered by earlier forms
-- Most systems never reach 5NF normalization
-- Practical consideration:
-- 1NF to 3NF: Essential, standard practice
-- BCNF: Sometimes needed, edge cases
-- 4NF, 5NF: Theoretical, rarely needed in practice
Why it matters: Understanding advanced normal forms helps you handle edge cases in complex designs.
Real applications: Complex analytical systems sometimes need BCNF or 4NF.
Common mistakes: Over-normalizing beyond practical benefits, not understanding practical trade-offs.
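To see why the 4NF split loses nothing, here is a minimal sketch in plain Python, using sets of tuples as stand-ins for tables. The course, teacher, and book values are made up. When teachers and books are independent, the combined table must hold every teacher-book pairing, and joining the two smaller tables on course_id rebuilds it exactly.

```python
# Each course has a set of teachers and, independently, a set of books.
teachers = {"DB101": {"Smith", "Jones", "Lee"}}
books = {"DB101": {"SQL Basics", "Schema Design", "Query Tuning"}}

# 4NF-violating table: every teacher must be paired with every book.
combined = {
    (course, teacher, book)
    for course in teachers
    for teacher in teachers[course]
    for book in books[course]
}

# Decompose into the two independent relations.
course_teacher = {(c, t) for (c, t, _) in combined}
course_book = {(c, b) for (c, _, b) in combined}

# A natural join on course_id rebuilds the original rows exactly.
rejoined = {
    (c1, t, b)
    for (c1, t) in course_teacher
    for (c2, b) in course_book
    if c1 == c2
}

print(len(combined), len(course_teacher) + len(course_book))  # 9 6
print(rejoined == combined)  # True
```

With three teachers and three books, the combined table needs nine rows while the two decomposed tables need six in total, and adding a fourth book would add four rows to the combined table but only one to course_book.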
Insertion anomaly: A fact can't be recorded until unrelated data exists (for example, a course can't be added until a student enrolls in it). Deletion anomaly: Removing a record also loses unrelated information. Update anomaly: Changing one value requires updating it in multiple places. Normalization prevents all three.
-- Un-normalized table showing anomalies
CREATE TABLE student_courses_bad (
student_id INT,
student_name VARCHAR(100),
course_id INT,
course_name VARCHAR(100),
instructor_name VARCHAR(100),
grade CHAR(1),
PRIMARY KEY (student_id, course_id) -- One row per student per course
);
-- Insertion anomaly: Can't add a new course without a student
-- Every row needs a student_id, so a course with no enrollments yet has nowhere to go
-- Deletion anomaly: Deleting a student can lose course info
-- If student 5 is the only one enrolled in a course, deleting student 5 erases that course and its instructor
-- Update anomaly: Changing an instructor's name requires multiple updates
-- The name is repeated on every enrollment row for the course; missing one row leaves the data inconsistent
-- Normalized design eliminates anomalies
CREATE TABLE students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100)
);
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
instructor_name VARCHAR(100)
);
CREATE TABLE enrollments (
student_id INT,
course_id INT,
grade CHAR(1),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
-- Now:
-- Insertion: Add new course without any student
INSERT INTO courses VALUES (101, 'Database Design', 'Dr. Smith');
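-- Deletion: Delete a student and their enrollments; courses remain intact
DELETE FROM enrollments WHERE student_id = 5; -- Child rows first, to satisfy the foreign key
DELETE FROM students WHERE student_id = 5;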
-- Update: Change instructor once, affects all students
UPDATE courses SET instructor_name = 'Dr. Johnson' WHERE course_id = 101;
Why it matters: Understanding anomalies motivates normalization decisions.
Real applications: All databases benefit from avoiding these anomalies.
Common mistakes: Not recognizing anomalies in design, accepting them for "convenience".
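The update and deletion anomalies can be reproduced in a few lines. This is a sketch using Python's built-in sqlite3 module with invented sample rows; the flat table mirrors the un-normalized design, keyed on (student_id, course_id) so that a course can have several students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_courses_bad (
    student_id INTEGER,
    student_name TEXT,
    course_id INTEGER,
    course_name TEXT,
    instructor_name TEXT,
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student_courses_bad VALUES
    (1, 'Ana', 101, 'Database Design', 'Dr. Smith'),
    (2, 'Ben', 101, 'Database Design', 'Dr. Smith'),
    (5, 'Eve', 202, 'Compilers', 'Dr. Lee');
""")

# Update anomaly: changing the instructor on one row only leaves
# course 101 with two different instructor names.
conn.execute(
    "UPDATE student_courses_bad SET instructor_name = 'Dr. Johnson' "
    "WHERE student_id = 1 AND course_id = 101"
)
instructors_101 = {
    row[0]
    for row in conn.execute(
        "SELECT instructor_name FROM student_courses_bad WHERE course_id = 101"
    )
}

# Deletion anomaly: student 5 is the only one enrolled in course 202,
# so deleting the student also erases every trace of the course.
conn.execute("DELETE FROM student_courses_bad WHERE student_id = 5")
course_202_rows = conn.execute(
    "SELECT COUNT(*) FROM student_courses_bad WHERE course_id = 202"
).fetchone()[0]

print(sorted(instructors_101), course_202_rows)
# ['Dr. Johnson', 'Dr. Smith'] 0
```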
Identifying issues involves checking for repeating groups (1NF), partial dependencies (2NF), and transitive dependencies (3NF). Fixing requires decomposing tables, creating new entities, and establishing relationships through foreign keys.
-- Identifying 1NF violation: Repeating groups
CREATE TABLE projects_bad (
project_id INT PRIMARY KEY,
project_name VARCHAR(100),
employees VARCHAR(500) -- Violation: Multiple values in one field
);
-- Fix: Create separate table
CREATE TABLE project_employees (
project_id INT,
employee_id INT,
PRIMARY KEY (project_id, employee_id), -- One row per project-employee pair
FOREIGN KEY (project_id) REFERENCES projects(project_id),
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
);
-- Identifying 2NF violation: Partial dependency in composite key
CREATE TABLE course_enrollment_bad (
student_id INT,
course_id INT,
instructor_name VARCHAR(100), -- Depends on course_id only, not student_id
PRIMARY KEY (student_id, course_id)
);
-- Fix: Separate into two tables
CREATE TABLE courses (
course_id INT PRIMARY KEY,
instructor_name VARCHAR(100)
);
CREATE TABLE enrollment (
student_id INT,
course_id INT,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
-- Identifying 3NF violation: Transitive dependency
CREATE TABLE employee_bad (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(100),
dept_id INT,
dept_name VARCHAR(100) -- Depends on dept_id, not emp_id
);
-- Fix: Separate departments
CREATE TABLE departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100)
);
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(100),
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
Why it matters: Systematic identification and fixing prevents design flaws.
Real applications: Database design review and refactoring.
Common mistakes: Not following systematic approach, addressing issues piecemeal.
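One way to make the dependency checks less ad hoc is to test candidate dependencies against sample data. The helper below is a hypothetical sketch (fd_holds is not a library function), and the rows are invented. Note that sample data can only disprove a dependency; a dependency that holds in the sample still has to be confirmed against the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the columns in lhs determine the columns in rhs
    for every row in this sample."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The first value seen for a key must match all later ones.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"student_id": 1, "course_id": 10, "instructor_name": "Smith", "grade": "A"},
    {"student_id": 2, "course_id": 10, "instructor_name": "Smith", "grade": "B"},
    {"student_id": 1, "course_id": 20, "instructor_name": "Lee", "grade": "C"},
]

# Partial dependency: instructor_name is determined by course_id alone,
# so it does not belong in a table keyed on (student_id, course_id).
print(fd_holds(rows, ["course_id"], ["instructor_name"]))  # True

# grade needs the whole key; course_id alone does not determine it.
print(fd_holds(rows, ["course_id"], ["grade"]))  # False
print(fd_holds(rows, ["student_id", "course_id"], ["grade"]))  # True
```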
Normalized design usually improves INSERT/UPDATE/DELETE performance (less redundancy to maintain) but may slow SELECT queries (more joins). Balance is needed: normalize for correctness, denormalize strategically for read performance.
-- Normalized: Faster writes, slower reads
-- Many small tables, fewer updates needed
CREATE TABLE orders (id INT PRIMARY KEY, customer_id INT, total DECIMAL(10,2));
CREATE TABLE order_items (id INT PRIMARY KEY, order_id INT, product_id INT, quantity INT);
CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(100));
-- Query requires multiple joins (slower)
SELECT o.id, c.name, oi.product_id, oi.quantity
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id;
-- Denormalized: Slower writes, faster reads
-- Single table with redundancy
CREATE TABLE orders_denorm (
id INT PRIMARY KEY,
customer_id INT,
customer_name VARCHAR(100), -- Redundant
total DECIMAL(10,2),
product_id INT,
quantity INT
);
-- Query is fast: Single table scan
SELECT * FROM orders_denorm WHERE customer_id = 1;
-- But updates are slower: Must maintain redundancy
UPDATE orders_denorm SET customer_name = 'John'
WHERE customer_id = 1; -- Must update all rows for customer
-- Performance optimization strategy:
-- 1. Start normalized (3NF)
-- 2. Profile query performance
-- 3. Denormalize strategically where needed
-- 4. Maintain consistency with triggers/procedures
Why it matters: Understanding trade-offs guides optimization decisions.
Real applications: OLTP (online transaction processing) uses more normalization; data warehouses use more denormalization.
Common mistakes: Assuming denormalization always improves performance, not measuring.
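The write-side cost of redundancy can be made concrete by counting rows touched. The sketch below uses Python's built-in sqlite3 module with simplified versions of the tables and 100 invented orders for one customer. Rows touched is only a rough proxy for write cost, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE orders_denorm (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    customer_name TEXT  -- redundant copy of customers.name
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Jon')")
for order_id in range(1, 101):
    conn.execute("INSERT INTO orders VALUES (?, 1)", (order_id,))
    conn.execute("INSERT INTO orders_denorm VALUES (?, 1, 'Jon')", (order_id,))

# Normalized: the name lives in exactly one row.
normalized_writes = conn.execute(
    "UPDATE customers SET name = 'John' WHERE id = 1"
).rowcount

# Denormalized: every order row for the customer carries a copy.
denormalized_writes = conn.execute(
    "UPDATE orders_denorm SET customer_name = 'John' WHERE customer_id = 1"
).rowcount

print(normalized_writes, denormalized_writes)  # 1 100
```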
1NF violations include comma-separated values, JSON in single columns, array data types, and repeating groups. Fix by creating separate tables with proper relationships.
-- 1NF Violation 1: Comma-separated values
CREATE TABLE employees_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
phone_numbers VARCHAR(255) -- "555-1234, 555-5678, 555-9012"
);
-- Fix
CREATE TABLE phones (
id INT PRIMARY KEY AUTO_INCREMENT,
employee_id INT,
phone_number VARCHAR(20),
FOREIGN KEY (employee_id) REFERENCES employees(id)
);
-- 1NF Violation 2: JSON/nested data
CREATE TABLE users_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
addresses JSON -- [{"street": "123 Main", "city": "NYC", "zip": "10001"}, ...]
);
-- Fix
CREATE TABLE addresses (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
street VARCHAR(100),
city VARCHAR(50),
zip VARCHAR(10),
FOREIGN KEY (user_id) REFERENCES users(id)
);
-- 1NF Violation 3: Repeating columns
CREATE TABLE students_bad (
id INT PRIMARY KEY,
name VARCHAR(100),
grade1 CHAR(1),
grade2 CHAR(1),
grade3 CHAR(1) -- Pattern repeats
);
-- Fix
CREATE TABLE grades (
id INT PRIMARY KEY AUTO_INCREMENT,
student_id INT,
course_number INT,
grade CHAR(1),
FOREIGN KEY (student_id) REFERENCES students(id)
);
Why it matters: Recognizing 1NF violations helps catch early design problems.
Real applications: Legacy systems often have 1NF violations requiring cleanup.
Common mistakes: Using JSON/arrays to avoid proper normalization, comma-separated storage.
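Cleaning up repeating columns usually means unpivoting each wide row into several narrow ones. The function below is a minimal sketch of that step in Python; unpivot_grades and the sample rows are illustrative only, and it assumes that the position of a grade column (grade1, grade2, ...) maps to course_number.

```python
def unpivot_grades(student_rows, grade_columns):
    """Turn one wide row per student into one
    (student_id, course_number, grade) row per non-empty grade column."""
    result = []
    for row in student_rows:
        for course_number, column in enumerate(grade_columns, start=1):
            grade = row.get(column)
            if grade is not None:  # empty slots produce no row at all
                result.append((row["id"], course_number, grade))
    return result


students_bad = [
    {"id": 1, "name": "Ana", "grade1": "A", "grade2": "B", "grade3": None},
    {"id": 2, "name": "Ben", "grade1": "C", "grade2": None, "grade3": None},
]

grades = unpivot_grades(students_bad, ["grade1", "grade2", "grade3"])
print(grades)  # [(1, 1, 'A'), (1, 2, 'B'), (2, 1, 'C')]
```

The normalized rows no longer waste space on empty slots, and a fourth grade needs a new row rather than a new column.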
2NF violations occur when a non-key attribute depends on only part of a composite key (a partial dependency). 3NF violations involve transitive dependencies, where one non-key attribute determines another.
-- 2NF Violation: Partial dependency
CREATE TABLE enrollment_bad (
student_id INT,
course_id INT,
course_name VARCHAR(100), -- Depends on course_id, not (student_id, course_id)
instructor_name VARCHAR(100), -- Depends on course_id, not (student_id, course_id)
enrollment_date DATE, -- Depends on full composite key
PRIMARY KEY (student_id, course_id)
);
-- Fix: Separate tables
CREATE TABLE courses_2nf (
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
instructor_name VARCHAR(100)
);
CREATE TABLE enrollment_2nf (
student_id INT,
course_id INT,
enrollment_date DATE,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (course_id) REFERENCES courses_2nf(course_id)
);
-- 3NF Violation: Transitive dependency
CREATE TABLE employee_bad (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(100),
dept_id INT,
dept_name VARCHAR(100), -- Depends on dept_id, not emp_id
dept_manager VARCHAR(100) -- Depends on dept_id, not emp_id
);
-- Fix: Separate into tables
CREATE TABLE departments_3nf (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100),
dept_manager VARCHAR(100)
);
CREATE TABLE employees_3nf (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(100),
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES departments_3nf(dept_id)
);
Why it matters: Recognizing dependency violations ensures designs don't suffer anomalies.
Real applications: Systematic checklist prevents common design errors.
Common mistakes: Mixing concerns in single tables, not separating concerns properly.
A star schema uses a central fact table surrounded by denormalized dimension tables. A snowflake schema normalizes those dimensions into hierarchies of related tables. Star is simpler and needs fewer joins per query; snowflake stores less redundant dimension data at the cost of extra joins.
-- Star Schema: Fact table surrounded by dimension tables
-- Denormalized dimension tables for fast queries
CREATE TABLE fact_sales (
sale_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
store_id INT,
date_id INT,
quantity INT,
sales_amount DECIMAL(10,2)
);
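CREATE TABLE dim_customer (customer_id INT PRIMARY KEY, name VARCHAR(100), city VARCHAR(100), zip VARCHAR(10));
CREATE TABLE dim_product (product_id INT PRIMARY KEY, product_name VARCHAR(100), category VARCHAR(100), supplier VARCHAR(100));
CREATE TABLE dim_store (store_id INT PRIMARY KEY, store_name VARCHAR(100), city VARCHAR(100), region VARCHAR(100));
CREATE TABLE dim_date (date_id INT PRIMARY KEY, month INT, quarter INT, year INT);
-- Query is simple: One join per dimension used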
SELECT dc.name, SUM(fs.sales_amount)
FROM fact_sales fs
JOIN dim_customer dc ON fs.customer_id = dc.customer_id
GROUP BY dc.name;
-- Snowflake Schema: Normalized dimension tables
-- More hierarchical structure
CREATE TABLE fact_sales_sf (
sale_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
store_id INT,
date_key INT,
quantity INT,
amount DECIMAL(10,2)
);
-- Normalized dimensions
CREATE TABLE dim_customer_sf (
customer_id INT PRIMARY KEY,
name VARCHAR(100),
city_id INT
);
CREATE TABLE dim_city_sf (
city_id INT PRIMARY KEY,
city_name VARCHAR(100),
region_id INT
);
CREATE TABLE dim_region_sf (
region_id INT PRIMARY KEY,
region_name VARCHAR(100)
);
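-- Query requires more joins (customer -> city -> region) but dimensions store less redundant data
SELECT r.region_name, SUM(f.amount) AS total_amount
FROM fact_sales_sf f
JOIN dim_customer_sf c ON f.customer_id = c.customer_id
JOIN dim_city_sf ci ON c.city_id = ci.city_id
JOIN dim_region_sf r ON ci.region_id = r.region_id
GROUP BY r.region_name;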
Why it matters: Schema choice affects data warehouse performance and maintainability.
Real applications: BI systems, analytics platforms, data marts.
Common mistakes: Over-normalization in data warehouse (should use star), over-denormalization in OLTP.
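The join-count difference between the two schemas shows up when both answer the same question. The sketch below uses Python's built-in sqlite3 module. As simplifying assumptions, it adds a region column to the star customer dimension, shares one small fact table between both designs, and uses invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: region is stored directly on the customer dimension.
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT, region TEXT
);
-- Snowflake: the same facts split into customer -> city -> region.
CREATE TABLE dim_region_sf (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_city_sf (
    city_id INTEGER PRIMARY KEY, city_name TEXT, region_id INTEGER
);
CREATE TABLE dim_customer_sf (
    customer_id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER
);
CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL
);

INSERT INTO dim_customer VALUES
    (1, 'Ana', 'NYC', 'East'), (2, 'Ben', 'Boston', 'East'), (3, 'Cy', 'LA', 'West');
INSERT INTO dim_region_sf VALUES (1, 'East'), (2, 'West');
INSERT INTO dim_city_sf VALUES (1, 'NYC', 1), (2, 'Boston', 1), (3, 'LA', 2);
INSERT INTO dim_customer_sf VALUES (1, 'Ana', 1), (2, 'Ben', 2), (3, 'Cy', 3);
INSERT INTO fact_sales VALUES
    (1, 1, 100.0), (2, 2, 50.0), (3, 3, 70.0), (4, 1, 30.0);
""")

# Star: one join reaches the region.
star = conn.execute("""
    SELECT dc.region, SUM(fs.amount)
    FROM fact_sales fs
    JOIN dim_customer dc ON fs.customer_id = dc.customer_id
    GROUP BY dc.region ORDER BY dc.region
""").fetchall()

# Snowflake: three joins walk the customer -> city -> region hierarchy.
snowflake = conn.execute("""
    SELECT r.region_name, SUM(fs.amount)
    FROM fact_sales fs
    JOIN dim_customer_sf c ON fs.customer_id = c.customer_id
    JOIN dim_city_sf ci ON c.city_id = ci.city_id
    JOIN dim_region_sf r ON ci.region_id = r.region_id
    GROUP BY r.region_name ORDER BY r.region_name
""").fetchall()

print(star)       # [('East', 180.0), ('West', 70.0)]
print(snowflake)  # [('East', 180.0), ('West', 70.0)]
```

Both queries return the same totals. The star version repeats the region name on every customer row, while the snowflake version stores each region name once.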
A functional dependency means the value of one attribute determines the value of another (A → B). Understanding functional dependencies helps identify which normal form a table violates. Systematic analysis prevents design errors.
-- Functional Dependency notation: A → B means A determines B
-- emp_id → emp_name: Each employee has one name
-- dept_id → dept_name: Each department has one name
-- (student_id, course_id) → grade: For each enrollment, one grade
-- Example: Find functional dependencies
CREATE TABLE student_enrollment (
student_id INT,
course_id INT,
semester INT,
grade CHAR(1),
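professor_name VARCHAR(100),
PRIMARY KEY (student_id, course_id, semester)
);
-- Functional dependencies present:
-- (student_id, course_id, semester) → grade (full key)
-- course_id → professor_name (VIOLATION: partial dependency, breaks 2NF)
-- student_id → student_name (would be another partial dependency if the column existed)
-- This violates 2NF, and therefore 3NF, because professor_name depends only on course_id,
-- which is part of the key, not on the full key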
-- Fix: Separate tables
CREATE TABLE courses (
course_id INT PRIMARY KEY,
professor_name VARCHAR(100)
);
CREATE TABLE enrollment (
student_id INT,
course_id INT,
semester INT,
grade CHAR(1),
PRIMARY KEY (student_id, course_id, semester),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
-- Now functional dependencies are clean:
-- (student_id, course_id, semester) → grade
-- course_id → professor_name (in its own table, where course_id is the key)
Why it matters: Functional dependency analysis provides a systematic approach to normalization.
Real applications: Formal design reviews, complex schema analysis.
Common mistakes: Not analyzing dependencies systematically, introducing implicit dependencies.
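A standard tool for this kind of analysis is the attribute closure: the set of all attributes that a given set of attributes determines. The function below is a small sketch of the textbook fixed-point computation, applied to the student_enrollment dependencies. The function name and data layout are this example's own choices.

```python
def closure(attributes, dependencies):
    """Return every attribute functionally determined by `attributes`.

    `dependencies` is a list of (lhs, rhs) pairs of frozensets."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


fds = [
    (frozenset({"student_id", "course_id", "semester"}), frozenset({"grade"})),
    (frozenset({"course_id"}), frozenset({"professor_name"})),
]
all_columns = {"student_id", "course_id", "semester", "grade", "professor_name"}

# The full key determines every column, so it really is a key.
key_closure = closure({"student_id", "course_id", "semester"}, fds)
print(key_closure == all_columns)  # True

# A proper subset of the key already determines a non-key column:
# that is the partial dependency.
partial_closure = closure({"course_id"}, fds)
print(sorted(partial_closure))  # ['course_id', 'professor_name']
```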
Primary keys uniquely identify records and form the basis for dependencies. Foreign keys establish relationships and maintain referential integrity. Both are essential for normalization: they enforce relationships and prevent anomalies.
-- Primary key: Uniquely identifies records
CREATE TABLE employees (
emp_id INT PRIMARY KEY, -- Uniquely identifies each employee
name VARCHAR(100),
dept_id INT
);
-- Foreign key: Establishes relationships
CREATE TABLE departments (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(100)
);
-- emp_id is primary key identifying employee
-- dept_id is primary key of departments table
-- dept_id in employees is foreign key linking to departments
ALTER TABLE employees
ADD FOREIGN KEY (dept_id) REFERENCES departments(dept_id);
-- Normalization enforced by keys:
-- 1. Primary key ensures no duplicate records (1NF requirement)
-- 2. Foreign key establishes proper relationships (reduces redundancy)
-- 3. Keys prevent referential integrity violations
-- Composite primary key
CREATE TABLE course_student (
student_id INT,
course_id INT,
grade CHAR(1),
PRIMARY KEY (student_id, course_id)
);
-- Composite key:
-- student_id identifies student
-- course_id identifies course
-- Together they uniquely identify enrollment record
-- Check key constraints
SHOW KEYS FROM employees;
-- Shows all constraints (\G is the mysql client's vertical-output terminator and replaces the semicolon)
SHOW CREATE TABLE employees\G
Why it matters: Keys are the foundation of normalization and data integrity.
Real applications: Every table needs appropriate keys for consistency.
Common mistakes: Missing foreign keys, unclear composite key purposes.
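What the keys actually enforce can be checked by trying to break them. This sketch uses Python's built-in sqlite3 module rather than MySQL; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, whereas MySQL's InnoDB enforces them by default. The try_insert helper and the sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES departments(dept_id)
);
INSERT INTO departments VALUES (1, 'Engineering');
INSERT INTO employees VALUES (1, 'Ana', 1);
""")


def try_insert(sql, params):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


insert_employee = "INSERT INTO employees VALUES (?, ?, ?)"
orphan = try_insert(insert_employee, (2, "Ben", 99))    # no department 99
duplicate = try_insert(insert_employee, (1, "Cy", 1))   # emp_id 1 is taken
valid = try_insert(insert_employee, (3, "Dee", 1))

print(orphan, duplicate, valid)  # rejected rejected ok
```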
The schema design process involves gathering requirements, identifying entities, determining relationships, creating an entity-relationship (ER) diagram, normalizing tables, and validating the result against anomalies.
-- Step 1: Identify entities
-- Entities: Student, Course, Enrollment, Department, Teacher
-- Step 2: Identify attributes
-- Student: student_id, name, email, phone
-- Course: course_id, course_name, credits
-- Enrollment: enrollment_id, grade, semester
-- Department: dept_id, dept_name
-- Teacher: teacher_id, teacher_name
-- Step 3: Determine relationships
-- Student enrolls in Course (many-to-many)
-- Student belongs to Department (many-to-one)
-- Course taught by Teacher (many-to-one)
-- Course belongs to Department (many-to-one)
-- Step 4: Create tables
CREATE TABLE departments (
dept_id INT PRIMARY KEY AUTO_INCREMENT,
dept_name VARCHAR(100) NOT NULL
);
CREATE TABLE students (
student_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE,
phone VARCHAR(20),
dept_id INT NOT NULL,
FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
CREATE TABLE teachers (
teacher_id INT PRIMARY KEY AUTO_INCREMENT,
teacher_name VARCHAR(100) NOT NULL,
dept_id INT NOT NULL,
FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
CREATE TABLE courses (
course_id INT PRIMARY KEY AUTO_INCREMENT,
course_name VARCHAR(100) NOT NULL,
credits INT NOT NULL,
teacher_id INT NOT NULL,
dept_id INT NOT NULL,
FOREIGN KEY (teacher_id) REFERENCES teachers(teacher_id),
FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
);
CREATE TABLE enrollments (
enrollment_id INT PRIMARY KEY AUTO_INCREMENT,
student_id INT NOT NULL,
course_id INT NOT NULL,
semester VARCHAR(10),
grade CHAR(1),
UNIQUE KEY unique_enrollment (student_id, course_id, semester), -- Allows retaking a course in a later semester
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
-- Step 5: Verify normalization
-- Check 1NF: No repeating groups ✓
-- Check 2NF: No partial dependencies ✓
-- Check 3NF: No transitive dependencies ✓
Why it matters: Systematic design produces robust, maintainable schemas.
Real applications: Every new database project follows this process.
Common mistakes: Skipping requirements gathering, rushing to code without design.
Common failures include insufficient normalization (update anomalies), over-normalization (excessive joins), poor key design, and lack of planning. Learning from mistakes improves future designs.
-- Failure 1: Mixing concerns in single table
-- Causes update anomalies, difficult to maintain
CREATE TABLE bad_design (
id INT PRIMARY KEY,
customer_name VARCHAR(100),
product_name VARCHAR(100), -- Multiple products per customer
supplier_name VARCHAR(100),
order_date DATE
);
-- Problem: To update supplier, must find all orders
-- Solution: Use proper normalization
-- Failure 2: Over-normalization
-- Too many joins slow down queries
CREATE TABLE over_normalized (
id INT,
lookup_id INT,
FOREIGN KEY (lookup_id) REFERENCES lookups(id)
);
-- Problem: Even simple queries require many joins
-- Solution: Balance normalization with performance
-- Failure 3: Missing foreign keys
-- Orphaned records, inconsistent data
CREATE TABLE no_fk (
order_id INT PRIMARY KEY,
customer_id INT -- No foreign key!
);
-- Problem: Can insert non-existent customer_id
-- Solution: Add foreign key constraints
-- Failure 4: Poor naming conventions
-- Difficult to understand schema
CREATE TABLE t1 (
c1 INT, -- What is this?
c2 VARCHAR(100), -- What does it store?
c3 DATE -- Related to what?
);
-- Problem: Unmaintainable schema
-- Solution: Use clear, consistent naming
-- Lessons learned checks:
-- 1. Review requirements thoroughly
-- 2. Use ER diagrams before coding
-- 3. Test queries early
-- 4. Involve DBA in design
-- 5. Document decisions and assumptions
Why it matters: Learning from failures prevents repeating costly mistakes.
Real applications: Legacy systems often suffer from design decisions made years ago.
Common mistakes: Rushing to implementation, not learning from past projects.
The migration process involves analyzing the current schema, designing a normalized target, creating new tables, migrating data with transformations, validating the result, and cutting over. Preserving data integrity and minimizing downtime are critical.
-- Migration steps
-- Step 1: Analyze current (bad) structure
-- Table: employee_projects_bad (1NF violation)
-- Columns: emp_id, name, projects (comma-separated)
-- Step 2: Design normalized structure
CREATE TABLE employees_target (
emp_id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE projects_target (
project_id INT PRIMARY KEY,
project_name VARCHAR(100)
);
CREATE TABLE employee_projects_target (
emp_id INT,
project_id INT,
PRIMARY KEY (emp_id, project_id),
FOREIGN KEY (emp_id) REFERENCES employees_target(emp_id),
FOREIGN KEY (project_id) REFERENCES projects_target(project_id)
);
-- Step 3: Migrate data with transformation
-- Extract employees
INSERT INTO employees_target (emp_id, name)
SELECT DISTINCT emp_id, name FROM employee_projects_bad;
-- Extract and normalize projects
-- Assuming projects are stored comma-separated, e.g. "proj1,proj2,proj3"
-- Splitting them needs application-level parsing or a stored procedure
-- Step 4: Validate data integrity
SELECT COUNT(*) FROM employees_target;
SELECT COUNT(*) FROM projects_target;
SELECT COUNT(*) FROM employee_projects_target;
-- Step 5: Verify no data loss
-- Check original and new row counts align
-- Step 6: Decommission old table
-- DROP TABLE employee_projects_bad; -- After validation
-- Cutover strategy: Minimal downtime migration
-- 1. Create new schema in parallel
-- 2. Set up replication/sync process
-- 3. Validate continuously during sync
-- 4. Switch connections to new schema
-- 5. Monitor for issues
-- 6. Remove old schema after validation period
Why it matters: Safe migration preserves data while improving design.
Real applications: Legacy system modernization, database refactoring.
Common mistakes: Not testing migration thoroughly, inadequate data validation, too-tight cutover window.
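The application-level parsing step in the migration might look like the following sketch. It uses Python's built-in sqlite3 module with invented sample rows. As assumptions of this example, projects_target gets a UNIQUE constraint on project_name so each project is created once, and project ids are auto-assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_projects_bad (
    emp_id INTEGER PRIMARY KEY, name TEXT, projects TEXT
);
INSERT INTO employee_projects_bad VALUES
    (1, 'Ana', 'Apollo, Zephyr'),
    (2, 'Ben', 'Zephyr'),
    (3, 'Cy', '');
CREATE TABLE employees_target (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE projects_target (
    project_id INTEGER PRIMARY KEY, project_name TEXT UNIQUE
);
CREATE TABLE employee_projects_target (
    emp_id INTEGER REFERENCES employees_target(emp_id),
    project_id INTEGER REFERENCES projects_target(project_id),
    PRIMARY KEY (emp_id, project_id)
);
""")

with conn:  # one transaction: all rows migrate or none do
    source_rows = conn.execute(
        "SELECT emp_id, name, projects FROM employee_projects_bad"
    ).fetchall()
    for emp_id, name, projects in source_rows:
        conn.execute("INSERT INTO employees_target VALUES (?, ?)", (emp_id, name))
        # Split the list, trim whitespace, drop empties and duplicates.
        names = sorted({p.strip() for p in (projects or "").split(",") if p.strip()})
        for project_name in names:
            conn.execute(
                "INSERT OR IGNORE INTO projects_target (project_name) VALUES (?)",
                (project_name,),
            )
            project_id = conn.execute(
                "SELECT project_id FROM projects_target WHERE project_name = ?",
                (project_name,),
            ).fetchone()[0]
            conn.execute(
                "INSERT INTO employee_projects_target VALUES (?, ?)",
                (emp_id, project_id),
            )

# Validation: compare counts against what the source data implies.
employee_count = conn.execute("SELECT COUNT(*) FROM employees_target").fetchone()[0]
project_count = conn.execute("SELECT COUNT(*) FROM projects_target").fetchone()[0]
link_count = conn.execute("SELECT COUNT(*) FROM employee_projects_target").fetchone()[0]
print(employee_count, project_count, link_count)  # 3 2 3
```

Employee 3 has no projects and still appears in employees_target, which is exactly the case a naive split-and-join migration tends to drop.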