Database normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them.
- Eliminate Data Redundancy: Reduces duplicate data storage
- Improve Data Integrity: Prevents inconsistent data
- Reduce Storage Space: More efficient data storage
- Easier Maintenance: Updates in one place affect the entire database
Data stored in a single table with repeating groups and no structure.
Problems:
- Data redundancy
- Update anomalies
- Insert anomalies
- Delete anomalies
Rules:
- Each column contains atomic (indivisible) values
- No repeating groups
- Each row is unique
Rules:
- Must be in 1NF
- All non-key attributes must be fully dependent on the primary key
- Eliminates partial dependencies
Rules:
- Must be in 2NF
- No transitive dependencies
- Non-key attributes should not depend on other non-key attributes
CREATE DATABASE students;
USE students;-- Unnormalized Data
CREATE TABLE students_raw(
student_id INT,
student_name VARCHAR(30),
course1 VARCHAR(50),
course2 VARCHAR(50),
course3 VARCHAR(50)
);
INSERT INTO students_raw (student_id, student_name, course1, course2, course3)
VALUES
(1, "rana", "java", "python", "javascript"),
(2, "buddha", "java", "c", "jython"),
(3, "rahul", "java", "c", "jython");
SELECT * FROM students_raw;Problems with Unnormalized Form:
- Repeating groups (course1, course2, course3)
- Wasted space if student takes fewer courses
- Difficult to add more courses
-- 1NF: Remove repeating groups
CREATE TABLE student_1nf (
student_id INT,
student_name VARCHAR(30),
course VARCHAR(30)
);
INSERT INTO student_1nf (student_id, student_name, course) VALUES
(1, 'rana', 'java'),
(1, 'rana', 'python'),
(1, 'rana', 'javascript'),
(2, 'buddha', 'java'),
(2, 'buddha', 'c'),
(2, 'buddha', 'jython'),
(3, 'rahul', 'java'),
(3, 'rahul', 'c'),
(3, 'rahul', 'jython');
SELECT * FROM student_1nf;Improvements:
- Atomic values in each column
- No repeating groups
- Flexible number of courses per student
Remaining Problems:
- Data redundancy (student names repeated)
- Update anomalies
-- 2NF: Remove partial dependencies
CREATE TABLE students(
student_id INT PRIMARY KEY,
student_name VARCHAR(30)
);
CREATE TABLE students_course(
student_id INT,
course VARCHAR(30),
PRIMARY KEY (student_id, course),
FOREIGN KEY (student_id) REFERENCES students(student_id)
);
INSERT INTO students (student_id, student_name) VALUES
(1, 'rana'),
(2, 'buddha'),
(3, 'rahul');
INSERT INTO students_course (student_id, course) VALUES
(1, 'java'),
(1, 'python'),
(1, 'javascript'),
(2, 'java'),
(2, 'c'),
(2, 'jython'),
(3, 'java'),
(3, 'c'),
(3, 'jython');
SELECT * FROM students;
SELECT * FROM students_course;Improvements:
- Eliminated partial dependencies
- Student names stored only once
- Separate tables for different entities
-- 3NF: Remove transitive dependencies
CREATE TABLE course(
course_id INT PRIMARY KEY,
course_name VARCHAR(30),
instructor VARCHAR(30)
);
-- Recreate students_course with course_id reference
DROP TABLE students_course;
CREATE TABLE students_course(
student_id INT,
course_id INT,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES course(course_id)
);
INSERT INTO course (course_id, course_name, instructor) VALUES
(101, 'java', 'Navin Sir'),
(102, 'python', 'Hitesh Sir'),
(103, 'javascript', 'Pyuish Sir'),
(104, 'c', 'Sourabh Sir'),
(105, 'jython', 'Avishek Sir');
INSERT INTO students_course (student_id, course_id) VALUES
(1, 101),
(1, 102),
(1, 103),
(2, 101),
(2, 104),
(2, 105),
(3, 101),
(3, 104),
(3, 105);
SELECT * FROM students;
SELECT * FROM course;
SELECT * FROM students_course;-- Join all tables to get complete information
SELECT s.student_id, s.student_name, c.course_id, c.course_name, c.instructor
FROM students s
INNER JOIN students_course sc ON s.student_id = sc.student_id
INNER JOIN course c ON sc.course_id = c.course_id;student_id(Primary Key)student_name
course_id(Primary Key)course_nameinstructor
student_id(Foreign Key)course_id(Foreign Key)- Primary Key: (
student_id,course_id)
✅ No Data Redundancy: Each piece of information stored once
✅ Data Integrity: Consistent data across tables
✅ Flexibility: Easy to add new students, courses, or instructors
✅ Maintainability: Updates in one place reflect everywhere
✅ Storage Efficiency: Minimal wasted space
- Normalization is a step-by-step process
- Each normal form builds upon the previous one
- 3NF is usually sufficient for most applications
- Higher normal forms exist (BCNF, 4NF, 5NF) for complex scenarios
- Balance normalization with query performance needs