Skip to content

Latest commit

 

History

History
228 lines (189 loc) · 5.51 KB

File metadata and controls

228 lines (189 loc) · 5.51 KB

Database Normalization

What is Normalization?

Database normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves dividing large tables into smaller, related tables and defining relationships between them.

Why Normalize?

  • Eliminate Data Redundancy: Reduces duplicate data storage
  • Improve Data Integrity: Prevents inconsistent data
  • Reduce Storage Space: More efficient data storage
  • Easier Maintenance: Updates in one place affect the entire database

Normal Forms

Unnormalized Form (0NF)

Data stored in a single table with repeating groups and no structure.

Problems:

  • Data redundancy
  • Update anomalies
  • Insert anomalies
  • Delete anomalies

First Normal Form (1NF)

Rules:

  • Each column contains atomic (indivisible) values
  • No repeating groups
  • Each row is unique

Second Normal Form (2NF)

Rules:

  • Must be in 1NF
  • All non-key attributes must be fully dependent on the primary key
  • Eliminates partial dependencies

Third Normal Form (3NF)

Rules:

  • Must be in 2NF
  • No transitive dependencies
  • Non-key attributes should not depend on other non-key attributes

Practical Example: Student Database

Step 1: Create Database

CREATE DATABASE students;
USE students;

Step 2: Unnormalized Data (0NF)

-- Unnormalized Data
CREATE TABLE students_raw(
    student_id INT,
    student_name VARCHAR(30),
    course1 VARCHAR(50),
    course2 VARCHAR(50),
    course3 VARCHAR(50)
);

INSERT INTO students_raw (student_id, student_name, course1, course2, course3)
VALUES
(1, "rana", "java", "python", "javascript"),
(2, "buddha", "java", "c", "jython"),
(3, "rahul", "java", "c", "jython");

SELECT * FROM students_raw;

Problems with Unnormalized Form:

  • Repeating groups (course1, course2, course3)
  • Wasted space if student takes fewer courses
  • Difficult to add more courses

Step 3: First Normal Form (1NF)

-- 1NF: Remove repeating groups
CREATE TABLE student_1nf (
    student_id INT,
    student_name VARCHAR(30),
    course VARCHAR(30)
);

INSERT INTO student_1nf (student_id, student_name, course) VALUES
(1, 'rana', 'java'),
(1, 'rana', 'python'),
(1, 'rana', 'javascript'),
(2, 'buddha', 'java'),
(2, 'buddha', 'c'),
(2, 'buddha', 'jython'),
(3, 'rahul', 'java'),
(3, 'rahul', 'c'),
(3, 'rahul', 'jython');

SELECT * FROM student_1nf;

Improvements:

  • Atomic values in each column
  • No repeating groups
  • Flexible number of courses per student

Remaining Problems:

  • Data redundancy (student names repeated)
  • Update anomalies

Step 4: Second Normal Form (2NF)

-- 2NF: Remove partial dependencies
CREATE TABLE students(
    student_id INT PRIMARY KEY,
    student_name VARCHAR(30)
);

CREATE TABLE students_course(
    student_id INT,
    course VARCHAR(30),
    PRIMARY KEY (student_id, course),
    FOREIGN KEY (student_id) REFERENCES students(student_id)
);

INSERT INTO students (student_id, student_name) VALUES
(1, 'rana'),
(2, 'buddha'),
(3, 'rahul');

INSERT INTO students_course (student_id, course) VALUES
(1, 'java'),
(1, 'python'),
(1, 'javascript'),
(2, 'java'),
(2, 'c'),
(2, 'jython'),
(3, 'java'),
(3, 'c'),
(3, 'jython');

SELECT * FROM students;
SELECT * FROM students_course;

Improvements:

  • Eliminated partial dependencies
  • Student names stored only once
  • Separate tables for different entities

Step 5: Third Normal Form (3NF)

-- 3NF: Remove transitive dependencies
CREATE TABLE course(
    course_id INT PRIMARY KEY,	
    course_name VARCHAR(30),
    instructor VARCHAR(30)
);

-- Recreate students_course with course_id reference
DROP TABLE students_course;

CREATE TABLE students_course(
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES course(course_id)
);

INSERT INTO course (course_id, course_name, instructor) VALUES
(101, 'java', 'Navin Sir'),
(102, 'python', 'Hitesh Sir'),
(103, 'javascript', 'Pyuish Sir'),
(104, 'c', 'Sourabh Sir'),
(105, 'jython', 'Avishek Sir');

INSERT INTO students_course (student_id, course_id) VALUES 
(1, 101),
(1, 102),
(1, 103),
(2, 101),
(2, 104),
(2, 105),
(3, 101),
(3, 104),
(3, 105);

SELECT * FROM students;
SELECT * FROM course;
SELECT * FROM students_course;

Step 6: Query Normalized Data

-- Join all tables to get complete information
SELECT s.student_id, s.student_name, c.course_id, c.course_name, c.instructor
FROM students s
INNER JOIN students_course sc ON s.student_id = sc.student_id
INNER JOIN course c ON sc.course_id = c.course_id;

Final Database Structure (3NF)

Table: students

  • student_id (Primary Key)
  • student_name

Table: course

  • course_id (Primary Key)
  • course_name
  • instructor

Table: students_course (Junction Table)

  • student_id (Foreign Key)
  • course_id (Foreign Key)
  • Primary Key: (student_id, course_id)

Benefits Achieved

No Data Redundancy: Each piece of information stored once
Data Integrity: Consistent data across tables
Flexibility: Easy to add new students, courses, or instructors
Maintainability: Updates in one place reflect everywhere
Storage Efficiency: Minimal wasted space

Key Takeaways

  • Normalization is a step-by-step process
  • Each normal form builds upon the previous one
  • 3NF is usually sufficient for most applications
  • Higher normal forms exist (BCNF, 4NF, 5NF) for complex scenarios
  • Balance normalization with query performance needs