Skip to content

Latest commit

Β 

History

History
120 lines (77 loc) Β· 6.04 KB

File metadata and controls

120 lines (77 loc) Β· 6.04 KB

πŸ”— Topic 1.4: Introduction to RDBMS (Relational Database Management System)

In 1970, a legendary paper titled "A Relational Model of Data for Large Shared Data Banks" by IBM researcher Dr. Edgar F. Codd changed computer science forever. Let's look exactly at the mathematical model that runs the massive RDBMS engines of today.


1. Definition

An RDBMS (Relational Database Management System) is a database engine strictly compliant with Codd's Relational Model.

Technical Definition: Data is represented strictly as mathematical Relations (2D Tables/Grids of rows and columns). Operations on this data rely heavily on Relational Calculus and Relational Algebra, executed via SQL, allowing dynamic linking (Joins) between different relations using matching identifiers (Keys).

(MySQL, PostgreSQL, Oracle, and MS SQL Server are all RDBMS engines.)


2. Why This Concept Exists

Before the RDBMS, early databases (Network and Hierarchical models) required developers to write specific C or Assembly code dictating the exact physical path on the hard drive to find related data. If the server administrator moved a file, the entire application's code broke.

Dr. Codd realized he could separate the "Physical layer" from the "Logical layer." You just specify what data you want conceptually (e.g., User ID 10), and the RDBMS's internal optimizer automatically calculates the physical magnetic disk sectors to search.


3. Why We Use It

  1. Data Independence: The application code is completely insulated from physical storage changes, memory management, or indexing strategies.
  2. Referential Integrity Constraints: The RDBMS guarantees mathematically pure data. If an Order relies on a User, it physically blocks anyone from deleting that User unless the Order is handled first (ON DELETE RESTRICT or CASCADE).
  3. Normalization: By breaking data down into specific relational entities (1st, 2nd, and 3rd Normal Forms), we entirely eliminate data redundancy, ensuring updates only process in one place, saving massive I/O overhead.
  4. ACID Transactions: Atomicity, Consistency, Isolation, and Durability. The bedrock of financial data integrity.

4. When to Use It

  • The Industry Standard: If the domains of your logic strongly overlap (e.g., A University has Classes, Professors, Students, and Grades), the relational math of an RDBMS will handle this complex web of queries exponentially better than anything else.
  • Exception: Do not use it for strictly unstructured BLOB logs or ultra-massive time-series sensor ingestion where absolute schema enforcement becomes a bottleneck.

5. How It Works

An RDBMS is built completely around SET THEORY mathematics:

  1. Relation (Table): A set of tuples showcasing an entity.
  2. Tuple (Row/Record): A single ordered list of elements.
  3. Attribute (Column/Field): The domain of allowed values.

The Engine: When you send an SQL Command, the RDBMS uses a Cost-Based Optimizer. It analyzes different mathematical permutations (Should I scan Table A then Table B? Or Table B then A?) and picks the strategy requiring the lowest CPU/Disk I/O cost to merge them.


6. Syntax / Implementation (The Core)

Here is the exact implementation of Referential Integrity.

Table 1: User_Table (Parent)

ID (Primary Key)| Name   | City
1               | John   | New York

Table 2: Car_Table (Child)

Car_ID | Brand  | Owner_ID (Foreign Key)
99     | Toyota | 1

The FOREIGN KEY physically chains them. A hacker cannot insert a car assigned to Owner_ID = 9 because user 9 does not exist. The RDBMS Engine throws an immediate algorithmic exception.


7. Real-Life Examples

The Modern HR System:

  • You don't type a Department's "Full Address" inside the Employee table. If the company moves headquarters, you'd have to physically update 10,000 employee rows (Massive Disk I/O cost, risky locks).
  • In an RDBMS, you put a basic Dept_ID in the Employee table, and maintain one single row in the Departments table. If the company moves, you update exactly 1 single cell. All 10,000 employees instantly automatically query the new address via JOIN.

8. SQL Examples (MySQL)

This is the SQL implementation of Codd's rule that data must be strictly defined and relationally bound:

-- Step 1: Create the Parent Relation
CREATE TABLE Users (
    User_ID INT PRIMARY KEY,
    Name VARCHAR(50) NOT NULL
);

-- Step 2: Create the Child Relation with strict CASCADE behavior
CREATE TABLE Orders (
    Order_ID INT PRIMARY KEY,
    User_ID INT,
    FOREIGN KEY (User_ID) REFERENCES Users(User_ID) 
    ON DELETE CASCADE  -- Pro trick: If User 1 is deleted, the RDBMS deletes their orders automatically!
);

9. Common Mistakes

  • Denormalizing Too Early: Beginners put everything into one flat table, or separate tables but get scared of JOIN commands and revert back. A well-indexed JOIN of properly normalized tables in an RDBMS is lightning fast; trust the Cost-Based Optimizer.
  • Ignoring Database Logic: Moving logic out of the RDBMS into Application code (Python). Trying to manually loop through lists in Python to match records instead of using a JOIN. The C++ engine written for MySQL is 1000x faster than your Python loop. Let the database do its job.

10. Tips & Best Practices

  • Entity-Relationship (ER) Diagramming: The mark of a senior engineer is drawing ER diagrams (identifying 1-to-1, 1-to-Many, Many-to-Many relationships) before a single CREATE TABLE is typed.
  • Indexing Foreign Keys: MySQL automatically indexes Primary Keys, but not all RDBMS systems automatically index Foreign keys. Always double-check and index Foreign Keys manually to drastically speed up your JOINS.

11. Mini Practice Tasks

  • Task 1: Look up "Codd's 12 Rules for Relational Databases". Keep in mind the strict academic foundation.
  • Task 2: Write down what the ON DELETE CASCADE constraint does in the SQL example. Why is that incredibly useful for enterprise software safety?