"SQL is the lingua franca of data. If you speak it fluently, every database in the world becomes your playground."
Welcome. Whether you're preparing for your first data analyst role or a senior backend engineering position at a FAANG company, this guide is designed to be the only SQL resource you need. We go beyond surface-level definitions — you'll find ASCII diagrams, real-world analogies, production-grade best practices, and 115 battle-tested Q&As.
What Makes This Guide Different?
- 🔍 Visual Learning: Every JOIN, index, and transaction is explained with ASCII art.
- 🏗️ Analogies: Complex concepts explained using everyday things (libraries, bank transfers, Venn diagrams).
- ⚡ Advanced Topics: Window functions, CTEs, execution plans, query optimization, and isolation levels with dirty-read examples.
- 🎯 Interview-First: Every question is something that has actually been asked in real interviews.
Assumption: Examples use MySQL/PostgreSQL syntax unless noted. Differences between MySQL, PostgreSQL, SQL Server, and Oracle are highlighted where relevant.
- Section 1: SQL Fundamentals
- Section 2: Querying Data Like a Pro
- Section 3: JOINs — The Heart of Relational Databases
- Section 4: Subqueries, CTEs & Window Functions
- Section 5: Data Definition & Manipulation (DDL, DML, DCL)
- Section 6: Indexes, Privileges & Security
- Section 7: Transactions, ACID & Concurrency
- Section 8: Views, Stored Procedures, Triggers & Cursors
- Section 9: Query Optimization & Execution Plans
- Section 10: Database Design & Normalization
SQL (Structured Query Language) is a declarative language — you describe what you want, and the database engine figures out how to get it. Think of it like ordering at a restaurant: you say "Bring me a medium-rare steak," not "Go to the fridge, take out the meat, heat the pan to 200°C..."
┌─────────────────────────────────────────────────────────────┐
│ IMPERATIVE (C, Python) DECLARATIVE (SQL) │
│ │
│ for each row: SELECT name, salary │
│ if salary > 50000: FROM employees │
│ print(name) WHERE salary > 50000; │
└─────────────────────────────────────────────────────────────┘
SQL was born at IBM in the 1970s (originally called SEQUEL) and is now an ANSI/ISO standard. However, every database vendor adds their own spice:
| Feature | MySQL | PostgreSQL | SQL Server | Oracle |
|---|---|---|---|---|
| Limit rows | LIMIT 10 |
LIMIT 10 |
TOP 10 |
FETCH FIRST 10 |
| Boolean | TINYINT(1) |
BOOLEAN |
BIT |
NUMBER(1) |
| Current date | CURDATE() |
CURRENT_DATE |
GETDATE() |
SYSDATE |
| String concat | CONCAT() |
|| |
+ |
|| |
Imagine a company. There are people who build the office (DDL), people who work in it (DML), people who ask questions (DQL), security guards who control access (DCL), and managers who approve or cancel decisions (TCL).
┌─────────────────────────────────────────────────────────────┐
│ SQL COMMAND FAMILIES │
├──────────┬──────────────────────────────────────────────────┤
│ DDL │ Data Definition Language │
│ │ CREATE, ALTER, DROP, TRUNCATE, RENAME │
│ │ → "Builds the structure" │
├──────────┼──────────────────────────────────────────────────┤
│ DML │ Data Manipulation Language │
│ │ INSERT, UPDATE, DELETE │
│ │ → "Changes the data" │
├──────────┼──────────────────────────────────────────────────┤
│ DQL │ Data Query Language │
│ │ SELECT │
│ │ → "Reads the data" │
├──────────┼──────────────────────────────────────────────────┤
│ DCL │ Data Control Language │
│ │ GRANT, REVOKE │
│ │ → "Controls permissions" │
├──────────┼──────────────────────────────────────────────────┤
│ TCL │ Transaction Control Language │
│ │ COMMIT, ROLLBACK, SAVEPOINT │
│ │ → "Manages transactions" │
└──────────┴──────────────────────────────────────────────────┘
Choosing the right data type is like choosing the right container — you don't store a sip of water in a swimming pool, and you don't store an ocean in a teacup.
| Category | Types | When to Use |
|---|---|---|
| Integer | TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT |
Whole numbers. Use the smallest that fits your range. |
| Decimal | FLOAT, DOUBLE, DECIMAL(p,s) |
Money = always DECIMAL. Never use FLOAT for currency! |
| String | CHAR(n), VARCHAR(n), TEXT |
CHAR for fixed length (e.g., country codes). VARCHAR for variable. TEXT for essays. |
| Date/Time | DATE, TIME, DATETIME, TIMESTAMP |
TIMESTAMP is UTC-aware and smaller. DATETIME for historical dates. |
| Binary | BLOB, BINARY, VARBINARY |
Images, files, encrypted data. |
| JSON | JSON (MySQL 5.7+), JSONB (PostgreSQL) |
Semi-structured data, config, logs. PostgreSQL's JSONB is indexed & faster. |
The Money Rule:
-- ❌ BAD: Float precision errors (0.1 + 0.2 ≠ 0.3)
CREATE TABLE bad_products (price FLOAT);
-- ✅ GOOD: Exact precision
CREATE TABLE good_products (price DECIMAL(10,2)); -- 99999999.99 max- 🎯 Never use
SELECT *in production. It breaks when columns are added, transfers unnecessary data, and prevents covering indexes. - 🎯 Always specify columns in INSERT. It makes your code robust against schema changes.
- 🎯 Use
COALESCE()instead ofIFNULL()— it's ANSI standard and works across databases. - 🎯 Be explicit about NULLs.
NULLis not0, not"", notFALSE. It is the "unknown."
1. SQL was developed as an integral part of...
Relational Database Management Systems (RDBMS). SQL is the ANSI-standard language for creating, querying, updating, and managing relational databases. Major implementations include MySQL, PostgreSQL, SQL Server, Oracle, and SQLite.
Real-world context: Every web application you use — Instagram, Netflix, Uber — stores its data in SQL databases (often sharded or replicated, but SQL at the core).
2. What does SQL stand for?
Structured Query Language. Originally called SEQUEL at IBM in the 1970s.
3. What type of language is SQL?
Declarative (non-procedural). You state what result you want, not how to compute it. The database's query optimizer decides the "how" — which indexes to use, which join order is fastest, etc.
Compare with PL/SQL (Oracle) or T-SQL (SQL Server), which are procedural extensions that add loops, variables, and conditional logic.
4. Which is NOT a valid SQL command: SELECT, REMOVE, UPDATE, INSERT?
REMOVE is not valid. To delete rows, use
DELETE.DROPremoves objects (tables, databases).
Command Purpose SELECT Read data INSERT Add new rows UPDATE Modify existing rows DELETE Remove rows CREATE/ALTER/DROP Manage schema objects
5. What is MySQL, and how does it differ from SQL?
SQL is the language (like English). MySQL is a database system that speaks SQL (like a person who speaks English).
Analogy: SQL is the recipe; MySQL is the kitchen that follows it.
6. What is PL/SQL, and how is it different from SQL?
PL/SQL = Procedural Language extensions to SQL (Oracle-specific). It adds variables, loops, IF-ELSE, and exception handling.
SQL PL/SQL Single statements Blocks of code No variables Variables & constants No loops FOR, WHILE loops Data-oriented Application-oriented Can embed in PL/SQL Cannot embed SQL inside itself
7. What are the possible values for a BOOLEAN field in MySQL?
MySQL doesn't have a native
BOOLEANtype. It usesTINYINT(1)where0= false,1= true, and any non-zero value is treated as true in conditions.CREATE TABLE users (is_active TINYINT(1) DEFAULT 1);PostgreSQL, however, has a true
BOOLEANtype acceptingTRUE,FALSE, andNULL.
8. Which data type for distance rounded to the nearest mile?
INTEGER(orINT). If you need fractional miles later, useDECIMAL(8,2).
9. Which are valid SQL keywords?
SELECT,FROM,WHERE,GROUP BY,HAVING,ORDER BY,JOIN,INSERT,UPDATE,DELETE,CREATE,ALTER,DROP,INDEX,UNION,LIMIT,DISTINCT,AS,AND,OR,NOT,IN,BETWEEN,LIKE,IS NULL,EXISTS,CASE,WHEN,THEN,ELSE,END.
10. Which are valid SQL comments?
-- Single-line comment (standard) /* Multi-line comment */ # Hash comment (MySQL only)
When you run a SELECT, the database doesn't execute clauses in the order you write them. Understanding the logical execution order is crucial — it explains why you can't reference an alias in WHERE but can in ORDER BY.
┌──────────────────────────────────────────────────────────────────┐
│ LOGICAL EXECUTION ORDER OF SELECT │
├────────┬─────────────────────────────────────────────────────────┤
│ 1 │ FROM + JOINs → Determine source tables │
│ 2 │ WHERE → Filter rows │
│ 3 │ GROUP BY → Aggregate into buckets │
│ 4 │ HAVING → Filter groups │
│ 5 │ SELECT → Compute columns & aliases │
│ 6 │ DISTINCT → Remove duplicates │
│ 7 │ ORDER BY → Sort results │
│ 8 │ LIMIT / OFFSET → Paginate │
└────────┴─────────────────────────────────────────────────────────┘
This is why this fails:
-- ❌ ERROR: alias "full_name" doesn't exist yet at step 2 (WHERE)
SELECT first_name || ' ' || last_name AS full_name
FROM employees
WHERE full_name = 'John Doe';
-- ✅ FIX: Repeat the expression or use a subquery/CTE
SELECT first_name || ' ' || last_name AS full_name
FROM employees
WHERE first_name || ' ' || last_name = 'John Doe';-- Comparison operators
WHERE salary > 50000
WHERE age BETWEEN 18 AND 65 -- Inclusive on both ends
WHERE department IN ('Engineering', 'Product', 'Design')
WHERE name LIKE 'A%' -- Starts with A
WHERE name LIKE '%son' -- Ends with son
WHERE name LIKE '_a%' -- Second letter is 'a'
WHERE name LIKE '%\_%' ESCAPE '\' -- Contains literal underscore
WHERE email IS NULL -- Missing email (NOT = NULL!)
WHERE email IS NOT NULLThe NULL Trap:
-- 🚨 WRONG: This returns NOTHING, not even rows where manager_id IS NULL
SELECT * FROM employees WHERE manager_id = NULL;
-- ✅ CORRECT:
SELECT * FROM employees WHERE manager_id IS NULL;
-- COALESCE gives you a fallback value:
SELECT name, COALESCE(phone, 'No phone') AS contact FROM users;
-- IFNULL is MySQL-specific; COALESCE is standard and accepts multiple args:
SELECT COALESCE(mobile, home_phone, work_phone, 'No phone') FROM contacts;-- Multi-column sort
SELECT * FROM products
ORDER BY category ASC, price DESC;
-- Pagination (Page 3, 20 items per page)
-- MySQL / PostgreSQL
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 40;
-- SQL Server
SELECT * FROM products ORDER BY id OFFSET 40 ROWS FETCH NEXT 20 ROWS ONLY;
-- Offset gets slower as page number grows!
-- For large tables, use keyset pagination:
SELECT * FROM products WHERE id > 1000 ORDER BY id LIMIT 20;Aggregate functions collapse multiple rows into one summary value:
| Function | Purpose | NULL Handling |
|---|---|---|
COUNT(*) |
Count all rows | Counts NULLs too |
COUNT(col) |
Count non-NULL values | Ignores NULLs |
SUM(col) |
Total | Ignores NULLs |
AVG(col) |
Average | Ignores NULLs |
MIN(col) / MAX(col) |
Smallest / Largest | Ignores NULLs |
GROUP_CONCAT() / STRING_AGG() |
Concatenate strings | Varies by DB |
-- Find departments with average salary > 60k, sorted
SELECT
department,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary,
MAX(salary) AS top_salary
FROM employees
WHERE hire_date >= '2020-01-01' -- Filter ROWS first
GROUP BY department
HAVING AVG(salary) > 60000 -- Filter GROUPS after aggregation
ORDER BY avg_salary DESC;WHERE vs HAVING — The Critical Difference:
┌─────────────────────────────────────────────────────────────────┐
│ WHERE filters ROWS (before grouping) │
│ HAVING filters GROUPS (after aggregation) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Raw Rows → [WHERE] → Filtered Rows → [GROUP BY] → Groups │
│ ↓ │
│ Result ← [SELECT] ← [HAVING] ← Filtered Groups │
│ │
└─────────────────────────────────────────────────────────────────┘
11. Which SQL statement extracts data?
SELECT. Optional clauses:WHERE,GROUP BY,HAVING,ORDER BY,LIMIT,JOIN.SELECT name AS full_name, salary * 12 AS annual_salary FROM employees WHERE department = 'Engineering' ORDER BY annual_salary DESC;
12. How to select all records from table Products?
SELECT * FROM products;
⚠️ Interview tip: Always follow up with "But in production, I list explicit columns to reduce network overhead and prevent breaking if columns are reordered or added."
13. Can we rename a column in output?
Yes, using
AS(alias):SELECT first_name AS fname, last_name AS lname FROM customers; -- AS is optional but recommended for readability: SELECT first_name fname FROM customers;
14. How to select FirstName from Customers?
SELECT FirstName FROM Customers;
15. What does the FROM clause do?
Specifies the source table(s). Can include subqueries, JOINs, or even multiple tables (implicit cross join, but avoid this).
16. Which statement returns only different values?
SELECT DISTINCT. It operates on the entire selected row.-- Returns unique city-state combinations SELECT DISTINCT city, state FROM addresses;
17. Display distinct cities in ADDRESSES(id, street_name, number, city, state)?
SELECT DISTINCT city FROM addresses;
18. Select all records where FirstName is "John"?
SELECT * FROM Customers WHERE FirstName = 'John';
19. OR vs AND?
ANDrequires ALL conditions to be true.ORrequires ANY to be true. Use parentheses to control precedence —ANDbinds tighter thanOR.-- Without parentheses, this is parsed as: -- WHERE (age > 18 AND city = 'NY') OR status = 'active' SELECT * FROM customers WHERE age > 18 AND (city = 'NY' OR city = 'LA');
20. Which comparison operators exist?
=,!=or<>,>,<,>=,<=. Some DBs support<=>(NULL-safe equal in MySQL).
21. Select records where FirstName="John" AND LastName="Jackson"?
SELECT * FROM Customers WHERE FirstName = 'John' AND LastName = 'Jackson';
22. How to select 10 random rows?
-- MySQL SELECT * FROM tbl ORDER BY RAND() LIMIT 10; -- PostgreSQL SELECT * FROM tbl ORDER BY RANDOM() LIMIT 10; -- SQL Server SELECT TOP 10 * FROM tbl ORDER BY NEWID();Performance note: This is O(n log n) and full table scan. For large tables, use a random ID approach.
23. What does ISNULL(price, 50) return if price is NULL?
50. The first non-NULL value is returned. Cross-database version:COALESCE(price, 50).
24. Which operator searches text patterns?
LIKEwith wildcards%(zero or more chars) and_(single char).WHERE name LIKE 'A%' -- starts with A WHERE name LIKE '%a%' -- contains a WHERE name LIKE '_a%' -- second char is a WHERE name LIKE '___' -- exactly 3 characters
25. Details of students whose FirstName starts with 'K'?
SELECT * FROM Students WHERE FirstName LIKE 'K%';
26. Which operator selects values in a range?
BETWEEN(inclusive):SELECT * FROM sales WHERE amount BETWEEN 100 AND 500; -- Equivalent to: amount >= 100 AND amount <= 500
27. Correct syntax for date range?
SELECT * FROM Sales WHERE Date BETWEEN '2017-12-01' AND '2018-01-01';Use ISO-8601 format (
YYYY-MM-DD) to avoid locale issues.
28. LastName alphabetically between "Brooks" and "Gray"?
SELECT * FROM Customers WHERE LastName BETWEEN 'Brooks' AND 'Gray';
29. What does the IN keyword do?
Checks if a value matches any in a list. It's a shorthand for multiple ORs and often more readable.
-- These are equivalent: WHERE state IN ('CA', 'NY', 'TX') WHERE state = 'CA' OR state = 'NY' OR state = 'TX'
INcan also contain a subquery:WHERE id IN (SELECT customer_id FROM orders).
30. What does UPPER do?
Converts to uppercase.
LOWERto lowercase.INITCAP(Oracle/PostgreSQL) capitalizes first letter of each word.SELECT UPPER('hello'); -- HELLO
31. Round up to the nearest integer?
CEILING()orCEIL():SELECT CEIL(25.01); -- 26 SELECT FLOOR(25.99); -- 25 SELECT ROUND(25.5); -- 26
32. Current date in MySQL without time?
SELECT CURDATE(); -- MySQL SELECT CURRENT_DATE; -- Standard / PostgreSQL
33. Keyword to retrieve maximum value?
MAX()aggregate function.SELECT MAX(salary) FROM employees;
34. Function to count results?
COUNT().COUNT(*)counts rows.COUNT(column)counts non-NULL values.SELECT COUNT(*) FROM orders; -- Total orders SELECT COUNT(shipped_date) FROM orders; -- Only shipped orders
35. Which are aggregate functions?
COUNT,SUM,AVG,MIN,MAX. Some DBs also haveSTDDEV,VARIANCE,GROUP_CONCAT/STRING_AGG.
36. Return number of records in Customers?
SELECT COUNT(*) FROM Customers;
37. Sorting direction keywords?
ASC(ascending, default) andDESC(descending).ORDER BY price DESC, name ASC;
38. Default sort order if not specified?
ASC.
39. Top 3 students by mark?
-- MySQL / PostgreSQL SELECT name, mark FROM Students ORDER BY mark DESC LIMIT 3; -- SQL Server SELECT TOP 3 name, mark FROM Students ORDER BY mark DESC; -- Oracle SELECT name, mark FROM Students WHERE ROWNUM <= 3 ORDER BY mark DESC;
40. Sort Customers descending by FirstName?
SELECT * FROM Customers ORDER BY FirstName DESC;
41. Correct GROUP BY syntax?
SELECT name, COUNT(name) FROM customers GROUP BY name;In standard SQL (and MySQL 5.7+ with
ONLY_FULL_GROUP_BY), any non-aggregated column in SELECT must appear in GROUP BY.
42. WHERE vs HAVING?
WHERE HAVING Filters rows Filters groups Executes before GROUP BY Executes after GROUP BY Cannot use aggregates Can use COUNT(),SUM(), etc.Can reference any column Can only reference grouped columns or aggregates SELECT department, AVG(salary) AS avg_sal FROM employees WHERE hire_date > '2020-01-01' -- Filter rows first GROUP BY department HAVING AVG(salary) > 60000; -- Then filter groups
Relational databases split data across tables to avoid duplication. JOINs are how you put Humpty Dumpty back together again.
Imagine two circles in a Venn diagram. The left circle is Table A, the right circle is Table B.
┌──────────────────────────────────────────────────────────────────────────────┐
│ THE JOIN FAMILY TREE │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ INNER JOIN LEFT JOIN RIGHT JOIN │
│ │
│ A B A B A B │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │
│ │███│ │███│ │███│███│ │ │ │███│███│ │
│ │███│███│███│ │███│███│ │ │ │███│███│ │
│ │███│ │███│ │███│███│ │ │ │███│███│ │
│ └───┘ └───┘ └───┘ └───┘ └───┘ └───┘ │
│ ▓▓▓ Only matching ▓▓▓ All from A ▓▓▓ All from B │
│ │
│ FULL OUTER JOIN CROSS JOIN SELF JOIN │
│ │
│ A B A B A ←→ A │
│ ┌───┐ ┌───┐ ┌───┐ ┌───┐ │
│ │███│███│ │ │███│███│███│ Employee │
│ │███│███│███│ │███│███│███│ ├─ id │
│ │ │███│███│ │███│███│███│ ├─ name │
│ └───┘ └───┘ └───┘ └───┘ └─ manager_id │
│ ▓▓▓ All from both ▓▓▓ Every row x │
│ every row (Cartesian) Join to itself │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
-- Sample tables
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(50)
);
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
amount DECIMAL(10,2)
);INNER JOIN — Only matching rows from both tables.
SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;
-- Only customers who HAVE placed ordersLEFT JOIN — All rows from the left table, NULLs for right-table misses.
SELECT c.name, COALESCE(o.amount, 0) AS amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
-- ALL customers, even those who never ordered (amount shows 0)RIGHT JOIN — All rows from the right table. Rarely used; just swap table order and use LEFT JOIN.
FULL OUTER JOIN — All rows from both, NULLs where no match. Not supported in MySQL, but works in PostgreSQL/SQL Server/Oracle.
-- PostgreSQL
SELECT * FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
-- MySQL workaround with UNION
SELECT * FROM customers c LEFT JOIN orders o ON c.id = o.customer_id
UNION
SELECT * FROM customers c RIGHT JOIN orders o ON c.id = o.customer_id;CROSS JOIN — Cartesian product. A × B rows.
-- Every color paired with every size
SELECT color, size FROM colors CROSS JOIN sizes; -- 5 colors × 4 sizes = 20 rowsSELF JOIN — Table joined to itself.
-- Find employees and their managers
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;-- UNION: Combine, remove duplicates
SELECT city FROM customers
UNION
SELECT city FROM suppliers;
-- UNION ALL: Combine, keep duplicates (FASTER — use when you know there are no dupes)
SELECT city FROM customers
UNION ALL
SELECT city FROM suppliers;
-- INTERSECT: Only rows present in BOTH (not in MySQL)
SELECT city FROM customers
INTERSECT
SELECT city FROM suppliers;
-- EXCEPT / MINUS: In first set but NOT in second
SELECT city FROM customers
EXCEPT -- PostgreSQL / SQL Server
-- MINUS -- Oracle
SELECT city FROM suppliers;43. What is JOIN used for?
Combines rows from two or more tables based on a related column (foreign key relationship).
44. Most common type of join?
INNER JOIN. It's the default — if you writeFROM A JOIN B, most databases interpret it asINNER JOIN.
45. What are different JOINs in SQL?
- INNER JOIN — matching rows only
- LEFT (OUTER) JOIN — all left + matched right
- RIGHT (OUTER) JOIN — all right + matched left
- FULL OUTER JOIN — all rows from both
- CROSS JOIN — Cartesian product
- SELF JOIN — table joined to itself
46. Which is NOT TRUE about the ON clause?
The
ONclause specifies the join condition. It improves readability over the old comma-syntax, supports multi-column joins (ON a.id = b.id AND a.type = b.type), and is required for outer joins. Any statement claiming otherwise is false.
47. Inner join result?
Returns only rows where the join condition evaluates to true in both tables.
48. Can you join 3 tables with INNER JOIN?
Yes. There's no practical limit (though performance degrades with many joins).
SELECT f.name, d.name AS division, c.name AS country FROM faculty f INNER JOIN division d ON f.division_id = d.id INNER JOIN country c ON f.country_id = c.id;
49. Can you join a table to itself?
Yes — SELF JOIN. You must use aliases to distinguish the two instances.
SELECT a.name AS employee, b.name AS manager FROM employees a, employees b WHERE a.manager_id = b.id;
50. What is true about Cartesian Products?
A Cartesian product occurs when tables are joined without a condition. Every row in Table A pairs with every row in Table B:
m × nrows total. It's usually a bug, butCROSS JOINis explicit and valid for combinatorial data.
51. INTERSECTION in relational algebra?
Corresponds to rows present in both sets. SQL:
INTERSECT.
52. UNION in relational algebra?
Corresponds to rows in either set, duplicates removed. Think "A OR B" in set terms.
53. UNION vs UNION ALL?
UNION UNION ALL Removes duplicates Keeps duplicates Slower (sort + dedup) Faster Use when uniqueness matters Use when sets are already distinct or duplicates are meaningful
54. Which set operator for "searched but didn't buy"?
MINUS(Oracle) orEXCEPT(PostgreSQL/SQL Server):SELECT name FROM searchers WHERE product = 'X' EXCEPT SELECT name FROM buyers WHERE product = 'X';
A subquery is a query inside another query. It can return a scalar (one value), a single row, a column (multiple rows, one column), or a table (multiple rows, multiple columns).
-- Scalar subquery (returns one value)
SELECT name, salary,
(SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;
-- Correlated subquery (depends on outer query)
SELECT name, salary
FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept);
-- Subquery with IN
SELECT name FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE amount > 1000);
-- Subquery with EXISTS (often faster than IN for large datasets)
SELECT name FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);EXISTS vs IN:
EXISTSstops at the first match (short-circuits).INbuilds the entire subquery result set first.- Use
EXISTSfor correlated subqueries,INfor small static lists.
CTEs (Common Table Expressions) make complex queries readable. They're like temporary views for a single query.
WITH high_earners AS (
SELECT dept, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept
HAVING AVG(salary) > 80000
),
recent_hires AS (
SELECT * FROM employees WHERE hire_date > '2023-01-01'
)
SELECT r.name, r.salary, h.avg_sal
FROM recent_hires r
JOIN high_earners h ON r.dept = h.dept;Recursive CTEs (for hierarchical data):
-- PostgreSQL / SQL Server / MySQL 8.0+
WITH RECURSIVE org_tree AS (
-- Anchor: top-level managers
SELECT id, name, manager_id, 0 AS level
FROM employees WHERE manager_id IS NULL
UNION ALL
-- Recursive: employees under each manager
SELECT e.id, e.name, e.manager_id, ot.level + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree;Window functions are the #1 topic that separates junior from senior SQL interviews. They compute across a "window" of rows related to the current row without collapsing rows like GROUP BY does.
┌──────────────────────────────────────────────────────────────────────────────┐
│ GROUP BY vs WINDOW FUNCTION │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Input: GROUP BY result: Window result: │
│ ┌──────┬────────┬────────┐ ┌──────┬────────┐ ┌──────┬────────┐ │
│ │ Dept │ Name │ Salary │ │ Dept │ AVG │ │ Name │ AVG │ │
│ ├──────┼────────┼────────┤ ├──────┼────────┤ ├──────┼────────┤ │
│ │ A │ Alice │ 100 │ → │ A │ 150 │ │ Alice│ 150 │ │
│ │ A │ Bob │ 200 │ │ B │ 400 │ │ Bob │ 150 │ │
│ │ B │ Carol │ 400 │ └──────┴────────┘ │ Carol│ 400 │ │
│ └──────┴────────┴────────┘ └──────┴────────┘ │
│ │
│ GROUP BY collapses rows Window keeps all rows, adds calculation │
│ (3 rows → 2 rows) (3 rows → 3 rows) │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
-- ROW_NUMBER(): Unique rank, no ties
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS rank_num
FROM employees;
-- Alice: 50000 → 1, Bob: 50000 → 2, Carol: 40000 → 3
-- RANK(): Ties get same rank, next rank skips
SELECT name, salary,
RANK() OVER (ORDER BY salary DESC) AS rank_num
FROM employees;
-- Alice: 50000 → 1, Bob: 50000 → 1, Carol: 40000 → 3
-- DENSE_RANK(): Ties get same rank, next rank doesn't skip
SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rank_num
FROM employees;
-- Alice: 50000 → 1, Bob: 50000 → 1, Carol: 40000 → 2
-- PARTITION BY: Reset window per group
SELECT dept, name, salary,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
FROM employees;
-- LAG / LEAD: Previous / next row values
SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day,
revenue - LAG(revenue, 1) OVER (ORDER BY date) AS daily_change
FROM daily_sales;
-- Running total
SELECT name, salary,
SUM(salary) OVER (ORDER BY hire_date) AS running_total
FROM employees;
-- Moving average
SELECT date, temperature,
AVG(temperature) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM weather;55. A SELECT statement whose results are used in filtering the main query is called?
A subquery (inner query or nested query).
56. Subqueries can be nested in...
SELECT,INSERT,UPDATE,DELETE, or another subquery. They can appear inWHERE,FROM,SELECT(scalar), orHAVING.
57. Row comparison operators with subqueries?
IN,ANY,ALL,EXISTS,NOT EXISTS, and standard comparison operators (=,>,<,>=,<=,<>).WHERE salary > ALL (SELECT salary FROM employees WHERE dept = 'Sales') WHERE salary > ANY (SELECT salary FROM employees WHERE dept = 'Sales')
58. A subquery inside a subquery is called?
A nested subquery.
59. A subquery that references the outer query?
A correlated subquery. It executes once for each row of the outer query.
SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept);
60. What is the CASE function?
SQL's version of IF-THEN-ELSE. Two forms: simple and searched.
-- Searched CASE (more flexible) SELECT name, CASE WHEN salary >= 100000 THEN 'Executive' WHEN salary >= 60000 THEN 'Senior' WHEN salary >= 40000 THEN 'Mid' ELSE 'Junior' END AS level FROM employees; -- Simple CASE SELECT name, CASE dept WHEN 'Engineering' THEN 'Tech' WHEN 'Sales' THEN 'Revenue' ELSE 'Other' END AS dept_category FROM employees;
-- Comprehensive CREATE TABLE
CREATE TABLE employees (
id INT AUTO_INCREMENT PRIMARY KEY COMMENT 'Surrogate key',
email VARCHAR(255) NOT NULL UNIQUE,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
salary DECIMAL(10,2) CHECK (salary >= 0),
department_id INT,
hire_date DATE DEFAULT (CURRENT_DATE),
is_active TINYINT(1) DEFAULT 1,
metadata JSON,
-- Foreign key with ON DELETE/UPDATE actions
CONSTRAINT fk_dept
FOREIGN KEY (department_id)
REFERENCES departments(id)
ON DELETE SET NULL -- Keep employee if dept deleted
ON UPDATE CASCADE, -- Update dept_id if departments.id changes
-- Table-level constraint
CONSTRAINT chk_names CHECK (first_name <> last_name)
) ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
COMMENT='Company employees';ALTER TABLE operations:
-- Add column
ALTER TABLE employees ADD phone VARCHAR(20);
-- Add column with position (MySQL)
ALTER TABLE employees ADD middle_name VARCHAR(50) AFTER first_name;
-- Modify column type
ALTER TABLE employees MODIFY phone VARCHAR(30);
-- Rename column (MySQL)
ALTER TABLE employees CHANGE phone mobile_phone VARCHAR(30);
-- Rename column (PostgreSQL / standard)
ALTER TABLE employees RENAME COLUMN phone TO mobile_phone;
-- Drop column
ALTER TABLE employees DROP COLUMN phone;
-- Add index
ALTER TABLE employees ADD INDEX idx_hire_date (hire_date);
-- Drop foreign key
ALTER TABLE employees DROP FOREIGN KEY fk_dept;-- INSERT: Add data
INSERT INTO employees (email, first_name, last_name, salary)
VALUES ('[email protected]', 'Alice', 'Smith', 75000);
-- Bulk insert
INSERT INTO employees (email, first_name, last_name, salary)
VALUES
('[email protected]', 'Bob', 'Jones', 80000),
('[email protected]', 'Carol', 'White', 90000);
-- INSERT from SELECT
INSERT INTO employees_archive (id, email, name)
SELECT id, email, CONCAT(first_name, ' ', last_name)
FROM employees WHERE is_active = 0;
-- UPDATE: Modify data (ALWAYS use WHERE!)
UPDATE employees
SET salary = salary * 1.1, last_raise = CURDATE()
WHERE performance_rating = 'Exceeds';
-- UPDATE with JOIN (MySQL)
UPDATE employees e
JOIN departments d ON e.department_id = d.id
SET e.salary = e.salary * 1.05
WHERE d.location = 'San Francisco';
-- DELETE: Remove rows (ALWAYS use WHERE!)
DELETE FROM employees WHERE id = 42;
-- DELETE with JOIN
DELETE e FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.name = 'Dissolved Team';TRUNCATE vs DELETE vs DROP:
┌──────────────────────────────────────────────────────────────────────────────┐
│ DELETE vs TRUNCATE vs DROP — THE SHOWDOWN │
├─────────────┬─────────────┬─────────────┬────────────────────────────────────┤
│ Aspect │ DELETE │ TRUNCATE │ DROP │
├─────────────┼─────────────┼─────────────┼────────────────────────────────────┤
│ Type │ DML │ DDL │ DDL │
│ WHERE │ Yes │ No │ N/A (whole object) │
│ Speed │ Slow (row) │ Fast (page) │ Fast │
│ Logging │ Row-level │ Page-level │ Minimal │
│ Rollback │ Yes │ Sometimes │ No │
│ Triggers │ Fires │ Doesn't fire│ N/A │
│ Identity │ Preserved │ Reset │ N/A │
│ Structure │ Kept │ Kept │ DESTROYED │
│ Use case │ Remove some │ Empty table │ Remove table forever │
└─────────────┴─────────────┴─────────────┴────────────────────────────────────┘
| Constraint | Purpose | NULL Allowed? |
|---|---|---|
PRIMARY KEY |
Unique identifier | No |
UNIQUE |
No duplicates | Yes (one NULL usually) |
NOT NULL |
Must have value | — |
FOREIGN KEY |
Referential integrity | Yes (optional relationship) |
CHECK |
Value condition | Depends |
DEFAULT |
Auto-fill if omitted | — |
-- UNIQUE allows multiple NULLs (in most DBs)
CREATE TABLE users (
email VARCHAR(255) UNIQUE -- NULL ≠ NULL in SQL, so multiple NULLs OK
);
-- CHECK constraint
CREATE TABLE products (
price DECIMAL(10,2) CHECK (price > 0),
discount DECIMAL(3,2) CHECK (discount BETWEEN 0 AND 1)
);61. Different types of SQL statements?
DDL (CREATE, ALTER, DROP, TRUNCATE), DML (INSERT, UPDATE, DELETE, SELECT), DQL (SELECT — sometimes grouped with DML), DCL (GRANT, REVOKE), TCL (COMMIT, ROLLBACK, SAVEPOINT).
62. Which statements are DDL?
CREATE,ALTER,DROP,TRUNCATE,RENAME.
63. DML includes?
SELECT,INSERT,UPDATE,DELETE.
64. GRANT and REVOKE are under?
DCL (Data Control Language).
65. Which is NOT a DML statement?
COMMIT— it's TCL (Transaction Control Language). AlsoCREATE,GRANT,TRUNCATEare not DML.
66. Which statement inserts new data?
INSERT INTO.
67. Must you specify columns when inserting?
No, if you provide values for ALL columns in the exact table order. But best practice: always specify columns. It prevents errors when schema changes and is self-documenting.
-- Risky: breaks if columns are reordered INSERT INTO customers VALUES (1, 'John', 'Doe'); -- Safe: explicit and clear INSERT INTO customers (id, first_name, last_name) VALUES (1, 'John', 'Doe');
68. Insert a new record into Customers?
INSERT INTO Customers (id, first_name, last_name) VALUES (1, 'John', 'Doe');
69. Insert "Hawkins" as LastName in Customers?
INSERT INTO Customers (LastName) VALUES ('Hawkins');
70. How to create a temporary table in MySQL?
CREATE TEMPORARY TABLE temp_top_customers SELECT * FROM customers WHERE total_spent > 10000;It exists only for the current session/connection and is automatically dropped when the session ends.
71. Which statement updates data?
UPDATE.
72. Keyword in UPDATE to change values?
SET.
73. Change "Jackson" to "Hawkins" in LastName?
UPDATE Customers SET LastName = 'Hawkins' WHERE LastName = 'Jackson';
⚠️ Without WHERE, every row becomes Hawkins!
74. Which statement deletes data?
DELETE.
75. Delete records where FirstName is "John"?
DELETE FROM Customers WHERE FirstName = 'John';
76. What is FROM used for?
Specifies the source table(s) for
SELECT,UPDATE,DELETE, andINSERT.
77. DELETE vs TRUNCATE?
See the comparison table above. Key differences:
DELETEis DML (row-by-row, rollbackable, fires triggers, has WHERE).TRUNCATEis DDL (bulk page deallocation, faster, no WHERE, resets auto-increment, doesn't fire triggers).
78. Which statement creates a table?
CREATE TABLE.
79. What is Collation?
Collation defines the rules for comparing and sorting characters. It works with the character set.
utf8mb4_general_ci: Case-insensitive, fasterutf8mb4_unicode_ci: Case-insensitive, accurate for all Unicodeutf8mb4_bin: Binary comparison (case-sensitive)SELECT * FROM users WHERE name = 'john'; -- Matches 'John' with _ci collation
80. What is AUTO_INCREMENT?
Automatically generates sequential unique integers for a column, typically the PRIMARY KEY.
CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50)); INSERT INTO users (name) VALUES ('Alice'); -- id becomes 1 automaticallyIn PostgreSQL, use
SERIALorGENERATED ALWAYS AS IDENTITY. In SQL Server,IDENTITY(1,1).
81. Valid constraints in MySQL?
NOT NULL,UNIQUE,PRIMARY KEY,FOREIGN KEY,CHECK,DEFAULT,AUTO_INCREMENT.
82. Which is NOT TRUE about constraints?
Any statement claiming constraints hurt data integrity or are optional in production is false. Constraints are essential for data quality.
83. NOT NULL constraint?
Forces a column to reject NULL values. Every row must have a value.
84. An unique (non-key) field?
Use the
UNIQUEconstraint. Unlike PRIMARY KEY, a table can have multiple UNIQUE constraints, and they typically allow one NULL.
85. What is CHECK constraint?
Enforces a boolean condition on every row:
CREATE TABLE persons (age INT CHECK (age >= 0 AND age <= 150));
86. UNIQUE vs PRIMARY KEY?
PRIMARY KEY UNIQUE One per table Multiple per table Cannot be NULL Usually allows one NULL Implicitly indexed Explicitly indexed Identifies the row Alternate identifier
87. What does DROP TABLE do?
Permanently removes the table definition, all data, indexes, triggers, and constraints. Cannot be undone (unless in a transaction that rolls back, in some DBs).
DROP TABLE IF EXISTS old_customers;
88. DROP vs TRUNCATE?
DROPremoves the table structure.TRUNCATEremoves only the data, keeping the structure.
89. Add order_date column to order table?
ALTER TABLE `order` ADD order_date DATE;Backticks escape reserved words. Better: rename the table to
orders.
90. Rename column Address to Addr in Customer?
-- MySQL ALTER TABLE Customer CHANGE Address Addr VARCHAR(50); -- PostgreSQL / Standard ALTER TABLE Customer RENAME COLUMN Address TO Addr;
91. Delete column CITY from ADDRESSES?
ALTER TABLE addresses DROP COLUMN city;
Imagine a library with 1 million books. Finding a book without a catalog means checking every shelf — that's a full table scan. An index is the card catalog, organized alphabetically, letting you jump directly to the right shelf.
┌──────────────────────────────────────────────────────────────────────────────┐
│ TABLE SCAN vs INDEXED LOOKUP │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Without Index (Full Table Scan) With Index (B-Tree) │
│ │
│ 📚 📚 📚 📚 📚 📚 📚 📚 📚 📚 [M]─────┐ │
│ 📚 📚 📚 📚 📚 📚 📚 📚 📚 📚 / \ │
│ 📚 📚 📚 📚 📚 📚 📚 📚 📚 📚 [G] [S] │
│ 📚 📚 📚 📚 📚 📚 📚 📚 📚 📚 / \ / \ │
│ 📚 📚 📚 📚 📚 📚 📚 📚 📚 📚 [C] [J] [P] [W] │
│ ↑ | | | | │
│ Check EVERY book Look up 'Smith': go right to S │
│ O(n) — slow for large n O(log n) — scales to billions │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
-- Create a basic index
CREATE INDEX idx_email ON users(email);
-- Create a UNIQUE index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_username ON users(username);
-- Composite index (multi-column)
CREATE INDEX idx_name_age ON users(last_name, first_name, age);
-- Prefix index (for long text columns)
CREATE INDEX idx_bio_prefix ON users(bio(100));
-- Covering index (includes all columns needed for a query)
CREATE INDEX idx_covering ON orders(customer_id, status, total)
INCLUDE (created_at); -- PostgreSQL / SQL Server syntax┌──────────────────────────────────────────────────────────────────────────────┐
│ CLUSTERED vs NON-CLUSTERED INDEX │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ CLUSTERED INDEX NON-CLUSTERED INDEX │
│ (Only ONE per table) (Many per table) │
│ │
│ The table IS the index. The index is a SEPARATE structure. │
│ │
│ ┌─────────────────┐ ┌──────────────┐ ┌─────────────────┐ │
│ │ ID │ Data │ │ Index Key │────→│ ID │ Data │ │
│ ├────┼────────────┤ ├──────────────┤ ├────┼────────────┤ │
│ │ 1 │ Alice, ... │ │ Alice → 1 │ │ 1 │ Alice, ... │ │
│ │ 2 │ Bob, ... │ │ Bob → 2 │ │ 2 │ Bob, ... │ │
│ │ 3 │ Carol, ... │ │ Carol → 3 │ │ 3 │ Carol, ... │ │
│ └────┴────────────┘ └──────────────┘ └─────────────────┘ │
│ │
│ Data stored IN ORDER. Index contains pointers to actual rows. │
│ Physically reorders table. Index + lookup required. │
│ Range queries are blazing fast. Extra storage overhead. │
│ │
│ MySQL InnoDB: PK is clustered. MySQL: Secondary indexes point to PK. │
│ SQL Server: Can choose. PostgreSQL: Only non-clustered (Heap). │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
SQL injection happens when user input is concatenated directly into a query. It's the most dangerous and common database vulnerability.
┌──────────────────────────────────────────────────────────────────────────────┐
│ THE SQL INJECTION ATTACK │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ VULNERABLE CODE: │
│ query = "SELECT * FROM users WHERE username = '" + input + "'"; │
│ │
│ ATTACKER INPUTS: RESULTING QUERY: │
│ ' OR '1'='1 SELECT * FROM users WHERE username = '' │
│ OR '1'='1' ← ALWAYS TRUE! │
│ │
│ RESULT: Returns ALL users. Attacker is now logged in as everyone. │
│ │
│ WORSE INPUT: │
│ '; DROP TABLE users; -- │
│ │
│ RESULTING QUERY: │
│ SELECT * FROM users WHERE username = ''; DROP TABLE users; --' │
│ │
│ RESULT: Your users table is GONE. │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
The Fix: Always Use Parameterized Queries / Prepared Statements
# ❌ BAD — String concatenation (NEVER DO THIS)
cursor.execute(f"SELECT * FROM users WHERE id = {user_input}")
# ✅ GOOD — Parameterized query (database escapes input)
cursor.execute("SELECT * FROM users WHERE id = %s", (user_input,))
# ✅ GOOD — Prepared statement
stmt = db.prepare("SELECT * FROM users WHERE id = ?")
stmt.execute(user_input)-- ❌ VULNERABLE
SELECT * FROM users WHERE email = '$email';
-- ✅ SAFE (Prepared statement)
SELECT * FROM users WHERE email = ?;92. What are Indexes?
Data structures (typically B-Trees) that speed up data retrieval at the cost of slower writes and extra storage. Think of them as a book's table of contents.
93. Clustered vs Non-Clustered?
- Clustered: Determines physical storage order. One per table. The data IS the index.
- Non-Clustered: Separate structure with pointers to data. Multiple allowed per table.
- Clustered indexes excel at range queries. Non-clustered are more flexible.
94. Statement for assigning privileges?
GRANT. To remove:REVOKE.GRANT SELECT, INSERT ON database.orders TO 'app_user'@'%'; REVOKE DELETE ON database.orders FROM 'app_user'@'%';
95. Types of privileges?
- System privileges: CREATE, DROP, ALTER (database-level)
- Object privileges: SELECT, INSERT, UPDATE, DELETE, EXECUTE (table/procedure-level)
96. Privileges a user can grant?
SELECT,INSERT,UPDATE,DELETE,REFERENCES,EXECUTE,USAGE,ALL PRIVILEGES.
115. What is SQL Injection?
A code injection attack where malicious SQL is inserted via user input. Prevention:
- Parameterized queries (prepared statements)
- Input validation
- ORM frameworks (most handle escaping automatically)
- Least privilege (app user shouldn't have DROP permissions)
- Stored procedures with parameters
The classic example: Alice sends $100 to Bob. Two operations must happen:
- Subtract $100 from Alice's account
- Add $100 to Bob's account
If the system crashes after step 1, $100 vanishes into thin air. A transaction wraps both steps into an atomic unit.
┌──────────────────────────────────────────────────────────────────────────────┐
│ THE BANK TRANSFER TRANSACTION │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ BEGIN; │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Alice: $500 │ │ Bob: $300 │ │
│ │ │ │ │ │
│ │ [- $100] │────┐ │ │ │
│ │ $400 │ │ │ [+ $100] │ │
│ │ │◄───┘ │ $400 │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ ↓ If everything succeeds: ↓ If anything fails: │
│ COMMIT; ROLLBACK; │
│ (Permanent) (Undo everything) │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────────────┐
│ ACID PROPERTIES │
├──────────────┬───────────────────────────────────────────────────────────────┤
│ ATOMICITY │ All operations complete, or none do. │
│ │ "All or nothing." No partial commits. │
│ │ Analogy: Wedding vows — both say "I do" or neither does. │
├──────────────┼───────────────────────────────────────────────────────────────┤
│ CONSISTENCY │ Database moves from one valid state to another. │
│ │ Constraints, triggers, and cascades must all succeed. │
│ │ Analogy: A legal contract — every clause must be valid. │
├──────────────┼───────────────────────────────────────────────────────────────┤
│ ISOLATION │ Concurrent transactions don't interfere with each other. │
│ │ Each transaction sees a consistent snapshot. │
│ │ Analogy: Private changing rooms — you don't see others. │
├──────────────┼───────────────────────────────────────────────────────────────┤
│ DURABILITY │ Once committed, data survives power loss/crashes. │
│ │ Written to disk / WAL (Write-Ahead Log). │
│ │ Analogy: Etching in stone — permanent once done. │
└──────────────┴───────────────────────────────────────────────────────────────┘
When multiple transactions run at the same time, without isolation, three weird things can happen:
┌──────────────────────────────────────────────────────────────────────────────┐
│ THE THREE READ PHENOMENA (Bad Things That Happen) │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. DIRTY READ 2. NON-REPEATABLE READ │
│ │
│ T1 reads uncommitted data. T1 reads a row. T2 updates it. T1 │
│ T1: SELECT → 100 reads again → different value! │
│ T2: UPDATE → 200 (not committed) │
│ T1: SELECT → 200 ← DIRTY! │
│ │
│ 3. PHANTOM READ │
│ T1: SELECT WHERE age > 30 → 5 rows │
│ T2: INSERT new person age 35 (committed) │
│ T1: SELECT WHERE age > 30 → 6 rows ← PHANTOM! │
│ The "ghost" row appeared. │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
Isolation Levels (from least to most strict):
| Level | Dirty Read | Non-Repeatable | Phantom | Use Case |
|---|---|---|---|---|
| READ UNCOMMITTED | ✅ Allowed | ✅ Allowed | ✅ Allowed | Rarely used; reporting on huge tables |
| READ COMMITTED | ❌ Blocked | ✅ Allowed | ✅ Allowed | Oracle / SQL Server default |
| REPEATABLE READ | ❌ Blocked | ❌ Blocked | ✅ Allowed | MySQL InnoDB default |
| SERIALIZABLE | ❌ Blocked | ❌ Blocked | ❌ Blocked | Financial transactions |
-- Check/set isolation level (MySQL)
SELECT @@transaction_isolation;
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- PostgreSQL
SHOW default_transaction_isolation;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;┌──────────────────────────────────────────────────────────────────────────────┐
│ DEADLOCK ILLUSTRATION │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Transaction A Transaction B │
│ ───────────── ───────────── │
│ LOCK row 1 LOCK row 2 │
│ ↓ ↓ │
│ WAIT for row 2 ←──── BLOCKED ─→ WAIT for row 1 │
│ ↓ ↓ │
│ 💀 DEADLOCK 💀 │
│ │
│ Database detects this and kills one transaction (ROLLBACK). │
│ Your app should catch deadlock errors and retry. │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
-- Explicit locking (PostgreSQL)
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Optimistic locking with version column
UPDATE accounts
SET balance = balance - 100, version = version + 1
WHERE id = 1 AND version = 5; -- If version changed, 0 rows updated → retry97. What does "locking" refer to?
A mechanism to prevent concurrent transactions from interfering with each other. Locks can be shared (read locks, multiple allowed) or exclusive (write locks, only one allowed).
98. Main transaction controls?
COMMIT— Make changes permanentROLLBACK— Undo all changes in the transactionSAVEPOINT— Set a partial rollback pointSTART TRANSACTION; UPDATE accounts SET balance = balance - 100 WHERE id = 1; SAVEPOINT before_credit; UPDATE accounts SET balance = balance + 100 WHERE id = 2; -- Oops, wrong account! ROLLBACK TO SAVEPOINT before_credit; -- Now retry with correct account COMMIT;
99. A transaction that completes execution is said to be?
Committed.
100. What does ROLLBACK do after COMMIT?
Nothing. Once
COMMITis executed, changes are permanent.ROLLBACKonly works on uncommitted changes.
101. Valid properties of a transaction?
ACID: Atomicity, Consistency, Isolation, Durability.
102. What if autocommit is enabled?
Every individual statement is automatically committed. In MySQL, autocommit is ON by default. Turn it off for multi-statement transactions:
SET autocommit = 0; -- ... multiple statements ... COMMIT; SET autocommit = 1;
103. Default isolation level in MySQL?
REPEATABLE READ (InnoDB engine). Prevents dirty reads and non-repeatable reads, but phantom reads are possible (though InnoDB uses MVCC + gap locking to mostly prevent them).
104. Is autocommit enabled by default in MySQL?
Yes, for each new session.
A View is a stored SELECT query — a virtual table. It doesn't store data (unless it's a materialized view), but provides a consistent interface.
-- Simple view
CREATE VIEW active_customers AS
SELECT id, name, email FROM customers WHERE status = 'active';
-- Complex view with joins
CREATE VIEW order_summary AS
SELECT
o.id AS order_id,
c.name AS customer_name,
SUM(oi.quantity * oi.price) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY o.id, c.name;
-- Query it like a table
SELECT * FROM order_summary WHERE total > 1000;Updatable Views (can use INSERT/UPDATE/DELETE):
- Must reference exactly one base table
- Cannot contain DISTINCT, GROUP BY, aggregates, or UNION
- Must include all NOT NULL columns without defaults
-- Materialized view (PostgreSQL) — stores actual data, needs refresh
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT DATE_TRUNC('month', order_date) AS month, SUM(total)
FROM orders GROUP BY 1;
REFRESH MATERIALIZED VIEW monthly_sales;-- MySQL Stored Procedure
DELIMITER //
CREATE PROCEDURE TransferMoney(
IN from_account INT,
IN to_account INT,
IN amount DECIMAL(10,2)
)
BEGIN
DECLARE EXIT HANDLER FOR SQLEXCEPTION
BEGIN
ROLLBACK;
RESIGNAL;
END;
START TRANSACTION;
UPDATE accounts SET balance = balance - amount WHERE id = from_account;
UPDATE accounts SET balance = balance + amount WHERE id = to_account;
COMMIT;
END //
DELIMITER ;
-- Call it
CALL TransferMoney(1, 2, 100.00);| Stored Procedure | Function | |
|---|---|---|
| Returns | Can return multiple result sets | Must return exactly one value |
| Called with | CALL proc_name() |
Used in expressions: SELECT func() |
| Side effects | Yes (can modify data) | Usually no (deterministic) |
| Transactions | Can manage transactions | Cannot (in most DBs) |
Triggers auto-execute when an event occurs. Useful for auditing, but use sparingly — they hide logic.
-- Audit trigger: log all salary changes
CREATE TRIGGER trg_salary_audit
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
IF OLD.salary <> NEW.salary THEN
INSERT INTO salary_audit (emp_id, old_salary, new_salary, changed_at)
VALUES (NEW.id, OLD.salary, NEW.salary, NOW());
END IF;
END;
-- Trigger types:
-- BEFORE INSERT/UPDATE/DELETE → Validate/modify data before it hits the table
-- AFTER INSERT/UPDATE/DELETE → Log, cascade, notify
-- INSTEAD OF (views only) → Redirect operationsCursors iterate row-by-row. They're slow — avoid them if set-based operations work.
-- MySQL cursor example
DELIMITER //
CREATE PROCEDURE ProcessEmployees()
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE emp_name VARCHAR(100);
DECLARE emp_salary DECIMAL(10,2);
DECLARE emp_cursor CURSOR FOR
SELECT name, salary FROM employees WHERE department = 'Sales';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
OPEN emp_cursor;
read_loop: LOOP
FETCH emp_cursor INTO emp_name, emp_salary;
IF done THEN LEAVE read_loop; END IF;
-- Process each row here
END LOOP;
CLOSE emp_cursor;
END //
DELIMITER ;105. What is a virtual table in MySQL?
A View.
106. Are views updatable?
Some are. Requirements: based on a single table, no aggregates/DISTINCT/GROUP BY/UNION, includes primary key, and all NOT NULL columns.
107. Advantages of views?
- Simplify complex queries — hide JOINs and aggregations
- Security — expose only specific columns/rows
- Logical independence — shield apps from schema changes
- Consistency — same calculation logic everywhere
108. Does a view contain data?
Regular views: No (virtual, query executed on access). Materialized views: Yes (stored data, needs refresh).
109. How to remove a view?
DROP VIEW IF EXISTS order_summary;
110. Can a view be based on another view?
Yes — nested views. Be careful with deep nesting as it hurts performance and readability.
111. Iterate over rows in a stored procedure?
Use a CURSOR. Lifecycle:
DECLARE→OPEN→FETCH→CLOSE.
112. Process of finding a good query execution strategy?
Query optimization. The database's query optimizer generates multiple execution plans and picks the cheapest one based on statistics, indexes, and cost models. You can inspect plans with
EXPLAIN.
113. What is a trigger?
A stored procedure that automatically executes in response to a DML event (INSERT, UPDATE, DELETE) or DDL event (CREATE, ALTER, DROP). Components: Event (when), Condition (optional), Action (what to do).
114. A stored procedure auto-executing on INSERT/UPDATE/DELETE?
A Trigger.
The EXPLAIN command shows how the database plans to execute your query. It's the X-ray for SQL performance.
EXPLAIN SELECT * FROM employees WHERE email = '[email protected]';
-- PostgreSQL (more detailed)
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM employees WHERE email = '[email protected]';Key columns to watch (MySQL):
| Column | Meaning | Red Flags |
|---|---|---|
type |
Join type | ALL = full table scan (bad). Aim for const, eq_ref, ref, range. |
possible_keys |
Indexes considered | Empty = no useful indexes exist |
key |
Index actually used | NULL = no index used |
rows |
Rows examined | Should be much smaller than table size |
Extra |
Additional info | Using filesort (bad), Using temporary (bad), Using index (great!) |
┌──────────────────────────────────────────────────────────────────────────────┐
│ EXPLAIN OUTPUT CHEAT SHEET │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ TYPE column (best to worst): │
│ │
│ const → PK lookup on constant [BEST] │
│ eq_ref → PK/unique join ★★★★★ │
│ ref → Non-unique index match ★★★★☆ │
│ range → Index range scan ★★★☆☆ │
│ index → Full index scan ★★☆☆☆ │
│ ALL → Full TABLE scan (NO!) ★☆☆☆☆ │
│ │
│ EXTRA column: │
│ Using index → COVERING INDEX (doesn't touch table!) [GREAT] │
│ Using where → Filtering after index lookup [OK] │
│ Using filesort → Sorting without index [SLOW] │
│ Using temporary → Created temp table [SLOW] │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
SELECT *— Pulls unnecessary data; prevents covering indexes.- Functions on indexed columns —
WHERE YEAR(date_col) = 2023can't use index. UseWHERE date_col BETWEEN '2023-01-01' AND '2023-12-31'. - Leading wildcard LIKE —
LIKE '%smith'can't use B-Tree index. Use full-text search instead. - Implicit conversions — Comparing string column to number forces conversion; index ignored.
- Missing indexes on JOIN/FK columns — Foreign keys should almost always be indexed.
- N+1 queries — Fetching related data in a loop. Use JOINs or
IN (...). - Large OFFSET —
OFFSET 1000000scans and discards 1M rows. Use keyset pagination. - Sorting without index —
ORDER BYon unindexed columns triggers filesort. - Subqueries in SELECT — Correlated subqueries run once per row; use JOINs instead.
- Not analyzing tables — Stale statistics lead to bad plans. Run
ANALYZE TABLE.
Normalization eliminates redundancy and prevents anomalies (insert, update, delete). Think of it as organizing a closet — each item has one specific place.
┌──────────────────────────────────────────────────────────────────────────────┐
│ NORMALIZATION LEVELS VISUALIZED │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ BEFORE NORMALIZATION (A Mess) │
│ ┌────────────┬───────────┬────────────────────────────┬──────────┐ │
│ │ Student │ Course │ Instructor │ Office │ │
│ ├────────────┼───────────┼────────────────────────────┼──────────┤ │
│ │ Alice │ Math,Phys │ Prof. Smith, Prof. Jones │ 101, 205 │ │
│ │ Bob │ Math │ Prof. Smith │ 101 │ │
│ └────────────┴───────────┴────────────────────────────┴──────────┘ │
│ Problems: Repeating groups, redundancy, update anomaly (move office 3x) │
│ │
│ 1NF → Atomic values, no repeating groups │
│ 2NF → No partial dependencies (all non-key columns depend on WHOLE key) │
│ 3NF → No transitive dependencies (no column depends on non-key column) │
│ BCNF → Every determinant is a candidate key │
│ │
│ AFTER 3NF (Clean & Organized) │
│ Students(id, name) Courses(id, name) Instructors(id, name, office) │
│ Enrollments(student_id, course_id, instructor_id) │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
The Rules in Detail:
| Normal Form | Rule | Violation Example |
|---|---|---|
| 1NF | All columns atomic (no lists/arrays in a cell) | courses: "Math, Physics" |
| 2NF | No partial dependency (only relevant for composite keys) | Price depends on product_id but not order_id in an order_items table |
| 3NF | No transitive dependency | employee table has dept_name when it should just have dept_id |
| BCNF | Every determinant is a candidate key | A table with (student, course) → instructor and instructor → course |
Normalization optimizes for writes. Sometimes reads are so frequent you intentionally denormalize:
- Counters: Pre-computed
comment_counton a blog post table - Derived columns: Storing
total_pricewhen it'sqty * unit_price - Materialized views: Pre-aggregated reports
- Star schemas: Data warehouses intentionally use denormalized fact/dimension tables
Trade-off: Faster reads, but you must keep redundant data in sync (triggers, application logic, or batch jobs).
- Practice on paper — Many interviews are whiteboard-based. Write queries without autocomplete.
- Know your dialect — If the job lists PostgreSQL, know
LIMIT/OFFSET,::cast,JSONB, window functions. - Memorize execution order:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT - Be ready to optimize — "How would you make this faster?" → Indexes, query rewrite, partitioning.
- Clarify assumptions — Ask about NULL handling, duplicates, tie-breaking.
- Start simple, then refine — Write a working query first, then optimize.
- Talk through your logic — Explain why you chose JOIN over subquery, or which index you'd add.
- Mention edge cases — What if two employees tie for highest salary? Use
RANK()orDENSE_RANK().
| Pattern | Technique |
|---|---|
| Running total / moving average | Window functions (SUM() OVER, AVG() OVER) |
| Top N per group | ROW_NUMBER() OVER (PARTITION BY ...) |
| Gaps in sequences | LEAD() / LAG() or self-join |
| Hierarchical data | Recursive CTEs |
| Pivot data | CASE aggregates or PIVOT (SQL Server/Oracle) |
| Duplicate detection | GROUP BY ... HAVING COUNT(*) > 1 |
| Median calculation | PERCENTILE_CONT() or NTILE() |
- Practice: LeetCode Database, HackerRank SQL, SQLBolt
- Documentation: PostgreSQL Docs, MySQL Reference
- Books: SQL Performance Explained (Markus Winand), Learning SQL (Alan Beaulieu)
- Visualization: Use The Index, Luke! — Free online SQL indexing guide
"The most important SQL skill isn't memorizing syntax — it's knowing how to ask the right question of your data, and how the database will answer it."
Good luck with your interview! 🚀