An index in SQL is like an index in a book — it helps the database find rows faster without scanning the entire table.
📘 Imagine you’re searching for "Spring Boot" in a 1,000-page book:
- Without an index: You flip every page one by one.
- With an index: You go straight to page 573.
In SQL, the same thing happens with rows in a table.
- Speed up queries (especially those with
WHERE,JOIN,ORDER BY) - Improve performance on large tables
- Help databases avoid full table scans
CREATE INDEX index_name ON table_name (column1, column2, ...);CREATE INDEX idx_customer_name ON customers(name);Now, this helps:
SELECT * FROM customers WHERE name = 'John';go much faster.
| Type | Use Case |
|---|---|
| Single Column Index | Index on one column |
| Composite Index | Index on multiple columns |
| Unique Index | Ensures values are unique (like a unique constraint) |
| Full-text Index | For fast text search (e.g., searching keywords in articles) |
| Clustered Index | Data rows are physically stored in index order (SQL Server, MySQL InnoDB) |
| Non-Clustered Index | Index is separate from the actual data |
| Bitmap Index | Efficient for columns with few distinct values (mainly in data warehousing) |
| Partial Index | Index on a filtered subset of rows (e.g., only where status = 'active') |
| Covering Index | Contains all columns needed by the query so table access is avoided |
Great! Let’s go through each index type one by one with:
- ✅ Simple Explanation
- 🧠 Use Case
- 🧾 Example
- ✅ Index on one column
- ⚡ Used for filtering or searching on that column
Example:
CREATE INDEX idx_users_email ON users(email);Use Case: Search users by email quickly:
SELECT * FROM users WHERE email = 'a@example.com';- ✅ Index on multiple columns
⚠️ Order of columns matters
Example:
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);Use Case: This index helps with:
-- Uses full index (both user_id and order_date)
SELECT * FROM orders WHERE user_id = 5 AND order_date = '2023-01-01';
-- Uses only leading column (user_id)
SELECT * FROM orders WHERE user_id = 5;
-- ❌ This query won’t use the index
SELECT * FROM orders WHERE order_date = '2023-01-01';- ✅ Ensures all values are unique
- 🔒 Like a UNIQUE constraint, but can be added explicitly
Example:
CREATE UNIQUE INDEX idx_unique_username ON users(username);Use Case: Prevent duplicate usernames:
INSERT INTO users(username) VALUES ('john_doe'); -- allowed
INSERT INTO users(username) VALUES ('john_doe'); -- ❌ error- ✅ Optimized for text search
- 🔍 Useful for searching keywords or phrases in large text columns
Example (MySQL):
CREATE FULLTEXT INDEX idx_articles_content ON articles(content);Use Case:
SELECT * FROM articles
WHERE MATCH(content) AGAINST('database indexing');-
✅ Data is physically stored in index order
-
📦 Only one clustered index per table
-
Used by default in:
- MySQL InnoDB (by primary key)
- SQL Server (can define manually)
Example (SQL Server):
CREATE CLUSTERED INDEX idx_orders_id ON orders(order_id);Use Case: Speeds up retrieval by primary key, since data is already sorted that way.
- ✅ Index stored separately from data
- 📍 Has a pointer to actual row (row ID or primary key)
- You can have many non-clustered indexes per table
Example (SQL Server):
CREATE NONCLUSTERED INDEX idx_users_status ON users(status);Use Case:
SELECT * FROM users WHERE status = 'active';- ✅ Uses bits instead of row pointers
- 🔁 Great for low-cardinality columns like gender, status
- 📊 Used mostly in data warehouses
Example (Oracle):
CREATE BITMAP INDEX idx_users_gender ON users(gender);Use Case: Efficient for:
SELECT COUNT(*) FROM users WHERE gender = 'M' OR gender = 'F';- ✅ Index on filtered rows only
- 🔍 Reduces index size & improves performance
Example (PostgreSQL):
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';Use Case:
SELECT * FROM users WHERE status = 'active' AND email = 'a@example.com';→ Index is used only for active users.
- ✅ Index includes all columns needed by the query
- 🚫 Avoids going to the table → faster
Example (MySQL):
CREATE INDEX idx_orders_user_date_total ON orders(user_id, order_date, total);Use Case:
SELECT user_id, order_date, total FROM orders
WHERE user_id = 101 AND order_date = '2023-01-01';→ Since all selected columns are in the index, no table lookup needed.
| Type | Best For / When to Use |
|---|---|
| Single Column | Simple queries filtering on one column |
| Composite | Queries filtering/sorting by multiple columns (order matters) |
| Unique | Enforce uniqueness |
| Full-text | Search in large text data |
| Clustered | Fast primary key access (data stored in index order) |
| Non-Clustered | General-purpose filtering/search |
| Bitmap | Low-cardinality columns (like 'M'/'F', 'Y'/'N') |
| Partial | Index only where specific conditions apply |
| Covering | Query can be answered fully from the index |
✅ Use index when:
- The column is used in
WHERE,JOIN,ORDER BY, orGROUP BY - You frequently filter or sort by that column
- You want to speed up search on large tables
-- Frequently searched column
SELECT * FROM orders WHERE customer_id = 5;➡️ Create index on customer_id
❌ Avoid indexes on:
- Columns with highly unique queries on small tables (not much benefit)
- Frequently updated columns (index overhead)
- Low-cardinality columns (e.g.,
gender = 'M' or 'F') – not efficient - Temporary or staging tables
- ⚡ Faster SELECT queries
- 🔄 Speeds up joins and filters
- 📚 Can reduce disk I/O
- 🧠 Smart databases can choose the best index via query planner
- 🐌 Slower INSERT/UPDATE/DELETE – because the index must also be updated
- 🧠 Takes memory and disk space
- ❗ Can be unused if your query doesn’t match how the index is structured
- 🧼 Requires maintenance (stats, rebuild, etc.)
✅ Good:
CREATE INDEX idx_user_email ON users(email);
SELECT * FROM users WHERE email = 'test@example.com';❌ Bad:
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Index may not be used if function is applied to the columnTo fix:
-- Some DBs support function-based indexes
CREATE INDEX idx_lower_email ON users(LOWER(email));-- In PostgreSQL
\d tablename
-- In MySQL
SHOW INDEX FROM tablename;
-- Explain plan
EXPLAIN SELECT * FROM table WHERE ...These show whether an index is being used.
DROP INDEX idx_customer_name ON customers;| Topic | Key Takeaway |
|---|---|
| What | Index = Shortcut to quickly find data |
| Why | Speeds up SELECTs, JOINs, WHEREs |
| How | CREATE INDEX idx_name ON table(column); |
| Types | Single, Composite, Unique, Full-text, Clustered, etc. |
| Use When | Columns are often filtered, joined, or sorted |
| Avoid When | Small tables, low-cardinality columns, frequent writes |
| Downsides | Slower writes, more disk space |
Great question! Understanding how indexes work internally helps you write better, faster SQL queries. Let’s dive into it.
- Used by: PostgreSQL, MySQL (InnoDB), Oracle, SQL Server
- Default index type in most RDBMS
A B-Tree is a self-balancing tree data structure where:
-
Data is stored in sorted order
-
Each node contains multiple keys and child pointers
-
Allows logarithmic time complexity:
O(log n)for search, insert, delete
-
It keeps the tree height low, so less disk access is needed.
- Binary search is done at each level
- Branching factor is high (like 100s of keys per node)
- Disk reads are minimized (fewer levels to traverse)
Let’s say you run:
SELECT * FROM users WHERE email = 'a@example.com';If email has a B-Tree index:
- DB does binary search at the root
- Follows the appropriate pointer to the next level
- Repeats until it finds the correct leaf node that stores the actual row’s pointer
- Fetches the row from the table
- Used by: PostgreSQL (manually), MySQL (Memory engine)
- Fast for exact match lookups, like
WHERE id = 123 - Not used for range queries (
<,>,BETWEEN,LIKE)
- Not sorted → can’t do range scans
- Rebuilding needed on crash or upgrade in some engines (like older PostgreSQL versions)
-
Used by: Oracle, PostgreSQL (extensions)
-
Best for low-cardinality columns (like gender: M/F)
-
Uses bitmaps instead of row pointers:
- Each value gets a bitmap representing which rows contain it
- Efficient for complex
OR,ANDconditions
-
Used in PostgreSQL for advanced data types:
- Arrays
- JSON
- Geospatial data
-
GIN: Generalized Inverted Index – great for full-text search
-
R-Tree: used in spatial databases (like for maps, bounding boxes)
Here’s the secret:
-
Avoid full table scans
- Without an index: scan every row (O(n))
- With index: go directly (O(log n) for B-Tree)
-
Minimized disk I/O
- B-Tree is designed to work well with disk blocks
- Fewer reads = faster performance
-
Sorted order helps with range queries
- Useful for
ORDER BY,BETWEEN, etc.
- Useful for
Imagine you have a B-Tree index on age:
[30, 60]
/ | \
<30 30-59 >=60
[10,20] [35,45] [65,80]
To find age = 45:
- Start at root
[30, 60] - Go to middle branch (30–59)
- Then to leaf
[35,45] - Find row with 45 → return result
This is logarithmic and super efficient.
| Topic | Details |
|---|---|
| Default Structure | B-Tree – fast for equality + range queries |
| Exact Match Use | Hash Index |
| Low-Cardinality | Bitmap Index |
| Full-Text Search | GIN |
| Geospatial | R-Tree |
| Speed Reason | Binary search, sorted data, low tree height, minimal disk I/O |