Skip to content

Latest commit

 

History

History
521 lines (343 loc) · 12.9 KB

File metadata and controls

521 lines (343 loc) · 12.9 KB

✅ What is an Index in SQL?

An index in SQL is like an index in a book — it helps the database find rows faster without scanning the entire table.

📘 Imagine you’re searching for "Spring Boot" in a 1,000-page book:

  • Without an index: You flip every page one by one.
  • With an index: You go straight to page 573.

In SQL, the same thing happens with rows in a table.


🔍 Why Use Indexes?

  • Speed up queries (especially those with WHERE, JOIN, ORDER BY)
  • Improve performance on large tables
  • Help databases avoid full table scans

🛠️ How to Create an Index?

🧱 Basic Syntax:

CREATE INDEX index_name ON table_name (column1, column2, ...);

✅ Example:

CREATE INDEX idx_customer_name ON customers(name);

Now, this helps:

SELECT * FROM customers WHERE name = 'John';

go much faster.


🔢 Types of Indexes

Type Use Case
Single Column Index Index on one column
Composite Index Index on multiple columns
Unique Index Ensures values are unique (like a unique constraint)
Full-text Index For fast text search (e.g., searching keywords in articles)
Clustered Index Data rows are physically stored in index order (SQL Server, MySQL InnoDB)
Non-Clustered Index Index is separate from the actual data
Bitmap Index Efficient for columns with few distinct values (mainly in data warehousing)
Partial Index Index on a filtered subset of rows (e.g., only where status = 'active')
Covering Index Contains all columns needed by the query so table access is avoided

Great! Let’s go through each index type one by one with:

  • Simple Explanation
  • 🧠 Use Case
  • 🧾 Example

🔹 1. Single Column Index

  • ✅ Index on one column
  • ⚡ Used for filtering or searching on that column

Example:

CREATE INDEX idx_users_email ON users(email);

Use Case: Search users by email quickly:

SELECT * FROM users WHERE email = 'a@example.com';

🔹 2. Composite Index (Multi-column)

  • ✅ Index on multiple columns
  • ⚠️ Order of columns matters

Example:

CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);

Use Case: This index helps with:

-- Uses full index (both user_id and order_date)
SELECT * FROM orders WHERE user_id = 5 AND order_date = '2023-01-01';

-- Uses only leading column (user_id)
SELECT * FROM orders WHERE user_id = 5;

-- ❌ This query won’t use the index
SELECT * FROM orders WHERE order_date = '2023-01-01';

🔹 3. Unique Index

  • ✅ Ensures all values are unique
  • 🔒 Like a UNIQUE constraint, but can be added explicitly

Example:

CREATE UNIQUE INDEX idx_unique_username ON users(username);

Use Case: Prevent duplicate usernames:

INSERT INTO users(username) VALUES ('john_doe');  -- allowed
INSERT INTO users(username) VALUES ('john_doe');  -- ❌ error

🔹 4. Full-text Index

  • ✅ Optimized for text search
  • 🔍 Useful for searching keywords or phrases in large text columns

Example (MySQL):

CREATE FULLTEXT INDEX idx_articles_content ON articles(content);

Use Case:

SELECT * FROM articles
WHERE MATCH(content) AGAINST('database indexing');

🔹 5. Clustered Index

  • ✅ Data is physically stored in index order

  • 📦 Only one clustered index per table

  • Used by default in:

    • MySQL InnoDB (by primary key)
    • SQL Server (can define manually)

Example (SQL Server):

CREATE CLUSTERED INDEX idx_orders_id ON orders(order_id);

Use Case: Speeds up retrieval by primary key, since data is already sorted that way.


🔹 6. Non-Clustered Index

  • ✅ Index stored separately from data
  • 📍 Has a pointer to actual row (row ID or primary key)
  • You can have many non-clustered indexes per table

Example (SQL Server):

CREATE NONCLUSTERED INDEX idx_users_status ON users(status);

Use Case:

SELECT * FROM users WHERE status = 'active';

🔹 7. Bitmap Index

  • ✅ Uses bits instead of row pointers
  • 🔁 Great for low-cardinality columns like gender, status
  • 📊 Used mostly in data warehouses

Example (Oracle):

CREATE BITMAP INDEX idx_users_gender ON users(gender);

Use Case: Efficient for:

SELECT COUNT(*) FROM users WHERE gender = 'M' OR gender = 'F';

🔹 8. Partial Index

  • ✅ Index on filtered rows only
  • 🔍 Reduces index size & improves performance

Example (PostgreSQL):

CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

Use Case:

SELECT * FROM users WHERE status = 'active' AND email = 'a@example.com';

→ Index is used only for active users.


🔹 9. Covering Index

  • ✅ Index includes all columns needed by the query
  • 🚫 Avoids going to the table → faster

Example (MySQL):

CREATE INDEX idx_orders_user_date_total ON orders(user_id, order_date, total);

Use Case:

SELECT user_id, order_date, total FROM orders
WHERE user_id = 101 AND order_date = '2023-01-01';

→ Since all selected columns are in the index, no table lookup needed.


✅ Summary Table

Type Best For / When to Use
Single Column Simple queries filtering on one column
Composite Queries filtering/sorting by multiple columns (order matters)
Unique Enforce uniqueness
Full-text Search in large text data
Clustered Fast primary key access (data stored in index order)
Non-Clustered General-purpose filtering/search
Bitmap Low-cardinality columns (like 'M'/'F', 'Y'/'N')
Partial Index only where specific conditions apply
Covering Query can be answered fully from the index

🕵️‍♂️ When Should You Create an Index?

✅ Use index when:

  • The column is used in WHERE, JOIN, ORDER BY, or GROUP BY
  • You frequently filter or sort by that column
  • You want to speed up search on large tables

Example:

-- Frequently searched column
SELECT * FROM orders WHERE customer_id = 5;

➡️ Create index on customer_id


🚫 When NOT to Use an Index

❌ Avoid indexes on:

  • Columns with highly unique queries on small tables (not much benefit)
  • Frequently updated columns (index overhead)
  • Low-cardinality columns (e.g., gender = 'M' or 'F') – not efficient
  • Temporary or staging tables

⭐ Benefits of Indexes

  • ⚡ Faster SELECT queries
  • 🔄 Speeds up joins and filters
  • 📚 Can reduce disk I/O
  • 🧠 Smart databases can choose the best index via query planner

⚠️ Side Effects (Disadvantages) of Indexes

  • 🐌 Slower INSERT/UPDATE/DELETE – because the index must also be updated
  • 🧠 Takes memory and disk space
  • ❗ Can be unused if your query doesn’t match how the index is structured
  • 🧼 Requires maintenance (stats, rebuild, etc.)

🧠 Tips for Using Indexes Effectively

✅ Good:

CREATE INDEX idx_user_email ON users(email);
SELECT * FROM users WHERE email = 'test@example.com';

❌ Bad:

SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Index may not be used if function is applied to the column

To fix:

-- Some DBs support function-based indexes
CREATE INDEX idx_lower_email ON users(LOWER(email));

📋 How to See and Analyze Indexes?

-- In PostgreSQL
\d tablename

-- In MySQL
SHOW INDEX FROM tablename;

-- Explain plan
EXPLAIN SELECT * FROM table WHERE ...

These show whether an index is being used.


🔄 How to Drop an Index?

DROP INDEX idx_customer_name ON customers;

✅ Final Summary

Topic Key Takeaway
What Index = Shortcut to quickly find data
Why Speeds up SELECTs, JOINs, WHEREs
How CREATE INDEX idx_name ON table(column);
Types Single, Composite, Unique, Full-text, Clustered, etc.
Use When Columns are often filtered, joined, or sorted
Avoid When Small tables, low-cardinality columns, frequent writes
Downsides Slower writes, more disk space

Great question! Understanding how indexes work internally helps you write better, faster SQL queries. Let’s dive into it.


✅ What Data Structures Do Indexes Use Internally?

🔷 1. B-Tree (Balanced Tree) – Most Common

  • Used by: PostgreSQL, MySQL (InnoDB), Oracle, SQL Server
  • Default index type in most RDBMS

💡 What is a B-Tree?

A B-Tree is a self-balancing tree data structure where:

  • Data is stored in sorted order

  • Each node contains multiple keys and child pointers

  • Allows logarithmic time complexity:

    • O(log n) for search, insert, delete
  • It keeps the tree height low, so less disk access is needed.

✅ Why B-Tree is Fast:

  • Binary search is done at each level
  • Branching factor is high (like 100s of keys per node)
  • Disk reads are minimized (fewer levels to traverse)

📍 How It Locates Data:

Let’s say you run:

SELECT * FROM users WHERE email = 'a@example.com';

If email has a B-Tree index:

  1. DB does binary search at the root
  2. Follows the appropriate pointer to the next level
  3. Repeats until it finds the correct leaf node that stores the actual row’s pointer
  4. Fetches the row from the table

🔷 2. Hash Index

  • Used by: PostgreSQL (manually), MySQL (Memory engine)
  • Fast for exact match lookups, like WHERE id = 123
  • Not used for range queries (<, >, BETWEEN, LIKE)

❌ Drawbacks:

  • Not sorted → can’t do range scans
  • Rebuilding needed on crash or upgrade in some engines (like older PostgreSQL versions)

🔷 3. Bitmap Index

  • Used by: Oracle, PostgreSQL (extensions)

  • Best for low-cardinality columns (like gender: M/F)

  • Uses bitmaps instead of row pointers:

    • Each value gets a bitmap representing which rows contain it
    • Efficient for complex OR, AND conditions

🔷 4. GiST / GIN / SP-GiST / R-Tree

  • Used in PostgreSQL for advanced data types:

    • Arrays
    • JSON
    • Geospatial data
  • GIN: Generalized Inverted Index – great for full-text search

  • R-Tree: used in spatial databases (like for maps, bounding boxes)


🚀 Why Are Indexes So Fast?

Here’s the secret:

  1. Avoid full table scans

    • Without an index: scan every row (O(n))
    • With index: go directly (O(log n) for B-Tree)
  2. Minimized disk I/O

    • B-Tree is designed to work well with disk blocks
    • Fewer reads = faster performance
  3. Sorted order helps with range queries

    • Useful for ORDER BY, BETWEEN, etc.

🧠 Example: B-Tree Visualization

Imagine you have a B-Tree index on age:

             [30, 60]
           /     |     \
        <30     30-59    >=60
      [10,20]  [35,45]   [65,80]

To find age = 45:

  • Start at root [30, 60]
  • Go to middle branch (30–59)
  • Then to leaf [35,45]
  • Find row with 45 → return result

This is logarithmic and super efficient.


✅ Summary

Topic Details
Default Structure B-Tree – fast for equality + range queries
Exact Match Use Hash Index
Low-Cardinality Bitmap Index
Full-Text Search GIN
Geospatial R-Tree
Speed Reason Binary search, sorted data, low tree height, minimal disk I/O