home writings projects

Mastering Database Indexes: When and How to Use Them

Nov 14, 2024

·

5 min read

tl;dr: A comprehensive guide to database indexing strategies, performance optimization, and common pitfalls to avoid when designing high-performance database schemas.

Database indexes are one of the most powerful tools for optimizing query performance, yet they’re often misunderstood or misused. In this guide, we’ll explore when to use indexes, what types exist, and how to avoid common pitfalls.

What Are Database Indexes?

Think of a database index like a book’s index. Instead of reading every page to find information, you consult the index to jump directly to the relevant pages. Similarly, database indexes allow the database engine to quickly locate rows without scanning the entire table.

Types of Indexes

1. B-Tree Indexes (Default)

The most common index type, perfect for equality and range queries.

CREATE INDEX idx_users_email ON users(email);

-- Optimizes queries like:
SELECT * FROM users WHERE email = '[email protected]';
SELECT * FROM users WHERE created_at > '2024-01-01';

Best for:

  • Equality comparisons (=)
  • Range queries (>, <, BETWEEN)
  • Sorting (ORDER BY)
  • Prefix matching (LIKE 'prefix%')

2. Hash Indexes

Optimized for exact match lookups, but can’t handle range queries.

CREATE INDEX idx_users_id_hash ON users USING HASH(id);

-- Fast for:
SELECT * FROM users WHERE id = 12345;

-- Won't help:
SELECT * FROM users WHERE id > 12345;

3. Composite Indexes

Indexes on multiple columns. Column order matters!

CREATE INDEX idx_users_country_city ON users(country, city);

-- Optimizes:
SELECT * FROM users WHERE country = 'US' AND city = 'NYC';
SELECT * FROM users WHERE country = 'US';

-- Doesn't optimize:
SELECT * FROM users WHERE city = 'NYC'; -- city is not first

Rule of thumb: Put the most selective column first.

4. Partial Indexes

Index only rows that match a condition, saving space.

CREATE INDEX idx_active_users ON users(email)
WHERE is_active = true;

-- Only indexes active users

5. Full-Text Indexes

For text search operations.

CREATE FULLTEXT INDEX idx_posts_content ON posts(content);

-- Enables:
SELECT * FROM posts WHERE MATCH(content) AGAINST('database performance');

When to Add Indexes

✅ You Should Index:

  1. Primary Keys (automatic in most databases)
  2. Foreign Keys (for JOIN performance)
  3. Columns in WHERE clauses (frequently filtered)
  4. Columns in ORDER BY (sorting operations)
  5. Columns in JOIN conditions

❌ You Shouldn’t Index:

  1. Small tables (< 1000 rows) - full table scan is faster
  2. High cardinality columns with frequent updates - index maintenance overhead
  3. Columns with low selectivity (e.g., boolean flags in large tables)
  4. Columns rarely used in queries

Real-World Example

Let’s optimize a slow query:

-- Slow query (1.2 seconds)
SELECT * FROM orders
WHERE user_id = 123
  AND status = 'pending'
  AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 10;

-- Add composite index
CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);

-- Same query now runs in 15ms

Index Maintenance Costs

Indexes aren’t free. Every INSERT, UPDATE, or DELETE must update all relevant indexes.

Example:

-- Table with 5 indexes
INSERT INTO users (email, name, country, city, created_at)
VALUES ('[email protected]', 'John', 'US', 'NYC', NOW());

-- Behind the scenes:
-- 1. Insert row into table
-- 2. Update index on email
-- 3. Update index on country
-- 4. Update index on city
-- 5. Update index on created_at
-- 6. Update composite index on (country, city)

Rule: Only create indexes you actually need.

Analyzing Index Performance

PostgreSQL

-- Show query execution plan
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = '[email protected]';

-- Check index usage
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0; -- Unused indexes

MySQL

-- Show query execution plan
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

-- Check index statistics
SHOW INDEX FROM users;

Common Pitfalls

1. Over-Indexing

-- Bad: Too many indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_name ON users(name);
CREATE INDEX idx_users_city ON users(city);
CREATE INDEX idx_users_country ON users(country);
CREATE INDEX idx_users_email_name ON users(email, name);
-- ... 15 more indexes

-- Result: Slow writes, wasted space

2. Wrong Column Order in Composite Indexes

-- Bad: Low selectivity column first
CREATE INDEX idx_orders_status_user ON orders(status, user_id);
-- status has only 3-4 values, user_id is highly selective

-- Good: High selectivity column first
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

3. Function-Based Queries Without Matching Indexes

-- Index won't be used
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- Solution: Create function-based index
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Best Practices

  1. Start with no indexes (except primary key), add as needed
  2. Monitor slow query logs to identify bottlenecks
  3. Use EXPLAIN to verify index usage
  4. Test with production-like data volumes
  5. Regularly review and remove unused indexes
  6. Consider covering indexes for frequently accessed columns
  7. Don’t index for micro-optimizations - focus on actual bottlenecks

Covering Indexes

A covering index includes all columns needed by a query, avoiding table lookups.

-- Query needs: id, email, name
SELECT id, email, name FROM users WHERE email = '[email protected]';

-- Create covering index
CREATE INDEX idx_users_email_covering ON users(email, id, name);

-- Database can answer query using only the index

Conclusion

Indexes are powerful but come with trade-offs. The key is finding the right balance:

  • Too few indexes: Slow reads
  • Too many indexes: Slow writes, wasted space

Start simple, measure performance, and add indexes strategically based on real usage patterns. Your database—and your users—will thank you.


Further Reading:

  • Understanding B-Tree index internals
  • Advanced indexing strategies for time-series data
  • Index-only scans and performance optimization

Tools:

  • pg_stat_statements (PostgreSQL)
  • EXPLAIN ANALYZE (PostgreSQL/MySQL)
  • Database query profilers (DataGrip, pgAdmin)