16. Visualizing the “Freed Up” Space
Let’s visualize a row of pointers inside our 8 KB Page. By shrinking each “seat” from 8 bytes to 6 bytes, we allow the next pointer to start sooner.
Scenario A: 8-byte Pointers
[Ptr 1: 8 bytes] [Ptr 2: 8 bytes] [Ptr 3: 8 bytes]
Total used for 3 pointers: 24 bytes.
Scenario B: 6-byte Pointers
[Ptr 1: 6 bytes] [Ptr 2: 6 bytes] [Ptr 3: 6 bytes] [Ptr 4: 6 bytes]
Total used for 4 pointers: 24 bytes.
The “Free” Pointer: In the same 24 bytes of space, the 6-byte strategy fits one extra pointer. Scale this across an 8,192-byte page with a 4-byte integer key attached to each pointer (a 10-byte entry instead of a 12-byte one), and capacity grows from 682 entries to 819, roughly 137 extra entries per page.
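The arithmetic is easy to verify. This sketch assumes, as in the later examples, that each entry pairs a 4-byte integer key with its pointer:

```python
PAGE_SIZE = 8192  # bytes in one database page
KEY_SIZE = 4      # assumption: a 4-byte integer key per entry

entries_8b_ptr = PAGE_SIZE // (KEY_SIZE + 8)  # 8-byte pointers -> 12-byte entries
entries_6b_ptr = PAGE_SIZE // (KEY_SIZE + 6)  # 6-byte pointers -> 10-byte entries

print(entries_8b_ptr)                   # 682
print(entries_6b_ptr)                   # 819
print(entries_6b_ptr - entries_8b_ptr)  # 137 extra entries per page
```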
17. The Bottleneck Reality
Why go through all this trouble to save 2 bytes? Because of the Latency Gap:
- Reading from RAM: Takes ~100 nanoseconds.
- Reading from Disk (SSD): Takes ~100,000 nanoseconds.
Reading from a disk is 1,000 times slower than reading from RAM. If saving 2 bytes per pointer allows the database to find your data in 3 disk reads instead of 4, you have saved 100,000 nanoseconds of time. In the world of database performance, we always trade CPU cycles to save Disk I/O.
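In round numbers, using the figures above:

```python
RAM_NS = 100        # ~100 ns per RAM read (the article's round figure)
DISK_NS = 100_000   # ~100,000 ns per SSD read (the article's round figure)

ratio = DISK_NS // RAM_NS      # one disk read costs as much as 1,000 RAM reads
saved_ns = (4 - 3) * DISK_NS   # dropping from 4 disk reads to 3 per lookup
print(ratio, saved_ns)
```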
Summary Checklist for Database Indexing
- Logical Unit: Data is moved in 8 KB Pages.
- OS Unit: Ubuntu moves memory in 4 KB Pages.
- Pointer Size: 6 Bytes on disk (addresses up to 256 TB).
- Hardware Alignment: 8 Bytes in RAM (64-bit CPU standard).
- The Goal: Maximize Page Density to keep the B-Tree as short as possible.
18. The “Box in a Room” Analogy: Understanding Density
To visualize how a database index scales, imagine a room that is exactly 8,192 square feet (our 8 KB Page). Your goal is to fit as many “Index Boxes” as possible into this room.
Each box consists of two parts:
- The Value: The data you are indexing (e.g., an ID or a Name).
- The Pointer: The 6-byte “address” telling you where the full row lives.
Scenario A: The Lean Integer (4 Bytes)
- Box Width: 4B (Value) + 6B (Pointer) = 10 bytes.
- Room Capacity: $8192 / 10 \approx \mathbf{819}$ boxes.
- The Result: High density. The database can find millions of rows in just 3 “jumps.”
Scenario B: The Bulky UUID (16 Bytes)
Many modern systems use UUIDs (Universally Unique Identifiers) because they are great for distributed systems. But they are “wide.”
- Box Width: 16B (Value) + 6B (Pointer) = 22 bytes.
- Room Capacity: $8192 / 22 \approx \mathbf{372}$ boxes.
- The Result: Your capacity has dropped by over 50%.
19. The “Multiplier Effect” of Fan-out
Why does it matter if we have 819 boxes versus 372? It’s because of Fan-out. Every “box” in the Root node points to a whole new page.
Look at how the total capacity of a 2-level tree changes based on box size:
| Box Size | Pointers per Page | Total Rows Found (2 Levels) |
|---|---|---|
| 10 Bytes (Integer) | 819 | $819 \times 819 = \mathbf{670,761}$ |
| 22 Bytes (UUID) | 372 | $372 \times 372 = \mathbf{138,384}$ |
The Deep Insight: By choosing a wider data type (like UUID or a long String), your index becomes less efficient. The tree grows taller, requiring more “jumps” to the disk.
In computer time:
- 3 Jumps: Fast (likely fits in RAM).
- 5 Jumps: Slower (requires more trips to the physical SSD).
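The density and fan-out figures from the table above can be reproduced in a few lines, using the article's 8 KB page and 6-byte pointer:

```python
PAGE_SIZE = 8192  # the 8 KB page
POINTER = 6       # the 6-byte pointer

def entries_per_page(key_bytes: int) -> int:
    """How many (value, pointer) boxes fit in one page."""
    return PAGE_SIZE // (key_bytes + POINTER)

for name, key_bytes in [("integer", 4), ("uuid", 16)]:
    fan_out = entries_per_page(key_bytes)
    # A 2-level tree reaches fan_out * fan_out rows.
    print(f"{name}: {fan_out} per page, {fan_out ** 2:,} rows in 2 levels")
```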
20. Conclusion: The Deep Planner’s Rules for Indexing
After diving deep into the hardware, the OS, and the math, we can summarize the “laws” of database performance:
- The 8 KB Page is Fixed: Ubuntu always moves 4 KB blocks, and the database always groups them into 8 KB logical units. We cannot change the “room” size, only how we pack it.
- Pointer Size is a Trade-off: We use 6-byte pointers instead of 8-byte pointers to lower the “denominator” and increase density.
- Data Width is King: Small data types (Integers) make for “fat, short” trees that are incredibly fast. Large data types (UUIDs, Strings) make for “skinny, tall” trees that are slower.
- Disk is the Bottleneck: Every design choice—from the 6-byte pointer to the 8 KB page—is made to avoid “Disk I/O.” Reading from an SSD is 1,000x slower than reading from RAM.
Final Pro-Tip: When designing a database, always choose the smallest data type that safely fits your data. You aren’t just saving disk space; you are increasing the “Fan-out” of your index and making your entire application faster.
21. The Chaos of the “Page Split”
What happens when our 8 KB “room” is perfectly packed with 819 boxes, and we try to shove one more inside? The database cannot simply “push” the other boxes; that would be too slow. Instead, it performs a Page Split.
How a Page Split Works:
- Allocation: The DB creates a brand new, empty 8 KB page.
- The 50/50 Rule: It moves roughly half of the entries (the upper half of the key range) from the full page into the new page.
- Insertion: It drops the new ID into the now-available space.
- The Link: It updates the “Parent” node to point to this new child page.
The Performance Hit: While a SELECT only reads data, an INSERT that triggers a split must write two pages to the disk (the old, now half-empty page and the new one) and update the parent in the index hierarchy. This is why “heavy” indexing makes writes slower.
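A toy sketch of the 50/50 rule described above. The page capacity is deliberately tiny so the split is visible; real pages hold hundreds of entries, and real databases refine this logic further:

```python
import bisect

PAGE_CAPACITY = 4  # toy capacity so the split is easy to see

def insert_with_split(page: list, key: int):
    """Insert key into a sorted page; if full, split 50/50 and return the new right page."""
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return None
    # Page split: allocate a new page and move the upper half of the entries into it.
    mid = len(page) // 2
    new_page = page[mid:]
    del page[mid:]
    target = page if key < new_page[0] else new_page
    bisect.insort(target, key)
    return new_page  # the parent node must now be updated to point at this page

page = [10, 20, 30, 40]
right = insert_with_split(page, 25)
print(page, right)  # [10, 20, 25] [30, 40]
```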
22. Case Study: The UUID Performance Trap
If you use Random UUIDs (v4) as primary keys in PostgreSQL, you are likely hitting a performance wall. Based on our “Deep Planning” math, here is why:
1. The “Width” Problem
A UUID is 16 bytes, whereas a standard Integer is 4 bytes.
- Integer Index: ~819 entries per page.
- UUID Index: ~372 entries per page.
- Result: Your index is more than twice the size, and the tree is taller, requiring more Disk I/O.
2. The “Locality” Problem (Fragmentation)
- Sequential IDs (1, 2, 3…): These always land at the “Right-Hand Edge” of the tree. The old pages stay perfectly packed and the DB only works on the last page.
- Random UUIDs (A, Z, B…): These land in the middle of random pages. This triggers constant Page Splits across the entire index, leaving your pages half-empty (fragmented).
3. The “RAM Exhaustion” Problem
Because random UUIDs can land anywhere, PostgreSQL must keep the entire index in RAM (the Buffer Pool) to stay fast. Once your UUID index grows larger than your available RAM, performance “falls off a cliff” as every single insert starts requiring a slow Disk Read.
23. The Modern Solution: UUID v7
If you need the uniqueness of a UUID but the speed of an Integer, use UUID v7.
- The Secret: The first 48 bits of a UUID v7 are a Timestamp.
- The Benefit: They are “Time-Ordered.” To the database, they behave like an Auto-Incrementing Integer, always landing at the end of the index and preventing chaotic Page Splits.
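A minimal Python sketch of the v7 layout, hand-built for illustration (the field positions follow RFC 9562; the helper name uuidv7 is our own, not a library function):

```python
import os
import time
import uuid

def uuidv7() -> uuid.UUID:
    """Toy UUID v7 builder: 48-bit ms timestamp, version/variant bits, 74 random bits."""
    ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; we use 74
    value = (ms & 0xFFFF_FFFF_FFFF) << 80         # bits 127..80: Unix timestamp in ms
    value |= 0x7 << 76                            # bits 79..76: version = 7
    value |= ((rand >> 62) & 0xFFF) << 64         # bits 75..64: 12 random bits
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand & 0x3FFF_FFFF_FFFF_FFFF         # bits 61..0: 62 random bits
    return uuid.UUID(int=value)

ids = [uuidv7()]
for _ in range(2):
    time.sleep(0.002)      # guarantee a later millisecond timestamp
    ids.append(uuidv7())

print(ids == sorted(ids))  # True: time-ordered, so new IDs land at the right-hand edge
print(ids[0].version)      # 7
```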
Final Post Summary: The Deep Planner’s Checklist
| Feature | The Goal | The “Deep” Reason |
|---|---|---|
| Page Size | 8 KB | Matches the OS (2x 4KB) while maximizing pointer density. |
| Pointer Size | 6 Bytes | Large enough for 256TB, small enough to fit ~20% more data. |
| Key Type | Sequential | Prevents Page Splits and keeps index pages “hot” in RAM. |
| Data Width | Small | Maximizes Fan-out, keeping the search tree short and fast. |
24. The Diagnosis: My PostgreSQL uses UUID v4
If you find that your primary keys are UUID v4, your database is currently playing a game of “Random Roulette” with your storage.
The Problem: Randomness vs. The 8 KB Page
Every time you insert a new UUID v4, its random value lands at a random position in the key range, forcing PostgreSQL to touch an effectively random page of your index.
- If the page is full: It triggers a Page Split, moving 4 KB of data to make room for one 16-byte ID.
- If the page is in RAM: Great!
- If the page is on Disk: The database must “seek” a random location on your SSD, which is much slower than reading sequential data.
25. The Solution: Transitioning to UUID v7
You don’t need to rebuild your entire database to fix this. You can start “healing” the index today by switching new inserts to UUID v7.
Why UUID v7 Wins
A UUID v7 looks exactly like a v4 (128 bits), but the first 48 bits are a Timestamp.
- To the User: It looks like a random string.
- To the B-Tree: It behaves like a sequence. Because the timestamps always increase, new IDs always land on the “Right-Hand Edge” of the tree, filling up the 8 KB pages perfectly and ending the cycle of random Page Splits.
Step-by-Step Implementation in PostgreSQL
1. Create the Generator
Since native support for v7 is coming in PostgreSQL 18, we can use a PL/pgSQL function to generate them now:
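One widely circulated PL/pgSQL sketch overlays the millisecond Unix timestamp onto the first 48 bits of a random v4 UUID. Treat it as an illustration rather than an official function: the name uuid_generate_v7 is our own, it assumes gen_random_uuid() is available (built in since PostgreSQL 13, or via the pgcrypto extension), and it works by flipping bits 52 and 53 so the v4 version nibble (0100) becomes v7 (0111):

```sql
-- Sketch: stamp the current time (ms) into the first 6 bytes of a random UUID,
-- then set the version bits to 7.
CREATE OR REPLACE FUNCTION uuid_generate_v7()
RETURNS uuid
LANGUAGE plpgsql
VOLATILE
AS $$
BEGIN
  RETURN encode(
    set_bit(
      set_bit(
        overlay(uuid_send(gen_random_uuid())
                PLACING substring(int8send(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3)
                FROM 1 FOR 6),
        52, 1),
      53, 1),
    'hex')::uuid;
END;
$$;
```

Once created, you can use it as a column default: DEFAULT uuid_generate_v7() instead of DEFAULT gen_random_uuid().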
26. The Final Result: A Balanced Ecosystem
By understanding the interaction between the 48-bit pointer, the 8 KB Database Page, and the 4 KB OS block, we have moved from a fragmented, slow index to a high-density, sequential one.
- Density: We fit the maximum number of entries per page.
- Locality: We keep related data physically close on the disk.
- Performance: We minimize Disk I/O, ensuring our 64-bit CPU spends more time processing and less time waiting for the “slow” storage.
Final Note: If you want to “clean up” the mess left behind by old UUID v4s, run REINDEX TABLE your_table_name; during a maintenance window. This will repack all those 8 KB pages with maximum density.
27. Fixing the Ambiguous Join & Bloat Analysis
When querying PostgreSQL system catalogs, columns like indexrelid appear in multiple tables (for example, in both pg_stat_user_indexes and pg_index). You must qualify them with aliases (e.g., s.indexrelid or i.indexrelid) to tell Postgres exactly which table to read from.
The Corrected Bloat & Density Query
Run this to see how your 8 KB pages are actually performing:
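One possible shape for such a query, sketched under the assumption that the pgstattuple extension is installed (it provides pgstatindex(), which reports leaf density and fragmentation for B-Tree indexes):

```sql
-- Requires: CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- Note the aliases: indexrelid exists in both pg_stat_user_indexes and pg_index,
-- so every reference must say which table it means.
SELECT
    s.schemaname,
    s.indexrelname                        AS index_name,
    pg_relation_size(s.indexrelid) / 8192 AS pages_8kb,
    x.avg_leaf_density                    AS leaf_density_pct,
    x.leaf_fragmentation                  AS fragmentation_pct
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON i.indexrelid = s.indexrelid
JOIN pg_class AS c ON c.oid = s.indexrelid
JOIN pg_am   AS am ON am.oid = c.relam
CROSS JOIN LATERAL pgstatindex(s.indexrelid) AS x
WHERE i.indisvalid
  AND am.amname = 'btree'  -- pgstatindex() only understands B-Tree indexes
ORDER BY leaf_density_pct;
```

As a rough reading: a healthy, sequentially filled index tends to report a leaf density near 90%, while an index fragmented by random UUID v4 inserts often hovers much lower.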