Part 4 - Deep Dive: How Database Indexing Works (The B-Tree)

Continuation to the previous blog.

28. The “Heavy Data” Reality: TSVectors

While working on my topic table, I noticed I’m building a search_vector column using to_tsvector(). That immediately adds a completely new layer to the “8 KB page” math I discussed earlier.

Until now, my examples were based on small index keys like UUIDs.

But full-text search is a different beast.

Why Full-Text Search Becomes Heavy

A UUID is small:

UUID size: 16 bytes

But a tsvector generated from a description field can easily become:

500+ bytes, sometimes much more depending on the text

This changes everything.

The Page Density Problem Gets Worse

PostgreSQL pages are still 8 KB (8192 bytes).

So earlier, I could theoretically fit hundreds of UUID entries into a single index page.

But with TSVectors, the math collapses:

Instead of fitting 800+ entries per page,
a full-text index (especially a GIN index) might only fit 10–15 entries per page

What This Means in Real Life

This is why Full-Text Search is “heavy”:

The disk must fetch more pages
PostgreSQL must process larger blocks
CPU overhead increases because large index structures must be traversed

In short: full-text search is powerful, but it’s not “free”.

29. Final Implementation Checklist (Postgres 17.7)

At this stage, I already have pgcrypto installed and I’ve tested UUID generation successfully. Now I just need to apply the new standard consistently across my main tables.

Here’s the final checklist I’m following to migrate my tables (agent, topic, organisation) into UUID v7 defaults.

Migration Checklist

Step	Action	SQL Command
1	Apply Default	`ALTER TABLE agent ALTER COLUMN id SET DEFAULT generate_uuid_v7();`
2	Apply Default	`ALTER TABLE topic ALTER COLUMN id SET DEFAULT generate_uuid_v7();`
3	Repack Pages	`REINDEX TABLE agent;`
4	Verify	Run `pgstatindex()` again and check density

This ensures that all new inserts follow the correct UUID ordering behavior, and the index stops degrading over time.

30. The “Smoking Gun”: Explaining the 1.08% Density

When I ran pgstatindex() on my organisation table, the output showed:

avg_leaf_density: 1.08

This is basically the ultimate proof of the UUID v4 problem.

The Math Behind 1.08%

PostgreSQL index pages are 8192 bytes (8 KB).

So if my average leaf density is 1.08%, the actual space being used is:

Used Space:
$8192 \times 0.0108 = 88.47 \text{ bytes}$
Wasted Space:
$8192 - 88.47 \approx 8103 \text{ bytes}$

That means PostgreSQL allocated a full 8 KB page, but I’m only storing around 88 bytes of useful index data inside it.

Why This Happens

Because UUID v4 values are random, inserts get scattered across the B-Tree.

PostgreSQL keeps opening new pages to place new keys, but it can’t fill them sequentially. So I end up with thousands of nearly-empty index pages.

This is exactly how index bloat starts.

31. Fixing My Correlation Query

After seeing the density issue, I wanted to check another critical metric: correlation.

Correlation tells me how closely the physical order of rows on disk matches the index order.

This is a major planner performance indicator.

My original query failed because I mistakenly used public.organisation as the table name.

But pg_stats.tablename does not store schema-qualified names.

Correct Query

1
2
3
4
5
6
7
SELECT 
    tablename, 
    attname AS column_name, 
    correlation 
FROM pg_stats 
WHERE tablename = 'organisation'
AND schemaname = 'public';

How I Interpret Correlation Near 1.0 → nearly perfect sequential order (best case) Near 0.0 → totally random physical order (UUID v4 chaos) Near -1.0 → perfectly ordered but reversed

This metric is extremely useful because it tells me whether index scans will be cheap or whether PostgreSQL will end up jumping all over the disk.

32. Final Comparison: Topic vs Organisation

Once I understood the numbers, the difference between integer keys and random UUIDs became painfully obvious.

Feature	topic table	organisation table
ID Type	Integer (Serial)	UUID v4 (Random)
Insert Pattern	Sequential	Random scatter
Page Filling Strategy	Fills pages efficiently	Leaves pages mostly empty
Density Score	~9.62% (new table effect)	1.08% (serious scatter)
Verdict	Naturally healthy	Needs UUID v7

What Switching to UUID v7 Fixes

By moving organisation.id to generate_uuid_v7(), I’m ensuring that:

inserts become naturally ordered
correlation improves over time
index pages fill sequentially instead of fragmenting

In other words, I fixed the DNA of my primary key strategy before the system grows large.

33. Understanding the “High Bloat” Alarm

When I ran a bloat query on my database, I noticed almost every table and index was marked as HIGH BLOAT.

At first glance, that looks alarming.

But after looking closer, I realized this is mostly an illusion caused by a small dataset.

The “Empty Parking Garage” Effect

Most of my indexes were exactly 16 KB.

PostgreSQL indexes have a minimum footprint:

Page 1 (8 KB): Meta page (index metadata)
Page 2 (8 KB): First leaf page (actual key storage)

So even with just a few rows, PostgreSQL still allocates at least 2 pages.

If I only have 5 rows and they take ~100 bytes, then:

100 bytes used inside an 8192-byte leaf page
density becomes around 1.2%

So the bloat query screams “HIGH BLOAT”, but the truth is:

the index is not bloated — it’s just empty.

34. Current Index Health Report

At this point, my index situation looks like this:

Table Name	Index Name	Size	Density	Status	Verdict
organisation	PK_c725…	16 KB	~1.08%	HIGH BLOAT	Healthy (table too small)
topic	topic_id_seq	16 KB	~9.62%	HIGH BLOAT	Healthy (will fill sequentially)
agent	PK_…	16 KB	Low	HIGH BLOAT	Needs UUID v7 default

When Does Bloat Actually Matter?

Bloat becomes real only when index sizes become significant:

16 KB index with 1% density: not a problem
500 MB index with 1% density: massive waste of RAM and disk I/O

That’s the moment where running REINDEX becomes necessary.

35. The “Search Vector” Performance Check

One thing I wanted to confirm was whether my search_vector column is already inflating the topic table.

Interestingly, even after updating search_vector, the topic table is still around 16 KB.

That’s a good sign:

my search_vector content is still small
the table hasn’t expanded into multiple pages yet
autovacuum hasn’t needed to reshuffle lots of tuples

Verification Query

1
2
3
4
5
6
7
SELECT 
    relname AS table_name, 
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_table_size(relid)) AS data_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables 
WHERE relname = 'topic';

This helps me verify whether the full-text column is growing the table itself or mostly affecting index storage.

That is it for the blog. Thanks for reading till here.

28. The “Heavy Data” Reality: TSVectors#

Why Full-Text Search Becomes Heavy#

The Page Density Problem Gets Worse#

What This Means in Real Life#

29. Final Implementation Checklist (Postgres 17.7)#

Migration Checklist#

30. The “Smoking Gun”: Explaining the 1.08% Density#

The Math Behind 1.08%#

Why This Happens#

31. Fixing My Correlation Query#

Correct Query#

32. Final Comparison: Topic vs Organisation#

What Switching to UUID v7 Fixes#

33. Understanding the “High Bloat” Alarm#

The “Empty Parking Garage” Effect#

34. Current Index Health Report#

When Does Bloat Actually Matter?#

35. The “Search Vector” Performance Check#

Verification Query#