Continuation to the previous blog.


28. The “Heavy Data” Reality: TSVectors

While working on my topic table, I noticed I’m building a search_vector column using to_tsvector(). That immediately adds a completely new layer to the “8 KB page” math I discussed earlier.

Until now, my examples were based on small index keys like UUIDs.

But full-text search is a different beast.

Why Full-Text Search Becomes Heavy

A UUID is small:

  • UUID size: 16 bytes

But a tsvector generated from a description field can easily become:

  • 500+ bytes, sometimes much more depending on the text

This changes everything.

The Page Density Problem Gets Worse

PostgreSQL pages are still 8 KB (8192 bytes).

So earlier, I could theoretically fit hundreds of UUID entries into a single index page.

But with TSVectors, the math collapses:

  • Instead of fitting 800+ entries per page,
  • a full-text index (especially a GIN index) might only fit 10–15 entries per page

What This Means in Real Life

This is why Full-Text Search is “heavy”:

  • The disk must fetch more pages
  • PostgreSQL must process larger blocks
  • CPU overhead increases because large index structures must be traversed

In short: full-text search is powerful, but it’s not “free”.


29. Final Implementation Checklist (Postgres 17.7)

At this stage, I already have pgcrypto installed and I’ve tested UUID generation successfully. Now I just need to apply the new standard consistently across my main tables.

Here’s the final checklist I’m following to migrate my tables (agent, topic, organisation) into UUID v7 defaults.

Migration Checklist

StepActionSQL Command
1Apply DefaultALTER TABLE agent ALTER COLUMN id SET DEFAULT generate_uuid_v7();
2Apply DefaultALTER TABLE topic ALTER COLUMN id SET DEFAULT generate_uuid_v7();
3Repack PagesREINDEX TABLE agent;
4VerifyRun pgstatindex() again and check density

This ensures that all new inserts follow the correct UUID ordering behavior, and the index stops degrading over time.


30. The “Smoking Gun”: Explaining the 1.08% Density

When I ran pgstatindex() on my organisation table, the output showed:

  • avg_leaf_density: 1.08

This is basically the ultimate proof of the UUID v4 problem.

The Math Behind 1.08%

PostgreSQL index pages are 8192 bytes (8 KB).

So if my average leaf density is 1.08%, the actual space being used is:

  • Used Space:
    $8192 \times 0.0108 = 88.47 \text{ bytes}$

  • Wasted Space:
    $8192 - 88.47 \approx 8103 \text{ bytes}$

That means PostgreSQL allocated a full 8 KB page, but I’m only storing around 88 bytes of useful index data inside it.

Why This Happens

Because UUID v4 values are random, inserts get scattered across the B-Tree.

PostgreSQL keeps opening new pages to place new keys, but it can’t fill them sequentially. So I end up with thousands of nearly-empty index pages.

This is exactly how index bloat starts.


31. Fixing My Correlation Query

After seeing the density issue, I wanted to check another critical metric: correlation.

Correlation tells me how closely the physical order of rows on disk matches the index order.

This is a major planner performance indicator.

My original query failed because I mistakenly used public.organisation as the table name.

But pg_stats.tablename does not store schema-qualified names.

Correct Query

1
2
3
4
5
6
7
SELECT 
    tablename, 
    attname AS column_name, 
    correlation 
FROM pg_stats 
WHERE tablename = 'organisation'
AND schemaname = 'public';

How I Interpret Correlation Near 1.0 → nearly perfect sequential order (best case) Near 0.0 → totally random physical order (UUID v4 chaos) Near -1.0 → perfectly ordered but reversed

This metric is extremely useful because it tells me whether index scans will be cheap or whether PostgreSQL will end up jumping all over the disk.

32. Final Comparison: Topic vs Organisation

Once I understood the numbers, the difference between integer keys and random UUIDs became painfully obvious.

Featuretopic tableorganisation table
ID TypeInteger (Serial)UUID v4 (Random)
Insert PatternSequentialRandom scatter
Page Filling StrategyFills pages efficientlyLeaves pages mostly empty
Density Score~9.62% (new table effect)1.08% (serious scatter)
VerdictNaturally healthyNeeds UUID v7

What Switching to UUID v7 Fixes

By moving organisation.id to generate_uuid_v7(), I’m ensuring that:

  • inserts become naturally ordered
  • correlation improves over time
  • index pages fill sequentially instead of fragmenting

In other words, I fixed the DNA of my primary key strategy before the system grows large.


33. Understanding the “High Bloat” Alarm

When I ran a bloat query on my database, I noticed almost every table and index was marked as HIGH BLOAT.

At first glance, that looks alarming.

But after looking closer, I realized this is mostly an illusion caused by a small dataset.

The “Empty Parking Garage” Effect

Most of my indexes were exactly 16 KB.

PostgreSQL indexes have a minimum footprint:

  • Page 1 (8 KB): Meta page (index metadata)
  • Page 2 (8 KB): First leaf page (actual key storage)

So even with just a few rows, PostgreSQL still allocates at least 2 pages.

If I only have 5 rows and they take ~100 bytes, then:

  • 100 bytes used inside an 8192-byte leaf page
  • density becomes around 1.2%

So the bloat query screams “HIGH BLOAT”, but the truth is:

the index is not bloated — it’s just empty.


34. Current Index Health Report

At this point, my index situation looks like this:

Table NameIndex NameSizeDensityStatusVerdict
organisationPK_c725…16 KB~1.08%HIGH BLOATHealthy (table too small)
topictopic_id_seq16 KB~9.62%HIGH BLOATHealthy (will fill sequentially)
agentPK_…16 KBLowHIGH BLOATNeeds UUID v7 default

When Does Bloat Actually Matter?

Bloat becomes real only when index sizes become significant:

  • 16 KB index with 1% density: not a problem
  • 500 MB index with 1% density: massive waste of RAM and disk I/O

That’s the moment where running REINDEX becomes necessary.


35. The “Search Vector” Performance Check

One thing I wanted to confirm was whether my search_vector column is already inflating the topic table.

Interestingly, even after updating search_vector, the topic table is still around 16 KB.

That’s a good sign:

  • my search_vector content is still small
  • the table hasn’t expanded into multiple pages yet
  • autovacuum hasn’t needed to reshuffle lots of tuples

Verification Query

1
2
3
4
5
6
7
SELECT 
    relname AS table_name, 
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_table_size(relid)) AS data_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_stat_user_tables 
WHERE relname = 'topic';

This helps me verify whether the full-text column is growing the table itself or mostly affecting index storage.

That is it for the blog. Thanks for reading till here.