Continuation to the previous blog.
28. The “Heavy Data” Reality: TSVectors
While working on my topic table, I noticed I’m building a search_vector column using to_tsvector(). That immediately adds a completely new layer to the “8 KB page” math I discussed earlier.
Until now, my examples were based on small index keys like UUIDs.
But full-text search is a different beast.
Why Full-Text Search Becomes Heavy
A UUID is small:
- UUID size: 16 bytes
But a tsvector generated from a description field can easily become:
- 500+ bytes, sometimes much more depending on the text
This changes everything.
The Page Density Problem Gets Worse
PostgreSQL pages are still 8 KB (8192 bytes).
So earlier, I could theoretically fit hundreds of UUID entries into a single index page.
But with TSVectors, the math collapses:
- Instead of fitting 800+ entries per page,
- a full-text index (especially a GIN index) might only fit 10–15 entries per page
What This Means in Real Life
This is why Full-Text Search is “heavy”:
- The disk must fetch more pages
- PostgreSQL must process larger blocks
- CPU overhead increases because large index structures must be traversed
In short: full-text search is powerful, but it’s not “free”.
29. Final Implementation Checklist (Postgres 17.7)
At this stage, I already have pgcrypto installed and I’ve tested UUID generation successfully. Now I just need to apply the new standard consistently across my main tables.
Here’s the final checklist I’m following to migrate my tables (agent, topic, organisation) into UUID v7 defaults.
Migration Checklist
| Step | Action | SQL Command |
|---|---|---|
| 1 | Apply Default | ALTER TABLE agent ALTER COLUMN id SET DEFAULT generate_uuid_v7(); |
| 2 | Apply Default | ALTER TABLE topic ALTER COLUMN id SET DEFAULT generate_uuid_v7(); |
| 3 | Repack Pages | REINDEX TABLE agent; |
| 4 | Verify | Run pgstatindex() again and check density |
This ensures that all new inserts follow the correct UUID ordering behavior, and the index stops degrading over time.
30. The “Smoking Gun”: Explaining the 1.08% Density
When I ran pgstatindex() on my organisation table, the output showed:
avg_leaf_density: 1.08
This is basically the ultimate proof of the UUID v4 problem.
The Math Behind 1.08%
PostgreSQL index pages are 8192 bytes (8 KB).
So if my average leaf density is 1.08%, the actual space being used is:
Used Space:
$8192 \times 0.0108 = 88.47 \text{ bytes}$Wasted Space:
$8192 - 88.47 \approx 8103 \text{ bytes}$
That means PostgreSQL allocated a full 8 KB page, but I’m only storing around 88 bytes of useful index data inside it.
Why This Happens
Because UUID v4 values are random, inserts get scattered across the B-Tree.
PostgreSQL keeps opening new pages to place new keys, but it can’t fill them sequentially. So I end up with thousands of nearly-empty index pages.
This is exactly how index bloat starts.
31. Fixing My Correlation Query
After seeing the density issue, I wanted to check another critical metric: correlation.
Correlation tells me how closely the physical order of rows on disk matches the index order.
This is a major planner performance indicator.
My original query failed because I mistakenly used public.organisation as the table name.
But pg_stats.tablename does not store schema-qualified names.
Correct Query
| |
How I Interpret Correlation Near 1.0 → nearly perfect sequential order (best case) Near 0.0 → totally random physical order (UUID v4 chaos) Near -1.0 → perfectly ordered but reversed
This metric is extremely useful because it tells me whether index scans will be cheap or whether PostgreSQL will end up jumping all over the disk.
32. Final Comparison: Topic vs Organisation
Once I understood the numbers, the difference between integer keys and random UUIDs became painfully obvious.
| Feature | topic table | organisation table |
|---|---|---|
| ID Type | Integer (Serial) | UUID v4 (Random) |
| Insert Pattern | Sequential | Random scatter |
| Page Filling Strategy | Fills pages efficiently | Leaves pages mostly empty |
| Density Score | ~9.62% (new table effect) | 1.08% (serious scatter) |
| Verdict | Naturally healthy | Needs UUID v7 |
What Switching to UUID v7 Fixes
By moving organisation.id to generate_uuid_v7(), I’m ensuring that:
- inserts become naturally ordered
- correlation improves over time
- index pages fill sequentially instead of fragmenting
In other words, I fixed the DNA of my primary key strategy before the system grows large.
33. Understanding the “High Bloat” Alarm
When I ran a bloat query on my database, I noticed almost every table and index was marked as HIGH BLOAT.
At first glance, that looks alarming.
But after looking closer, I realized this is mostly an illusion caused by a small dataset.
The “Empty Parking Garage” Effect
Most of my indexes were exactly 16 KB.
PostgreSQL indexes have a minimum footprint:
- Page 1 (8 KB): Meta page (index metadata)
- Page 2 (8 KB): First leaf page (actual key storage)
So even with just a few rows, PostgreSQL still allocates at least 2 pages.
If I only have 5 rows and they take ~100 bytes, then:
- 100 bytes used inside an 8192-byte leaf page
- density becomes around 1.2%
So the bloat query screams “HIGH BLOAT”, but the truth is:
the index is not bloated — it’s just empty.
34. Current Index Health Report
At this point, my index situation looks like this:
| Table Name | Index Name | Size | Density | Status | Verdict |
|---|---|---|---|---|---|
| organisation | PK_c725… | 16 KB | ~1.08% | HIGH BLOAT | Healthy (table too small) |
| topic | topic_id_seq | 16 KB | ~9.62% | HIGH BLOAT | Healthy (will fill sequentially) |
| agent | PK_… | 16 KB | Low | HIGH BLOAT | Needs UUID v7 default |
When Does Bloat Actually Matter?
Bloat becomes real only when index sizes become significant:
- 16 KB index with 1% density: not a problem
- 500 MB index with 1% density: massive waste of RAM and disk I/O
That’s the moment where running REINDEX becomes necessary.
35. The “Search Vector” Performance Check
One thing I wanted to confirm was whether my search_vector column is already inflating the topic table.
Interestingly, even after updating search_vector, the topic table is still around 16 KB.
That’s a good sign:
- my
search_vectorcontent is still small - the table hasn’t expanded into multiple pages yet
- autovacuum hasn’t needed to reshuffle lots of tuples
Verification Query
| |
This helps me verify whether the full-text column is growing the table itself or mostly affecting index storage.
That is it for the blog. Thanks for reading till here.