In my previous post, I optimized a slice difference algorithm ($O(n+m)$) by pre-allocating memory. But as a Software Architect, I immediately asked myself:

Can I do this with zero extra memory for the result?

The answer is an in-place algorithm, implemented using the two-pointer technique.

In high-performance systems and big data pipelines, this approach is often the gold standard because it minimizes memory churn and reduces pressure on the garbage collector.


🚀 Part 2: Achieving 96% Memory Reduction with In-Place Algorithms in Go

The key idea is simple:

Instead of allocating a new result slice, I reuse the underlying array of the input slice A.


๐Ÿ› ๏ธ The Architectural Shift: In-Place Modification

In this approach, I maintain a write pointer (k).

As I iterate through slice A, I overwrite elements I want to keep into the front of the same slice. At the end, I simply reslice A[:k].

Strategy

  • Build a lookup map from slice B for $O(1)$ membership checks.
  • Iterate through slice A.
  • If an element is not found in B, write it into position k.
  • Increment k.
  • Return the slice truncated to the final size.

✅ Go Implementation (In-Place Array Difference)

```go
func ArrayDiffInPlace(a, b []int) []int {
    bmap := make(map[int]struct{}, len(b))
    for _, v := range b {
        bmap[v] = struct{}{}
    }

    k := 0 // Write pointer
    for i := 0; i < len(a); i++ {
        if _, found := bmap[a[i]]; !found {
            a[k] = a[i]
            k++
        }
    }
    return a[:k]
}
```

The “Before vs. After” Figures (Benchmark)

I ran these benchmarks on an 11th Gen Intel i5 (8 cores) with 1000+ elements. The results were staggering.

| Metric         | Optimized (Part 1) | In-Place (Part 2) | Improvement     |
|----------------|--------------------|-------------------|-----------------|
| Execution Time | 10,214 ns/op       | 7,823 ns/op       | 23.4% faster    |
| Memory Usage   | 8,520 B/op         | 328 B/op          | 96.1% less RAM  |
| Allocations    | 4 allocs/op        | 3 allocs/op       | 25% fewer calls |

๐Ÿ” Why the 96% Drop?

The 8,520 bytes in my previous version came from allocating a new slice to hold around 1000 integers.

In the in-place version, that ~8 KB allocation disappears entirely because I reuse the same underlying array of slice A.

The remaining 328 bytes are just the minimal overhead of building the lookup map for slice B.


๐Ÿ” Deep Dive: Why It Works

To understand where the CPU was spending its time, I generated a CPU profile from the benchmark and inspected it:

go test -bench=. -benchmem -cpuprofile=cpu.prof
go tool pprof cpu.prof

The profile showed that 74.45% of the time is now spent in the algorithm's own loop and map lookups rather than in memory management.

This is exactly what I wanted to see.

By eliminating memory churn (constant allocation, slice growth, and copying), the algorithm becomes strictly CPU-bound.

In a Big Data pipeline, this translates directly into:

  • Lower infrastructure cost
  • Faster throughput
  • Fewer Garbage Collection (GC) pauses

Architect’s Takeaway

In Data Engineering, small changes in how I handle memory have massive compounding effects when scaled to millions of records.

In-place modification can be dangerous if I still need the original slice later, but when working with temporary buffers, it is the gold standard for high-performance systems.

#Golang #SystemDesign #BigData #SoftwareArchitecture #PerformanceOptimization #Algorithms #MemoryManagement