In my previous post, I optimized a slice difference algorithm ($O(n+m)$) by pre-allocating memory. But as a Software Architect, I immediately asked myself:

Can I do this with zero extra memory for the result?

The answer is an in-place algorithm, implemented using the two-pointer technique.

In high-performance systems and big data pipelines, this approach is often the gold standard because it minimizes memory churn and reduces pressure on the garbage collector.


🚀 Part 2: Achieving 96% Memory Reduction with In-Place Algorithms in Go

The key idea is simple:

Instead of allocating a new result slice, I reuse the underlying array of the input slice A.


๐Ÿ› ๏ธ The Architectural Shift: In-Place Modification

In this approach, I maintain a write pointer (k).

As I iterate through slice A, I overwrite elements I want to keep into the front of the same slice. At the end, I simply reslice A[:k].

Strategy

  • Build a lookup map from slice B for $O(1)$ membership checks.
  • Iterate through slice A.
  • If an element is not found in B, write it into position k.
  • Increment k.
  • Return the slice truncated to the final size.

✅ Go Implementation (In-Place Array Difference)

```go
func ArrayDiffInPlace(a, b []int) []int {
    bmap := make(map[int]struct{}, len(b))
    for _, v := range b {
        bmap[v] = struct{}{}
    }

    k := 0 // Write pointer
    for i := 0; i < len(a); i++ {
        if _, found := bmap[a[i]]; !found {
            a[k] = a[i]
            k++
        }
    }
    return a[:k]
}
```

The “Before vs. After” Figures (Benchmark)

I ran these benchmarks on an 11th Gen Intel i5 (8 cores) with 1000+ elements. The results were staggering.

| Metric         | Optimized (Part 1) | In-Place (Part 2) | Improvement     |
|----------------|--------------------|-------------------|-----------------|
| Execution Time | 10,214 ns/op       | 7,823 ns/op       | 23.4% faster    |
| Memory Usage   | 8,520 B/op         | 328 B/op          | 96.1% less RAM  |
| Allocations    | 4 allocs/op        | 3 allocs/op       | 25% fewer calls |

๐Ÿ” Why the 96% Drop?

The 8,520 bytes in my previous version came from allocating a new slice to hold around 1000 integers.

In the in-place version, that ~8 KB allocation disappears entirely because I reuse the same underlying array of slice A.

The remaining 328 bytes are just the minimal overhead of building the lookup map for slice B.


๐Ÿ” Deep Dive: Why It Works

To understand where the CPU was spending its time, I generated a CPU profile from the benchmark and inspected it:

go test -bench=. -benchmem -cpuprofile=cpu.prof
go tool pprof cpu.prof

The profile showed that 74.45% of the time is now spent in the algorithm's own loop and map lookups rather than in memory management.

This is exactly what I wanted to see.

By eliminating memory churn (constant allocation, slice growth, and copying), the algorithm becomes strictly CPU-bound.

In a Big Data pipeline, this translates directly into:

  • Lower infrastructure cost
  • Faster throughput
  • Fewer Garbage Collection (GC) pauses

Architect’s Takeaway

In Data Engineering, small changes in how I handle memory have massive compounding effects when scaled to millions of records.

In-place modification can be dangerous if I still need the original slice later, but when working with temporary buffers, it is the gold standard for high-performance systems.

#Golang #SystemDesign #BigData #SoftwareArchitecture #PerformanceOptimization #Algorithms #MemoryManagement