In my previous post, I optimized a slice difference algorithm ($O(n+m)$) by pre-allocating memory. But as a Software Architect, I immediately asked myself:
Can I do this with zero extra memory for the result?
The answer is an in-place algorithm, implemented using the two-pointer technique.
In high-performance systems and big data pipelines, this approach is often the gold standard because it minimizes memory churn and reduces pressure on the garbage collector.
Part 2: Achieving 96% Memory Reduction with In-Place Algorithms in Go
The key idea is simple:
Instead of allocating a new result slice, I reuse the underlying array of the input slice A.
The Architectural Shift: In-Place Modification
In this approach, I maintain a write pointer (k).
As I iterate through slice A, I overwrite elements I want to keep into the front of the same slice. At the end, I simply reslice A[:k].
Strategy
- Build a lookup map from slice B for $O(1)$ membership checks.
- Iterate through slice A.
- If an element is not found in B, write it into position k.
- Increment k.
- Return the slice truncated to the final size.
Go Implementation (In-Place Array Difference)
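The original code listing did not survive formatting, so here is a minimal sketch of the approach described above. The function name `DifferenceInPlace` is my own; the post's actual identifiers may differ.

```go
package main

import "fmt"

// DifferenceInPlace returns the elements of a that are not present in b,
// reusing a's underlying array instead of allocating a result slice.
func DifferenceInPlace(a, b []int) []int {
	// Build a lookup set from b for O(1) membership checks.
	// An empty struct value occupies zero bytes, so the map stores keys only.
	seen := make(map[int]struct{}, len(b))
	for _, v := range b {
		seen[v] = struct{}{}
	}

	// k is the write pointer: every element we keep is compacted
	// into the front of the same slice.
	k := 0
	for _, v := range a {
		if _, found := seen[v]; !found {
			a[k] = v
			k++
		}
	}

	// Reslice to the final size. Note: the caller's slice is mutated,
	// so this is only safe when the original contents are disposable.
	return a[:k]
}

func main() {
	a := []int{1, 2, 3, 4, 5}
	b := []int{2, 4}
	fmt.Println(DifferenceInPlace(a, b)) // [1 3 5]
}
```

Because the result shares its backing array with the input, the only allocation left is the lookup map itself.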
The “Before vs. After” Figures (Benchmark)
I ran these benchmarks on an 11th Gen Intel i5 (8 cores) with 1000+ elements. The results were staggering.
| Metric | Optimized (Part 1) | In-Place (Part 2) | Improvement |
|---|---|---|---|
| Execution Time | 10,214 ns/op | 7,823 ns/op | 23.4% faster |
| Memory Usage | 8,520 B/op | 328 B/op | 96.1% less RAM |
| Allocations | 4 allocs/op | 3 allocs/op | 25% fewer calls |
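For readers who want to reproduce numbers like these, the sketch below shows one way to collect ns/op and allocs/op using `testing.Benchmark` directly. The input sizes, the helper `differenceInPlace`, and the harness shape are my assumptions, not the original benchmark code, so absolute figures will differ by machine.

```go
package main

import (
	"fmt"
	"testing"
)

// differenceInPlace is a stand-in for the in-place algorithm:
// it compacts kept elements into the front of a and reslices.
func differenceInPlace(a, b []int) []int {
	seen := make(map[int]struct{}, len(b))
	for _, v := range b {
		seen[v] = struct{}{}
	}
	k := 0
	for _, v := range a {
		if _, ok := seen[v]; !ok {
			a[k] = v
			k++
		}
	}
	return a[:k]
}

func main() {
	base := make([]int, 1000)
	for i := range base {
		base[i] = i
	}
	exclude := make([]int, 500)
	for i := range exclude {
		exclude[i] = i * 2
	}

	// testing.Benchmark runs the closure with an auto-scaled b.N,
	// so we get per-op timing and allocation stats without `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		buf := make([]int, len(base))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			// Restore the input each iteration: the algorithm mutates it.
			copy(buf, base)
			differenceInPlace(buf, exclude)
		}
	})
	fmt.Println(res.String(), res.MemString())
}
```

Copying the input inside the loop is deliberate: an in-place algorithm destroys its input, so each iteration must start from a fresh buffer, and that buffer is allocated once outside the timed region.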
Why the 96% Drop?
The 8,520 bytes in my previous version came from allocating a new slice to hold around 1000 integers.
In the in-place version, that ~8 KB allocation disappears entirely because I reuse the same underlying array of slice A.
The remaining 328 bytes are just the minimal overhead of building the lookup map for slice B.
Deep Dive: Why It Works
To understand where the CPU was spending its time, I profiled the benchmark with `go test -bench . -cpuprofile cpu.out` and inspected the result with `go tool pprof cpu.out`.
The profile showed that 74.45% of the time is now spent in the algorithm's core loop.
This is exactly what I wanted to see.
By eliminating memory churn (constant allocation, slice growth, and copying), the algorithm becomes strictly CPU-bound.
In a Big Data pipeline, this translates directly into:
- lower infrastructure cost
- faster throughput
- fewer Garbage Collection (GC) pauses
Architect's Takeaway
In Data Engineering, small changes in how I handle memory have massive compounding effects when scaled to millions of records.
In-place modification is dangerous if the original slice is still needed later, since its contents are overwritten. But when working with temporary buffers, it is the gold standard for high-performance systems.
#Golang #SystemDesign #BigData #SoftwareArchitecture #PerformanceOptimization #Algorithms #MemoryManagement