Memory Allocation & Fragmentation¶
A Python simulation of malloc/free -- the kind of systems-level understanding that matters when you are debugging OOM errors in GPU memory or optimizing tensor allocation.
Problem¶
Simulate dynamic memory allocation and deallocation within a fixed-size buffer. Support:
- alloc(size) -- find a free block large enough, split it if needed, return a pointer
- free(addr) -- mark the block as available and add it back to the free list
Each block carries 2 words of metadata (a link-or-marker word plus a size word), so an allocation of `size` words actually consumes `size + 2` words of the buffer.
Approach¶
The allocator uses a free list -- a linked list of available blocks threaded through the buffer itself (no external data structure needed).
- Initialization: The entire buffer is one large free block. Position 0 briefly holds a start marker before being overwritten with the free-list terminator (-1); position 1 holds the total size.
- alloc: Walk the free list looking for a block that fits.
- If the block is larger than needed, split it: carve off the requested size from the end, leaving the remainder as a smaller free block. The remainder must still be able to hold its own metadata; a block only slightly larger than the request is handed over whole.
- If the block is an exact fit, unlink it from the free list entirely.
- free: Prepend the freed block to the head of the free list (O(1) insert).
- Limitation: Adjacent free blocks are not merged, so fragmentation can accumulate over time -- a real-world tradeoff between simplicity and memory efficiency.
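The missing merge step is easy to sketch on a simplified model -- free blocks represented as `(base, total_size)` pairs rather than the in-buffer linked list -- so the `coalesce` function here is illustrative, not part of the allocator's API:

```python
def coalesce(free_blocks):
    """Merge adjacent free blocks, given as (base, total_size) pairs."""
    merged = []
    for base, size in sorted(free_blocks):
        if merged and merged[-1][0] + merged[-1][1] == base:
            # This block starts exactly where the previous one ends: merge.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((base, size))
    return merged

# The three blocks freed in the demo below: b at 29 (12 words),
# a at 41 (9 words), e at 10 (4 words). b and a are adjacent.
print(coalesce([(29, 12), (10, 4), (41, 9)]))  # [(10, 4), (29, 21)]
```

A real allocator would run this merge on `free` (or periodically), trading the O(1) insert for less fragmentation.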
Implementation¶
```python
AVAILABLE = True     # payload fill value for free blocks
USED = False         # payload fill value for allocated blocks
METADATA_SIZE = 2    # words of metadata per block: [link/marker, total size]
MEMORY_START_MARKER = -999
BLOCK_START_MARKER = -888


class MemoryManager:
    def __init__(self, size):
        self.buffer = [None] * size
        self.buffer[0] = MEMORY_START_MARKER
        self.buffer[1] = size           # the whole buffer is one free block
        self.free_ptr = 0               # head of the free list
        self._allocate(self.free_ptr, AVAILABLE)
        self.buffer[0] = -1             # overwrite the marker: -1 ends the free list

    def alloc(self, size):
        """First-fit allocation; returns a pointer (buffer index) or None."""
        if self.free_ptr == -1:
            return None                 # free list is empty
        block_base = self.free_ptr
        previous_block_base = None
        while block_base != -1:
            total_block_size = self.buffer[block_base + 1]
            if size + 2 * METADATA_SIZE <= total_block_size:
                # Split: the remainder is large enough to carry its own
                # metadata, so carve the requested size from the end.
                new_block_size = total_block_size - size - METADATA_SIZE
                new_block_base = block_base + new_block_size
                self.buffer[block_base + 1] = new_block_size
                self.buffer[new_block_base] = BLOCK_START_MARKER
                self.buffer[new_block_base + 1] = size + METADATA_SIZE
                self._allocate(new_block_base, USED)
                return new_block_base + METADATA_SIZE
            if size + METADATA_SIZE <= total_block_size:
                # Exact (or near) fit: splitting would leave a sliver too
                # small to hold free-block metadata, so hand over the whole
                # block and unlink it from the free list.
                if block_base == self.free_ptr:
                    self.free_ptr = self.buffer[block_base]
                else:
                    self.buffer[previous_block_base] = self.buffer[block_base]
                self._allocate(block_base, USED)
                return block_base + METADATA_SIZE
            previous_block_base = block_base
            block_base = self.buffer[block_base]
        return None                     # no single free block is large enough

    def free(self, addr):
        """Return a block to the allocator (O(1) prepend, no coalescing)."""
        block_base = addr - METADATA_SIZE
        self.buffer[block_base] = self.free_ptr   # link to the old head
        self.free_ptr = block_base                # freed block becomes the head
        self._allocate(self.free_ptr, AVAILABLE)

    def _allocate(self, block_base, value):
        """Fill a block's payload words with a status value."""
        block_size = self.buffer[block_base + 1] - METADATA_SIZE
        for i in range(block_size):
            self.buffer[block_base + METADATA_SIZE + i] = value


if __name__ == "__main__":
    memory = MemoryManager(size=50)
    a = memory.alloc(7)
    b = memory.alloc(10)
    c = memory.alloc(3)
    d = memory.alloc(8)
    e = memory.alloc(2)
    memory.free(b)
    memory.free(a)
    memory.free(e)
    print(memory.buffer)
```
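It helps to trace the demo's first call, `memory.alloc(7)`, by hand. This sketch just redoes the carve-from-the-end arithmetic on the 50-word buffer, outside the class:

```python
# Before the first alloc, the single free block at base 0 spans the whole
# buffer. alloc(7) carves 7 + 2 words from its end.
METADATA_SIZE = 2
buffer_size = 50
request = 7

remaining_free = buffer_size - request - METADATA_SIZE  # 41 words stay free
new_block_base = 0 + remaining_free                     # allocated block starts at 41
pointer = new_block_base + METADATA_SIZE                # caller receives index 43
print(remaining_free, new_block_base, pointer)  # 41 41 43
```

Each subsequent allocation repeats the same arithmetic against the shrinking free block at base 0.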
Complexity¶
- Time: O(n) for alloc, where n is the number of free blocks (worst-case free-list traversal); O(1) for free
- Space: O(1) beyond the buffer itself -- metadata is stored inline
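To make the fragmentation cost concrete, here is a standalone sketch (free blocks modeled as `(base, size)` pairs again, not the in-buffer list) of a first-fit search failing even though total free memory covers the request:

```python
def first_fit(free_blocks, request):
    """Return the index of the first (base, size) block that fits, or None."""
    for i, (base, size) in enumerate(free_blocks):
        if size >= request:
            return i
    return None

# Three scattered 4-word holes: 12 words free in total...
holes = [(10, 4), (20, 4), (30, 4)]
print(first_fit(holes, 10))               # None -- no single hole holds 10 words
print(sum(size for _, size in holes))     # 12 -- yet 12 words are "free"
```

This is exactly the failure mode a non-coalescing allocator accumulates: plenty of free memory, none of it contiguous.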
Takeaway¶
Understanding memory allocation internals pays off when working with ML frameworks. PyTorch's CUDA caching allocator, TensorFlow's BFC allocator, and custom arena allocators for inference servers all use the same split-and-coalesce patterns. Knowing how fragmentation arises helps you diagnose mysterious OOM crashes on GPUs that still appear to have free memory.