Memory Allocation & Fragmentation

A Python simulation of malloc/free -- the kind of systems-level understanding that matters when you are debugging OOM errors in GPU memory or optimizing tensor allocation.

Problem

Simulate dynamic memory allocation and deallocation within a fixed-size buffer. Support:

  • alloc(size) -- find a free block large enough, split it if needed, and return the address of the block's payload
  • free(addr) -- mark the block as available and add it back to the free list

Each block carries 2 words of metadata: a marker (for a free block, this slot instead holds the next-free pointer) and the block's total size. An allocation of size words therefore occupies size + 2 words of the buffer; the overhead per allocation is the 2 metadata words.
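Concretely, one block's word-level layout looks like this (a sketch; the footprint helper is a hypothetical illustration, not part of the allocator):

```python
# Layout of one allocated block of payload size 3, relative to its base:
#
#   base + 0 : block marker -- or, while the block is free, the index of
#              the next free block (-1 terminates the free list)
#   base + 1 : total size, metadata included (3 + 2 = 5)
#   base + 2 : payload word 0  <- the address alloc() hands back
#   base + 3 : payload word 1
#   base + 4 : payload word 2

METADATA_SIZE = 2

def footprint(payload_size):
    """Total buffer words consumed by one allocation of payload_size words."""
    return payload_size + METADATA_SIZE

print(footprint(3))  # 5
```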

Approach

The allocator uses a free list -- a linked list of available blocks threaded through the buffer itself (no external data structure needed).
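The "threaded through the buffer itself" idea can be sketched in a few lines -- the buffer contents and the walk helper here are hypothetical, for illustration only:

```python
# Each free block's first word stores the index of the next free block;
# -1 terminates the list. No separate list structure is needed.
buffer = [None] * 12
# Two hypothetical free blocks, at indices 0 and 7, each of total size 5:
buffer[0], buffer[1] = 7, 5    # next = 7, size = 5
buffer[7], buffer[8] = -1, 5   # next = -1 (end of list), size = 5

def walk(buffer, head):
    """Yield the base index of every free block by following next pointers."""
    while head != -1:
        yield head
        head = buffer[head]

print(list(walk(buffer, 0)))  # [0, 7]
```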

  1. Initialization: The entire buffer is one large free block. Position 0 holds the next-free pointer (-1, since the list has no other entries); position 1 holds the total size.
  2. alloc: Walk the free list looking for a block that fits.
    • If the block is larger than needed, split it: carve off the requested size from the end, leaving the remainder as a smaller free block.
    • If the block is an exact fit, unlink it from the free list entirely.
  3. free: Prepend the freed block to the head of the free list (O(1) insert).
  4. Limitation: Adjacent free blocks are not merged, so fragmentation can accumulate over time -- a real-world tradeoff between simplicity and memory efficiency.
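The merge step this allocator skips (step 4) is itself small. A standalone sketch over (base, size) pairs -- coalesce is a hypothetical helper, not a method of the class below:

```python
def coalesce(free_blocks):
    """Merge adjacent free blocks.

    free_blocks: (base, total_size) pairs, sizes metadata-inclusive.
    Returns a sorted list where any block ending exactly where the next
    one begins has been fused into a single larger block.
    """
    merged = []
    for base, size in sorted(free_blocks):
        if merged and merged[-1][0] + merged[-1][1] == base:
            # Previous block ends at this block's base: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((base, size))
    return merged

print(coalesce([(0, 9), (9, 12), (26, 4)]))  # [(0, 21), (26, 4)]
```

Running a pass like this on free (or lazily, when an alloc fails) is the usual way real allocators keep fragmentation bounded.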

Implementation

AVAILABLE = True
USED = False
METADATA_SIZE = 2
MEMORY_START_MARKER = -999
BLOCK_START_MARKER = -888


class MemoryManager:
    def __init__(self, size):
        # The whole buffer starts as one free block: word 0 holds the
        # next-free pointer, word 1 the block's total size.
        self.buffer = [None] * size
        self.buffer[0] = MEMORY_START_MARKER
        self.buffer[1] = size
        self.free_ptr = 0
        self._allocate(self.free_ptr, AVAILABLE)
        self.buffer[0] = -1  # overwrite the marker: -1 terminates the free list

    def alloc(self, size):
        if self.free_ptr == -1:
            return None

        block_base = self.free_ptr
        previous_block_base = None

        while block_base != -1:
            total_block_size = self.buffer[block_base + 1]

            # Split only if the remainder can still hold its own metadata:
            # a 1-word remainder cannot fit a 2-word header, and writing the
            # new block's header would clobber the remainder's size field.
            if size + 2 * METADATA_SIZE <= total_block_size:
                # Split: carve requested size from the end of the block
                new_block_size = total_block_size - size - METADATA_SIZE
                new_block_base = block_base + new_block_size

                self.buffer[block_base + 1] = new_block_size
                self.buffer[new_block_base] = BLOCK_START_MARKER
                self.buffer[new_block_base + 1] = size + METADATA_SIZE

                self._allocate(new_block_base, USED)
                return new_block_base + METADATA_SIZE

            if size + METADATA_SIZE == total_block_size:
                # Exact fit: unlink from free list
                if block_base == self.free_ptr:
                    self.free_ptr = self.buffer[block_base]
                else:
                    self.buffer[previous_block_base] = self.buffer[block_base]

                self._allocate(block_base, USED)
                return block_base + METADATA_SIZE

            previous_block_base = block_base
            block_base = self.buffer[block_base]

        return None

    def free(self, addr):
        # addr points at the payload; step back over the metadata words
        block_base = addr - METADATA_SIZE
        self.buffer[block_base] = self.free_ptr  # old head becomes our next
        self.free_ptr = block_base
        self._allocate(self.free_ptr, AVAILABLE)

    def _allocate(self, block_base, value):
        # Paint the block's payload words AVAILABLE or USED so the buffer
        # can be inspected; a real allocator would leave the payload alone.
        block_size = self.buffer[block_base + 1] - METADATA_SIZE
        for i in range(block_size):
            self.buffer[block_base + METADATA_SIZE + i] = value


if __name__ == "__main__":
    memory = MemoryManager(size=50)
    a = memory.alloc(7)
    b = memory.alloc(10)
    c = memory.alloc(3)
    d = memory.alloc(8)
    e = memory.alloc(2)

    memory.free(b)
    memory.free(a)
    memory.free(e)

    # Because adjacent free blocks are never merged, the free list now
    # holds four separate blocks (4, 9, 12, and 10 words, metadata
    # included) -- 35 free words in total, yet this request still fails:
    print(memory.alloc(12))  # None: no single block can hold 12 + 2 words
    print(memory.buffer)

Complexity

  • Time: O(n) for alloc, where n is the number of free blocks (worst-case first-fit traversal); O(1) for free
  • Space: O(1) beyond the buffer itself -- metadata is stored inline

Takeaway

Understanding memory allocation internals pays off when working with ML frameworks. PyTorch's CUDA caching allocator, TensorFlow's BFC allocator, and custom arena allocators for inference servers all use the same split-and-coalesce patterns. Knowing how fragmentation arises helps you diagnose mysterious OOM crashes on GPUs that still appear to have free memory.
