not so long ago i wrote about memory allocators for small objects. recently i took part in an interesting discussion on memory management inside the ACARM-ng system under heavy loads. a nice thread on this topic started on stackoverflow as well. inspired by both, i've decided to experiment a little on my linux box…
first of all, a few words on memory allocation. according to malloc(3), it uses sbrk(2) internally. what does this mean from a practical point of view? when you allocate a number of memory blocks and then free/delete all but the last one, the process will not be able to give the memory back to the system, since the heap cannot be trimmed below the block that is still in use.
there is also a noticeable difference between allocations of small objects (e.g. 4 bytes) and big ones (e.g. a megabyte). the former are served from the allocator's internal pools, so allocating such a small piece of memory does not always require requesting new memory from the system. allocation of bigger memory areas uses mmap(2) instead.
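a quick way to see both effects is to watch the program break with sbrk(0). the sketch below assumes typical glibc behaviour (heap grown via sbrk for small requests, mmap used above the default M_MMAP_THRESHOLD of around 128kB) – these are implementation details, so results may vary:

```cpp
#include <unistd.h>   // sbrk()
#include <cstdlib>    // malloc(), free()
#include <cstdio>

int main()
{
  void *const breakStart = sbrk(0);            // current program break (end of the heap)
  void *const small      = malloc(4);          // tiny request – served from the heap (sbrk)
  void *const breakSmall = sbrk(0);
  void *const big        = malloc(1024*1024);  // 1MB – above the mmap threshold, handled via mmap(2)
  void *const breakBig   = sbrk(0);

  // on a typical glibc setup the break moves for the small allocation (in one
  // big step, since the heap grows in batches) but stays put for the 1MB one:
  printf("break moved by small alloc: %d\n", breakSmall != breakStart);
  printf("break moved by big   alloc: %d\n", breakBig   != breakSmall);

  free(big);
  free(small);
  return 0;
}
```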
to test how it really works i've written a simple test program that allocates 1GB of user-accessible memory – namely allocating_process. there are two ways of doing that: by means of many small chunks (here: 8B long) or fewer, longer ones (here: 1kB long). this is controlled by the 'big_allocs' parameter (the first one). i've also added an option to keep the last element allocated (i.e. not to free it), to see whether memory is then not returned to the system. this is the 'keep_last' option (the second parameter). when run, the program also reports its own memory usage.
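the real source is in the alloc_test suite linked below; this is only a sketch of the idea – parameter handling, names and the reporting part are assumptions here:

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

int main(int argc, char **argv)
{
  const bool   bigAllocs = argc > 1 && argv[1][0] == '1'; // 1st param: big_allocs
  const bool   keepLast  = argc > 2 && argv[2][0] == '1'; // 2nd param: keep_last
  const size_t total     = 1024ul*1024*1024;              // 1GB of user-accessible memory
  const size_t chunk     = bigAllocs ? 1024 : 8;          // 1kB or 8B per allocation

  std::vector<char*> ptrs;
  ptrs.reserve(total/chunk);                    // one big block just for the pointers
  for(size_t i = 0; i < total/chunk; ++i)
    ptrs.push_back(new char[chunk]);

  // (the real program also prints its own memory usage at this point)
  std::cout << "allocated - press enter to free" << std::endl;
  std::cin.get();

  const size_t toFree = keepLast ? ptrs.size()-1 : ptrs.size();
  for(size_t i = 0; i < toFree; ++i)
    delete [] ptrs[i];                          // optionally keep the last element allocated
  std::vector<char*>().swap(ptrs);              // release the vector's own block as well

  std::cout << "freed - press enter to exit" << std::endl;
  std::cin.get();
  return 0;
}
```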
to see whether the memory is usable by other processes, there are also 2 helper applications: try_alloc_some-big, which allocates memory in one big block, and try_alloc_some-chunks, which allocates memory in 1MB chunks.
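again, just a sketch of what the chunked helper might look like (assumed behaviour, not the original code): grab memory in 1MB pieces until allocation fails, touching each page so it is really backed by physical memory:

```cpp
#include <cstring>
#include <iostream>
#include <new>
#include <vector>

int main()
{
  const size_t       chunk = 1024*1024;   // 1MB per allocation
  std::vector<char*> ptrs;
  try
  {
    for(;;)
    {
      char *p = new char[chunk];
      memset(p, 0xFF, chunk);              // touch the pages – otherwise overcommit lies to us
      ptrs.push_back(p);
    }
  }
  catch(const std::bad_alloc&)
  { }                                      // with default overcommit settings the loop may end
                                           // with the OOM killer instead – hence the swap advice
  std::cout << "got " << ptrs.size() << "MB - press enter to exit" << std::endl;
  std::cin.get();
  return 0;
}
```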
here is the complete source code, written in C++11: the alloc_test suite.
since each program waits for enter to be pressed before continuing, it is possible to run them in different combinations and observe what happens. i focused on testing allocations and deallocations with allocating_process, while checking whether the memory was available to the other processes.
NOTE: you should disable the swap file/partition for the time of testing – it will save you a lot of time in case you run out of memory and the OOM killer has to come to the rescue. ;)
the test machine was an amd64 Debian linux box with 8GB of RAM.
big_allocs | keep_last | mem. required | mem. used | mem. after dealloc |
---|---|---|---|---|
false | false | 1024MB | 5120MB | 4096MB |
false | true | 1024MB | 5120MB | 4096MB |
true | false | 1024MB | 1048MB | 0MB |
true | true | 1024MB | 1048MB | 1040MB |
in every case, when memory was marked as being used by the process, it was not allocatable from external processes – neither as one big block, nor in chunks.
the second thing is the huge overhead when allocating the same amount of memory in multiple micro-allocations. when comparing 1GB allocated in 8B blocks with 1kB blocks, the actual memory usage differs several times! part of this difference is obviously the greater number of pointers to store – storing 128M pointers requires 1GB of RAM by itself – but that still leaves us with about 4 times overhead.
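a back-of-the-envelope check of where the 5120MB figure may come from, assuming glibc's minimum chunk size of 32 bytes on a 64-bit machine (an implementation detail of the allocator, not something the test itself measures):

```cpp
#include <cstdint>
#include <iostream>

int main()
{
  constexpr uint64_t payload  = 1024ull*1024*1024;     // 1GB of user data
  constexpr uint64_t request  = 8;                     // 8B per allocation
  constexpr uint64_t count    = payload/request;       // 128M allocations
  constexpr uint64_t ptrs     = count*sizeof(void*);   // 1GB just for the pointers in the vector<>
  constexpr uint64_t minChunk = 32;                    // assumed glibc minimum chunk on 64-bit
  constexpr uint64_t heap     = count*minChunk;        // 4GB actually taken from the heap

  std::cout << (ptrs+heap)/(1024*1024) << "MB\n";      // prints 5120MB – matching the measurement
  return 0;
}
```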
an interesting thing happens when the memory from the small allocations is freed. notice that the memory usage dropped by 1GB in both cases. it appears that the memory allocated by the vector<> to hold the pointers was released (one big block), while the memory used by the small allocations was not. from this we can draw a conclusion that the memory used by the allocator for small objects probably never shrinks.
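if giving that memory back matters, glibc does provide malloc_trim(3), which asks the allocator to return free heap memory to the system. i have not measured how much it helps with this particular fragmentation pattern, so treat the sketch below as something to experiment with, not a guaranteed fix:

```cpp
#include <malloc.h>   // malloc_trim() – glibc-specific, not portable
#include <cstdlib>
#include <vector>

int main()
{
  std::vector<char*> ptrs;
  for(int i = 0; i < 1000*1000; ++i)
    ptrs.push_back(static_cast<char*>(malloc(8)));
  for(char *p : ptrs)
    free(p);
  // ask glibc to give the freed heap back to the system; how much it can
  // actually release depends on fragmentation, so this too is worth measuring
  malloc_trim(0);
  return 0;
}
```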
notice that after freeing the memory taken by the bigger objects it is returned to the system… unless something is still in use at the end of the address space. when all objects were freed, the memory was returned, but when the last element was kept, only 8MB was released – this is exactly how much is needed to store 1M pointers on this architecture, so only the vector<>'s internal block was given back.