2012.07.28 - small object allocation

Loki's SOA code it is often said that default allocator in C++ is not well suited for small objects. by that it is usually meant that it wastes a lot of memory. to solve this Loki library provides own memory allocator, specially suited for small objects. i've decided to test it.

test environment

all test have been made with this code and its minor modifications. package also includes test results, as measured. in general this programs does 3 things:

  1. allocates 1 million elements into the vector.
  2. shuffles content1).
  3. deallocates elements one by one, always removing the last one.

my test environment is GCC 4.7 run on amd64, Core 2 Duo CPU, with 4GB of RAM. computer is running Linux. Loki version in use is 0.1.7 (i.e. latest).

execution time

first thing i've measured is execution time. N is size of the structure used for allocation (i.e. small object to be tested). SYSTEM_ALLOC means that default, system new and delete operators were used. USE_LOKI_SOA means that Loki's small object allocator (SOA) was used. all times are measured in seconds. each step of the algorithm has been measured separately.

here are the results for full test, that shuffles data before deallocation:

mode allocation shuffing deallocation
N=1, SYSTEM_ALLOC 0.081 0.103 0.148
N=10, SYSTEM_ALLOC 0.078 0.107 0.141
N=20, SYSTEM_ALLOC 0.077 0.108 0.143
N=30, SYSTEM_ALLOC 0.092 0.107 0.150
N=40, SYSTEM_ALLOC 0.091 0.107 0.150
N=50, SYSTEM_ALLOC 0.102 0.104 0.149
N=1, USE_LOKI_SOA 0.056 0.111 4.372
N=10, USE_LOKI_SOA 0.063 0.110 4.345
N=20, USE_LOKI_SOA 0.076 0.100 5.383
N=30, USE_LOKI_SOA 0.110 0.099 8.543
N=40, USE_LOKI_SOA 0.140 0.108 10.647
N=50, USE_LOKI_SOA 0.188 0.098 13.875

it appears that Loki's allocator is faster for VERY small objects, up to about 20 bytes. deallocation, on the other hand, is 40 times slower than when using default one.

one more common scenario is allocation and deallocation that is done in order. this can be simulated by turning off shuffling step. results of measurements are presented in the table:

mode allocation shuffing deallocation
N=1, SYSTEM_ALLOC 0.078 0.000 0.029
N=10, SYSTEM_ALLOC 0.078 0.000 0.032
N=20, SYSTEM_ALLOC 0.082 0.000 0.030
N=30, SYSTEM_ALLOC 0.090 0.000 0.033
N=40, SYSTEM_ALLOC 0.091 0.000 0.035
N=50, SYSTEM_ALLOC 0.102 0.000 0.040
N=1, USE_LOKI_SOA 0.057 0.000 0.050
N=10, USE_LOKI_SOA 0.063 0.000 0.052
N=20, USE_LOKI_SOA 0.075 0.000 0.061
N=30, USE_LOKI_SOA 0.108 0.000 0.066
N=40, USE_LOKI_SOA 0.137 0.000 0.056
N=50, USE_LOKI_SOA 0.190 0.000 0.058

one can see, that shuffling time is 0 (not being run). as in the previous test case, allocation is faster with Loki for N less than 20 bytes. deallocation is still significantly slower, but at least is in the same row of magnitude as the system allocator.

memory

the main thing about the Loki's SOA is minimizing memory footprint. therefor memory used by the process has been measured with /proc/<pid>/status file. notice that it is needed to be measured on system-level, since sum of each allocation requests will be the same for both versions. results follows (values are in kB):

N sys. alloc Loki
1 57196 30012
20 57196 45720
40 72904 65516

a great improvement in memory footprint is visible for very small objects, when Loki's SOA is used. in case of 1-byte objects it is almost 2x less memory in use. for 20 bytes it is still 20% less.

conclusions

first observation is that my implementation's/system's default allocator is doing well, when it comes to speed of (de)allocations. it is much worse, when talking about memory usage, for very small objects, but still it is not that bad, as one could expect.

it appears that it is indeed beneficial to use Loki's SOA, in case where many small (on my test environment this means less then 20 bytes) objects are to be allocated. it is also worth noticing that Loki's SOA is slower when deallocating in random order, thus memory restrictions must be of a greater concern than execution time, to make this solution beneficial.

after all, it is not a secret, that when it comes to the memory allocation, “one size fits all” does not apply. choose the right tool for your needs and base your reasoning on measurements.

1)
this step is optional.
blog/2012/07/28/1.txt · Last modified: 2013/05/17 19:08 (external edit)
Back to top
Valid CSS Driven by DokuWiki Recent changes RSS feed Valid XHTML 1.0