it is often said that default allocator in C++ is not well suited for small objects. by that it is usually meant that it wastes a lot of memory. to solve this Loki library provides own memory allocator, specially suited for small objects. i've decided to test it.
all test have been made with this code and its minor modifications. package also includes test results, as measured. in general this programs does 3 things:
my test environment is GCC 4.7 run on amd64, Core 2 Duo CPU, with 4GB of RAM. computer is running Linux. Loki version in use is 0.1.7 (i.e. latest).
first thing i've measured is execution time. N is size of the structure used for allocation (i.e. small object to be tested). SYSTEM_ALLOC means that default, system new and delete operators were used. USE_LOKI_SOA means that Loki's small object allocator (SOA) was used. all times are measured in seconds. each step of the algorithm has been measured separately.
here are the results for full test, that shuffles data before deallocation:
mode | allocation | shuffing | deallocation |
---|---|---|---|
N=1, SYSTEM_ALLOC | 0.081 | 0.103 | 0.148 |
N=10, SYSTEM_ALLOC | 0.078 | 0.107 | 0.141 |
N=20, SYSTEM_ALLOC | 0.077 | 0.108 | 0.143 |
N=30, SYSTEM_ALLOC | 0.092 | 0.107 | 0.150 |
N=40, SYSTEM_ALLOC | 0.091 | 0.107 | 0.150 |
N=50, SYSTEM_ALLOC | 0.102 | 0.104 | 0.149 |
N=1, USE_LOKI_SOA | 0.056 | 0.111 | 4.372 |
N=10, USE_LOKI_SOA | 0.063 | 0.110 | 4.345 |
N=20, USE_LOKI_SOA | 0.076 | 0.100 | 5.383 |
N=30, USE_LOKI_SOA | 0.110 | 0.099 | 8.543 |
N=40, USE_LOKI_SOA | 0.140 | 0.108 | 10.647 |
N=50, USE_LOKI_SOA | 0.188 | 0.098 | 13.875 |
it appears that Loki's allocator is faster for VERY small objects, up to about 20 bytes. deallocation, on the other hand, is 40 times slower than when using default one.
one more common scenario is allocation and deallocation that is done in order. this can be simulated by turning off shuffling step. results of measurements are presented in the table:
mode | allocation | shuffing | deallocation |
---|---|---|---|
N=1, SYSTEM_ALLOC | 0.078 | 0.000 | 0.029 |
N=10, SYSTEM_ALLOC | 0.078 | 0.000 | 0.032 |
N=20, SYSTEM_ALLOC | 0.082 | 0.000 | 0.030 |
N=30, SYSTEM_ALLOC | 0.090 | 0.000 | 0.033 |
N=40, SYSTEM_ALLOC | 0.091 | 0.000 | 0.035 |
N=50, SYSTEM_ALLOC | 0.102 | 0.000 | 0.040 |
N=1, USE_LOKI_SOA | 0.057 | 0.000 | 0.050 |
N=10, USE_LOKI_SOA | 0.063 | 0.000 | 0.052 |
N=20, USE_LOKI_SOA | 0.075 | 0.000 | 0.061 |
N=30, USE_LOKI_SOA | 0.108 | 0.000 | 0.066 |
N=40, USE_LOKI_SOA | 0.137 | 0.000 | 0.056 |
N=50, USE_LOKI_SOA | 0.190 | 0.000 | 0.058 |
one can see, that shuffling time is 0 (not being run). as in the previous test case, allocation is faster with Loki for N less than 20 bytes. deallocation is still significantly slower, but at least is in the same row of magnitude as the system allocator.
the main thing about the Loki's SOA is minimizing memory footprint. therefor memory used by the process has been measured with /proc/<pid>/status file. notice that it is needed to be measured on system-level, since sum of each allocation requests will be the same for both versions. results follows (values are in kB):
N | sys. alloc | Loki |
---|---|---|
1 | 57196 | 30012 |
20 | 57196 | 45720 |
40 | 72904 | 65516 |
a great improvement in memory footprint is visible for very small objects, when Loki's SOA is used. in case of 1-byte objects it is almost 2x less memory in use. for 20 bytes it is still 20% less.
first observation is that my implementation's/system's default allocator is doing well, when it comes to speed of (de)allocations. it is much worse, when talking about memory usage, for very small objects, but still it is not that bad, as one could expect.
it appears that it is indeed beneficial to use Loki's SOA, in case where many small (on my test environment this means less then 20 bytes) objects are to be allocated. it is also worth noticing that Loki's SOA is slower when deallocating in random order, thus memory restrictions must be of a greater concern than execution time, to make this solution beneficial.
after all, it is not a secret, that when it comes to the memory allocation, “one size fits all” does not apply. choose the right tool for your needs and base your reasoning on measurements.