====== 2014-12-16 - unordered array ====== recently i was in a need of fast and compact container to keep data, that changes very often. amount of insertions, removals and searches were pretty equal and very frequent. fortunately size was expected to be small - tens to hundreds of elements. of course the most cache-friendly data structure is linear array (read: //vector//). can we do even better? ===== assumptions ===== generally //vector// is fast, but removal of elements from the middle is a bit more painful. to deal with it i've decided to create an array that by design does not guarantee any particular order, after insertion and removal. when the size does not change, elements can be reorder, in terms of swapping/assigning. it's just size-changing operations that destroy any order. also all size-modifying operations invalidate all iterators. when container shrinks, unlike the //vector//, it can shrink as well. i've called the container [[https://github.com/el-bart/but/blob/master/But/UnorderedArray.hpp|unordered array]] (available as a part of [[https://github.com/el-bart/but|but]] library). ===== operations ===== since no particular order is assumed, addition is simple. it means appending one more element at the end. removal is also done in situ as well. instead of moving all elements from the given position until the end, one place backwards (as in //vector//), last element is being moved to the position of the element to be removed and the stub of last element is being removed. searching for an element is done in a linear fashion, as usual. ^ operation ^ UnorderedArray ^ vector ^ | add | O(1) | O(N) | | remove | O(1) | O(N) | | find | O(N) | O(N) | ===== test scenario ====== to see how it behaves i've done some measurements. they are all done using algorithm, that reassembles my actual use case. it goes like this: const vector valuesToAdd = /*...*/; Container c; c.reserve( valuesToAdd.size() ); // fill for(auto v: valuesToAdd) c.add(v); // search and remove const auto queries = randomizeOrder(valuesToAdd); for(auto q: queries) { const auto it = std::find( c.begin(), c.end(), q ); c.erase(it); } check out an [[https://github.com/el-bart/but/blob/master/But/UnorderedArray_speed.manual.cpp|actual implementation of test case]] for details. for measurements i've selected several containers to compare with //UnorderedArray//: * //vector// -- to see if it is much slower. * //deque// -- to check if it is significantly slower. * //set// -- to compare with balanced search tree. * //unordered_set// -- to compare with hash table and amortized O(1) access. * //list// -- just for the record... ;) ===== measurements ===== {{:blog:2014:12:16:data_points.png?500|all data points}} {{:blog:2014:12:16:data_points-no-list.png?500|all data points, without the list}} {{:blog:2014:12:16:data_points_0_to_2000.png?500|0-2k range}} {{:blog:2014:12:16:data_points_0_to_2000-no-list-no-deque.png?500|0-2k range w/o list and deque}} all measurements, containing //list// clearly show that... //list// just gets in the way. ;) there is just one interesting point there, that shows sudden speedup of the //list// around 7k elements -- looks like the //list// is not only slow but also haunted. ;) just stay away from it. //unordered_set// was the fastest container for more than ~1.2k elements. up to that point, [[https://github.com/el-bart/but/blob/master/But/UnorderedArray.hpp|UnorderedArray]] was the fastest container there is. in terms of speed, next non-associative container is //vector//. here is a comparison of these two: {{:blog:2014:12:16:data_points_small_vec_vs_ua.png?500|UnorderedArray vs. std::vector for small number of elements}} ===== final remarks ===== for my use case UnorderedArray was a clear winner. however please keep in mind that this comparison is far from being comprehensive. it shows a particular test case only, that i happened to need. it also does not include neither different compilers nor hardware platforms. consider it an inspiration more than an oracle... ;)