====== 2013.03.14 - vector<> vs. list<> ======

almost exactly a year ago [[wp>Bjarne Stroustrup|Bjarne]] gave a [[http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style|C++11 style]] presentation in [[wp>Wrocław]]. aside from many interesting pieces of information, there was a very nice example of how [[docs:cache_usage_effects_on_modern_cpus:cache_usage_effects_on_modern_cpus|CPU caches and memory access]] affect applications' runtime on modern hardware.

===== algorithm =====

the algorithm is simple: insert N random elements into a container, so that it remains sorted all the time. after that, remove an element N times, each time from a random position. the question is: which container will be faster -- [[http://en.cppreference.com/w/cpp/container/list|list]] or [[http://en.cppreference.com/w/cpp/container/vector|vector]]? and how big does N have to be to reach the breaking point, where list becomes faster than vector?

the answer is: never! due to constant cache misses when using list, and constant cache hits when using vector, the latter outperforms the former for all Ns!

recently i was giving a training/presentation on cache and memory effects to colleagues at work, and was not believed that it really is the way Bjarne suggested. "how come moving thousands of elements is faster than iterating over list nodes?" well, it is. but of course we do not take anyone's word for it -- we [[blog:2012:04:06:1|do measure]] instead.

===== implementation =====

the code, implemented in C++, along with a makefile for automatically generating images from the gathered results, is {{:blog:2013:03:14:vector_vs_list.tar.bz2|available for download}}. for reference, my test code follows:

<code cpp>
//
// implemented by BaSz @ 2013/q2
//
// g++-4.7 -Wall -std=c++11 -O3 -DNDEBUG -march=native -o vector_vs_list.out vector_vs_list.cpp
//
#include <iostream>
#include <iomanip>
#include <list>
#include <vector>
#include <algorithm>
#include <iterator>
#include <chrono>
#include <random>
#include <typeinfo>
#include <cstdint>

using namespace std;

typedef chrono::high_resolution_clock Clock;
typedef std::list<uint64_t>           List;
typedef std::vector<uint64_t>         Vector;

template<typename C, typename R>
uint64_t algo(R r, const uint64_t n)
{
  C c;
  typename C::value_type out = 0;
  // insertion
  for(uint64_t i=0; i<n; ++i)
  {
    // insert a random value at the position that keeps the container sorted
    const auto value = r();
    c.insert( upper_bound(c.begin(), c.end(), value), value );
  }
  // removal
  for(uint64_t i=0; i<n; ++i)
  {
    // erase an element at a random, still valid position
    auto it = c.begin();
    advance( it, r()%c.size() );
    out += *it;
    c.erase(it);
  }
  return out;
}

template<typename C>
void run(const uint64_t n, const uint64_t seed)
{
  // prepare rand
  mt19937 gen(seed);
  uniform_int_distribution<uint64_t> dist(0, n-1);
  auto r = [&]{ return dist(gen); };
  // work
  auto start = Clock::now();
  auto ret   = algo<C>(r, n);
  auto stop  = Clock::now();
  // info
  auto diff = stop-start;
  cerr << typeid(C).name() << " = " << fixed << setprecision(3)
       << chrono::duration_cast<chrono::microseconds>(diff).count()/(1000*1000.0)
       << " [s] (xxx==" << ret%10 << ")\t";
}

int main(void)
{
  // initialize generator with some seed
  std::random_device rd;
  const auto initSeed = rd();
  mt19937 gen(initSeed);
  cerr << ">> initial seed = " << initSeed << endl;

  for(uint64_t n=0; n<10*1000*1000l; n+=2000)
  {
    const auto seed = gen();
    cerr << "N = " << n << "\t";
    run<List>  (n, seed);
    run<Vector>(n, seed);
    cerr << " \t (for seed " << seed << ")" << endl;
  }
  return 0;
}
</code>
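the "iteration over list nodes" part of the question can also be looked at in isolation. below is a rough, stand-alone sketch (not part of the benchmark above nor of the downloadable package -- names like walkSeconds are just illustrative): it walks the same values stored once contiguously in a std::vector and once as individually allocated std::list nodes, so the only thing that differs between the two timings is the memory layout.

<code cpp>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <list>
#include <numeric>
#include <vector>

using namespace std;

// walk all elements once and time it -- the container's memory layout decides
// how much memory traffic this single pass generates
template<typename C>
double walkSeconds(const C& c)
{
  const auto start = chrono::high_resolution_clock::now();
  const auto sum   = accumulate(c.begin(), c.end(), uint64_t{0});
  const auto stop  = chrono::high_resolution_clock::now();
  cerr << "(sum%10==" << sum%10 << ") ";   // use 'sum' so the loop is not optimized away
  return chrono::duration<double>(stop-start).count();
}

int main()
{
  const uint64_t n = 10*1000*1000;
  vector<uint64_t> v(n);
  iota(v.begin(), v.end(), uint64_t{0});   // contiguous block of 8-byte values
  list<uint64_t> l(v.begin(), v.end());    // one separately allocated node per value
  cerr << "vector walk: " << walkSeconds(v) << " [s]" << endl;
  cerr << "list   walk: " << walkSeconds(l) << " [s]" << endl;
  return 0;
}
</code>

each list node carries two extra pointers next to the 8-byte value, so the walk touches roughly three times as much memory, and every step is a data-dependent pointer chase. on typical hardware the list traversal alone already tends to come out noticeably slower, even though both loops perform exactly the same additions.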
===== results =====

the results were obtained from 4 random runs. the figures are grouped into "full" and "begin" sets. each run is presented on a separate figure. click on the images to enlarge them.

==== full ====

the "full" set contains all the gathered data, from N=0 to N=80k.

{{:blog:2013:03:14:out1-full.png?200|run 1 / full}}
{{:blog:2013:03:14:out2-full.png?200|run 2 / full}}
{{:blog:2013:03:14:out3-full.png?200|run 3 / full}}
{{:blog:2013:03:14:out4-full.png?200|run 4 / full}}

==== begin ====

the "begin" set shows just the first 25% of the gathered data, to show how the picture looks for small Ns.

{{:blog:2013:03:14:out1-begin.png?200|run 1 / begin}}
{{:blog:2013:03:14:out2-begin.png?200|run 2 / begin}}
{{:blog:2013:03:14:out3-begin.png?200|run 3 / begin}}
{{:blog:2013:03:14:out4-begin.png?200|run 4 / begin}}

===== conclusion =====

Bjarne did not lie. this has been the case for over 10 years now, and since CPUs keep getting faster than memory, the gap only increases over time.