some time ago i made a proposal for an improvement in one of the projects i participate in. internals aside, the main concerns i received were about the performance of dynamic addressing, as opposed to the static one used currently. the problem boils down to selecting an 8-bit channel number based on a 16-bit address. the current solution assumes certain bits of the address are used to construct the channel number – for instance the low nibble of each byte, so that address 0x1234 maps to channel 0x24.
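to make it concrete – a minimal sketch of such a static scheme (the exact bit layout is my guess, based on the 0x1234 → 0x24 example above):

```cpp
#include <cstdint>

// a sketch of the static addressing scheme: the channel number is built
// from the low nibble of each byte of the 16-bit address.
constexpr uint8_t static_channel(uint16_t addr)
{
    const uint8_t hi = (addr >> 8) & 0x0F; // low nibble of the high byte: 0x12 -> 0x2
    const uint8_t lo = addr & 0x0F;        // low nibble of the low byte:  0x34 -> 0x4
    return static_cast<uint8_t>((hi << 4) | lo);
}

static_assert(static_channel(0x1234) == 0x24, "matches the example from the text");
```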
having dynamic addressing allows more flexible configurations, code that is simpler to maintain and decoupling of elements' addressing from their physical location. it looks like something worthwhile… so how about the performance? well – first of all, we're talking about message sending, so the task is I/O-bound, not CPU-bound… but still – can this be done efficiently?
from a logical point of view, what we need is a mapping from uint16 to uint8. there are different approaches to achieve that, each being some balance between time, memory and complexity. selecting “the solution” must obviously be based on measurements, since for different input data different algorithms might win. however, since there were so many questions about performance, i decided to make a list of possible approaches to try out. here's what i came up with:
1) std::map<uint16_t, uint8_t> – trivial, with almost no implementation effort, but it requires a lot of memory allocations and is not well suited for small numbers of elements. also not very cache-friendly.
2) boost::container::flat_map<uint16_t, uint8_t> – O(logN) search, at the cost of more expensive insertions and deletions. since elements are stored contiguously, it is also quite cache-friendly.
3) unsorted std::vector<std::tuple<uint16_t, uint8_t>> – best for linear search, with quick insertions and deletions. even though lookup is O(N), thanks to caching on modern CPUs it is quite fast for small-to-medium element counts.
4) sorted std::vector<std::tuple<uint16_t, uint8_t>> – pretty much boost's flat_map. ;) useful if using boost is not an option, due to project policies, etc… (see the sketch after this list)
5) std::unordered_map<uint16_t, uint8_t> – amortized O(1) operations and O(N) memory. a nice candidate for medium-to-large numbers of entries, though it needs to allocate extra memory and is not very cache-friendly, due to the per-bucket lists of elements used to handle collisions.
6) LUT – O(1) search, insertion and deletion, at the price of constant memory: 2^16 = 65536 entries, regardless of how many mappings are actually needed (sketched below).
7) trie – search, insertion and deletion proportional to the key length, which for uint16_t is a small constant, so effectively O(1) at runtime. memory grows only with the prefixes actually in use, which makes it very well suited for addresses that share similar prefixes (sketched below).
8) LRU cache – O(N) memory and, with a simple move-to-front implementation, O(N) lookups, however often-repeated entries stay “hot” near the front and are found fast. this should be used as a cache in front of one of the data structures 1)–7), rather than as a stand-alone solution (sketched below).
9) hybrid – for different, unpredictable workloads. depending on the actual load, the best-suited data structure can be selected and replaced at runtime. for instance an implementation could start with vector<tuple>, switch to a trie for bigger tables and finally to a LUT when the trie starts to use too much memory. the thresholds would then need to be tuned to best match a particular load.
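to give a feel for how little code some of these need – a sketch of approach 4), the sorted vector, i.e. the poor man's flat_map. the channel_map name and the use of std::pair instead of std::tuple are mine, just for illustration:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// approach 4): a vector of (address, channel) pairs, kept sorted by address.
// lookup is O(logN) via binary search; contiguous storage keeps it cache-friendly.
class channel_map
{
public:
    void set(uint16_t addr, uint8_t ch)
    {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), addr,
                                   [](auto const& e, uint16_t a) { return e.first < a; });
        if (it != entries_.end() && it->first == addr)
            it->second = ch;                 // overwrite an existing mapping
        else
            entries_.insert(it, {addr, ch}); // O(N) insertion keeps the order
    }

    std::optional<uint8_t> get(uint16_t addr) const
    {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), addr,
                                   [](auto const& e, uint16_t a) { return e.first < a; });
        if (it != entries_.end() && it->first == addr)
            return it->second;
        return std::nullopt;                 // address not mapped
    }

private:
    std::vector<std::pair<uint16_t, uint8_t>> entries_;
};
```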
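approach 6) is even shorter – a hypothetical channel_lut, trading 64KiB of (mostly unused) memory for unconditional O(1) access. note that an object of this size is better kept in static or heap storage than on the stack:

```cpp
#include <array>
#include <cstdint>

// approach 6): a full lookup table -- one entry per possible 16-bit address.
// all operations are O(1); the price is a fixed 64KiB table, mostly unused.
class channel_lut
{
public:
    void set(uint16_t addr, uint8_t ch) { table_[addr] = ch; }
    uint8_t get(uint16_t addr) const { return table_[addr]; }

private:
    std::array<uint8_t, 1u << 16> table_{}; // zero-initialized; 0 could mean 'unmapped'
};
```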
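a possible shape for approach 7) – a trie keyed by the four nibbles of the address, most significant first. the channel_trie name and the node layout are just my sketch:

```cpp
#include <cstdint>
#include <memory>
#include <optional>

// a trie over the four nibbles of the address: depth is always 4, so lookup
// is effectively O(1); memory is allocated only for prefixes actually in use.
struct trie_node
{
    std::unique_ptr<trie_node> children[16];
    std::optional<uint8_t> value; // set only in leaf nodes
};

class channel_trie
{
public:
    void set(uint16_t addr, uint8_t ch)
    {
        trie_node* node = &root_;
        for (int shift = 12; shift >= 0; shift -= 4)
        {
            auto& child = node->children[(addr >> shift) & 0x0F];
            if (!child)
                child = std::make_unique<trie_node>();
            node = child.get();
        }
        node->value = ch;
    }

    std::optional<uint8_t> get(uint16_t addr) const
    {
        const trie_node* node = &root_;
        for (int shift = 12; shift >= 0; shift -= 4)
        {
            node = node->children[(addr >> shift) & 0x0F].get();
            if (!node)
                return std::nullopt; // prefix not present
        }
        return node->value;
    }

private:
    trie_node root_;
};
```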
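and finally a sketch of approach 8) – a small move-to-front cache wrapped around any of the slower structures (lru_front, its default capacity and the Slow::get() interface are my assumptions):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// approach 8): a tiny move-to-front cache in front of a slower lookup.
// frequently used addresses bubble to the front, so hot lookups scan
// only a few entries. Slow can be e.g. channel_trie from the sketch above.
template <typename Slow>
class lru_front
{
public:
    explicit lru_front(Slow slow, std::size_t capacity = 8)
        : slow_(std::move(slow)), capacity_(capacity) {}

    std::optional<uint8_t> get(uint16_t addr)
    {
        auto it = std::find_if(cache_.begin(), cache_.end(),
                               [&](auto const& e) { return e.first == addr; });
        if (it != cache_.end())
        {
            std::rotate(cache_.begin(), it, it + 1); // move the hit to the front
            return cache_.front().second;
        }
        auto ch = slow_.get(addr); // miss: fall back to the slow structure
        if (ch)
        {
            cache_.insert(cache_.begin(), {addr, *ch});
            if (cache_.size() > capacity_)
                cache_.pop_back(); // evict the least-recently-used entry
        }
        return ch;
    }

private:
    Slow slow_;
    std::vector<std::pair<uint16_t, uint8_t>> cache_;
    std::size_t capacity_;
};
```

usage would then look like lru_front<channel_trie> channels{channel_trie{}};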
…and so we have 9 different approaches within 10 minutes or so – plenty to choose from, with the best one to be selected based on actual measurement data from the observed load. the nice thing is that most of them are trivial to implement – in fact most are already available as part of the STL, boost and other libraries, and more options are waiting to be found on the internet. ;) it's quite a lot of fun! :)