====== 2016-03-07 - comparing hashes ====== let's take a simple task -- check if user-provided password is equal to the one stored in a DB. of course we're not storing plain-text passwords and operate of properly slated hashes instead. how hard can it be? ===== attempt #1: just do it! ===== hey ma! it's trivial!


bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  return dbHash == userHash;
}

...and KABOOM! {{:blog:2016:03:07:nukeepicfail.jpg?400|Fpic Eail}} using [[wp>timing attack]] user's password hash can be broken in **O(N)**, where N is hash length, effectively giving... **O(1)** (with just a bit of statistical overhead)! classic -- fortunately more and more ppl are aware of this nowadays. let us proceed to... ===== attempt #2: const-time implementation ===== ok, ok - so we need timing-independent algorithm for comparisons. giving it a bit though, you'll come up with something like:


bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  // step 1
  if( dhHash.size() != userHash.size() )
    return false;
  // step 2
  std::string::value_type hasNonEquals = 0;
  for(std::string::size_type i=0; i

first -- step 1. didn't we just leaked hash length? well... this is common knowledge, thus not a problem here.

so -- step 2. implementation looks fine -- no branches, all characters always checked and final result is just simple check if there were no mismatches. it looks fine and may even work. unfortunately, if you happen to have a very clever compiler, it will notice that the above code is //functionally equivalent// to:


bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  // step 1
  if( dhHash.size() != userHash.size() )
    return false;
  // step 2
  for(std::string::size_type i=0; i

and even saves some stack space! well -- optimizations are awesome, except when they are not. ;)


===== attempt #3: disabling (some) optimizations =====

can we disable optimizations for certain variables? how about using a **volatile** variable...


bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  // step 1
  if( dhHash.size() != userHash.size() )
    return false;
  // step 2
  volatile std::string::value_type hasNonEquals = 0;      // <<<< ...RIGHT HERE
  for(std::string::size_type i=0; i

IMHO this should do. we could probably stop at this point. we are stepping on a thin ice here, though.

first of all volatile is crucial and non-obvious part here. if some1 in one year time will come and say //hey -- what a moron put this volatile here?//, we're screwed.

technically speaking, even though we're guaranteed compiler will not touch our volatile reads and writes, instructions can still get reorganized, a helper variable can be used to store intermediates, etc... in theory at least. i'd not reasonably expect it to happen, but when it comes to cryptography -- better be safe than sorry.

last but not least -- above approach is [[wp>C++]]-specific trick. note that //volatile// has different meanings in different programming languages! for instance in [[wp>C Sharp (programming language)|C#]] and [[wp>Java (Programming language)|Java]] it is equivalent to //std::atomic// variables in C++. you'll not be able to use similar method in [[wp>Python (programming language)|Python]] or [[wp>PHP]] too.

can there be more generic approach?


===== attempt #4: generic approach =====

can this be done? i've asked myself the same question this weekend and a solution crossed my mind. we need to have a logic complex enough to hide compile-time knowledge behind runtime-dependent computations. this way we'll know, we're not leaking any pieces of information and at the same time make it portable to other languages. logically we want something similar to:


std::string reasonableHash(std::string str); // <<< any one-way hash function should do here

bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  return reasonableHash(dhHash) == reasonableHash(userHash);
}


it will cost much more than any of the above implementations, but we'll be able to do it safely, using any languages.

the above solution is however missing one more crucial point. having indexed rainbow tables and big storage, it might still be feasible to narrow down the search, based on output hashes and [[wp>timing attack]], to find proper input-string.

this can be however easily solved, by simply adding some extra noise to the system. consider the following implementation:


std::string reasonableHash(std::string str); // as before
std::string randomString();                  // PRNG can be used for that

bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  const auto junk = randomString();
  return reasonableHash( junk + dhHash ) == reasonableHash( junk + userHash );
}


using [[wp>pseudo-random number generator|prng]] we're rendering timing attacks worthless, since each time algorithm is executed, it will return completely different hash, than before and string-compare will stop at "random" points.


===== afterword =====

it was both fun and learning experience to figure it all out. unfortunately for me, [[https://paragonie.com/blog/2015/11/preventing-timing-attacks-on-string-comparison-with-double-hmac-strategy|it has already been invented]] some time ago. ;)