2016-03-07 - comparing hashes

let's take a simple task – check if user-provided password is equal to the one stored in a DB. of course we're not storing plain-text passwords and operate of properly slated hashes instead.

how hard can it be?

attempt #1: just do it!

hey ma! it's trivial!

bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  return dbHash == userHash;
}

…and KABOOM!

using timing attack user's password hash can be broken in O(N), where N is hash length, effectively giving… O(1) (with just a bit of statistical overhead)!

classic – fortunately more and more ppl are aware of this nowadays. let us proceed to…

attempt #2: const-time implementation

ok, ok - so we need timing-independent algorithm for comparisons. giving it a bit though, you'll come up with something like:

bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  // step 1
  if( dhHash.size() != userHash.size() )
    return false;
  // step 2
  std::string::value_type hasNonEquals = 0;
  for(std::string::size_type i=0; i<dhHash.size(); ++i)
    hasNonEquals |= dhHash[i] ^ userHash[i];
  return not hasNonEquals;
}

first – step 1. didn't we just leaked hash length? well… this is common knowledge, thus not a problem here.

so – step 2. implementation looks fine – no branches, all characters always checked and final result is just simple check if there were no mismatches. it looks fine and may even work. unfortunately, if you happen to have a very clever compiler, it will notice that the above code is functionally equivalent to:

bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  // step 1
  if( dhHash.size() != userHash.size() )
    return false;
  // step 2
  for(std::string::size_type i=0; i<dhHash.size(); ++i)
    if( dhHash[i] ^ userHash[i] )
      return false;
  return true;
}

and even saves some stack space! well – optimizations are awesome, except when they are not. ;)

attempt #3: disabling (some) optimizations

can we disable optimizations for certain variables? how about using a volatile variable…

bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  // step 1
  if( dhHash.size() != userHash.size() )
    return false;
  // step 2
  volatile std::string::value_type hasNonEquals = 0;      // <<<< ...RIGHT HERE
  for(std::string::size_type i=0; i<dhHash.size(); ++i)
    hasNonEquals |= dhHash[i] ^ userHash[i];
  return not hasNonEquals;
}

IMHO this should do. we could probably stop at this point. we are stepping on a thin ice here, though.

first of all volatile is crucial and non-obvious part here. if some1 in one year time will come and say hey – what a moron put this volatile here?, we're screwed.

technically speaking, even though we're guaranteed compiler will not touch our volatile reads and writes, instructions can still get reorganized, a helper variable can be used to store intermediates, etc… in theory at least. i'd not reasonably expect it to happen, but when it comes to cryptography – better be safe than sorry.

last but not least – above approach is C++-specific trick. note that volatile has different meanings in different programming languages! for instance in C# and Java it is equivalent to std::atomic variables in C++. you'll not be able to use similar method in Python or PHP too.

can there be more generic approach?

attempt #4: generic approach

can this be done? i've asked myself the same question this weekend and a solution crossed my mind. we need to have a logic complex enough to hide compile-time knowledge behind runtime-dependent computations. this way we'll know, we're not leaking any pieces of information and at the same time make it portable to other languages. logically we want something similar to:

std::string reasonableHash(std::string str); // <<< any one-way hash function should do here
 
bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  return reasonableHash(dhHash) == reasonableHash(userHash);
}

it will cost much more than any of the above implementations, but we'll be able to do it safely, using any languages.

the above solution is however missing one more crucial point. having indexed rainbow tables and big storage, it might still be feasible to narrow down the search, based on output hashes and timing attack, to find proper input-string.

this can be however easily solved, by simply adding some extra noise to the system. consider the following implementation:

std::string reasonableHash(std::string str); // as before
std::string randomString();                  // PRNG can be used for that
 
bool checkEqual(std::string const& dbHash, std::string const& userHash)
{
  const auto junk = randomString();
  return reasonableHash( junk + dhHash ) == reasonableHash( junk + userHash );
}

using prng we're rendering timing attacks worthless, since each time algorithm is executed, it will return completely different hash, than before and string-compare will stop at “random” points.

afterword

it was both fun and learning experience to figure it all out. unfortunately for me, it has already been invented some time ago. ;)

blog/2016/03/07/comparing_hashes.txt · Last modified: 2021/06/15 20:09 by 127.0.0.1

Back to top