2013.12.02 - got a change?

about two weeks ago a friend of mine sent me a link to a page with a problem-solving quest. of course this was a propaganda page. ;) anyway i was asked to provide a correct answer… since the deadline for submitting answers is officially over, let's have a look on what the solution looked like.

the task was to tell how many ways one can change 1024 into coins of 1,2,4,8,16,32,64,128,256,512. this is very famous problem of discrete mathematics – googling for an answer gives examples along with the source codes in any languages of your choice. anyway, not knowing “the best way” in advance i've figured out this will be a nice day to play…

first approach

after thinking a bit, the most obvious answer was to find number of all possible solutions to solve the equation of the form:

A*512 + B*256 + ... + I*2 + J*1 == 1024

looks like simple math. since A..J are non-negative numbers, and sum must be 1024, one can derive maximum values of each of parameters. simple. lets make it fun then!

prolog

gprolog has a CSP module for problems over finite domains (FD). so we end up with something like this (download):

change(A,B,C,D,E,F,G,H,I,J R) :-
  LD = [A,B,C,D,E,F,G,H,I,J],
  fd_domain(LD, 0, 1024),
  A*512 + B*256 + C*128 + D*64 + E*32 + F*16 + G*8 + H*4 + I*2 + J*1 #= R,
  fd_labeling(LD)
  .
 
len([], 0).
len([_|T], L) :-
  len(T, Lt),
  L #= 1 + Lt
  .
 
q(N, OUT) :-
  findall([A,B,C,D,E,F,G,H,I,J], change(A,B,C,D,E,F,G,H,I,J, N), LST),
  len(LST, OUT)
  .

nice, simple… but uses a bit too much memory. the problem is number of possible solutions. unfortunately prolog did not noticed it can generate lazy list and just count its length on a fly. any way it was generating about 22k answer per second, which is not mind-blowing-fast. the good news is that we have working reference implementation for finding number of changes for smaller numbers.

na(t)ive code

as a next step i decided to try luck with compiled languages. i've created a dead-simple implementation in C++ (download):

// ...
typedef unsigned Number;
int main(void)
{
  unsigned long count = 0;
  // b*512 + ... + k*1 = 1024,
    for(Number b=0; b<=2; ++b)
    {
      for(Number c=0; c<=4; ++c)
      {
 
      // ...
                      for(Number k=0; k<=1024; ++k)
                      {
                        if( b*512 + c*256 + d*128 + e*64 + f*32 + g*16 + h*8 + i*4 + j*2 + k*1 == 1024 )
                        {
                          ++count;
                          cout << b << "*512  + "
                               << c << "*256  + "
                               << d << "*128  + "
                               << e << "*64   + "
                               << f << "*32   + "
                               << g << "*16   + "
                               << h << "*8    + "
                               << i << "*4    + "
                               << j << "*2    + "
                               << k << "*1      "
                               << endl;
                        }
                      }
      // ...
      }
    }
  }
  cout << count << endl;
}

ok - got 122k answers per second. way faster… but still quite a bit of solutions to search.

na(t)ive code mk.II

next approach was to use a git of domain knowledge again. note that most of the combinations will sum to over 1024 on the outer loops. say, if b==2 in the above example, there can be only one solution and there is no point in scanning all possible values of remaining variables. this cuts search space in 3. applying similar thinking to other variables cuts space more and more. so the improved implementation looks like this (download):

// ...
int main(void)
{
  constexpr Number value = 1024;
  unsigned long count = 0;
  // b*512 + ... + k*1 = 1024,
  for(Number b=0; b<=2; ++b)
  {
    const auto pb = b*512;
    if( pb > value )
      break;
    // ...
                    for(Number k=0; k<=1024; ++k)
                    {
                      const auto pk = pj + k*1;
                      if( pk == value )
                      {
                        ++count;
                        break;
                      }
                    }
 
      // ...
      }
    }
  }
  cout << value << " -> " << count << endl;
}

this one gets solution in 4.5 minutes on my computer. ok – so we finally have a working version, providing the answer (!=42, btw ;)).

math

as mentioned at the beginning, there is “more mathematical” solution to this problem. not to go into details, it is based on the generating functions.

first approach

here is an example solution, using the above mentioned math-based results (download):

// ...
typedef unsigned      Number;
typedef unsigned long Count;
 
template<unsigned CFN>
Count cntFor(const Number n)
{
  if(n<CFN)
    return cntFor<CFN/2>(n);
  return cntFor<CFN>(n-CFN) + cntFor<CFN/2>(n);
}
 
template<>
Count cntFor<1>(const Number n)
{
  return 1;
}
 
template<>
Count cntFor<2>(const Number n)
{
  if(n<=1)
    return 1;
  if(n<=2)
    return 2;
  return cntFor<2>(n-2) + cntFor<1>(n-1);
}
 
int main(void)
{
  const Number i = 1024;
  cout << i << " == " << cntFor<512>(i) << endl;
}

now that's short. and fast. this one executes in ~2.4[s] on my machine. this is probably the solution you'll find on the internet. and yet.. can we do any better?

caching

if you look close enough, i'll see that a lot of these recursive calls are repeated over and over again. smells like fibonacci's recursive implementation – the same code with the same parameters over and over again. first idea? add cache (download).

// ... the same as previously ...
constexpr Number   value   = 1024;
constexpr unsigned maxCoin = 512;
 
typedef vector<Count> Internal;
typedef vector<Internal> External;
External cache( maxCoin+1, Internal(value+1, 0) );
 
template<unsigned CFN>
Count cntFor(const Number n)
{
  if(cache[CFN][n])
    return cache[CFN][n];
  if(n<CFN)
    return cache[CFN][n] = cntFor<CFN/2>(n);
  return cache[CFN][n] = cntFor<CFN>(n-CFN) + cntFor<CFN/2>(n);
}
// ... the same as previously ...

new time? 0.004[s] (4 milliseconds!). now that's nice. :D

can we do better?

yes. although 4[ms] was good enough for me for one day… ;)

if you look closely, you can easily optimize the cache size. the one above has short syntax, but uses way more memory than is really needed. you could go down to ~value*10*sizeof(number)==O(value) bytes. in fact – you could simply provide the counting table, with all the required coefficients, and so compute in O(N) time and memory. that would be both nice and fast at the same time. ;)

!! SPOILER ALERT !!

the answer is 2320518947 – better than sheep counting!

blog/2013/12/02/got_a_change.txt · Last modified: 2013/12/02 15:20 by basz
Back to top
Valid CSS Driven by DokuWiki Recent changes RSS feed Valid XHTML 1.0