Monday, November 4, 2013

Magics

Hopefully the whale has not been beached... let's provide some semi-motivational content.

When it comes to fast computing, there are few things as powerful as the humble lookup-table. It comes in many sizes and flavours, implemented on the fly or through some elaborate framework (depending on the task at hand).  Sometimes it is pre-calculated and sometimes it is lazily build up during the actual operation.  This incarnation is often referred to as a cache and is a bit different than a lookup table.  However, the pre-calculated lookup table is often ignored and can often result in significant speed improvements in applications.

A lookup table can be constructed by getting some hash value from the key and then using that numerical value in a straightforward hash table (HashTable or HashMap).  The biggest problem with a hash table is collisions. Good keys and hashing strategies can help minimise the collisions, but collisions will always occur.  Bucketing and linear probing are then often used to circumvent this problem.

Today I want to discuss a rather impressive technique called "Magics", which is based on De-Bruijn sequences. It is often used in chess engines and is currently responsible for the fastest move generators in existence.  It is used by the majority of top chess engines and gives them a real compatitive edge.  Fortunately, the same concept can also be applied to different applications. It does tend to be a bit niche, but so is many highly effective algorithms.

Consider the following problem:  You have a list of 500 numbers and you need to calculate a very complex operation on each number.  We'll use something boring like prime factorization for this discussion.  The numbers are all withing the range 0..499.  How would you do it?  This answer is trivial, of course.  A simple array with the number as the index value will give a collision-free lookup technique, something like:

array[0] = 0
array[1] = 1
array[2] = 1
array[3] = 1
array[4] = 2
...
array[400] = 2
array[401] = 1
...
array[499] = 1

To find the number of prime factors for 400, a simple, inexpensive array lookup is required.

Now, consider an arbitrary list of numbers, i.e. you still have 500 fixed numbers, but a number can be any number in the 32-bit number range. For a collision-free implementation, you could use an array of 2^32 locations, but that would be a terrible waste of memory and completely impractical. This is where Magics comes into play...

In essence, the process can be summarised as follows:

1.  Given a key (our number), k, multiply this key with a magic number to obtain an index mapping (using 64 bit integers). 

2. Right shift the index mapping with 64-n, where n is the size of the lookup table.  Smaller n for denser tables, larger n for less-dense.  The better the magic number, the smaller n will be.

3.  Use this shifted value as the index in the lookup table.

The biggest question of course is where can you get this magic number from?  There are techniques to do it precicely, but a simple brute-force technique will suffice in most cases.  The basic idea is as follows:

Given a list of 500 numbers to hash collision-free and a value for n  (larger n will be easier, smaller n harder), then:

1.  while not done
2.    genereate a random 64-bit magic number, m.
3.    create and initialise to false, a boolean array, array[n].
4.    set done to true
5.    for every number, n, in the list.
6.      index = (n * m) >>> (64 - n)
7.      if array[index] is true (collision, try again...)
8.        Set done to false
9.        break
10.     end-if
11.     array[index] = true
12.   end for
13.  end-while    

Hopefully some of the numbers, will hash to the same location (for example all the prime numbers could, potentially, hash to the same location as they all have only 1 prime factor). When the magic number is found, it can be used to populate the lookup table with the calculated values.  This can be stored in a file for quick retreival on program startup.

As an example, consider a fixed list of 500 numbers of which the number of prime factors are required.  The range of the numbers is from 0 to 2,147,483,647. For 500 numbers, it takes about 10 seconds to find a magic number for n=13 (i.e. a lookup table with 8192 locations).  Denser tables, with smaller n, should be possible.  The population takes a bit of time, but once done, the lookups are instantaneous.

I include a demo program with which to play.  Set the BITS and the COUNT at the top to change the lookup table density and the number of numbers to use.