11.8 Caching Examples

When accessing elements from sets of data, some elements are accessed much more frequently than others. In these cases, it is possible to apply caching techniques to speed up access to frequently accessed elements. This is best demonstrated with the following example.

Consider a CacheTest class that consists mainly of a Map populated with Integer objects. I use Integer objects for convenience to populate the Map with many elements, but the actual object type is of no significance because you use only the hashCode( ) and equals( ) methods, just as the Map does.

Basically, you provide two ways to access the elements of the Map. The first, plain_access( ), just calls the Map.get( ) method as usual. The second method, cached_access( ), uses the lower bits of the hash code of the object to obtain an index value into an array. This index is then checked to see whether the object is there. If it is, the corresponding value in a parallel value array is returned. If it's not, the object is placed there with the value in the corresponding value array.

This is about the simplest example of general cached access. It demonstrates the advantages and pitfalls of cached access. I have selected 10 integers that do not map to the same indexes for the example. Running the class gives a straightforward comparison between the two access methods, and I get the result that the cached access varies significantly depending on the VM used. The access speedups are illustrated in the following table of measurements. Times have been normalized to the JDK 1.2.2 case for using a HashMap. The first time of each entry is the measurement using a HashMap, and the second is the measurement using a Hashtable. For any one VM, cached access is significantly faster.

 

1.2.2

1.3.1_02

1.3.1_02-server

1.4.0

1.4.0-server

1.4.0-Xint

Plain access (HashMap/ Hashtable)

100%/320%

109%/138%

112%/152%

82.6%/155%

93.2%/89.3%

1964%/1596%

Cached access (HashMap/ Hashtable)

29.3%/29.2%

57.2%/57%

36.5%/42.3%

60.4%/60.2%

38.5%/48.1%

1218%/1294%

This test is artificial, in that I chose integers where no two map to the same index. If there is more than one integer that maps to the same cache array index, this is called a collision. Clearly, with collisions, performance is not as good because you are constantly entering the code that puts the objects into the cache. Collisions are a general problem with cached data, and you need to minimize them for optimal performance. This can be done by choosing an appropriate mapping function to generate indexes that minimize collisions:

package tuning.cache;
   
import java.util.HashMap;
import java.util.Hashtable;
import java.lang.Math;
   
public class CacheTest
{
  //The cache array for the keys
  static Object[  ] cache_keys = new Object[128];
  //The array for the values corresponding to cached keys
  static Object[  ] cache_values = new Object[128];
  //static Hashtable hash = new Hashtable(  );
  static HashMap hash = new HashMap(  );
   
  public static void main(String[  ] args)
  {
    try
    {
      System.out.println("started populating");
      populate(  );
      System.out.println("started accessing");
      access_test(  );
    }
    catch(Exception e){e.printStackTrace(  );}
  }
   
  public static void populate(  )
  {
    for (int i = 0; i < 100000; i++)
      hash.put(new Integer(i), new Integer(i+5));
  }
   
  public static Object plain_access(Integer i)
  {
    //simple get(  ) call to the hash table
    return hash.get(i);
  }
   
  public static Object cached_access(Integer i)
  {
    //First get access index
    int access = Math.abs(i.hashCode(  )) & 127;
    Object o;
    //if the access index has an object, and that object is equal to key
    //then return the corresponding value in the parallel values array.
    if ( (o = cache_keys[access]) =  = null || !o.equals(i))
    {
      //otherwise, we got a collision. We need to replace the
      //object at that access index with the new one that we
      //get from the hash table using normal Hashtable.get(  ),
      //and then return the value retrieved this way
      if (o != null)
        System.out.println("Collsion between " + o + " and " + i);
      o = hash.get(i);
      cache_keys[access] = i;
      cache_values[access] = o;
      return o;
    }
    else
    {
      return cache_values[access];
    }
  }
   
  public static void access_test(  )
  {
    //Ten integers that do not collide under the mapping scheme
    //This gives best performance behavior for illustration purposes
    Integer a0 = new Integer(6767676);
    Integer a1 = new Integer(33);
    Integer a2 = new Integer(998);
    Integer a3 = new Integer(3333);
    Integer a4 = new Integer(12348765);
    Integer a5 = new Integer(9999);
    Integer a6 = new Integer(66665);
    Integer a7 = new Integer(1234);
    Integer a8 = new Integer(987654);
    Integer a9 = new Integer(3121219);
    Object o1,o2,o3,o4,o5,o6,o7,o8,o9,o0;
    long time = System.currentTimeMillis(  );
    for (int i = 0; i < 1000000; i++)
    {
       o1 = plain_access(a0);
       o2 = plain_access(a1);
       o3 = plain_access(a2);
       o4 = plain_access(a3);
       o5 = plain_access(a4);
       o6 = plain_access(a5);
       o7 = plain_access(a6);
       o8 = plain_access(a7);
       o9 = plain_access(a8);
       o0 = plain_access(a9);
    }
    System.out.println("plain access took " +
        (System.currentTimeMillis(  )-time));
   
    time = System.currentTimeMillis(  );
    for (int i = 0; i < 1000000; i++)
    {
       o1 = cached_access(a0);
       o2 = cached_access(a1);
       o3 = cached_access(a2);
       o4 = cached_access(a3);
       o5 = cached_access(a4);
       o6 = cached_access(a5);
       o7 = cached_access(a6);
       o8 = cached_access(a7);
       o9 = cached_access(a8);
       o0 = cached_access(a9);
    }
    System.out.println("cached access took " +
        (System.currentTimeMillis(  )-time));
   
  }
}

In this example, we add an instance variable to the keys to provide the mapping into the cache. This example uses a circular cache that holds just the most recent 128 keys accessed. Because of more optimal cache access, this has an even larger speedup than the previous example:

package tuning.cache;
   
import java.util.Hashtable;
import java.lang.Math;
   
public class Test2
{
  //The cache array for the keys
  static Test2[  ] cache_keys = new Test2[128];
  //The array for the values corresponding to cached keys
  static Object[  ] cache_values = new Object[128];
  static Hashtable hash = new Hashtable(  );
   
  //The index to use for the next object added to the cache
  static int freeIndex = 0;
   
  //The current index in the cache referenced by this object
  int cacheRef = -1;
  //Unique integer for each object, can be used as hash code
  int value;
   
  public static void main(String[  ] args)
  {
    try
    {
      System.out.println("started populating");
      populate(  );
      System.out.println("started accessing");
      access_test(  );
    }
    catch(Exception e){e.printStackTrace(  );}
  }
   
  public Test2(int i)
  {
    value = i;
  }
   
  public int hashCode(  )
  {
    return value;
  }
  public boolean equals(Object obj)
  {
    //Equality test requires null check, type check, and value check
    if ((obj != null) && (obj instanceof Test2))
      return value =  = ((Test2) obj).value;
    else
      return false;
  }
   
  public static void populate(  )
  {
    for (int i = 0; i < 100000; i++)
      hash.put(new Test2(i), new Integer(i+5));
  }
   
  public static Object plain_access(Test2 i)
  {
    return hash.get(i);
  }
   
  public static Object cached_access(Test2 i)
  {
    //Access index into the cache is quick and easy to get
    int access = i.cacheRef;
    Object o;
   
    //If it is -1 then it is not in the cache
    if (access =  = -1)
    {
      //get the object using the hash table
      o = hash.get(i);
      //Get the next available index in the cache.
      //Wind round to the start of the cache if it is off the end
      if (freeIndex >= cache_keys.length)
        freeIndex = 0;
      //set the cache index; increment the next cache index too
      access = i.cacheRef = freeIndex++;
      //If there was already something in the cache at that location,
      //uncache it
      if (cache_keys[access] != null)
      {
        System.out.println("Collsion between " + cache_keys[access] +
                " and " + i);
        cache_keys[access].cacheRef = -1;
      }
      //And cache our new value.
      cache_keys[access] = i;
      cache_values[access] = o;
      return o;
    }
    else
    {
      return cache_values[access];
    }
  }
   
  public static void access_test(  )
  {
    Test2 a0 = new Test2(6767676);
    Test2 a1 = new Test2(33);
    Test2 a2 = new Test2(998);
    Test2 a3 = new Test2(3333);
    Test2 a4 = new Test2(12348765);
    Test2 a5 = new Test2(9999);
    Test2 a6 = new Test2(66665);
    Test2 a7 = new Test2(1234);
    Test2 a8 = new Test2(987654);
    Test2 a9 = new Test2(3121219);
    Object o1,o2,o3,o4,o5,o6,o7,o8,o9,o0;
    long time = System.currentTimeMillis(  );
    for (int i = 0; i < 1000000; i++)
    {
       o1 = plain_access(a0);
       o2 = plain_access(a1);
       o3 = plain_access(a2);
       o4 = plain_access(a3);
       o5 = plain_access(a4);
       o6 = plain_access(a5);
       o7 = plain_access(a6);
       o8 = plain_access(a7);
       o9 = plain_access(a8);
       o0 = plain_access(a9);
    }
    System.out.println("plain access took " + 
        (System.currentTimeMillis(  )-time));
   
    time = System.currentTimeMillis(  );
    for (int i = 0; i < 1000000; i++)
    {
       o1 = cached_access(a0);
       o2 = cached_access(a1);
       o3 = cached_access(a2);
       o4 = cached_access(a3);
       o5 = cached_access(a4);
       o6 = cached_access(a5);
       o7 = cached_access(a6);
       o8 = cached_access(a7);
       o9 = cached_access(a8);
       o0 = cached_access(a9);
    }
    System.out.println("cached access took " + 
        (System.currentTimeMillis(  )-time));
   
  }
   
}

These are examples of general data caching. Sometimes you will know beforehand exactly which objects will be frequently accessed. In this case, you can create a specialized class that provides an accessor that optimizes access for just these objects. This can be as simple as a switch statement or multiple if statements. For example:

public Object get(Object key)
{
  if (key =  = FAST_KEY1)
    return value1;
  else if (key.equals(FASTISH_KEY2))
    return value2;
  else if (key.equals(possibly_fast_key_assigned_at_runtime))
    return value3;
  else
    return hash.get(key);
}