11.3 Hashtables and HashMaps

Because Hashtables and HashMaps are the most commonly used nonlist structures, I will spend a little extra time discussing them. Hashtables and HashMaps are pretty fast and provide adequate performance for most purposes. I rarely find that I have a performance problem using Hashtables or HashMaps, but here are some points that will help you tune them or, if necessary, replace them:

  • Hashtable is synchronized. That's fine if you are using it to share data across threads, but if you are using it single-threaded, you can replace it with an unsynchronized version to get a small boost in performance. HashMap is an unsynchronized version available from JDK 1.2.

  • Hashtables and HashMaps are resized whenever the number of elements reaches the [capacity x loadFactor]. This requires reassigning every element to a new array using the rehashed values. This is not simply an array copy; every element needs to have its internal table position recalculated using the new table size for the hash function. You are usually better off setting an initial capacity that handles all the elements you want to add. This initial capacity should be the number of elements divided by the loadFactor (the default load factor is 0.75).

  • Hashtables and HashMaps are faster with a smaller loadFactor, but take up more space. You have to decide how this tradeoff works best for you.

  • The hashing function for most implementations should work better with a capacity that is a prime number. However, the 1.4 HashMap implementation (but not the Hashtable implementation) uses a different implementation that requires a power-of-two capacity so that it can use bit shifting and masking instead of the % operator. If you specify a non-power-of-two capacity, the HashMap will automatically find the nearest power-of-two value higher than the specified capacity. For other hash maps, always use a prime (preferably) or odd number capacity. A useful prime number to remember is 89. The sequence of numbers generated by successively multiplying by two and adding one includes several primes when the sequence starts with 89. But note also that speedups from prime number capacities are small at best.

  • Access to the Map requires asking the key for its hashCode( ) and also testing that the key equals( ) the key you are retrieving. You can create a specialized Map class that bypasses these calls if appropriate. Alternatively, you can use specialized key classes that have very fast method calls for these two methods. Note, for example, that Java String objects have hashCode( ) methods that iterate and execute arithmetic over a number of characters to determine the value, and the String.equals( ) method checks that every character is identical for the two strings being compared. Considering that strings are used as the most common keys in Hashtables, I'm often surprised to find that I don't have a performance problem with them, even for largish tables. From JDK 1.3, Strings cache their hash code in an instance variable, making them faster and more suited as Map keys.

  • If you are building a specialized Hashtable, you can map objects to array elements to preallocate HashtableEntry objects and speed up access as well. The technique is illustrated in the "Search Trees" section later in this chapter.

  • The hash function maps the entries to table elements. The fewer entries that map to the same internal table entry, the more efficient the map. There are techniques for creating more efficient hash maps, for instance, those discussed in my article "Optimizing Hash Functions For a Perfect Map" (see http://www.onjava.com/pub/a/onjava/2001/01/25/hash_functions.html).

Here is a specialized class to use for keys in a Hashtable. This example assumes that I am using String keys, but all my String objects are nonequal, and I can reference keys by identity. I use a utility class, tuning.dict.Dict, which holds a large array of nonequal words taken from an English dictionary. I compare the access times against all the keys using two different Hashtables, one using the plain String objects as keys, the other using my own StringWrapper objects as keys. The StringWrapper objects cache the hash value of the string and assume that equality comparison is the same as identity comparison. These are the fastest possible equals( ) and hashCode( ) methods. The access speedups are illustrated in the following table of measurements (times normalized to the JDK 1.2 case):

 

1.1.8

1.2.2

1.3.1_02

1.3.1_02-server

1.4.0

1.4.0-server

1.4.0-Xint

String keys[7]

112%

100%

40.8%

65.4%

41.2%

60.1%

181%

String-wrapped keys

87.4%

71.1%

40.3%

57.6%

38.4%

58.6%

159%

[7] The limited speedup from JDK 1.3 reflects the improved performance of Strings having their hash code cached in the String instance.

If you create a hash-table implementation specialized for the StringWrapper class, you avoid calling the hashCode( ) and equals( ) methods completely. Instead, the specialized hash table can access the hash-instance variable directly and use identity comparison of the elements. The speedup is considerably larger, and for specialized purposes, this is the route to follow:

package tuning.hash;
   
import java.util.Hashtable;
import tuning.dict.Dict;
   
public class SpecialKeyClass
{
   
  public static void main(String[  ] args)
  {
    //Initialize the dictionary
    try{Dict.initialize(true);}catch(Exception e){  }
    System.out.println("Started Test");
   
    //Build the two hash tables. Keep references to the
    //StringWrapper objects for later use as accessors.
    Hashtable h1 = new Hashtable(  );
    Hashtable h2 = new Hashtable(  );
    StringWrapper[  ] dict = new StringWrapper[Dict.DICT.length];
    for (int i = 0; i < Dict.DICT.length; i++)
    {
      h1.put(Dict.DICT[i], Boolean.TRUE);
      h2.put(dict[i] = new StringWrapper(Dict.DICT[i]), Boolean.TRUE);
    }
    System.out.println("Finished building");
   
    Object o;
   
    //Time the access for normal String keys
    long time1 = System.currentTimeMillis(  );
    for (int i = 0; i < Dict.DICT.length; i++)
      o = h1.get(Dict.DICT[i]);
    time1 = System.currentTimeMillis(  ) - time1;
    System.out.println("Time1 = " + time1);
   
    //Time the access for StringWrapper keys
    long time2 = System.currentTimeMillis(  );
    for (int i = 0; i < Dict.DICT.length; i++)
      o = h2.get(dict[i]);
    time2 = System.currentTimeMillis(  ) - time2;
    System.out.println("Time2 = " + time2);
   
  }
}
   
final class StringWrapper
{
  //cached hash code
  private int hash;
  private String string;
  public StringWrapper(String str)
  {
    string = str;
    hash = str.hashCode(  );
  }
  public final int hashCode(  )
  {
    return hash;
  }
  public final boolean equals(Object o)
  {
    //The fastest possible equality check
    return o =  = this;
   
/*
    //This would be the more generic equality check if we allowed
    //access of the same String value from different StringWrapper objects.
    //This is still faster than the plain Strings as keys.
    if(o instanceof StringWrapper)
    {
       StringWrapper s = (StringWrapper) o;
       return s.hash =  = hash && string.equals(s.string);
    }
    else
      return false;
*/
  }
}