The problem
considered here concerns a large number of string keys that need to
be accessed by full or partial match. Each string is unique, so the
full-match access can easily be handled by a standard hash-table
structure (e.g., `java.util.HashMap`). The
partial-match access needs to collect all objects that have string
keys starting with a particular substring.

Consider this hash table consisting of keys and values:

"hello" 1 "bye" 2 "hi" 3

The full match for key "`hi`" retrieves
`3`, and the partial match against strings starting
with "`h`" retrieves the collection
`{1,3}`. Using a hash-table structure for the
partial-match access is expensive because it requires that all keys
be iterated over, and then each key matching the corresponding object
needs to be collated.

Of course, I am considering here a large collection of strings. Alternatives are not usually necessary for a few (or even a few thousand) strings. But for large collections, performance-tuning techniques become necessary.

To tune, look for data structures that quickly match any partial
string. The task is somewhat simpler than the most generic version of
this type of problem because you need to match only the first few
consecutive characters. This means that some sort of tree structure
is probably ideal. Of the structures available from the JDK,
`TreeMap` looks like it can provide exactly the
required functionality; it gives a minimal baseline and, if the
performance is adequate, there is no more tuning to do. But
`TreeMap` is 5 to 10 times slower than
`HashMap` for access and update. The target is to
obtain `HashMap` access speed for single-key access.

Don't get carried away searching for the perfect data structure. Thinking laterally, you can consider other possibilities. If you have the strings in a sorted collection, you can apply a binary search to find the index of the string that is greater than or less than the partial string, and then obtain all the strings (and hence corresponding objects) in between.

More specifically, you can construct a sorted array of keys from the
hash table. Then, if you want to find all strings starting with
"`h`", you can run a binary search for the strings
"`h`" and "`h\uFFFF`". This gives
all the indexes of the band for all the keys that start with
"`h`". Note that a binary search can return the
index where the string would be even if it is not actually in the
array. (The correct solution actually goes from
"`h`" inclusive to "`i`" exclusive,
but this solution will do for strings that don't
include character `\uFFFF`.)

Having parallel collections can lead to all sorts of problems in making sure both collections contain the same elements. Solutions that involve parallel collections should hide all accesses and updates to the parallel collections through a separate object to ensure that all accesses and updates are consistent. The solution here is suitable mainly when the collections are updated infrequently, e.g., when they are built once or periodically and read from often. Here is a class implementing this solution:

package tuning.struct; import java.util.Hashtable; import java.util.Enumeration; public class PartialSearcher { Hashtable hash; String[ ] sortedArray; public static void main(String args[ ]) { //Populate a Hashtable with ten strings Hashtable h = new Hashtable( ); h.put("hello", new Integer(1)); h.put("hell", new Integer(2)); h.put("alpha", new Integer(3)); h.put("bye", new Integer(4)); h.put("hello2", new Integer(5)); h.put("solly", new Integer(6)); h.put("sally", new Integer(7)); h.put("silly", new Integer(8)); h.put("zorro", new Integer(9)); h.put("hi", new Integer(10)); //Create the searching object PartialSearcher p = new PartialSearcher(h); //Match against all string keys given by //the first command line argument Object[ ] objs = p.match(args[0]); //And print the matches out for(int i = 0; i<objs.length; i++) System.out.println(objs[i]); } public PartialSearcher(Hashtable h) { hash = h; createSortedArray( ); } public Object[ ] match(String s) { //find the start and end positions of strings that match the key int startIdx = binarySearch(sortedArray, s, 0, sortedArray.length-1); int endIdx = binarySearch(sortedArray, s+ '\uFFFF', 0, sortedArray.length-1); //and return an array of the matched keys Object[ ] objs = new Object[endIdx-startIdx]; for (int i = startIdx ; i < endIdx; i++) objs[i-startIdx] = sortedArray[i]; return objs; } public void createSortedArray( ) { //Create a sorted array of the keys of the hash table sortedArray = new String[hash.size( )]; Enumeration e = hash.keys( ); for (int i = 0; e.hasMoreElements( ); i++) sortedArray[i] = (String) e.nextElement( ); quicksort(sortedArray, 0, sortedArray.length-1); } /** * Semi-standard binary search returning index of match location or * where the location would match if it is not present. */ public static int binarySearch(String[ ] arr, String elem, int fromIndex, int toIndex) { int mid,cmp; while (fromIndex <= toIndex) { mid =(fromIndex + toIndex)/2; if ( (cmp = arr[mid].compareTo(elem)) < 0) fromIndex = mid + 1; else if (cmp > 0) toIndex = mid - 1; else return mid; } return fromIndex; } /** * Standard quicksort */ public void quicksort(String[ ] arr, int lo, int hi) { if( lo >= hi ) return; int mid = ( lo + hi ) / 2; String tmp; String middle = arr[ mid ]; if( arr[ lo ].compareTo(middle) > 0 ) { arr[ mid ] = arr[ lo ]; arr[ lo ] = middle; middle = arr[ mid ]; } if( middle.compareTo(arr[ hi ]) > 0) { arr[ mid ] = arr[ hi ]; arr[ hi ] = middle; middle = arr[ mid ]; if( arr[ lo ].compareTo(middle) > 0) { arr[ mid ] = arr[ lo ]; arr[ lo ] = middle; middle = arr[ mid ]; } } int left = lo + 1; int right = hi - 1; if( left >= right ) return; for( ;; ) { while( arr[ right ].compareTo(middle ) > 0) { right--; } while( left < right && arr[ left ].compareTo(middle ) <= 0) { left++; } if( left < right ) { tmp = arr[ left ]; arr[ left ] = arr[ right ]; arr[ right ] = tmp; right--; } else { break; } } quicksort(arr, lo, left); quicksort(arr, left + 1, hi); } }

Note that this solution has a wider application than only string keys. Any type of object can be used as a key as long as you can create a methodology to compare the order of the keys. This is a reasonable solution for several types of indexing.