As we saw in the last section, objects are expensive to create. Where it is reasonable to reuse the same object, you should do so. You need to be aware of when not to call new. One fairly obvious situation is when you have already used an object and can discard it before you are about to create another object of the same class. You should look at the object and consider whether it is possible to reset the fields and then reuse the object, rather than throw it away and create another. This can be particularly important for objects that are constantly used and discarded: for example, in graphics processing, objects such as Rectangles, Points, Colors, and Fonts are used and discarded all the time. Recycling these types of objects can certainly improve performance.
Recycling can also apply to the internal elements of structures. For example, a linked list has nodes added to it as it grows, and as it shrinks, the nodes are discarded. Holding onto the discarded nodes is an obvious way to recycle these objects and reduce the cost of object creation.
Most container objects (e.g., Vectors, Hashtables) can be reused rather than created and thrown away. Of course, while you are not using the retained objects, you are holding onto more memory than if you simply discarded those objects, and this reduces the memory available to create other objects. You need to balance the need to have some free memory available against the need to improve performance by reusing objects. But generally, the space taken by retaining objects for later reuse is significant only for very large collections, and you should certainly know which ones these are in your application.
Note that when recycling container objects, you need to dereference all the elements previously in the container so that you don't prevent them from being garbage-collected. Because there is this extra overhead in recycling, it may not always be worth recycling containers. As usual for tuning, this technique is best applied to ameliorate an object-creation bottleneck that has already been identified.
Pooling objects has become slightly controversial recently. In their HotSpot FAQ, Sun engineering states that pooling should definitely no longer be used because it actually gives worse performance with the latest HotSpot engines. This is rather a sweeping statement. Object pools are still useful even with HotSpot, but presumably not as often as before. Certainly for shared resources pooling will always be an option if the overhead associated with creating a shareable resource is expensive. Various recent tests have shown that the efficiency of pooling objects compared to creating and disposing of objects is highly dependent on the size and complexity of the objects. And in some applications where deterministic behavior is important, especially J2ME applications, it is worth noting that object pools have deterministic access and reclamation costs for both CPU and memory, whereas object creation and garbage collection can be less deterministic.
A good strategy for reusing container objects is to use your own container classes, possibly wrapping other containers. This gives you a high degree of control over each collection object, and you can design them specifically for reuse. You can still use a pool manager to manage your requirements, even without reuse-designed classes. Reusing classes requires extra work when you've finished with a collection object, but the effort is worth it when reuse is possible. The code fragment here shows how you could use a vector pool manager:
//An instance of the vector pool manager. public static VectorPoolManager vectorPoolManager = new VectorPoolManager(25); ... public void someMethod( ) { //Get a new Vector. We only use the vector to do some stuff //within this method, and then we dump the vector (i.e., it //is not returned or assigned to a state variable) //so this is a perfect candidate for reusing Vectors. //Use a factory method instead of 'new Vector( )' Vector v = vectorPoolManager.getVector( ); ... //do vector manipulation stuff //and the extra work is that we have to explicitly tell the //pool manager that we have finished with the vector vectorPoolManager.returnVector(v); }
Note that nothing stops the application from retaining a handle on a vector after it has been returned to the pool, and obviously that could lead to a classic "inadvertent reuse of memory" bug . You need to ensure that handles to vectors are not held anywhere: these Vectors should be used only internally within an application, not externally in third-party classes where a handle may be retained. The following class manages a pool of Vectors:
package tuning.reuse; import java.util.Vector; public class VectorPoolManager { Vector[ ] pool; boolean[ ] inUse; public VectorPoolManager(int initialPoolSize) { pool = new Vector[initialPoolSize]; inUse = new boolean[initialPoolSize]; for (int i = pool.length-1; i>=0; i--) { pool[i] = new Vector( ); inUse[i] = false; } } public synchronized Vector getVector( ) { for (int i = inUse.length-1; i >= 0; i--) if (!inUse[i]) { inUse[i] = true; return pool[i]; } //If we got here, then all the Vectors are in use. We will //increase the number in our pool by 10 (arbitrary value for //illustration purposes). boolean[ ] old_inUse = inUse; inUse = new boolean[old_inUse.length+10]; System.arraycopy(old_inUse, 0, inUse, 0, old_inUse.length); Vector[ ] old_pool = pool; pool = new Vector[old_pool.length+10]; System.arraycopy(old_pool, 0, pool, 0, old_pool.length); for (int i = old_pool.length; i < pool.length; i++) { pool[i] = new Vector( ); inUse[i] = false; } //and allocate the last Vector inUse[pool.length-1] = true; return pool[pool.length-1]; } public synchronized void returnVector(Vector v) { for (int i = inUse.length-1; i >= 0; i--) if (pool[i] = = v) { inUse[i] = false; //Can use clear( ) for java.util.Collection objects //Note that setSize( ) nulls out all elements v.setSize(0); return; } throw new RuntimeException("Vector was not obtained from the pool: " + v); } }
Because you reset the Vector size to 0 when it is returned to the pool, all objects previously referenced from the vector are no longer referenced (the Vector.setSize( ) method nulls out all internal index entries beyond the new size to ensure no reference is retained). However, at the same time, you do not return any memory allocated to the Vector itself, because the Vector's current capacity is retained. A lazily initialized version of this class simply starts with zero items in the pool and sets the pool to grow by one or more each time.
(Many JDK collection classes, including java.util.Vector, have both a size and a capacity. The capacity is the number of elements the collection can hold before that collection needs to resize its internal memory to be larger. The size is the number of externally accessible elements the collection is actually holding. The capacity is always greater than or equal to the size. By holding spare capacity, elements can be added to collections without having to continually resize the underlying memory. This makes element addition faster and more efficient.)
The previous example of a pool manager can be used by multiple threads in a multithreaded application, although the getVector( ) and returnVector( ) methods first need to be defined as synchronized. This may be all you need to ensure that you reuse a set of objects in a multithreaded application. Sometimes, though, there are objects you need to use in a more complicated way. It may be that the objects are used in such a way that you can guarantee you need only one object per thread, but any one thread must consistently use the same object. Singletons (see Section 4.2.4) that maintain some state information are a prime example of this sort of object.
In this case, you might want to use a ThreadLocal object. ThreadLocals have accessors that return an object local to the current thread. ThreadLocal use is best illustrated using an example like this, which produces:
[This is thread 0, This is thread 0, This is thread 0] [This is thread 1, This is thread 1, This is thread 1] [This is thread 2, This is thread 2, This is thread 2] [This is thread 3, This is thread 3, This is thread 3] [This is thread 4, This is thread 4, This is thread 4]
Each thread uses the same access method to obtain a vector to add some elements. The vector obtained by each thread is always the same vector for that thread: the ThreadLocal object always returns the thread-specific vector. As the following code shows, each vector has the same string added to it repeatedly, showing that it is always obtaining the same thread-specific vector from the vector access method. (Note that ThreadLocals are only available from Java 2, but it is easy to create the equivalent functionality using a Hashtable: see the getVectorPriorToJDK12( ) method.)
package tuning.reuse; import java.util.*; public class ThreadedAccess implements Runnable { static int ThreadCount = 0; public void run( ) { //simple test just accesses the thread local vector, adds the //thread specific string to it, and sleeps for two seconds before //again accessing the thread local and printing out the value. String s = "This is thread " + ThreadCount++; Vector v = getVector( ); v.addElement(s); v = getVector( ); v.addElement(s); try{Thread.sleep(2000);}catch(Exception e){ } v = getVector( ); v.addElement(s); System.out.println(v); } public static void main(String[ ] args) { try { //Four threads to see the multithreaded nature at work for (int i = 0; i < 5; i++) { (new Thread(new ThreadedAccess( ))).start( ); try{Thread.sleep(200);}catch(Exception e){ } } } catch(Exception e){e.printStackTrace( );} } private static ThreadLocal vectors = new ThreadLocal( ); public static Vector getVector( ) { //Lazily initialized version. Get the thread local object Vector v = (Vector) vectors.get( ); if (v = = null) { //First time. So create a vector and set the ThreadLocal v = new Vector( ); vectors.set(v); } return v; } private static Hashtable hvectors = new Hashtable( ); /* This method is equivalent to the getVector( ) method, * but works prior to JDK 1.2 (as well as after). */ public static Vector getVectorPriorToJDK12( ) { //Lazily initialized version. Get the thread local object Vector v = (Vector) hvectors.get(Thread.currentThread( )); if (v = = null) { //First time. So create a vector and set the thread local v = new Vector( ); hvectors.put(Thread.currentThread( ), v); } return v; } }
Reuse also applies when a constant object is returned for information. For example, the preferredSize( ) of a customized widget returns a Dimension object that is normally one particular dimension. But to ensure that the stored unchanging Dimension value does not get altered, you need to return a copy of the stored Dimension. Otherwise, the calling method accesses the original Dimension object and can change the Dimension values, thus affecting the original Dimension object itself.
Java provides a final modifier to fields that allows you to provide fixed values for the Dimension fields. Unfortunately, you cannot redefine an already existing class, so Dimension cannot be redefined to have final fields. The best solution in this case is that a separate class, FixedDimension, be defined with final fields (this cannot be a subclass of Dimension, as the fields can't be redefined in the subclass). This extra class allows methods to return the same FixedDimension object if applicable, or a new FixedDimension is returned (as happens with Dimension) if the method requires different values to be returned for different states. Of course, it is too late now for java.awt to be changed in this way, but the principle remains.
Note that making a field final does not make an object unchangeable. It only disallows changes to the field:
public class FixedDimension { final int height; final int width; ... } //Both the following fields are defined as final public static final Dimension dim = new Dimension(3,4); public static final FixedDimension fixedDim = new FixedDimension(3,4); dim.width = 5; //reassignment allowed dim = new Dimension(3,5);//reassignment disallowed fixedDim.width = 5; //reassignment disallowed fixedDim = new FixedDimension(3,5); //reassignment disallowed
An alternative to defining preferredSize( ) to return a fixed object is to provide a method that accepts an object whose values will be set, e.g., preferredSize(Dimension). The caller can then pass in a Dimension object, which would have its values filled in by the preferredSize(Dimension) method. The calling method can then access the values in the Dimension object. This same Dimension object can be reused for multiple components. This design pattern is beginning to be used extensively within the JDK. Many methods developed with JDK 1.2 and onward accept a parameter that is filled in, rather than returning a copy of the master value of some object. If necessary, backward compatibility can be retained by adding this method as extra, rather than replacing an existing method:
public static final Dimension someSize = new Dimension(10,5); //original definition returns a new Dimension. public Dimension someSize( ) { Dimension dim = new Dimension(0,0); someSize(dim); return dim; } //New method which fills in the Dimension details in a passed parameter. public void someSize(Dimension dim) { dim.width = someSize.width; dim.width = someSize.height; }
Wherever possible, you should replace multiple objects with a single object (or just a few). For example, if you need only one VectorPoolManager object, it makes sense to provide a static variable somewhere that holds this. You can even enforce this by making the constructor private and holding the singleton in the class itself; e.g., change the definition of VectorPoolManager to:
public class VectorPoolManager { public static final VectorPoolManager SINGLETON = new VectorPoolManager(10); Vector[ ] pool; boolean[ ] inUse; //Make the constructor private to enforce that //no other objects can be created. private VectorPoolManager(int initialPoolSize) { ... }
An alternative implementation is to make everything static (all methods and both the instance variables in the VectorPoolManager class). This also ensures that only one pool manager can be used. My preference is to have a SINGLETON object for design reasons.[3]
[3] The VectorPoolManager is really an object with behavior and state. It is not just a related group of functions (which is what class static methods are equivalent to). My colleague Kirk Pepperdine insists that this choice is more than just a preference. He states that holding onto an object as opposed to using statics provides more flexibility should you need to alter the use of the VectorPoolManager or provide multiple pools. I agree.
This activity of replacing multiple copies of an object with just a few objects is often referred to as canonicalizing objects. The Booleans provide an existing example of objects that should have been canonicalized in the JDK. They were not, and no longer can be without breaking backward compatibility. For Booleans, only two objects need to exist, but by allowing a new Boolean object to be created (by providing public constructors), you lose canonicalization. The JDK should have enforced the existence of only two objects by keeping the constructors private. Note that canonical objects have another advantage in addition to reducing the number of objects created: they also allow comparison by identity. For example:
Boolean t1 = new Boolean(true); System.out.println(t1= =Boolean.TRUE); System.out.println(t1.equals(Boolean.TRUE));
produces the output:
false true
If Booleans had been canonicalized, all Boolean comparisons could be done by identity: comparison by identity is always faster than comparison by equality, because identity comparisons are simply pointer comparisons.[4]
[4] Deserializing Booleans would have required special handling to return the canonical Boolean. All canonicalized objects similarly require special handling to manage serialization. Java serialization supports the ability, when deserializing, to return specific objects in place of the object that is normally created by the default deserialization mechanism.
You are probably better off not canonicalizing all objects that could be canonicalized. For example, the Integer class can (theoretically) have its instances canonicalized, but you need a map of some sort, and it is more efficient to allow multiple instances, rather than to manage a potential pool of four billion objects. However, the situation is different for particular applications. If you use just a few Integer objects in some defined way, you may find you are repeatedly creating the Integer objects with values 1, 2, 3, etc., and also have to access the integerValue( ) to compare them. In this case, you can canonicalize a few integer objects, improving performance in several ways: eliminating the extra Integer creations and the garbage collections of these objects when they are discarded, and allowing comparison by identity. For example:
public class IntegerManager { public static final Integer ZERO = new Integer(0); public static final Integer ONE = new Integer(1); public static final Integer TWO = new Integer(2); public static final Integer THREE = new Integer(3); public static final Integer FOUR = new Integer(4); public static final Integer FIVE = new Integer(5); public static final Integer SIX = new Integer(6); public static final Integer SEVEN = new Integer(7); public static final Integer EIGHT = new Integer(8); public static final Integer NINE = new Integer(9); public static final Integer TEN = new Integer(10); } public class SomeClass { public void doSomething(Integer i) { //Assume that we are passed a canonicalized Integer if (i = = IntegerManager.ONE) xxx( ); else if(i = = IntegerManager.FIVE) yyy( ); else ... } ... }
There are various other frequently used objects throughout an application that should be canonicalized. A few that spring to mind are the empty string, empty arrays of various types, and some dates.
There can be some confusion about whether Strings are already canonicalized. There is no guarantee that they are, although the compiler can canonicalize Strings that are equal and are compiled in the same pass. The String.intern( ) method canonicalizes strings in an internal table. This is supposed to be, and usually is, the same table used by strings canonicalized at compile time, but in some earlier JDK versions (e.g., 1.0), it was not the same table. In any case, there is no particular reason to use the internal string table to canonicalize your strings unless you want to compare Strings by identity (see Section 5.5). Using your own table gives you more control and allows you to inspect the table when necessary. To see the difference between identity and equality comparisons for Strings, including the difference that String.intern( ) makes, you can run the following class:
public class Test { public static void main(String[ ] args) { System.out.println(args[0]); //see that we have the empty string //should be true System.out.println(args[0].equals("")); //should be false since they are not identical objects System.out.println(args[0] = = ""); //should be true unless there are two internal string tables System.out.println(args[0].intern( ) = = ""); } }
This Test class, when run with the command line:
% java Test ""
gives the output:
true false true
Canonicalizing objects is best for read-only objects and can be troublesome for objects that change. If you canonicalize a changeable object and then change its state, then all objects that have a reference to the canonicalized object are still pointing to that object, but with the object's new state. For example, suppose you canonicalize a special Date value. If that object has its date value changed, all objects pointing to that Date object now see a different date value. This result may be desired, but more often it is a bug.
If you want to canonicalize changeable objects, one technique to make it slightly safer is to wrap the object with another one, or use your own (sub)class.[5] You then control all accesses and updates. If the object is not supposed to be changed, you can throw an exception on any update method. Alternatively, if you want some objects to be canonicalized but with copy-on-write behavior, you can allow the updater to return a noncanonicalized copy of the canonical object.
[5] Beware that using a subclass may break the superclass semantics.
Note that it makes no sense to build a table of millions or even thousands of strings (or other objects) if the time taken to test for, access, and update objects in the table is longer than the time you are saving canonicalizing them.
One technique for maintaining collections of objects that can grow too large is the use of WeakReferences (from the java.lang.ref package in Java 2). If you need to maintain one or more pools of objects with a large number of objects being held, you may start coming up against memory limits of the VM. In this case, you should consider using WeakReference objects to hold onto your pool elements. Objects referred to by WeakReferences can be automatically garbage-collected if memory gets low enough (see Section 4.3 later in this chapter).
Java 2 comes with a java.util.WeakHashMap class that implements a hash table with keys held by weak references.
A WeakReference normally maintains references to elements in a table of canonicalized objects. If memory gets low, any of the objects referred to by the table and not referred to anywhere else in the application (except by other weak references) are garbage-collected . This does not affect the canonicalization because only those objects not referenced anywhere else are removed. The canonical object can be re-created when required, and this new instance is now the new canonical object: remember that no other references to the object exist, or the original could not have been garbage-collected.
For example, a table of canonical Integer objects can be maintained using WeakReferences. This example is not particularly useful: unlike the earlier example, in which Integer objects from 1 to 10 can be referenced directly with no overhead, thus providing a definite speedup for tests, the next example has overhead that probably overshadows any benefits of having canonical Integers. I present it only as a clear and simple example to illustrate the use of WeakReferences.
The example has two iterations: one sets an array of canonical Integer objects up to a value set by the command-line argument; a second loops through to access the first 10 canonical Integers. If the first loop is large enough (or the VM memory is constrained low enough), the garbage collector kicks in and starts reclaiming some of the Integer objects that are all being held by WeakReferences. The second loop then reaccesses the first 10 Integer objects. Earlier, I explicitly held onto five of these Integer objects (integers 3 to 7 inclusive) in variables so that they could not be garbage-collected and so that the second loop would reset only the five reclaimed Integers. When running this test with the VM constrained to 4 MB:
% java -Xmx4M tuning.reuse.Test 100000
you get the following output:
Resetting integer 0 Resetting integer 1 Resetting integer 2 Resetting integer 8 Resetting integer 9
The example is defined here. Note the overhead. Even if the reference has not been garbage-collected, you have to access the underlying object and cast it to the desired type:
package tuning.reuse; import java.util.*; import java.lang.ref.*; public class Test { public static void main(String[ ] args) { try { Integer ic = null; int REPEAT = args.length > 0 ? Integer.parseInt(args[0]) : 10000000; //Hang onto the Integer objects from 3 to 7 //so that they cannot be garbage collected Integer i3 = getCanonicalInteger(3); Integer i4 = getCanonicalInteger(4); Integer i5 = getCanonicalInteger(5); Integer i6 = getCanonicalInteger(6); Integer i7 = getCanonicalInteger(7); //Loop through getting canonical integers until there is not //enough space, and the garbage collector reclaims some. for (int i = 0; i < REPEAT; i++) ic = getCanonicalInteger(i); //Now just re-access the first 10 integers (0 to 9) and //the 0, 1, 2, 8, and 9 integers will need to be reset in //the access method since they will have been reclaimed for (int i = 0; i < 10; i++) ic = getCanonicalInteger(i); System.out.println(ic); } catch(Exception e){e.printStackTrace( );} } private static Vector canonicalIntegers = new Vector( ); public static Integer getCanonicalInteger(int i) { //First make sure our collection is big enough if (i >= canonicalIntegers.size( )) canonicalIntegers.setSize(i+1); //Now access the canonical value. //This element contains null if the the value has never been set //or a weak reference that may have been garbage collected WeakReference ref = (WeakReference) canonicalIntegers.elementAt(i); Integer canonical_i; if (ref = = null) { //never been set, so create and set it now canonical_i = new Integer(i); canonicalIntegers.setElementAt(new WeakReference(canonical_i), i); } else if( (canonical_i = (Integer) ref.get( )) = = null) { //has been set, but was garbage collected, so recreate and set it now //Include a print to see that we are resetting the Integer System.out.println("Resetting integer " + i); canonical_i = new Integer(i); canonicalIntegers.setElementAt(new WeakReference(canonical_i), i); } //else clause not needed, since the alternative is that the weak ref was //present and not garbage collected, so we now have our canonical integer return canonical_i; } }
Another canonicalization technique often used is replacing constant objects with integers. For example, rather than use the strings "female" and "male", you should use a constant defined in an interface:
public interface GENDER { public static final int FEMALE=1; public static final int MALE=2; }
Used consistently, this enumeration can provide both speed and memory advantages. The enumeration requires less memory than the equivalent strings and makes network transfers faster. Comparisons are faster too, as the identity comparison can be used instead of the equality comparison. For example, you can use:
this.gender = = FEMALE;
instead of:
this.gender.equals("female");