3.8 Better Optimizing Compilers

Java code compilers that specifically target performance optimizations are increasingly available. (I maintain a list at http://www.JavaPerformanceTuning.com/resources.shtml. A list is also included in Chapter 19.) Of course, all compilers try to optimize code, but some are better than others. Some companies put a great deal of effort into making their compiler produce the tightest, fastest code, while others tend to be distracted by other aspects of the Java environment and put less effort into the compile phase.

There are also some experimental compilers around. For example, the JAVAR compiler (http://www.extreme.indiana.edu/hpjava/) is a prototype compiler that automatically parallelizes parts of a Java application to improve performance.

It is possible to write preprocessors to automatically achieve many of the optimizations you can get with optimizing compilers; in fact, you can think of an optimizing compiler as a preprocessor together with a basic compiler (though in many cases it is better described as a postprocessor and recompiler). However, writing such a preprocessor is a significant task. Even if you ignore the Java code parsing or bytecode parsing required,[6] any one preprocessor optimization can take months to create and verify. Getting close to the full set of optimizations listed in the following sections could take years of development. Fortunately, it is not necessary for you to make that effort because optimizing-compiler vendors are making the effort for you.

[6] Such parsing is a one-off task that can then be applied to any optimization. There are several free packages available for parsing class files, including CFParse from the IBM alphaWorks site, http://www.alphaworks.ibm.com/tech/cfparse.

3.8.1 What Optimizing Compilers Cannot Do

Optimizing compilers cannot change your code to use a better algorithm. If you are using an inefficient search routine, there may be a vastly better search algorithm that would give an order-of-magnitude speedup. But the optimizing compiler only tries to speed up the algorithm you are using (probably a small incremental speedup). It is still important to profile applications to identify bottlenecks even if you intend to use an optimizing compiler.
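To make the distinction concrete, here is a minimal sketch (the method names are invented for illustration). No compiler optimization of the linear scan comes close to the gain from switching to a binary search when the data is sorted:

//O(n): an optimizing compiler can only shave cycles off this loop
public static int linearSearch(int[] sorted, int key) {
  for (int i = 0; i < sorted.length; i++)
    if (sorted[i] == key)
      return i;
  return -1;
}

//O(log n): the order-of-magnitude improvement comes from the algorithm,
//not from the compiler
public static int binarySearch(int[] sorted, int key) {
  int lo = 0, hi = sorted.length - 1;
  while (lo <= hi) {
    int mid = (lo + hi) >> 1;  //midpoint (safe for any realistic array size)
    if (sorted[mid] == key)
      return mid;
    else if (sorted[mid] < key)
      lo = mid + 1;
    else
      hi = mid - 1;
  }
  return -1;
}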

It is important to start using an optimizing compiler from the early stages of development so that you can tailor your code to its restrictions. Integrating an optimizing compiler at a late stage of development can mean restructuring core routines and many disparate method calls, and may even require some redesign to work around the compiler's inability to handle reflection and runtime class resolution correctly. Optimizing compilers have difficulty dealing with classes that cannot be identified at compile time (e.g., building a string at runtime and loading a class of that name). Basically, using Class.forName() is not (and cannot be) handled in any complete way, though several compilers try to manage as best they can (a sketch of the problematic construct appears below).

In short, managers with projects at a late stage of development are often reluctant to make extensive changes to either the development environment or the code. While code tuning can be targeted at bottlenecks and so normally affects only small sections of code, integrating an optimizing compiler can affect the entire project. If that integration throws up too many problems, most project managers decide that the potential risks outweigh the possible benefits and take the safe route of carrying on without the optimizing compiler.
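Here is the kind of construct that defeats compile-time analysis (the package, naming scheme, and method are invented purely for illustration):

//Sketch only: because the class name is assembled at runtime, no
//compile-time analysis can determine which class this code will load,
//so unused-class elimination and similar optimizations become unsafe here.
public static Object makeHandler(String handlerType) throws Exception {
  String className = "com.mycompany.handlers." + handlerType + "Handler";
  return Class.forName(className).newInstance();
}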

3.8.2 What Optimizing Compilers Can Do

Compilers can apply many "classic" optimizations and a host of newer optimizations that apply specifically to object-oriented programs and languages with virtual machines. I list many optimizations in the following sections.

You can apply most classic compiler-optimization techniques by hand directly to the source. But usually you should not, as it makes the code more complicated to read and maintain. Individually, each of these optimizations improves performance only by small amounts. Collectively (as applied by a compiler across all the code), they can make a significant contribution to improving performance. This is important to remember: as you look at each individual optimization, in many cases you may think, "Well, that isn't going to make much difference." This is correct. The power of optimizing compilers comes in automatically applying many small optimizations that would be annoying or confusing to apply by hand. The combination of all those small optimizations can add up to a big speedup.

Optimizing-compiler vendors claim to see significant speedups: up to 50% for many applications. Most applications in serious need of optimization are looking for speedups even greater than this, but don't ignore the optimizing compiler for that reason: it may be doubling the speed of your application for a relatively cheap investment. As long as you do not need to restructure much code to take advantage of them, optimizing compilers can give you "the biggest bang for your buck" after JIT VMs in terms of performance improvements.

The next sections list many of the well-known optimizations these compilers may apply. This list can help you when selecting optimizing compilers or applying some of these optimizations by hand.

3.8.2.1 Remove unused methods and classes

When all application classes are known at compile time, an optimizing compiler can analyze the full runtime code-path tree, identifying all classes that can be used and all methods that can be called. Most method calls in Java can invoke only one of a limited set of methods, and by analyzing the runtime paths, the compiler can often eliminate all but one of the possibilities. The compiler can then remove unused methods and classes, including superclass methods that have been overridden in a subclass and are never called in the superclass. The optimization makes for smaller download requirements for programs sent across a network and, more usefully, reduces the impact of method lookup at runtime by eliminating unused alternative methods.
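A minimal sketch of the idea (the classes are invented for illustration): if whole-program analysis shows that no Square is ever created and draw() is never invoked on a plain Shape, both the Square class and the superclass version of draw() can be removed:

class Shape {
  void draw() { }          //overridden below and never called here: removable
}
class Circle extends Shape {
  void draw() { /*...*/ }  //the only version ever invoked
}
class Square extends Shape {
  void draw() { /*...*/ }  //never instantiated anywhere: whole class removable
}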

3.8.2.2 Increase statically bound calls

An optimizing compiler can determine at compile time whether particular method invocations are necessarily polymorphic and so must have the actual method target determined at runtime, or whether the target for a particular method call can be identified at compile time. Many method calls that apparently need to have the target decided at runtime can, in fact, be uniquely identified (see the previous section). Once identified, the method invocation can be compiled as a static invocation, which is faster than a dynamic lookup. Static methods are statically bound in Java. The following example produces "in superclass" if method1() and method2() are static, but "in subclass" if method1() and method2() are not static:

public class Superclass {
  public static void main(String[] args) {(new Subclass()).method1();}
  public static void method1() {method2();}
  public static void method2() {System.out.println("in superclass");}
}
class Subclass extends Superclass {
  public static void method2() {System.out.println("in subclass");}
}
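You can push calls toward static binding by hand where it is safe to do so: private, static, and final methods cannot be overridden, so their targets are fixed at compile time. A minimal sketch (the class is invented for illustration):

class Point {
  private double x, y;
  final double getX() { return x; }  //final: the target is known at compile
  final double getY() { return y; }  //time, so no dynamic lookup is required
}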
3.8.2.3 Cut dead code and unnecessary instructions, including checks for null

Section 14.19 of the Java specification requires compilers to carry out flow analysis on the code to determine the reachability of any section of code. The only valid unreachable code is the consequence of an if statement (see Section 3.9.1.4). Invalid unreachable code must be flagged as a compile error, but the valid unreachable code from an if statement is not an error and can be eliminated. The if statement's test can itself be eliminated if its boolean result is conclusively identified at compile time. In fact, this is a standard capability of almost all current compilers.

This flow analysis can be extended to determine whether other sections and code branches that are syntactically valid are actually semantically unreachable. A typical example is testing for null. Some null tests can be eliminated by establishing that the variable has either definitely been assigned to or definitely never been assigned to before the test is reached. Similarly, flow-of-control analysis can establish that some generated bytecode instructions are unnecessary, and these too can be eliminated.
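For example, a test like the following is provably redundant, and both the test and its dead branch can be cut (a minimal sketch):

StringBuffer buf = new StringBuffer("data");
//buf was definitely assigned just above, so this test is always true and
//can be eliminated along with the unreachable no-match path
if (buf != null)
  System.out.println(buf);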

3.8.2.4 Use computationally cheaper alternatives (strength reduction)

An optimizing compiler should determine if there is a computationally cheaper alternative to a set of instructions and replace those slower instructions with the faster alternative.

The classic version of this technique, termed "strength reduction," replaces an operation with an equivalent but faster operation. Consider the following lines of code:

x = x + 5;
y = x/2;
z = x * 4;

These lines can be replaced by faster operations without altering the meaning of any statements (for the division, this assumes x is non-negative: a right shift rounds differently from integer division for negative odd values):

x += 5;     //assignment in place is faster
y = x >> 1; //a right shift by one place is equivalent to dividing by 2
z = x << 2; //a left shift by two places is equivalent to multiplying by 4

These examples are the most common cases of strength reduction. All the shorthand arithmetic operators (++, --, +=, -=, *=, /=, |=, &=) are computationally faster than the longer versions and should be used (by the coder) or replaced (by the compiler) where appropriate.[7]

[7] One of the technical reviewers for this book, Ethan Henry, pointed out to me that there is no actual guarantee that these strength reductions are more efficient in Java. This is true. However, they seem to work for at least some VMs. In addition, compilers producing native code (including JIT compilers) should produce faster code, as these techniques do work at the machine-code level.

3.8.2.5 Replace runtime computations with compiled results

If bytecodes were generated directly, some code would be executed at runtime even though its result could have been computed during compilation. An optimizing compiler can identify such code, compute the result at compile time, and substitute that result for the code.

This technique is applied by most compilers for the simple case of literal folding (see Sections 3.9.1.1 and 3.9.1.2), and it can be extended to other structures by adding some semantic input to the compiler. A simple example is:

String S_NINETY = "90";
int I_NINETY = Integer.parseInt(S_NINETY);

Although it is unlikely that anyone would do exactly this, similar kinds of initializations are used. An optimizing compiler that understood what Integer.parseInt() did could calculate the resulting int value and insert that result directly into the compiled file, thus avoiding the runtime calculation.
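The compiled class would then behave as if the source had read:

String S_NINETY = "90";
int I_NINETY = 90;   //the runtime parse is replaced by its compile-time result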

3.8.2.6 Remove unused fields

Analysis of the application can identify fields of objects that are never used, and these fields can then be removed. This makes the runtime take less memory and improves the speed of both creating and garbage-collecting these objects. The type of analysis described in Section 3.8.2.1 improves the identification of unused fields.
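A minimal sketch (the class and fields are invented for illustration):

class Session {
  int id;            //read elsewhere in the application: retained
  String debugInfo;  //assigned but never read anywhere: removable, making
                     //Session objects cheaper to create and garbage collect
}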

3.8.2.7 Remove unnecessary parts of compiled files

Removing some unnecessary parts of compiled files is standard with most optimizing compilers. This option removes line number tables and local variable tables. The Java .class file structure allows extra information to be inserted, and some optimizing compilers make an effort to remove everything that is not necessary for runtime execution. This can be useful when it is important to minimize the size of the .class files. Frequently, compilers with this capability can remove unnecessary parts of files that are already compiled, e.g., from third-party .class files you do not have the source for.
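The standard javac compiler offers a limited form of this capability: its -g:none option omits all debugging information, including the line number and local variable tables (the filename here is a placeholder):

javac -g:none MyClass.java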

3.8.2.8 Reduce necessary parts of compiled files

Some optimizing compilers can reduce the necessary parts of compiled files. For example, the .class file includes a pool of constants, and an optimizing compiler can minimize the size of the constant pool by combining and reducing entries.

3.8.2.9 Alter access control to speed up invocations

At least one optimizing compiler (the DashO optimizer by PreEmptive) provides the option to alter the access control of methods. The rationale is that any non-public method carries an access-control cost: the runtime system must at some point verify that the caller of the method is permitted to call it. Public methods require no such runtime checks, so any non-public method must have some overhead compared to an identical method declared as public.

The result is that the compiler supports normal compilation (so that any incorrect accesses are caught at the compilation stage), and the subsequent compiled class can have all its methods changed to public. This is, of course, a security risk.

3.8.2.10 Inline calls

Every optimizing compiler supports inlining. However, the degree of inlining supported can vary enormously, as different compilers are more or less aggressive about inlining (see the extended discussion in Section 3.9.2). Inlining is the technique in which a method call is directly replaced with the code for that method; for example, the code as written may be:

private int method1() { return method2(); }
private int method2() { return 5; }

With inlining operating to optimize method1(), this code is compiled into the equivalent of:

//the call to method2() is replaced with the code in method2()
private int method1() { return 5; }
private int method2() { return 5; }
3.8.2.11 Remove dynamic type checks

Every compiler removes dynamic type checks when the compiler can establish they are unnecessary. The JDK compiler removes casts that are obviously unnecessary. For example, consider the following two lines of code:

Integer i = new Integer(3);
Integer j = (Integer) i;

The JDK compiler removes the obviously unnecessary cast here, and the code gets compiled as if the source were:

Integer i = new Integer(3);
Integer j = i;

This is very basic. A more sophisticated optimizing compiler can analyze a program far more intensively and eliminate further casting operations that it can ascertain will always succeed. The instanceof operation is similar to casting (the test applied by instanceof differs from a class cast test in that a cast on null always succeeds, but null instanceof SomeClass always returns false), and an optimizing compiler can also remove some tests involving instanceof.
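The difference described in the parenthetical comment can be seen directly (SomeClass stands in for any reference type):

SomeClass obj = null;
SomeClass c = (SomeClass) obj;          //succeeds: a cast on null always succeeds
boolean b = (obj instanceof SomeClass); //false: instanceof on null is always false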

3.8.2.12 Unroll loops

Loop unrolling makes the loop body larger by explicitly repeating the body statements, while changing the amount by which the loop variable increases or decreases. This reduces the number of loop tests and iterations needed to complete the loop. Loop unrolling is covered extensively in Chapter 7.
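A hand-unrolled sketch (unrolling by a factor of four, and assuming for brevity that arr.length is a multiple of 4; a real unrolling must also handle the remainder):

int sum = 0;
for (int i = 0; i < arr.length; i += 4) {
  //one loop test and one increment now cover four array accesses
  sum += arr[i];
  sum += arr[i + 1];
  sum += arr[i + 2];
  sum += arr[i + 3];
}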

3.8.2.13 Code motion

Code motion moves out of loops those calculations that need to be performed only once. Consider the code example:

for (int i = 0; i < z.length; i++)
  z[i] = x * Math.abs(y);

Every element of the array is assigned the same value, but the assignment expression recalculates that value on each iteration. Applying code motion, this code is automatically converted to:

int t1 = x * Math.abs(y);
for (int i = 0; i < z.length; i++)
  z[i] = t1;

Code motion is also useful in eliminating or reducing redundant tests (though compilers are usually less effective at this). Consider the following method:

public String aMethod(String first, String passed)
{
  StringBuffer copy = new StringBuffer(passed);
  if (first == null || first.length() == 0)
   return passed;
  else
  {
    ...//some manipulation of the string buffer to do something
    return copy.toString();
  }
}

This method creates an unnecessary new object if the first string is null or zero-length. It should be recoded, or the bytecodes generated, so that the new object creation is moved into the else clause:

public String aMethod(String first, String passed)
{
  if (first == null || first.length() == 0)
   return passed;
  else
  {
    StringBuffer copy = new StringBuffer(passed);
    ...//some manipulation of the string buffer to do something
    return copy.toString();
  }
}

It would be nice for a compiler to apply this optimization automatically, but it is difficult: the compiler needs to know that creating a new StringBuffer has no side effects, so that the creation can reasonably be moved to a different part of the code. In practice, this type of optimization probably needs to be applied manually.

Both this technique and the next one are good coding practices.

3.8.2.14 Eliminate common subexpressions

Eliminating common subexpressions is similar to code motion. In this case, though, the compiler identifies an expression that is common to more than one statement and does not need to be calculated more than once. The following example uses the same calculation twice to map two pairs of variables:

z1 = x * Math.abs(y) + x;
z2 = x * Math.abs(y) + y;

After a compiler has analyzed this code to eliminate the common subexpression, the code becomes:

int t1 = x * Math.abs(y);
z1 = t1 + x;
z2 = t1 + y;
3.8.2.15 Eliminate unnecessary assignments

An optimizing compiler should eliminate any unnecessary assignments. The following example is very simplistic:

int x = 1;
x = 2;

This should obviously be converted into one statement:

int x = 2;

Although you won't often see this type of example, it is not unusual for chained constructors to repeatedly assign to an instance variable in essentially the same way. An optimizing compiler should eliminate all extra unnecessary assignments.
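A minimal sketch of the chained-constructor case (the class is invented for illustration):

class Connection {
  int timeout = 0;   //initializer assigns the default value, then...
  Connection() {
    this(30);        //...the chained constructor assigns timeout again
  }
  Connection(int t) {
    timeout = t;     //only this assignment is actually needed
  }
}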

3.8.2.16 Rename classes, fields, and methods

Some compilers rename classes, fields, and methods for various reasons, such as to obfuscate the code (making it difficult to understand if decompiled). Renaming (especially to one-character names[8]) can make the compiled classes much smaller, significantly reducing classloading times and network download times.

[8] For example, the DashO optimizer renames everything possible to one-character names.

3.8.2.17 Reorder or change bytecodes

An optimizing compiler can reorder or change bytecode instructions to make methods faster. Normally, this reduces the number of instructions, but sometimes making an activity faster requires increasing the number of instructions. An example is where a switch statement is used with a list of unordered, nearly consecutive values for case statements. An optimizing compiler can reorder the case statements so that the values are in order, insert extra cases to make the values fully consecutive, and then use a faster switch bytecode to execute the switch statement. The optimization for switch statements is extensively covered in Chapter 7.
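A minimal sketch (the method names are invented; whether a given compiler emits the faster tableswitch bytecode here depends on its own density heuristics):

//before: unordered, nearly consecutive case values
switch (code) {
  case 9: handleNine();  break;
  case 6: handleSix();   break;
  case 8: handleEight(); break;
}
//after reordering and inserting a filler for the missing value 7, the
//values 6-9 are fully consecutive and a tableswitch can be used:
switch (code) {
  case 6: handleSix();   break;
  case 7: break;         //inserted filler: behaves like the no-match case
  case 8: handleEight(); break;
  case 9: handleNine();  break;
}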

3.8.2.18 Generate information to help a VM

The Java bytecode specification provides support for optional extra information to be included with class files. This information can be VM-specific: any VM that does not understand the codes must ignore them. Consequently, it is possible that a particular compiler may be optimized (in the future) to generate extra information that allows particular VMs to run code faster. For example, it would be possible for the compiler to add extra information that tells the VM the optimal way in which a JIT should compile the code, thus removing some of the JIT workload (and overhead).

A more extreme example might be where a compiler generates optimized native code for several CPUs in addition to the bytecode for methods in the class file. This would allow a VM to execute the native code immediately if it were running on one of the supported CPUs. Unfortunately, this particular example would cause a security loophole, as there would be no guarantee to the VM that the natively compiled method was the equivalent of the bytecode-generated one.

3.8.3 Managing Compilers

All the optimizations previously listed are optimizations that compilers should automatically handle. Unfortunately, you are not guaranteed that any particular compiler actually applies any single optimization. The only way I have found to be certain about the optimizations a particular compiler can make is to compile code with lines such as those shown previously, then decompile the bytecodes to see what comes out. There are several decompilers available on the Net: a web search for "java+decompile" should fetch a few. My personal favorite at the time of this writing is jad by Pavel Kouznetsov, which currently resides at http://kpdus.tripod.com/jad.html.

Several Java compilers are targeted at optimizing bytecode, and several other compilers (including all mainstream ones) have announced the intention to roll more optimizations into future versions. This highlights another point: ensure that you use the compiler's latest version. It may be that, for robustness, you do not want to go into production with the very latest compiler, as that will have been less tested than an older version, and your own code will have been more thoroughly tested on the classes generated by the older compiler. Nevertheless, you should at least test whether the latest compiler gives your application a boost (using whatever standard benchmarks you choose to assess your application's performance).

Finally, the compiler you select to generate bytecode may not be the same compiler you use while developing code. You may even use different compilers for different parts of development, or for different optimizations (though this is unlikely). In any case, you need to be sure the deployed application uses the bytecodes generated by the specific compilers you have chosen for the final version. In large projects, I have seen classes recompiled with the wrong compiler, and some of those classes have occasionally found their way into the deployed version of the application.

This alternate recompilation does not affect the correctness of the application, because all compilers should generate correct bytecodes, and so the application still passes all its regression test suites. But you can end up with the production application not running as fast as you expect, for reasons that are difficult to track down.