3.9 Sun's Compiler and Runtime Optimizations

As you can see from the previous sections, knowing how the compiler alters your code as it generates bytecodes is important for performance tuning. Some compiler optimizations can be canceled out if you write your code so that the compiler cannot apply its optimizations. In this section, I cover what you need to know to get the most out of the compilation stage if you are using the JDK compiler (javac).

3.9.1 Optimizations You Get for Free

Several optimizations occur at the compilation stage without your needing to specify any compilation options. These optimizations are not necessarily required by the Java specification; most have simply become standard compiler optimizations. The JDK compiler always applies them, and consequently almost every other compiler applies them as well. You should still determine exactly what your specific compiler optimizes as standard, either from its documentation or by decompiling example code.

3.9.1.1 Literal constants are folded

This optimization is a concrete implementation of the ideas discussed in Section 3.8.2.5 earlier. In this implementation, multiple literal constants[9] in an expression are "folded" by the compiler. For example, in the following statement:

[9] Literals are data items written directly in the source that can be identified as numbers, double-quoted strings, or characters: for example, 3, 44.5e-22F, 0xffee, 'h', "hello", etc.

int foo = 9*10;

the 9*10 is evaluated to 90 before compilation is completed. The result is as if the line read:

int foo = 90;

This optimization allows you to make your code more readable without having to worry about avoiding runtime overhead.
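For instance, in this hypothetical example, a timeout is written as self-documenting arithmetic, yet it compiles to exactly the same bytecode as if the folded literal had been written directly:

// Folded at compile time: compiles exactly as "int timeoutMillis = 90000;"
int timeoutMillis = 90 * 1000;   // 90 seconds, expressed in milliseconds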

3.9.1.2 String concatenation is sometimes folded

With the Java 2 compiler, string concatenations involving literal constants are folded. The line:

String foo = "hi Joe " + (9*10);

is compiled as if it read:

String foo = "hi Joe 90";

This optimization is not applied by JDK compilers prior to JDK 1.2; some non-Sun compilers apply it and some don't. The optimization applies where the statement can be resolved into literal constants concatenated with a literal string using the + concatenation operator. It also applies to the concatenation of two (or more) literal strings; in that case, every compiler folds the strings, since the Java specification requires it.
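As a minimal illustration of that last case, these two declarations compile to identical bytecode:

String a = "hello " + "world";   // folded: required by the Java specification
String b = "hello world";        // identical compiled result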

3.9.1.3 Constant fields are inlined

Primitive constant fields (those primitive data type fields defined with the final modifier) are inlined within a class and across classes, regardless of whether the classes are compiled in the same pass. For example, if class A has a public static final field, and class B has a reference to this field, the value from class A is inserted directly into class B, rather than a reference to the field in class A. Strictly speaking, this is not an optimization, as the Java specification requires constant fields to be inlined. Nevertheless, you can take advantage of it.

For instance, if class A is defined as:

public class A
{
  public static final int VALUE = 33;
}

and class B is defined as:

public class B
{
  static int VALUE2 = A.VALUE;
}

then when class B is compiled, whether or not in the same compilation pass as class A, it actually ends up as if it were defined as:

public class B
{
  static int VALUE2 = 33;
}

with no reference left to class A.

3.9.1.4 Dead code branches are eliminated

Another type of optimization automatically applied at the compilation stage is the elimination of code that can never be reached because of a test in an if statement that can be completely resolved at compile time. The discussion in Section 3.8.2.3 earlier is relevant here.

As an example, suppose classes A and B are defined (in separate files) as:

public class A
{
  public static final boolean DEBUG = false;
}
  
public class B
{
  static int foo(  )
  {
    if (A.DEBUG)
      System.out.println("In B.foo(  )");
    return 55;  
  }
}

Then when class B is compiled, whether or not in the same compilation pass as class A, it actually ends up as if it were defined as:

public class B
{
  static int foo(  )
  {
    return 55;  
  }
}

No reference to class A and no if statement remain. The consequence of this feature is that it allows conditional compilation: other classes can set a DEBUG constant in their own class in the same way, or they can use a shared constant value (as class B used A.DEBUG in the earlier definition).

A problem is frequently encountered with this kind of code. The constant's value is fixed into any referring class at the time that referring class is compiled, and it is not updated if the class defining the constant is recompiled. So you can have the situation where A is compiled with A.DEBUG set to false, then B is compiled and the compiler inlines A.DEBUG as false, possibly cutting dead code branches. If A is then recompiled with A.DEBUG set to true, this does not affect class B: the compiled class B still has the value false inlined, and any dead code branches stay eliminated until class B itself is recompiled. You should be aware of this possible problem if you compile your classes in more than one pass.

You should use this pattern for debug and trace statements and for assertion preconditions, postconditions, and invariants, as the sketch below illustrates. There is more detail on this technique in Section 6.1.4 in Chapter 6.
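The following sketch shows one way to apply the pattern; the Trace and Account classes are hypothetical and would be defined in separate files. Because Trace.DEBUG is a compile-time constant set to false, the compiler eliminates the entire guarded block:

public class Trace
{
  // Set to true and recompile all dependent classes to enable tracing
  public static final boolean DEBUG = false;
}

public class Account
{
  int balance;

  void withdraw(int amount)
  {
    if (Trace.DEBUG)
    {
      // Both the trace statement and the precondition check disappear
      // entirely from the compiled class while DEBUG is false
      System.out.println("withdraw " + amount);
      if (amount < 0)
        throw new IllegalArgumentException("negative amount");
    }
    balance -= amount;
  }
}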

3.9.2 Optimizations Performed When Using the -O Option

The only standard compile-time option that can improve performance with the JDK compiler is the -O option. Note that -O (for Optimize) is a common option for compilers, and further optimizing options for other compilers often take the form -O1, -O2, etc. Check your compiler's documentation to find out what other options are available and what they do. Some compilers allow you to make the tradeoff between optimizing the compiled code for speed and minimizing its size.

The standard -O option currently applies few optimizations in the Sun JDK (up to JDK 1.4). In future versions it may do more, though the trend has actually been for it to do less. Currently, the option makes the compiler eliminate the optional tables in the class files, such as the line number and local variable tables. This gives only a small performance improvement by making class files smaller and therefore faster to load. You should definitely use this option if your class files are sent across a network.

The main performance improvement from the -O option used to come from the compiler inlining methods. When using the -O option with javac prior to SDK 1.3, the compiler considered inlining methods defined with any of the following modifiers: private, static, or final. Some methods, such as those defined as synchronized, are never inlined. If a method can be inlined, the compiler decides whether or not to inline it according to its own unpublished considerations. These considerations seem mainly to concern the simplicity of the method: in JDK 1.2 the compiler inlined only fairly simple methods. For example, one-line methods with no side effects, such as those accessing or updating a variable, are invariably inlined, as are methods that return just a constant. Multiline methods are inlined if the compiler determines they are simple enough (e.g., a System.out.println("blah") followed by a return statement would get inlined). From SDK 1.3, the -O option no longer inlines methods at all. Instead, inlining is left to the HotSpot compiler, which can speculatively inline and is far more aggressive. The sidebar Why There Are Limits on Static Inlining discusses one of the reasons why optimizations such as inlining have been pushed back to the HotSpot compiler.
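To illustrate the kinds of methods involved, here is a sketch based on the behavior just described (not on any published rules): the first two methods are typical candidates for static inlining by the pre-1.3 javac -O, while the last is never inlined:

public class Counter
{
  private int count;

  // One-line accessor with no side effects: invariably inlined
  public final int getCount(  ) {return count;}

  // Returns just a constant: also inlined
  public static int initialValue(  ) {return 0;}

  // synchronized methods are never inlined
  public synchronized void increment(  ) {count++;}
}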

Why There Are Limits on Static Inlining

The compiler can inline only those methods that can be statically bound at compile time. To see why, consider the following example of class A and its subclass B, with two methods defined, foo1( ) and foo2( ). The foo2( ) method is overridden in the subclass:

class A {
  public int foo1(  ) {return foo2(  );}
  public int foo2(  ) {return 5;}
}
public class B extends A {
  public int foo2(  ) {return 10;}
}

If A.foo2(  ) is inlined into A.foo1(  ), then (new B(  )).foo1(  ) incorrectly returns 5 instead of 10, because A has been compiled as if it read:

class A {
  public int foo1(  ) {return 5;}
  public int foo2(  ) {return 5;}
}

Any method that can be overridden at runtime cannot be validly inlined (it is a potential bug if it is). The Java specification allows a method that was final at compile time to be non-final at runtime: you can compile a set of classes with one class having a final method, but later recompile that class without the method being final (thus allowing subclasses to override it), and the other classes must still run correctly. For this reason, not all final methods can be identified as statically bound at compile time, so not all final methods can be inlined. Some earlier compiler versions incorrectly inlined some final methods, sometimes causing serious bugs.

Choosing simple methods to inline does have a rationale behind it. The larger the method being inlined, the more the code gets bloated with copies of the same code inserted in many places. This has runtime costs in extra code being loaded and extra space taken by the runtime system. A JIT VM would also have the extra cost of compiling more code. At some point, there is a decrease in performance from inlining too much code. In addition, some methods have side effects that can make them quite difficult to inline correctly. All this also applies to runtime JIT compilation.

The static compiler applies its criteria for selecting methods to inline irrespective of whether the target method is in a bottleneck: this is a machine-gun strategy of many little optimizations in the hope that some of the inlined calls improve the bottlenecks. A performance tuner applying inlining works the other way around, first finding the bottlenecks, then selectively inlining methods inside them. This latter strategy can result in good speedups, especially in loop bottlenecks, because a loop can be sped up significantly by removing the overhead of a repeated method call. If the method to be inlined is complex, you can often factor out parts of the method so that those parts are executed outside the loop, gaining even more speedup, as the sketch below shows. HotSpot applies the latter rationale, inlining code only in bottlenecks.
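Here is a minimal sketch of hand-inlining in a loop bottleneck (the Item class and surrounding code are hypothetical). The second method replaces the repeated accessor call with direct field access and hoists the loop-invariant size(  ) call out of the loop test:

import java.util.Vector;

class Item
{
  int weight;
  int getWeight(  ) {return weight;}
}

public class InlineByHand
{
  // Before: size(  ), elementAt(  ), and getWeight(  ) are all called
  // on every iteration
  static int totalWeight(Vector items)
  {
    int total = 0;
    for (int i = 0; i < items.size(  ); i++)
      total += ((Item) items.elementAt(i)).getWeight(  );
    return total;
  }

  // After: getWeight(  ) inlined by hand to a field access, and
  // size(  ) factored out so it is evaluated only once
  static int totalWeightInlined(Vector items)
  {
    int total = 0;
    int size = items.size(  );
    for (int i = 0; i < size; i++)
      total += ((Item) items.elementAt(i)).weight;
    return total;
  }
}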

I have not found any public document that specifies the actual decision-making process determining whether a method is inlined, whether by static compilation or by the HotSpot compiler. The only reference given is to Section 13.4.21 of the Java language specification, which specifies only that binary compatibility with preexisting binaries must be maintained. It does specify that the package must be guaranteed to be kept together for the compiler to allow inlining across classes, and it states that the final keyword does not imply that a method can be inlined, since the runtime system may have a differently implemented method. The HotSpot documentation does state that simple methods are inlined, but again no real details are provided.

Prior to JDK 1.2, the -O option used with the Sun compiler did inline methods across classes, even if they were not compiled in the same compilation pass. This behavior led to bugs.[10] From JDK 1.2, the -O option no longer inlines methods across classes, even if they are compiled in the same compilation pass.

[10] Primarily methods that accessed private or protected variables were incorrectly inlined into other classes, leading to runtime authorization exceptions.

Unfortunately, there is no way to specify directly which methods should be inlined; you have to rely on the compiler's internal workings. Possibly in the future, some compiler vendors will provide a mechanism that supports specifying which methods to inline, along with other preprocessor options. In the meantime, you can implement a preprocessor (or use an existing one) if you require tighter control. Opportunities for inlining often occur inside bottlenecks (especially in loops), as discussed previously. Selective inlining by hand can give an order-of-magnitude speedup for some bottlenecks, and no speedup at all for others. Relying on HotSpot to detect these kinds of situations is another option.

The speedup obtained purely from inlining is usually only a small percentage: 5% is fairly common. Some static optimizing compilers are very aggressive about inlining code. They apply techniques such as analyzing the entire program to alter and eliminate method calls in order to identify methods that can be coerced into being statically bound. Then these identified methods are inlined as much as possible according to the compiler's analysis. This technique has been shown to give a 50% speedup to some applications.

3.9.3 Performance Effects From Runtime Options

Some runtime options can help your application to run faster. These include:

  • Options that allow the VM to have a bigger footprint (-Xmx/-mx is the main one, which allows a larger heap space; but see the comments at the end of this section).

  • -noverify, which eliminates the overhead of verifying classes at classload time (not available from 1.2).

Some options are detrimental to the application performance. These include:

  • The -Xrunhprof option, which makes applications run 10% to 1000% slower (-prof in 1.1).

  • Disabling the JIT compiler (done with -Djava.compiler=NONE in JDK 1.2 and beyond, and with the -nojit option in 1.1).

  • -debug, which runs a slower VM with debugging enabled.

  • The various alternative garbage-collection strategies, such as -Xincgc and -Xconcgc, each of which is aimed at minimizing one particular aspect of garbage collection (pause times, for these two), with the consequence that total GC is slower.

Some options can be both detrimental to performance and help make a faster application, depending on how they are used. These include:

  • -Xcomp, which forces HotSpot to compile 100% of the code with maximum optimization. This makes the first pass through the code very slow indeed, but subsequent passes should be faster.

  • -Xbatch, which forces HotSpot to compile methods in the foreground. Normally methods are compiled in the foreground if they compile quickly. Compilation is moved to the background if it is taking too long (the method carries on executing in interpreted mode until the compilation is finished). This makes the first execution of methods slower, but subsequent executions can be faster if compilation would not have otherwise finished.

Increasing the maximum heap size beyond the default usually improves performance for applications that can use the extra space. However, there is a tradeoff in higher space-management costs to the VM (object table access, garbage collections, etc.), and at some point there is no longer any benefit in increasing the maximum heap size. Indeed, increasing the heap size can cause garbage collection to take longer, since it needs to examine more objects and a larger space. Up to now, I have found no better method than trial and error to determine optimal maximum heap sizes for any particular application; the sketch below can help you check the settings actually in effect. This is covered in more detail earlier in this chapter.
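When experimenting, it helps to confirm the heap figures the VM is actually using. This minimal sketch (the class name is hypothetical) prints the values reported by java.lang.Runtime; note that maxMemory(  ) is available only from JDK 1.4:

public class HeapReport
{
  public static void main(String[] args)
  {
    Runtime rt = Runtime.getRuntime(  );
    // Maximum heap the VM will attempt to use (controlled by -Xmx)
    System.out.println("max:   " + rt.maxMemory(  ) + " bytes");
    // Heap space currently reserved by the VM
    System.out.println("total: " + rt.totalMemory(  ) + " bytes");
    // Unused portion of the currently reserved heap
    System.out.println("free:  " + rt.freeMemory(  ) + " bytes");
  }
}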

Beware of accidentally using VM options that are detrimental to performance. I once had a customer who saw a sudden 40% decrease in performance during tests. Their performance harness had a configuration file that set up how the VM was run, and this was accidentally changed to include the -prof option for the standard tests as well as the profiling tests. That was the cause of the sudden performance decrease, but it was not discovered until time had been wasted checking software versions, system configurations, and other things.