For that extra zing in your application (but probably not applet), try out calls to native code. Wave goodbye to 100% pure Java certification, and say hello to added complexity in your development environment and deployment procedure. (If you are already in this situation for reasons other than performance tuning, there is little overhead to taking this route in your project.)
I've seen native method calls used for performance reasons in earlier Java versions when doing intensive number-crunching for a scientific application and parsing large amounts of data in a restricted time. In these and other cases, the runtime application environment at the time could not get to the required speed using Java. I should note that a parsing application would now be able to run fast enough in pure Java, but the original application was built with quite an early version. In addition, some number crunchers find that the latest Java runtimes and optimizing compilers give them sufficient performance in Java without resorting to any native calls.[11]
[11] Serious number crunchers spend a large proportion of their time performance-tuning their code, whatever language it is written in. To gain sufficient performance in Java, they of course need to tune the application intensively. But this is also true if the application is written in C or Fortran. The amount of tuning required is now, apparently, similar for these three languages. Further information can be found at http://www.javagrande.org.
JNI itself has its own overhead, which means that if a pure Java implementation comes close to the native call performance, the JNI overhead probably cancels any performance advantage from the native call. However, on occasion the underlying system can provide an optimized native call that is not available from Java and cannot be implemented to work as fast in pure Java. In this kind of situation, JNI is useful for tuning.
Another case in which JNI can be useful is reducing the number of objects created, though this should be less common: you should normally be able to do this directly in Java. I once encountered a situation where JNI was needed to avoid excessive objects. This was with an application that originally required the use of a native DLL service. The vendor of that DLL ported the service to Java, which the application developers would have preferred using, but unfortunately the vendor neglected to tune the ported code. This resulted in a native call to a particular set of services producing just a couple dozen objects, but the Java-ported code producing nearly 10,000 objects. Apart from this difference, the speeds of the two implementations were similar.[12] However, the overhead in garbage collection caused a significant degradation in performance, which meant that the native call to the DLL was the preferred option.
[12] This increase in object creation normally results in a much slower implementation. However, in this particular case, the methods required synchronizing to a degree that gave a larger overhead than the object creation. Nevertheless, the much larger number of objects created by the untuned Java implementation needed reclaiming at some point, and this led to greater overhead in the garbage collection.
If you are following the native function call route, there is little to say. You write your routines in C, plug them into your application using the native keyword, profile the resultant application, and confirm that it provides the required speedup. You can also use C (or C++ or whatever) profilers to profile the native routines if the native code is complicated.
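As a minimal sketch of the Java side of this route, the following class declares a native method next to a pure Java equivalent that can serve as the baseline when profiling. The class name NativeSum and both method names are hypothetical, invented for illustration; the native method is only declared here, never called, so the class loads even when no native library is present.

```java
// Hypothetical example class; the names are illustrative only.
final class NativeSum {
    // Declared in Java with the native keyword; the body would be written in C.
    // For a class in the default package, JNI looks for a C function
    // named Java_NativeSum_sum.
    static native long sum(int[] values);

    // Pure Java equivalent, used as the baseline to confirm that the
    // native version really provides the required speedup.
    static long sumJava(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```

Timing sumJava( ) against sum( ) on realistic data is what tells you whether the native version is worth keeping.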
Other than this, only a few recommendations apply:
If you are calling the native routines from loops, you should move the loops down into the native routines and pass the loop parameters to the routine as arguments. This usually produces faster implementations.
In a similar but more generic vein, try to avoid crossing the JNI boundary more often than necessary.
Avoid passing objects across JNI if possible. Where necessary, try to pass primitive types. If it is necessary to pass objects such as arrays, try to do as much data movement as possible in one transfer to minimize transfer overhead.
From 1.4, native ByteBuffers (direct buffers, available with the java.nio packages) allow you to pass data to native libraries without necessarily passing the data through the JNI, which can be a significant gain. You can allocate a native ByteBuffer in the C code and pass the pointer through the JNI, avoiding the JNI data transfer overhead. (At least one animation application has actually allocated memory on the graphics card as a native ByteBuffer, and manipulated that ByteBuffer from the Java side.)
If you use JNI Get calls (e.g., GetStringCritical), you must always use the corresponding Release call (e.g., ReleaseStringCritical) when you have finished with the data, even if the isCopy parameter indicates that no copy was taken.
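The data-transfer recommendations above can be sketched from the Java side as follows. This is a variation on the approach described, under stated assumptions: the buffer is allocated in Java with ByteBuffer.allocateDirect( ) rather than in the C code, and the native routine processSamples( ) is hypothetical and never called here. On the C side, such a routine would typically obtain the buffer's address with the JNI function GetDirectBufferAddress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical example class; the names are illustrative only.
final class DirectBufferSketch {
    // Hypothetical native routine: one call handles the whole buffer, so the
    // loop over the samples lives in the native code, not in Java.
    static native void processSamples(ByteBuffer samples, int count);

    // Packs all samples into a single direct buffer in one bulk transfer.
    static ByteBuffer pack(int[] samples) {
        ByteBuffer buf = ByteBuffer.allocateDirect(samples.length * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder());
        // The int view shares the direct memory; writing through it does not
        // move the position of the underlying byte buffer.
        buf.asIntBuffer().put(samples);
        return buf;
    }
}
```

Using the platform's native byte order means the C code can read the contents as a plain int array without swapping bytes.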
One other recommendation, which is not performance tuning-specific, is that it is usually good practice to provide a fallback mechanism for situations when the native code cannot be loaded. This requires extra maintenance (two sets of code, extra fallback code) but is often worth the effort. You can manage the fallback at the time when the native library is being loaded by catching the exception when the load fails and providing an alternative path to the fallback code, either by setting boolean switches or by instantiating objects of the appropriate fallback classes as required.
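One way the fallback might look is sketched below, combining a boolean switch with instantiation of a fallback class. All the names (Checksum32, JavaChecksum, NativeChecksum, Checksums, and the library name example_fastchecksum) are hypothetical and chosen only for illustration. System.loadLibrary( ) throws UnsatisfiedLinkError when the library cannot be found, so that is what the static initializer catches.

```java
// Hypothetical service interface with a native and a pure Java implementation.
interface Checksum32 {
    int of(byte[] data);
}

// Pure Java fallback: always available.
final class JavaChecksum implements Checksum32 {
    public int of(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum += b & 0xFF; // treat each byte as unsigned
        }
        return sum;
    }
}

// Native implementation: usable only if the library was loaded.
final class NativeChecksum implements Checksum32 {
    public native int of(byte[] data);
}

final class Checksums {
    // Boolean switch set once, at the time the library load is attempted.
    static final boolean NATIVE_AVAILABLE;

    static {
        boolean loaded;
        try {
            System.loadLibrary("example_fastchecksum"); // hypothetical library name
            loaded = true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            loaded = false; // take the fallback path instead of failing
        }
        NATIVE_AVAILABLE = loaded;
    }

    // Callers depend only on the interface, never on which version they got.
    static Checksum32 create() {
        return NATIVE_AVAILABLE ? new NativeChecksum() : new JavaChecksum();
    }
}
```

Because callers obtain the service through Checksums.create( ), the rest of the application is unaffected by whether the native library was found.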