The main measurement is always wall-clock time. You should use this measurement to specify almost all benchmarks, as it's the real-time interval that is most appreciated by the user. (There are certain situations, however, in which system throughput might be considered more important than the wall-clock time, e.g., for servers, enterprise transaction systems, and batch or background systems.)
The obvious way to measure wall-clock time is to get a timestamp using System.currentTimeMillis( ) and then subtract this from a later timestamp to determine the elapsed time. This works well for elapsed time measurements that are not short.
System.currentTimeMillis( ) can take up to half a millisecond to execute. Any measurement including the two calls needed to measure the time difference should be over an interval greater than 100 milliseconds to ensure that the cost of the System.currentTimeMillis( ) calls is less than 1% of the total measurement. I generally recommend that you do not make more than one time measurement (i.e., two calls to System.currentTimeMillis( )) per second.
Other types of measurements have to be system-specific and often application-specific. You can measure:
CPU time (the time allocated on the CPU for a particular procedure)
The number of runnable processes waiting for the CPU (this gives you an idea of CPU contention)
Paging of processes
Disk scanning times
Network traffic, throughput, and latency
Other system values
However, Java doesn't provide mechanisms for measuring these values directly, and measuring them requires at least some system knowledge, and usually some application-specific knowledge (e.g., what is a transaction for your application?).
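The timestamp-subtraction approach to wall-clock timing described above can be sketched as follows. This is a minimal illustration, not a complete benchmarking harness: the class name WallClockTimer, the helper timeMillis( ), and the 200-millisecond sleep standing in for real work are all assumptions made for the example.

```java
class WallClockTimer {

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.currentTimeMillis();   // first timestamp
        task.run();
        return System.currentTimeMillis() - start; // later timestamp minus earlier one
    }

    public static void main(String[] args) {
        // Hypothetical workload: a sleep standing in for the code being measured.
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Elapsed: " + elapsed + " ms");

        // Intervals under about 100 ms let the cost of the timing calls distort the result.
        if (elapsed < 100) {
            System.out.println("Interval too short; measure a longer run.");
        }
    }
}
```

Wrapping the measured code in a Runnable keeps the two timing calls in one place, so every measurement pays the same overhead.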
You need to be careful when running tests with small differences in timings. The first test is usually slightly slower than subsequent tests. Try doubling the test run so that each test is run twice within the VM (e.g., rename main( ) to maintest( ), and call maintest( ) twice from a new main( )).
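The renaming trick can be sketched like this. The class name DoubledTest and the workload( ) method are hypothetical stand-ins for whatever your test actually runs, and maintest( ) returns the elapsed time here only for convenience.

```java
class DoubledTest {

    // Hypothetical workload standing in for the code under test.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sum += i % 7;
        }
        return sum;
    }

    // Formerly main( ): times one run of the test and reports it.
    public static long maintest(String[] args) {
        long start = System.currentTimeMillis();
        long result = workload();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("result=" + result + " elapsed=" + elapsed + " ms");
        return elapsed;
    }

    // New main( ): runs the same test twice within one VM.
    public static void main(String[] args) {
        maintest(args); // first run, typically includes class loading and warm-up costs
        maintest(args); // second run, the timing to compare between tests
    }
}
```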
There are almost always small variations between test runs, so always use averages to measure differences and consider whether those differences are relevant by calculating the variance in the results.
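As a rough illustration, the average and variance of a set of timings can be computed as shown here. The class TimingStats and the sample timings are invented for the example, and the choice of the sample variance (dividing by n - 1) is an assumption; the population variance would divide by n instead.

```java
class TimingStats {

    /** Arithmetic mean of the timings, in the same units as the input. */
    static double mean(long[] timings) {
        if (timings.length == 0) {
            throw new IllegalArgumentException("need at least one timing");
        }
        double sum = 0;
        for (long t : timings) {
            sum += t;
        }
        return sum / timings.length;
    }

    /** Sample variance (divides by n - 1); needs at least two timings. */
    static double variance(long[] timings) {
        if (timings.length < 2) {
            throw new IllegalArgumentException("need at least two timings");
        }
        double m = mean(timings);
        double sumOfSquares = 0;
        for (long t : timings) {
            double d = t - m;
            sumOfSquares += d * d;
        }
        return sumOfSquares / (timings.length - 1);
    }

    public static void main(String[] args) {
        long[] runs = {100, 102, 98, 104, 96}; // made-up timings in milliseconds
        System.out.println("mean=" + mean(runs) + " ms");
        System.out.println("variance=" + variance(runs));
        System.out.println("std dev=" + Math.sqrt(variance(runs)) + " ms");
    }
}
```

If the difference between the averages of two tests is small compared with the standard deviation (the square root of the variance), the difference is probably not significant.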
For distributed applications, you need to break down measurements into times spent on each component, times spent preparing data for transfer and recovering it after transfer (e.g., marshalling and unmarshalling objects and writing to and reading from a buffer), and times spent in network transfer. Each separate machine used on the networked system needs to be monitored during the test if any system parameters are to be included in the measurements. Timestamps must be synchronized across the system (this can be done by measuring offsets from one reference machine at the beginning of tests). Taking measurements consistently from distributed systems can be challenging, and it is often easier to focus on one machine, or one communication layer, at a time. This is sufficient for most tuning.
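One possible way to measure an offset from a reference machine is sketched here. It assumes that each machine asks the reference machine for its current time once at the start of the test, and that the network delay is roughly the same in both directions, so the reference timestamp corresponds to the midpoint of the round trip. The class ClockOffset and its method names are hypothetical, and the midpoint estimate is an assumption of this example rather than a prescribed method.

```java
class ClockOffset {

    /**
     * Estimates how far the reference clock is ahead of the local clock.
     *
     * @param localSend     local timestamp taken just before asking the reference machine
     * @param referenceTime timestamp reported by the reference machine
     * @param localReceive  local timestamp taken just after the reply arrived
     */
    static long estimateOffsetMillis(long localSend, long referenceTime, long localReceive) {
        // Assumes symmetric delay: the reply was stamped halfway through the round trip.
        long midpoint = localSend + (localReceive - localSend) / 2;
        return referenceTime - midpoint;
    }

    /** Converts a local timestamp to the reference machine's timeline. */
    static long toReferenceTime(long localTimestamp, long offsetMillis) {
        return localTimestamp + offsetMillis;
    }

    public static void main(String[] args) {
        // Made-up values: request sent at 1000, reply received at 1020, reference said 5010.
        long offset = estimateOffsetMillis(1000, 5010, 1020);
        System.out.println("offset=" + offset + " ms");
        System.out.println("local 2000 on reference timeline: " + toReferenceTime(2000, offset));
    }
}
```

The error in such an estimate is bounded by half the round-trip time, so it only helps when the intervals being compared are much longer than the network latency.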