Considerations other than performance frequently drive the choice of application server. That might not be as serious as it sounds, since all the most popular application servers target good performance as an important feature. I often read about projects in which the application server was exchanged for an alternative, with performance cited as a reason. However, these exchanges seem to be balanced: for each project that moved from application server A to application server B, there seems to be another that moved in the reverse direction.
Nevertheless, I would still recommend that application servers be evaluated with performance and scalability as primary criteria. The ECperf benchmark may help differentiate EJB server performance within your short list of application servers. Performance-optimizing features to look for in an application server include:
Application servers should offer multiple caches for session beans, EJBs, JNDI, web pages, and data access. Caching provides the biggest improvement in performance for most enterprise applications.
Load balancing is absolutely necessary to support clustered systems efficiently.
Clustering is necessary for large, high-performance systems.
If one part of the system goes down, a fault-tolerant system suffers performance degradation. However, a system without fault tolerance has no service until the system is restarted.
You can roll your own connection pool, but one should come standard with any application server.
Thread pooling should also be a standard feature. It is necessary to efficiently manage system resources if your application uses hundreds or thousands of threads or serves hundreds or thousands of users.
All subsystems, including RMI, JMS, JDBC drivers, JSP tags, and cacheable page fragments, should be optimized, and the more optimized, the better. Naturally, optimized subsystems provide better performance.
Distributing over VMs provides fault tolerance. The latest VMs with threaded garbage collection may not benefit from this option.
Supported directly by the application server, distributed caching with synchronization lets clustered servers handle sessions without requiring that a particular session always be handled by one particular server, enhancing load balancing.
Optimistic transactions reduce contention for most types of applications, enabling the application to handle more users.
If you need distributed transactions, they are usually handled more efficiently if the application server supports them.
Holding session state information in memory allows clustered servers to handle sessions without requiring that a particular session be handled by one particular server, enhancing load balancing.
Eliminating single points of failure helps fault tolerance. Of course, your application may have its own single points of failure.
You will need to upgrade your application multiple times. Hot-deployment lets you do so with almost no downtime, enhancing 24/7 availability.
A performance-monitoring API is useful if you need to monitor internal statistics, and an application server with a performance-monitoring API is more likely to have third-party products that can monitor it.
More is always better, I say.
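To make the connection-pooling item above concrete, here is a minimal sketch of what "rolling your own" pool involves. The class name `SimplePool` and its methods are hypothetical, and the generic type `T` stands in for a database connection. A pool shipped with an application server will normally also validate connections, handle timeouts, and resize itself, none of which is shown here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool; T stands in for a database connection. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<T>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // create every resource up front
        }
    }

    /** Waits up to timeoutMillis for a free resource; returns null on timeout. */
    T acquire(long timeoutMillis) {
        try {
            return idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return null;
        }
    }

    /** Hands a resource back so that another caller can reuse it. */
    void release(T resource) {
        idle.offer(resource);
    }

    /** Number of resources currently idle in the pool. */
    int available() {
        return idle.size();
    }
}
```

Callers should release each resource in a `finally` block so that a failure does not leak it.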
A security layer affects response times adversely. Try to dedicate separate application servers to handle secure transactions. Most types of security (SSL, password authentication, security contexts and access lists, and encryption) degrade performance significantly. Many systems use the frontend load balancer to decrypt communications before passing on requests. If that approach is feasible for your system, it is worth considering. In any case, consider security issues as early as possible in the design.
The gross configuration of the system might involve several different servers: application servers, web servers, and database servers. An optimal configuration runs these servers on different machines so each has its own set of specifically tuned resources. This avoids access conflicts with shared resources.
When this separation is not possible, you need to be very careful about how the servers are configured. You must try to minimize resource conflicts. Allocate separate disks, not just separate partitions, to the various servers. Make sure that the operating-system paging (swap) file is on yet another disk. Cap each server's memory so that no single server can take an excessive amount. Set process priority levels to appropriately allocate CPU availability (see Chapter 14 for more details).
When request rates increase, you should be able to maintain performance by simply adding more resources (for instance, an extra server). This target requires both a well-designed application and correctly configured application servers. Try load-testing the system at higher scales with an extra application server to see how the configuration requirements change.
Application servers have multiple configuration parameters, and many affect performance: cache sizes, pool sizes, queue sizes, and so on. Some configurations are optimal for read-write beans, others for read-only beans, and so on. The popular application-server vendors now show how to performance-tune their products (see http://www.JavaPerformanceTuning.com/tips/appservers.shtml). Several application servers also come with optional "performance packs." These may include performance-monitoring tools and optimal configurations, and are worth getting if possible.
The single most important tuneable parameter for an application server is the VM heap size. Chapter 2 and Chapter 3 cover this topic in detail. For long-lived server VMs, memory leaks (or, more accurately, object retention) are particularly important to eliminate. Another strategy is to distribute the application over several server VMs. This distribution spreads the garbage-collection impact, since the various VMs will most likely collect garbage at different times.
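Before tuning the heap, it helps to confirm what the running VM actually received. The following is a small sketch using the standard `java.lang.Runtime` methods; the class name `HeapReport` is hypothetical. The heap limits themselves are usually set at startup with VM options such as `-Xms` and `-Xmx`, though the exact options depend on the VM in use.

```java
/** Hypothetical helper that reports the heap limits seen by the running VM. */
final class HeapReport {

    /** Maximum heap the VM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently occupied by objects (live or not yet collected), in bytes. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("max heap (MB):  " + maxHeapBytes() / mb);
        System.out.println("used heap (MB): " + usedHeapBytes() / mb);
    }
}
```

Logging these values periodically in a long-lived server gives a rough first signal of object retention: used heap that keeps climbing after each garbage collection deserves a closer look.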
Optimal cache and pool sizing are the next set of parameters to target. Caches are optimized by trying to get a good ratio of hits to misses (i.e., when an attempt is made to access an object or data from the cache, the object or data is probably in the cache). Too small a cache can result in useful objects/data being thrown away to make way for new objects/data. Too large a cache uses up more memory than is required, taking that memory away from other parts of the system. Look at the increase in cache-hit rates as memory is increased, and when the rate of increase starts flattening out, the cache is probably at about the right size.
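If the application server does not expose cache-hit statistics, the idea can be tried out in isolation. The sketch below is a hypothetical `CountingLruCache`, built on the standard `LinkedHashMap` access-order mode, that counts hits and misses so the hit rate can be compared at different capacities. It is not thread-safe and is meant only to illustrate the measurement, not to replace a server's cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that records its own hit and miss counts. */
final class CountingLruCache<K, V> {
    private final Map<K, V> map;
    private long hits;
    private long misses;

    CountingLruCache(final int capacity) {
        // accessOrder = true keeps entries ordered from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict once capacity is exceeded
            }
        };
    }

    /** Looks up a key and records whether it was a hit or a miss. */
    V get(K key) {
        V value = map.get(key);
        if (value != null) {
            hits++;
        } else {
            misses++;
        }
        return value;
    }

    void put(K key, V value) {
        map.put(key, value);
    }

    /** Fraction of lookups served from the cache; 0.0 before any lookup. */
    double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

Replaying a representative request trace against several capacities and plotting `hitRate()` against capacity shows where the curve flattens.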
Each pool has its own criteria that identify when it is correctly sized. Well-sized bean pools minimize activation and passivation costs, as well as bean creation and destruction. A well-sized connection pool minimizes the amount of time requests have to wait for an available connection. If the connection pool can vary in size at runtime, the maximum and minimum sizes should minimize the creation and destruction of database connections. For thread pools, too many threads cause too much context switching; too few threads leave the CPU underutilized and increase response times because requests get queued.
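The thread-pool tradeoff can be explored with the standard `java.util.concurrent.ThreadPoolExecutor`. The sketch below assumes a fixed number of threads and a bounded queue; the class name `BoundedWorkers` and the sizes passed in are hypothetical. When the queue is full, the `CallerRunsPolicy` makes the submitting thread run the task itself, which slows down submission instead of dropping work.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for experimenting with thread-pool and queue sizes. */
final class BoundedWorkers {

    /** Fixed-size pool with a bounded queue; overflow runs on the caller's thread. */
    static ThreadPoolExecutor newPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits taskCount trivial tasks and returns how many completed. */
    static int runTasks(int threads, int queueCapacity, int taskCount) {
        ThreadPoolExecutor pool = newPool(threads, queueCapacity);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(completed::incrementAndGet);
        }
        pool.shutdown(); // accept no new tasks, finish the queued ones
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }
}
```

A reasonable starting point for CPU-bound work is a thread count near `Runtime.getRuntime().availableProcessors()`, adjusted after load testing; I/O-bound work usually tolerates more threads.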
Other parameters depend on what the application server makes available for tuning. For example, as connections come into the server, they are queued in the network stack "listen" queue. If many client connections are dropped or refused, the TCP listen queue may be too short. However, not all application servers allow you to alter the listen queue size. (See the backlog parameter, the second parameter of the java.net.ServerSocket constructor.)
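For code that opens its own server socket, the backlog can be requested directly through the two-argument `java.net.ServerSocket` constructor. The sketch below uses a hypothetical class `BacklogExample` and an arbitrary backlog value. The operating system treats the value as a request and may cap it, so a larger number here does not guarantee a longer queue.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

/** Hypothetical example of requesting a longer TCP listen queue. */
final class BacklogExample {

    /**
     * Opens a server socket with the requested backlog, then closes it.
     * Passing port 0 asks the OS for any free port. Returns the bound port.
     */
    static int openAndClose(int port, int backlog) {
        try (ServerSocket server = new ServerSocket(port, backlog)) {
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 256 is an illustrative value only, not a recommendation
        System.out.println("bound to port " + openAndClose(0, 256));
    }
}
```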