12.11 Web Services

Web Services is yet another distributed computing architecture. As such, all of the general guidelines for efficient client/server systems from previous sections also apply to improving the performance of Web Services.

Table 12-2 lists the equivalent standards for Web Services, CORBA, and Java RMI.

Table 12-2. Equivalent standards for Web Services, CORBA, and Java RMI

Web Services: Simple Object Access Protocol (SOAP)
CORBA:        Remote procedure calling (IIOP)
RMI:          Remote method invocation (JRMP)

Web Services: Universal Description, Discovery, and Integration (UDDI)
CORBA:        ORB Name Service plus an IDL data repository
RMI:          JNDI plus all remote interfaces

Web Services: Web Services Description Language (WSDL)
CORBA:        CORBA Interface Definition Language (IDL)
RMI:          None needed (not language-independent)

The simplicity of the Web Services model has both advantages and disadvantages for performance (see Table 12-3). Web Services is too simple for many distributed application requirements. The many additional features in CORBA and RMI are not whimsical; they are there in response to recognized needs. This implies that as these needs are transferred to Web Services, the Web Services standards will evolve to support additional functionality. From a performance point of view, this is problematic. Typically, the more functionality that is added to the standard, the worse performance becomes because the architecture needs to handle more and more options. So consider the performance impact of each function added to the Web Services standards.

Table 12-3. Performance advantages and disadvantages of Web Services

Feature: No distributed garbage collection
Advantage: Reduces the communication overhead and resource management otherwise required to keep track of connected objects and signal reclaimable objects.
Disadvantage: Objects have to time out (which means they are around longer than necessary) or are created for each request (which means they are created more often than necessary).

Feature: Transactions are not directly supported
Advantage: Transactional overhead can be one of the highest costs of short distributed communications, comparable to the network communication latency. Omitting transaction support improves performance.
Disadvantage: If transactions are required, they have to be built on top of Web Services, which makes them less efficient than if transactions were supported within the Web Services standards.

Feature: Uses HTTP, a stateless protocol
Advantage: Stateless protocols scale much better than stateful protocols, as the success of the Web proves.
Disadvantage: Stateful requests are far more common; only very simple services can be stateless. State must either be maintained in the server, complicating server processes and making them less efficient, or be transferred with every request, increasing the communication cost.

Feature: Uses XML for the communication format
Advantage: Communications can be compressed.
Disadvantage: Data bloat increases the communication overhead, and marshalling/unmarshalling costs are large.

Feature: No built-in security
Advantage: No security overhead makes for faster performance. Security can be added efficiently by wrapping the Web Services server interface with an authentication layer.
Disadvantage: None really, as long as security is easy to add when required.

12.11.1 Measuring Web Services Performance

As I write this, there is a market opportunity for Web Services profiling and measurement tools. You can use web measurement tools, such as load-testing tools and web-server monitoring tools, but these provide only the most basic statistics for Web Services and are not normally sufficient to determine where bottlenecks lie. For developers, this means that you cannot easily obtain a Web Services profiling tool, so breaking down the end-to-end performance of a Web Service to find bottlenecks can be challenging. Currently, the best way to measure the component parts of Web Services seems to be to explicitly add logging points (see, for example, Steve Souza's Java Application Monitor at http://www.JavaPerformanceTuning.com/tools/jamon/index.shtml). The major Web Services component times to measure are the time taken by the server service, the time taken by server marshalling, the time taken by client marshalling, and the time taken to transport the message. Ideally, you would like to measure times:

  1. from the client starting the Web Service call

  2. to when the SOAP message creation starts

  3. to when the SOAP message creation ends (from 2, this is client marshalling time)

  4. to when the transport layer starts sending the message

  5. to when the server completes reception of the raw message (from 4, this is client-to-server transport time)

  6. to when the message starts being decoded

  7. to when the message finishes being decoded (from 6, this is server unmarshalling time)

  8. to when the server method starts executing

  9. to when the server method finishes executing (from 8, this is the time taken to execute the service)

  10. to when the return SOAP message creation starts

  11. to when the return SOAP message creation ends (from 10, this is server marshalling time)

  12. to when the transport layer starts sending the message

  13. to when the client completes reception of the raw message (from 12, this is server-to-client transport time)

  14. to when the message starts being decoded

  15. to when the message finishes being decoded (from 14, this is client unmarshalling time)

  16. to when the client finishes the Web Service call

It is important, but difficult, to determine the time taken in marshalling and unmarshalling and the time taken for network transportation, so that you know where to focus your tuning effort. Of course, if you are worried only about the Web Service itself and you have arbitrary Web Service clients connecting to your service, as is the expected scenario, then you are interested in points 4 to 13. I include these points because the client's perception of your service is affected not only by how long the server takes to process the request but also by any delays in the server receiving the message, and by the time taken to receive the reply, which depends on the size of the returned message. Specifically, if the TCP data has arrived at the server (or starts to arrive, if it requires several TCP packets) but the server does not start reading because it is busy, this wait time is an overhead that adds to the time taken to service the request. In the same way, the larger the returned data, the longer it may take to be assembled on the client side before unmarshalling can begin, which again adds to the total service time.

In practice, what tends to get measured is either the full round-trip time (client to server and back) with no breakdown, or only the server-side method call. But there are a number of different ways to infer some of the intermediate measurements. The following sections detail various ways to directly measure or infer some Web Service request times.

12.11.1.1 Measuring server-side method execution time

Server-side method execution is the simplest measurement to take. Simply wrap the original method with a timer. For example, if the server method is getBlah(params), then rename it to _getBlah(params) and implement getBlah(params) as:

public whatever getBlah(params) {
  // Record the start time against the current thread
  Thread t = Thread.currentThread();
  Log.start(t, "getBlah");
  try {
    // Delegate to the renamed original method
    return _getBlah(params);
  } finally {
    // Log the end time even if the method throws
    Log.end(t, "getBlah");
  }
}
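The Log class is not shown here; a minimal sketch of one possible implementation, assuming you simply want elapsed times written out, might look like this (the class and method names mirror those used above, but the implementation details are my own):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A minimal timing logger, keyed by thread and label so that
// concurrent requests do not overwrite each other's start times.
public class Log {
  private static final Map<String, Long> startTimes =
      new ConcurrentHashMap<String, Long>();

  public static void start(Thread t, String label) {
    startTimes.put(t.getName() + ":" + label, System.currentTimeMillis());
  }

  public static void end(Thread t, String label) {
    Long start = startTimes.remove(t.getName() + ":" + label);
    if (start != null) {
      System.out.println(label + " took " +
          (System.currentTimeMillis() - start) + " ms");
    }
  }
}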
12.11.1.2 Measuring the full round-trip time

To measure the full round-trip time, employ the wrapping technique just described, but this time in the client.
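For example, assuming a generated client stub with a getQuote() method (the QuoteStub interface and its method are hypothetical placeholders for whatever proxy your SOAP toolkit generates), a sketch using the Log class from above:

// QuoteStub stands in for your SOAP toolkit's generated client proxy.
public class TimedClient {
  private final QuoteStub stub;

  public TimedClient(QuoteStub stub) {
    this.stub = stub;
  }

  public String getQuote(String symbol) {
    Thread t = Thread.currentThread();
    Log.start(t, "getQuote-roundtrip");  // client starts the Web Service call
    try {
      return stub.getQuote(symbol);      // the complete remote round trip
    } finally {
      Log.end(t, "getQuote-roundtrip");  // client finishes the call
    }
  }
}

interface QuoteStub {
  String getQuote(String symbol);        // hypothetical remote call signature
}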

12.11.1.3 Inferring round-trip overhead

To infer round-trip overhead, simply measure the time taken to execute a call to an "echo" Web Service, i.e., the Web Service implemented as:

public String echo(String val) {
  return val;
}
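Timing repeated calls to this echo service from the client gives the round-trip overhead directly, since the service itself does essentially no work: for a given payload size, everything measured is marshalling, transport, and dispatch overhead. A sketch (the EchoStub interface is a hypothetical client proxy for the echo service):

// Infer round-trip overhead by averaging timed calls to the do-nothing
// echo service; everything measured is marshalling, transport, and dispatch.
public class OverheadMeter {
  public static long averageOverhead(EchoStub echo, int repeats) {
    String payload = "a representative payload";
    long start = System.currentTimeMillis();
    for (int i = 0; i < repeats; i++) {
      echo.echo(payload);  // the server does no real work
    }
    // Average over many calls to smooth out network jitter
    return (System.currentTimeMillis() - start) / repeats;
  }
}

interface EchoStub {
  String echo(String val);  // hypothetical client proxy for the echo service
}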
12.11.1.4 Inferring network communication time

You can infer the combined time taken to transfer the data to and from the server by executing the Web Service in two configurations: across the network, and with both client and server executing on the local machine. Be sure to use the numeric IP address in both cases to specify the service (i.e., 10.20.21.22 rather than myservice.myhost.mycomp.com) to eliminate DNS lookup costs. Since this communication is likely to go over the Internet, transport times vary; repeat the measurements many times and either take the average or build a profile of transport times at different times of the day.

12.11.1.5 Inferring DNS lookup time

To find out how long DNS lookups are taking, compare the times measured using the numeric IP address with those measured using the hostname for the service (i.e., using 10.20.21.22 versus using myservice.myhost.mycomp.com). DNS lookup time can vary with network congestion and DNS server availability, so take averages here too.
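You can also time the lookup directly in Java using the standard InetAddress API; a minimal sketch (the hostname is a placeholder):

import java.net.InetAddress;
import java.net.UnknownHostException;

// Time a DNS lookup directly. Note that the JVM and the OS both cache
// lookups, so only the first call typically pays the full DNS cost.
public class DnsTimer {
  public static void main(String[] args) throws UnknownHostException {
    String host = "myservice.myhost.mycomp.com";  // placeholder hostname
    long start = System.currentTimeMillis();
    InetAddress address = InetAddress.getByName(host);
    long elapsed = System.currentTimeMillis() - start;
    System.out.println(host + " -> " + address.getHostAddress() +
        " in " + elapsed + " ms");
  }
}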

12.11.1.6 Inferring marshalling time

From the previous measurements, you can subtract network communication time, DNS time, and server-side method execution time from the total round-trip time to obtain the remaining overhead time, which includes marshalling and other actions such as object resolution, proxy method invocation, etc. The majority of this overhead time is expected to come from marshalling.
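As a sketch of the arithmetic, with purely illustrative millisecond averages standing in for the measurements described in the previous sections:

// Illustrative inference: the residue after subtracting the measured
// components is mostly marshalling/unmarshalling cost.
public class MarshallingInference {
  public static void main(String[] args) {
    long roundTripTime = 480;  // full client round trip (measured; hypothetical value)
    long networkTime   = 150;  // remote minus local round trip (inferred)
    long dnsTime       = 40;   // hostname minus numeric-IP time (inferred)
    long serverTime    = 120;  // server-side method execution (measured)

    long overhead = roundTripTime - networkTime - dnsTime - serverTime;
    System.out.println("Approximate marshalling overhead: " + overhead + " ms");
  }
}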

If your Web Service is layered behind a web server that runs a Java servlet, you can add logging to the web server layer in the doGet( ) and doPost( ) methods. Since these servlet methods are entered before any unmarshalling is performed and return after the reply has been marshalled, timing them (and subtracting the separately measured service-method time) gives a more direct measurement of server-side marshalling and unmarshalling costs.
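A sketch of such logging, wrapping doPost( ) in a servlet subclass (SoapRouterServlet is a hypothetical stand-in for whatever router servlet your SOAP engine provides; substitute the real class for your toolkit):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Times the complete server-side handling of a SOAP request:
// unmarshalling, the service method, and response marshalling.
public class TimedSoapServlet extends SoapRouterServlet {
  protected void doPost(HttpServletRequest req, HttpServletResponse res)
      throws ServletException, IOException {
    Thread t = Thread.currentThread();
    Log.start(t, "doPost");
    try {
      super.doPost(req, res);  // the SOAP engine does the real work
    } finally {
      Log.end(t, "doPost");
    }
  }
}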

In addition to measuring individual calls, you should also load-test the Web Service, testing it as if multiple separate clients were making requests. It is not difficult to create a client that runs multiple concurrent requests against the Web Service, and there are also free load-testing utilities you can use, such as Load (available from http://www.pushtotest.com).
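If you do roll your own, a minimal sketch of a multithreaded load-test driver (the thread and request counts are arbitrary, and callService( ) is a placeholder for your actual Web Service client call):

// A minimal load-test driver: N threads each make M timed requests.
public class LoadTest {
  public static void main(String[] args) throws InterruptedException {
    final int threads = 10;
    final int requestsPerThread = 100;
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(new Runnable() {
        public void run() {
          for (int j = 0; j < requestsPerThread; j++) {
            long start = System.currentTimeMillis();
            callService();  // placeholder: your Web Service call goes here
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(Thread.currentThread().getName() +
                ": " + elapsed + " ms");
          }
        }
      });
      workers[i].start();
    }
    for (int i = 0; i < threads; i++) {
      workers[i].join();  // wait for all workers to finish
    }
  }

  static void callService() {
    // Placeholder: invoke your Web Service client stub here.
  }
}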

Web Services Versus CORBA

Web Services provides a simple, language-independent client/server communication model. In a sense, this means that Web Services is an alternative to CORBA, which strives for a similar language-independent distributed architecture. At the core, this is true, but Web Services standards target a simpler type of architecture and are already more widely accepted and used. Table 12-2 shows how some of the standards map between Web Services, CORBA, and RMI (note that RMI is not language-independent, so it is not really equivalent to the other two technologies).

A more comprehensive comparison between these technologies as well as DCOM can be found in the article "Web Services and Distributed Component Platforms" in the Web Services Journal, Issue 3, Volume 1 (available at http://www.sys-con.com/webservices/article.cfm?id=110).

12.11.2 High-Performance Web Services

It is worth emphasizing that the previous sections of this chapter, as well as other chapters in this book, also apply to performance-tuning Web Services. As with all distributed computing, caching is especially important; apply it both to data and to metadata such as WSDL files. The generation and parsing of XML is a Web Service overhead that you should minimize by using specialized XML processors. Additionally, a few techniques are particularly effective for high-performance Web Services:

  • Service granularity

  • Load balancing

  • Asynchronous processing

These techniques are discussed in the following sections.

12.11.2.1 Service granularity

If you read the "Message Reduction" section, it should come as no surprise that Web Service methods should have a large granularity. A Web Service should provide monolithic methods that do as much work as possible rather than many methods that perform small services. The intention is to reduce the number of client/server requests required to satisfy the client's requirements. For example, the classic example of a Web Service is providing the current share price of a company quoted on a stock exchange:

public interface IStockQuoteService {
  public String getQuote(String exchangeSymbol);
  public String getSymbol(String companyName);
}

Amusingly, this "classic" example is bad; it is too fine-grained for optimal efficiency. If you wanted to create a Web Service that provides share price quotes, you are far better off providing a service that can return multiple quotes in one request, as it is likely that anyone requesting one share price would also want others. Here is a more efficient interface:

public interface IStockQuoteService {
  public String[  ] getQuotes(String[  ] exchangeSymbols);
  public String[  ] getSymbols(String[  ] companyNames);
  public String[  ] getQuotesIfResolved(String[  ] companyNames);
}

Note that there are three changes to this interface. First, as already explained, I have changed the methods to accept and return arrays of Strings so that prices for multiple companies can be obtained in one request. Second, I have not retained the previous methods that handle only one company at a time. This is a deliberate attempt to influence the thinking of developers using the service: I want developers of clients using this Web Service to immediately think in terms of multiple companies per request so that they build their clients more efficiently. As the provider of the Web Service, I benefit twice over: once by influencing clients to be more efficient, ultimately giving my service a better reputation, and again by reducing the number of requests sent to my Web Service. Note that a client that is determined to be inefficient can still send one request per company, but at least I've tried my best to influence its design.

The third change I've made is to add a new method. The original interface had two methods: one to get quotes using the company symbol and the other to get the company symbol using the company name. In case you are unfamiliar with stock market exchanges, I should explain that a company may have several recognizable names (for example, Big Comp., Big Company, Big Company Inc., The Big Company). The stock exchange assigns one unique symbol to identify the company (for example, BIGC). The getSymbol( ) method provides a mechanism to get the unique symbol from one of the many alternative company names. With only the two methods, if a client has a company name without the symbol, it needs to make two requests to the server to obtain the share price: a request for the unique symbol and a request for the price. By adding a third method that gives a price directly from one of the various valid company names, I've provided the option to reduce requests for those clients that need this service.
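To make the request reduction concrete, here is a sketch of the client-side difference (the service variable stands in for the client proxy; the commented-out lines show the two-round-trip pattern the third method eliminates):

// Illustrative client showing the request reduction; "service" is the
// client-side proxy for the IStockQuoteService interface above.
public class QuoteClient {
  public String[] quotesByName(IStockQuoteService service, String[] companyNames) {
    // Original design: two round trips (resolve names, then fetch quotes).
    // String[] symbols = service.getSymbols(companyNames);
    // String[] quotes  = service.getQuotes(symbols);

    // With the added method: a single round trip.
    return service.getQuotesIfResolved(companyNames);
  }
}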

Think through the service you provide, and try to design a service that minimizes client requests. Similarly, if you are writing a Web Services client and the service provides alternative ways to get the information you need, use the methods that minimize the number of requests required. Think in terms of individual methods that do a lot of work and return a lot of information rather than the recommended object-oriented methodology of many small methods that each do a little bit and combine to do a lot. Unfortunately, you also need to be aware that if the interface is too complex, developers may use a competing Web Service provider with a simpler (but less efficient) interface that they can more easily understand.

12.11.2.2 Load balancing

The most efficient architecture for maximal scalability is a load-balanced server system. This architecture lets the client connect to a frontend load balancer, which performs the minimum of activity and whose main job is to pass the request on to one of several backend servers (or clusters of servers) that perform the real work. Load balancing is discussed in more detail in Chapter 10.

Since Web Services already leverages the successful HTTP protocol, you can immediately use a web-server load balancer without altering any other aspect of the Web Service. A typical load-balanced Web Service has the client connect to a frontend load balancer, a proxy web server that passes requests on to a farm of backend Web Service servers. The main alternative to this architecture is round-robin DNS, where the DNS server supplies a different IP address from a list of servers each time it resolves the hostname, so each client automatically connects to one of a farm of replicated Web Services.

A different load-balancing scheme is possible by controlling the WSDL document and sending WSDL containing different binding addresses (that is, different URLs for the Web Service location). In fact, all three of the load-balancing schemes mentioned here can be used simultaneously if necessary to scale the load-balancing and reduce failure points in the system.

Where even load balancing is insufficient to provide the necessary throughput to efficiently handle all Web Service requests, priority levels should be added to Web Service requests. Higher-priority requests should be handled first, leaving lower-priority requests queued until server processing power is available.
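A sketch of one way to implement such prioritization in the service layer, using a priority queue of pending requests drained by worker threads (the Request type and the priority scheme are hypothetical, not part of any Web Services standard):

import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Queue requests by priority; workers always take the highest-priority
// request next, so low-priority work waits when the server is saturated.
public class PriorityDispatcher {
  public static class Request {
    final int priority;   // higher value = more urgent (hypothetical scheme)
    final Runnable work;  // the actual service invocation
    Request(int priority, Runnable work) {
      this.priority = priority;
      this.work = work;
    }
  }

  private final PriorityBlockingQueue<Request> queue =
      new PriorityBlockingQueue<Request>(11,
          new Comparator<Request>() {
            public int compare(Request a, Request b) {
              return b.priority - a.priority;  // highest priority first
            }
          });

  public void submit(int priority, Runnable work) {
    queue.put(new Request(priority, work));
  }

  // Each worker thread runs this loop.
  public void workerLoop() throws InterruptedException {
    while (true) {
      Request r = queue.take();  // blocks until a request is available
      r.work.run();
    }
  }
}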

12.11.2.3 Asynchronous processing

A number of characteristics of Web Services suggest that asynchronous messaging may be required to use Web Services optimally. HTTP is a best-effort delivery service, which means that requests can be dropped, typically because of network congestion or server overload. The client gets an error in this situation, but it still needs to handle the failure and retry.

Traffic on the Internet follows a distinct usage pattern and regularly provides better service at certain times. Web Service usage is likely to follow this pattern, as times of peak congestion are also likely to be times of peak Web Service usage (unless your service is targeted at an off-peak activity). This means that at peak times the average Web Service takes a double hit: a congested network and a higher number of requests reaching the service.

Many client/server projects over the years have shown that if your application can tolerate increased latency, asynchronous messaging maximizes the throughput of the system. Requiring synchronous processing over the Internet carries a heavy overhead. Consider that synchronous calls are most likely to fail from congestion precisely when other synchronous calls are also failing, and the response of a synchronous protocol such as TCP is simply to retry the call. The repeated attempts only increase the congestion, since they occur in addition to all the new synchronous calls that are starting up.

Consequently, supporting asynchronous requests, especially for large, complicated services, is a good design option. You can do this using an underlying messaging system, such as JMS, or independently of the transport protocol through the design of the Web Service itself. The latter option means providing an interface that accepts requests and stores the results of processing for later retrieval by the client. Similarly, the client of the Web Service should strive to use an asynchronous model where possible.
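A sketch of the latter, design-level approach: the service immediately returns a request ticket, and the client polls (or comes back later) for the result. The interface and ticket scheme here are illustrative, not part of any Web Services standard:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// An asynchronous facade for a long-running service: submit() returns
// a ticket at once; getResult() returns null until the work is done.
public class AsyncQuoteService {
  private final Map<String, String[]> results =
      new ConcurrentHashMap<String, String[]>();
  private int nextTicket = 0;

  public synchronized String submit(final String[] symbols) {
    final String ticket = "ticket-" + (nextTicket++);
    new Thread(new Runnable() {       // in production, use a bounded worker pool
      public void run() {
        results.put(ticket, lookupQuotes(symbols));  // store for later retrieval
      }
    }).start();
    return ticket;                    // the client polls with this ticket
  }

  public String[] getResult(String ticket) {
    return results.remove(ticket);    // null means "not ready yet"
  }

  String[] lookupQuotes(String[] symbols) {
    // Placeholder for the real (slow) quote lookup.
    return symbols;
  }
}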

Finally, some Web Services combine other Web Services in some value-added way; these are called aggregation services. Aggregation services should try to retrieve the data they require from other services in large, coarse-grained requests made during off-peak hours.