17.8 Case Study: Ace's Hardware SPECmine Tool

At the end of December 2001, Ace's Hardware published a report[5] on how they optimized their SPECmine tool. The procedure they followed to achieve very fast response time is instructive.

[5] The full report is available from http://www.aceshardware.com/read.jsp?id=45000251.

The SPECmine tool itself is a JSP that allows a user to query the SPEC database of benchmarks (http://www.spec.org/). The query page (at http://www.aceshardware.com/SPECmine/index.jsp) allows the user to specify all the parameters for the query, including how to sort the results. The query is so efficient that most of the transaction time is taken by network communication and browser page display.

The first issue was the database data. The SPEC database is accessible in a number of different ways, but none provides the full set of data required by the SPECmine tool. In addition, some data items needed cleaning. Querying the SPEC database each time the SPECmine tool was used would have required multiple connections, data transformations, and parsing and cleaning of the data. Holding the data locally was an obvious solution, but holding it locally in a format optimized for the SPECmine query was the best solution. This required the SPEC database to be checked periodically for new entries, which then had to be cleaned and transformed for the SPECmine database. For that cleaning and transformation, parsing and regular-expression conversions were replaced by table maps, which are easier to maintain, cleaner, and faster. The advantages were enormous:

  • Data was now held locally, so the SPECmine query was local rather than remote (across the Internet).

  • Data was held in an optimal format for the SPECmine query, so only one query was required to obtain the result, rather than multiple queries combined with data processing.

  • New SPEC entries could be added to the SPECmine database asynchronously, at off-peak time, with no performance degradation to the SPECmine query engine.
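The table-map approach to cleaning can be sketched as follows. This is a minimal illustration under stated assumptions, not Ace's Hardware's code: the class name, the choice of vendor names as the field being cleaned, and the table entries are all hypothetical. Each known raw spelling maps directly to its canonical form, so one hash lookup replaces a chain of parsing and regular-expression rules.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: cleaning raw vendor names with a lookup table instead of regexes.
// The class name and the table contents are hypothetical.
final class VendorNameCleaner {
    // Raw spellings (normalized to lowercase) mapped to one canonical name.
    private static final Map<String, String> VENDOR_TABLE = new HashMap<>();
    static {
        VENDOR_TABLE.put("intel", "Intel");
        VENDOR_TABLE.put("intel corp", "Intel");
        VENDOR_TABLE.put("intel corporation", "Intel");
        VENDOR_TABLE.put("ibm", "IBM");
        VENDOR_TABLE.put("ibm corp.", "IBM");
        VENDOR_TABLE.put("international business machines", "IBM");
    }

    /** Returns the canonical name, or the trimmed input if the table has no entry. */
    static String clean(String raw) {
        String trimmed = raw.trim();
        String canonical = VENDOR_TABLE.get(trimmed.toLowerCase(Locale.ROOT));
        return canonical != null ? canonical : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(clean("  Intel Corporation "));  // Intel
        System.out.println(clean("IBM Corp."));             // IBM
        System.out.println(clean("Acme Systems"));          // Acme Systems
    }
}
```

Adding a newly seen spelling is a one-line table entry. Unmapped values pass through unchanged here; a real importer might instead flag them for manual review.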

The only disadvantage was that the SPECmine tool would occasionally be out of sync with the SPEC database; i.e., the SPEC database would occasionally hold data that was not yet available from a SPECmine query. This was perfectly acceptable for the application, and users were warned of it. The delay between a SPEC data entry and the corresponding SPECmine update could be reduced by checking for new data more frequently, should that ever be needed.
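A periodic check for new entries might be structured like the following sketch. It assumes a single background updating thread and a hypothetical `fetchNewEntries` supplier that fetches, cleans, and transforms new SPEC results; entries are plain strings only to keep the sketch short. Query threads keep reading an immutable snapshot while an update builds its replacement, which is one way to add data asynchronously without slowing queries. The polling period is the knob that controls how far the local copy can lag behind the SPEC database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: keep a local copy current by polling the source periodically.
// fetchNewEntries stands in for a hypothetical "fetch, clean, transform" step.
final class SpecMirrorUpdater {
    // Query threads read this snapshot; updates replace it and never modify it in place.
    private volatile List<String> snapshot = Collections.emptyList();
    private final Supplier<List<String>> fetchNewEntries;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    SpecMirrorUpdater(Supplier<List<String>> fetchNewEntries) {
        this.fetchNewEntries = fetchNewEntries;
    }

    /** Current data; safe to iterate without locking because it is never mutated. */
    List<String> snapshot() {
        return snapshot;
    }

    /** One update pass (assumes a single updating thread): copy, append, publish. */
    void refresh() {
        List<String> fresh = fetchNewEntries.get();
        if (fresh.isEmpty()) {
            return;
        }
        List<String> next = new ArrayList<>(snapshot);
        next.addAll(fresh);
        snapshot = Collections.unmodifiableList(next);
    }

    /** A shorter period narrows the window in which the copy lags the source. */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs; report and retry next period.
                System.err.println("SPEC update failed: " + e);
            }
        }, 0, period, unit);
    }

    void stop() {
        scheduler.shutdown();
    }
}
```

Scheduling the updater at an off-peak hour, or simply at a long period, matches the asynchronous update described above; no query ever waits on an update in progress.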

Next, the database query itself was considered. The amount of data in the SPECmine database (and the projected future amount) was quite small: megabytes rather than hundreds of megabytes or gigabytes. Consequently, mapping the entire database into memory was feasible. Furthermore, rather than mapping in the raw data and converting it for every query, Ace's Hardware converted the data into Java objects once, as it was mapped in. The result was a very fast in-memory query for the SPECmine tool, requiring minimal extra processing when a query was executed. The main disadvantage was that the application became more complex than one using a traditional JDBC query: custom querying and sorting capabilities were required. Locking, data integrity, and scaling would have become issues had the database been larger (or had it required concurrent updates). In that case, the custom in-memory solution would have been less practical, and in-memory caching would have been used instead (as it was for other sections of the web site).
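The kind of custom in-memory querying this requires can be sketched as below. The `Result` fields, the sample data, and the predicate/comparator interface are assumptions for illustration; the report's object model is not described here. The point is that the data becomes objects once, at load time, and each query is then plain Java iteration and sorting with no JDBC round trip.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Sketch: the whole (small) data set held as Java objects and queried in memory.
// Result and its fields are hypothetical.
final class InMemorySpecQuery {
    static final class Result {
        final String vendor;
        final String cpu;
        final double score;

        Result(String vendor, String cpu, double score) {
            this.vendor = vendor;
            this.cpu = cpu;
            this.score = score;
        }
    }

    // Converted once at load time, so no per-query conversion from raw rows is needed.
    private final List<Result> all;

    InMemorySpecQuery(List<Result> loaded) {
        this.all = new ArrayList<>(loaded);
    }

    /** Custom query: filter with a predicate, then sort with a comparator. */
    List<Result> query(Predicate<Result> filter, Comparator<Result> order) {
        List<Result> matches = new ArrayList<>();
        for (Result r : all) {
            if (filter.test(r)) {
                matches.add(r);
            }
        }
        matches.sort(order);
        return matches;
    }

    public static void main(String[] args) {
        List<Result> data = new ArrayList<>();
        data.add(new Result("VendorA", "cpu-1", 10.0));
        data.add(new Result("VendorB", "cpu-2", 30.0));
        data.add(new Result("VendorA", "cpu-3", 20.0));
        InMemorySpecQuery db = new InMemorySpecQuery(data);

        List<Result> hits = db.query(x -> x.vendor.equals("VendorA"),
                Comparator.comparingDouble((Result x) -> x.score).reversed());
        for (Result r : hits) {
            System.out.println(r.cpu + " " + r.score);  // cpu-3 20.0, then cpu-1 10.0
        }
    }
}
```

The query in `main` corresponds roughly to `WHERE vendor = 'VendorA' ORDER BY score DESC` in SQL, using made-up data.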

Further optimizations were then applied to the servlet. There were two main types: precalculation to speed up query requests, and reduction of String manipulation costs. The precalculation optimizations are particularly interesting. One optimization presorts the data into the various orders the user can request. You can do this in a small amount of memory: for each order, hold an array of sorted elements, each of which points to the main entry holding that element's full data. Filtering the presorted array for the elements matching the search criteria then gives you an already-sorted result set, with no sorting needed at query time.
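The presorting technique can be illustrated with index arrays. In this sketch the `Entry` fields and the two sort orders are assumptions; each order is an `int[]` of positions into the main entry array, so an extra order costs one `int` per entry rather than a copy of the data. A query walks the array for the requested order and keeps the matching entries, which come out already sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.IntStream;

// Sketch: precompute one sorted index array per sort order; queries only filter.
// Entry and its fields are hypothetical stand-ins for a full benchmark record.
final class PresortedIndex {
    static final class Entry {
        final String vendor;
        final double score;

        Entry(String vendor, double score) {
            this.vendor = vendor;
            this.score = score;
        }
    }

    private final Entry[] entries;  // main entries holding the full data
    final int[] byScoreDesc;        // positions into entries, best score first
    final int[] byVendor;           // positions into entries, vendor name ascending

    PresortedIndex(Entry[] entries) {
        this.entries = entries.clone();
        this.byScoreDesc = presort(this.entries,
                Comparator.comparingDouble((Entry e) -> e.score).reversed());
        this.byVendor = presort(this.entries, Comparator.comparing((Entry e) -> e.vendor));
    }

    /** Done once at load time; the stream sort is stable, so ties keep load order. */
    private static int[] presort(Entry[] entries, Comparator<Entry> order) {
        return IntStream.range(0, entries.length)
                .boxed()
                .sorted((a, b) -> order.compare(entries[a], entries[b]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    /** Walking a presorted array and keeping matches yields sorted results with no sort. */
    List<Entry> query(int[] order, Predicate<Entry> matches) {
        List<Entry> result = new ArrayList<>();
        for (int i : order) {
            Entry e = entries[i];
            if (matches.test(e)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

Compared with the filter-then-sort approach, the per-query cost drops from a sort to a single linear pass, at the price of one small index array per supported order.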

Another optimization exploited the fact that a list selection box presented to the user can return indexes to the servlet instead of the selected strings. The servlet can then use those indexes to access an array directly, rather than using the strings as keys into a Map. For SPECmine, the indexes were used with a boolean array to determine which strings were "on" in the search filter.
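The index technique might look like this. The vendor list, the `vendor` parameter name, and the class are hypothetical; in a servlet, the submitted values would typically come from `ServletRequest.getParameterValues`, which returns a `String[]` (or `null` if the parameter is absent). Each option's value is its array index, so the selection becomes a `boolean[]` and the per-entry filter test is a single array read.

```java
// Sketch: option values are array indexes, so the filter is a boolean[] lookup.
// The vendor list and the "vendor" parameter name are hypothetical.
final class VendorFilter {
    static final String[] VENDORS = {"VendorA", "VendorB", "VendorC", "VendorD"};

    /** Each <option> submits its index, not its label. */
    static String renderOptions() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < VENDORS.length; i++) {
            sb.append("<option value=\"").append(i).append("\">")
              .append(VENDORS[i]).append("</option>\n");
        }
        return sb.toString();
    }

    /**
     * Turns submitted values (e.g., from request.getParameterValues("vendor"))
     * into an on/off mask indexed like VENDORS.
     */
    static boolean[] selectedMask(String[] submitted) {
        boolean[] on = new boolean[VENDORS.length];
        if (submitted == null) {
            return on;  // nothing selected
        }
        for (String value : submitted) {
            try {
                int i = Integer.parseInt(value);
                if (i >= 0 && i < on.length) {
                    on[i] = true;
                }
            } catch (NumberFormatException ignored) {
                // Skip malformed values rather than failing the whole request.
            }
        }
        return on;
    }

    /** Per-entry test during filtering: one array read, no string comparison or hashing. */
    static boolean matches(boolean[] on, int vendorIndex) {
        return on[vendorIndex];
    }
}
```

For this to pay off, each data entry would store its vendor as an index into `VENDORS` (computed once at load time) rather than as a string.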

The remaining String manipulation optimizations eliminated duplicate String objects, avoided unnecessary String concatenations, and precalculated HTML String elements that do not need to be dynamically generated. The final optimization applied was the GZIP-compression support outlined earlier in this chapter. The application's speed was such that the search itself was the fastest part of the service, HTML generation took significantly more time, and compression, network transfer, and browser display took most of the total time.
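Two of these String optimizations, sharing duplicate String instances and precalculating HTML that never changes, are sketched below. The class names and the row layout are assumptions. A canonicalizing map makes equal strings share one instance (`String.intern()` is an alternative), and each entry's table row is built once at load time, so a request only appends prebuilt fragments to a single `StringBuilder`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of two String optimizations: sharing duplicate Strings and
// precalculating the static HTML for each row. Names are hypothetical.
final class HtmlRows {
    // Canonicalizing map filled during single-threaded loading (not thread-safe).
    private static final Map<String, String> CANONICAL = new HashMap<>();

    /** Returns one shared instance for all equal strings. */
    static String canonical(String s) {
        String existing = CANONICAL.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    static final class Entry {
        final String vendor;
        final String cpu;
        final String rowHtml;  // built once; assumes values need no HTML escaping

        Entry(String vendor, String cpu, double score) {
            this.vendor = canonical(vendor);
            this.cpu = canonical(cpu);
            this.rowHtml = "<tr><td>" + this.vendor + "</td><td>" + this.cpu
                    + "</td><td>" + score + "</td></tr>\n";
        }
    }

    /** Per request, only precomputed fragments are appended into one buffer. */
    static String renderTable(List<Entry> rows) {
        StringBuilder sb = new StringBuilder(32 + rows.size() * 64);
        sb.append("<table>\n");
        for (Entry e : rows) {
            sb.append(e.rowHtml);
        }
        sb.append("</table>\n");
        return sb.toString();
    }
}
```

Concatenation still happens, but only once per entry at load time; the per-request work is a sequence of `append` calls on a presized buffer.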

The original report also discussed other parts of the web site, including optimizing the parts of the site that need disk-based databases. Ace's Hardware describes the overall architecture of their JSP-based web site in more detail at http://www.aceshardware.com/read.jsp?id=45000240. The site serves about 1 million users per month, who view roughly ten times as many pages, illustrating that high performance can be achieved with servlets and JSPs without excessive resources or tuning.