17.5 Persistent Entities and JDO

In the J2EE environment, you have a choice of using native file I/O, serialization, JDBC, entity beans, session beans, or JDO persistent classes as the implementation strategy for persistence of your application object model (persistent entities). In many cases, you can use more than one strategy in the same application.

File I/O and serialization based on files are not robust or scalable enough for application-server use beyond trivial storage of a simple class state, and we will not describe these options further. The choice between the other strategies depends on your requirements for the persistence abstraction.

17.5.1 Local Persistent Storage

Using JDBC or JDO directly allows your application to store entities using a local-persistence interface with minimum security and transaction-association options. That is, the security context of the caller of each business method governs access to the resources, and the transaction context of the caller is the transaction context of all the calls made to the local-persistence interface. In our example implementation of CashierBean, the transaction and security checks are performed only when the container receives an invocation on checkout( ) and calls your application code.

The local-persistence alternatives do not allow transparent execution of the implementation methods in different tiers of the architecture. All calls are local and use resource managers in the same JVM as the caller.

17.5.1.1 JDO

We have already seen how using JDO as your implementation strategy allows you to use your application-domain object model directly, including features such as inheritance, polymorphic relationships, dynamic queries, and modeling List and Map types. And we have already discussed in detail the requirements of the EJB components that use JDO directly to implement business methods.

17.5.1.2 JDBC

JDBC gives you the most flexibility to customize database access and the most work to do. With JDBC, you implement every JDBC call to create, read, update, and delete instances in the datastore. Thus, you can handcraft the model and the datastore accesses to use all features of the datastore, including generation of primary keys, extensions to SQL, datastore-specific types, and stored procedures.

But this flexibility comes at a significant cost. Much of the code you write is repetitive and error-prone. The server cannot help you by caching data, because it doesn't know the data-access patterns of your application.

You might reasonably choose to use JDBC in some specific part of your application that has requirements that are not satisfied by other alternatives. For example, JDO doesn't provide for UNION or GROUP BY functions available in SQL. You can implement queries that need these features by coding the queries in SQL and using JDBC as the connection vehicle to the database.
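As an illustration, a report that counts rentals per customer needs GROUP BY and so falls outside what a JDO query can express. The following is a minimal sketch only: the RENTAL table, its CUSTOMER_ID column, and the RentalReport class are hypothetical names chosen for this example, and the caller is assumed to supply an open JDBC connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical report helper; the RENTAL table and CUSTOMER_ID column are
// assumptions made for illustration and do not come from the example schema.
class RentalReport {
    static final String RENTALS_PER_CUSTOMER =
        "SELECT CUSTOMER_ID, COUNT(*) FROM RENTAL GROUP BY CUSTOMER_ID";

    // Runs the aggregate query on a connection supplied by the caller and
    // returns one entry per customer: customer id -> number of rentals.
    static Map<String, Integer> rentalsPerCustomer(Connection con) throws SQLException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (PreparedStatement ps = con.prepareStatement(RENTALS_PER_CUSTOMER);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                counts.put(rs.getString(1), rs.getInt(2));
            }
        }
        return counts;
    }
}
```

The rest of the application can continue to use JDO; only the aggregate report goes through JDBC.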

To implement our CashierBean using JDBC, the first task is to understand the entity-relationship model implemented in the relational database. The most interesting part of the model involves the relationships between the Customer, MediaContent, Movie, Game, RentalItem, Transaction, Rental, and Purchase entities. Since JDBC does not support inheritance, in order for your application to access any of the classes modeled as subclasses, you need to code the appropriate joins into the SQL code used for the queries, deletes, updates, and inserts.

An equally important part of the modeling task involves defining the type mapping between the SQL types and the Java types. Most primitive types are easy to map, but others are deceptively difficult. Strings might have as many as four natural mappings in a vendor's implementation of SQL, depending on the access patterns and the maximum length of the string. For example, CHAR, VARCHAR, VARCHAR2, or CLOB might be the best column-type representation for a string.

Another task is to map the database accesses into native SQL. The number of SQL statements that you need to code can be estimated by multiplying the number of persistent classes by four or more, and adding the number of business queries. Typically, you need at least four SQL statements per class:

SELECT columns for specific rows from the table.
INSERT a row into the table. For subclasses, this might be multiple INSERT statements, depending on how the inheritance is modeled.
DELETE a row from the table.
UPDATE some columns in certain rows.
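For a subclass such as Movie, the four statements might look like the following sketch. It assumes, purely for illustration, that each Movie is stored as one row in a MEDIA_CONTENT table and one row in a MOVIE table that share a primary key; the table names, column names, and the MovieSql class are hypothetical.

```java
// Hypothetical schema: a MOVIE row shares its primary key (ID) with the
// MEDIA_CONTENT row that holds the inherited columns.
class MovieSql {
    // Reading a subclass instance requires a join across both tables.
    static final String SELECT =
        "SELECT c.ID, c.TITLE, m.RATING FROM MEDIA_CONTENT c "
      + "JOIN MOVIE m ON m.ID = c.ID WHERE c.ID = ?";

    // One logical insert becomes two statements, superclass row first.
    static final String[] INSERT = {
        "INSERT INTO MEDIA_CONTENT (ID, TITLE) VALUES (?, ?)",
        "INSERT INTO MOVIE (ID, RATING) VALUES (?, ?)"
    };

    // Updates touch whichever table holds the changed column.
    static final String UPDATE = "UPDATE MOVIE SET RATING = ? WHERE ID = ?";

    // Deletes run in the reverse order of the inserts, subclass row first.
    static final String[] DELETE = {
        "DELETE FROM MOVIE WHERE ID = ?",
        "DELETE FROM MEDIA_CONTENT WHERE ID = ?"
    };
}
```

A different inheritance mapping, such as a single table with a discriminator column, would need a different set of statements.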

Without going into much more detail, creating the SQL statements and corresponding result analysis for each class in your application domain is repetitive and error-prone. Many application programmers faced with a reasonably complex domain model try to write a tool to help with this part of the programming. Unfortunately, the result of the tool typically must be adjusted and optimized by hand, and the resulting production classes are not easily reused in different applications.

17.5.2 Remote Persistent Storage

Your domain-model entities may have requirements that cannot be satisfied by direct access to local persistent classes or JDBC. These requirements include:

Location independence: The location of the datastore might be different from the location of the calling business method. This might be a factor in the scalability of the system, since adding new server resources might require splitting the access of some datastores across servers. Defining access to certain entities as possibly remote gives more flexibility in the system design.
Transaction association per method: When defining the domain model, you might want to define different transaction contexts for different methods of persistent classes.
Security association per method: When defining the domain model, you might have different security requirements for different methods of persistent classes.

17.5.2.1 Entity beans

Entity beans are used for modeling large-scale persistent instances that have a natural (intrinsic) identity and are accessed via business methods. Entity beans have a lifecycle mandated by the EJB specification. The lifecycle governs whether the bean has a persistent state associated with it and whether the state might need to be synchronized with the datastore.

Entity beans use a pattern in which information from persistent storage is accessed from the datastore, cached in the bean, and stored back in the datastore under the direction of the container. The cached data is identified by a key, and the key can be used to access the bean from local or remote clients.

In terms of complexity, entity beans present a more difficult challenge to the container than stateless session beans do, but a less difficult one than stateful session beans. Entity beans have a state that has to be managed, but since the state is not associated with a specific user, the container can use pooling techniques to maximize reuse of the beans for different transactions. Because of the difficulty of managing the state efficiently, most container implementations offer a range of tuning options for entity beans far beyond the options available for session beans.

Implementing the lifecycle of a bean-managed persistence (BMP) entity bean is a complex task for the bean developer. For each required method, you need to know whether there is an identity (primary key) associated with the bean, whether there is already a resource manager associated with the bean, and how to represent relationships to other entity beans. Even though the lifecycle of the bean is defined elaborately in the EJB specification, container vendors have chosen quite different strategies to optimize performance, and some of the lifecycle events are implemented differently by different containers. These differences are important if you want to optimize the performance of your bean.

For example, the lifecycle defines ejbLoad( ) to indicate that the state of the bean should be loaded from persistent storage. And ejbStore( ) indicates that the state of the bean should be stored into persistent storage. But there is no lifecycle method to indicate that the transaction context of the bean is changing. And the container does not indicate whether the bean's state has changed, and therefore whether the state really needs to be stored.

Additionally, the container doesn't indicate to the bean developer why ejbStore( ) is called. It might be to flush the cache so that query results are consistent, or it might be the last flush before transaction end. The absence of information makes it impossible for the bean developer to implement load/store optimizations.

Another example is the definition of the bean context for finder methods. In the bean's implementation of ejbFindByPrimaryKey( ), the bean contract requires that the developer establish whether or not the bean exists in the database, which requires a database query to execute successfully. An implementation might want to retrieve other information (e.g., state) from the database as long as a query is required. However, there is no way in the defined lifecycle to cache the information retrieved by the existence query. Therefore, it is difficult to eliminate the extra query.

Once you understand the strategy of entity-bean development, the complexity of the code is somewhat predictable and therefore lends itself to code generation. This is why we recommend that if you choose to use entity beans to implement your persistent object model, you should use container-managed persistence (CMP) entity beans instead of writing your own BMP entity beans.

When using CMP beans, you need to implement more methods and deployment descriptors than you need with session beans, but fewer than with BMP beans. And while CMP beans offer significant portability of the code and deployment descriptors you write, there is no standard to describe the mapping between CMP beans and the corresponding datastore persistent-data description.

To implement our CashierBean using CMP beans as delegates, the first task is to understand the entity-relationship model implemented in the relational database. As with JDBC, the most interesting part of the model involves the relationships between the Customer, MediaContent, Movie, Game, RentalItem, Transaction, Rental, and Purchase entities. Since CMP beans do not directly support the polymorphic relationships inherent in this object model, you need to change the object model to remove these relationships.

CMP beans provide for type mapping, so you don't need to hand-code the transformations as you do in JDBC. The container provides mapping tools that allow you to declare the association between cmp-fields and database columns. The container handles the type conversions for you.

When using CMP beans with session beans, the application-assembly and deployment processes become more complex. For each CMP bean used by the session bean, the deployment descriptor must identify the bean's home and local and/or remote interfaces. The initialization of the session bean itself in the setSessionContext( ) method must look up and save references to the home interfaces for all beans that need to be accessed by finder methods.

17.5.2.2 Session beans as façades

When you have a requirement that cannot be implemented by a local persistent class directly, often you can model an entity bean's semantics by a stateless session bean façade that itself delegates to a JDO business delegate or data access object. In this model, each business method in the remote interface identifies not only the operation to be performed, but also the identity of the object upon which to perform it.

Using this pattern provides all the benefits of EJB components, with a small amount of extra work (compared to using JDO directly). You can use this pattern to implement inheritance that maps directly to JDO inheritance and polymorphism.

To use this pattern, analyze each method in the JDO persistent class and decide the category to which it belongs:

Private methods: These should not be exposed to outside callers, as they might cause inconsistent state changes if not performed as part of a larger operation. For example, city, state, and ZIP code should be updated together in the same business method, although the individual set methods can be implemented as private methods. The method that updates all three fields can be exposed as a local or remote instance method.
Local instance methods: These change the state of the instance in some trivial way or retrieve some trivial information. For example, getName( ) and setName( ) should be exposed only as local instance methods.
Remote instance methods: These change the state of the instance in a large-scale way or retrieve a substantial amount of information from the instance. You should use value objects as parameters to these methods.
Local static methods: These usually are defined in the persistent class as static and operate on a number of instances, instead of just one. For example, query methods that find one or more instances and return them to the caller operate on the extent of instances in the datastore. Other methods might take a collection of instances as a parameter and perform a similar operation on each of them.
Remote static methods: These have characteristics similar to local static methods. They include methods that operate on multiple instances, but they exclude methods that simply find instances.

Define the remote interface to the session bean façade, if needed, to include all remote instance methods and remote static methods of the persistent class. Declare each method to throw a RemoteException. Modify each instance method to add an extra parameter that is the JDO identity instance of the instance to which it applies.

Define the local interface to the session bean, if needed, to include all local instance methods and local static methods of the persistent class. Modify each instance method to add an extra parameter that is the JDO identity instance of the instance to which it applies.

Implement each session-bean method that models a persistent-class instance method to obtain the PersistenceManager, obtain the persistent instance via a call to getObjectById( ), and delegate to the persistent-class instance method. Wrap the entire method in a try-catch block. For remote methods, if an exception is caught, throw a RemoteException with the caught exception as a nested exception.
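The shape of such a façade method can be sketched in plain Java. To keep the sketch compilable without a JDO implementation or an EJB container, the InstanceLookup interface below is a hypothetical stand-in for the getObjectById( ) method of PersistenceManager, and Customer and CustomerFacade are illustrative names. A real session bean would also implement the EJB interfaces and obtain and close the PersistenceManager inside each business method.

```java
import java.rmi.RemoteException;

// Hypothetical stand-in for the one PersistenceManager method used here.
interface InstanceLookup {
    Object getObjectById(Object oid, boolean validate);
}

// Illustrative persistent class with a single coarse-grained business method.
class Customer {
    private String city;
    private String state;
    private String zip;

    void changeAddress(String city, String state, String zip) {
        this.city = city;
        this.state = state;
        this.zip = zip;
    }

    String getCity() { return city; }
}

// Facade method: same signature as the instance method, plus a leading
// parameter that carries the identity of the target instance.
class CustomerFacade {
    private final InstanceLookup pm;

    CustomerFacade(InstanceLookup pm) { this.pm = pm; }

    void changeAddress(Object customerId, String city, String state, String zip)
            throws RemoteException {
        try {
            Customer c = (Customer) pm.getObjectById(customerId, true);
            c.changeAddress(city, state, zip);
        } catch (RuntimeException e) {
            // Nest the caught exception so that the remote caller can inspect it.
            throw new RemoteException("changeAddress failed", e);
        }
    }
}
```

Catching RuntimeException covers JDO exceptions as well, since they are unchecked.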

Implement each session-bean method that models a persistent-class static method to obtain the PersistenceManager and delegate to the persistent class method. Wrap the entire method in a try-catch block. For remote methods, if an exception is caught, throw a RemoteException with the caught exception's toString( ) as part of the message text.

Modify methods that return references to persistent instances to return String instead, and in the session-bean method body, translate the return instance by calling getObjectId( ).toString( ). Similarly, modify methods that take persistent instances as parameters to take String instead, and look up the persistent instance in the method body by calling newObjectIdInstance( ) and getObjectById( ).
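The two translations can be sketched as a pair of helper methods. As before, the sketch avoids a dependency on a JDO implementation: IdentityLookup is a hypothetical stand-in for the three PersistenceManager methods involved, and Movie and MovieFacade are illustrative names.

```java
// Hypothetical stand-in for the identity-related PersistenceManager methods.
interface IdentityLookup {
    Object getObjectId(Object persistentInstance);
    Object newObjectIdInstance(Class<?> pcClass, String idString);
    Object getObjectById(Object oid, boolean validate);
}

// Illustrative persistent class.
class Movie {
    private final String title;
    Movie(String title) { this.title = title; }
    String getTitle() { return title; }
}

class MovieFacade {
    private final IdentityLookup pm;

    MovieFacade(IdentityLookup pm) { this.pm = pm; }

    // Return values: persistent instance -> String form of its identity.
    String toIdString(Movie movie) {
        return pm.getObjectId(movie).toString();
    }

    // Parameters: String form of the identity -> persistent instance.
    Movie fromIdString(String id) {
        Object oid = pm.newObjectIdInstance(Movie.class, id);
        return (Movie) pm.getObjectById(oid, true);
    }
}
```

The round trip works only if the identity class can be reconstructed from its own toString( ) result, which JDO requires of object identity classes.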

17.5.2.3 JDO or CMP?

Both CMP beans and JDO persistent classes have features that you should consider before committing your project to use either strategy.

JDO persistent classes are suitable for modeling both coarse-grained and fine-grained persistent instances and in an application server are typically used behind session beans. CMP beans are typically used behind session beans; their remote behavior is seldom exploited.

JDO persistent classes can be used without recompilation in any tier of a distributed architecture and can be debugged in a one- or two-tier environment prior to integration into a web or application server. CMP beans can be debugged only after deployment into the application server.

Unlike servlets, JSP pages, and EJB components, JDO classes have no built-in remote behavior. All of the distributed, transaction, and security policies are based on the single persistence manager that manages all of the persistent instances of your model. This means that JDO persistent classes can be used in any tier of a distributed application and remote behavior is implemented by the container, not the JDO implementation.

CMP beans give you a high degree of portability across application servers. The bean class and required deployment descriptor are standard. Most of the incompatibilities between implementations are found in unspecified areas of mapping beans to the underlying datastore, optional features such as read-only beans, and extensions in deployment and management of beans. JDO implementations vary with regard to the optional features that they support.

With CMP, you identify every bean class, persistent field, and persistent relationship in the deployment descriptor. Using JDO, you identify every persistent class in the metadata, but you can usually take the default for the persistence of fields, including relationships.

With CMP, relationships are managed; this means that during the transaction a change to one side of the relationship immediately affects the other side, and the change is visible to the application. JDO does not support managed relationships, although some vendors offer them as optional features.
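Without managed relationships, the application code keeps the two sides consistent itself. The following is a minimal sketch of one common way to do that, using simplified versions of MediaContent and MediaItem; the field and method names are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// "One" side of the relationship: a title that has many rentable items.
abstract class MediaContent {
    private final Set<MediaItem> items = new HashSet<>();

    Set<MediaItem> getItems() { return items; }
}

// "Many" side of the relationship: one rentable copy of some content.
class MediaItem {
    private MediaContent content;

    MediaContent getContent() { return content; }

    // The setter updates both sides, because nothing else will do so
    // when relationships are not managed.
    void setContent(MediaContent newContent) {
        if (content != null) {
            content.getItems().remove(this);
        }
        content = newContent;
        if (newContent != null) {
            newContent.getItems().add(this);
        }
    }
}
```

With a CMP container-managed relationship, the equivalent bookkeeping is performed by the container when either side is changed.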

Inheritance is a common paradigm for modeling real-world data, but CMP beans do not support inheritance. CMP makes a distinction between the implementation class and the bean. The abstract bean-implementation classes and the local and remote interfaces can form inheritance relationships, but the CMP beans that model the application's persistent classes cannot. Relationships in CMP are between CMP beans, not implementation classes, and these relationships cannot be polymorphic. In our example, it would be impossible for a MediaItem CMP bean to have a relationship with a MediaContent CMP bean, because MediaContent has no instances. In order to implement this kind of model, you would need to change the MediaItem CMP bean to have two different relationships: one between MediaItem and Movie, and another between MediaItem and Game. You would need to treat the relationships separately in every aspect of the bean.

The programming model used to access fields is very different between CMP beans and JDO. With CMP beans, all persistent fields and relationships are defined by abstract get and set methods in the abstract bean class, plus a declaration in the deployment descriptor. Access to the field value is the responsibility of the concrete implementation class generated by the CMP code-generation tool. With JDO, persistent fields and relationships are declared or defaulted in the metadata, and access to the field values is provided by the code in the class for transient instances or by the JDO implementation for persistent instances. The JDO enhancer generates the appropriate field-access code during the enhancement process.
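The difference in the source code you write can be sketched as follows. The class names are illustrative, and the EJB interfaces and lifecycle methods that a real CMP bean class must implement are omitted so that the sketch compiles on its own.

```java
// CMP style: persistent state is reachable only through abstract accessors.
// The concrete subclass is produced by the container's code-generation tool.
abstract class GameBean {
    public abstract String getTitle();
    public abstract void setTitle(String title);

    // Business methods must go through the accessors.
    public String describe() {
        return "Game: " + getTitle();
    }
}

// JDO style: an ordinary class with an ordinary field. The enhancer adds
// the code that mediates access to the field for persistent instances.
class Game {
    private String title;

    private Game() { }  // no-arg constructor required by JDO; may be private

    Game(String title) { this.title = title; }

    String describe() {
        return "Game: " + title;
    }
}
```

The JDO class can be instantiated and tested as a transient object before enhancement, whereas the abstract CMP class cannot be instantiated until a concrete subclass has been generated.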

JDOQL and EJBQL provide similar access to data in the datastore. Both allow you to select persistent instances from the datastore to use in your programs. Both use the read-modify-write pattern for updating persistent data. Neither language is a complete data-manipulation language; both are used only to select instances for manipulation by the programming language.
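To give a feel for the two syntaxes, the sketch below states the same selection in each language: customers with a given last name. The Customer class and its lastName field are assumptions for the example. The JDOQL pieces would be passed to a JDO Query (filter and parameter declaration), while the EJBQL text would appear in the deployment descriptor for a finder method.

```java
// Illustrative query texts only; Customer and lastName are hypothetical.
class CustomerQueries {
    // JDOQL: a Java-like Boolean filter over the candidate class,
    // with parameters declared separately using Java syntax.
    static final String JDOQL_FILTER = "lastName == name";
    static final String JDOQL_PARAMETERS = "String name";

    // EJBQL: an SQL-like statement over the abstract schema,
    // with positional input parameters.
    static final String EJBQL =
        "SELECT OBJECT(c) FROM Customer AS c WHERE c.lastName = ?1";
}
```

In both cases the result is a set of instances, which the program then reads or modifies through ordinary method calls.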

CMP beans require active transactions for all business methods. Nontransactional access is not standard or portable. JDO allows you to choose whether transactions are required. JDO requires inserts, deletes, and updates to be performed within transactions, but read-only applications, including caching, can be implemented portably without transactions.

Table 17-1 is a summary comparing CMP beans with JDO persistent classes.

Table 17-1. Comparison of CMP beans and JDO
Characteristic	CMP beans	JDO persistent classes
Environmental
Portability of applications	Few portability unknowns	Documented portability rules
Operating environment	Application server	One-tier, two-tier, web server, application server
Independence of persistent classes from environment	Low: beans must implement EJB interfaces and execute in server container	High: persistent classes are usable with no special interface requirements and execute in many environments
Metadata
Mark persistent classes	Deployment descriptor identifies all persistent classes	Metadata identifies all persistent classes
Mark persistent fields	Deployment descriptor identifies all persistent fields and relationships	Metadata defaults persistent fields and relationships
Modeling
Domain-class modeling object	CMP bean (abstract schema)	Persistent class
Inheritance of domain-class modeling objects	Not supported	Fully supported
Field access	Abstract get/set methods	Any valid field access, including get/set methods
`Collection`, `Set`	Supported	Supported
`List`, `Array`, `Map`	Not supported	Optional features
Relationships	Expressed as references to CMP local interfaces	Expressed as references to JDO persistent classes or interfaces
Polymorphic references	Not supported	Supported
Programming
Query language	EJBQL modeled after SQL	JDOQL modeled after Java Boolean expressions
Remote method invocation	Supported	Not supported
Required lifecycle methods	`setEntityContext`, `unsetEntityContext`, `ejbActivate`, `ejbPassivate`, `ejbLoad`, `ejbStore`, `ejbRemove`	no-arg constructor (may be private)
Optional lifecycle callback methods	`ejbCreate`, `ejbPostCreate`, `ejbFind`	`jdoPostLoad`, `jdoPreStore`, `jdoPreClear`, `jdoPreDelete`
Mapping to relational datastores	Vendor-specific	Vendor-specific
Method security policy	Supported	Not supported
Method transaction policy	Supported	Not supported
Nontransactional access	Not standard	Supported
Required classes/interfaces	`EJBLocalHome`, local interface (if local interface supported); `EJBHome`, remote interface (if remote interface supported); Abstract beans must implement `EntityBean`; Identity class (if nonprimitive identity)	Persistent class; `objectid` class (only for application identity)
Transaction synchronization callbacks	Not supported	Supported