The goal of this prototype is to implement an eXtensible Markup Language (XML) repository that coexists with a preexisting (object-)relational DBMS schema supporting an enterprise information system. In the context of business information systems, XML is typically described as a solution to the difficulties inherent in inter-system (and inter-business) communication. As part of any negotiation, two business organizations must agree on a set of contractual obligations: terms such as the price per widget, the number of widgets, the widget delivery schedule, and so on. When such negotiations are conducted face-to-face between representatives of the participating firms, the economic cost of coordinating each exchange is relatively high. Higher transaction costs tend to push firms toward excessive inventory and stocking levels, and toward financial management practices that require keeping plenty of cash on hand for "emergencies." One promise of the Internet is that electronic communication will lower the cost of business transactions and day-to-day operations by reducing the need for face-to-face negotiation.
Consider, for example, a humble can of tomato soup. In the beginning, before a single tomato, tin can, or label exists, farmers negotiate with representatives of agricultural supply companies for seed, fertilizer, and capital equipment. Having raised a crop of tomatoes, the farmers then negotiate with buyers from canneries over volume, quality, and delivery dates; the canners, in turn, negotiate with their suppliers for tins and labels, and with managers of retail stores over wholesale prices, shelf space, and volume.
With perfect knowledge, all of these businesses could avoid excess and wastage without missing any business opportunity through underestimating their cash or inventory requirements. Complicating the business communication problem is each firm's desire for independence: a business's problem domain overlaps those of the organizations adjacent to it in the supply chain, but it is in each firm's interest to retain autonomy in its management decision making. Business differentiation leads to diversity in the structure and functionality of the computer systems used to record the changing state of an organization and its assets. Consequently, no two businesses and no two business information systems are quite alike.
XML in e-business is therefore likely to be characterized by a very large number of small, structured messages: many messages, because transaction costs are lower, and small, structured ones, because business communication requires clarity. Even so, each business will need to concern itself with only a subset of the data in a message, and will use that subset as input to its own management systems. The firm also needs to convey information to other companies using XML. Consequently, the three principal requirements of any persistent storage system for XML are as follows:
The system must be capable of accepting high volumes of XML-formatted information. Further, it should provide mechanisms for extracting the relevant information from these messages and storing it according to the (probably preexisting) data model that the business has designed to support its own particular requirements.
An XML-enabled repository must also be capable of generating XML messages either from the preexisting information systems or from the XML data it manages.
Because the original XML messages may contain information that has no place in the business's information systems and yet must be kept as a statement of record, many information systems will also be required to store XML messages for whatever duration the business requires, and to let the business manipulate (query) this message history.
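The three requirements above can be made concrete with a small sketch. It assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules and a hypothetical order message format: an `<order>` element with an `id` attribute and `<item>` children carrying `sku`, `qty`, and `price` attributes. The tables `order_line` (standing in for the preexisting relational schema) and `message_log` (the statement of record) are likewise illustrative, not the prototype's actual schema. `ingest` stores each raw message and shreds the relevant subset into relational rows, `export_order` regenerates XML from those rows, and `query_history` runs a path query over the stored messages, including content that was never shredded.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: order_line stands in for the preexisting relational
# data model; message_log keeps every message verbatim as the statement of record.
SCHEMA = """
CREATE TABLE order_line (
    order_id TEXT NOT NULL,
    sku      TEXT NOT NULL,
    qty      INTEGER NOT NULL,
    price    REAL NOT NULL
);
CREATE TABLE message_log (
    msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
    body   TEXT NOT NULL
);
"""


def open_repository():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def ingest(conn, xml_text):
    """Keep the raw message and shred its relevant subset into order_line."""
    root = ET.fromstring(xml_text)  # malformed XML raises before anything is written
    rows = [
        (root.get("id"), item.get("sku"), int(item.get("qty")), float(item.get("price")))
        for item in root.iterfind("item")
    ]
    with conn:  # one transaction: commits both inserts or rolls both back
        conn.execute("INSERT INTO message_log (body) VALUES (?)", (xml_text,))
        conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)", rows)


def export_order(conn, order_id):
    """Regenerate an XML message from the relational rows for one order."""
    root = ET.Element("order", id=order_id)
    cur = conn.execute(
        "SELECT sku, qty, price FROM order_line WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    )
    for sku, qty, price in cur:
        ET.SubElement(root, "item", sku=sku, qty=str(qty), price=f"{price:.2f}")
    return ET.tostring(root, encoding="unicode")


def query_history(conn, path):
    """Evaluate an ElementTree path against every stored message, oldest first."""
    results = []
    for (body,) in conn.execute("SELECT body FROM message_log ORDER BY msg_id"):
        results.extend(el.text or "" for el in ET.fromstring(body).iterfind(path))
    return results
```

Keeping the raw message and its shredded rows in one transaction means the statement of record and the relational view cannot drift apart if ingestion fails partway. The obvious cost of this simple design is that every history query re-parses every stored message.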
Our prototype system's design priority, therefore, is managing a great many small XML messages: inquiries, quotes, orders, and invoices. These messages arrive and are stored within the repository, and their contents can be queried using SQL or XPath. Both XML schema (metadata) and XML document data are organized in a tightly integrated fashion. Read-only XPath queries span multiple messages (one can think of the repository as a single XML document that is constantly growing), and the system must be capable of executing XPath queries while new messages are being added to the repository.
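The idea of treating the repository as one constantly growing document, queried while messages are still arriving, can be sketched in memory. This is a minimal illustration under stated assumptions, not the prototype's design: it uses Python's `xml.etree.ElementTree`, whose XPath support is only a limited subset, together with a hypothetical `MessageRepository` class and a hypothetical `<repository>` wrapper element. Each query works on a snapshot of the messages present when it starts, so writers never wait behind a long-running query.

```python
import threading
import xml.etree.ElementTree as ET


class MessageRepository:
    """Hypothetical in-memory sketch: an append-only sequence of parsed
    messages, exposed to queries as children of one virtual document."""

    def __init__(self):
        self._messages = []            # parsed message roots, in arrival order
        self._lock = threading.Lock()

    def add(self, xml_text):
        msg = ET.fromstring(xml_text)  # parse outside the lock
        with self._lock:               # hold the lock only for the append
            self._messages.append(msg)

    def query(self, path):
        with self._lock:               # snapshot: messages added later are not seen
            snapshot = list(self._messages)
        doc = ET.Element("repository") # virtual root spanning every message so far
        doc.extend(snapshot)
        return doc.findall(path)       # ElementTree's limited XPath subset
```

Because stored messages are never modified after they are appended, evaluating a query over the snapshot outside the lock is safe. A DBMS-backed repository would presumably obtain the same effect from its own concurrency control, for example snapshot isolation.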