15.7 Conclusion

In this chapter we presented a new data model, named XDM, based on XML and designed to be adopted inside the framework of inductive databases. We presented a set of data-mining operations (classification, association rule extraction, and rule evaluation) and showed how these operations can be described and carried out using XDM to represent both data and patterns.

The new model has several positive features. It allows raw data and pattern data to be represented together inside the inductive database. It also allows the representation of several heterogeneous types of patterns, such as trees and association rules. Thanks to the semi-structured nature of the data that XML can represent, XDM also allows the management of semi-structured and complex patterns. Furthermore, XDM explicitly represents the pattern definition (that is, the pattern derivation process) in order to keep track of the knowledge discovery process from which the patterns are generated. In XDM the pattern definition is represented together with the data. This allows the inductive database management system to reuse patterns and to compute new patterns incrementally and efficiently. This is an important feature of XDM because it helps to overcome one of the limits of inductive databases based on the relational model, namely that they do not keep track of the pattern derivation process in the pattern definition.

The new model also allows more flexibility and generality in the representation of the conceptual tools that are used during the knowledge discovery process (classification trees, enumeration sets, etc.). Finally, the flexibility of the XDM representation makes it extensible to new pattern models and new mining operators, provided that the models are represented in XML and the operator implementations comply with an API provided by the XDM database management system. This makes the framework an open system that the analyst can easily customize.

One drawback of the use of XML in data mining, however, could be the large volumes reached by the source data represented in XML (due to the addition of markup tags and attributes).
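The size penalty mentioned above can be illustrated with a minimal sketch, assuming a toy set of purchase records. The records, the element names (`transactions`, `transaction`, `customer`, `item`, `price`), and the helper functions are all hypothetical and are not part of XDM; the sketch only compares the byte size of the same rows serialized as delimited text and as XML.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical toy records, invented for illustration only.
ROWS = [
    {"tid": "1", "customer": "c1", "item": "ski_pants", "price": "140"},
    {"tid": "1", "customer": "c1", "item": "hiking_boots", "price": "180"},
    {"tid": "2", "customer": "c2", "item": "jacket", "price": "300"},
]


def to_csv_bytes(rows):
    """Serialize the rows as comma-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_xml_bytes(rows):
    """Serialize the same rows as XML: one element per row, with the
    transaction id as an attribute and the other fields as child elements."""
    root = ET.Element("transactions")
    for row in rows:
        rec = ET.SubElement(root, "transaction", tid=row["tid"])
        for key, value in row.items():
            if key != "tid":
                ET.SubElement(rec, key).text = value
    return ET.tostring(root, encoding="utf-8")


def markup_overhead(rows):
    """Ratio of the XML size to the delimited-text size for the same rows."""
    return len(to_xml_bytes(rows)) / len(to_csv_bytes(rows))


if __name__ == "__main__":
    print(f"CSV bytes: {len(to_csv_bytes(ROWS))}")
    print(f"XML bytes: {len(to_xml_bytes(ROWS))}")
    print(f"XML/CSV size ratio: {markup_overhead(ROWS):.2f}")
```

Because every value is wrapped in an opening and a closing tag, the ratio stays above 1 for records like these and grows as the field values get shorter relative to the tag names. The exact figure depends on the chosen element names and on whether fields are stored as attributes or as child elements.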

Future work on XDM consists of studying the formal properties that can be derived by considering the database state and the transitions performed by the derivation processes. This theory should provide the foundation for dealing with the problem of the incremental computation of the database state. Moreover, it will certainly be necessary to deliver an implementation of an XDM-based system and of the operators that we have mentioned in this chapter (the classification, MINE-RULE, and EVALUATE-RULE operators). This would allow us to obtain actual performance figures for a KDD process in the inductive database framework based on the XDM data model.
