15.5 Toward Flexible and Open Systems


The features of XDM open the way toward a new generation of flexible and open systems for data mining and knowledge discovery. In this section, we briefly discuss the functionalities that a system based on XDM should provide.

  • Flexibility: XDM is defined on top of XML, the eXtensible Markup Language, and aims to exploit the extensibility that XML provides. Hence, an XDM-based system should be able to deal with any kind of data and pattern representation, provided that it is described as an XML fragment. Furthermore, because XML can represent semi-structured data, an XDM-based system can easily deal with complex, non-flat formats such as trees.

  • Extensibility: An XDM-based system should not be tailored to any particular data-mining algorithm or problem. On the contrary, it should be open, so that it can be extended with any data-mining operator and its implementation, provided that both comply with the API exposed by the system. In particular, we think that an XML Schema definition should be provided for the operator and for the data items it generates, since such a definition is useful when extending the system with a new operator and its implementation.

  • Incremental computations: An XDM-based system should provide support for incremental computations. Since XDM data items represent both data and the statement that generated them, it is possible to trace the knowledge discovery process performed by the user. This trace makes it possible to set up an incremental computation mechanism that recomputes derived data items when source data items are updated. The recomputation can be efficient if an incremental implementation of the operators is provided, or naive, recomputing each derived data item from scratch (in this case, a background task might be activated to perform recomputations when the system is not under heavy load).

  • User interaction: The user should be provided with a user interface that exploits the distinctive features of XDM. This means that the interface should provide effective support for navigating the set of XDM data items, for example by showing the computation trace or by displaying several XDM data items in different windows. One feature that the user interface of an XDM-based system should certainly provide is the automatic suggestion of new statements. The EVALUATE-RULE operator clarifies this idea: a statement based on this operator makes sense only if the evaluation of association rules is coherent with the MINE-RULE statement that generated the association rule set, in terms of the rule schema and grouping features. Once the user has selected the XDM data item containing the rule set to evaluate, the user interface might automatically complete the statement, taking, for example, the rule schema specification from the MINE-RULE statement.
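
The naive recomputation strategy described under incremental computations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataItem` and `Store` classes are hypothetical, a "statement" is modeled as a plain Python function, and nothing here reflects an actual XDM API. Each derived item records its statement and the names of its sources, so updating a source lets the store find and recompute every item downstream of it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DataItem:
    """Hypothetical stand-in for an XDM data item: content plus its derivation."""
    name: str
    content: object = None
    statement: Optional[Callable[..., object]] = None  # None for source items
    sources: List[str] = field(default_factory=list)


class Store:
    """Hypothetical in-memory collection of data items (not an XDM API)."""

    def __init__(self) -> None:
        self.items: Dict[str, DataItem] = {}

    def _check_new(self, name: str) -> None:
        # Forbid redefinition so that insertion order stays a valid
        # dependency order: every item is added after all of its sources.
        if name in self.items:
            raise ValueError(f"data item {name!r} already exists")

    def add_source(self, name: str, content: object) -> None:
        self._check_new(name)
        self.items[name] = DataItem(name, content)

    def derive(self, name: str, statement: Callable[..., object],
               sources: List[str]) -> None:
        self._check_new(name)
        item = DataItem(name, statement=statement, sources=list(sources))
        self._recompute(item)  # raises KeyError if a source is missing
        self.items[name] = item

    def _recompute(self, item: DataItem) -> None:
        inputs = [self.items[s].content for s in item.sources]
        item.content = item.statement(*inputs)

    def update_source(self, name: str, content: object) -> List[str]:
        """Replace a source item and naively recompute everything downstream."""
        if self.items[name].statement is not None:
            raise ValueError(f"{name!r} is a derived item, not a source")
        self.items[name].content = content
        stale = {name}
        recomputed = []
        # Dicts preserve insertion order, so sources are visited first.
        for item in self.items.values():
            if item.statement is not None and stale.intersection(item.sources):
                self._recompute(item)  # from scratch, no incremental operator
                stale.add(item.name)
                recomputed.append(item.name)
        return recomputed


store = Store()
store.add_source("purchases", [("bread", 2), ("milk", 1)])
store.derive("total", lambda rows: sum(q for _, q in rows), ["purchases"])
store.derive("doubled", lambda t: 2 * t, ["total"])
changed = store.update_source(
    "purchases", [("bread", 2), ("milk", 1), ("eggs", 4)]
)
print(changed, store.items["doubled"].content)
```

An efficient variant would replace `_recompute` with an operator-specific incremental update, and the loop in `update_source` could be deferred to a background task, as the text suggests.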

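The automatic suggestion of an EVALUATE-RULE statement can be sketched in the same spirit. The XML layout below is an assumption made purely for illustration: the element and attribute names (`DATA-ITEM`, `DERIVATION`, `RULE-SCHEMA`, `GROUPING`, `rule-set`, and so on) are hypothetical and are not taken from the XDM definition. The sketch only shows the idea that, because a data item carries the statement that generated it, a user interface can copy the rule schema and grouping specification into a new statement.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical data item: the rule set keeps the MINE-RULE statement
# that generated it. All names here are illustrative assumptions.
RULE_SET_ITEM = """
<DATA-ITEM name="rules">
  <DERIVATION>
    <MINE-RULE>
      <RULE-SCHEMA>
        <BODY attribute="item"/>
        <HEAD attribute="item"/>
      </RULE-SCHEMA>
      <GROUPING attribute="transaction"/>
    </MINE-RULE>
  </DERIVATION>
  <CONTENT/>
</DATA-ITEM>
"""


def suggest_evaluate_rule(item_xml: str) -> ET.Element:
    """Build a partially completed EVALUATE-RULE statement for a rule set."""
    item = ET.fromstring(item_xml)
    mine_rule = item.find("./DERIVATION/MINE-RULE")
    if mine_rule is None:
        raise ValueError("data item was not generated by a MINE-RULE statement")

    suggestion = ET.Element("EVALUATE-RULE", {"rule-set": item.get("name", "")})
    # Copy the parts that must stay coherent with the original statement.
    for tag in ("RULE-SCHEMA", "GROUPING"):
        part = mine_rule.find(tag)
        if part is not None:
            suggestion.append(copy.deepcopy(part))
    return suggestion


suggestion = suggest_evaluate_rule(RULE_SET_ITEM)
print(ET.tostring(suggestion, encoding="unicode"))
```

The user would then fill in only the parts of the statement that cannot be inferred from the selected data item.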
