Publication date:
2004
Abstract:
We propose a novel methodology for clustering XML documents on the
basis of their structural similarities. The basic idea is to equip
each cluster with an XML cluster representative, i.e., an XML
document subsuming the most typical structural specifics of a set of
XML documents. Clustering is essentially accomplished by comparing
cluster representatives, and updating the representatives as soon as
new clusters are detected. We propose a three-phase algorithm for
computing an XML representative. Suitable techniques for identifying
significant node matchings and for reliably merging and pruning XML
trees are investigated. Finally, an experimental evaluation on both
synthetic and real data shows the effectiveness of our approach.
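The abstract gives only the outline of the method: each cluster carries a representative, clusters are compared through their representatives, and a representative is rebuilt whenever clusters merge. It does not state the similarity measure, the node-matching technique, or the merge and pruning rules. The Python sketch below illustrates that outline under loudly stated assumptions and is not the authors' algorithm. Every name in it (`tag_paths`, `similarity`, `representative`, `cluster`, `threshold`) is hypothetical. Documents are reduced to sets of root-to-node tag paths, similarity is Jaccard overlap (a swapped-in measure), and a representative is formed by merging the members' path sets and then pruning paths that occur in fewer than half of the members.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths, e.g. {'a', 'a/b', 'a/b/c'}."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def similarity(p, q):
    """Jaccard similarity of two path sets (hypothetical stand-in measure)."""
    if not p and not q:
        return 1.0
    return len(p & q) / len(p | q)


def representative(members):
    """Merge the members' path sets, then prune paths found in fewer than half of them."""
    counts = {}
    for paths in members:
        for path in paths:
            counts[path] = counts.get(path, 0) + 1
    # A path never occurs in more documents than its prefixes, so the kept
    # paths stay prefix-closed and still describe a single tree.
    return {path for path, c in counts.items() if 2 * c >= len(members)}


def cluster(docs, threshold=0.5):
    """Agglomerative clustering that compares cluster representatives only."""
    clusters = [[tag_paths(d)] for d in docs]   # path sets of each cluster's members
    ids = [[i] for i in range(len(docs))]       # document indices of each cluster
    reps = [representative(c) for c in clusters]
    while len(clusters) > 1:
        # Pick the pair of clusters whose representatives are most similar.
        (i, j), best = max(
            (((a, b), similarity(reps[a], reps[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # Merge cluster j into cluster i (i < j, so popping j keeps i valid),
        # then rebuild the representative of the merged cluster.
        clusters[i] += clusters.pop(j)
        ids[i] += ids.pop(j)
        reps.pop(j)
        reps[i] = representative(clusters[i])
    return ids
```

For example, with `docs = ["<book><title/><author/></book>", "<book><title/><author/><year/></book>", "<movie><title/><director/></movie>", "<movie><title/><director/><cast/></movie>"]`, `cluster(docs)` returns `[[0, 1], [2, 3]]`. Comparing one representative per cluster, rather than all document pairs, is what keeps each merge step cheap. The path-set encoding loses sibling order and repeated children, which a tree-matching approach would preserve.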
CRIS type:
01.01 Journal article