A latent semantic approach to XML clustering by content and structure based on non-negative matrix factorization
Conference Paper
Publication Date:
2013
Abstract:
Non-negative matrix factorization is widely used in text clustering. We investigate its use in the XML domain for clustering XML documents by both structure and content into topically homogeneous groups. Non-negative matrix factorization is performed through an alternating least squares method that incorporates techniques to reduce the computational burden of large-scale factorizations. This is especially relevant when massive text-centric XML corpora are processed. Empirical evidence from a comparative evaluation on real-world XML corpora shows that our approach outperforms several state-of-the-art competitors in effectiveness. © 2013 IEEE.
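To illustrate the general idea of clustering documents with NMF computed by alternating least squares, here is a minimal sketch of plain projected ALS: each half-step solves an unconstrained least-squares problem for one factor and clips negative entries to zero. This is not the authors' algorithm; it omits their large-scale expedients, and the function names (`als_nmf`, `cluster_documents`) and the toy document-by-feature matrix (columns standing in for structure+content features such as term counts under a tag path) are hypothetical.

```python
import numpy as np


def als_nmf(A, k, iters=50, seed=0):
    """Factor a non-negative matrix A (m x n) as W @ H, with W (m x k) >= 0 and H (k x n) >= 0.

    Plain projected alternating least squares (an assumed, generic variant):
    solve an unconstrained least-squares problem for one factor with the
    other held fixed, then clip negative entries to zero.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))  # random non-negative initialization
    for _ in range(iters):
        # min_H ||A - W H||_F, then project onto H >= 0
        H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0.0)
        # min_W ||A^T - H^T W^T||_F, then project onto W >= 0
        W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0.0)
    return W, H


def cluster_documents(A, k, **kwargs):
    """Assign each row (document) of A to the latent topic with the largest weight in W."""
    W, H = als_nmf(A, k, **kwargs)
    return W.argmax(axis=1), W, H


# Hypothetical toy corpus: rows are XML documents, columns are
# structure+content features (e.g. counts of a term under a given tag path).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
])
labels, W, H = cluster_documents(A, k=2)
print(labels)  # documents 0 and 1 share one cluster, documents 2 and 3 the other
```

Clipping after an unconstrained solve is the simplest ALS variant and carries no general convergence guarantee; for large, sparse corpora one would typically switch to sparse matrices and a dedicated non-negative least-squares solver.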
Iris type:
04.01 Contribution in conference proceedings
List of contributors: