Publication Date:
2015
Abstract:
We study XML partitioning via unsupervised topic modeling. We propose a new mixed-membership Bayesian generative model of the latent topics in XML corpora. Approximate posterior inference and parameter estimation are derived for this XML topic model and implemented as a Gibbs sampling algorithm, which is used to infer the topic distributions of the input XML documents. These distributions are then used to partition the whole XML corpus by latent-topic similarity. Experiments on real-world XML corpora show that the approach is more effective than several state-of-the-art competitors.
Iris type:
04.01 Contribution in conference proceedings
Keywords:
XML Clustering; Generative XML Topic Modeling
List of contributors: