Mining Clusters in XML Corpora based on Bayesian Generative Topic Modeling
Conference proceedings contribution
Publication date:
2015
Abstract:
We study XML partitioning via unsupervised topic modeling. We propose a new mixed-membership Bayesian generative model of the latent topics in XML corpora. Approximate posterior inference and parameter estimation are derived for this XML topic model and implemented by a Gibbs sampling algorithm, which is used to infer the topic distributions of the input XML documents. These distributions are in turn used to partition the whole XML corpus by latent-topic similarity. Experiments on real-world XML corpora show that the approach is more effective than several state-of-the-art competitors.
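The pipeline outlined in the abstract (Gibbs sampling to infer per-document topic distributions, then partitioning by topic similarity) can be illustrated with a minimal sketch. It is not the XML topic model proposed in the paper: it assumes plain latent Dirichlet allocation with a collapsed Gibbs sampler over flat token lists, and it groups documents by their dominant topic as a simple stand-in for the partitioning step. The function names, the hyperparameter values, and the "tag path:term" token format are all hypothetical.

```python
import random
from collections import defaultdict


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (an assumption, not the paper's model).

    docs is a list of token lists; returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalised full conditional over topics for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed point estimate of each document's topic distribution.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


def cluster_by_dominant_topic(theta):
    """Assign each document to the topic with the highest probability."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


# Toy corpus: each token is a hypothetical "tag path:term" pair.
toy_docs = [
    ["book/title:bayes", "book/title:topic", "book/author:smith"],
    ["movie/title:alien", "movie/director:scott"],
    ["book/title:topic", "book/author:jones"],
]
toy_theta = gibbs_lda(toy_docs, num_topics=2, iters=50)
toy_labels = cluster_by_dominant_topic(toy_theta)
```

Taking the dominant topic is the simplest way to turn topic distributions into a partition; a distance-based clustering of the distributions would be an alternative design under the same assumptions.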
CRIS type:
04.01 Conference proceedings contribution
Keywords:
XML Clustering; Generative XML Topic Modeling
Author list:
Ortale, Riccardo; Costa, Giovanni