Document clustering meets topic modeling with word embeddings

Conference Paper
Publication Date:
2020
abstract:
We propose a new statistical-learning approach that combines topic modeling and document clustering. Specifically, we develop a Bayesian generative model of text collections in which the two tasks are incorporated as coupled latent factors that govern document wording. Document wording consists of word embeddings, so as to capture the semantic and syntactic regularities among words. A collapsed Gibbs sampler is derived mathematically and implemented algorithmically, along with parameter estimation, in order to jointly perform topic modeling and document clustering through Bayesian reasoning. Comparative tests on real-world benchmark corpora show the effectiveness of the approach in clustering collections of text documents and coherently recovering their semantics.
Iris type:
04.01 Contribution in conference proceedings
Keywords:
Bayesian Text Analysis; Document Clustering; Topic Modeling; Word Embeddings
List of contributors:
Ortale, Riccardo; Costa, Giovanni
Authors of the University:
COSTA GIOVANNI
ORTALE RICCARDO
Handle:
https://iris.cnr.it/handle/20.500.14243/381065
URL

http://www.scopus.com/record/display.url?eid=2-s2.0-85085732531&origin=inward