Document clustering meets topic modeling with word embeddings

Conference proceedings contribution
Publication date:
2020
Abstract:
We propose a new statistical-learning approach to marrying topic modeling and document clustering. In particular, a Bayesian generative model of text collections is developed, in which the two aforementioned tasks are incorporated as coupled latent factors that govern document wording. The latter consists of word embeddings, so as to capture the semantic and syntactic regularities among words. Collapsed Gibbs sampling is derived mathematically and implemented algorithmically, along with parameter estimation, with the aim of jointly performing topic modeling and document clustering through Bayesian reasoning. Comparative tests on real-world benchmark corpora reveal the effectiveness of the devised approach in clustering collections of text documents and coherently recovering their semantics.
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Bayesian Text Analysis; Document Clustering; Topic Modeling; Word Embeddings
Author list:
Ortale, Riccardo; Costa, Giovanni
Institutional authors:
COSTA GIOVANNI
ORTALE RICCARDO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/381065
General Data

URL

http://www.scopus.com/record/display.url?eid=2-s2.0-85085732531&origin=inward