Machine learning techniques for XML (co-)clustering by structure-constrained phrases

Academic Article
Publication Date:
2018
abstract:
A new method is proposed for clustering XML documents by structure-constrained phrases. It is implemented through three machine-learning approaches previously unexplored in the XML domain, namely non-negative matrix (tri-)factorization, co-clustering and automatic transactional clustering. A novel class of XML features approximately captures structure-constrained phrases as n-grams contextualized by root-to-leaf paths. Experiments over real-world benchmark XML corpora show that the effectiveness of the three approaches improves with contextualized n-grams of suitable length. This confirms the validity of the devised method from multiple clustering perspectives. Two of the approaches outperform several state-of-the-art competitors in effectiveness. The scalability of the three approaches is also investigated.
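As an illustration of the feature idea named in the abstract, the following minimal Python sketch pairs each word n-gram found in a leaf element with the root-to-leaf path of tags that leads to it. The abstract does not give the exact feature definition, so the function name `contextualized_ngrams`, the whitespace tokenization, the lowercasing and the decision to ignore attributes and mixed-content text are all assumptions made for this example, not the authors' implementation.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def contextualized_ngrams(xml_text, n=2):
    """Count (root-to-leaf path, word n-gram) pairs in an XML document.

    Hypothetical helper: only text inside leaf elements is considered;
    attributes and tail text of mixed content are ignored (assumption).
    """
    root = ET.fromstring(xml_text)
    features = Counter()

    def walk(node, path):
        path = path + [node.tag]
        children = list(node)
        if not children:
            # Leaf element: tokenize its text naively on whitespace.
            tokens = (node.text or "").lower().split()
            for i in range(len(tokens) - n + 1):
                ngram = tuple(tokens[i:i + n])
                features[("/".join(path), ngram)] += 1
        for child in children:
            walk(child, path)

    walk(root, [])
    return features


doc = (
    "<paper><title>XML clustering by structure</title>"
    "<body><sec>clustering by structure</sec></body></paper>"
)
feats = contextualized_ngrams(doc, n=2)
```

In this sketch the bigram ("clustering", "by") yields two distinct features, one under the path paper/title and one under paper/body/sec, which is the sense in which the same phrase is distinguished by its structural context. The resulting counts could fill a document-by-feature matrix for a clustering method, though how the paper builds and weights that matrix is not stated in the abstract.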
Iris type:
01.01 Journal article
Keywords:
XML; Semi-structured data analysis; XML (co-)clustering by structure and nested text; Structure-constrained phrases; Contextualized n-grams
List of contributors:
Ortale, Riccardo; Costa, Giovanni
Authors of the University:
COSTA GIOVANNI
ORTALE RICCARDO
Handle:
https://iris.cnr.it/handle/20.500.14243/334254
Published in:
INFORMATION RETRIEVAL (BOSTON)
Journal
URL

http://www.scopus.com/record/display.url?eid=2-s2.0-85026813867&origin=inward