Funnelling: A New Ensemble Method for Heterogeneous Transfer Learning and its Application to Cross-Lingual Text Classification

Article
Publication date:
2019
Abstract:
Cross-lingual Text Classification (CLC) consists of automatically classifying, according to a common set C of classes, documents each written in one of a set of languages L, and doing so more accurately than when "naïvely" classifying each document via its corresponding language-specific classifier. In order to obtain an increase in the classification accuracy for a given language, the system thus needs to also leverage the training examples written in the other languages. We tackle "multilabel" CLC via funnelling, a new ensemble learning method that we propose here. Funnelling consists of generating a two-tier classification system where all documents, irrespectively of language, are classified by the same (2nd-tier) classifier. For this classifier all documents are represented in a common, language-independent feature space consisting of the posterior probabilities generated by 1st-tier, language-dependent classifiers. This allows the classification of all test documents, of any language, to benefit from the information present in all training documents, of any language. We present substantial experiments, run on publicly available multilingual text collections, in which funnelling is shown to significantly outperform a number of state-of-the-art baselines. All code and datasets (in vector form) are made publicly available.
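The two-tier idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the base learner (a small logistic regression trained by gradient descent), the one-binary-classifier-per-class treatment of the multilabel setting, the function names (`train_logreg`, `fit_funnelling`, `classify`), and the toy data are all assumptions made for illustration. As a further simplification, the 2nd-tier classifier is trained on posteriors that the 1st-tier classifiers produce for their own training documents; posteriors obtained by cross-validation would be a natural alternative.

```python
import math

def predict_proba(model, x):
    """Posterior probability of the positive class under a logistic model."""
    weights, bias = model
    z = bias + sum(w * v for w, v in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=500, lr=0.5):
    """Fit a binary logistic regression by batch gradient descent."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, target in zip(X, y):
            err = predict_proba((weights, bias), x) - target
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias

def posteriors(models, x):
    """Map a language-specific vector to a |C|-dimensional posterior vector."""
    return [predict_proba(m, x) for m in models]

def fit_funnelling(train, n_classes):
    """train maps each language to (X, Y): feature vectors and 0/1 label vectors."""
    first_tier, Z, Y_all = {}, [], []
    for lang, (X, Y) in train.items():
        # 1st tier: one binary classifier per class, specific to this language.
        first_tier[lang] = [train_logreg(X, [y[c] for y in Y])
                            for c in range(n_classes)]
        # Project the training documents into the shared posterior space.
        Z.extend(posteriors(first_tier[lang], x) for x in X)
        Y_all.extend(Y)
    # 2nd tier: one classifier per class, trained on documents of all languages.
    second_tier = [train_logreg(Z, [y[c] for y in Y_all])
                   for c in range(n_classes)]
    return first_tier, second_tier

def classify(model, lang, x, threshold=0.5):
    """Classify a document of language `lang` through both tiers."""
    first_tier, second_tier = model
    z = posteriors(first_tier[lang], x)
    return [int(predict_proba(m, z) >= threshold) for m in second_tier]

# Toy data (invented): the two languages have feature spaces of different sizes.
train = {
    "en": ([[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 0]],
           [[1, 0], [1, 0], [0, 1], [0, 1]]),
    "it": ([[1, 0], [0, 1], [1, 0], [0, 1]],
           [[1, 0], [0, 1], [1, 0], [0, 1]]),
}
model = fit_funnelling(train, n_classes=2)
print(classify(model, "en", [1, 0, 0]))
print(classify(model, "it", [0, 1]))
```

The point of the sketch is that the 2nd-tier classifier never sees language-specific features: every document, whatever its language, reaches it as a vector with one posterior probability per class, so a single classifier can be trained on the training documents of all languages at once.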
CRIS type:
01.01 Journal article
Keywords:
E-discovery; Technology-Assisted Review; Utility Theory; Semi-automated Text Classification
Author list:
Esuli, Andrea; MOREO FERNANDEZ, ALEJANDRO DAVID; Sebastiani, Fabrizio
Institutional authors:
ESULI ANDREA
MOREO FERNANDEZ ALEJANDRO DAVID
SEBASTIANI FABRIZIO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/360765
Link to the full text:
https://iris.cnr.it//retrieve/handle/20.500.14243/360765/23367/prod_403485-doc_159212.pdf
Published in:
ACM TRANSACTIONS ON INFORMATION SYSTEMS
Journal
General data

URL

https://dl.acm.org/doi/abs/10.1145/3326065