Utility-theoretic ranking for semiautomated text classification

Article
Publication date:
2015
Abstract:
Semiautomated Text Classification (SATC) may be defined as the task of ranking a set D of automatically labelled textual documents in such a way that, if a human annotator validates (i.e., inspects and corrects where appropriate) the documents in a top-ranked portion of D with the goal of increasing the overall labelling accuracy of D, the expected increase is maximized. An obvious SATC strategy is to rank D so that the documents that the classifier has labelled with the lowest confidence are top ranked. In this work, we show that this strategy is suboptimal. We develop new utility-theoretic ranking methods based on the notion of validation gain, defined as the improvement in classification effectiveness that would result from validating a given automatically labelled document. We also propose a new effectiveness measure for SATC-oriented ranking methods, based on the expected reduction in classification error brought about by partially validating a list generated by a given ranking method. We report the results of experiments showing that, with respect to the baseline method mentioned earlier, and according to the proposed measure, our utility-theoretic ranking methods can achieve substantially higher expected reductions in classification error.
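The contrast drawn in the abstract, between ranking by lowest confidence and ranking by expected validation gain, can be illustrated with a minimal Python sketch. It assumes a binary classifier that outputs calibrated probabilities. The names `Doc`, `rank_by_confidence`, `rank_by_expected_gain`, `error_reduction_at`, and the per-error-type gains `gain_fp` and `gain_fn` are hypothetical. The expected-gain ranking is a simplified stand-in for the utility-theoretic methods (probability of error times an assumed gain for correcting a false positive or a false negative), not the paper's formulation. Likewise, `error_reduction_at` is a plain "fraction of errors fixed at a fixed validation depth", not the paper's proposed measure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    prob_positive: float  # assumed calibrated P(document belongs to the class)
    predicted: bool       # label assigned by the classifier
    true_label: bool      # gold label, used only to evaluate a ranking


def confidence(doc: Doc) -> float:
    # Probability that the assigned label is correct.
    return doc.prob_positive if doc.predicted else 1.0 - doc.prob_positive


def rank_by_confidence(docs: list[Doc]) -> list[Doc]:
    # Baseline from the abstract: least confident labels first.
    return sorted(docs, key=confidence)


def rank_by_expected_gain(docs: list[Doc], gain_fp: float, gain_fn: float) -> list[Doc]:
    # Hypothetical simplification: expected gain = P(error) * gain of fixing that error type.
    # A positive prediction can only be a false positive; a negative one, a false negative.
    def expected_gain(doc: Doc) -> float:
        p_error = 1.0 - confidence(doc)
        return p_error * (gain_fp if doc.predicted else gain_fn)

    return sorted(docs, key=expected_gain, reverse=True)


def error_reduction_at(ranking: list[Doc], k: int) -> float:
    # Fraction of all labelling errors that validating the top-k documents would correct.
    total_errors = sum(d.predicted != d.true_label for d in ranking)
    if total_errors == 0:
        return 0.0
    fixed = sum(d.predicted != d.true_label for d in ranking[:k])
    return fixed / total_errors


# Toy data (invented for illustration): only "a" is mislabelled (a false negative).
docs = [
    Doc("a", prob_positive=0.40, predicted=False, true_label=True),
    Doc("b", prob_positive=0.55, predicted=True, true_label=True),
    Doc("c", prob_positive=0.05, predicted=False, true_label=False),
]

baseline = rank_by_confidence(docs)
utility = rank_by_expected_gain(docs, gain_fp=1.0, gain_fn=3.0)
print([d.doc_id for d in baseline], error_reduction_at(baseline, k=1))  # ['b', 'a', 'c'] 0.0
print([d.doc_id for d in utility], error_reduction_at(utility, k=1))    # ['a', 'b', 'c'] 1.0
```

On this toy data the least-confident-first baseline spends its first validation on a document that was already labelled correctly. Weighting the error probability by the (assumed) higher value of fixing a false negative puts the actual error at the top instead.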
CRIS type:
01.01 Journal article
Keywords:
Semiautomated classification
Authors:
Berardi, Giacomo; Esuli, Andrea; Sebastiani, Fabrizio
Institutional authors:
ESULI ANDREA
SEBASTIANI FABRIZIO
Full record:
https://iris.cnr.it/handle/20.500.14243/289676
Full text:
https://iris.cnr.it//retrieve/handle/20.500.14243/289676/94222/prod_332904-doc_156911.pdf
Published in:
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA
Journal
General data

URL

http://dl.acm.org/citation.cfm?doid=2808688.2742548