Utility-theoretic ranking for semiautomated text classification

Academic Article
Publication Date:
2015
abstract:
Semiautomated Text Classification (SATC) may be defined as the task of ranking a set D of automatically labelled textual documents so that, if a human annotator validates (i.e., inspects and, where appropriate, corrects) the documents in a top-ranked portion of D with the goal of increasing the overall labelling accuracy of D, the expected increase is maximized. An obvious SATC strategy is to rank D so that the documents the classifier has labelled with the lowest confidence are ranked at the top. In this work, we show that this strategy is suboptimal. We develop new utility-theoretic ranking methods based on the notion of validation gain, defined as the improvement in classification effectiveness that would result from validating a given automatically labelled document. We also propose a new effectiveness measure for SATC-oriented ranking methods, based on the expected reduction in classification error brought about by partially validating a list generated by a given ranking method. We report the results of experiments showing that, compared with the confidence-based baseline described above, and according to the proposed measure, our utility-theoretic ranking methods can achieve substantially higher expected reductions in classification error.
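The two ranking strategies contrasted in the abstract can be illustrated with a small Python sketch. It is a toy, not the authors' formulation: it assumes binary labels, calibrated posterior probabilities P(positive | d), F1 as the effectiveness measure, and a simplified validation gain equal to the probability that a document's label is wrong times the F1 improvement from correcting one false positive or one false negative in the expected contingency table. All function names are hypothetical, and `f1_after_validation` is a plain illustration, not the effectiveness measure proposed in the article.

```python
"""Toy sketch: confidence-based vs. utility-based ranking for SATC.

Assumptions (not taken from the paper): binary labels, calibrated
posteriors P(positive | d), F1 as the effectiveness measure, and a
simplified per-document validation gain.
"""
from typing import List, Sequence


def f1(tp: float, fp: float, fn: float) -> float:
    """F1 from (possibly fractional) contingency counts; 1.0 if all are zero."""
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def confidence_ranking(probs: Sequence[float]) -> List[int]:
    """Baseline: least confident documents (p closest to 0.5) first."""
    return sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))


def utility_ranking(probs: Sequence[float]) -> List[int]:
    """Rank by a simplified expected F1 gain from validating each document."""
    preds = [p >= 0.5 for p in probs]
    # Expected contingency counts, treating each p as a calibrated probability.
    tp = sum(p for p, y in zip(probs, preds) if y)
    fp = sum(1 - p for p, y in zip(probs, preds) if y)
    fn = sum(p for p, y in zip(probs, preds) if not y)
    base = f1(tp, fp, fn)
    # Gain of turning one false positive into a true negative ...
    gain_fp = f1(tp, max(fp - 1, 0.0), fn) - base
    # ... and of turning one false negative into a true positive.
    gain_fn = f1(tp + 1, fp, max(fn - 1, 0.0)) - base

    def expected_gain(i: int) -> float:
        p = probs[i]
        # P(label is wrong) times the gain obtained if it is corrected.
        return (1 - p) * gain_fp if preds[i] else p * gain_fn

    return sorted(range(len(probs)), key=expected_gain, reverse=True)


def f1_after_validation(ranking: Sequence[int], probs: Sequence[float],
                        truths: Sequence[bool], k: int) -> float:
    """True F1 after an annotator corrects the top-k documents of `ranking`."""
    labels = [p >= 0.5 for p in probs]
    for i in ranking[:k]:
        labels[i] = truths[i]
    tp = sum(1 for lab, t in zip(labels, truths) if lab and t)
    fp = sum(1 for lab, t in zip(labels, truths) if lab and not t)
    fn = sum(1 for lab, t in zip(labels, truths) if not lab and t)
    return f1(tp, fp, fn)


if __name__ == "__main__":
    probs = [0.95, 0.55, 0.35, 0.05, 0.05, 0.05]
    truths = [True, False, True, False, False, False]
    base_rank = confidence_ranking(probs)
    util_rank = utility_ranking(probs)
    print(base_rank[0], util_rank[0])  # 1 2
    print(round(f1_after_validation(base_rank, probs, truths, 1), 3))  # 0.667
    print(round(f1_after_validation(util_rank, probs, truths, 1), 3))  # 0.8
```

In this toy data the baseline validates document 1 first (the one closest to 0.5), while the utility-based ranking prefers document 2, a likely false negative; here, correcting a false negative is worth more F1 than correcting a false positive, so validating that single document raises F1 to 0.8 rather than about 0.67.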
Iris type:
01.01 Journal article
Keywords:
Semiautomated classification
List of contributors:
Berardi, Giacomo; Esuli, Andrea; Sebastiani, Fabrizio
Authors of the University:
ESULI ANDREA
SEBASTIANI FABRIZIO
Handle:
https://iris.cnr.it/handle/20.500.14243/289676
Full Text:
https://iris.cnr.it//retrieve/handle/20.500.14243/289676/94222/prod_332904-doc_156911.pdf
Published in:
ACM Transactions on Knowledge Discovery from Data (journal)
URL:

http://dl.acm.org/citation.cfm?doid=2808688.2742548