
Integrated use of KOS and deep learning for data set annotation in tourism domain

Article
Publication date:
2023
Abstract:
Purpose: This paper proposes a methodology for enriching and tailoring a knowledge organization system (KOS) to support the information extraction (IE) task in the analysis of tourism-domain documents. In particular, the KOS is used to develop a named entity recognition (NER) system.
Design/methodology/approach: A method for improving and customizing an available thesaurus by leveraging documents related to tourism in Italy is presented first. The resulting thesaurus is then used to create an annotated NER corpus, combining distant supervision, deep learning and light human supervision.
Findings: The study shows that a customized KOS can effectively support IE tasks when applied to documents belonging to the same domains and types used for its construction. Moreover, the KOS proves very useful for supporting and easing the annotation task under the proposed methodology, allowing a corpus to be annotated with a fraction of the effort required for manual annotation.
Originality/value: The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.
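The distant supervision step mentioned in the abstract can be illustrated with a minimal sketch, assuming that thesaurus terms are matched against tokenized text and turned into BIO tags. This is not the authors' implementation: the thesaurus entries, the entity labels and the function name `distant_annotate` are hypothetical, and the deep learning and human review steps are not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical thesaurus: lowercased term tokens -> entity label.
# The entries and labels are illustrative, not taken from the paper's KOS.
THESAURUS: Dict[Tuple[str, ...], str] = {
    ("hotel",): "ACCOMMODATION",
    ("bed", "and", "breakfast"): "ACCOMMODATION",
    ("amalfi", "coast"): "LOCATION",
}


def distant_annotate(
    tokens: List[str], thesaurus: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Tag tokens with BIO labels using greedy longest-match lookup."""
    max_len = max((len(term) for term in thesaurus), default=0)
    lowered = [tok.lower() for tok in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = thesaurus.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = "We booked a bed and breakfast on the Amalfi Coast".split()
    for token, tag in zip(sentence, distant_annotate(sentence, THESAURUS)):
        print(f"{token}\t{tag}")
```

Tags produced this way are noisy (ambiguous terms and missing entries both cause errors), which is why a pipeline of this kind would normally refine them with a trained model and human review.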
CRIS type:
01.01 Journal article
Keywords:
KOS; Named entity recognition; Annotation; Distant supervision; Information extraction; Active learning
Author list:
Aracri, Giovanna; Silvestri, Stefano
Institutional authors:
ARACRI GIOVANNA
SILVESTRI STEFANO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/435534
Published in:
JOURNAL OF DOCUMENTATION
Journal

General data

URL

https://www.emerald.com/insight/content/doi/10.1108/JD-02-2023-0019/full/html