Tuning SyntaxNet for POS Tagging Italian Sentences

Conference proceedings contribution
Publication date:
2017
Abstract:
SyntaxNet is the NLP framework released by Google in 2016, claimed by its authors to be the most accurate dependency parser across 40 languages besides English. It relies on a transition-based model implementing POS tagger and dependency parser modules. SyntaxNet is released with its source code, so it can be trained and configured differently from the pre-trained models already provided. In this work, we present a case study investigating how to refine the Google SyntaxNet NLP framework for the Italian language. In particular, we describe a procedure for tuning the native SyntaxNet model to address some shortcomings that emerged during preliminary tests. We mainly customized the original model for the Italian POS tagging task by exploiting a particularly interesting dataset for training and by testing a number of network configurations different from the original one released by Google. In detail, different sets of features are included through a forward selection approach, starting from the simplest possible configuration. A discussion comparing our results with the current SyntaxNet state of the art is provided, showing how network performance is influenced by different feature types. Finally, some tests are performed by further changing network settings, in order to find ways to avoid the shortcomings of the original implementation, with a view to potential deployment in real-time applications.
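The forward selection approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the feature group names (`words`, `suffixes`, `prefixes`, `tags`), the `toy_score` function, and all of its numbers are hypothetical placeholders standing in for "train a tagger with this feature set and measure its accuracy". The sketch only shows the greedy loop: start from the simplest configuration and, at each step, add the single feature group that improves the score the most.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_selection(
    candidates: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], int],
    min_gain: int = 0,
) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Greedy forward selection over feature groups.

    `evaluate` is assumed to return a score (higher is better) for a
    given set of feature groups, e.g. correctly tagged tokens on a
    held-out set.
    """
    selected: List[str] = []
    remaining = list(candidates)
    best_score = evaluate(frozenset())  # simplest configuration
    history: List[Tuple[str, int]] = []

    while remaining:
        # Score every configuration that adds exactly one more group.
        scored = [(evaluate(frozenset(selected + [g])), g) for g in remaining]
        top_score, top_group = max(scored, key=lambda pair: pair[0])
        if top_score - best_score <= min_gain:
            break  # no candidate improves the score enough
        selected.append(top_group)
        remaining.remove(top_group)
        best_score = top_score
        history.append((top_group, top_score))

    return selected, history


# Hypothetical stand-in for training and evaluating a tagger.
# The numbers are invented for illustration and are NOT results
# from the paper.
_TOY_GAINS = {"words": 100, "suffixes": 40, "prefixes": 10, "tags": 0}


def toy_score(features: FrozenSet[str]) -> int:
    """Pretend count of correctly tagged tokens out of 1000."""
    return 800 + sum(_TOY_GAINS[f] for f in features)


if __name__ == "__main__":
    chosen, steps = forward_selection(list(_TOY_GAINS), toy_score)
    print(chosen)
    print(steps)
```

In a real experiment, `evaluate` would train and score a model for each candidate configuration, so the number of training runs grows roughly quadratically with the number of feature groups; that cost is the usual trade-off of forward selection against exhaustive search.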
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Machine Learning; NLP; SyntaxNet; POS tagging; Cognitive Computing; Neural Networks.
Author list:
Esposito, Massimo; Pota, Marco
Institution-affiliated authors:
ESPOSITO MASSIMO
POTA MARCO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/332110