Publication date:
2018
Abstract:
Part-of-speech (POS) tagging is a Natural Language Processing (NLP) technique that is highly relevant to Question Answering systems, and it becomes more complex when these systems operate on spoken language. In the use case considered here, spoken Italian, enclitic forms are very difficult to tag, since they consist of one or more pronouns appended as suffixes to verbs. This work describes a case study investigating how to refine SyntaxNet, the NLP framework released by Google, to tag enclitic forms in Italian efficiently. First, a forward selection of different features is presented, aimed at assessing their influence on the POS tagging performance of SyntaxNet in Italian. Second, further features are added, as suggested by the morphological rules that characterize Italian enclitics, in order to improve POS tagging performance. Finally, a qualitative and quantitative evaluation on sentences from real spoken dialogs is performed, showing very promising results.
CRIS type:
02.01 Contribution in volume (Chapter or Essay)
Keywords:
POS tagging; Italian language; Natural Language Processing
Author list:
Guarasci, Raffaele
Book title:
Advances on P2P, Parallel, Grid, Cloud and Internet Computing