Towards a gold standard dataset for Open Information Extraction in Italian

Contribution in conference proceedings

Publication date:

2019

Abstract:

Although Open Information Extraction (OIE) has emerged in recent years as one of the most suitable techniques for handling the growing volume of textual data, it still has many limitations. Existing approaches target the English language almost exclusively and rely on heuristics without a rigorous formalization of the language. Moreover, they do not share a common dataset for validating and measuring their performance. To overcome these limitations, this work describes the creation of the first gold standard dataset for the validation of OIE approaches in Italian. The dataset was built manually on solid linguistic foundations and then used to test an OIE application for the Italian language. The presented resource aims not only to support the estimation of OIE performance, but also to serve as the first dataset for grammaticality/acceptability judgments in Italian.

CRIS type:

04.01 Contribution in conference proceedings

Keywords:

open information extraction; Italian language; acceptability judgements; natural language processing

List of authors: