PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification
Conference proceedings paper
Publication date:
2016
Abstract:
In this paper we present PaCCSS-IT, a Parallel Corpus of Complex-Simple Sentences for ITalian. To build the resource, we develop a new method for automatically acquiring a corpus of complex-simple sentence pairs that captures structural transformations and is particularly suited to text simplification. The method requires a large amount of text, which can easily be extracted from the web, making it suitable for less-resourced languages as well. We test it on Italian and make available the largest Italian corpus for automatic text simplification.
CRIS type:
04.01 Conference proceedings paper
Keywords:
Automatic Text Simplification; Sentence alignment; Italian corpus
Authors:
Venturi, Giulia; Cimino, Andrea; Brunato, Dominique Pierina; Dell'Orletta, Felice
Link to the full record: