Publication Date:
2021
Abstract:
In this paper, we propose an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and sizes and test the resulting models in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model related to specific punctuation marks.
CRIS Type:
04.01 Conference proceedings contribution
Keywords:
transformer models; NLP; punctuation restoration
Author List:
Miaschi, Alessio; Ravelli, Andrea Amelio; Dell'Orletta, Felice
Link to Full Record:
Published in: