
Galliz at GeoLingIt: enhancing BERT with vocabulary knowledge for predicting the region of language varieties of Italy

Conference proceedings contribution
Publication date:
2023
Abstract:
The linguistic diversity of the Italian peninsula and its islands, characterized by several language varieties, represents a linguistic condition and a cultural treasure unique in Europe. However, the oral nature of these varieties poses a challenge to their preservation in the written form. While significant research efforts have been dedicated to standard Italian language processing, less attention has been given to the language varieties of Italy and the development of supporting resources. This paper aims to study the peculiarities of language varieties of Italy and identify the region of origin of tweets written in non-[Standard Italian] varieties. To achieve this goal, we utilized two main techniques: fine-tuning a language model (BERT) and implementing an algorithm that utilizes dictionaries of regional varieties and word frequency. Our results show that integrating lexical analysis with BERT could be a promising approach for this particular task. We present an overview of the data, methodology, and evaluation results, then discuss the implications of our findings.
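The abstract mentions an algorithm built on dictionaries of regional varieties and word frequency, but does not spell it out. The following is only a minimal sketch of what such a lexicon-based scorer could look like, not the author's implementation. The lexicon entries, the weighting scheme, and the names `REGION_LEXICONS`, `score_regions`, and `predict_region` are all assumptions made for illustration. How these scores would be combined with the fine-tuned BERT model is not stated in the abstract and is not shown here.

```python
import re
from collections import Counter

# Toy lexicons for illustration only (assumption): the dictionaries used in
# the paper are not reproduced here.
REGION_LEXICONS = {
    "Lazio": {"daje", "annamo", "mo"},
    "Campania": {"jamme", "guagliò", "mo"},
    "Veneto": {"ostrega", "mona"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score_regions(text, lexicons=REGION_LEXICONS):
    """Score each region by the dictionary words found in the text.

    Each matching word contributes its frequency in the text, divided by
    the number of regional lexicons that list it, so words shared across
    regions count for less (an assumed weighting, not the paper's).
    """
    counts = Counter(tokenize(text))
    scores = {}
    for region, words in lexicons.items():
        total = 0.0
        for word, freq in counts.items():
            if word in words:
                spread = sum(1 for other in lexicons.values() if word in other)
                total += freq / spread
        scores[region] = total
    return scores


def predict_region(text, lexicons=REGION_LEXICONS):
    """Return the highest-scoring region, or None if no dictionary word matches."""
    scores = score_regions(text, lexicons)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

In this sketch, a tweet with no dictionary hits yields `None`, which is the case where a fallback to a neural classifier would be the natural choice.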
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Natural Language Processing; Language varieties; Tweets classification
Authors:
Gallo, Simone
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/451787
Link to the full text:
https://iris.cnr.it//retrieve/handle/20.500.14243/451787/132688/prod_489916-doc_204414.pdf
Book title:
Proceedings of the Eighth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2023)
Published in:
CEUR WORKSHOP PROCEEDINGS
Series
General data

URL

https://ceur-ws.org/Vol-3473/paper15.pdf