Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers

Conference proceedings contribution

Publication date:

2023

Abstract:

From a few-shot learning perspective, we propose a strategy to enrich the latent semantics of the text in the dataset provided for Profiling Cryptocurrency Influencers with Few-shot Learning, a task hosted at PAN@CLEF2023. Our approach is based on data augmentation through backtranslation to and from Japanese. We translate each sample in the original training dataset into a target language (i.e., Japanese) and then translate it back into English. The original sample and the backtranslated one are then merged. We then fine-tune two state-of-the-art Transformer models on this augmented version of the training dataset. We evaluate the performance of the two fine-tuned models using Macro and Micro F1, in accordance with the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872, respectively, on the original training set. Our best submission obtained a Macro F1 of 0.3851 on the official test set.
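The enrichment and evaluation steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not say which translation system was used or how the two texts are merged, so the translators are passed in as plain functions and the merge is assumed to be a simple concatenation. The names `enrich` and `f1_scores` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# A translator is any function mapping a string to a string.
# The actual translation system is not named in the abstract.
Translator = Callable[[str], str]


def enrich(text: str, to_japanese: Translator, to_english: Translator,
           sep: str = " ") -> str:
    """Merge a sample with its English -> Japanese -> English backtranslation.

    Assumption: "merged" means concatenation with a separator.
    """
    backtranslated = to_english(to_japanese(text))
    return text + sep + backtranslated


def f1_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for single-label classification."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t: int, p: int, n: int) -> float:
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    # Macro F1: unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro F1: F1 computed over the pooled counts of all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

In practice, `to_japanese` and `to_english` would wrap a machine translation model or service, and the enriched samples would replace the originals in the training set before fine-tuning. Macro F1 weights every class equally, which matters when only a few labelled examples per class are available.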

CRIS type:

04.01 Conference proceedings contribution

Keywords:

cryptocurrency influencers; data augmentation; author profiling; text classification; Twitter; text enrichment

Author list:

Siino, Marco; Tesconi, Maurizio

Institutional authors:

TESCONI MAURIZIO

Link to the full record:

https://iris.cnr.it/handle/20.500.14243/452010