Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers
Conference proceedings paper
Publication date:
2023
Abstract:
From a few-shot learning perspective, we propose a strategy to enrich the latent semantics of the text in the dataset provided for Profiling Cryptocurrency Influencers with Few-shot Learning, a task hosted at PAN@CLEF2023. Our approach is based on data augmentation through backtranslation via Japanese. We translate each sample in the original training dataset into the target language (i.e., Japanese) and then translate it back into English. The original sample and its backtranslated version are then merged. We fine-tune two state-of-the-art Transformer models, ELECTRA and XLNet, on this augmented version of the training dataset and evaluate them using Macro and Micro F1, in accordance with the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872, respectively, on the original training set. Our best submission obtained a Macro F1 of 0.3851 on the official test set.
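The enrichment step and the two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which translation service was used or how the two texts were merged, so the `translate` callable, the placeholder `toy_translate`, and the merge by concatenation with a space are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def backtranslate(text: str, translate: Translator, pivot: str = "ja") -> str:
    """Translate English text into the pivot language, then back into English."""
    pivot_text = translate(text, "en", pivot)
    return translate(pivot_text, pivot, "en")


def enrich(samples: Sequence[str], translate: Translator, sep: str = " ") -> List[str]:
    """Merge each sample with its backtranslation (concatenation is an assumption)."""
    return [s + sep + backtranslate(s, translate) for s in samples]


def f1_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Compute Macro F1 (mean of per-class F1) and Micro F1 (from pooled counts)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t: int, p: int, n: int) -> float:
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return {"macro": macro, "micro": micro}


def toy_translate(text: str, source: str, target: str) -> str:
    """Placeholder for a real translation system: it only changes letter case."""
    return text.upper() if target == "ja" else text.lower()


if __name__ == "__main__":
    print(enrich(["Buy BTC now"], toy_translate))
    print(f1_scores(["a", "a", "b", "c"], ["a", "b", "b", "c"]))
```

In practice `toy_translate` would be replaced by a call to an actual machine translation system, and the enriched texts would be used as the fine-tuning input. Macro F1 weights every class equally, while Micro F1 pools the counts over all classes, so the two can diverge when classes are imbalanced.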
CRIS type:
04.01 Conference proceedings paper
Keywords:
cryptocurrency influencers; data augmentation; author profiling; text classification; Twitter; text enrichment
Authors:
Siino, Marco; Tesconi, Maurizio