Publication Date:
2021
abstract:
We take a collection of short texts, some human-written and others automatically generated, and ask subjects, who are unaware of each text's source, whether they perceive it as human-produced. We use this data to fine-tune a GPT-2 model, pushing it to generate more human-like texts, and observe that the output of this fine-tuned model is indeed perceived as more human-like than that of the original model. In parallel, we show that our automatic evaluation strategy correlates well with human judgements. We also run a linguistic analysis to unveil the characteristics of human- vs. machine-perceived language.
Iris type:
04.01 Contribution in conference proceedings
Keywords:
Natural Language Generation; Neural Language Models; Evaluation
List of contributors:
Dell'Orletta, Felice
Book title:
Proceedings of the First Workshop on Generation Evaluation and Metrics (GEM 2021)