Investigating topic-agnostic features for authorship tasks in Spanish political speeches
Conference proceedings contribution
Publication date:
2022
Abstract:
Authorship Identification is the branch of authorship analysis concerned with uncovering the author of a written document. Methods devised for Authorship Identification typically employ stylometry (the analysis of unconscious traits that authors exhibit while writing), and are expected not to base their inferences on the topics the authors usually write about (as reflected in their past production). In this paper, we present a series of experiments evaluating the use of feature sets based on rhythmic and psycholinguistic patterns for Authorship Verification and Attribution in Spanish political language, via different text distortion approaches used to actively mask the underlying topic. We feed these feature sets to an SVM learner and show that they lead to results comparable to those obtained by the BETO transformer when the latter is trained on the original text, i.e., when it can potentially learn from topical information.
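The abstract does not say which text distortion variants were used, so the following is only a minimal sketch of the general idea, under stated assumptions: every word outside a small, hypothetical list of Spanish function words is replaced by asterisks of the same length, which hides topical vocabulary while preserving word lengths and punctuation. The names `FUNCTION_WORDS` and `distort` are illustrative and do not come from the paper.

```python
import re

# Hypothetical, deliberately tiny list of Spanish function words.
# A real experiment would rely on a much larger, curated list.
FUNCTION_WORDS = {
    "el", "la", "los", "las", "de", "que", "y", "en",
    "a", "un", "una", "no", "se", "por", "con", "para",
}


def distort(text, keep=FUNCTION_WORDS):
    """Mask every word not in `keep` with '*' characters of equal length.

    Punctuation, digits and whitespace are left untouched.
    """
    def mask(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)

    # [^\W\d_]+ matches runs of letters, including accented ones.
    return re.sub(r"[^\W\d_]+", mask, text)


if __name__ == "__main__":
    print(distort("El gobierno y la gente"))
```

Features such as word-length sequences or function-word frequencies could then be computed on the distorted text before being passed to a classifier; how the authors actually combine these steps is not stated in this record.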
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Authorship identification; Text distortion; Political speech
Author list:
MOREO FERNANDEZ, ALEJANDRO DAVID
Book title:
Natural Language Processing and Information Systems