Publication Date:
2018
Abstract:
The goal of Language IDentification (LID) is to quickly and accurately identify the language in which a text is written. LID plays an important role in several Natural Language Processing (NLP) applications, where it is frequently used as a pre-processing step. For example, information retrieval systems use LID as a filter so that users are shown only documents written in a given language. Although different approaches to this problem have been proposed, similar language identification, particularly when applied to short texts, remains a challenging task in NLP. This paper presents a method that combines word vector representations with Long Short-Term Memory (LSTM) networks. An experimental evaluation on well-known public datasets shows that the proposed method improves the accuracy and precision of language identification.
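The abstract does not specify the architecture in detail. As a rough illustration only, the following pure-Python sketch assumes one common arrangement. Each whitespace-separated word is mapped to a vector, a single-layer LSTM reads the sequence, and a softmax over the final hidden state scores the candidate languages. The class name `TinyLstmLid`, the dimensions, the tokenization, and the random untrained weights are all hypothetical. This is not the authors' model.

```python
import math
import random

# Hypothetical sketch (not the paper's model): word embeddings -> LSTM -> softmax.
# Weights are random and untrained; the code only shows how data flows.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class TinyLstmLid:
    def __init__(self, vocab, languages, emb_dim=8, hidden=16, seed=0):
        rng = random.Random(seed)
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.languages = list(languages)
        self.hidden = hidden
        # Word-vector lookup table; row 0 is reserved for unknown words.
        self.emb = rand_matrix(len(vocab) + 1, emb_dim, rng)
        # One weight matrix per gate (input, forget, output, candidate),
        # applied to the concatenation [x_t; h_{t-1}].
        self.gates = {k: rand_matrix(hidden, emb_dim + hidden, rng) for k in "ifoc"}
        self.bias = {k: [0.0] * hidden for k in "ifoc"}
        self.bias["f"] = [1.0] * hidden  # common forget-gate bias initialization
        self.out = rand_matrix(len(self.languages), hidden, rng)

    def embed(self, word):
        # Known word at index i -> row i + 1; unknown word -> row 0.
        return self.emb[self.vocab.get(word, -1) + 1]

    def encode(self, tokens):
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for tok in tokens:
            xh = self.embed(tok) + h  # list concatenation, no mutation
            pre = {k: [a + b for a, b in zip(matvec(self.gates[k], xh), self.bias[k])]
                   for k in "ifoc"}
            i_gate = [sigmoid(v) for v in pre["i"]]
            f_gate = [sigmoid(v) for v in pre["f"]]
            o_gate = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            # Cell update: c_t = f * c_{t-1} + i * candidate; h_t = o * tanh(c_t).
            c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(ct) for o, ct in zip(o_gate, c)]
        return h

    def predict_proba(self, text):
        h = self.encode(text.lower().split())
        logits = matvec(self.out, h)
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return {lang: e / total for lang, e in zip(self.languages, exps)}

model = TinyLstmLid(vocab=["the", "cat", "il", "gatto"], languages=["en", "it"])
probs = model.predict_proba("il gatto dorme")
print(max(probs, key=probs.get))  # untrained, so the predicted label is arbitrary
```

In a real system the embedding table would typically be initialized from pre-trained word vectors, and all weights would be learned from labelled data. Using the final hidden state as the sentence summary is one simple design choice. Pooling over all hidden states is another.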
Iris type:
04.01 Contribution in conference proceedings
Keywords:
Language Identification; Word Embedding; Natural Language Processing; Deep Neural Network; Long Short-Term Memory; Recurrent Neural Network
List of contributors: