Language Identification of Similar Languages using Recurrent Neural Networks

Conference Paper

Publication Date:

2018

Abstract:

The goal of Language IDentification (LID) is to quickly and accurately identify the language in which a text is written. It plays an important role in several Natural Language Processing (NLP) applications, where it is frequently used as a pre-processing step. For example, information retrieval systems use LID as a filter to provide users only with documents written in a given language. Although different approaches to this problem have been proposed, the identification of similar languages, particularly in short texts, remains a challenging task in NLP. In this paper, a method that combines word vector representations with Long Short-Term Memory (LSTM) networks is implemented. The experimental evaluation on well-known public datasets shows that the proposed method improves the accuracy and precision of language identification.

IRIS type:

04.01 Contribution in conference proceedings

Keywords:

Language Identification; Word embedding; Natural Language Processing; Deep neural network; Long Short-Term Memory; Recurrent Neural Network.

List of contributors:

Ruffolo, Massimo; Oro, Ermelinda

Authors of the University:

ORO ERMELINDA

RUFFOLO MASSIMO

Handle:

https://iris.cnr.it/handle/20.500.14243/326288