Publication date:
2003
Abstract:
Information other than that usually found in machine-readable
dictionaries or manually encoded by lexicographers is urgently needed.
Different sources must be exploited if we want to overcome the lexical
bottleneck of Natural Language Processing. Very interesting data can
be found by processing large textual corpora, where the actual usage of
the language can be truly investigated. These data typically concern
various kinds of syntagmatic relations, which are particularly
problematic in many NLP applications. The paper describes how these
data can be at least partially extracted by processing and analysing large
text corpora with quantitative/statistical methods. We describe two types of
quantitative analyses whose aim is to extract information on the strength
of association between two words, and on fixed phrases and idioms. We
observe how the association ratio provides quantitative
evidence for a number of lexical, syntactic and semantic relationships
between word pairs. One of the claims is that the linguistic information
embodied in all these quite different types of lexical collocations can be
helpful for lexical disambiguation in analysis and crucial for lexical
selection in generation. This is a step towards a more objective
lexicography and a more data-based linguistics.
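As a rough illustration of the kind of measure the abstract refers to, the sketch below computes an association score for ordered word pairs. It assumes the association ratio is meant in the commonly used mutual-information sense, log2(P(x, y) / (P(x) P(y))), with probabilities estimated from raw corpus counts. The function name `association_ratios` and the `span` parameter are hypothetical choices for this example and are not taken from the paper.

```python
import math
from collections import Counter


def association_ratios(tokens, span=4):
    """Score ordered word pairs (x, y) where y occurs within `span` tokens after x.

    Assumption: score = log2(P(x, y) / (P(x) * P(y))), with all
    probabilities estimated as counts divided by the corpus size.
    """
    n = len(tokens)
    word_freq = Counter(tokens)

    # Count how often y appears among the `span` tokens following x.
    pair_freq = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + span]:
            pair_freq[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair_freq.items():
        p_xy = f_xy / n
        p_x = word_freq[x] / n
        p_y = word_freq[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    corpus = "strong tea and strong coffee but weak tea".split()
    ranked = sorted(association_ratios(corpus).items(), key=lambda kv: -kv[1])
    for pair, score in ranked[:5]:
        print(pair, round(score, 2))
```

Scores well above zero suggest that a pair co-occurs more often than chance would predict. On a corpus as small as the toy one here the estimates are unreliable, so in practice a minimum pair frequency would normally be imposed before ranking.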
CRIS type:
01.01 Journal article
Authors:
Bindi, Remo; Zamorani, Nicoletta