Publication Date:
2021
abstract:
We illustrate an approach to multilingual treebank exploration by introducing a novel adaptation to small treebanks of a methodology for identifying cross-lingual quantitative trends in the distribution of dependency relations. By relying on the principles of cross-validation, we reduce the amount of data required to execute the method, paving the way to expanding its use to low-resource languages. We validated the approach on 8 small treebanks, each containing fewer than 100,000 tokens and representing typologically different languages. We also show preliminary but promising evidence for the use of the proposed methodology in treebank expansion.
Iris type:
04.01 Contributo in Atti di convegno
Keywords:
universal dependency; language resources; quality check; treebank expansion
List of contributors:
Alzetta, Chiara
Published in: