Towards a dataset for natural language requirements processing

Conference Paper

Publication Date:

2017

abstract:

[Context and motivation] The current breakthrough of natural language processing (NLP) techniques can provide the requirements engineering (RE) community with powerful tools that can help addressing specic tasks of natural language (NL) requirements analysis, such as traceability, ambiguity detection and requirements classification, to name a few. [Question/problem] However, modern NLP techniques are mainly statistical, and need large NL requirements datasets, to support appropriate training, test and validation of the techniques. The RE community has experimented with NLP since long time, but datasets were often proprietary, or limited to few software projects for which requirements were publicly available. Hence, replication of the experiments and generalization have always been an issue. [Principal idea/results] Our near future commitment is to provide a publicly available NL requirements dataset. [Contribution] To this end, we are collecting requirements documents from the Web, and we are representing them in a common XML format. In this paper, we present the current version of the dataset, together with our agenda concerning formatting, extension, and annotation of the dataset.

Iris type:

04.01 Contributo in Atti di convegno

Keywords:

NAtural language processing; Natural language requirements; Requirements classifications; Requirements document

List of contributors: