Publication Date:
2004
Abstract:
This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset we made publicly available to the Data Mining community through the FIMI repository. We built WebDocs from a spidered collection of HTML web documents. The whole collection contains about 1.7 million documents, mainly written in English, and its size is about 5 GB.
Iris type:
04.01 Contribution in conference proceedings
Keywords:
Frequent itemsets mining datasets
List of contributors: