Publication date:
2004
Abstract:
This short note describes the main characteristics of WebDocs, a huge real-life transactional dataset that we made publicly available to the data mining community through the FIMI repository. We built WebDocs from a spidered collection of HTML web documents. The whole collection contains about 1.7 million documents, mainly written in English, and is about 5 GB in size.
CRIS type:
04.01 Contribution in conference proceedings
Keywords:
Frequent itemsets mining datasets
Author list:
Orlando, Salvatore; Silvestri, Fabrizio; Lucchese, Claudio; Perego, Raffaele