Publication Date:
2019
Abstract:
In this paper, we present a collection of news documents labeled at the level of crisp events. Compared to other publicly available collections, our dataset consists of heterogeneous documents published by popular news channels on different platforms in the same temporal window and, therefore, dealing with roughly the same events and topics. The collection spans 4 months and comprises 147K news documents from 27 news streams, i.e., 9 different channels on each of 3 platforms: Twitter, RSS portals, and news websites. We also provide relevance labels of news documents for some selected events. These relevance judgments were collected using crowdsourcing. The collection can be useful to researchers investigating challenging news mining tasks, such as event detection and tracking, multi-stream analysis, and temporal analysis of news publishing patterns.
CRIS Type:
04.01 Contribution in conference proceedings
Keywords:
test collections; news streams; event detection and analysis
Authors:
Mele, Ida
Book Title:
Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval