Publication date:
2014
Abstract:
The emergence of the Web of Data, in particular Linked Open Data (LOD) [1], has led to an abundance of data available on the Web. Data is shared as part of datasets, which often contain inter-dataset links [6]; these links are mostly concentrated on established datasets, such as DBpedia. Datasets vary significantly with respect to the resource types they represent, their currency, coverage of topics and domains, size, languages used, coherence, accessibility [3], and general quality aspects. The challenges arising from such diversity are underlined by the limited reuse of datasets from the LOD Cloud, where reuse and linking often focus on well-known datasets like DBpedia. Descriptive and reliable metadata are therefore paramount to enable targeted search, assessment, and reuse of datasets. To address these issues, and building on earlier work [4], we propose an automated approach for creating structured profiles that describe the topic coverage of individual datasets. The proposed approach combines sampling, topic extraction, and topic ranking techniques. The sampling process is used to determine the best trade-off between scalability and profiling accuracy. Topic ranking is based on an adaptation of the graph-based models PageRank, K-Step Markov, and HITS that introduces prior knowledge into the computation of vertex importance [7]. Finally, the generated profiles are exposed as part of a public dataset based on the Vocabulary of Interlinked Datasets (VoID) and the newly introduced Vocabulary of Links (VoL), which describes the degree of relatedness between datasets and topics.
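The ranking step mentioned in the abstract can be illustrated with a minimal sketch of PageRank with priors, one of the three models named. It is not the authors' implementation: the function name `pagerank_with_priors`, the back-probability `beta`, the toy dataset and topic identifiers, and the choice to put all prior mass on one dataset are assumptions made only for illustration.

```python
def pagerank_with_priors(edges, priors, beta=0.15, iterations=50):
    """Score vertices with a random walk that, at each step, jumps back
    to the prior vertices with probability `beta` (hypothetical default)."""
    # Collect every vertex mentioned in the priors or the edge list.
    nodes = set(priors)
    for source, target in edges:
        nodes.add(source)
        nodes.add(target)

    # Build the outgoing adjacency lists.
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    # Normalise the priors into a probability distribution.
    total = sum(priors.values())
    prior = {n: priors.get(n, 0.0) / total for n in nodes}

    rank = dict(prior)
    for _ in range(iterations):
        # Every vertex first receives its share of the "jump back" mass.
        nxt = {n: beta * prior[n] for n in nodes}
        for n in nodes:
            if out_links[n]:
                share = (1 - beta) * rank[n] / len(out_links[n])
                for target in out_links[n]:
                    nxt[target] += share
            else:
                # A vertex without outgoing links returns its mass to the priors.
                for m in nodes:
                    nxt[m] += (1 - beta) * rank[n] * prior[m]
        rank = nxt
    return rank


# Toy dataset-topic graph (invented identifiers); links go in both directions.
edges = [
    ("ds:A", "topic:Music"), ("topic:Music", "ds:A"),
    ("ds:A", "topic:Film"), ("topic:Film", "ds:A"),
    ("ds:B", "topic:Film"), ("topic:Film", "ds:B"),
]
# All prior mass on ds:A, so the scores express relevance relative to ds:A.
ranks = pagerank_with_priors(edges, priors={"ds:A": 1.0})
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

In this sketch the priors bias the walk toward the dataset being profiled, so the topic scores can be read as importance relative to that dataset rather than as global importance.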
CRIS type:
04.03 Poster in conference proceedings
Authors:
Taibi, Davide
Book title:
WWW Companion '14: Proceedings of the companion publication of the 23rd international conference on World wide web companion