UNI-FIND
Reasoning and ontologies in data extraction

Book chapter
Publication date:
2012
Abstract:
The web has become a pig sty: everyone dumps information at random places and in random shapes. Try to find the cheapest apartment in Oxford considering rent, travel, tax and heating costs; or a cheap, reasonably reviewed 11" laptop with an SSD drive. Data extraction flushes structured information out of this sty: it turns mostly unstructured web pages into highly structured knowledge. In this chapter, we give a gentle introduction to data extraction, including pointers to existing systems. We start with an overview and classification of data extraction systems along two primary dimensions, the level of supervision and the considered scale. The rest of the chapter is organized along the major division of these approaches into site-specific and supervised versus domain-specific and unsupervised. We first discuss supervised data extraction, where a human user identifies examples of the relevant data for each site and the system generalizes these examples into extraction programs. We focus particularly on declarative and rule-based paradigms. In the second part, we turn to fully automated (or unsupervised) approaches, where the system identifies the relevant data by itself and extracts it from many websites without human intervention. Ontologies or schemata have proven invaluable for guiding unsupervised data extraction, and we present an overview of the existing approaches and the different ways in which they use ontologies. © 2012 Springer-Verlag.
CRIS type:
02.01 Contribution in volume (Chapter or Essay)
Keywords:
Reasoning; ontologies; data extraction
Author list:
Oro, Ermelinda
Institutional authors:
ORO ERMELINDA
Link to full record:
https://iris.cnr.it/handle/20.500.14243/245872
General data

URL

http://www.scopus.com/inward/record.url?eid=2-s2.0-84865830940&partnerID=q2rCbXpz