
SILA: A spatial instance learning approach for deep webpages

Conference Paper
Publication Date:
2011
abstract:
Deep Web pages convey highly relevant information for application domains such as e-government, e-commerce, and social networking. For this reason, there is constant, strong interest in extracting data from Deep Web data sources efficiently, effectively, and automatically. In this paper we present SILA, a novel Spatial Instance Learning Approach that extracts data records from Deep Web pages by exploiting both the spatial arrangement and the presentation features of the data items/fields that Web browser layout engines produce when rendering Deep Web pages on screen. SILA is independent of the internal HTML encoding of Web pages and can recognize data records in pages with multiple data regions in which data items are arranged in many different presentation layouts. Experimental results show that SILA achieves very high precision and recall and that it performs much better than the MDR and ViNTs approaches. © 2011 ACM.
Iris type:
04.01 Contribution in conference proceedings
Keywords:
deep web; instance learning; web information extraction; web wrapping
List of contributors:
Ruffolo, Massimo; Oro, Ermelinda
Authors of the University:
ORO ERMELINDA
RUFFOLO MASSIMO
Handle:
https://iris.cnr.it/handle/20.500.14243/253596
Book title:
CIKM '11: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
Overview

URL

http://www.scopus.com/inward/record.url?eid=2-s2.0-83055161475&partnerID=q2rCbXpz