Information Extraction from Presentation-Oriented Documents

Article

Publication Date:

2012

Abstract:

The Web is the largest knowledge repository ever. In recent years there has been considerable interest in languages and approaches that provide structured (e.g., XML) and semantic (e.g., Semantic Web) representations of Web content. However, most of the available information is still accessed via Web pages in HTML and documents in PDF, both of which use internal encodings designed to present content on screen to human users. This makes automatic information extraction problematic.

CRIS Type:

01.01 Journal article

Keywords:

Information Extraction; Presentation-Oriented Documents

Author List: