
Medium sized crawling made fast and easy through Lumbricus webis

Conference proceedings contribution
Publication date:
2011
Abstract:
Web crawlers have become popular tools for collecting large portions of the web, which can be used for many tasks, from statistics to structural analysis of the web. Due to the amount of data and the heterogeneity of tasks to manage, it is essential for crawlers to have a modular and distributed architecture. In this paper we describe Lumbricus webis (L.webis for short), a modular crawling infrastructure built to mine data from the .it domain and the portions of the web reachable from it. The purpose of our crawler is to support the gathering of advanced statistics, and advanced analytic tools, on the content of the Italian Web. This paper describes the architectural features of L.webis and its performance. L.webis can currently download a mid-sized ccTLD such as .it in about one week. © 2011 IEEE.
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Software development
Author list:
Felicioli, Claudio; Pellegrini, Marco; Geraci, Filippo
Institutional authors:
GERACI FILIPPO
PELLEGRINI MARCO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/271952
Book title:
International Conference on Machine Learning and Cybernetics (ICMLC), 2011
Published in:
PROCEEDINGS OF ... INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS
Series
General data

URL

http://www.scopus.com/inward/record.url?eid=2-s2.0-80155188419&partnerID=q2rCbXpz