Popularity-based caching of CMS datasets

Book chapter
Publication date:
2018
Abstract:
The distributed monitoring infrastructure of the Compact Muon Solenoid (CMS) experiment at the European Organization for Nuclear Research (CERN) records a broad variety of computing and storage logs on a Hadoop infrastructure. These logs are a valuable source of information for system tuning and capacity planning. In this paper we analyze machine learning (ML) techniques on a large volume of traces to discover patterns and correlations useful for classifying the popularity of experiment-related datasets. We implement a scalable pipeline of Spark components that collects the dataset access logs from heterogeneous monitoring sources and groups them into weekly snapshots organized by CMS site. Predictive models are trained on these snapshots and forecast which datasets will become popular over time. Dataset popularity predictions are then used to experiment with a novel data caching strategy, called Popularity Prediction Caching (PPC). We compare the hit rates of PPC with those produced by well-known caching policies. We demonstrate that the performance improvement is as high as 20% at some sites.
CRIS type:
02.01 Contribution in volume (Chapter or Essay)
Keywords:
Big Data; Caching; CERN CMS; Classification; Dataset Popularity
Author list:
Tonellotto, Nicola; Perego, Raffaele
Institution-affiliated authors:
PEREGO RAFFAELE
Link to full record:
https://iris.cnr.it/handle/20.500.14243/411789
Book title:
Parallel Computing is Everywhere
General data

URL

http://ebooks.iospress.nl/volumearticle/48611
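
The abstract describes comparing the hit rates of a popularity-driven cache with those of well-known caching policies. The record does not say how PPC chooses what to evict, so the following is only a minimal sketch under stated assumptions: a plain LRU baseline, and a hypothetical variant that evicts datasets not predicted popular first and falls back to LRU otherwise. The function names, the `predicted_popular` set, and the toy access trace are all illustrative and not taken from the chapter.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Replay `trace` through a plain LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for dataset in trace:
        if dataset in cache:
            hits += 1
            cache.move_to_end(dataset)  # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[dataset] = True
    return hits / len(trace) if trace else 0.0


def popularity_aware_hit_rate(trace, capacity, predicted_popular):
    """Replay `trace` through an LRU cache that prefers to evict datasets
    NOT in `predicted_popular` (hypothetical policy, not the chapter's PPC)."""
    cache = OrderedDict()
    hits = 0
    for dataset in trace:
        if dataset in cache:
            hits += 1
            cache.move_to_end(dataset)
            continue
        if len(cache) >= capacity:
            # Oldest entry that is not predicted popular, if there is one.
            victim = next((d for d in cache if d not in predicted_popular), None)
            if victim is None:
                victim = next(iter(cache))  # all popular: plain LRU eviction
            del cache[victim]
        cache[dataset] = True
    return hits / len(trace) if trace else 0.0


# Toy trace (assumed): dataset "A" recurs, the others are accessed once.
trace = ["A", "x1", "x2", "A", "x3", "x4", "A", "x5", "x6", "A"]

print(lru_hit_rate(trace, capacity=2))                      # 0.0
print(popularity_aware_hit_rate(trace, 2, {"A"}))           # 0.3
```

On this toy trace the one-off datasets push "A" out of the LRU cache before each reuse, while the popularity-aware variant keeps it resident. With an empty prediction set the second function behaves exactly like plain LRU, so any difference in hit rate comes from the predictions alone.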