Distribution-Preserving Stratified Sampling for Learning Problems

Article
Publication date:
2018
Abstract:
The need for extracting a small sample from a large amount of real data, possibly streaming, arises routinely in learning problems, e.g., for storage, to cope with computational limitations, obtain good training/test/validation sets, and select minibatches for stochastic gradient neural network training. Unless we have reasons to select the samples in an active way dictated by the specific task and/or model at hand, it is important that the distribution of the selected points is as similar as possible to the original data. This is obvious for unsupervised learning problems, where the goal is to gain insights on the distribution of the data, but it is also relevant for supervised problems, where the theory explains how the training set distribution influences the generalization error. In this paper, we analyze the technique of stratified sampling from the point of view of distances between probabilities. This allows us to introduce an algorithm, based on recursive binary partition of the input space, aimed at obtaining samples that are distributed as much as possible as the original data. A theoretical analysis is proposed, proving the (greedy) optimality of the procedure together with explicit error bounds. An adaptive version of the algorithm is also introduced to cope with streaming data. Simulation tests on various data sets and different learning tasks are also provided.
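As a rough illustration of the idea of sampling through a recursive binary partition, the sketch below splits a point set in two at the median of its widest coordinate, divides the sample budget between the halves in proportion to their point counts, and recurses until each cell receives a single sample. This is not the algorithm from the paper: the function name `stratified_sample`, the median split rule, and the proportional allocation are assumptions made for the example, and the paper's discrepancy-based criterion and adaptive streaming variant are not reproduced.

```python
import random


def stratified_sample(points, n_samples, rng=None):
    """Illustrative sketch only (hypothetical helper, not the paper's method).

    Recursively halves the point set and splits the sample budget in
    proportion to the number of points falling in each half.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the example is repeatable
    if n_samples <= 0 or not points:
        return []
    if n_samples >= len(points):
        return list(points)  # budget covers the whole cell
    if n_samples == 1:
        return [rng.choice(points)]  # one uniform draw from this cell

    # Assumed split rule: cut the coordinate with the largest range
    # at its median, so both halves hold about the same number of points.
    axis = max(
        range(len(points[0])),
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    left, right = ordered[:mid], ordered[mid:]

    # Proportional allocation of the sample budget to the two halves.
    n_left = round(n_samples * len(left) / len(ordered))
    n_right = n_samples - n_left
    return (stratified_sample(left, n_left, rng)
            + stratified_sample(right, n_right, rng))


# Usage: draw 50 of 1000 synthetic 2-D points.
data_rng = random.Random(1)
data = [(data_rng.random(), data_rng.random()) for _ in range(1000)]
subset = stratified_sample(data, 50)
print(len(subset))
```

Because every cell contributes samples in proportion to the number of points it holds, dense regions of the data are represented more heavily in the subset than sparse ones, which is the intuition behind keeping the sample's distribution close to that of the original data.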
CRIS type:
01.01 Journal article
Keywords:
Adaptive sampling; binary recursive partition; F-discrepancy; stratified sampling
Author list:
Cervellera, Cristiano; Maccio', Danilo
Institutional authors:
CERVELLERA CRISTIANO
MACCIO' DANILO
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/353075
Published in:
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS
Journal
General data

URL

https://ieeexplore.ieee.org/document/7945296