Publication date:
2018
Abstract:
Hamming embedding techniques, which produce bit-string sketches, have recently been applied successfully to speed up similarity search. Sketches are usually compared using the Hamming distance and are used to filter out non-relevant objects during query evaluation. Because several sketching techniques exist and each can produce sketches of different lengths, it is hard to select a proper configuration for a particular dataset. We assume that the (dis)similarity of objects is expressed by an arbitrary metric function, and we propose a way to efficiently estimate the quality of sketches using just a small sample of the data. Our approach is based on a probabilistic analysis of sketches that describes how well separated objects are after projection into the Hamming space.
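As an illustration of the filtering step mentioned in the abstract, the following minimal sketch compares bit-string sketches by Hamming distance and keeps only the objects whose sketch lies within a threshold of the query sketch. It is not the paper's method: the 8-bit sketch values, the object ids, the function names, and the `max_distance` threshold are all hypothetical, and the way sketches are produced from the original metric space is not shown.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two sketches differ."""
    return bin(a ^ b).count("1")


def filter_candidates(query_sketch: int, sketches: dict, max_distance: int) -> list:
    """Return ids of objects whose sketch is within max_distance of the query sketch."""
    return [
        obj_id
        for obj_id, sketch in sketches.items()
        if hamming_distance(query_sketch, sketch) <= max_distance
    ]


# Hypothetical 8-bit sketches keyed by object id (illustrative values only).
sketches = {
    "a": 0b10110010,
    "b": 0b10110011,
    "c": 0b01001101,
    "d": 0b10010010,
}

query = 0b10110010
candidates = filter_candidates(query, sketches, max_distance=1)
print(candidates)
```

Only the surviving candidates would then be compared with the original, typically more expensive, metric function.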
CRIS type:
04.01 Contribution in conference proceedings
Keywords:
Bit string sketches; Similarity Search
Author list:
Vadicamo, Lucia
Book title:
Advances in Databases and Information Systems. ADBIS 2018.