Data di Pubblicazione:
2001
Abstract:
In order to speedup retrieval in large collections of data, index structures partition the data into subsets so that query requests can be evaluated without examining the entire collection. As the complexity of modern data types (such as image, video, or audio features) grows, the traditional partitioning techniques based on total ordering of data can not typically be applied. We consider the problem of partitioning data collections from generic metric spaces, where total ordering of objects does not exists, and where only distances between pairs of objects can be determined. We study the elementary type of partitioning that splits a given collection into two well-separated subsets, allowing some objects to be excluded from the partitioning process. Five implementation techniques of separable splits are proposed and proved for correctness. The rst two are simple extensions of the known ball partitioning and the generalized hyperplane approaches, the third is an advanced hyperplane partitioning. The additional two techniques are completely original and are based on the elliptic and pseudo-elliptic geometric strategies. E ectiveness of all techniques is evaluated in terms of their ability to equalize the separable set sizes, and to minimize the number of excluded objects. Proposed techniques are evaluated on three large data les.
Tipologia CRIS:
04.01 Contributo in Atti di convegno
Keywords:
Data partitioning; Advanced hyperplane partitioning; Metric space; Information search and retrieval
Elenco autori:
Zezula, Pavel; Savino, Pasquale; Gennaro, Claudio
Link alla scheda completa:
Link al Full Text:
Titolo del libro:
Atti del IX Congresso Nazionale SEBD 2001