Publication date:
2022
Abstract:
Approximate search for high-dimensional vectors is commonly addressed using dedicated techniques often combined with hardware acceleration provided by GPUs, FPGAs, and other custom in-memory silicon.
Despite their effectiveness, these optimized solutions are often technically difficult to combine with other types of search. For example, to implement a combined text+image multimodal search, we are forced to first query the index of high-dimensional image descriptors and then filter the results based on the textual query, or vice versa.
This paper proposes a text surrogate technique to translate real-valued vectors into text and index them with a standard textual search engine such as Elasticsearch or Apache Lucene. This technique allows us to perform approximate kNN searches of high-dimensional vectors alongside classical full-text searches natively on a single textual search engine, enabling multimedia queries without sacrificing scalability. Our proposal exploits a combination of vector quantization and scalar quantization.
We compared our approach with existing methods in the literature; preliminary experiments show a significant improvement in performance.
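To make the idea of a surrogate text concrete, the following is a minimal sketch and not the method of the paper: it covers only a simple scalar-quantization encoding and omits the vector-quantization component entirely. Each vector component is turned into a synthetic term that is repeated in proportion to the component's magnitude, so that a term-frequency overlap between two surrogate texts roughly tracks the agreement between the original vectors. The function names (`surrogate_text`, `tf_score`), the `p`/`n` term prefixes, and the `scale` parameter are all assumptions introduced for illustration; a real deployment would index the generated strings in a search engine rather than score them in Python.

```python
import math


def surrogate_text(vector, scale=4):
    """Encode a real-valued vector as a string of synthetic terms.

    Assumption: component i becomes the term "p<i>" (positive part) or
    "n<i>" (negative part), repeated floor(|x| * scale) times.
    """
    terms = []
    for i, x in enumerate(vector):
        prefix = "p" if x >= 0 else "n"
        repetitions = math.floor(abs(x) * scale)
        terms.extend([f"{prefix}{i}"] * repetitions)
    return " ".join(terms)


def term_frequencies(text):
    """Count how many times each term occurs in a surrogate text."""
    counts = {}
    for term in text.split():
        counts[term] = counts.get(term, 0) + 1
    return counts


def tf_score(query_text, doc_text):
    """Sum of products of term frequencies over shared terms.

    This only rewards components that agree in sign, so it approximates
    (and does not equal) the dot product of the original vectors.
    """
    q = term_frequencies(query_text)
    d = term_frequencies(doc_text)
    return sum(count * d.get(term, 0) for term, count in q.items())


query = surrogate_text([0.5, -0.25, 0.0])
similar = surrogate_text([0.75, -0.5, 0.25])
opposite = surrogate_text([-0.5, 0.5, 0.0])
print(query)
print(tf_score(query, similar), tf_score(query, opposite))
```

In this sketch the vector pointing in a similar direction scores higher than the one with opposite signs, which is the behavior a textual engine's term-frequency ranking would be expected to mimic once the surrogate strings are indexed as ordinary documents.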
CRIS type:
04.01 Contribution in conference proceedings
Keywords:
Surrogate text representation; Inverted index; Approximate search; High-dimensional indexing; Very large databases
Authors:
Amato, Giuseppe; Gennaro, Claudio; Vadicamo, Lucia; Carrara, Fabio
Book title:
Similarity Search and Applications