Space-efficient data structures for top-k completion

Conference proceedings contribution
Publication date:
2013
Abstract:
Virtually every modern search application, whether desktop, web, or mobile, features some kind of query auto-completion. In its basic form, the problem consists of retrieving from a string set a small number of completions, i.e. strings beginning with a given prefix, that have the highest scores according to some static ranking. In this paper, we focus on the case where the string set is so large that compression is needed to fit the data structure in memory. This is a compelling case for web search engines and social networks, where it is necessary to index hundreds of millions of distinct queries to guarantee reasonable coverage, and for mobile devices, where the amount of memory is limited. We present three different trie-based data structures to address this problem, each with different space/time/complexity trade-offs. Experiments on large-scale datasets show that it is possible to compress the string sets, including the scores, down to sizes competitive with the gzip'ed data, while supporting efficient retrieval of completions at about a microsecond per completion.
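To make the problem statement concrete, the following is a minimal sketch of top-k completion over a plain, uncompressed trie. It is not one of the three data structures from the paper and does no compression; it only illustrates the query being answered. The names `Node`, `insert`, and `top_k` are hypothetical, scores are assumed to be non-negative, and the best-first search over per-subtree maximum scores is one common way to answer such queries, chosen here for illustration.

```python
import heapq
from itertools import count


class Node:
    """Trie node (hypothetical layout, for illustration only)."""

    def __init__(self):
        self.children = {}   # char -> Node
        self.score = None    # score of the string ending here, if any
        self.max_score = 0   # highest score in this subtree (scores assumed >= 0)


def insert(root, s, score):
    """Insert string s with its score, updating subtree maxima along the path."""
    node = root
    node.max_score = max(node.max_score, score)
    for ch in s:
        node = node.children.setdefault(ch, Node())
        node.max_score = max(node.max_score, score)
    node.score = score


def top_k(root, prefix, k):
    """Return up to k (string, score) pairs with the given prefix, best first."""
    # Walk down to the node that represents the prefix.
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []

    tie = count()  # tiebreaker so the heap never compares Node objects
    # Heap entries: (-priority, tiebreak, string, node); node is None for a
    # finished completion whose priority is its exact score.
    heap = [(-node.max_score, next(tie), prefix, node)]
    results = []
    while heap and len(results) < k:
        neg, _, s, n = heapq.heappop(heap)
        if n is None:
            results.append((s, -neg))
            continue
        if n.score is not None:
            heapq.heappush(heap, (-n.score, next(tie), s, None))
        for ch, child in n.children.items():
            heapq.heappush(heap, (-child.max_score, next(tie), s + ch, child))
    return results
```

A query such as `top_k(root, "tr", 2)` explores only the subtrees whose maximum score can still beat the candidates already found, so the work depends on k and the prefix rather than on the size of the whole string set.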
CRIS type:
04.01 Conference proceedings contribution
Keywords:
Top-k completion; Scored string sets; Tries; Compression; H.3.3 Information Search and Retrieval; E.1 Data Structures. Trees
Authors:
Ottaviano, Giuseppe
Link to full record:
https://iris.cnr.it/handle/20.500.14243/244964
Link to full text:
https://iris.cnr.it//retrieve/handle/20.500.14243/244964/23228/prod_277761-doc_78332.pdf
General data

URL

http://dl.acm.org/citation.cfm?id=2488440