The query-vector document model

Conference Poster

Publication Date:

2006

Abstract:

Modern Web IR systems have to manage collections of billions of documents. The indexes used to represent them are very large data structures, the form of which can have a big impact on the quality and the speed of IR algorithms. Traditionally, documents are modeled in one of two main ways: the bag-of-words model and the vector-space model. In the query-vector document model, documents are modeled with the list of queries they match, along with the rank they get for each. The query-vector representation of a document is built out of a query log. A reference search engine is used in the building phase: for every query in the training set, the system stores the first 100 results along with their rank. This creates a matrix, with documents as columns and queries as rows, where each entry is the rank of a document for a given query.
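The building phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `build_query_vectors`, the `search` callable standing in for the reference search engine, the toy result lists, and the choice to start ranks at 1 are all assumptions made for illustration. The matrix is stored sparsely, as one dictionary per document (a column) mapping each matching query (a row) to the document's rank.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

TOP_K = 100  # the abstract states that the first 100 results are stored per query


def build_query_vectors(
    queries: Iterable[str],
    search: Callable[[str], List[str]],  # hypothetical reference search engine
    top_k: int = TOP_K,
) -> Dict[str, Dict[str, int]]:
    """Return a sparse query-vector matrix as {doc_id: {query: rank}}.

    Ranks start at 1 (an assumption; the abstract does not say).
    """
    vectors: Dict[str, Dict[str, int]] = defaultdict(dict)
    for query in queries:
        ranked_docs = search(query)[:top_k]
        for rank, doc_id in enumerate(ranked_docs, start=1):
            # Keep the first (best) rank if a document appears twice.
            vectors[doc_id].setdefault(query, rank)
    return dict(vectors)


# Toy stand-in for a query log and a reference engine (made-up data).
toy_results = {
    "apple pie": ["d1", "d2", "d3"],
    "apple tree": ["d2", "d4"],
}
vectors = build_query_vectors(toy_results, lambda q: toy_results[q])
```

In this toy run, `vectors["d2"]` holds the ranks of document `d2` for both queries, while a document that no training query returns has no entry at all and therefore an empty query vector.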

Iris type:

04.03 Poster in conference proceedings

Keywords:

Document Partitioning; Collection Selection

List of contributors:

Puppin, Diego; Silvestri, Fabrizio

Handle:

https://iris.cnr.it/handle/20.500.14243/85955