Publication Date:
2006
abstract:
Modern Web IR systems have to manage collections of billions of documents. The indexes used to represent them are very large data structures, the form of which can have a big impact on the quality and the speed of IR algorithms. Traditionally, two main ways are used to model the documents available: the bag-of-words model, and the vector-space model. In the query-vector document model, documents are modeled with the list of queries they match, along with the rank they get for each. The query-vector representation of a document is built out of a query-log. A reference search engine is used in the building phase: for every query in the training set, the system stores the first 100 results along with their rank. This creates a matrix, with documents on columns and queries on rows, where each entry is the rank of a document for a given query.
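
The building phase described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_query_vectors`, the `search` callable standing in for the reference search engine, and the sparse dictionary layout are all assumptions made for illustration. Only the cutoff of 100 results per query and the use of ranks as matrix entries come from the abstract.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_query_vectors(
    queries: Iterable[str],
    search: Callable[[str], List[str]],  # hypothetical reference engine: query -> ranked doc ids
    top_k: int = 100,                    # the abstract stores the first 100 results
) -> Dict[str, Dict[str, int]]:
    """Return a sparse query-vector matrix as {doc_id: {query: rank}}.

    Each inner dictionary is one column of the matrix (one document);
    its keys are the queries (rows) that the document matched.
    """
    doc_vectors: Dict[str, Dict[str, int]] = defaultdict(dict)
    for query in queries:
        # Ranks are 1-based: the top result for a query gets rank 1.
        for rank, doc_id in enumerate(search(query)[:top_k], start=1):
            doc_vectors[doc_id][query] = rank
    return dict(doc_vectors)


# Toy query log and toy "engine" (made-up data, for illustration only).
toy_results = {
    "apple pie": ["d1", "d2"],
    "apple tree": ["d2", "d3"],
}
vectors = build_query_vectors(toy_results, lambda q: toy_results[q])
```

In this toy run, document `d2` matches both queries, so its query vector has two entries, while a document returned by no training query would have no vector at all.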
Iris type:
04.03 Poster in Atti di convegno
Keywords:
Document Partitioning; Collection Selection
List of contributors: