Weighting passages enhances accuracy

Article
Publication date:
2020
Abstract:
We observe that in curated documents the distribution of the occurrences of salient terms, e.g., terms with a high Inverse Document Frequency, is not uniform, and such terms are primarily concentrated towards the beginning and the end of the document. Exploiting this observation, we propose a novel version of the classical BM25 weighting model, called BM25 Passage (BM25P), which scores query results by computing a linear combination of term statistics in the different portions of the document. We study a multiplicity of partitioning schemes of document content into passages and compute the collection-dependent weights associated with them on the basis of the distribution of occurrences of salient terms in documents. Moreover, we tune BM25P hyperparameters and investigate their impact on ad hoc document retrieval through fully reproducible experiments conducted using four publicly available datasets. Our findings demonstrate that our BM25P weighting model markedly and consistently outperforms BM25 in terms of effectiveness by up to 17.44% in NDCG@5 and 85% in NDCG@1, and up to 21% in MRR.
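The scoring idea in the abstract, a linear combination of per-passage term statistics fed into BM25, can be illustrated with a minimal sketch. Everything concrete below is an assumption for illustration and is not taken from the record: the function names, the equal-length contiguous passage split, the hand-set passage weights (the paper derives them from the collection), the BM25 variant and the defaults k1=1.2 and b=0.75.

```python
import math


def split_into_passages(tokens, num_passages):
    """Split a token list into num_passages contiguous, near-equal parts."""
    n = len(tokens)
    bounds = [round(i * n / num_passages) for i in range(num_passages + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(num_passages)]


def passage_weighted_tf(term, tokens, passage_weights):
    """Linear combination of the term's frequency in each passage.

    passage_weights is a hypothetical list of per-passage weights; its
    length determines how many passages the document is split into.
    """
    passages = split_into_passages(tokens, len(passage_weights))
    return sum(w * p.count(term) for w, p in zip(passage_weights, passages))


def bm25p_score(query_terms, tokens, passage_weights, doc_freq, num_docs,
                avg_doc_len, k1=1.2, b=0.75):
    """BM25-style score where the raw tf is replaced by the weighted tf.

    doc_freq maps a term to the number of documents containing it.
    This is a sketch under stated assumptions, not the authors' exact model.
    """
    length_norm = 1.0 - b + b * len(tokens) / avg_doc_len
    score = 0.0
    for term in query_terms:
        tf = passage_weighted_tf(term, tokens, passage_weights)
        if tf == 0:
            continue  # term absent from the document
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score
```

With all passage weights set to 1.0 the weighted frequency equals the ordinary term frequency, so the sketch reduces to the plain BM25 variant used here. Weights larger at the first and last passages favor documents whose query terms occur near the beginning or the end, which is the observation the abstract builds on.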
CRIS type:
01.01 Journal article
Keywords:
Passage retrieval; BM25P; Weighting models; Salient terms; Evaluation
Author list:
Nardini, Franco Maria; Muntean, Cristina-Ioana; Perego, Raffaele
Institutional authors:
MUNTEAN CRISTINA-IOANA
NARDINI FRANCO MARIA
PEREGO RAFFAELE
Link to the full record:
https://iris.cnr.it/handle/20.500.14243/385058
Published in:
ACM TRANSACTIONS ON INFORMATION SYSTEMS
Journal
General data

URL

https://doi.org/10.1145/3428687