Publication Date:
2011
abstract:
The ad-hoc task of the microblogging track has important theoretical implications for Information Retrieval.
A key problem in Information Retrieval is, in fact, how to compare term frequencies among documents of
different lengths. At first sight, term frequency normalization for microblogging can be simplified because of
the length limit imposed on admissible messages. The shortness of messages reduces
the number of admissible values for the document length, and thus the length of a message can be regarded
as small and almost constant. On the other hand, short messages can carry only a small amount of
information, so they are hard to distinguish from one another by content. To overcome both problems,
we propose to use a precise mathematical definition of information, the one given by Shannon, to
provide an ad hoc IR model for microblogging search. We show how to use Shannon's information theory
and coding theory to weight the query content in Twitter messages and retrieve relevant messages.
Iris type:
04.01 Contribution in conference proceedings
List of contributors:
Gaibisso, Carlo
Book title:
The Twentieth Text REtrieval