Publication Date:
2003
Abstract:
WebCat is a versatile system that reorganizes search
results into a partition of homogeneous document clusters
using data mining techniques. The goal is to help users
browse the set of retrieved documents easily by focusing
on clusters whose characterizing keywords are directly
pertinent to the search. WebCat submits the user's query
to the Google search engine and retrieves a large number
of snippets (i.e., answers). Each snippet is then modelled
as a set of (cleaned, stemmed) terms, and the snippets are
partitioned into clusters by the Transactional K-means
algorithm. Clusters are presented to the user through their
centroids (i.e., sets of terms that best represent the
content of each cluster), which serve as a fast way to
access the answers contained in each cluster. The overall
system is computationally light and very fast, and it can
run on the client side as an Internet Explorer toolbar
(similar to the Google Toolbar).
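The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not WebCat's implementation: it assumes Jaccard distance between term sets and a hypothetical centroid rule (keep the terms that occur in at least half of a cluster's snippets), neither of which the abstract specifies, and the published Transactional K-means algorithm may compute centroids differently. Stemming is omitted, the stop-word list is illustrative, and the names `to_transaction`, `set_centroid`, and `transactional_kmeans` are hypothetical.

```python
import random
import re
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

Transaction = FrozenSet[str]

# Illustrative stop-word list (assumption); a real system would also apply a stemmer.
STOPWORDS = frozenset({"a", "an", "and", "its", "of", "the"})


def to_transaction(snippet: str) -> Transaction:
    """Model a snippet as a set of lower-cased terms, minus stop words (no stemming)."""
    return frozenset(w for w in re.findall(r"[a-z]+", snippet.lower()) if w not in STOPWORDS)


def jaccard_distance(a: Transaction, b: Transaction) -> float:
    """1 - |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def set_centroid(members: Sequence[Transaction], min_support: float = 0.5) -> Transaction:
    """Hypothetical centroid rule: keep terms found in >= min_support of the members."""
    counts = Counter(term for t in members for term in t)
    threshold = min_support * len(members)
    return frozenset(term for term, c in counts.items() if c >= threshold)


def transactional_kmeans(
    data: Sequence[Transaction], k: int, max_iter: int = 20, seed: int = 0
) -> Tuple[List[int], List[Transaction]]:
    """K-means over term sets: assign by Jaccard distance, then recompute set centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(list(data), k)  # k randomly chosen snippets as initial centroids
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Jaccard distance (ties go to the lowest index).
        new_assignment = [
            min(range(k), key=lambda j: jaccard_distance(t, centroids[j])) for t in data
        ]
        if new_assignment == assignment:
            break  # no snippet changed cluster: converged
        assignment = new_assignment
        # Update step: recompute each non-empty cluster's centroid; empty ones keep the old one.
        for j in range(k):
            members = [t for t, a in zip(data, assignment) if a == j]
            if members:
                centroids[j] = set_centroid(members)
    return assignment, centroids


if __name__ == "__main__":
    snippets = [
        "The Jaguar car and its speed",
        "Jaguar car engine",
        "The jaguar, a cat of the jungle",
        "Jaguar cat and its prey",
    ]
    data = [to_transaction(s) for s in snippets]
    labels, centroids = transactional_kmeans(data, k=2)
    for j, c in enumerate(centroids):
        # Print each centroid (the terms shown to the user) and the snippets it gives access to.
        print(j, sorted(c), [s for s, a in zip(snippets, labels) if a == j])
```

In this toy run the two "car" snippets and the two "cat" snippets end up in separate clusters, and each centroid is a small term set that could be shown to the user as the cluster's label, in the spirit of the interface the abstract describes.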
CRIS Type:
05.11 Software
Keywords:
Web mining; Clustering; Search engines
Authors:
Giannotti, Fosca; Nanni, Mirco