WebCat

Software

Publication date:

2003

Abstract:

WebCat is a versatile system that reorganizes search results into a partition of homogeneous document clusters using data mining techniques. Its purpose is to help users browse the set of retrieved documents easily by focusing on the clusters whose characterizing keywords are directly pertinent to the search. WebCat submits a user-specified query to the Google search engine and retrieves a large number of snippets, i.e., answers. The snippets are then modelled as sets of (clean, stemmed) terms and partitioned into clusters by means of the Transactional K-means algorithm. Clusters are presented to the user through their centroids (i.e., sets of terms that represent the content of each cluster well), which can be used as a fast access method to the answers contained in each cluster. The overall system is computationally light, very fast, and can be run on the client side as an Internet Explorer toolbar (similar to the Google Toolbar).
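The clustering step described in the abstract can be illustrated with a minimal sketch. It is not WebCat's actual code: it assumes the snippets have already been cleaned and stemmed into sets of terms, uses the Jaccard distance between sets, and builds each centroid from the terms that occur in more than half of a cluster's members. The function names, the majority threshold, and the choice of the first k snippets as initial centroids are all assumptions made for illustration.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Return 1 - |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def centroid(cluster):
    """Assumed representative: terms found in more than half of the members."""
    counts = Counter(term for doc in cluster for term in doc)
    return frozenset(t for t, c in counts.items() if 2 * c > len(cluster))


def transactional_kmeans(docs, k, max_iter=20):
    """K-means-style loop over term sets (hypothetical simplification)."""
    # Assumption: seed the centroids with the first k snippets (deterministic).
    centroids = [frozenset(d) for d in docs[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign every snippet to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda i: jaccard_distance(doc, centroids[i]))
            for doc in docs
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the loop has converged
        assignment = new_assignment
        # Recompute each centroid from its current members.
        for i in range(k):
            members = [d for d, a in zip(docs, assignment) if a == i]
            if members:
                centroids[i] = centroid(members)
    return assignment, centroids
```

Because each centroid is itself a small set of terms, it can double as the label shown to the user for that cluster, which matches the role centroids play in the description above.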

CRIS type:

05.11 Software

Keywords:

Web mining; Clustering; Search engines

Author list:

Giannotti, Fosca; Nanni, Mirco

Institutional authors:

NANNI MIRCO

Link to the full record:

https://iris.cnr.it/handle/20.500.14243/180200