An Efficient Algorithm for Clustering Sets

Conference Paper
Publication Date:
2023
Abstract:
This paper proposes HWK-Sets, an algorithm based on K-Means for clustering data that consist of variable-sized sets of elementary items. Clustering sets is difficult because the data objects have no numerical attributes, so the classical Euclidean distance on which K-Means is normally based cannot be used. Instead, an adaptation of the Jaccard distance between sets is used, which exploits application-sensitive information. In particular, the algorithm adopts the Hartigan and Wong variation of K-Means with medoids as cluster representatives; it can work with several seeding methods and favors the fast attainment of an accurate solution. HWK-Sets is implemented in Java using parallel streams, and its efficiency and accuracy are demonstrated by simulation experiments.
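The two building blocks named in the abstract, a Jaccard-style distance between sets and a medoid as cluster representative, can be illustrated with a minimal Java sketch. This is not the HWK-Sets implementation: it assumes the plain Jaccard distance 1 - |A ∩ B| / |A ∪ B| rather than the application-sensitive adaptation used in the paper, and the class and method names (`JaccardMedoidSketch`, `jaccardDistance`, `totalDistance`, `medoid`) are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; names and the plain Jaccard distance are assumptions.
class JaccardMedoidSketch {

    // Plain Jaccard distance: 1 - |A ∩ B| / |A ∪ B|.
    // By convention here, two empty sets are at distance 0.
    static <T> double jaccardDistance(Set<T> a, Set<T> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return 1.0 - (double) intersection.size() / unionSize;
    }

    // Sum of distances from one candidate set to every set in the cluster.
    static <T> double totalDistance(Set<T> candidate, List<Set<T>> cluster) {
        return cluster.stream()
                .mapToDouble(other -> jaccardDistance(candidate, other))
                .sum();
    }

    // Medoid: the cluster member with the smallest total distance to all members.
    // Candidates are evaluated with a parallel stream.
    static <T> Set<T> medoid(List<Set<T>> cluster) {
        return cluster.parallelStream()
                .min(Comparator.comparingDouble((Set<T> c) -> totalDistance(c, cluster)))
                .orElseThrow(() -> new IllegalArgumentException("empty cluster"));
    }

    public static void main(String[] args) {
        Set<String> s1 = new HashSet<>(Arrays.asList("a", "b", "c"));
        Set<String> s2 = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> s3 = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        List<Set<String>> cluster = Arrays.asList(s1, s2, s3);
        System.out.println("distance(s1, s2) = " + jaccardDistance(s1, s2));
        System.out.println("medoid = " + medoid(cluster));
    }
}
```

Unlike a centroid, a medoid is always one of the data objects, so it stays well defined when the items are sets and no numerical mean exists.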
Iris type:
04.01 Contributo in Atti di convegno
Keywords:
Clustering sets; Hartigan & Wong K-Means; Jaccard distance; Medoids; Seeding methods; benchmark datasets.
List of contributors:
Cicirelli, FRANCO DOMENICO
Authors of the University:
CICIRELLI FRANCO DOMENICO
Handle:
https://iris.cnr.it/handle/20.500.14243/463518