Class: Rumale::EvaluationMeasure::SilhouetteScore
- Inherits:
- Object
- Includes:
- Base::Evaluator
- Defined in:
- rumale-evaluation_measure/lib/rumale/evaluation_measure/silhouette_score.rb
Overview
SilhouetteScore is a class that calculates the Silhouette Coefficient.
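For orientation, the quantity can be written as follows. This is a sketch that mirrors the computation in #score on this page; the symbols a(i), b(i), s(i), C_i, and d are notation assumed here, not names used by the library.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\, j \neq i} d(i, j), \qquad
b(i) = \min_{C \neq C_i} \frac{1}{|C|} \sum_{j \in C} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\text{score} = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

Here C_i is the cluster containing sample i, d(i, j) is the distance between samples i and j, and n is the number of samples. A sample in a single-member cluster is assigned s(i) = 0.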
Reference
- Rousseeuw, P. J., “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, Vol. 20, pp. 53–65, 1987.
Instance Method Summary
- #initialize(metric: 'euclidean') ⇒ SilhouetteScore (constructor)
  Create a new evaluator that calculates the silhouette coefficient.
- #score(x, y) ⇒ Float
  Calculates the silhouette coefficient.
Constructor Details
#initialize(metric: 'euclidean') ⇒ SilhouetteScore
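Create a new evaluator that calculates the silhouette coefficient. With metric: 'precomputed', #score treats x as a distance matrix. With any other value, including the default 'euclidean', #score computes pairwise Euclidean distances from x.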
# File 'rumale-evaluation_measure/lib/rumale/evaluation_measure/silhouette_score.rb', line 26

def initialize(metric: 'euclidean')
  @metric = metric
end
Instance Method Details
#score(x, y) ⇒ Float
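Calculates the silhouette coefficient: the mean, over all samples, of the per-sample silhouette value. x holds the samples, or the pairwise distance matrix when the evaluator was created with metric: 'precomputed'. y holds the cluster label of each sample. Samples in single-member clusters, and samples whose value would be NaN, contribute 0.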
# File 'rumale-evaluation_measure/lib/rumale/evaluation_measure/silhouette_score.rb', line 35

def score(x, y)
  dist_mat = @metric == 'precomputed' ? x : ::Rumale::PairwiseMetric.euclidean_distance(x)

  labels = y.to_a.uniq.sort
  n_clusters = labels.size
  n_samples = dist_mat.shape[0]

  intra_dists = Numo::DFloat.zeros(n_samples)
  n_clusters.times do |n|
    cls_pos = y.eq(labels[n])
    sz_cluster = cls_pos.count
    next unless sz_cluster > 1

    cls_dist_mat = dist_mat[cls_pos, cls_pos].dup
    cls_dist_mat[cls_dist_mat.diag_indices] = 0.0
    intra_dists[cls_pos] = cls_dist_mat.sum(axis: 0) / (sz_cluster - 1)
  end

  inter_dists = Numo::DFloat.zeros(n_samples) + Float::INFINITY
  n_clusters.times do |m|
    cls_pos = y.eq(labels[m])
    n_clusters.times do |n|
      next if m == n

      not_cls_pos = y.eq(labels[n])
      inter_dists[cls_pos] = Numo::DFloat.minimum(
        inter_dists[cls_pos], dist_mat[cls_pos, not_cls_pos].mean(1)
      )
    end
  end

  mask = Numo::DFloat.ones(n_samples)
  n_clusters.times do |n|
    cls_pos = y.eq(labels[n])
    mask[cls_pos] = 0 unless cls_pos.count > 1
  end

  silhouettes = mask * ((inter_dists - intra_dists) / Numo::DFloat.maximum(inter_dists, intra_dists))
  silhouettes[silhouettes.isnan] = 0.0
  silhouettes.mean
end
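The steps in the method above can be illustrated without Numo or Rumale. The following is a minimal pure-Ruby sketch, assuming Euclidean distance and plain Ruby arrays. The helper names `euclidean` and `silhouette_sketch` are hypothetical and are not part of the library; the sketch is meant to show the arithmetic, not to replace #score.

```ruby
# Pure-Ruby sketch of the silhouette coefficient (hypothetical helpers, not Rumale API).

# Euclidean distance between two points given as arrays of numbers.
def euclidean(p, q)
  Math.sqrt(p.zip(q).sum { |a, b| (a - b)**2 })
end

# points: array of coordinate arrays; labels: array of cluster labels, one per point.
def silhouette_sketch(points, labels)
  clusters = labels.uniq
  scores = points.each_index.map do |i|
    # Other members of the same cluster as point i.
    own = points.each_index.select { |j| labels[j] == labels[i] && j != i }
    next 0.0 if own.empty? # a single-member cluster contributes 0

    # a: mean distance to the other members of the point's own cluster.
    a = own.sum { |j| euclidean(points[i], points[j]) } / own.size

    # b: smallest mean distance to the members of any other cluster.
    b = (clusters - [labels[i]]).map do |c|
      members = points.each_index.select { |j| labels[j] == c }
      members.sum { |j| euclidean(points[i], points[j]) } / members.size
    end.min
    next 0.0 if b.nil? # only one cluster exists, so b is undefined

    denom = [a, b].max
    denom.zero? ? 0.0 : (b - a) / denom
  end
  scores.sum / scores.size
end

# Two tight, well-separated clusters give a score close to 1.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
labels = [0, 0, 1, 1]
puts silhouette_sketch(points, labels).round(4) # prints 0.9002
```

The early returns of 0.0 correspond to the mask for single-member clusters and the NaN replacement in the method above. Unlike that method, this sketch recomputes distances for every pair on demand, so it is only suitable for small inputs.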