Class: Rumale::EvaluationMeasure::SilhouetteScore

Inherits:
Object
Includes:
Base::Evaluator
Defined in:
rumale-evaluation_measure/lib/rumale/evaluation_measure/silhouette_score.rb

Overview

SilhouetteScore is a class that calculates the Silhouette Coefficient.
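
For orientation, the standard definition from the reference below can be written as follows. The symbols (a, b, s, C_I, d) are notation introduced here for illustration and are not identifiers from the library; the zero value for single-member clusters matches the masking step in the score method's source.

```latex
% Sample i belongs to cluster C_I; d(i, j) is the dissimilarity between samples i and j.
a(i) = \frac{1}{|C_I| - 1} \sum_{j \in C_I,\ j \neq i} d(i, j)
\qquad
b(i) = \min_{J \neq I} \frac{1}{|C_J|} \sum_{j \in C_J} d(i, j)

s(i) =
\begin{cases}
  \dfrac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} & \text{if } |C_I| > 1 \\
  0 & \text{if } |C_I| = 1
\end{cases}
\qquad
\text{score} = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

Each s(i) lies in [-1, 1]; values near 1 indicate that a sample is much closer to its own cluster than to the nearest other cluster.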

Reference

  • Rousseeuw, P. J., “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, Vol. 20, pp. 53–65, 1987.

Examples:

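require 'numo/narray'
require 'rumale/evaluation_measure/silhouette_score'

# Illustrative data: four 2-D samples and their predicted cluster labels.
x = Numo::DFloat[[0, 0], [0, 1], [10, 0], [10, 1]]
predicted = Numo::Int32[0, 0, 1, 1]

evaluator = Rumale::EvaluationMeasure::SilhouetteScore.new
puts evaluator.score(x, predicted)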

Constructor Details

#initialize(metric: 'euclidean') ⇒ SilhouetteScore

Create a new evaluator that calculates the silhouette coefficient.

Parameters:

  • metric (String) (defaults to: 'euclidean')

    The metric used to calculate the silhouette coefficient. If metric is ‘euclidean’, Euclidean distance is used as the dissimilarity between sample points. If metric is ‘precomputed’, the score method expects to be given a distance matrix.



# File 'rumale-evaluation_measure/lib/rumale/evaluation_measure/silhouette_score.rb', line 26

def initialize(metric: 'euclidean')
  @metric = metric
end

Instance Method Details

#score(x, y) ⇒ Float

Calculates the silhouette coefficient.

Parameters:

  • x (Numo::DFloat)

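    (shape: [n_samples, n_features]) The samples to be used for calculating the score. If the metric is ‘precomputed’, x must instead be a square distance matrix (shape: [n_samples, n_samples]).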

  • y (Numo::Int32)

    (shape: [n_samples]) The predicted labels for each sample.

Returns:

  • (Float)

    The mean silhouette coefficient over all samples.



# File 'rumale-evaluation_measure/lib/rumale/evaluation_measure/silhouette_score.rb', line 35

def score(x, y)
  dist_mat = @metric == 'precomputed' ? x : ::Rumale::PairwiseMetric.euclidean_distance(x)

  labels = y.to_a.uniq.sort
  n_clusters = labels.size
  n_samples = dist_mat.shape[0]

  intra_dists = Numo::DFloat.zeros(n_samples)
  n_clusters.times do |n|
    cls_pos = y.eq(labels[n])
    sz_cluster = cls_pos.count
    next unless sz_cluster > 1

    cls_dist_mat = dist_mat[cls_pos, cls_pos].dup
    cls_dist_mat[cls_dist_mat.diag_indices] = 0.0
    intra_dists[cls_pos] = cls_dist_mat.sum(axis: 0) / (sz_cluster - 1)
  end

  inter_dists = Numo::DFloat.zeros(n_samples) + Float::INFINITY
  n_clusters.times do |m|
    cls_pos = y.eq(labels[m])
    n_clusters.times do |n|
      next if m == n

      not_cls_pos = y.eq(labels[n])
      inter_dists[cls_pos] = Numo::DFloat.minimum(
        inter_dists[cls_pos], dist_mat[cls_pos, not_cls_pos].mean(1)
      )
    end
  end

  mask = Numo::DFloat.ones(n_samples)
  n_clusters.times do |n|
    cls_pos = y.eq(labels[n])
    mask[cls_pos] = 0 unless cls_pos.count > 1
  end

  silhouettes = mask * ((inter_dists - intra_dists) / Numo::DFloat.maximum(inter_dists, intra_dists))
  silhouettes[silhouettes.isnan] = 0.0

  silhouettes.mean
end
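
The steps in the listing above (mean intra-cluster distance, smallest mean distance to another cluster, zeroing of single-member clusters and undefined ratios, then the overall mean) can be mirrored in plain Ruby. This is only a sketch for checking small cases by hand: silhouette_mean is a hypothetical helper name, not part of Rumale, and it uses nested Arrays rather than Numo arrays.

```ruby
# Pure-Ruby sketch of the mean silhouette coefficient (no Numo or Rumale needed).
# `silhouette_mean` is a hypothetical helper, not a Rumale method.
def silhouette_mean(points, labels)
  # Euclidean distance between two points given as Arrays of numbers.
  dist = ->(u, v) { Math.sqrt(u.zip(v).sum { |s, t| (s - t)**2 }) }
  # Map each label to the indices of its members.
  clusters = labels.each_index.group_by { |i| labels[i] }

  scores = labels.each_index.map do |i|
    own = clusters[labels[i]]
    next 0.0 if own.size < 2 # single-member clusters contribute 0

    # a: mean distance to the other members of the same cluster
    # (the distance from a point to itself is 0, so it can stay in the sum).
    a = own.sum { |j| dist.call(points[i], points[j]) } / (own.size - 1)

    # b: smallest mean distance to the members of any other cluster.
    b = clusters.reject { |label, _| label == labels[i] }
                .map { |_, idx| idx.sum { |j| dist.call(points[i], points[j]) } / idx.size }
                .min
    next 0.0 if b.nil? # only one cluster: treated as 0, like the NaN handling above

    denom = [a, b].max
    denom.zero? ? 0.0 : (b - a) / denom
  end

  scores.sum / scores.size
end

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
labels = [0, 0, 1, 1]
score = silhouette_mean(points, labels)
puts score.round(4) # => 0.9002
```

For this data every sample has a = 1 and b = (10 + √101) / 2, so each coefficient is 1 - 1/b, which is also the mean.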