Class: Rumale::EvaluationMeasure::CalinskiHarabaszScore

Inherits:
Object
Includes:
Base::Evaluator
Defined in:
rumale-evaluation_measure/lib/rumale/evaluation_measure/calinski_harabasz_score.rb

Overview

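CalinskiHarabaszScore is a class that calculates the Calinski and Harabasz score, a clustering evaluation measure defined as the ratio of between-cluster dispersion to within-cluster dispersion. Higher values indicate denser, better-separated clusters.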

Reference

  • Calinski, T., and Harabasz, J., “A dendrite method for cluster analysis,” Communications in Statistics, Vol. 3 (1), pp. 1–27, 1974.

Examples:

require 'rumale/evaluation_measure/calinski_harabasz_score'

evaluator = Rumale::EvaluationMeasure::CalinskiHarabaszScore.new
puts evaluator.score(x, predicted)

Instance Method Details

#score(x, y) ⇒ Float

Calculates the Calinski and Harabasz score.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be used for calculating score.

  • y (Numo::Int32)

    (shape: [n_samples]) The predicted labels for each sample.

Returns:

  • (Float)

    The Calinski and Harabasz score. Returns 1.0 when the within-cluster sum of squares is zero.
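
For orientation, the quantity computed by this method can be written as below. The symbols are notation introduced here, not names from the library: n is the number of samples, k the number of distinct labels, C_c the set of samples with label c, n_c its size, \mu_c its centroid, and \mu the mean of all samples. This matches the arithmetic in the source listing for this method and should be read as a summary rather than an authoritative definition.

```latex
\mathrm{CH} = \frac{B / (k - 1)}{W / (n - k)}, \qquad
B = \sum_{c=1}^{k} n_c \,\lVert \mu_c - \mu \rVert^2, \qquad
W = \sum_{c=1}^{k} \sum_{x_i \in C_c} \lVert x_i - \mu_c \rVert^2
```

Here B is the between-cluster sum of squares and W is the within-cluster sum of squares.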



# File 'rumale-evaluation_measure/lib/rumale/evaluation_measure/calinski_harabasz_score.rb', line 25

def score(x, y)
  labels = y.to_a.uniq.sort
  n_clusters = labels.size
  n_dimensions = x.shape[1]

  centroids = Numo::DFloat.zeros(n_clusters, n_dimensions)

  within_group = 0.0
  n_clusters.times do |n|
    cls_samples = x[y.eq(labels[n]), true]
    cls_centroid = cls_samples.mean(0)
    centroids[n, true] = cls_centroid
    within_group += ((cls_samples - cls_centroid)**2).sum
  end

  return 1.0 if within_group.zero?

  mean_vec = x.mean(0)
  between_group = 0.0
  n_clusters.times do |n|
    sz_cluster = y.eq(labels[n]).count
    between_group += sz_cluster * ((centroids[n, true] - mean_vec)**2).sum
  end

  n_samples = x.shape[0]
  (between_group / (n_clusters - 1)) / (within_group / (n_samples - n_clusters))
end
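
To make the arithmetic above easier to follow, here is a minimal pure-Ruby sketch of the same computation on plain arrays instead of Numo arrays. The helper name calinski_harabasz and the toy data are hypothetical and not part of Rumale; the sketch assumes Float inputs and at least two clusters.

```ruby
# Pure-Ruby sketch of the same computation on plain arrays (no Numo).
# calinski_harabasz is a hypothetical helper name, not part of Rumale.
def calinski_harabasz(x, y)
  labels = y.uniq.sort
  n_clusters = labels.size
  n_samples = x.size
  n_dimensions = x.first.size

  # Column-wise mean of a list of row vectors.
  mean = lambda do |rows|
    (0...n_dimensions).map { |d| rows.sum { |r| r[d] } / rows.size.to_f }
  end
  # Squared Euclidean distance between two vectors.
  sq_dist = ->(a, b) { a.zip(b).sum { |p, q| (p - q)**2 } }

  # Group the rows of x by label, then compute one centroid per group.
  groups = labels.map { |l| x.each_index.select { |i| y[i] == l }.map { |i| x[i] } }
  centroids = groups.map { |g| mean.call(g) }

  # Within-cluster sum of squares.
  within_group = groups.zip(centroids).sum { |g, c| g.sum { |r| sq_dist.call(r, c) } }
  return 1.0 if within_group.zero?

  # Between-cluster sum of squares, weighted by cluster size.
  mean_vec = mean.call(x)
  between_group = groups.zip(centroids).sum { |g, c| g.size * sq_dist.call(c, mean_vec) }

  (between_group / (n_clusters - 1)) / (within_group / (n_samples - n_clusters))
end

# Toy data: two clusters of two points each.
x = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
score = calinski_harabasz(x, y)
puts score
```

For this toy data the within-cluster sum of squares is 4.0 and the between-cluster sum of squares is 16.0, so the score is (16.0 / 1) / (4.0 / 2) = 8.0.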