Class: Rumale::Clustering::KMeans

Inherits:
Base::Estimator
Includes:
Base::ClusterAnalyzer
Defined in:
rumale-clustering/lib/rumale/clustering/k_means.rb

Overview

KMeans is a class that implements K-Means cluster analysis. The current implementation uses the Euclidean distance for analyzing the clusters.

Reference

  • Arthur, D., and Vassilvitskii, S., “k-means++: the advantages of careful seeding,” Proc. SODA’07, pp. 1027–1035, 2007.

Examples:

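require 'numo/narray'
require 'rumale/clustering/k_means'

# samples: Numo::DFloat of shape [n_samples, n_features]; random data used here for illustration.
samples = Numo::DFloat.new(200, 2).rand

analyzer = Rumale::Clustering::KMeans.new(n_clusters: 10, max_iter: 50)
cluster_labels = analyzer.fit_predict(samples)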

Instance Attribute Summary

Attributes inherited from Base::Estimator

#params

Instance Method Summary

Methods included from Base::ClusterAnalyzer

#score

Constructor Details

#initialize(n_clusters: 8, init: 'k-means++', max_iter: 50, tol: 1.0e-4, random_seed: nil) ⇒ KMeans

Create a new cluster analyzer that uses the K-Means method.

Parameters:

  • n_clusters (Integer) (defaults to: 8)

    The number of clusters.

  • init (String) (defaults to: 'k-means++')

    The initialization method for centroids (‘random’ or ‘k-means++’).

  • max_iter (Integer) (defaults to: 50)

    The maximum number of iterations.

  • tol (Float) (defaults to: 1.0e-4)

    The tolerance for the termination criterion: iteration stops once the mean Euclidean shift of the centroids is at or below this value.

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator.
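
The 'k-means++' option refers to the seeding scheme from the Arthur and Vassilvitskii paper cited above: each new centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The following is a minimal sketch of that idea on plain Ruby Arrays. The helper names `sq_dist` and `kmeans_pp_seeds` are hypothetical, and this is not the code Rumale runs internally (which operates on Numo arrays).

```ruby
# Squared Euclidean distance between two points given as Arrays of Floats.
def sq_dist(a, b)
  a.zip(b).sum { |p, q| (p - q)**2 }
end

# Hypothetical helper: choose k initial centroids from `points` by
# k-means++ style sampling, using the supplied Random instance.
def kmeans_pp_seeds(points, k, rng)
  # The first centroid is drawn uniformly at random.
  centers = [points[rng.rand(points.size)]]
  while centers.size < k
    # Squared distance from every point to its nearest chosen centroid.
    d2 = points.map { |pt| centers.map { |c| sq_dist(pt, c) }.min }
    # Draw an index with probability proportional to d2.
    threshold = rng.rand * d2.sum
    acc = 0.0
    idx = d2.each_index.find { |i| (acc += d2[i]) > threshold }
    # Fallback covers rounding effects and the all-zero-distance case.
    centers << points[idx || points.size - 1]
  end
  centers
end

points = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
p kmeans_pp_seeds(points, 2, Random.new(42))
```

Because points that are already centroids have zero weight, the sampling tends to spread the initial centroids across the data, which is the motivation for preferring it over purely uniform selection.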



# File 'rumale-clustering/lib/rumale/clustering/k_means.rb', line 39

def initialize(n_clusters: 8, init: 'k-means++', max_iter: 50, tol: 1.0e-4, random_seed: nil)
  super()
  @params = {
    n_clusters: n_clusters,
    init: (init == 'random' ? 'random' : 'k-means++'),
    max_iter: max_iter,
    tol: tol,
    random_seed: random_seed || srand
  }
  @rng = Random.new(@params[:random_seed])
end
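
Two details of the constructor are easy to miss: any `init` value other than the exact string 'random' is stored as 'k-means++', and a nil `random_seed` is replaced by the Integer returned from `Kernel#srand`. The sketch below reproduces just that normalization in isolation; `normalize_kmeans_params` is a hypothetical name used for illustration and is not part of Rumale.

```ruby
# Hypothetical helper mirroring the normalization done in the constructor.
def normalize_kmeans_params(init: 'k-means++', random_seed: nil)
  {
    # Anything other than the exact string 'random' becomes 'k-means++'.
    init: (init == 'random' ? 'random' : 'k-means++'),
    # Kernel#srand returns the previous seed, so nil becomes an Integer.
    random_seed: random_seed || srand
  }
end

p normalize_kmeans_params(init: 'random')[:init]   # "random"
p normalize_kmeans_params(init: 'kmeans')[:init]   # "k-means++"

# Two generators built from the same stored seed yield the same sequence.
seed = normalize_kmeans_params(random_seed: 42)[:random_seed]
puts Random.new(seed).rand(100) == Random.new(seed).rand(100)
```

In practice this means a misspelled `init` value does not raise an error, and passing an explicit `random_seed` is the way to get repeatable centroid initialization.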

Instance Attribute Details

#cluster_centers ⇒ Numo::DFloat (readonly)

Return the centroids.

Returns:

  • (Numo::DFloat)

    (shape: [n_clusters, n_features])



# File 'rumale-clustering/lib/rumale/clustering/k_means.rb', line 26

def cluster_centers
  @cluster_centers
end

#rng ⇒ Random (readonly)

Return the random generator.

Returns:

  • (Random)


# File 'rumale-clustering/lib/rumale/clustering/k_means.rb', line 30

def rng
  @rng
end

Instance Method Details

#fit(x) ⇒ KMeans

Analyze clusters with the given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for cluster analysis.

Returns:

  • (KMeans)

    The learned cluster analyzer itself.



# File 'rumale-clustering/lib/rumale/clustering/k_means.rb', line 56

def fit(x, _y = nil)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  init_cluster_centers(x)
  @params[:max_iter].times do |_t|
    # Assign every sample to a cluster using the current centroids.
    cluster_labels = assign_cluster(x)
    old_centers = @cluster_centers.dup
    # Move each centroid to the mean of its members; an empty cluster keeps its old centroid.
    @params[:n_clusters].times do |n|
      assigned_bits = cluster_labels.eq(n)
      @cluster_centers[n, true] = x[assigned_bits.where, true].mean(axis: 0) if assigned_bits.count.positive?
    end
    # Stop once the mean Euclidean shift of the centroids is within the tolerance.
    error = Numo::NMath.sqrt(((old_centers - @cluster_centers)**2).sum(axis: 1)).mean
    break if error <= @params[:tol]
  end
  self
end
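
To make the update and stopping rule in the loop above concrete, here is a minimal sketch of one iteration on plain Ruby Arrays instead of Numo arrays. The helpers `update_centers` and `mean_centroid_shift` are hypothetical names for illustration only; they follow the logic of the source shown above but are not part of Rumale.

```ruby
# Hypothetical helper: recompute each centroid as the mean of its members.
def update_centers(samples, labels, centers)
  centers.each_with_index.map do |center, k|
    members = samples.each_index.select { |i| labels[i] == k }.map { |i| samples[i] }
    next center if members.empty? # an empty cluster keeps its previous centroid
    members.transpose.map { |column| column.sum / column.size }
  end
end

# Hypothetical helper: mean Euclidean distance the centroids moved.
def mean_centroid_shift(old_centers, new_centers)
  shifts = old_centers.zip(new_centers).map do |o, n|
    Math.sqrt(o.zip(n).sum { |a, b| (a - b)**2 })
  end
  shifts.sum / shifts.size
end

samples = [[0.0, 0.0], [2.0, 0.0], [4.0, 6.0]]
labels  = [0, 0, 0]                      # cluster 1 has no members
old_centers = [[5.0, 6.0], [9.0, 9.0]]

new_centers = update_centers(samples, labels, old_centers)
p new_centers                            # [[2.0, 2.0], [9.0, 9.0]]

shift = mean_centroid_shift(old_centers, new_centers)
puts shift                               # (5.0 + 0.0) / 2 = 2.5
puts shift <= 1.0e-4                     # false, so iteration would continue
```

Note that the criterion averages the shift over all centroids, so a single centroid may still move by more than `tol` when the loop stops.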

#fit_predict(x) ⇒ Numo::Int32

Analyze clusters and assign samples to clusters.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for cluster analysis.

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) Predicted cluster label per sample.



# File 'rumale-clustering/lib/rumale/clustering/k_means.rb', line 87

def fit_predict(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  fit(x).predict(x)
end

#predict(x) ⇒ Numo::Int32

Predict cluster labels for samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples whose cluster labels are to be predicted.

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) Predicted cluster label per sample.



# File 'rumale-clustering/lib/rumale/clustering/k_means.rb', line 77

def predict(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  assign_cluster(x)
end