Class: Rumale::Clustering::SpectralClustering

Inherits:
Base::Estimator
Includes:
Base::ClusterAnalyzer
Defined in:
rumale-clustering/lib/rumale/clustering/spectral_clustering.rb

Overview

SpectralClustering is a class that implements normalized spectral clustering.

Reference

  • Ng, A Y., Jordan, M I., and Weiss, Y., “On Spectral Clustering: Analysis and an algorithm,” Proc. NIPS’01, pp. 849–856, 2001.

  • von Luxburg, U., “A tutorial on spectral clustering,” Statistics and Computing, Vol. 17 (4), pp. 395–416, 2007.

Examples:

require 'numo/linalg/autoloader'
require 'rumale/clustering/spectral_clustering'

analyzer = Rumale::Clustering::SpectralClustering.new(n_clusters: 10, gamma: 8.0)
cluster_labels = analyzer.fit_predict(samples)

Instance Attribute Summary

Attributes inherited from Base::Estimator

#params

Instance Method Summary

Methods included from Base::ClusterAnalyzer

#score

Constructor Details

#initialize(n_clusters: 2, affinity: 'rbf', gamma: nil, init: 'k-means++', max_iter: 10, tol: 1.0e-8, random_seed: nil) ⇒ SpectralClustering

Create a new cluster analyzer with normalized spectral clustering.

Parameters:

  • n_clusters (Integer) (defaults to: 2)

    The number of clusters.

  • affinity (String) (defaults to: 'rbf')

    The representation of the affinity matrix (‘rbf’ or ‘precomputed’). If affinity = ‘rbf’, the class performs normalized spectral clustering on the fully connected graph weighted by the RBF kernel.

  • gamma (Float) (defaults to: nil)

    The parameter of the RBF kernel. If nil, it defaults to 1 / n_features. This parameter is ignored when affinity = ‘precomputed’.

  • init (String) (defaults to: 'k-means++')

    The initialization method for centroids of K-Means clustering (‘random’ or ‘k-means++’). Any value other than ‘random’ is treated as ‘k-means++’.

  • max_iter (Integer) (defaults to: 10)

    The maximum number of iterations for K-Means clustering.

  • tol (Float) (defaults to: 1.0e-8)

    The tolerance of termination criterion for K-Means clustering.

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator.
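
To make the roles of affinity and gamma concrete, the following is a minimal pure-Ruby sketch of an RBF affinity matrix, assuming the usual kernel form exp(-gamma * ||a - b||^2) and the documented default gamma = 1 / n_features. The helper name rbf_affinity is hypothetical and is not part of Rumale; the library itself works on Numo arrays.

```ruby
# Hypothetical helper (not part of Rumale): builds a dense RBF affinity
# matrix from an array of equal-length Float arrays.
def rbf_affinity(samples, gamma: nil)
  n_features = samples.first.size
  # Mirrors the documented default: gamma is 1 / n_features when nil.
  gamma ||= 1.0 / n_features
  samples.map do |a|
    samples.map do |b|
      # Squared Euclidean distance between the two samples.
      sq_dist = a.zip(b).sum { |ai, bi| (ai - bi)**2 }
      Math.exp(-gamma * sq_dist)
    end
  end
end

samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
affinity = rbf_affinity(samples)
```

Each diagonal entry is 1.0 because every sample is at distance zero from itself, and the matrix is symmetric. A matrix of this shape is what the ‘precomputed’ option expects as input.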



# File 'rumale-clustering/lib/rumale/clustering/spectral_clustering.rb', line 46

def initialize(n_clusters: 2, affinity: 'rbf', gamma: nil, init: 'k-means++', max_iter: 10, tol: 1.0e-8, random_seed: nil)
  super()
  @params = {
    n_clusters: n_clusters,
    affinity: affinity,
    gamma: gamma,
    init: (init == 'random' ? 'random' : 'k-means++'),
    max_iter: max_iter,
    tol: tol,
    random_seed: random_seed || srand
  }
end
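
The constructor normalizes two of its options before storing them. As a small illustration, the sketch below reproduces only that logic in a stand-alone method; normalized_options is a hypothetical name used here for demonstration and does not exist in Rumale.

```ruby
# Hypothetical stand-in that mirrors how the constructor shown above
# normalizes the init and random_seed options.
def normalized_options(init: 'k-means++', random_seed: nil)
  {
    # Anything other than 'random' falls back to 'k-means++'.
    init: (init == 'random' ? 'random' : 'k-means++'),
    # Kernel#srand without arguments reseeds the global generator and
    # returns the previous seed, which is then stored as the seed value.
    random_seed: random_seed || srand
  }
end

opts = normalized_options(init: 'kmeans', random_seed: 42)
auto = normalized_options
```

Here opts[:init] is 'k-means++' because 'kmeans' is not a recognized value, and auto[:random_seed] is an Integer even though no seed was given. Passing an explicit random_seed is the way to get repeatable K-Means initialization.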

Instance Attribute Details

#embedding ⇒ Numo::DFloat (readonly)

Return the data in embedded space.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_clusters])



# File 'rumale-clustering/lib/rumale/clustering/spectral_clustering.rb', line 29

def embedding
  @embedding
end

#labels ⇒ Numo::Int32 (readonly)

Return the cluster labels.

Returns:

  • (Numo::Int32)

    (shape: [n_samples])



# File 'rumale-clustering/lib/rumale/clustering/spectral_clustering.rb', line 33

def labels
  @labels
end

Instance Method Details

#fit(x) ⇒ SpectralClustering

Analyze clusters with the given training data. To execute this method, Numo::Linalg must be loaded.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for cluster analysis. If affinity is ‘precomputed’, x must be a square affinity matrix (shape: [n_samples, n_samples]).

Returns:

  • (SpectralClustering)

    The learned cluster analyzer itself.

Raises:

  • (ArgumentError)


# File 'rumale-clustering/lib/rumale/clustering/spectral_clustering.rb', line 66

def fit(x, _y = nil)
  x = ::Rumale::Validation.check_convert_sample_array(x)
  raise ArgumentError, 'the input affinity matrix should be square' if check_invalid_array_shape(x)

  raise 'SpectralClustering#fit requires Numo::Linalg but that is not loaded' unless enable_linalg?(warning: false)

  fit_predict(x)
  self
end
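
When a precomputed affinity matrix is supplied, it must be square, and an ArgumentError is raised otherwise. The sketch below shows an equivalent check on plain nested arrays. The helper check_square_affinity! is hypothetical; the library performs its own check through a private method that is not shown on this page.

```ruby
# Hypothetical pre-flight check (not part of Rumale): verifies that a
# nested-array affinity matrix has as many columns as rows.
def check_square_affinity!(matrix)
  n = matrix.size
  unless matrix.all? { |row| row.size == n }
    raise ArgumentError, 'the input affinity matrix should be square'
  end
  matrix
end

square = check_square_affinity!([[1.0, 0.2], [0.2, 1.0]])

error_message =
  begin
    # Two rows but three columns, so the check fails.
    check_square_affinity!([[1.0, 0.2, 0.1], [0.2, 1.0, 0.3]])
    nil
  rescue ArgumentError => e
    e.message
  end
```

Running a check like this before calling fit can give an earlier and more specific failure when the affinity matrix is assembled by hand.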

#fit_predict(x) ⇒ Numo::Int32

Analyze clusters and assign samples to clusters. To execute this method, Numo::Linalg must be loaded.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for cluster analysis. If affinity is ‘precomputed’, x must be a square affinity matrix (shape: [n_samples, n_samples]).

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) Predicted cluster label per sample.

Raises:

  • (ArgumentError)


# File 'rumale-clustering/lib/rumale/clustering/spectral_clustering.rb', line 82

def fit_predict(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)
  raise ArgumentError, 'the input affinity matrix should be square' if check_invalid_array_shape(x)

  unless enable_linalg?(warning: false)
    raise 'SpectralClustering#fit_predict requires Numo::Linalg but that is not loaded'
  end

  # The constructor stores this option under :affinity; there is no :metric key in @params.
  affinity_mat = @params[:affinity] == 'precomputed' ? x : ::Rumale::PairwiseMetric.rbf_kernel(x, nil, @params[:gamma])
  @embedding = embedded_space(affinity_mat, @params[:n_clusters])
  normalized_embedding = ::Rumale::Utils.normalize(@embedding, 'l2')
  @labels = kmeans_clustering(normalized_embedding)
end
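
Before K-Means runs, each row of the embedding is scaled to unit L2 length, so samples are compared by direction in the embedded space rather than by magnitude. The following pure-Ruby sketch illustrates that step on nested arrays. The helper l2_normalize_rows is hypothetical, and leaving all-zero rows unchanged is an assumption made here to avoid division by zero.

```ruby
# Hypothetical helper (not part of Rumale): scales every row of a
# nested Float array to unit Euclidean length.
def l2_normalize_rows(rows)
  rows.map do |row|
    norm = Math.sqrt(row.sum { |v| v * v })
    # Assumption: an all-zero row is left as-is instead of dividing by zero.
    norm = 1.0 if norm.zero?
    row.map { |v| v / norm }
  end
end

embedding = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
normalized = l2_normalize_rows(embedding)
```

After normalization, every non-zero row lies on the unit sphere, which is the representation the K-Means step receives.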