Class: Rumale::Clustering::SNN

Inherits:
DBSCAN
Defined in:
rumale-clustering/lib/rumale/clustering/snn.rb

Overview

SNN is a class that implements Shared Nearest Neighbor cluster analysis. The SNN method is a variation of DBSCAN that measures how similar two points are by the k-nearest neighbors they share, instead of by the distance between them.
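To make the idea of shared-neighbor similarity concrete, here is a minimal sketch in plain Ruby. It is not Rumale's implementation: the helper names `euclidean`, `knn_lists`, and `snn_similarity` are hypothetical, and leaving each point out of its own neighbor list is an assumption of this sketch. The similarity of two points is the number of indices their k-nearest-neighbor lists have in common.

```ruby
# Minimal sketch of shared-nearest-neighbor (SNN) similarity in plain Ruby.
# Assumption: each point is left out of its own neighbor list.

# Euclidean distance between two equal-length coordinate arrays.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |u, v| (u - v)**2 })
end

# For every point, the indices of its k nearest other points.
def knn_lists(points, k)
  points.each_index.map do |i|
    others = points.each_index.reject { |j| j == i }
    others.min_by(k) { |j| euclidean(points[i], points[j]) }
  end
end

# Similarity of points i and j = number of indices their k-NN lists share.
def snn_similarity(points, k)
  lists = knn_lists(points, k)
  lists.map { |li| lists.map { |lj| (li & lj).size } }
end

points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
sim = snn_similarity(points, 2)
p sim[0] # => [2, 1, 1, 0, 0, 0]
```

Points in the same tight group share neighbors, while points in the two distant groups share none, so their similarity is 0 regardless of the absolute scale of the coordinates.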

Reference

  • Ertoz, L., Steinbach, M., and Kumar, V., “Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data,” Proc. SDM’03, pp. 47–58, 2003.

  • Houle, M. E., Kriegel, H.-P., Kroger, P., Schubert, E., and Zimek, A., “Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?,” Proc. SSDBM’10, pp. 482–500, 2010.

Examples:

require 'rumale/clustering/snn'

analyzer = Rumale::Clustering::SNN.new(n_neighbors: 10, eps: 5, min_samples: 5)
# samples is a Numo::DFloat with shape [n_samples, n_features].
cluster_labels = analyzer.fit_predict(samples)

Instance Attribute Summary

Attributes inherited from DBSCAN

#core_sample_ids, #labels

Attributes inherited from Base::Estimator

#params

Instance Method Summary

Methods included from Base::ClusterAnalyzer

#score

Constructor Details

#initialize(n_neighbors: 10, eps: 5, min_samples: 5, metric: 'euclidean') ⇒ SNN

Create a new cluster analyzer using the Shared Nearest Neighbor method.

Parameters:

  • n_neighbors (Integer) (defaults to: 10)

    The number of neighbors (k) used to find the k-nearest neighbors of each sample.

  • eps (Integer) (defaults to: 5)

    The threshold value for finding connected components based on similarity.

  • min_samples (Integer) (defaults to: 5)

    The minimum number of neighboring samples a point needs in order to be considered a core point.

  • metric (String) (defaults to: 'euclidean')

    The metric used to calculate distances. If metric is ‘euclidean’, the Euclidean distance between points is calculated. If metric is ‘precomputed’, the fit and fit_predict methods expect to be given a distance matrix. Any other value is treated as ‘euclidean’.
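The eps and min_samples parameters work together to decide which points are core points. The following plain-Ruby sketch shows one way that criterion can be applied to a precomputed SNN similarity matrix. It is illustrative only: `core_point?` is a hypothetical helper, and both the strict "greater than eps" comparison and counting a point's similarity to itself are assumptions, not statements about Rumale's exact internals.

```ruby
# Sketch of the core-point criterion on a precomputed SNN similarity matrix.
# Assumptions: a neighbor counts when its similarity is strictly greater than
# eps, and a point's similarity to itself is included in the count.
def core_point?(similarity, i, eps, min_samples)
  neighbor_count = similarity[i].count { |s| s > eps }
  neighbor_count >= min_samples
end

similarity = [
  [3, 2, 2, 0],
  [2, 3, 2, 0],
  [2, 2, 3, 1],
  [0, 0, 1, 3]
]
core_ids = similarity.each_index.select { |i| core_point?(similarity, i, 1, 3) }
p core_ids # => [0, 1, 2]
```

Point 3 shares too few neighbors with the others (only its own entry exceeds eps = 1), so it is not a core point. Raising eps or min_samples makes the criterion stricter.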



# File 'rumale-clustering/lib/rumale/clustering/snn.rb', line 29

def initialize(n_neighbors: 10, eps: 5, min_samples: 5, metric: 'euclidean') # rubocop:disable Lint/MissingSuper
  @params = {
    n_neighbors: n_neighbors,
    eps: eps,
    min_samples: min_samples,
    metric: (metric == 'precomputed' ? 'precomputed' : 'euclidean') # any other value falls back to 'euclidean'
  }
end

Instance Method Details

#fit(x) ⇒ SNN

Analyze clusters with the given training data.


Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for cluster analysis. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (SNN)

    The learned cluster analyzer itself.



# File 'rumale-clustering/lib/rumale/clustering/snn.rb', line 44

def fit(x, _y = nil)
  super # delegates to DBSCAN#fit
end
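When metric is ‘precomputed’, x must be a square distance matrix. As a rough sketch of what such a matrix contains, the plain-Ruby helper below builds pairwise Euclidean distances as nested arrays. `distance_matrix` is a hypothetical name used only for illustration.

```ruby
# Builds a square, symmetric Euclidean distance matrix as nested arrays.
# Entry [i][j] is the distance between points i and j; the diagonal is 0.
def distance_matrix(points)
  points.map do |a|
    points.map { |b| Math.sqrt(a.zip(b).sum { |u, v| (u - v)**2 }) }
  end
end

dist = distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
p dist # => [[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]]
```

Because fit expects a Numo::DFloat, the nested array would still need converting, for example with `Numo::DFloat.cast(dist)`, before being passed to an analyzer created with `metric: 'precomputed'`.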

#fit_predict(x) ⇒ Numo::Int32

Analyze clusters and assign each sample to a cluster.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be used for cluster analysis. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) Predicted cluster label per sample.



# File 'rumale-clustering/lib/rumale/clustering/snn.rb', line 53

def fit_predict(x) # rubocop:disable Lint/UselessMethodDefinition
  super # delegates to DBSCAN#fit_predict
end
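Since SNN inherits its clustering procedure from DBSCAN, the labels returned by fit_predict come from expanding clusters outward from core points. The plain-Ruby sketch below illustrates that DBSCAN-style expansion over a precomputed SNN similarity matrix. It is a sketch under stated assumptions, not Rumale's code: the names `region_query` and `snn_dbscan` are illustrative, a neighbor counts when its similarity is strictly greater than eps, and noise is labeled -1 by convention in this sketch.

```ruby
# Sketch: DBSCAN-style cluster expansion over an SNN similarity matrix.
# Assumptions (not taken from Rumale): a neighbor counts when similarity > eps,
# the point itself is counted, and noise is labeled -1.
def region_query(similarity, i, eps)
  similarity[i].each_index.select { |j| similarity[i][j] > eps }
end

def snn_dbscan(similarity, eps, min_samples)
  labels = Array.new(similarity.size) # nil means "not visited yet"
  cluster_id = 0
  similarity.each_index do |i|
    next unless labels[i].nil?
    neighbors = region_query(similarity, i, eps)
    if neighbors.size < min_samples
      labels[i] = -1 # provisional noise; may later become a border point
      next
    end
    labels[i] = cluster_id
    queue = neighbors.reject { |j| j == i }
    until queue.empty?
      j = queue.shift
      labels[j] = cluster_id if labels[j] == -1 # noise reached from a core point becomes a border point
      next unless labels[j].nil?
      labels[j] = cluster_id
      j_neighbors = region_query(similarity, j, eps)
      # Only core points keep expanding the cluster.
      queue.concat(j_neighbors) if j_neighbors.size >= min_samples
    end
    cluster_id += 1
  end
  labels
end

similarity = [
  [3, 2, 2, 0, 0, 0, 0],
  [2, 3, 2, 0, 0, 0, 0],
  [2, 2, 3, 0, 0, 0, 1],
  [0, 0, 0, 3, 2, 2, 0],
  [0, 0, 0, 2, 3, 2, 0],
  [0, 0, 0, 2, 2, 3, 0],
  [0, 0, 1, 0, 0, 0, 3]
]
labels = snn_dbscan(similarity, 1, 3)
p labels # => [0, 0, 0, 1, 1, 1, -1]
```

The two groups that share neighbors form clusters 0 and 1, while the last point shares too few neighbors with anyone to join a cluster and stays labeled as noise.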