Class: Rumale::Manifold::TSNE

Inherits:
Base::Estimator
Includes:
Base::Transformer
Defined in:
rumale-manifold/lib/rumale/manifold/tsne.rb

Overview

TSNE is a class that implements t-Distributed Stochastic Neighbor Embedding (t-SNE) with a fixed-point optimization algorithm. The fixed-point algorithm usually converges faster than gradient descent and does not require learning parameters such as the learning rate and momentum.
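The fixed-point step in the `fit` source on this page (the lines computing `a`, `b`, and the new `y`) can be written row-wise as below. Here p_ij and q_ij are the entries of `hi_prob_mat` and `lo_prob_mat`, and y_i is the i-th row of the embedding. The Student-t form given for q_ij is the standard one from van der Maaten and Hinton (2008) and is an assumption, because the private helper that builds `lo_prob_mat` is not shown on this page.

```latex
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \qquad q_{ii} = 0

y_i \leftarrow \frac{\left(\sum_j q_{ij}^2\right) y_i + \sum_j \left(p_{ij} q_{ij} - q_{ij}^2\right) y_j}{\sum_j p_{ij} q_{ij}}
```

No step size or momentum term appears in this update, which is why no learning parameters have to be tuned.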

Reference

  • van der Maaten, L., and Hinton, G., “Visualizing data using t-SNE,” J. of Machine Learning Research, vol. 9, pp. 2579–2605, 2008.

  • Yang, Z., King, I., Xu, Z., and Oja, E., “Heavy-Tailed Symmetric Stochastic Neighbor Embedding,” Proc. NIPS’09, pp. 2169–2177, 2009.

Examples:

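require 'numo/narray'
require 'rumale/manifold/tsne'

# samples: a Numo::DFloat of shape [n_samples, n_features]; random values here for illustration.
samples = Numo::DFloat.new(100, 8).rand

tsne = Rumale::Manifold::TSNE.new(perplexity: 40.0, init: 'pca', max_iter: 500, random_seed: 1)
representations = tsne.fit_transform(samples)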

Instance Attribute Summary

Attributes inherited from Base::Estimator

#params

Instance Method Summary

Constructor Details

#initialize(n_components: 2, perplexity: 30.0, metric: 'euclidean', init: 'random', max_iter: 500, tol: nil, verbose: false, random_seed: nil) ⇒ TSNE

Create a new transformer with t-SNE.

Parameters:

  • n_components (Integer) (defaults to: 2)

    The number of dimensions of the representation space.

  • perplexity (Float) (defaults to: 30.0)

    The effective number of neighbors for each point. Perplexity is typically set between 5 and 50.

  • metric (String) (defaults to: 'euclidean')

    The metric used to calculate distances in the original space. If metric is ‘euclidean’, the Euclidean distance is used. If metric is ‘precomputed’, the fit and fit_transform methods expect to be given a distance matrix.

  • init (String) (defaults to: 'random')

    The method used to initialize the representation space. If init is ‘random’, the representation space is initialized with normally distributed random values. If init is ‘pca’, the result of principal component analysis is used as the initial value of the representation space.

  • max_iter (Integer) (defaults to: 500)

    The maximum number of iterations.

  • tol (Float) (defaults to: nil)

    The tolerance on the KL divergence for terminating the optimization. If tol is nil, the KL divergence is not used as a termination criterion.

  • verbose (Boolean) (defaults to: false)

    The flag indicating whether to print the KL divergence during optimization (every 100 iterations).

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator. If nil, a seed is obtained from srand.
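To make "effective number of neighbors" concrete, here is a minimal sketch of the standard perplexity calibration from van der Maaten and Hinton (2008): for one sample, a binary search adjusts the Gaussian precision beta until exp(entropy) of the conditional distribution equals the requested perplexity. The helper names `conditional_probs` and `calibrate` are hypothetical and not part of Rumale's API, and the tolerance and iteration limit are assumed values; the page does not show how Rumale's private helpers perform this step.

```ruby
# Hypothetical helpers (not Rumale's API) sketching perplexity calibration.

# Gaussian conditional probabilities p_{j|i} for one sample, given its squared
# distances to the other samples (self excluded) and a precision beta.
# Returns [probs, entropy] with the entropy in nats.
# No guard against underflow is included for very large beta.
def conditional_probs(sq_dists, beta)
  weights = sq_dists.map { |d| Math.exp(-beta * d) }
  total = weights.sum
  probs = weights.map { |w| w / total }
  entropy = probs.sum { |p| p > 0.0 ? -p * Math.log(p) : 0.0 }
  [probs, entropy]
end

# Binary search on beta until exp(entropy) matches the target perplexity.
def calibrate(sq_dists, perplexity, tol: 1e-5, max_iter: 100)
  target = Math.log(perplexity)
  beta = 1.0
  lo = 0.0
  hi = Float::INFINITY
  probs = nil
  max_iter.times do
    probs, entropy = conditional_probs(sq_dists, beta)
    diff = entropy - target
    break if diff.abs <= tol

    if diff > 0.0
      # Too many effective neighbors: sharpen the Gaussian (raise beta).
      lo = beta
      beta = hi.infinite? ? beta * 2.0 : (beta + hi) / 2.0
    else
      # Too few effective neighbors: widen the Gaussian (lower beta).
      hi = beta
      beta = (beta + lo) / 2.0
    end
  end
  [beta, probs]
end

sq_dists = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
beta, probs = calibrate(sq_dists, 5.0)
```

With beta = 0 every neighbor gets the same weight, so the perplexity equals the number of neighbors. This is why perplexity behaves like a soft neighbor count and cannot usefully exceed n_samples - 1.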



# File 'rumale-manifold/lib/rumale/manifold/tsne.rb', line 60

def initialize(n_components: 2, perplexity: 30.0, metric: 'euclidean', init: 'random',
               max_iter: 500, tol: nil, verbose: false, random_seed: nil)
  super()
  @params = {
    n_components: n_components,
    perplexity: perplexity,
    max_iter: max_iter,
    tol: tol,
    metric: metric,
    init: init,
    verbose: verbose,
    random_seed: random_seed || srand
  }
  @rng = Random.new(@params[:random_seed])
end
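The constructor stores `random_seed || srand`, so a nil seed is replaced by a concrete Integer before `Random.new` is called. The sketch below isolates that behavior in a hypothetical helper named `resolve_seed`. It relies on the documented Ruby behavior that `Kernel#srand` with no argument reseeds the global generator and returns the previous seed.

```ruby
# Hypothetical helper mirroring `random_seed || srand` in the constructor.
# Kernel#srand with no argument reseeds Ruby's global generator and returns
# the previous seed, so a nil random_seed still resolves to a concrete Integer.
def resolve_seed(random_seed)
  random_seed || srand
end

fixed = resolve_seed(1)
auto = resolve_seed(nil)

# Two generators built from the same seed produce the same stream, which is
# why the stored seed is enough to recreate the same Random instance.
same_stream = Random.new(fixed).rand == Random.new(1).rand
```

Because the resolved seed ends up in `@params[:random_seed]` and `params` is an inherited reader, a run created with `random_seed: nil` can presumably be repeated by passing `tsne.params[:random_seed]` to a new instance. This assumes that all randomness in the algorithm flows through `@rng`.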

Instance Attribute Details

#embedding ⇒ Numo::DFloat (readonly)

Return the data in the representation space.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components])



# File 'rumale-manifold/lib/rumale/manifold/tsne.rb', line 31

def embedding
  @embedding
end

#kl_divergence ⇒ Float (readonly)

Return the Kullback-Leibler divergence after optimization.

Returns:

  • (Float)


# File 'rumale-manifold/lib/rumale/manifold/tsne.rb', line 35

def kl_divergence
  @kl_divergence
end
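The reported value is the Kullback-Leibler divergence between the high-dimensional affinities P and the low-dimensional affinities Q. The private `cost` helper that computes it is not shown on this page, so the following is a textbook KL(P || Q) sketch over nested Arrays. The helper name `kl_divergence` and the clamping constant `eps` are assumptions, not Rumale's code.

```ruby
# Hypothetical helper: KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij) for two
# joint probability matrices given as nested Arrays. Both arguments are
# clamped at eps (an assumed constant) so that log(0) never occurs; terms
# with p_ij = 0 contribute nothing.
def kl_divergence(p, q, eps: 1e-20)
  (0...p.size).sum do |i|
    (0...p[i].size).sum do |j|
      p[i][j] * Math.log([p[i][j], eps].max / [q[i][j], eps].max)
    end
  end
end

p_mat = [[0.0, 0.4, 0.1], [0.4, 0.0, 0.0], [0.1, 0.0, 0.0]]
q_mat = [[0.0, 0.25, 0.25], [0.25, 0.0, 0.0], [0.25, 0.0, 0.0]]
```

A value of 0 means the embedding reproduces the input affinities exactly. Larger values mean neighbor relations were distorted, so lower is better when comparing runs on the same data.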

#n_iter ⇒ Integer (readonly)

Return the number of iterations run for optimization.

Returns:

  • (Integer)


# File 'rumale-manifold/lib/rumale/manifold/tsne.rb', line 39

def n_iter
  @n_iter
end
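In the `fit` source, the termination check runs before each update, and `@n_iter` is set to `t + 1` only after an update completes. As a result, `n_iter` equals `max_iter` when `tol` is nil and can be smaller otherwise. The sketch below reproduces only that loop control. The helper name `count_iterations` is hypothetical, and treating termination as "KL divergence <= tol" is an assumption about the private `terminate?` helper.

```ruby
# Hypothetical sketch of the loop control in `fit`: the termination check runs
# before each update, and n_iter records how many updates were completed.
# The block returns the KL divergence before iteration t; treating
# "terminate" as "KL <= tol" is an assumption about terminate?.
def count_iterations(max_iter, tol)
  n_iter = 0
  max_iter.times do |t|
    break if !tol.nil? && yield(t) <= tol

    # ... one fixed-point update would run here ...
    n_iter = t + 1
  end
  n_iter
end

kl_trace = [1.0, 0.8, 0.4, 0.1]
```

For example, `count_iterations(500, 0.5) { |t| kl_trace[t] }` stops when the divergence first drops to 0.4, after two completed updates.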

#rng ⇒ Random (readonly)

Return the random generator.

Returns:

  • (Random)


# File 'rumale-manifold/lib/rumale/manifold/tsne.rb', line 43

def rng
  @rng
end

Instance Method Details

#fit(x) ⇒ TSNE

Fit the model with given training data.

Returns the learned transformer itself.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (TSNE)

    The learned transformer itself.
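The `fit` source squares the input (`x**2`) when metric is ‘precomputed’, and it calls `PairwiseMetric.squared_error(x)` otherwise. A precomputed matrix should therefore hold plain Euclidean distances, not squared ones, if it is meant to match the ‘euclidean’ setting. The sketch below checks that equivalence with plain Ruby Arrays. The helper names `squared_euclidean` and `euclidean` are hypothetical stand-ins, not Rumale calls.

```ruby
# Squared Euclidean distance matrix, the quantity fit builds from raw features.
def squared_euclidean(x)
  x.map { |a| x.map { |b| a.zip(b).sum { |ai, bi| (ai - bi)**2 } } }
end

# Plain Euclidean distance matrix, the kind of input 'precomputed' expects.
def euclidean(x)
  squared_euclidean(x).map { |row| row.map { |d| Math.sqrt(d) } }
end

x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
precomputed = euclidean(x)

# fit squares a precomputed matrix, which recovers the squared distances
# that the 'euclidean' metric would have produced from x directly.
squared_from_precomputed = precomputed.map { |row| row.map { |d| d**2 } }
```

Passing an already squared matrix with ‘precomputed’ would square the distances twice. This would most likely change the perplexity calibration and the resulting embedding.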



# File 'rumale-manifold/lib/rumale/manifold/tsne.rb', line 82

def fit(x, _not_used = nil)
  x = ::Rumale::Validation.check_convert_sample_array(x)
  if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
    raise ArgumentError, 'Expect the input distance matrix to be square.'
  end

  # initialize some variables.
  @n_iter = 0
  distance_mat = @params[:metric] == 'precomputed' ? x**2 : ::Rumale::PairwiseMetric.squared_error(x)
  hi_prob_mat = gaussian_distributed_probability_matrix(distance_mat)
  y = init_embedding(x)
  lo_prob_mat = t_distributed_probability_matrix(y)
  # perform fixed-point optimization.
  one_vec = Numo::DFloat.ones(x.shape[0]).expand_dims(1)
  @params[:max_iter].times do |t|
    break if terminate?(hi_prob_mat, lo_prob_mat)

    a = hi_prob_mat * lo_prob_mat
    b = lo_prob_mat**2
    y = (b.dot(one_vec) * y + (a - b).dot(y)) / a.dot(one_vec)
    lo_prob_mat = t_distributed_probability_matrix(y)
    @n_iter = t + 1
    if @params[:verbose] && (@n_iter % 100).zero?
      puts "[t-SNE] KL divergence after #{@n_iter} iterations: #{cost(hi_prob_mat, lo_prob_mat)}"
    end
  end
  # store results.
  @embedding = y
  @kl_divergence = cost(hi_prob_mat, lo_prob_mat)
  self
end
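The update inside the loop (`a = hi_prob_mat * lo_prob_mat`, `b = lo_prob_mat**2`, then the new `y`) can be traced with plain Ruby Arrays. In the sketch below, `fixed_point_step` transcribes those three lines. `t_affinities` uses the standard normalized Student-t kernel, which is an assumed stand-in for Rumale's private `t_distributed_probability_matrix`. Both names are hypothetical.

```ruby
# One fixed-point update transcribed from `fit`:
#   a = P * Q (elementwise), b = Q**2 (elementwise)
#   y_i <- (sum_j b_ij * y_i + sum_j (a_ij - b_ij) * y_j) / sum_j a_ij
# The denominator is zero if row i of P * Q is all zeros; no guard is added.
def fixed_point_step(p, q, y)
  n = y.size
  dim = y.first.size
  Array.new(n) do |i|
    a_row = (0...n).map { |j| p[i][j] * q[i][j] }
    b_row = (0...n).map { |j| q[i][j]**2 }
    denom = a_row.sum
    Array.new(dim) do |k|
      num = b_row.sum * y[i][k] + (0...n).sum { |j| (a_row[j] - b_row[j]) * y[j][k] }
      num / denom
    end
  end
end

# Student-t affinities normalized over all off-diagonal pairs (assumed definition).
def t_affinities(y)
  n = y.size
  w = Array.new(n) do |i|
    Array.new(n) do |j|
      next 0.0 if i == j

      1.0 / (1.0 + y[i].zip(y[j]).sum { |a, b| (a - b)**2 })
    end
  end
  total = w.sum(&:sum)
  w.map { |row| row.map { |v| v / total } }
end

y = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
q = t_affinities(y)
# When P equals Q, a - b vanishes and every point stays where it is.
y_next = fixed_point_step(q, q, y)
```

The P == Q case is a quick sanity check: the embedding already matches the input affinities, so the update is the identity. In a real run, P and Q differ, and each step moves the points toward a stationary configuration without any learning rate.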

#fit_transform(x) ⇒ Numo::DFloat

Fit the model with training data, and then transform them with the learned model.

Returns the transformed data (shape: [n_samples, n_components]).

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model. If the metric is ‘precomputed’, x must be a square distance matrix (shape: [n_samples, n_samples]).

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components]) The transformed data.



# File 'rumale-manifold/lib/rumale/manifold/tsne.rb', line 120

def fit_transform(x, _not_used = nil)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  fit(x)
  @embedding.dup
end