Class: Rumale::Preprocessing::BinDiscretizer

Inherits:
Base::Estimator
Includes:
Base::Transformer
Defined in:
rumale-preprocessing/lib/rumale/preprocessing/bin_discretizer.rb

Overview

Discretizes features with a given number of bins. In some cases, discretizing features may accelerate decision tree training.

Examples:

require 'rumale/preprocessing/bin_discretizer'

discretizer = Rumale::Preprocessing::BinDiscretizer.new(n_bins: 4)
samples = Numo::DFloat.new(5, 2).rand - 0.5
transformed = discretizer.fit_transform(samples)
# > pp samples
# Numo::DFloat#shape=[5,2]
# [[-0.438246, -0.126933],
#  [ 0.294815, -0.298958],
#  [-0.383959, -0.155968],
#  [ 0.039948,  0.237815],
#  [-0.334911, -0.449117]]
# > pp transformed
# Numo::DFloat#shape=[5,2]
# [[0, 1],
#  [3, 0],
#  [0, 1],
#  [2, 3],
#  [0, 0]]

Instance Attribute Summary

Attributes inherited from Base::Estimator

#params

Instance Method Summary

Constructor Details

#initialize(n_bins: 32) ⇒ BinDiscretizer

Create a new discretizer that splits each feature into the given number of bins.

Parameters:

  • n_bins (Integer) (defaults to: 32)

    The number of bins to be used for discretizing feature values.



# File 'rumale-preprocessing/lib/rumale/preprocessing/bin_discretizer.rb', line 42

def initialize(n_bins: 32)
  super()
  @params = { n_bins: n_bins }
end

Instance Attribute Details

#feature_steps ⇒ Array<Numo::DFloat> (readonly)

Return the feature steps used for discretizing, i.e. the left edge of each bin for every feature, as computed by #fit.

Returns:

  • (Array<Numo::DFloat>)

    (shape: [n_features, n_bins])



# File 'rumale-preprocessing/lib/rumale/preprocessing/bin_discretizer.rb', line 37

def feature_steps
  @feature_steps
end
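
To make the documented shape concrete, here is a minimal plain-Ruby sketch of the nested layout: one inner array per feature, each holding n_bins values. It uses ordinary Ruby arrays instead of Numo::DFloat, and the sample data is made up for illustration, so it only approximates what the attribute holds after #fit.

```ruby
# Plain-Ruby illustration of the nested layout; no Numo or Rumale required.
rows = [[0.0, 10.0], [4.0, 30.0], [8.0, 50.0]] # 3 samples, 2 features (made-up data)
n_bins = 4

# One inner array per feature (column), each holding n_bins left edges.
feature_steps = rows.transpose.map do |column|
  min, max = column.minmax
  Array.new(n_bins) { |i| min + i * (max - min) / n_bins }
end

p feature_steps
# prints [[0.0, 2.0, 4.0, 6.0], [10.0, 20.0, 30.0, 40.0]]
```

The outer length equals n_features (2 here) and each inner length equals n_bins (4 here).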

Instance Method Details

#fit(x) ⇒ BinDiscretizer

Fit feature ranges to be discretized.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples used to calculate the feature ranges.

Returns:

  • (BinDiscretizer)

    The fitted discretizer itself.



# File 'rumale-preprocessing/lib/rumale/preprocessing/bin_discretizer.rb', line 53

def fit(x, _y = nil)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  n_features = x.shape[1]
  max_vals = x.max(0)
  min_vals = x.min(0)
  @feature_steps = Array.new(n_features) do |n|
    Numo::DFloat.linspace(min_vals[n], max_vals[n], @params[:n_bins] + 1)[0...@params[:n_bins]]
  end
  self
end
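
The source above builds n_bins + 1 evenly spaced points between a feature's minimum and maximum, then drops the last one. The sketch below reproduces that step for a single feature in plain Ruby. The `linspace` helper is a hypothetical stand-in for Numo::DFloat.linspace, written on the assumption that both endpoints are included, and the input range is made up.

```ruby
# Hypothetical stand-in for Numo::DFloat.linspace: n evenly spaced points
# from first to last, both endpoints included (an assumption of this sketch).
def linspace(first, last, n)
  step = (last - first) / (n - 1).to_f
  Array.new(n) { |i| first + i * step }
end

n_bins = 4
points = linspace(-1.0, 1.0, n_bins + 1) # 5 points spanning [-1.0, 1.0]
steps = points[0...n_bins]               # drop the maximum, keep 4 left edges

p steps
# prints [-1.0, -0.5, 0.0, 0.5]
```

Because the maximum is not kept as an edge, a sample equal to the fitted maximum lands in the last bin (index n_bins - 1) rather than in a bin of its own.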

#fit_transform(x) ⇒ Numo::DFloat

Fit feature ranges to be discretized, then return discretized samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be discretized.

Returns:

  • (Numo::DFloat)

    The discretized samples.



# File 'rumale-preprocessing/lib/rumale/preprocessing/bin_discretizer.rb', line 71

def fit_transform(x, _y = nil)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  fit(x).transform(x)
end

#transform(x) ⇒ Numo::DFloat

Discretize the given samples using the bin edges computed by #fit.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be discretized.

Returns:

  • (Numo::DFloat)

    The discretized samples.



# File 'rumale-preprocessing/lib/rumale/preprocessing/bin_discretizer.rb', line 81

def transform(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  n_samples, n_features = x.shape
  transformed = Numo::DFloat.zeros(n_samples, n_features)
  n_features.times do |n|
    steps = @feature_steps[n]
    @params[:n_bins].times do |bin|
      mask = x[true, n].ge(steps[bin]).where
      transformed[mask, n] = bin
    end
  end
  transformed
end
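
The loop above overwrites each sample's bin index once per bin whose left edge it reaches, so the last matching edge wins. A minimal plain-Ruby sketch of the same rule for a single value follows. `bin_index` is a hypothetical helper, not part of Rumale, and it returns Integer indices, whereas #transform stores them in a Numo::DFloat.

```ruby
# Hypothetical helper mirroring the loop in #transform for one value.
# The result starts at 0 and is overwritten by every bin whose left edge
# is less than or equal to the value, so the last such bin wins.
def bin_index(value, edges)
  index = 0
  edges.each_with_index { |edge, bin| index = bin if value >= edge }
  index
end

edges = [0.0, 2.0, 4.0, 6.0] # left edges for one feature, n_bins = 4
bins = [-1.0, 0.0, 3.9, 4.0, 8.0, 100.0].map { |v| bin_index(v, edges) }

p bins
# prints [0, 0, 1, 2, 3, 3]
```

Under this rule, values below the fitted minimum fall into bin 0 and values above the fitted maximum fall into the last bin. This matters when #transform is applied to samples that were not seen by #fit.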