Class: Rumale::Preprocessing::OneHotEncoder

Inherits:
Base::Estimator
Includes:
Base::Transformer
Defined in:
rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb

Overview

Encode categorical integer features to one-hot-vectors.

Examples:

require 'rumale/preprocessing/one_hot_encoder'

encoder = Rumale::Preprocessing::OneHotEncoder.new
labels = Numo::Int32[0, 0, 2, 3, 2, 1]
one_hot_vectors = encoder.fit_transform(labels)
# > pp one_hot_vectors
# Numo::DFloat#shape=[6,4]
# [[1, 0, 0, 0],
#  [1, 0, 0, 0],
#  [0, 0, 1, 0],
#  [0, 0, 0, 1],
#  [0, 0, 1, 0],
#  [0, 1, 0, 0]]

Instance Attribute Summary

Attributes inherited from Base::Estimator

#params

Constructor Details

#initialize ⇒ OneHotEncoder

Create a new encoder for encoding categorical integer features to one-hot-vectors.



# File 'rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb', line 40

def initialize # rubocop:disable Lint/UselessMethodDefinition
  super
end

Instance Attribute Details

#active_features ⇒ Numo::Int32 (readonly)

Return the indices for feature values that actually occur in the training set.

Returns:

  • (Numo::Int32)


# File 'rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb', line 33

def active_features
  @active_features
end

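#feature_indices ⇒ Numo::Int32 (readonly)

Return the start index of each feature's range in the encoded vector. The final entry is the total number of encoded columns before inactive columns are dropped.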

Returns:

  • (Numo::Int32)

    (shape: [n_features + 1])



# File 'rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb', line 37

def feature_indices
  @feature_indices
end
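
As an illustration of how these range indices relate to the per-feature value counts held in #n_values, here is a minimal pure-Ruby sketch. Plain Arrays stand in for Numo::Int32, and feature_offsets is a hypothetical helper name that is not part of Rumale; it only mirrors the hstack-then-cumsum step shown in #fit.

```ruby
# Pure-Ruby sketch; plain Arrays stand in for Numo::Int32.
# feature_offsets is a hypothetical helper name, not part of Rumale.
def feature_offsets(n_values)
  running = 0
  # Prepend 0 and take the running sum, mirroring hstack([[0], n_values]).cumsum.
  [0, *n_values].map { |v| running += v }
end

# Hypothetical input: feature 0 takes values 0..2, feature 1 takes values 0..1.
p feature_offsets([3, 2]) # => [0, 3, 5]
```

Under this reading, feature j occupies the half-open column range from entry j up to entry j + 1, which is why the result has n_features + 1 entries.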

#n_values ⇒ Numo::Int32 (readonly)

Return the number of value slots for each feature, that is, the maximum value observed in the training set plus one.

Returns:

  • (Numo::Int32)

    (shape: [n_features])



# File 'rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb', line 29

def n_values
  @n_values
end

Instance Method Details

#fit(x) ⇒ OneHotEncoder

Fit one-hot-encoder to samples.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples used to fit the one-hot-encoder.

Returns:

  • (OneHotEncoder)

Raises:

  • (ArgumentError) if x contains negative values.


# File 'rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb', line 49

def fit(x, _y = nil)
  raise ArgumentError, 'Expected the input samples only consists of non-negative integer values.' if x.lt(0).any?

  @n_values = x.max(0) + 1
  @feature_indices = Numo::Int32.hstack([[0], @n_values]).cumsum
  @active_features = encode(x, @feature_indices).sum(axis: 0).ne(0).where
  self
end
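
To make the three fitted attributes concrete, the following is a rough pure-Ruby sketch of the same computation on nested Arrays. The name fit_sketch is hypothetical, and the column layout (value v of feature j lands in column feature_indices[j] + v) is an assumption about the private encode helper, which is not shown on this page.

```ruby
# Pure-Ruby sketch of the quantities computed by fit; Arrays stand in for Numo arrays.
# fit_sketch is a hypothetical name, not part of Rumale.
def fit_sketch(samples)
  raise ArgumentError, 'negative value found' if samples.flatten.any?(&:negative?)

  # Per-feature maximum plus one, as in x.max(0) + 1.
  n_values = samples.transpose.map { |column| column.max + 1 }

  # Running sum with a leading zero gives each feature's starting column.
  running = 0
  feature_indices = [0, *n_values].map { |v| running += v }

  # Assumed layout: value v of feature j lands in column feature_indices[j] + v.
  columns = samples.flat_map { |row| row.each_with_index.map { |v, j| feature_indices[j] + v } }
  active_features = columns.uniq.sort

  [n_values, feature_indices, active_features]
end

# Hypothetical data in which feature 0 never takes the value 1.
n_values, feature_indices, active_features = fit_sketch([[0, 1], [2, 0], [2, 1]])
p n_values        # => [3, 2]
p feature_indices # => [0, 3, 5]
p active_features # => [0, 2, 3, 4]
```

Column 1 is absent from the active list because no training sample has the value 1 in feature 0.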

#fit_transform(x) ⇒ Numo::DFloat

Fit one-hot-encoder to samples, then encode samples into one-hot-vectors.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to encode into one-hot-vectors.

Returns:

  • (Numo::DFloat)

    The one-hot-vectors.

Raises:

  • (ArgumentError) if x contains negative values.


# File 'rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb', line 64

def fit_transform(x, _y = nil)
  raise ArgumentError, 'Expected the input samples only consists of non-negative integer values.' if x.lt(0).any?

  fit(x).transform(x)
end

#transform(x) ⇒ Numo::DFloat

Encode samples into one-hot-vectors.

Parameters:

  • x (Numo::Int32)

    (shape: [n_samples, n_features]) The samples to encode into one-hot-vectors.

Returns:

  • (Numo::DFloat)

    The one-hot-vectors.

Raises:

  • (ArgumentError) if x contains negative values.


# File 'rumale-preprocessing/lib/rumale/preprocessing/one_hot_encoder.rb', line 74

def transform(x)
  raise ArgumentError, 'Expected the input samples only consists of non-negative integer values.' if x.lt(0).any?

  codes = encode(x, @feature_indices)
  codes[true, @active_features].dup
end