Class: Rumale::Ensemble::StackingClassifier

Inherits:
Base::Estimator
Includes:
Base::Classifier
Defined in:
rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb

Overview

StackingClassifier is a class that implements a classifier using the stacking method.

Reference

  • Zhou, Z-H., “Ensemble Methods - Foundations and Algorithms,” CRC Press Taylor and Francis Group, Chapman and Hall/CRC, 2012.

Examples:

require 'rumale/ensemble/stacking_classifier'

estimators = {
  lgr: Rumale::LinearModel::LogisticRegression.new(reg_param: 1e-2),
  mlp: Rumale::NeuralNetwork::MLPClassifier.new(hidden_units: [256], random_seed: 1),
  rnd: Rumale::Ensemble::RandomForestClassifier.new(random_seed: 1)
}
meta_estimator = Rumale::LinearModel::LogisticRegression.new
classifier = Rumale::Ensemble::StackingClassifier.new(
  estimators: estimators, meta_estimator: meta_estimator, random_seed: 1
)
classifier.fit(training_samples, training_labels)
results = classifier.predict(testing_samples)

Instance Attribute Summary

Attributes inherited from Base::Estimator

#params

Instance Method Summary

Methods included from Base::Classifier

#score

Constructor Details

#initialize(estimators:, meta_estimator: nil, n_splits: 5, shuffle: true, stack_method: 'auto', passthrough: false, random_seed: nil) ⇒ StackingClassifier

Create a new classifier that uses the stacking method.

Parameters:

  • estimators (Hash<Symbol,Classifier>)

    The base classifiers for extracting meta features.

  • meta_estimator (Classifier/Nil) (defaults to: nil)

    The meta classifier that predicts class labels. If nil is given, LogisticRegression is used.

  • n_splits (Integer) (defaults to: 5)

    The number of folds for the stratified k-fold cross validation used to extract meta features in the training phase.

  • shuffle (Boolean) (defaults to: true)

    The flag indicating whether to shuffle the dataset on cross validation.

  • stack_method (String) (defaults to: 'auto')

    The name of the base classifier method used for meta feature extraction. If ‘auto’ is given, each classifier is checked for a callable method in the order ‘predict_proba’, ‘decision_function’, and ‘predict’, and the first one found is used.

  • passthrough (Boolean) (defaults to: false)

    The flag indicating whether to concatenate the original features and meta features when training the meta classifier.

  • random_seed (Integer/Nil) (defaults to: nil)

    The seed value used to initialize the random generator for cross validation.
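
The ‘auto’ lookup described for stack_method can be sketched in plain Ruby. This is only an illustration, assuming the lookup is a respond_to? check in the documented order; the stub classes and the resolve_stack_method helper are hypothetical and are not part of Rumale.

```ruby
# Hypothetical stand-ins for base classifiers; these are not Rumale classes.
class ProbaModel
  def predict_proba(_x)
    [[0.3, 0.7]]
  end

  def predict(_x)
    [1]
  end
end

class MarginModel
  def decision_function(_x)
    [0.4]
  end

  def predict(_x)
    [1]
  end
end

class LabelOnlyModel
  def predict(_x)
    [1]
  end
end

# Preference order documented for stack_method: 'auto'.
AUTO_ORDER = %i[predict_proba decision_function predict].freeze

# Hypothetical helper: picks the method to call on each estimator.
def resolve_stack_method(estimators, stack_method = 'auto')
  estimators.to_h do |name, est|
    chosen = if stack_method == 'auto'
               AUTO_ORDER.find { |m| est.respond_to?(m) }
             else
               stack_method.to_sym
             end
    [name, chosen]
  end
end

estimators = { a: ProbaModel.new, b: MarginModel.new, c: LabelOnlyModel.new }
resolved = resolve_stack_method(estimators)
forced = resolve_stack_method(estimators, 'predict')
```

Under these assumptions, each estimator gets the first method it responds to, which matches the Hash<Symbol,Symbol> shape returned by #stack_method.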



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 62

def initialize(estimators:, meta_estimator: nil, n_splits: 5, shuffle: true, stack_method: 'auto', passthrough: false,
               random_seed: nil)
  super()
  @estimators = estimators
  @meta_estimator = meta_estimator || ::Rumale::LinearModel::LogisticRegression.new
  @params = {
    n_splits: n_splits,
    shuffle: shuffle,
    stack_method: stack_method,
    passthrough: passthrough,
    random_seed: random_seed || srand
  }
end

Instance Attribute Details

#classes ⇒ Numo::Int32 (readonly)

Return the class labels.

Returns:

  • (Numo::Int32)

    (size: n_classes)



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 44

def classes
  @classes
end

#estimators ⇒ Hash<Symbol,Classifier> (readonly)

Return the base classifiers.

Returns:

  • (Hash<Symbol,Classifier>)


# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 36

def estimators
  @estimators
end

#meta_estimator ⇒ Classifier (readonly)

Return the meta classifier.

Returns:

  • (Classifier)


# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 40

def meta_estimator
  @meta_estimator
end

#stack_method ⇒ Hash<Symbol,Symbol> (readonly)

Return the method used by each base classifier.

Returns:

  • (Hash<Symbol,Symbol>)


# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 48

def stack_method
  @stack_method
end

Instance Method Details

#decision_function(x) ⇒ Numo::DFloat

Calculate confidence scores for samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to compute the scores.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_classes]) The confidence score per sample.



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 134

def decision_function(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  z = transform(x)
  @meta_estimator.decision_function(z)
end

#fit(x, y) ⇒ StackingClassifier

Fit the model with given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model.

  • y (Numo::Int32)

    (shape: [n_samples]) The labels to be used for fitting the model.

Returns:

  • (StackingClassifier)

    The learned classifier itself.



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 81

def fit(x, y)
  x = ::Rumale::Validation.check_convert_sample_array(x)
  y = ::Rumale::Validation.check_convert_label_array(y)
  ::Rumale::Validation.check_sample_size(x, y)

  n_samples, n_features = x.shape

  @encoder = ::Rumale::Preprocessing::LabelEncoder.new
  y_encoded = @encoder.fit_transform(y)
  @classes = Numo::NArray[*@encoder.classes]

  # training base classifiers with all training data.
  @estimators.each_key { |name| @estimators[name].fit(x, y_encoded) }

  # detecting feature extraction method and its size of output for each base classifier.
  @stack_method = detect_stack_method
  @output_size = detect_output_size(n_features)

  # extracting meta features with base classifiers.
  n_components = @output_size.values.sum
  z = Numo::DFloat.zeros(n_samples, n_components)

  kf = ::Rumale::ModelSelection::StratifiedKFold.new(
    n_splits: @params[:n_splits], shuffle: @params[:shuffle], random_seed: @params[:random_seed]
  )

  kf.split(x, y_encoded).each do |train_ids, valid_ids|
    x_train = x[train_ids, true]
    y_train = y_encoded[train_ids]
    x_valid = x[valid_ids, true]
    f_start = 0
    @estimators.each_key do |name|
      est_fold = Marshal.load(Marshal.dump(@estimators[name]))
      f_last = f_start + @output_size[name]
      f_position = @output_size[name] == 1 ? f_start : f_start...f_last
      z[valid_ids, f_position] = est_fold.fit(x_train, y_train).public_send(@stack_method[name], x_valid)
      f_start = f_last
    end
  end

  # concatenating original features.
  z = Numo::NArray.hstack([z, x]) if @params[:passthrough]

  # training meta classifier.
  @meta_estimator.fit(z, y_encoded)

  self
end
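
The cross-validation loop in fit fills each row of the meta feature matrix with predictions from a model that never saw that row during training. The following is a minimal plain-Ruby sketch of that out-of-fold idea. It makes simplifying assumptions: the folds are neither stratified nor shuffled, there is a single base classifier that emits one column, and ThresholdClassifier is a hypothetical stub rather than a Rumale estimator.

```ruby
# Hypothetical one-feature classifier used only for illustration.
class ThresholdClassifier
  def fit(x, _y)
    # "Training" just stores the mean of the training values.
    @threshold = x.sum / x.size.to_f
    self
  end

  def predict(x)
    x.map { |v| v > @threshold ? 1 : 0 }
  end
end

x = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
y = [0, 1, 0, 1, 0, 1]
n_splits = 3

# Simplified folds: sample i is validated in fold i % n_splits.
folds = (0...x.size).group_by { |i| i % n_splits }.values

template = ThresholdClassifier.new
z = Array.new(x.size) # one meta feature per sample, filled fold by fold

folds.each do |valid_ids|
  train_ids = (0...x.size).to_a - valid_ids
  # Deep copy so each fold trains a fresh model, as fit does with Marshal.
  est_fold = Marshal.load(Marshal.dump(template))
  est_fold.fit(x.values_at(*train_ids), y.values_at(*train_ids))
  preds = est_fold.predict(x.values_at(*valid_ids))
  # Write the out-of-fold predictions into the rows of the validation samples.
  valid_ids.each_with_index { |id, k| z[id] = preds[k] }
end
```

Every sample belongs to exactly one validation fold, so every entry of z is written once. In fit, the base classifiers are also trained once on the full training data (before the loop), and those full-data models are the ones transform uses at prediction time.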

#fit_transform(x, y) ⇒ Numo::DFloat

Fit the model with training data, and then transform them with the learned model.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model.

  • y (Numo::Int32)

    (shape: [n_samples]) The labels to be used for fitting the model.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components]) The meta features for training data.



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 189

def fit_transform(x, y)
  x = ::Rumale::Validation.check_convert_sample_array(x)
  y = ::Rumale::Validation.check_convert_label_array(y)
  ::Rumale::Validation.check_sample_size(x, y)

  fit(x, y).transform(x)
end

#predict(x) ⇒ Numo::Int32

Predict class labels for samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to predict the labels.

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) The predicted class label per sample.



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 145

def predict(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  z = transform(x)
  Numo::Int32.cast(@encoder.inverse_transform(@meta_estimator.predict(z)))
end
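
predict maps the meta classifier's output back to the original labels, because fit trains on encoded labels. The round trip can be sketched in plain Ruby, assuming the encoder assigns indices in sorted order of the unique labels; that ordering is an assumption about LabelEncoder, and the arrays below are made-up data.

```ruby
labels = [10, 30, 10, 20]

# Assumed encoding: the index of each label among the sorted unique labels.
classes = labels.uniq.sort
encoded = labels.map { |label| classes.index(label) }

# Inverse transform: look the original label up by its index.
decoded = encoded.map { |index| classes[index] }
```

The decoded array equals the original labels, which is why the values returned by predict are in the label space passed to fit rather than in the 0...n_classes range.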

#predict_proba(x) ⇒ Numo::DFloat

Predict probability for samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to predict the probabilities.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_classes]) The predicted probability of each class per sample.



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 156

def predict_proba(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  z = transform(x)
  @meta_estimator.predict_proba(z)
end

#transform(x) ⇒ Numo::DFloat

Transform the given data with the learned model.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be transformed with the learned model.

Returns:

  • (Numo::DFloat)

    (shape: [n_samples, n_components]) The meta features for samples.



# File 'rumale-ensemble/lib/rumale/ensemble/stacking_classifier.rb', line 167

def transform(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  n_samples = x.shape[0]
  n_components = @output_size.values.sum
  z = Numo::DFloat.zeros(n_samples, n_components)
  f_start = 0
  @estimators.each_key do |name|
    f_last = f_start + @output_size[name]
    f_position = @output_size[name] == 1 ? f_start : f_start...f_last
    z[true, f_position] = @estimators[name].public_send(@stack_method[name], x)
    f_start = f_last
  end
  z = Numo::NArray.hstack([z, x]) if @params[:passthrough]
  z
end
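
The column bookkeeping in transform (f_start, f_last, f_position) can be traced with plain Ruby values. The output widths and feature values below are made up for illustration; in the class itself they come from a private helper that is not shown on this page.

```ruby
# Assumed output widths per base classifier (hypothetical values).
output_size = { lgr: 3, mlp: 3, svc: 1 }

f_start = 0
layout = output_size.to_h do |name, width|
  f_last = f_start + width
  # A width-1 output is addressed by a single column index, wider ones by a range.
  position = width == 1 ? f_start : (f_start...f_last)
  f_start = f_last
  [name, position]
end
n_components = output_size.values.sum

# With passthrough enabled, the original features are appended to the meta features.
z_row = [0.2, 0.5, 0.3, 0.1, 0.1, 0.8, 1.0] # meta features for one sample
x_row = [5.1, 3.5]                          # original features for the same sample
passthrough = true
row = passthrough ? z_row + x_row : z_row
```

In this sketch the three classifiers occupy columns 0...3, 3...6, and 6, so n_components is 7, and with passthrough the row passed to the meta classifier has n_components + n_features = 9 entries.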