Class: Rumale::Ensemble::GradientBoostingClassifier
- Inherits: Base::Estimator
  - Object
  - Base::Estimator
  - Rumale::Ensemble::GradientBoostingClassifier
- Includes: Base::Classifier
- Defined in: rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb
Overview
GradientBoostingClassifier is a class that implements gradient tree boosting for classification. The class uses the negative binomial log-likelihood as its loss function. For multiclass classification problems, it uses the one-vs-the-rest strategy.
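The one-vs-the-rest strategy can be illustrated with a small sketch in plain Ruby. The private helpers that build the per-class models are not shown on this page, so the `one_vs_rest_targets` function and its ±1 target coding are assumptions borrowed from the binary branch of `#fit`, not the library's own code.

```ruby
# Hypothetical helper (not part of Rumale): build one +1/-1 target vector per class.
# The +1/-1 coding is an assumption modelled on the binary branch of #fit.
def one_vs_rest_targets(labels)
  classes = labels.uniq.sort
  targets = classes.map do |klass|
    # Samples of the current class are positives, all others are negatives.
    labels.map { |label| label == klass ? 1.0 : -1.0 }
  end
  [classes, targets]
end

classes, targets = one_vs_rest_targets([2, 0, 1, 2])
# classes    => [0, 1, 2]
# targets[2] => [1.0, -1.0, -1.0, 1.0]
```

Each target vector would then be fitted by its own sequence of boosted trees, which is consistent with `@estimators[n]` being indexed by class in `#apply`.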
Reference
- Friedman, J. H., “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics, 29 (5), pp. 1189–1232, 2001.
- Friedman, J. H., “Stochastic Gradient Boosting,” Computational Statistics and Data Analysis, 38 (4), pp. 367–378, 2002.
- Chen, T., and Guestrin, C., “XGBoost: A Scalable Tree Boosting System,” Proc. KDD’16, pp. 785–794, 2016.
Instance Attribute Summary
- #classes ⇒ Numo::Int32 (readonly): Return the class labels.
- #estimators ⇒ Array<GradientTreeRegressor> (readonly): Return the set of estimators.
- #feature_importances ⇒ Numo::DFloat (readonly): Return the importance for each feature.
- #rng ⇒ Random (readonly): Return the random generator for random selection of feature index.
Attributes inherited from Base::Estimator
Instance Method Summary
- #apply(x) ⇒ Numo::Int32: Return the index of the leaf that each sample reached.
- #decision_function(x) ⇒ Numo::DFloat: Calculate confidence scores for samples.
- #fit(x, y) ⇒ GradientBoostingClassifier: Fit the model with given training data.
- #initialize(n_estimators: 100, learning_rate: 0.1, reg_lambda: 0.0, subsample: 1.0, max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, n_jobs: nil, random_seed: nil) ⇒ GradientBoostingClassifier (constructor): Create a new classifier with gradient tree boosting.
- #predict(x) ⇒ Numo::Int32: Predict class labels for samples.
- #predict_proba(x) ⇒ Numo::DFloat: Predict probability for samples.
Methods included from Base::Classifier
Constructor Details
#initialize(n_estimators: 100, learning_rate: 0.1, reg_lambda: 0.0, subsample: 1.0, max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, n_jobs: nil, random_seed: nil) ⇒ GradientBoostingClassifier
Create a new classifier with gradient tree boosting.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 68
def initialize(n_estimators: 100, learning_rate: 0.1, reg_lambda: 0.0, subsample: 1.0,
               max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1,
               max_features: nil, n_jobs: nil, random_seed: nil)
  super()
  @params = {
    n_estimators: n_estimators,
    learning_rate: learning_rate,
    reg_lambda: reg_lambda,
    subsample: subsample,
    max_depth: max_depth,
    max_leaf_nodes: max_leaf_nodes,
    min_samples_leaf: min_samples_leaf,
    max_features: max_features,
    n_jobs: n_jobs,
    random_seed: random_seed || srand
  }
  @rng = Random.new(@params[:random_seed])
end
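The seed handling in the constructor can be tried in isolation. The following is a minimal sketch using only the Ruby standard library; `resolve_seed` is a hypothetical helper that mirrors the `random_seed || srand` expression and is not part of Rumale.

```ruby
# Hypothetical helper mirroring `random_seed || srand` from the constructor.
# Kernel#srand with no argument reseeds the default generator and returns
# the previous seed, so a usable Integer seed is always available.
def resolve_seed(random_seed = nil)
  random_seed || srand
end

seed = resolve_seed(42)

# Two generators built from the same seed produce the same stream, which is
# why passing random_seed explicitly makes the random choices repeatable.
rng_a = Random.new(seed)
rng_b = Random.new(seed)
same_stream = Array.new(3) { rng_a.rand(100) } == Array.new(3) { rng_b.rand(100) }

# Without an explicit seed, an Integer is still stored.
auto_seed = resolve_seed
```

Because the resolved seed is kept in `@params[:random_seed]`, a run that did not set a seed can in principle be repeated by reading that value back and passing it to a new instance.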
Instance Attribute Details
#classes ⇒ Numo::Int32 (readonly)
Return the class labels.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 38
def classes
  @classes
end
#estimators ⇒ Array<GradientTreeRegressor> (readonly)
Return the set of estimators.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 34
def estimators
  @estimators
end
#feature_importances ⇒ Numo::DFloat (readonly)
Return the importance for each feature. The feature importances are calculated from the number of times each feature is used for splitting.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 43
def feature_importances
  @feature_importances
end
#rng ⇒ Random (readonly)
Return the random generator for random selection of feature index.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 47
def rng
  @rng
end
Instance Method Details
#apply(x) ⇒ Numo::Int32
Return the index of the leaf that each sample reached.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 172
def apply(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  n_classes = @classes.size
  leaf_ids = if n_classes > 2
               Array.new(n_classes) { |n| @estimators[n].map { |tree| tree.apply(x) } }
             else
               @estimators.map { |tree| tree.apply(x) }
             end
  Numo::Int32[*leaf_ids].transpose.dup
end
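The final `transpose` in `#apply` turns a per-tree layout into a per-sample layout. The sketch below shows the binary case with plain Ruby arrays instead of `Numo::Int32`; the leaf indices are made-up values for illustration only.

```ruby
# Hypothetical leaf indices: one inner array per tree, one entry per sample.
leaf_ids_per_tree = [
  [1, 2, 2], # tree 0: leaves reached by samples 0, 1, 2
  [3, 3, 4]  # tree 1: leaves reached by samples 0, 1, 2
]

# After transposing, each row describes one sample across all trees.
leaf_ids_per_sample = leaf_ids_per_tree.transpose
# => [[1, 3], [2, 3], [2, 4]]
```

In the multiclass branch the collected array has one more level of nesting (one list of trees per class), so the transposed result carries an additional class axis.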
#decision_function(x) ⇒ Numo::DFloat
Calculate confidence scores for samples.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 127
def decision_function(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  n_classes = @classes.size
  if n_classes > 2
    multiclass_scores(x)
  else
    @estimators.sum { |tree| tree.predict(x) } + @base_predictions
  end
end
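In the binary branch, the score of a sample is the sum of every tree's prediction plus the base prediction. A minimal sketch of that additive structure follows, assuming hypothetical lambdas in place of fitted `GradientTreeRegressor` objects; `binary_scores` and the numeric values are illustrative only.

```ruby
# Hypothetical stand-ins for fitted trees: each lambda maps one sample to a score.
trees = [
  ->(x) { x[0] > 0.5 ? 0.2 : -0.2 },
  ->(x) { x[1] > 1.0 ? 0.1 : -0.1 }
]
base_prediction = 0.3

# Sum the output of every tree for each sample, then add the base prediction.
def binary_scores(trees, base_prediction, samples)
  samples.map { |x| trees.sum { |tree| tree.call(x) } + base_prediction }
end

scores = binary_scores(trees, base_prediction, [[0.9, 2.0], [0.1, 0.0]])
# scores[0] is about 0.6 and scores[1] is about 0.0
```

A positive score leans toward the positive class and a negative score toward the negative class, since `#predict_proba` passes the score through a sigmoid.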
#fit(x, y) ⇒ GradientBoostingClassifier
Fit the model with given training data.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 92
def fit(x, y)
  x = ::Rumale::Validation.check_convert_sample_array(x)
  y = ::Rumale::Validation.check_convert_label_array(y)
  ::Rumale::Validation.check_sample_size(x, y)

  # initialize some variables.
  n_features = x.shape[1]
  @params[:max_features] = n_features if @params[:max_features].nil?
  @params[:max_features] = [[1, @params[:max_features]].max, n_features].min # rubocop:disable Style/ComparableClamp
  @classes = Numo::Int32[*y.to_a.uniq.sort]
  n_classes = @classes.size
  # train estimator.
  if n_classes > 2
    @base_predictions = multiclass_base_predictions(y)
    @estimators = multiclass_estimators(x, y)
  else
    negative_label = y.to_a.uniq.min
    bin_y = Numo::DFloat.cast(y.ne(negative_label)) * 2 - 1
    y_mean = bin_y.mean
    @base_predictions = 0.5 * Numo::NMath.log((1.0 + y_mean) / (1.0 - y_mean))
    @estimators = partial_fit(x, bin_y, @base_predictions)
  end
  # calculate feature importances.
  @feature_importances = if n_classes > 2
                           multiclass_feature_importances
                         else
                           @estimators.sum(&:feature_importances)
                         end
  self
end
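The binary branch of `#fit` recodes the labels to ±1 and derives the starting score from their mean. The sketch below reproduces that arithmetic with plain Ruby arrays instead of Numo; `binary_base_prediction` is a hypothetical helper name, and the label values are made up.

```ruby
# Hypothetical helper reproducing the binary-branch arithmetic of #fit.
def binary_base_prediction(labels)
  negative_label = labels.min
  # The smallest label becomes -1, every other label becomes +1.
  bin_y = labels.map { |label| label == negative_label ? -1.0 : 1.0 }
  y_mean = bin_y.sum / bin_y.size
  # With p the fraction of positive samples, y_mean = 2p - 1, so this equals
  # half the log-odds: 0.5 * log(p / (1 - p)).
  base = 0.5 * Math.log((1.0 + y_mean) / (1.0 - y_mean))
  [bin_y, base]
end

bin_y, base = binary_base_prediction([0, 0, 1, 1, 1, 1])
# bin_y => [-1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
# base is about 0.5 * log(2), since four of six samples are positive
```

Starting from half the log-odds means the model begins at the class prior before any tree is added. A balanced data set gives a base prediction of 0.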
#predict(x) ⇒ Numo::Int32
Predict class labels for samples.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 142
def predict(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  n_samples = x.shape[0]
  probs = predict_proba(x)
  Numo::Int32.asarray(Array.new(n_samples) { |n| @classes[probs[n, true].max_index] })
end
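`#predict` picks, for each sample, the class whose probability is largest and maps that column index back to the stored label. A minimal sketch with plain Ruby arrays follows; `predict_labels` is a hypothetical name and the probabilities are invented for illustration.

```ruby
# Hypothetical helper: map the index of each row's largest probability
# back to the corresponding class label.
def predict_labels(classes, probs)
  probs.map do |row|
    best_index = row.each_with_index.max_by { |prob, _index| prob }.last
    classes[best_index]
  end
end

labels = predict_labels([3, 5, 9], [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]])
# => [5, 3]
```

The lookup through `classes` matters because the stored labels are the sorted unique values seen during `#fit`, which need not be 0, 1, 2, and so on.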
#predict_proba(x) ⇒ Numo::DFloat
Predict probability for samples.
# File 'rumale-ensemble/lib/rumale/ensemble/gradient_boosting_classifier.rb', line 154
def predict_proba(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  proba = 1.0 / (Numo::NMath.exp(-decision_function(x)) + 1.0)

  return (proba.transpose / proba.sum(axis: 1)).transpose.dup if @classes.size > 2

  n_samples, = x.shape
  probs = Numo::DFloat.zeros(n_samples, 2)
  probs[true, 1] = proba
  probs[true, 0] = 1.0 - proba
  probs
end
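The conversion from scores to probabilities can be sketched without Numo. The helper names below (`sigmoid`, `binary_proba`, `multiclass_proba`) are hypothetical and operate on plain Ruby arrays. The multiclass branch normalizes per-class sigmoid outputs so that each row sums to 1, which is not the same as a softmax.

```ruby
# Logistic function, written the same way as in #predict_proba.
def sigmoid(score)
  1.0 / (Math.exp(-score) + 1.0)
end

# Binary case: column 0 is the negative class, column 1 the positive class.
def binary_proba(scores)
  scores.map do |score|
    positive = sigmoid(score)
    [1.0 - positive, positive]
  end
end

# Multiclass case: one sigmoid per class, then divide each row by its sum.
def multiclass_proba(score_rows)
  score_rows.map do |row|
    raw = row.map { |score| sigmoid(score) }
    total = raw.sum
    raw.map { |value| value / total }
  end
end

binary = binary_proba([0.0])
multi = multiclass_proba([[0.0, 0.0, 0.0], [2.0, -1.0, 0.5]])
```

A score of 0 maps to a probability of 0.5 in the binary case, and equal scores across classes give equal probabilities in the multiclass case.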