Class: Rumale::Tree::GradientTreeRegressor

Inherits:
Base::Estimator
Includes:
Base::Regressor, ExtGradientTreeRegressor
Defined in:
rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb

Overview

GradientTreeRegressor is a class that implements a decision tree for regression with the exact greedy algorithm. This class is used internally by estimators with gradient tree boosting.

Reference

  • Friedman, J H., “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics, 29 (5), pp. 1189–1232, 2001.

  • Friedman, J H., “Stochastic Gradient Boosting,” Computational Statistics and Data Analysis, 38 (4), pp. 367–378, 2002.

  • Chen, T., and Guestrin, C., “XGBoost: A Scalable Tree Boosting System,” Proc. KDD’16, pp. 785–794, 2016.

Instance Attribute Summary

Attributes inherited from Base::Estimator

#params

Instance Method Summary

Methods included from Base::Regressor

#score

Constructor Details

#initialize(reg_lambda: 0.0, shrinkage_rate: 1.0, max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, random_seed: nil) ⇒ GradientTreeRegressor

Initialize a gradient tree regressor.

Parameters:

  • reg_lambda (Float) (defaults to: 0.0)

    The L2 regularization term on the leaf weights.

  • shrinkage_rate (Float) (defaults to: 1.0)

    The shrinkage rate applied to the leaf weights.

  • max_depth (Integer) (defaults to: nil)

    The maximum depth of the tree. If nil is given, the decision tree grows without a depth limit.

  • max_leaf_nodes (Integer) (defaults to: nil)

    The maximum number of leaves on the decision tree. If nil is given, the number of leaves is not limited.

  • min_samples_leaf (Integer) (defaults to: 1)

    The minimum number of samples at a leaf node.

  • max_features (Integer) (defaults to: nil)

    The number of features to consider when searching for the optimal split point. If nil is given, the split process considers all features.

  • random_seed (Integer) (defaults to: nil)

    The seed value used to initialize the random generator. It is used to randomly determine the order of features when deciding the splitting point.



# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 52

def initialize(reg_lambda: 0.0, shrinkage_rate: 1.0,
               max_depth: nil, max_leaf_nodes: nil, min_samples_leaf: 1, max_features: nil, random_seed: nil)
  super()
  @params = {
    reg_lambda: reg_lambda,
    shrinkage_rate: shrinkage_rate,
    max_depth: max_depth,
    max_leaf_nodes: max_leaf_nodes,
    min_samples_leaf: min_samples_leaf,
    max_features: max_features,
    random_seed: random_seed || srand
  }
  @rng = Random.new(@params[:random_seed])
end
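
A minimal construction sketch, assuming the rumale-tree gem is installed; the parameter values below are illustrative only, not recommended settings:

require 'rumale/tree/gradient_tree_regressor'

tree = Rumale::Tree::GradientTreeRegressor.new(
  reg_lambda: 0.01,     # L2 regularization term on the leaf weights
  shrinkage_rate: 0.1,  # scales the computed leaf weights
  max_depth: 4,         # limit the tree depth
  min_samples_leaf: 5,  # require at least 5 samples per leaf
  random_seed: 42       # make the feature ordering reproducible
)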

Instance Attribute Details

#feature_importances ⇒ Numo::DFloat (readonly)

Return the importance for each feature. The feature importances are calculated based on the number of times each feature is used for splitting.

Returns:

  • (Numo::DFloat)

    (shape: [n_features])



# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 25

def feature_importances
  @feature_importances
end

#leaf_weights ⇒ Numo::DFloat (readonly)

Return the values assigned to each leaf.

Returns:

  • (Numo::DFloat)

    (shape: [n_leaves])



# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 37

def leaf_weights
  @leaf_weights
end

#rng ⇒ Random (readonly)

Return the random generator used for the random selection of feature indices.

Returns:

  • (Random)


# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 33

def rng
  @rng
end

#tree ⇒ Node (readonly)

Return the learned tree.

Returns:

  • (Node)

# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 29

def tree
  @tree
end

Instance Method Details

#apply(x) ⇒ Numo::Int32

Return the index of the leaf that each sample reached.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to be assigned leaf indices.

Returns:

  • (Numo::Int32)

    (shape: [n_samples]) Leaf index for each sample.



# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 106

def apply(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  Numo::Int32[*(Array.new(x.shape[0]) { |n| partial_apply(@tree, x[n, true]) })]
end
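
For example, assuming tree is an already fitted GradientTreeRegressor and x is a Numo::DFloat sample matrix, the returned leaf indices can be mapped onto the leaf weights directly:

leaf_ids = tree.apply(x)               # Numo::Int32, shape: [n_samples]
values   = tree.leaf_weights[leaf_ids] # same values as tree.predict(x)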

#fit(x, y, g, h) ⇒ GradientTreeRegressor

Fit the model with given training data.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The training data to be used for fitting the model.

  • y (Numo::DFloat)

    (shape: [n_samples]) The target values to be used for fitting the model.

  • g (Numo::DFloat)

    (shape: [n_samples]) The gradient of the loss function.

  • h (Numo::DFloat)

    (shape: [n_samples]) The Hessian of the loss function.

Returns:

  • (GradientTreeRegressor)

    The learned regressor itself.

# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 74

def fit(x, y, g, h)
  x = ::Rumale::Validation.check_convert_sample_array(x)
  y = ::Rumale::Validation.check_convert_target_value_array(y)
  ::Rumale::Validation.check_sample_size(x, y)

  # Initialize some variables.
  n_features = x.shape[1]
  @params[:max_features] ||= n_features
  @n_leaves = 0
  @leaf_weights = []
  @feature_importances = Numo::DFloat.zeros(n_features)
  @sub_rng = @rng.dup
  # Build tree.
  build_tree(x, y, g, h)
  @leaf_weights = Numo::DFloat[*@leaf_weights]
  self
end
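
A minimal fitting sketch with the squared error loss L(y, p) = 0.5 * (p - y)**2, whose gradient is (p - y) and whose Hessian is 1 for every sample; the data below is synthetic and only illustrative:

require 'numo/narray'
require 'rumale/tree/gradient_tree_regressor'

x = Numo::DFloat.new(100, 4).rand   # 100 samples, 4 features
y = x.sum(axis: 1)                  # synthetic target values

pred = Numo::DFloat.zeros(100)      # initial predictions
g = pred - y                        # gradient of the squared error loss
h = Numo::DFloat.ones(100)          # Hessian of the squared error loss

tree = Rumale::Tree::GradientTreeRegressor.new(max_depth: 3, random_seed: 1)
tree.fit(x, y, g, h)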

#predict(x) ⇒ Numo::DFloat

Predict values for samples.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The samples to predict the values.

Returns:

  • (Numo::DFloat)

    (size: n_samples) Predicted values per sample.



# File 'rumale-tree/lib/rumale/tree/gradient_tree_regressor.rb', line 96

def predict(x)
  x = ::Rumale::Validation.check_convert_sample_array(x)

  @leaf_weights[apply(x)].dup
end
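
Continuing the fitting sketch above, predict returns the leaf weight of the leaf that each sample falls into:

values = tree.predict(x)   # Numo::DFloat, shape: [n_samples]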