Class: Rumale::ModelSelection::CrossValidation

Inherits:
Object
Defined in:
rumale-model_selection/lib/rumale/model_selection/cross_validation.rb

Overview

CrossValidation is a class that evaluates a given classifier using the cross-validation method.

Examples:

require 'rumale/linear_model'
require 'rumale/model_selection/stratified_k_fold'
require 'rumale/model_selection/cross_validation'

svc = Rumale::LinearModel::SVC.new
kf = Rumale::ModelSelection::StratifiedKFold.new(n_splits: 5)
cv = Rumale::ModelSelection::CrossValidation.new(estimator: svc, splitter: kf)
# samples: Numo::DFloat (shape: [n_samples, n_features]); labels: Numo::Int32 (shape: [n_samples])
report = cv.perform(samples, labels)
mean_test_score = report[:test_score].inject(:+) / kf.n_splits
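Because report[:test_score] is an ordinary Ruby array, summary statistics can be computed with core Ruby alone. The following is a minimal sketch; the helper name summarize_scores and the score values in the placeholder report are hypothetical, not output produced by Rumale.

```ruby
# Hypothetical helper: summarizes an array of per-split scores.
# Returns the mean and the population standard deviation.
def summarize_scores(scores)
  raise ArgumentError, 'scores must not be empty' if scores.empty?

  mean = scores.sum.fdiv(scores.size)
  variance = scores.sum { |s| (s - mean)**2 }.fdiv(scores.size)
  { mean: mean, std: Math.sqrt(variance) }
end

# Placeholder report shaped like the Hash returned by #perform (values are made up).
report = { test_score: [0.8, 0.9, 1.0], train_score: nil, fit_time: [0, 0, 0] }
summary = summarize_scores(report[:test_score])
puts format('mean: %.3f, std: %.3f', summary[:mean], summary[:std])
```

Dividing by scores.size rather than kf.n_splits keeps the computation independent of the splitter object.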

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(estimator: nil, splitter: nil, evaluator: nil, return_train_score: false) ⇒ CrossValidation

Create a new evaluator using the cross-validation method.

Parameters:

  • estimator (Classifier) (defaults to: nil)

    The classifier of which performance is evaluated.

  • splitter (Splitter) (defaults to: nil)

    The splitter that divides the dataset into training and testing datasets.

  • evaluator (Evaluator) (defaults to: nil)

    The evaluator that calculates the score of the estimator's predictions. If nil, the estimator's own score method is used.

  • return_train_score (Boolean) (defaults to: false)

    The flag indicating whether to calculate the score on the training dataset.
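Judging from the source of #perform, the splitter is used only through split(x, y), which must return an enumerable of [train_ids, test_ids] pairs. The sketch below is a hypothetical plain-Ruby splitter (SimpleKFold is not a Rumale class) that illustrates this contract with unshuffled, contiguous folds; it assumes x is an Array of samples and does not model the extra options of Rumale's own splitters.

```ruby
# Hypothetical splitter illustrating the duck type that #perform relies on:
# split(x, y) returns an enumerable of [train_ids, test_ids] pairs.
class SimpleKFold
  attr_reader :n_splits

  def initialize(n_splits: 3)
    raise ArgumentError, 'n_splits must be at least 2' if n_splits < 2

    @n_splits = n_splits
  end

  # x is assumed to be an Array of samples; y is unused but accepted
  # so that the signature matches split(x, y).
  def split(x, _y = nil)
    n_samples = x.size
    raise ArgumentError, 'n_splits exceeds n_samples' if @n_splits > n_samples

    ids = (0...n_samples).to_a
    # The first (n_samples % n_splits) folds receive one extra sample.
    base, extra = n_samples.divmod(@n_splits)
    start = 0
    Array.new(@n_splits) do |k|
      size = base + (k < extra ? 1 : 0)
      test_ids = ids[start, size]
      start += size
      [ids - test_ids, test_ids]
    end
  end
end

SimpleKFold.new(n_splits: 3).split((1..7).to_a).each do |train_ids, test_ids|
  puts "train: #{train_ids.inspect} test: #{test_ids.inspect}"
end
```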



# File 'rumale-model_selection/lib/rumale/model_selection/cross_validation.rb', line 44

def initialize(estimator: nil, splitter: nil, evaluator: nil, return_train_score: false)
  @estimator = estimator
  @splitter = splitter
  @evaluator = evaluator
  @return_train_score = return_train_score
end

Instance Attribute Details

#estimator ⇒ Classifier (readonly)

Return the classifier whose performance is evaluated.

Returns:

  • (Classifier)


# File 'rumale-model_selection/lib/rumale/model_selection/cross_validation.rb', line 24

def estimator
  @estimator
end
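The source of #perform calls fit(x, y) on the estimator, then score(x, y) when no evaluator is given, predict_proba(x) for the log-loss branch, or predict(x) otherwise. The following hypothetical majority-class classifier (MajorityClassifier is an illustration, not a Rumale class) sketches that duck type in plain Ruby, assuming x is an Array of rows and y an Array of labels; its score returns mean accuracy, mirroring the usual classifier convention.

```ruby
# Hypothetical estimator (not a Rumale class) that always predicts the most
# frequent training label. It implements the methods #perform may call:
# fit(x, y), predict(x), predict_proba(x) and score(x, y).
class MajorityClassifier
  attr_reader :classes

  def fit(_x, y)
    counts = y.each_with_object(Hash.new(0)) { |label, h| h[label] += 1 }
    @classes = counts.keys.sort
    # Class probabilities in the order of @classes.
    @proba = @classes.map { |c| counts[c].fdiv(y.size) }
    @majority = @classes.max_by { |c| counts[c] }
    self
  end

  def predict(x)
    Array.new(x.size, @majority)
  end

  # One row of class probabilities per sample, ordered like #classes.
  def predict_proba(x)
    Array.new(x.size) { @proba.dup }
  end

  # Mean accuracy of predict(x) against y.
  def score(x, y)
    predict(x).zip(y).count { |pred, truth| pred == truth }.fdiv(y.size)
  end
end

clf = MajorityClassifier.new.fit([[0.0], [1.0], [2.0]], [1, 1, 0])
p clf.predict([[5.0], [6.0]])       # → [1, 1]
p clf.score([[5.0], [6.0]], [1, 0]) # → 0.5
```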

#evaluator ⇒ Evaluator (readonly)

Return the evaluator that calculates the score.

Returns:

  • (Evaluator)


# File 'rumale-model_selection/lib/rumale/model_selection/cross_validation.rb', line 32

def evaluator
  @evaluator
end
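In #perform the evaluator is called as score(y_true, y_pred), with the ground truth first and the estimator's output second, which is the reverse of the estimator's own score(x, y) argument pattern. The sketch below is a hypothetical accuracy evaluator (AccuracyEvaluator is illustrative only) that follows this order; Rumale's evaluation measures, such as Rumale::EvaluationMeasure::Accuracy, are assumed to use the same order.

```ruby
# Hypothetical evaluator: #perform calls evaluator.score(y_true, y_pred),
# passing the ground truth first and the estimator's predictions second.
class AccuracyEvaluator
  def score(y_true, y_pred)
    raise ArgumentError, 'size mismatch' unless y_true.size == y_pred.size

    y_true.zip(y_pred).count { |truth, pred| truth == pred }.fdiv(y_true.size)
  end
end

evaluator = AccuracyEvaluator.new
puts evaluator.score([0, 1, 1, 0], [0, 1, 0, 0]) # → 0.75
```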

#return_train_score ⇒ Boolean (readonly)

Return the flag indicating whether to calculate the score on the training dataset.

Returns:

  • (Boolean)


# File 'rumale-model_selection/lib/rumale/model_selection/cross_validation.rb', line 36

def return_train_score
  @return_train_score
end
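Enabling return_train_score makes it possible to compare training and testing scores per split, a common way to spot overfitting. A minimal sketch, assuming a report Hash shaped as documented under #perform; the helper score_gaps and the placeholder values are hypothetical.

```ruby
# Hypothetical helper: per-split gap between training and testing scores.
# Returns nil when the report was produced with return_train_score: false,
# in which case report[:train_score] is nil.
def score_gaps(report)
  train = report[:train_score]
  return nil if train.nil?

  train.zip(report[:test_score]).map { |tr, te| tr - te }
end

# Placeholder report (values are made up for illustration).
report = { test_score: [0.75, 0.5], train_score: [1.0, 1.0], fit_time: [0, 0] }
p score_gaps(report) # → [0.25, 0.5]
```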

#splitter ⇒ Splitter (readonly)

Return the splitter that divides the dataset.

Returns:

  • (Splitter)


# File 'rumale-model_selection/lib/rumale/model_selection/cross_validation.rb', line 28

def splitter
  @splitter
end
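The class example uses StratifiedKFold, which keeps class proportions similar across folds. The idea can be sketched in plain Ruby by dealing each class's sample indices round-robin across the folds. This is a simplified, hypothetical illustration (stratified_folds is not a Rumale method and is not Rumale's algorithm); it returns pairs in the same [train_ids, test_ids] shape that #perform consumes.

```ruby
# Hypothetical sketch of stratified fold assignment (not Rumale's implementation):
# each class's sample indices are dealt round-robin across the folds, so every
# test fold receives roughly the same class proportions as the full dataset.
def stratified_folds(labels, n_splits)
  folds = Array.new(n_splits) { [] }
  offset = 0
  labels.each_index.group_by { |i| labels[i] }.each_value do |ids|
    ids.each_with_index { |id, j| folds[(offset + j) % n_splits] << id }
    # Continue the rotation so fold sizes stay balanced across classes.
    offset += ids.size
  end
  all_ids = (0...labels.size).to_a
  folds.map { |test_ids| [all_ids - test_ids, test_ids.sort] }
end

stratified_folds([0, 0, 0, 0, 1, 1], 2).each do |train_ids, test_ids|
  puts "train: #{train_ids.inspect} test: #{test_ids.inspect}"
end
```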

Instance Method Details

#perform(x, y) ⇒ Hash

Perform the evaluation of the given classifier with the cross-validation method.

Parameters:

  • x (Numo::DFloat)

    (shape: [n_samples, n_features]) The dataset to be used to evaluate the estimator.

  • y (Numo::Int32 / Numo::DFloat)

    (shape: [n_samples] / [n_samples, n_outputs]) The labels to be used to evaluate the classifier / The target values to be used to evaluate the regressor.

Returns:

  • (Hash)

    The report summarizing the results of cross-validation.

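    • :fit_time (Array<Float>) The time taken to fit the estimator on each split. In the source listed below, it is computed from Time.now.to_i, so the recorded values have one-second resolution.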

    • :test_score (Array<Float>) The scores on the testing dataset for each split.

    • :train_score (Array<Float>) The scores on the training dataset for each split. This entry is nil if return_train_score is false.
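The loop that produces this report can be sketched in plain Ruby. The following is a minimal, hypothetical re-implementation (cross_validate, ConstantClassifier and TwoFold are illustrative names, not Rumale APIs) assuming x is an Array of rows and y an Array of labels. It omits the log-loss branch and the kernel-matrix column slicing of the real method, and, unlike the listed source, it measures fit time with a monotonic clock so the values are fractional seconds.

```ruby
# Hypothetical sketch of the cross-validation loop behind #perform.
def cross_validate(estimator, splitter, x, y, evaluator: nil, return_train_score: false)
  report = { test_score: [], train_score: (return_train_score ? [] : nil), fit_time: [] }
  # Without an evaluator, fall back to the estimator's own score(x, y).
  scorer = lambda do |xs, ys|
    evaluator.nil? ? estimator.score(xs, ys) : evaluator.score(ys, estimator.predict(xs))
  end
  splitter.split(x, y).each do |train_ids, test_ids|
    train_x = x.values_at(*train_ids)
    train_y = y.values_at(*train_ids)
    test_x = x.values_at(*test_ids)
    test_y = y.values_at(*test_ids)

    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    estimator.fit(train_x, train_y)
    report[:fit_time] << Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

    report[:test_score] << scorer.call(test_x, test_y)
    report[:train_score] << scorer.call(train_x, train_y) if return_train_score
  end
  report
end

# Hypothetical stand-in estimator: predicts the most frequent training label.
class ConstantClassifier
  def fit(_x, y)
    @label = y.max_by { |v| y.count(v) }
    self
  end

  def predict(x)
    Array.new(x.size, @label)
  end

  def score(x, y)
    predict(x).zip(y).count { |pred, truth| pred == truth }.fdiv(y.size)
  end
end

# Hypothetical stand-in splitter: trains on odd indices, then on even ones.
class TwoFold
  def split(x, _y)
    even, odd = (0...x.size).partition(&:even?)
    [[odd, even], [even, odd]]
  end
end

x = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [1, 1, 1, 0, 1, 0]
report = cross_validate(ConstantClassifier.new, TwoFold.new, x, y, return_train_score: true)
p report[:test_score]
p report[:train_score]
```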



# File 'rumale-model_selection/lib/rumale/model_selection/cross_validation.rb', line 62

def perform(x, y)
  # Initialize the report of cross validation.
  report = { test_score: [], train_score: nil, fit_time: [] }
  report[:train_score] = [] if @return_train_score
  # Evaluate the estimator on each split.
  @splitter.split(x, y).each do |train_ids, test_ids|
    # Split dataset into training and testing dataset.
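    # For kernel machines, x is expected to be a precomputed kernel matrix, so its
    # columns are also restricted to the training samples; otherwise true selects all columns.
    feature_ids = !kernel_machine? || train_ids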
    train_x = x[train_ids, feature_ids]
    train_y = y.shape[1].nil? ? y[train_ids] : y[train_ids, true]
    test_x = x[test_ids, feature_ids]
    test_y = y.shape[1].nil? ? y[test_ids] : y[test_ids, true]
    # Fit the estimator.
    start_time = Time.now.to_i
    @estimator.fit(train_x, train_y)
    # Calculate scores and prepare the report.
    report[:fit_time].push(Time.now.to_i - start_time)
    if @evaluator.nil?
      report[:test_score].push(@estimator.score(test_x, test_y))
      report[:train_score].push(@estimator.score(train_x, train_y)) if @return_train_score
    elsif log_loss?
      report[:test_score].push(@evaluator.score(test_y, @estimator.predict_proba(test_x)))
      if @return_train_score
        report[:train_score].push(@evaluator.score(train_y,
                                                   @estimator.predict_proba(train_x)))
      end
    else
      report[:test_score].push(@evaluator.score(test_y, @estimator.predict(test_x)))
      report[:train_score].push(@evaluator.score(train_y, @estimator.predict(train_x))) if @return_train_score
    end
  end
  report
end
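When the private predicate kernel_machine? (not listed here) returns true, feature_ids becomes train_ids, so both training and testing blocks keep only the columns that correspond to training samples; otherwise feature_ids is true, which selects every column in Numo indexing. The following plain-Ruby sketch shows the resulting shapes for a small Gram matrix. It deliberately avoids Numo, and the helper slice_rows_cols and the linear-kernel data are hypothetical illustrations.

```ruby
# Plain-Ruby illustration of the feature_ids slicing in #perform for kernel machines.
# The matrix is assumed to be a precomputed n_samples x n_samples Gram matrix (Array of rows).
def slice_rows_cols(matrix, row_ids, col_ids)
  row_ids.map { |i| col_ids.map { |j| matrix[i][j] } }
end

# Linear-kernel Gram matrix of four 1-D samples: k[i][j] = s_i * s_j.
samples = [1.0, 2.0, 3.0, 4.0]
k = samples.map { |a| samples.map { |b| a * b } }

train_ids = [0, 1, 2]
test_ids = [3]
train_k = slice_rows_cols(k, train_ids, train_ids) # 3 x 3: kernel among training samples
test_k = slice_rows_cols(k, test_ids, train_ids)   # 1 x 3: test samples vs. training samples
p train_k # → [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
p test_k  # → [[4.0, 8.0, 12.0]]
```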