Class: Rumale::FeatureExtraction::HashVectorizer
- Inherits: Base::Estimator
  - Object
  - Base::Estimator
  - Rumale::FeatureExtraction::HashVectorizer
- Includes: Base::Transformer
- Defined in: rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb
Overview
Encodes an array of feature-value hashes into vectors. This encoder turns an array of mappings (Array<Hash>), each pairing feature names with values, into a Numo::NArray.
Instance Attribute Summary
- #feature_names ⇒ Array (readonly)
  Return the list of feature names.
- #vocabulary ⇒ Hash (readonly)
  Return the hash consisting of pairs of feature names and indices.
Attributes inherited from Base::Estimator
Instance Method Summary
- #fit(x) ⇒ HashVectorizer
  Fit the encoder with the given training data.
- #fit_transform(x) ⇒ Numo::DFloat
  Fit the encoder with the given training data, then return the encoded data.
- #initialize(separator: '=', sort: true) ⇒ HashVectorizer (constructor)
  Create a new encoder that converts an array of hashes of feature names and values into vectors.
- #inverse_transform(x) ⇒ Array<Hash>
  Decode a sample matrix into an array of feature-value hashes.
- #transform(x) ⇒ Numo::DFloat
  Encode the given array of feature-value hashes.
Constructor Details
#initialize(separator: '=', sort: true) ⇒ HashVectorizer
Create a new encoder that converts an array of hashes of feature names and values into vectors.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb', line 55
    def initialize(separator: '=', sort: true)
      super()
      @params = { separator: separator, sort: sort }
    end
Instance Attribute Details
#feature_names ⇒ Array (readonly)
Return the list of feature names.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb', line 45
    def feature_names
      @feature_names
    end
#vocabulary ⇒ Hash (readonly)
Return the hash consisting of pairs of feature names and indices.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb', line 49
    def vocabulary
      @vocabulary
    end
Instance Method Details
#fit(x) ⇒ HashVectorizer
Fit the encoder with the given training data.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb', line 68
    def fit(x, _y = nil)
      @feature_names = []
      @vocabulary = {}

      x.each do |f|
        f.each do |k, v|
          k = :"#{k}#{separator}#{v}" if v.is_a?(String)
          next if @vocabulary.key?(k)

          @feature_names.push(k)
          @vocabulary[k] = @vocabulary.size
        end
      end

      if sort_feature?
        @feature_names.sort!
        @feature_names.each_with_index { |k, i| @vocabulary[k] = i }
      end

      self
    end
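The vocabulary-building logic in the #fit listing above can be traced with a small pure-Ruby sketch. It is an illustration under stated assumptions, not the library's code: `build_vocabulary` is a hypothetical helper, and the `sort` keyword stands in for the `sort_feature?` check, which is not shown in the listing.

```ruby
# Pure-Ruby sketch of the vocabulary-building step in #fit.
# build_vocabulary is a hypothetical helper, not part of Rumale.
def build_vocabulary(samples, separator: '=', sort: true)
  feature_names = []
  vocabulary = {}

  samples.each do |sample|
    sample.each do |key, value|
      # A String value yields one feature per (name, value) pair, e.g. :"city=Tokyo".
      key = :"#{key}#{separator}#{value}" if value.is_a?(String)
      next if vocabulary.key?(key)

      feature_names.push(key)
      vocabulary[key] = vocabulary.size
    end
  end

  if sort
    # Re-index so that column order follows the sorted feature names.
    feature_names.sort!
    feature_names.each_with_index { |name, index| vocabulary[name] = index }
  end

  [feature_names, vocabulary]
end

samples = [
  { foo: 1, bar: 2 },
  { foo: 3, baz: 1, city: 'Tokyo' }
]

sorted_names, sorted_vocab = build_vocabulary(samples)
# sorted_names => [:bar, :baz, :"city=Tokyo", :foo]

unsorted_names, unsorted_vocab = build_vocabulary(samples, sort: false)
# unsorted_names => [:foo, :bar, :baz, :"city=Tokyo"]
```

With sorting disabled, columns follow first-seen order, so the column layout depends on the order of the training samples.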
#fit_transform(x) ⇒ Numo::DFloat
Fit the encoder with the given training data, then return the encoded data.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb', line 95
    def fit_transform(x, _y = nil)
      fit(x).transform(x)
    end
#inverse_transform(x) ⇒ Array<Hash>
Decode a sample matrix into an array of feature-value hashes.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb', line 126
    def inverse_transform(x)
      n_samples = x.shape[0]
      reconst = []

      n_samples.times do |i|
        f = {}
        x[i, true].each_with_index do |el, j|
          feature_key_val(@feature_names[j], el).tap { |k, v| f[k.to_sym] = v } unless el.zero?
        end
        reconst.push(f)
      end

      reconst
    end
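The decoding loop above relies on a private helper, `feature_key_val`, whose body is not shown in this listing. The following pure-Ruby sketch assumes that the helper splits a feature name on the separator and, when a category part is present, returns it in place of the numeric value. Both `split_feature` and `decode_rows` are hypothetical names, and nested Arrays stand in for the Numo matrix.

```ruby
# Hypothetical stand-in for the private feature_key_val helper.
# Assumption: names such as :"city=Tokyo" split into key "city" and category "Tokyo".
def split_feature(name, value, separator)
  key, category = name.to_s.split(separator)
  category.nil? ? [key, value] : [key, category]
end

# Pure-Ruby sketch of the decoding loop in #inverse_transform.
def decode_rows(rows, feature_names, separator: '=')
  rows.map do |row|
    decoded = {}
    row.each_with_index do |element, index|
      # Zero entries are treated as absent features and skipped.
      next if element.zero?

      key, value = split_feature(feature_names[index], element, separator)
      decoded[key.to_sym] = value
    end
    decoded
  end
end

feature_names = [:bar, :baz, :'city=Tokyo', :foo]
rows = [
  [2.0, 0.0, 0.0, 1.0],
  [0.0, 0.0, 1.0, 3.0]
]

decoded = decode_rows(rows, feature_names)
# decoded[0] => { bar: 2.0, foo: 1.0 }
# decoded[1] => { city: "Tokyo", foo: 3.0 }
```

Under these assumptions the round trip is lossy in two ways: features whose value was zero disappear, and numeric values come back as floats.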
#transform(x) ⇒ Numo::DFloat
Encode the given array of feature-value hashes.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/hash_vectorizer.rb', line 103
    def transform(x)
      x = [x] unless x.is_a?(Array)
      n_samples = x.size
      n_features = @vocabulary.size
      z = Numo::DFloat.zeros(n_samples, n_features)

      x.each_with_index do |f, i|
        f.each do |k, v|
          if v.is_a?(String)
            k = :"#{k}#{separator}#{v}"
            v = 1
          end
          z[i, @vocabulary[k]] = v if @vocabulary.key?(k)
        end
      end

      z
    end
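To make the encoding rules in the #transform listing concrete, here is a minimal pure-Ruby sketch of the same loop. `encode_samples` is a hypothetical helper, nested Arrays of floats stand in for `Numo::DFloat`, and the vocabulary is written out by hand rather than produced by a fitted encoder.

```ruby
# Pure-Ruby sketch of the encoding step in #transform.
# encode_samples is a hypothetical helper; nested Arrays stand in for Numo::DFloat.
def encode_samples(samples, vocabulary, separator: '=')
  # A single hash is wrapped so that it is encoded as a one-row matrix.
  samples = [samples] unless samples.is_a?(Array)
  n_features = vocabulary.size

  samples.map do |sample|
    row = Array.new(n_features, 0.0)
    sample.each do |key, value|
      if value.is_a?(String)
        # String values are one-hot encoded under the combined name.
        key = :"#{key}#{separator}#{value}"
        value = 1
      end
      # Features that are missing from the vocabulary are silently skipped.
      row[vocabulary[key]] = value.to_f if vocabulary.key?(key)
    end
    row
  end
end

# Hand-written vocabulary, assumed to come from an earlier fit.
vocabulary = { bar: 0, baz: 1, 'city=Tokyo': 2, foo: 3 }

encoded = encode_samples(
  [
    { foo: 1, bar: 2 },
    { foo: 3, city: 'Tokyo', unseen: 5 },
    { city: 'Osaka' }
  ],
  vocabulary
)
# encoded => [[2.0, 0.0, 0.0, 1.0], [0.0, 0.0, 1.0, 3.0], [0.0, 0.0, 0.0, 0.0]]

single = encode_samples({ foo: 1 }, vocabulary)
# single => [[0.0, 0.0, 0.0, 1.0]]
```

The third sample shows that a category value not seen during fitting ('Osaka') produces an all-zero row instead of raising an error.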