Class: Rumale::FeatureExtraction::TfidfTransformer
- Inherits:
- Rumale::FeatureExtraction::TfidfTransformer < Base::Estimator < Object
- Includes:
- Base::Transformer
- Defined in:
- rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb
Overview
Transform a sample matrix of term frequencies (tf) into a normalized tf-idf (term frequency-inverse document frequency) representation.
Reference
- Manning, C. D., Raghavan, P., and Schütze, H., “Introduction to Information Retrieval,” Cambridge University Press, 2008.
Instance Attribute Summary
- #idf ⇒ Numo::DFloat (readonly)
  Return the vector of inverse document frequency values.
Attributes inherited from Base::Estimator
Instance Method Summary
- #fit(x) ⇒ TfidfTransformer
  Calculate the inverse document frequency for weighting.
- #fit_transform(x) ⇒ Numo::DFloat
  Calculate the idf values, and then transform the samples into the tf-idf representation.
- #initialize(norm: 'l2', use_idf: true, smooth_idf: false, sublinear_tf: false) ⇒ TfidfTransformer (constructor)
  Create a new transformer for converting tf vectors to tf-idf vectors.
- #transform(x) ⇒ Numo::DFloat
  Transform the given samples into the tf-idf representation.
Constructor Details
#initialize(norm: 'l2', use_idf: true, smooth_idf: false, sublinear_tf: false) ⇒ TfidfTransformer
Create a new transformer for converting tf vectors to tf-idf vectors.
# File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 49
def initialize(norm: 'l2', use_idf: true, smooth_idf: false, sublinear_tf: false)
  super()
  @params = {
    norm: norm,
    use_idf: use_idf,
    smooth_idf: smooth_idf,
    sublinear_tf: sublinear_tf
  }
end
Instance Attribute Details
#idf ⇒ Numo::DFloat (readonly)
Return the vector of inverse document frequency values.
# File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 41
def idf
  @idf
end
Instance Method Details
#fit(x) ⇒ TfidfTransformer
Calculate the inverse document frequency for weighting.
# File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 65
def fit(x, _y = nil)
  return self unless @params[:use_idf]

  n_samples = x.shape[0]
  df = x.class.cast(x.gt(0.0).count(0))
  if @params[:smooth_idf]
    df += 1
    n_samples += 1
  end
  @idf = Numo::NMath.log(n_samples / df) + 1

  self
end
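The idf computation in the listing can be followed by hand with a small plain-Ruby sketch. It uses nested Arrays instead of Numo arrays, and the helper name `idf_weights` and the sample matrix are hypothetical, chosen only for illustration; this is not the library's code path, just the same arithmetic under those assumptions.

```ruby
# Plain-Ruby sketch of the idf arithmetic (no Numo dependency).
# `idf_weights` is a hypothetical helper, not part of Rumale.
def idf_weights(rows, smooth_idf: false)
  n_samples = rows.size
  n_features = rows.first.size
  # Document frequency: number of samples in which each term has a positive count.
  df = Array.new(n_features) { |j| rows.count { |row| row[j] > 0 } }
  if smooth_idf
    # Add one to every document frequency and to the sample count.
    df = df.map { |d| d + 1 }
    n_samples += 1
  end
  # Natural log of (n_samples / df), plus 1.
  df.map { |d| Math.log(n_samples.to_f / d) + 1 }
end

# Three samples (rows) over three terms (columns); illustrative data only.
tf = [
  [3, 0, 1],
  [2, 0, 0],
  [0, 1, 1]
]
idf = idf_weights(tf)
smoothed = idf_weights(tf, smooth_idf: true)
```

In this sketch, a term that appears in no sample has a document frequency of zero, so its unsmoothed weight is infinite; setting smooth_idf keeps every weight finite.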
#fit_transform(x) ⇒ Numo::DFloat
Calculate the idf values, and then transform the samples into the tf-idf representation.
# File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 87
def fit_transform(x, _y = nil)
  fit(x).transform(x)
end
#transform(x) ⇒ Numo::DFloat
Transform the given samples into the tf-idf representation.
# File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 95
def transform(x)
  z = x.dup

  z[z.ne(0)] = Numo::NMath.log(z[z.ne(0)]) + 1 if @params[:sublinear_tf]
  z *= @idf if @params[:use_idf]
  case @params[:norm]
  when 'l2'
    ::Rumale::Utils.normalize(z, 'l2')
  when 'l1'
    ::Rumale::Utils.normalize(z, 'l1')
  else
    z
  end
end
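The three steps in the listing (optional sublinear tf scaling, optional idf weighting, optional row normalization) can be sketched in plain Ruby as follows. The helper name `tfidf_rows`, the sample matrix, and the idf values are hypothetical. Leaving all-zero rows unscaled is a choice made for this sketch and is not a statement about how ::Rumale::Utils.normalize treats them.

```ruby
# Plain-Ruby sketch of the transform steps (no Numo dependency).
# `tfidf_rows` is a hypothetical helper, not part of Rumale.
def tfidf_rows(rows, idf: nil, sublinear_tf: false, norm: 'l2')
  rows.map do |row|
    z = row.map(&:to_f)
    # Sublinear scaling: replace each non-zero tf with log(tf) + 1.
    z = z.map { |v| v.zero? ? 0.0 : Math.log(v) + 1 } if sublinear_tf
    # Weight each term by its idf value, if idf weights are supplied.
    z = z.each_with_index.map { |v, j| v * idf[j] } if idf
    # Pick the row length for the requested norm; any other value means no scaling.
    length = case norm
             when 'l2' then Math.sqrt(z.sum { |v| v * v })
             when 'l1' then z.sum(&:abs)
             else 1.0
             end
    # Assumption of this sketch: all-zero rows are returned unchanged.
    length.zero? ? z : z.map { |v| v / length }
  end
end

tf = [[3, 0, 1], [2, 0, 0], [0, 1, 1]]
idf = [1.0, 2.0, 1.0] # illustrative weights, not computed from tf
l2 = tfidf_rows(tf, idf: idf)
l1 = tfidf_rows(tf, idf: idf, norm: 'l1')
raw = tfidf_rows(tf, idf: idf, norm: nil)
```

With norm 'l2' each non-zero row ends up with unit Euclidean length, with 'l1' its absolute values sum to one, and with any other value the weighted counts are returned as they are, mirroring the else branch of the case statement.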