Class: Rumale::FeatureExtraction::TfidfTransformer
- Inherits:
- Base::Estimator
  - Object
  - Base::Estimator
  - Rumale::FeatureExtraction::TfidfTransformer
- Includes:
- Base::Transformer
- Defined in:
- rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb
Overview
Transform a sample matrix of term frequencies (tf) into a normalized tf-idf (term frequency-inverse document frequency) representation.
Reference
- Manning, C. D., Raghavan, P., and Schütze, H., “Introduction to Information Retrieval,” Cambridge University Press, 2008.
Instance Attribute Summary
- #idf ⇒ Numo::DFloat (readonly)
  Return the vector of inverse document frequencies.
Attributes inherited from Base::Estimator
Instance Method Summary
- #fit(x) ⇒ TfidfTransformer
  Calculate the inverse document frequency for weighting.
- #fit_transform(x) ⇒ Numo::DFloat
  Calculate the idf values, and then transform samples to the tf-idf representation.
- #initialize(norm: 'l2', use_idf: true, smooth_idf: false, sublinear_tf: false) ⇒ TfidfTransformer (constructor)
  Create a new transformer for converting tf vectors to tf-idf vectors.
- #transform(x) ⇒ Numo::DFloat
  Transform the given samples to the tf-idf representation.
Constructor Details
#initialize(norm: 'l2', use_idf: true, smooth_idf: false, sublinear_tf: false) ⇒ TfidfTransformer
Create a new transformer for converting tf vectors to tf-idf vectors.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 49
    def initialize(norm: 'l2', use_idf: true, smooth_idf: false, sublinear_tf: false)
      super()
      @params = {
        norm: norm,
        use_idf: use_idf,
        smooth_idf: smooth_idf,
        sublinear_tf: sublinear_tf
      }
    end
Instance Attribute Details
#idf ⇒ Numo::DFloat (readonly)
Return the vector of inverse document frequencies.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 41
    def idf
      @idf
    end
Instance Method Details
#fit(x) ⇒ TfidfTransformer
Calculate the inverse document frequency for weighting.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 65
    def fit(x, _y = nil)
      return self unless @params[:use_idf]

      n_samples = x.shape[0]
      df = x.class.cast(x.gt(0.0).count(0))

      if @params[:smooth_idf]
        df += 1
        n_samples += 1
      end

      @idf = Numo::NMath.log(n_samples / df) + 1

      self
    end
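The idf formula used by #fit, log(n_samples / df) + 1, can be illustrated without Numo. The following is a minimal plain-Ruby sketch, not the library's implementation: the helper name `idf_weights` and the nested-array input are assumptions made for illustration only.

```ruby
# Hypothetical helper (not part of Rumale): mirrors the idf formula in #fit
# using plain Ruby arrays. Each row of x holds the term frequencies of one document.
def idf_weights(x, smooth_idf: false)
  n_samples = x.size.to_f
  n_features = x.first.size

  # Document frequency: the number of documents in which each term occurs.
  df = Array.new(n_features) { |j| x.count { |row| row[j] > 0 }.to_f }

  if smooth_idf
    # Add one to every document frequency and to the document count,
    # as #fit does when smooth_idf is true.
    df = df.map { |v| v + 1 }
    n_samples += 1
  end

  df.map { |v| Math.log(n_samples / v) + 1 }
end

tf = [
  [1.0, 0.0, 2.0],
  [0.0, 0.0, 1.0],
  [3.0, 0.0, 1.0],
  [1.0, 1.0, 1.0]
]

idf = idf_weights(tf)
smoothed = idf_weights(tf, smooth_idf: true)
```

A term that appears in every document (the third column here) gets an idf of exactly 1, while rarer terms receive larger weights. Without smoothing, a term that appears in no document has df = 0 and therefore an infinite weight in this sketch.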
#fit_transform(x) ⇒ Numo::DFloat
Calculate the idf values, and then transform samples to the tf-idf representation.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 87
    def fit_transform(x, _y = nil)
      fit(x).transform(x)
    end
#transform(x) ⇒ Numo::DFloat
Transform the given samples to the tf-idf representation.
    # File 'rumale-feature_extraction/lib/rumale/feature_extraction/tfidf_transformer.rb', line 95
    def transform(x)
      z = x.dup

      z[z.ne(0)] = Numo::NMath.log(z[z.ne(0)]) + 1 if @params[:sublinear_tf]
      z *= @idf if @params[:use_idf]
      case @params[:norm]
      when 'l2'
        ::Rumale::Utils.normalize(z, 'l2')
      when 'l1'
        ::Rumale::Utils.normalize(z, 'l1')
      else
        z
      end
    end
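The three steps of #transform (optional sublinear tf scaling, idf weighting, then row normalization) can be traced on plain Ruby arrays. This is a rough sketch under stated assumptions rather than the library's code: `tfidf_rows` is a hypothetical helper, the idf vector is passed in explicitly, and leaving all-zero rows unscaled is a choice of this sketch that may not match how ::Rumale::Utils.normalize treats them.

```ruby
# Hypothetical helper (not part of Rumale): applies the steps of #transform
# to an array of rows, given a precomputed idf vector.
def tfidf_rows(x, idf, norm: 'l2', sublinear_tf: false)
  x.map do |row|
    z = row.each_with_index.map do |tf, j|
      # Sublinear scaling replaces each non-zero tf with log(tf) + 1.
      tf = Math.log(tf) + 1 if sublinear_tf && tf != 0
      tf * idf[j]
    end

    length =
      case norm
      when 'l2' then Math.sqrt(z.sum { |v| v * v })
      when 'l1' then z.sum(&:abs)
      end

    # Rows stay unscaled when norm is neither 'l1' nor 'l2'.
    # Assumption of this sketch: all-zero rows are also left as-is.
    length.nil? || length.zero? ? z : z.map { |v| v / length }
  end
end

unit_idf = [1.0, 1.0]
l2_rows  = tfidf_rows([[3.0, 4.0]], unit_idf)               # L2-normalized row
l1_rows  = tfidf_rows([[3.0, 4.0]], unit_idf, norm: 'l1')   # L1-normalized row
raw_rows = tfidf_rows([[3.0, 4.0]], unit_idf, norm: nil)    # no normalization
sub_rows = tfidf_rows([[1.0, Math::E]], unit_idf, norm: nil, sublinear_tf: true)
```

With a unit idf vector, the row [3, 4] has L2 length 5 and L1 length 7, so the normalized rows can be checked by hand.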