Class: SentencePiece::SentencePieceProcessor

Inherits:
Object
Defined in:
ext/sentencepiece/dummy.rb

Overview

This class wraps the SentencePieceProcessor class of the SentencePiece library.

Examples:

require 'sentencepiece'

sp = SentencePiece::SentencePieceProcessor.new(model_file: '/path/to/model_file.model')

sp.encode('Hello world!')
# => [151, 88, 21, 887, 147]

sp.decode([151, 88, 21, 887, 147])
# => "Hello world!"


Constructor Details

#initialize(model_file: nil) ⇒ SentencePieceProcessor

Creates a new SentencePieceProcessor instance.

Parameters:

  • model_file (String) (defaults to: nil)

    The path to the model file.

Raises:



# File 'ext/sentencepiece/dummy.rb', line 60

def initialize(model_file: nil); end
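
A minimal sketch of guarding the constructor call, assuming the caller wants a readable error before the native extension is reached. The helper `build_processor`, its `processor_class` argument, and the `FakeProcessor` stand-in are hypothetical and not part of the gem; in real code one would pass SentencePiece::SentencePieceProcessor, whose own error behavior is not shown here.

```ruby
require 'tempfile'

# Hypothetical helper (not part of the gem): checks that the model file exists
# before calling the constructor with the documented model_file: keyword.
def build_processor(model_file, processor_class)
  raise ArgumentError, "model file not found: #{model_file}" unless File.file?(model_file)

  processor_class.new(model_file: model_file)
end

# Stand-in class so the sketch runs without the gem or a trained model.
# In real code, pass SentencePiece::SentencePieceProcessor instead.
class FakeProcessor
  attr_reader :model_file

  def initialize(model_file: nil)
    @model_file = model_file
  end
end

# A temporary file plays the role of the model file for this demonstration.
Tempfile.create(['demo', '.model']) do |file|
  sp = build_processor(file.path, FakeProcessor)
  puts sp.model_file == file.path
end
```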

Instance Method Details

#bos_id ⇒ Integer

Returns BOS (<s>) id.

Returns:

  • (Integer)


# File 'ext/sentencepiece/dummy.rb', line 196

def bos_id(); end

#decode(ids, out_type: 'int') ⇒ String

Decodes a list of ids or sentence pieces into a text.

Parameters:

  • ids (Array<Integer>/Array<String>)

    The ids or sentence pieces to decode.

  • out_type (String) (defaults to: 'int')

    The element type of ids ('int' for ids or 'str' for sentence pieces). The return value is always a String.

Returns:

  • (String)

Raises:



# File 'ext/sentencepiece/dummy.rb', line 145

def decode(ids, out_type: 'int'); end

#decode_ids(ids) ⇒ String

Decodes a list of ids into a text.

Parameters:

  • ids (Array<Integer>)

    The ids to decode.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 151

def decode_ids(ids); end

#decode_ids_as_serialized_proto(ids) ⇒ String

Decodes a list of ids into a serialized proto.

Parameters:

  • ids (Array<Integer>)

    The ids to decode.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 157

def decode_ids_as_serialized_proto(ids); end

#decode_pieces(pieces) ⇒ String

Decodes a list of sentence pieces into a text.

Parameters:

  • pieces (Array<String>)

    The sentence pieces to decode.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 163

def decode_pieces(pieces); end

#decode_pieces_as_serialized_proto(pieces) ⇒ String

Decodes a list of sentence pieces into a serialized proto.

Parameters:

  • pieces (Array<String>)

    The sentence pieces to decode.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 169

def decode_pieces_as_serialized_proto(pieces); end

#encode(text, out_type: 'int') ⇒ Array<Integer>/Array<String>

Encodes a text into a list of ids or sentence pieces.

Parameters:

  • text (String/Array<String>)

    The text to encode.

  • out_type (String) (defaults to: 'int')

    The output type ('int' or 'str').

Returns:

  • (Array<Integer>/Array<String>)

Raises:



# File 'ext/sentencepiece/dummy.rb', line 74

def encode(text, out_type: 'int'); end
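
A small sketch of an encode/decode round-trip check built on the two methods documented here. The helper `round_trip?` and the `ToyProcessor` stand-in are hypothetical; the stand-in only mimics the method names and keywords so the snippet runs without a trained model, and a real SentencePiece::SentencePieceProcessor would be passed in its place.

```ruby
# Hypothetical helper (not part of the gem): encodes text to ids, decodes the
# ids again, and reports whether the original text survived the round trip.
def round_trip?(processor, text)
  ids = processor.encode(text)
  processor.decode(ids) == text
end

# Toy stand-in with the same encode/decode method names, so the sketch runs
# without a trained model. It maps whitespace-separated words to ids and
# sends unknown words to id 0.
class ToyProcessor
  def initialize(words)
    @words = words
  end

  def encode(text, out_type: 'int')
    pieces = text.split(' ')
    out_type == 'str' ? pieces : pieces.map { |word| @words.index(word) || 0 }
  end

  def decode(ids, out_type: 'int')
    pieces = out_type == 'str' ? ids : ids.map { |id| @words[id] }
    pieces.join(' ')
  end
end

toy = ToyProcessor.new(%w[<unk> Hello world!])
puts round_trip?(toy, 'Hello world!') # prints true: every word is known
puts round_trip?(toy, 'Hello there')  # prints false: "there" becomes <unk>
```

With a real model, a false result does not necessarily indicate a bug: SentencePiece normalizes input text by default, so the decoded string can differ from the original in whitespace or Unicode form.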

#encode_as_ids(text) ⇒ Array<Integer>

Encodes a text into a list of ids.

Parameters:

  • text (String)

    The text to encode.

Returns:

  • (Array<Integer>)


# File 'ext/sentencepiece/dummy.rb', line 80

def encode_as_ids(text); end

#encode_as_pieces(text) ⇒ Array<String>

Encodes a text into a list of sentence pieces.

Parameters:

  • text (String)

    The text to encode.

Returns:

  • (Array<String>)


# File 'ext/sentencepiece/dummy.rb', line 86

def encode_as_pieces(text); end

#encode_as_serialized_proto(text) ⇒ String

Encodes a text into a serialized proto.

Parameters:

  • text (String)

    The text to encode.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 92

def encode_as_serialized_proto(text); end

#eos_id ⇒ Object

Returns EOS (</s>) id.



# File 'ext/sentencepiece/dummy.rb', line 199

def eos_id(); end

#id_to_piece(id) ⇒ String

Returns the sentence piece (as a string) for the given vocabulary id.

Parameters:

  • id (Integer)

    The id to convert.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 175

def id_to_piece(id); end

#load(model_file) ⇒ Object

Loads a SentencePiece model.

Parameters:

  • model_file (String)

    The path to the model file.

Raises:



# File 'ext/sentencepiece/dummy.rb', line 66

def load(model_file); end

#nbest_encode_as_ids(text, nbest_size:) ⇒ Array<Integer>

Encodes a text into a list of ids with nbest results.

Parameters:

  • text (String)

    The text to encode.

  • nbest_size (Integer)

    The number of top-scoring (n-best) segmentation candidates to consider.

Returns:

  • (Array<Integer>)


# File 'ext/sentencepiece/dummy.rb', line 99

def nbest_encode_as_ids(text, nbest_size:); end

#nbest_encode_as_pieces(text, nbest_size:) ⇒ Array<String>

Encodes a text into a list of sentence pieces with nbest results.

Parameters:

  • text (String)

    The text to encode.

  • nbest_size (Integer)

    The number of top-scoring (n-best) segmentation candidates to consider.

Returns:

  • (Array<String>)


# File 'ext/sentencepiece/dummy.rb', line 106

def nbest_encode_as_pieces(text, nbest_size:); end

#nbest_encode_as_serialized_proto(text, nbest_size:) ⇒ String

Encodes a text into a serialized proto with nbest results.

Parameters:

  • text (String)

    The text to encode.

  • nbest_size (Integer)

    The number of top-scoring (n-best) segmentation candidates to consider.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 113

def nbest_encode_as_serialized_proto(text, nbest_size:); end

#pad_id ⇒ Object

Returns PAD (<pad>) id.



# File 'ext/sentencepiece/dummy.rb', line 202

def pad_id(); end
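
The special-token accessors (bos_id, eos_id, pad_id) are typically used to frame and pad id sequences before they are fed to a model. The following is a sketch under that assumption; the helper `frame_ids` and the `SpecialIds` stand-in are hypothetical, and the id values 1, 2, and 3 are illustrative rather than taken from any real model.

```ruby
# Hypothetical helper (not part of the gem): wraps ids in BOS/EOS and
# right-pads the result with the PAD id up to a fixed length.
def frame_ids(processor, ids, length:)
  framed = [processor.bos_id, *ids, processor.eos_id]
  if framed.size > length
    raise ArgumentError, "sequence of #{framed.size} ids exceeds length #{length}"
  end

  framed + [processor.pad_id] * (length - framed.size)
end

# Stand-in exposing only the three accessors used above; the values are
# illustrative. A real SentencePieceProcessor would be passed instead.
SpecialIds = Struct.new(:bos_id, :eos_id, :pad_id)
ids_source = SpecialIds.new(1, 2, 3)

p frame_ids(ids_source, [151, 88, 21], length: 8)
# => [1, 151, 88, 21, 2, 3, 3, 3]
```

In the underlying SentencePiece library a special token can be disabled when the model is trained, in which case its id is reported as -1, so it may be worth checking pad_id before using it as a filler value.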

#piece_size ⇒ Integer

Returns the number of sentence pieces.

Returns:

  • (Integer)


# File 'ext/sentencepiece/dummy.rb', line 186

def piece_size(); end

#piece_to_id(piece) ⇒ Integer

Returns the vocab id of the sentence piece.

Parameters:

  • piece (String)

    The sentence piece to convert.

Returns:

  • (Integer)


# File 'ext/sentencepiece/dummy.rb', line 181

def piece_to_id(piece); end
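
Taken together, piece_size, id_to_piece, and piece_to_id are enough to enumerate a model's vocabulary. The sketch below assumes ids run contiguously from 0 to piece_size - 1; the helper `vocabulary` and the `ToyVocab` stand-in are hypothetical and exist only so the snippet runs without a trained model.

```ruby
# Hypothetical helper (not part of the gem): builds a piece => id table by
# walking every id from 0 up to piece_size - 1.
def vocabulary(processor)
  (0...processor.piece_size).to_h { |id| [processor.id_to_piece(id), id] }
end

# Toy stand-in with the three documented method names and a five-entry
# vocabulary. A real SentencePieceProcessor would be passed instead.
class ToyVocab
  PIECES = %w[<unk> <s> </s> Hello world].freeze

  def piece_size
    PIECES.size
  end

  def id_to_piece(id)
    PIECES.fetch(id)
  end

  def piece_to_id(piece)
    PIECES.index(piece) || 0
  end
end

toy = ToyVocab.new
vocab = vocabulary(toy)
puts vocab['Hello'] # prints 3

# Sanity check: the two lookup directions agree for every id.
puts vocab.all? { |piece, id| toy.piece_to_id(piece) == id } # prints true
```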

#sample_encode_as_ids(text, nbest_size:, alpha:) ⇒ Array<Integer>

Encodes a text into a list of ids by sampling mode.

Parameters:

  • text (String)

    The text to encode.

  • nbest_size (Integer)

    The number of top-scoring (n-best) segmentation candidates to sample from.

  • alpha (Float)

    The smoothing parameter.

Returns:

  • (Array<Integer>)


# File 'ext/sentencepiece/dummy.rb', line 121

def sample_encode_as_ids(text, nbest_size:, alpha:); end
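
Because sampling mode can return a different segmentation on each call, it can be useful to draw several samples and see how varied they are. The sketch below does that with a hypothetical helper `sample_tally`; the `ToySampler` stand-in picks from a fixed candidate list and ignores alpha, so it illustrates the calling convention only, not how the real sampler weights candidates.

```ruby
# Hypothetical helper (not part of the gem): draws `count` sampled
# segmentations and counts how often each distinct one appeared.
def sample_tally(processor, text, count:, nbest_size:, alpha:)
  samples = Array.new(count) do
    processor.sample_encode_as_ids(text, nbest_size: nbest_size, alpha: alpha)
  end
  samples.tally
end

# Toy stand-in with the documented method name and keywords. It chooses
# uniformly among the first nbest_size fixed candidates and ignores alpha.
class ToySampler
  CANDIDATES = [[10, 11], [10, 12, 13], [14]].freeze

  def initialize(seed)
    @rng = Random.new(seed)
  end

  def sample_encode_as_ids(_text, nbest_size:, alpha:)
    pool = nbest_size.positive? ? CANDIDATES.first(nbest_size) : CANDIDATES
    pool.sample(random: @rng)
  end
end

tally = sample_tally(ToySampler.new(42), 'Hello world!', count: 20, nbest_size: 2, alpha: 0.1)
tally.each { |ids, seen| puts "#{ids.inspect}: #{seen}" }
```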

#sample_encode_as_pieces(text, nbest_size:, alpha:) ⇒ Array<String>

Encodes a text into a list of sentence pieces by sampling mode.

Parameters:

  • text (String)

    The text to encode.

  • nbest_size (Integer)

    The number of top-scoring (n-best) segmentation candidates to sample from.

  • alpha (Float)

    The smoothing parameter.

Returns:

  • (Array<String>)


# File 'ext/sentencepiece/dummy.rb', line 129

def sample_encode_as_pieces(text, nbest_size:, alpha:); end

#sample_encode_as_serialized_proto(text, nbest_size:, alpha:) ⇒ String

Encodes a text into a serialized proto by sampling mode.

Parameters:

  • text (String)

    The text to encode.

  • nbest_size (Integer)

    The number of top-scoring (n-best) segmentation candidates to sample from.

  • alpha (Float)

    The smoothing parameter.

Returns:

  • (String)


# File 'ext/sentencepiece/dummy.rb', line 137

def sample_encode_as_serialized_proto(text, nbest_size:, alpha:); end

#unk_id ⇒ Integer

Returns unknown (<unk>) id.

Returns:

  • (Integer)


# File 'ext/sentencepiece/dummy.rb', line 191

def unk_id(); end