Class: LLaMACpp::Context
- Inherits: Object
- Defined in: ext/llama_cpp/dummy.rb
Overview
Class for a llama.cpp context.
Instance Attribute Summary
- #model ⇒ Model (readonly)
  Returns the model.
Instance Method Summary
- #decode(batch) ⇒ NilClass
  Evaluates the tokens in the batch.
- #embeddings ⇒ Array<Float>
  Returns the embeddings.
- #embeddings_ith(i) ⇒ Array<Float>
  Returns the embeddings for the i-th token.
- #embeddings_seq(seq_id) ⇒ Array<Float>
  Returns the pooled embeddings for the given sequence id.
- #encode(batch) ⇒ NilClass
  Processes a batch of tokens with the encoder part of an encoder-decoder model.
- #grammar_accept_token(grammar:, token:) ⇒ Nil
  Accepts the sampled token into the grammar.
- #initialize(model:, params:) ⇒ Context (constructor)
  Creates a context.
- #kv_cache_clear ⇒ NilClass
  Clears the KV cache.
- #kv_cache_defrag ⇒ NilClass
  Defragments the KV cache.
- #kv_cache_seq_add(seq_id, p0, p1, delta) ⇒ NilClass
  Adds the relative position `delta` to all tokens that belong to the specified sequence and have positions in [p0, p1).
- #kv_cache_seq_cp(seq_id_src, seq_id_dst, p0, p1) ⇒ NilClass
  Copies all tokens that belong to the specified sequence and have positions in [p0, p1) to another sequence.
- #kv_cache_seq_div(seq_id, p0, p1, d) ⇒ NilClass
  Integer-divides the positions of all tokens that belong to the specified sequence and have positions in [p0, p1) by a factor of `d`, where `d > 1`.
- #kv_cache_seq_keep(seq_id) ⇒ Object
  Removes all tokens that do not belong to the specified sequence.
- #kv_cache_seq_pos_max(seq_id) ⇒ Integer
  Returns the maximum position present in the KV cache for the specified sequence.
- #kv_cache_seq_rm(seq_id, p0, p1) ⇒ NilClass
  Removes all tokens that belong to the specified sequence and have positions in [p0, p1).
- #kv_cache_token_count ⇒ Integer
  Returns the number of tokens in the KV cache.
- #kv_cache_update ⇒ NilClass
  Applies pending KV cache updates (such as K-shifts and defragmentation).
- #load_session_file(session_path:) ⇒ Array<Integer>
  Loads the context state from a session file and returns the session tokens stored in it.
- #logits ⇒ Array<Float>
  Returns the logits.
- #n_batch ⇒ Integer
  Returns the logical maximum batch size.
- #n_ctx ⇒ Integer
  Returns the size of the text context, in tokens.
- #n_seq_max ⇒ Integer
  Returns the maximum number of sequences.
- #n_threads ⇒ Integer
  Returns the number of threads used for generation.
- #n_threads_batch ⇒ Integer
  Returns the number of threads used for batch processing.
- #n_ubatch ⇒ Integer
  Returns the physical maximum batch size.
- #pooling_type ⇒ Integer
  Returns the pooling type.
- #print_timings ⇒ NilClass
  Prints timings.
- #reset_timings ⇒ NilClass
  Resets timings.
- #sample_apply_guidance(logits:, logits_guidance:, scale:) ⇒ Object
  Applies classifier-free guidance to the logits.
- #sample_entropy(candidates, min_temp:, max_temp:, exponent_val:) ⇒ Nil
  Applies dynamic temperature, choosing a temperature between min_temp and max_temp according to the entropy of the candidates.
- #sample_grammar(candidates, grammar:) ⇒ Nil
  Applies constraints from the grammar to the candidates.
- #sample_min_p(candidates, prob:, min_keep: 1) ⇒ Nil
  Min-p sampling: removes candidates whose probability is below `prob` times that of the most likely candidate.
- #sample_repetition_penalties(candidates, last_n_tokens, penalty_repeat:, penalty_freq:, penalty_present:) ⇒ Nil
  Applies repetition, frequency, and presence penalties to candidates that appear in last_n_tokens.
- #sample_softmax(candidates) ⇒ Nil
  Sorts the candidates by their logits in descending order and computes their probabilities.
- #sample_tail_free(candidates, z:, min_keep: 1) ⇒ Nil
  Tail free sampling.
- #sample_temp(candidates, temp:) ⇒ Nil
  Applies temperature scaling to the candidates.
- #sample_token(candidates) ⇒ Integer
  Returns a token randomly selected from the candidates based on their probabilities.
- #sample_token_greedy(candidates) ⇒ Integer
  Returns the token with the highest probability.
- #sample_token_mirostat(candidates, tau:, eta:, m:, mu:) ⇒ Array<Integer, Float>
  Samples a token with the Mirostat 1.0 algorithm and returns it together with the updated mu.
- #sample_token_mirostat_v2(candidates, tau:, eta:, mu:) ⇒ Array<Integer, Float>
  Samples a token with the Mirostat 2.0 algorithm and returns it together with the updated mu.
- #sample_top_k(candidates, k:, min_keep: 1) ⇒ Nil
  Top-K sampling.
- #sample_top_p(candidates, prob:, min_keep: 1) ⇒ Nil
  Nucleus (top-p) sampling.
- #sample_typical(candidates, prob:, min_keep: 1) ⇒ Nil
  Locally typical sampling.
- #save_session_file(session_path:, session_tokens:) ⇒ Nil
  Saves the context state and the given session tokens to a session file.
- #set_causal_attn(causal_attn) ⇒ NilClass
  Sets whether to use causal attention.
- #set_embeddings(embd) ⇒ NilClass
  Sets whether the context is in embeddings mode.
- #set_n_threads(n_threads:, n_threads_batch:) ⇒ NilClass
  Sets the number of threads used for generation (n_threads) and for batch processing (n_threads_batch).
- #set_rng_seed(seed) ⇒ Object
  Sets the seed of the random number generator.
- #synchronize ⇒ NilClass
  Waits until all computations are finished.
- #timings ⇒ Timings
  Returns the timing information.
Constructor Details
#initialize(model:, params:) ⇒ Context
Creates a context.
# File 'ext/llama_cpp/dummy.rb', line 842
def initialize(model:, params:); end
Instance Attribute Details
#model ⇒ Model (readonly)
Returns the model.
# File 'ext/llama_cpp/dummy.rb', line 836
def model
  @model
end
Instance Method Details
#decode(batch) ⇒ NilClass
Evaluates the tokens in the batch.
# File 'ext/llama_cpp/dummy.rb', line 854
def decode(batch); end
#embeddings ⇒ Array<Float>
Returns the embeddings.
# File 'ext/llama_cpp/dummy.rb', line 864
def embeddings; end
#embeddings_ith(i) ⇒ Array<Float>
Returns the embeddings for the i-th token.
# File 'ext/llama_cpp/dummy.rb', line 870
def embeddings_ith(i); end
#embeddings_seq(seq_id) ⇒ Array<Float>
Returns the pooled embeddings for the given sequence id.
# File 'ext/llama_cpp/dummy.rb', line 876
def embeddings_seq(seq_id); end
#encode(batch) ⇒ NilClass
Processes a batch of tokens with the encoder part of an encoder-decoder model.
# File 'ext/llama_cpp/dummy.rb', line 848
def encode(batch); end
#grammar_accept_token(grammar:, token:) ⇒ Nil
Accepts the sampled token into the grammar.
# File 'ext/llama_cpp/dummy.rb', line 1150
def grammar_accept_token(grammar:, token:); end
#kv_cache_clear ⇒ NilClass
Clears the KV cache.
# File 'ext/llama_cpp/dummy.rb', line 944
def kv_cache_clear(); end
#kv_cache_defrag ⇒ NilClass
Defragments the KV cache.
# File 'ext/llama_cpp/dummy.rb', line 992
def kv_cache_defrag(); end
#kv_cache_seq_add(seq_id, p0, p1, delta) ⇒ NilClass
Adds the relative position `delta` to all tokens that belong to the specified sequence and have positions in [p0, p1).
# File 'ext/llama_cpp/dummy.rb', line 972
def kv_cache_seq_add(seq_id, p0, p1, delta); end
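To make the half-open range concrete, the sketch below models the cache as a plain Ruby array of cells and shifts positions the way this method is described. It is an illustration only: the `KvAddSketch` module and its `{ seq:, pos: }` cell hashes are hypothetical, each cell is assumed to belong to exactly one sequence (real llama.cpp cells can be shared), and handling of negative `p0`/`p1` is left out. The scenario is a typical "context shift": positions 2 to 4 were discarded earlier, so positions 5 to 7 move back by 3.

```ruby
# Hypothetical toy model of KV-cache cells for illustration; not part of llama_cpp.rb.
# Each cell records the sequence it belongs to and its position.
module KvAddSketch
  module_function

  # Adds delta to the position of every cell of seq_id whose position is in [p0, p1).
  # Negative p0/p1 handling is omitted for brevity.
  def seq_add(cells, seq_id, p0, p1, delta)
    cells.map do |cell|
      if cell[:seq] == seq_id && cell[:pos] >= p0 && cell[:pos] < p1
        cell.merge(pos: cell[:pos] + delta)
      else
        cell
      end
    end
  end
end

# Positions 2..4 of sequence 0 were discarded earlier; shift 5..7 back by 3.
cells = [0, 1, 5, 6, 7].map { |pos| { seq: 0, pos: pos } }
shifted = KvAddSketch.seq_add(cells, 0, 5, 8, -3)
p shifted.map { |cell| cell[:pos] } # => [0, 1, 2, 3, 4]
```

On a real context, the equivalent call with the argument order documented above would presumably be `context.kv_cache_seq_add(0, 5, 8, -3)`, where `context` is a hypothetical Context instance.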
#kv_cache_seq_cp(seq_id_src, seq_id_dst, p0, p1) ⇒ NilClass
Copies all tokens that belong to the specified sequence and have positions in [p0, p1) to another sequence.
# File 'ext/llama_cpp/dummy.rb', line 961
def kv_cache_seq_cp(seq_id_src, seq_id_dst, p0, p1); end
#kv_cache_seq_div(seq_id, p0, p1, d) ⇒ NilClass
Integer-divides the positions of all tokens that belong to the specified sequence and have positions in [p0, p1) by a factor of `d`, where `d > 1`.
# File 'ext/llama_cpp/dummy.rb', line 981
def kv_cache_seq_div(seq_id, p0, p1, d); end
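A toy illustration of what integer division does to the positions in [p0, p1): several tokens end up sharing a position, which is the kind of position compression that group-attention ("Self-Extend") schemes rely on. `KvDivSketch` and its cell hashes are hypothetical helpers, not part of llama_cpp.rb, and each cell is assumed to belong to a single sequence.

```ruby
# Hypothetical toy model of KV-cache cells for illustration; not part of llama_cpp.rb.
module KvDivSketch
  module_function

  # Integer-divides the position of every cell of seq_id whose position is in [p0, p1) by d.
  def seq_div(cells, seq_id, p0, p1, d)
    raise ArgumentError, "d must be greater than 1" unless d > 1

    cells.map do |cell|
      if cell[:seq] == seq_id && cell[:pos] >= p0 && cell[:pos] < p1
        # Integer#/ floors, which equals truncation for non-negative positions.
        cell.merge(pos: cell[:pos] / d)
      else
        cell
      end
    end
  end
end

cells = (0...8).map { |pos| { seq: 0, pos: pos } }
p KvDivSketch.seq_div(cells, 0, 4, 8, 2).map { |cell| cell[:pos] }
# => [0, 1, 2, 3, 2, 2, 3, 3]
```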
#kv_cache_seq_keep(seq_id) ⇒ Object
Removes all tokens that do not belong to the specified sequence.
# File 'ext/llama_cpp/dummy.rb', line 963
def kv_cache_seq_keep(seq_id); end
#kv_cache_seq_pos_max(seq_id) ⇒ Integer
Returns the maximum position present in the KV cache for the specified sequence.
# File 'ext/llama_cpp/dummy.rb', line 987
def kv_cache_seq_pos_max(seq_id); end
#kv_cache_seq_rm(seq_id, p0, p1) ⇒ NilClass
Removes all tokens that belong to the specified sequence and have positions in [p0, p1).
# File 'ext/llama_cpp/dummy.rb', line 952
def kv_cache_seq_rm(seq_id, p0, p1); end
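The sketch below shows the [p0, p1) semantics on a toy cache. The `KvRmSketch` module and its cell hashes are hypothetical. Treating a negative `p0` as 0 and a negative `p1` as "no upper bound" follows my reading of the llama.cpp header, and whether the binding forwards such values unchanged is an assumption.

```ruby
# Hypothetical toy model of KV-cache cells for illustration; not part of llama_cpp.rb.
module KvRmSketch
  module_function

  # Removes every cell of seq_id whose position is in [p0, p1).
  # Assumed convention: negative p0 means 0, negative p1 means "no upper bound".
  def seq_rm(cells, seq_id, p0, p1)
    p0 = 0 if p0.negative?
    p1 = Float::INFINITY if p1.negative?
    cells.reject { |c| c[:seq] == seq_id && c[:pos] >= p0 && c[:pos] < p1 }
  end
end

cells = (0...6).map { |pos| { seq: 0, pos: pos } } + [{ seq: 1, pos: 0 }]
kept = KvRmSketch.seq_rm(cells, 0, 2, 4)
p kept.map { |c| [c[:seq], c[:pos]] }
# => [[0, 0], [0, 1], [0, 4], [0, 5], [1, 0]]
```

Under that assumption, `context.kv_cache_seq_rm(0, n_keep, -1)` would drop every token of sequence 0 from position `n_keep` onward (`context` and `n_keep` being hypothetical variables).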
#kv_cache_token_count ⇒ Integer
Returns the number of tokens in the KV cache.
# File 'ext/llama_cpp/dummy.rb', line 939
def kv_cache_token_count; end
#kv_cache_update ⇒ NilClass
Applies pending KV cache updates (such as K-shifts and defragmentation).
# File 'ext/llama_cpp/dummy.rb', line 997
def kv_cache_update(); end
#load_session_file(session_path:) ⇒ Array<Integer>
Loads the context state from a session file and returns the session tokens stored in it.
# File 'ext/llama_cpp/dummy.rb', line 1019
def load_session_file(session_path:); end
#logits ⇒ Array<Float>
Returns the logits.
# File 'ext/llama_cpp/dummy.rb', line 859
def logits(); end
#n_batch ⇒ Integer
Returns the logical maximum batch size.
# File 'ext/llama_cpp/dummy.rb', line 899
def n_batch; end
#n_ctx ⇒ Integer
Returns the size of the text context, in tokens.
# File 'ext/llama_cpp/dummy.rb', line 894
def n_ctx; end
#n_seq_max ⇒ Integer
Returns the maximum number of sequences.
# File 'ext/llama_cpp/dummy.rb', line 909
def n_seq_max; end
#n_threads ⇒ Integer
Returns the number of threads used for generation.
# File 'ext/llama_cpp/dummy.rb', line 914
def n_threads; end
#n_threads_batch ⇒ Integer
Returns the number of threads for batch processing.
# File 'ext/llama_cpp/dummy.rb', line 919
def n_threads_batch; end
#n_ubatch ⇒ Integer
Returns the physical maximum batch size.
# File 'ext/llama_cpp/dummy.rb', line 904
def n_ubatch; end
#pooling_type ⇒ Integer
Returns the pooling type.
# File 'ext/llama_cpp/dummy.rb', line 1155
def pooling_type(); end
#print_timings ⇒ NilClass
Prints timings.
# File 'ext/llama_cpp/dummy.rb', line 929
def print_timings; end
#reset_timings ⇒ NilClass
Resets timings.
# File 'ext/llama_cpp/dummy.rb', line 934
def reset_timings; end
#sample_apply_guidance(logits:, logits_guidance:, scale:) ⇒ Object
Applies classifier-free guidance to the logits.
# File 'ext/llama_cpp/dummy.rb', line 1043
def sample_apply_guidance(logits:, logits_guidance:, scale:); end
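Classifier-free guidance mixes two sets of logits: `logits` from the normal (conditioned) prompt and `logits_guidance` from a guidance prompt, often a negative prompt. The pure-Ruby sketch below applies the common formula `g + scale * (l - g)` to log-softmax-normalized inputs. That llama.cpp normalizes both inputs this way first is my understanding rather than something this page states, and `GuidanceSketch` is a hypothetical helper. With `scale = 1.0` the conditioned log-probabilities come back unchanged; larger values push the distribution further away from the guidance prompt.

```ruby
# Hypothetical helper illustrating classifier-free guidance; not part of llama_cpp.rb.
module GuidanceSketch
  module_function

  # Numerically stable log-softmax.
  def log_softmax(logits)
    max = logits.max
    log_sum = Math.log(logits.sum { |l| Math.exp(l - max) })
    logits.map { |l| l - max - log_sum }
  end

  # Extrapolates from the guidance distribution toward the conditioned one:
  # scale = 1.0 returns the conditioned log-probabilities, scale > 1.0 goes further.
  def apply_guidance(logits, logits_guidance, scale)
    cond = log_softmax(logits)
    guide = log_softmax(logits_guidance)
    cond.zip(guide).map { |c, g| g + scale * (c - g) }
  end
end

logits = [2.0, 1.0, 0.1]
guidance = [1.0, 2.0, 0.1]
p GuidanceSketch.apply_guidance(logits, guidance, 1.5)
```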
#sample_entropy(candidates, min_temp:, max_temp:, exponent_val:) ⇒ Nil
Applies dynamic temperature, choosing a temperature between min_temp and max_temp according to the entropy of the candidates.
# File 'ext/llama_cpp/dummy.rb', line 1098
def sample_entropy(candidates, min_temp:, max_temp:, exponent_val:); end
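Dynamic temperature picks the temperature per step from the entropy of the candidate distribution: a confident (low-entropy) distribution gets a temperature near `min_temp`, a flat one gets a temperature near `max_temp`, and `exponent_val` shapes the curve between them. The sketch below follows that description in plain Ruby; `EntropySketch` is hypothetical, and details such as llama.cpp's exact handling of a single candidate may differ.

```ruby
# Hypothetical helper illustrating entropy-based dynamic temperature; not part of llama_cpp.rb.
module EntropySketch
  module_function

  def softmax(logits)
    max = logits.max
    exps = logits.map { |l| Math.exp(l - max) }
    sum = exps.sum
    exps.map { |e| e / sum }
  end

  # Returns [temperature, scaled_logits].
  def dynamic_temperature(logits, min_temp:, max_temp:, exponent_val:)
    return [min_temp, logits.dup] if logits.size <= 1

    probs = softmax(logits)
    entropy = -probs.sum { |p| p.positive? ? p * Math.log(p) : 0.0 }
    normalized = entropy / Math.log(logits.size) # 0.0 (one-hot) .. 1.0 (uniform)
    temp = min_temp + (max_temp - min_temp) * normalized**exponent_val
    [temp, logits.map { |l| l / temp }]
  end
end

temp, = EntropySketch.dynamic_temperature([10.0, 0.0, 0.0, 0.0],
                                          min_temp: 0.5, max_temp: 1.5, exponent_val: 1.0)
p temp # close to min_temp, because one token dominates
```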
#sample_grammar(candidates, grammar:) ⇒ Nil
Applies constraints from the grammar to the candidates.
# File 'ext/llama_cpp/dummy.rb', line 1143
def sample_grammar(candidates, grammar:); end
#sample_min_p(candidates, prob:, min_keep: 1) ⇒ Nil
Min-p sampling: removes candidates whose probability is below `prob` times that of the most likely candidate.
# File 'ext/llama_cpp/dummy.rb', line 1073
def sample_min_p(candidates, prob:, min_keep: 1); end
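A plain-Ruby sketch of min-p filtering over `[token_id, logit]` pairs. `MinPSketch` is a hypothetical helper; it only illustrates the filtering rule described above and does not operate on llama_cpp.rb candidate objects.

```ruby
# Hypothetical helper illustrating min-p filtering; not part of llama_cpp.rb.
module MinPSketch
  module_function

  def softmax(logits)
    max = logits.max
    exps = logits.map { |l| Math.exp(l - max) }
    sum = exps.sum
    exps.map { |e| e / sum }
  end

  # Keeps candidates whose probability is at least prob * (top probability),
  # never fewer than min_keep. Returns [token_id, logit] pairs, most likely first.
  def min_p(candidates, prob:, min_keep: 1)
    ranked = candidates.zip(softmax(candidates.map(&:last))).sort_by { |_, p| -p }
    threshold = prob * ranked.first[1]
    ranked.select.with_index { |(_, p), i| p >= threshold || i < min_keep }.map(&:first)
  end
end

candidates = [[0, 2.0], [1, 1.5], [2, 0.0], [3, -3.0]]
p MinPSketch.min_p(candidates, prob: 0.1).map(&:first) # => [0, 1, 2]
```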
#sample_repetition_penalties(candidates, last_n_tokens, penalty_repeat:, penalty_freq:, penalty_present:) ⇒ Nil
Applies repetition, frequency, and presence penalties to candidates that appear in last_n_tokens.
# File 'ext/llama_cpp/dummy.rb', line 1036
def sample_repetition_penalties(candidates, last_n_tokens, penalty_repeat:, penalty_freq:, penalty_present:); end
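The arithmetic below reflects my understanding of how llama.cpp applies the three penalties to each token that occurs in `last_n_tokens`: the repeat penalty divides positive logits and multiplies negative ones, then the frequency penalty is subtracted once per occurrence and the presence penalty once. `RepetitionPenaltySketch` and its Hash-of-logits input are hypothetical.

```ruby
# Hypothetical helper illustrating repetition/frequency/presence penalties; not part of llama_cpp.rb.
module RepetitionPenaltySketch
  module_function

  # logits: Hash of token_id => logit; last_n_tokens: recently generated token ids.
  def apply(logits, last_n_tokens, penalty_repeat:, penalty_freq:, penalty_present:)
    counts = last_n_tokens.tally
    logits.to_h do |id, logit|
      count = counts.fetch(id, 0)
      next [id, logit] if count.zero? # tokens absent from the window are untouched

      # With penalty_repeat > 1 this lowers the logit whatever its sign.
      logit = logit <= 0 ? logit * penalty_repeat : logit / penalty_repeat
      # Frequency penalty per occurrence, plus a one-off presence penalty.
      [id, logit - (count * penalty_freq + penalty_present)]
    end
  end
end

penalized = RepetitionPenaltySketch.apply({ 1 => 2.0, 2 => -1.0, 3 => 0.5 }, [1, 1, 2],
                                          penalty_repeat: 2.0, penalty_freq: 0.1, penalty_present: 0.5)
p penalized # token 1 is about 0.3, token 2 about -2.6, token 3 stays 0.5
```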
#sample_softmax(candidates) ⇒ Nil
Sorts the candidates by their logits in descending order and computes their probabilities.
# File 'ext/llama_cpp/dummy.rb', line 1049
def sample_softmax(candidates); end
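The sketch below shows the two steps this method is described as performing: sorting by logit and converting logits into probabilities with a numerically stable softmax. `SoftmaxSketch` and its `[token_id, logit]` input format are hypothetical, not the binding's candidate type.

```ruby
# Hypothetical helper illustrating softmax over candidates; not part of llama_cpp.rb.
module SoftmaxSketch
  module_function

  # candidates: [token_id, logit] pairs. Returns [token_id, logit, prob] triples
  # sorted by logit in descending order.
  def softmax(candidates)
    sorted = candidates.sort_by { |_, logit| -logit }
    max = sorted.first[1]
    exps = sorted.map { |_, logit| Math.exp(logit - max) } # subtracting the max avoids overflow
    sum = exps.sum
    sorted.zip(exps).map { |(id, logit), e| [id, logit, e / sum] }
  end
end

result = SoftmaxSketch.softmax([[0, 0.0], [1, 2.0], [2, 1.0]])
p result.map(&:first) # => [1, 2, 0]
```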
#sample_tail_free(candidates, z:, min_keep: 1) ⇒ Nil
Tail free sampling.
# File 'ext/llama_cpp/dummy.rb', line 1081
def sample_tail_free(candidates, z:, min_keep: 1); end
#sample_temp(candidates, temp:) ⇒ Nil
Applies temperature scaling to the candidates.
# File 'ext/llama_cpp/dummy.rb', line 1105
def sample_temp(candidates, temp:); end
#sample_token(candidates) ⇒ Integer
Returns a token randomly selected from the candidates based on their probabilities.
# File 'ext/llama_cpp/dummy.rb', line 1136
def sample_token(candidates); end
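Selecting a token "based on their probabilities" amounts to drawing from a categorical distribution. The sketch below uses inverse-CDF sampling with Ruby's `Random`; `SampleTokenSketch` is hypothetical, and llama.cpp uses its own random number generator (the one `#set_rng_seed` seeds), so actual draws will differ. In both cases, seeding the generator is what makes a run reproducible.

```ruby
# Hypothetical helper illustrating categorical sampling; not part of llama_cpp.rb.
module SampleTokenSketch
  module_function

  # candidates: [token_id, probability] pairs.
  def sample_token(candidates, rng: Random.new)
    # Scale by the total so slightly unnormalized probabilities still work.
    r = rng.rand * candidates.sum { |_, p| p }
    cumulative = 0.0
    candidates.each do |id, p|
      cumulative += p
      return id if r < cumulative
    end
    candidates.last.first # fallback for floating-point round-off
  end
end

rng = Random.new(42) # a fixed seed makes the draws reproducible
draws = Array.new(10_000) { SampleTokenSketch.sample_token([[7, 0.8], [9, 0.2]], rng: rng) }
p draws.tally # roughly 8,000 sevens and 2,000 nines
```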
#sample_token_greedy(candidates) ⇒ Integer
Returns the token with the highest probability.
# File 'ext/llama_cpp/dummy.rb', line 1130
def sample_token_greedy(candidates); end
#sample_token_mirostat(candidates, tau:, eta:, m:, mu:) ⇒ Array<Integer, Float>
Samples a token with the Mirostat 1.0 algorithm and returns it together with the updated mu.
# File 'ext/llama_cpp/dummy.rb', line 1115
def sample_token_mirostat(candidates, tau:, eta:, m:, mu:); end
#sample_token_mirostat_v2(candidates, tau:, eta:, mu:) ⇒ Array<Integer, Float>
Samples a token with the Mirostat 2.0 algorithm and returns it together with the updated mu.
# File 'ext/llama_cpp/dummy.rb', line 1124
def sample_token_mirostat_v2(candidates, tau:, eta:, mu:); end
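Mirostat 2.0 keeps a running threshold `mu` on token surprise (-log2 p): it drops candidates more surprising than `mu`, samples from the rest, and then moves `mu` by `eta` times the gap between the observed surprise and the target `tau`. That feedback step is presumably why the method returns an `Array<Integer, Float>`. The sketch below is a plain-Ruby illustration of the loop; `MirostatV2Sketch` is hypothetical, and starting from `mu = 2 * tau` is a common convention rather than something this page specifies.

```ruby
# Hypothetical helper illustrating Mirostat 2.0; not part of llama_cpp.rb.
module MirostatV2Sketch
  module_function

  def softmax(logits)
    max = logits.max
    exps = logits.map { |l| Math.exp(l - max) }
    sum = exps.sum
    exps.map { |e| e / sum }
  end

  # candidates: [token_id, logit] pairs. Returns [token_id, updated_mu].
  def sample(candidates, tau:, eta:, mu:, rng: Random.new)
    ranked = candidates.map(&:first).zip(softmax(candidates.map(&:last))).sort_by { |_, p| -p }
    # Truncate: drop tokens whose surprise exceeds mu, always keeping the top token.
    kept = ranked.select.with_index { |(_, p), i| i.zero? || -Math.log2(p) <= mu }
    total = kept.sum { |_, p| p }
    kept = kept.map { |id, p| [id, p / total] } # renormalize the surviving tokens
    # Draw from the truncated distribution (inverse-CDF sampling).
    r = rng.rand
    cumulative = 0.0
    token, prob = kept.find { |_, q| (cumulative += q) > r } || kept.last
    # Feedback: nudge mu so the observed surprise tracks the target tau.
    observed_surprise = -Math.log2(prob)
    [token, mu - eta * (observed_surprise - tau)]
  end
end

candidates = [[0, 1.0], [1, 3.0], [2, 2.0]]
tau = 5.0
mu = 2 * tau
token, mu = MirostatV2Sketch.sample(candidates, tau: tau, eta: 0.1, mu: mu, rng: Random.new(1))
p [token, mu]
```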
#sample_top_k(candidates, k:, min_keep: 1) ⇒ Nil
Top-K sampling.
# File 'ext/llama_cpp/dummy.rb', line 1057
def sample_top_k(candidates, k:, min_keep: 1); end
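A plain-Ruby sketch of top-k filtering over `[token_id, logit]` pairs. `TopKSketch` is hypothetical; treating a non-positive `k` as "keep everything" and raising `k` to at least `min_keep` mirrors my understanding of llama.cpp's behavior and is an assumption here.

```ruby
# Hypothetical helper illustrating top-k filtering; not part of llama_cpp.rb.
module TopKSketch
  module_function

  # Keeps the k highest-logit candidates, most likely first.
  def top_k(candidates, k:, min_keep: 1)
    k = candidates.size if k <= 0          # assumed: non-positive k keeps everything
    k = [[k, min_keep].max, candidates.size].min
    candidates.max_by(k) { |_, logit| logit } # max_by(n) returns descending order
  end
end

candidates = [[0, 0.1], [1, 2.0], [2, 1.0], [3, -1.0]]
p TopKSketch.top_k(candidates, k: 2).map(&:first) # => [1, 2]
```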
#sample_top_p(candidates, prob:, min_keep: 1) ⇒ Nil
Nucleus (top-p) sampling.
# File 'ext/llama_cpp/dummy.rb', line 1065
def sample_top_p(candidates, prob:, min_keep: 1); end
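Nucleus sampling keeps the smallest set of most probable tokens whose cumulative probability reaches `prob`, but never fewer than `min_keep`. `TopPSketch` below is a hypothetical plain-Ruby illustration over `[token_id, logit]` pairs.

```ruby
# Hypothetical helper illustrating nucleus (top-p) filtering; not part of llama_cpp.rb.
module TopPSketch
  module_function

  def softmax(logits)
    max = logits.max
    exps = logits.map { |l| Math.exp(l - max) }
    sum = exps.sum
    exps.map { |e| e / sum }
  end

  def top_p(candidates, prob:, min_keep: 1)
    ranked = candidates.zip(softmax(candidates.map(&:last))).sort_by { |_, p| -p }
    kept = []
    cumulative = 0.0
    ranked.each do |candidate, p|
      kept << candidate
      cumulative += p
      break if cumulative >= prob && kept.size >= min_keep
    end
    kept
  end
end

candidates = [[0, 2.0], [1, 1.5], [2, 0.0], [3, -3.0]]
p TopPSketch.top_p(candidates, prob: 0.9).map(&:first) # => [0, 1]
```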
#sample_typical(candidates, prob:, min_keep: 1) ⇒ Nil
Locally typical sampling.
# File 'ext/llama_cpp/dummy.rb', line 1089
def sample_typical(candidates, prob:, min_keep: 1); end
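Locally typical sampling ranks tokens by how close their surprise (-ln p) is to the entropy of the distribution, then keeps the most typical tokens until their cumulative probability exceeds `prob`. The sketch below illustrates that rule; `TypicalSketch` is a hypothetical helper, and the strict "exceeds" comparison is my reading of llama.cpp rather than something documented here. Note that the most probable token is not necessarily kept.

```ruby
# Hypothetical helper illustrating locally typical filtering; not part of llama_cpp.rb.
module TypicalSketch
  module_function

  def softmax(logits)
    max = logits.max
    exps = logits.map { |l| Math.exp(l - max) }
    sum = exps.sum
    exps.map { |e| e / sum }
  end

  def typical(candidates, prob:, min_keep: 1)
    probs = softmax(candidates.map(&:last))
    entropy = -probs.sum { |p| p.positive? ? p * Math.log(p) : 0.0 }
    # Most "typical" first: surprise closest to the entropy.
    ranked = candidates.zip(probs).sort_by { |_, p| (-Math.log(p) - entropy).abs }
    kept = []
    cumulative = 0.0
    ranked.each do |candidate, p|
      kept << candidate
      cumulative += p
      break if cumulative > prob && kept.size >= min_keep
    end
    kept
  end
end

candidates = [[0, 2.0], [1, 1.5], [2, 0.0], [3, -3.0]]
p TypicalSketch.typical(candidates, prob: 0.3).map(&:first) # => [1]
```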
#save_session_file(session_path:, session_tokens:) ⇒ Nil
Saves the context state and the given session tokens to a session file.
# File 'ext/llama_cpp/dummy.rb', line 1026
def save_session_file(session_path:, session_tokens:); end
#set_causal_attn(causal_attn) ⇒ NilClass
Sets whether to use causal attention.
# File 'ext/llama_cpp/dummy.rb', line 1008
def set_causal_attn(causal_attn); end
#set_embeddings(embd) ⇒ NilClass
Sets whether the context is in embeddings mode.
# File 'ext/llama_cpp/dummy.rb', line 882
def set_embeddings(embd); end
#set_n_threads(n_threads:, n_threads_batch:) ⇒ NilClass
Sets the number of threads used for generation (n_threads) and for batch processing (n_threads_batch).
# File 'ext/llama_cpp/dummy.rb', line 889
def set_n_threads(n_threads:, n_threads_batch:); end
#set_rng_seed(seed) ⇒ Object
Sets the seed of the random number generator.
# File 'ext/llama_cpp/dummy.rb', line 1002
def set_rng_seed(seed); end
#synchronize ⇒ NilClass
Waits until all computations are finished.
# File 'ext/llama_cpp/dummy.rb', line 1013
def synchronize(); end
#timings ⇒ Timings
Returns the timing information.
# File 'ext/llama_cpp/dummy.rb', line 924
def timings; end