Class: LLaMACpp::ModelQuantizeParams
- Inherits: Object
- Defined in: ext/llama_cpp/dummy.rb
Overview
Class for quantization parameters
Instance Method Summary
-
#allow_quantization ⇒ Boolean
Returns the flag to allow quantizing non-f32/f16 tensors.
-
#allow_quantization=(flag) ⇒ Object
Sets the flag to allow quantizing non-f32/f16 tensors.
-
#ftype ⇒ Integer
Returns the file type of the quantized model.
-
#ftype=(ftype) ⇒ Object
Sets the file type of the quantized model.
-
#keep_split ⇒ Boolean
Returns the flag to quantize to the same number of shards.
-
#keep_split=(flag) ⇒ Object
Sets the flag to quantize to the same number of shards.
-
#n_thread ⇒ Integer
Returns the number of threads.
-
#n_thread=(n_thread) ⇒ Object
Sets the number of threads.
-
#only_copy ⇒ Boolean
Returns the flag to only copy tensors.
-
#only_copy=(flag) ⇒ Object
Sets the flag to only copy tensors.
-
#pure=(flag) ⇒ Object
Sets the flag to disable k-quant mixtures and quantize all tensors to the same type.
-
#pure ⇒ Boolean
Returns the flag to disable k-quant mixtures and quantize all tensors to the same type.
-
#quantize_output_tensor ⇒ Boolean
Returns the flag to quantize output.weight.
-
#quantize_output_tensor=(flag) ⇒ Object
Sets the flag to quantize output.weight.
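As a rough illustration of how the accessors above fit together, the sketch below fills in one parameter object and reads it back. To stay runnable without the native extension it uses a hypothetical stand-in, `QuantizeParamsSketch`, that only mirrors the accessor names listed above. With the gem loaded, an instance of `LLaMACpp::ModelQuantizeParams` would presumably take its place. The `ftype` value `2` is an assumption taken from llama.cpp's `llama_ftype` enum (`LLAMA_FTYPE_MOSTLY_Q4_0`).

```ruby
# Hypothetical stand-in that mirrors the accessors listed above, so the sketch
# runs without the native extension. It is not the gem's implementation.
QuantizeParamsSketch = Struct.new(
  :n_thread, :ftype, :allow_quantization,
  :quantize_output_tensor, :only_copy, :pure, :keep_split
)

params = QuantizeParamsSketch.new
params.n_thread = 4                   # number of threads
params.ftype = 2                      # assumed: LLAMA_FTYPE_MOSTLY_Q4_0 in llama.cpp
params.allow_quantization = false     # do not quantize non-f32/f16 tensors
params.quantize_output_tensor = true  # quantize output.weight as well
params.only_copy = false              # quantize rather than only copy tensors
params.pure = false                   # keep k-quant mixtures enabled
params.keep_split = false             # do not force the same number of shards

# Every setter has a matching getter, so the settings can be read back.
summary = params.to_h
puts summary
```

The object only carries settings. The quantization call that consumes it is outside the scope of this class.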
Instance Method Details
#allow_quantization ⇒ Boolean
Returns the flag to allow quantizing non-f32/f16 tensors.
# File 'ext/llama_cpp/dummy.rb', line 1406
def allow_quantization; end
#allow_quantization=(flag) ⇒ Object
Sets the flag to allow quantizing non-f32/f16 tensors.
# File 'ext/llama_cpp/dummy.rb', line 1402
def allow_quantization=(flag); end
#ftype ⇒ Integer
Returns the file type of the quantized model.
# File 'ext/llama_cpp/dummy.rb', line 1398
def ftype; end
#ftype=(ftype) ⇒ Object
Sets the file type of the quantized model.
# File 'ext/llama_cpp/dummy.rb', line 1394
def ftype=(ftype); end
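Because `ftype` is a plain Integer, a small lookup can make call sites easier to read. The following is a minimal sketch, not part of the gem: `FTYPE_SKETCH` and `ftype_for` are hypothetical names, and the integer values are assumed from llama.cpp's `llama_ftype` enum. If the installed gem exposes its own file type constants, those are the safer choice.

```ruby
# Hypothetical name-to-integer table. Values are assumed from llama.cpp's
# llama_ftype enum and should be checked against the installed version.
FTYPE_SKETCH = {
  f32: 0,   # LLAMA_FTYPE_ALL_F32
  f16: 1,   # LLAMA_FTYPE_MOSTLY_F16
  q4_0: 2,  # LLAMA_FTYPE_MOSTLY_Q4_0
  q4_1: 3,  # LLAMA_FTYPE_MOSTLY_Q4_1
  q8_0: 7   # LLAMA_FTYPE_MOSTLY_Q8_0
}.freeze

# Returns the Integer to pass to ftype=, or raises ArgumentError for an
# unknown name instead of silently passing nil along.
def ftype_for(name)
  FTYPE_SKETCH.fetch(name.to_sym) do
    raise ArgumentError, "unknown ftype: #{name}"
  end
end

puts ftype_for('q4_0')
```

Usage would then look like `params.ftype = ftype_for(:q8_0)`, where `params` is the quantization parameter object.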
#keep_split ⇒ Boolean
Returns the flag to quantize to the same number of shards.
# File 'ext/llama_cpp/dummy.rb', line 1438
def keep_split; end
#keep_split=(flag) ⇒ Object
Sets the flag to quantize to the same number of shards.
# File 'ext/llama_cpp/dummy.rb', line 1434
def keep_split=(flag); end
#n_thread ⇒ Integer
Returns the number of threads.
# File 'ext/llama_cpp/dummy.rb', line 1390
def n_thread; end
#n_thread=(n_thread) ⇒ Object
Sets the number of threads.
# File 'ext/llama_cpp/dummy.rb', line 1386
def n_thread=(n_thread); end
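One way to choose a value for `n_thread=` is to cap the requested count at the number of logical processors reported by Ruby's standard `Etc.nprocessors`. The helper below is a hypothetical sketch, not something the gem provides. Treating `nil` or a non-positive request as "use every processor" is a choice made for this example only.

```ruby
require 'etc'

# Hypothetical helper: picks a thread count for n_thread=.
# requested - desired number of threads, or nil to use all processors
# available - number of logical processors (defaults to Etc.nprocessors)
def quantize_thread_count(requested = nil, available: Etc.nprocessors)
  return available if requested.nil? || requested <= 0

  # Never ask for more threads than there are processors.
  [requested, available].min
end

puts quantize_thread_count(2)
```

The result can be assigned directly, for example `params.n_thread = quantize_thread_count(8)`.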
#only_copy ⇒ Boolean
Returns the flag to only copy tensors.
# File 'ext/llama_cpp/dummy.rb', line 1422
def only_copy; end
#only_copy=(flag) ⇒ Object
Sets the flag to only copy tensors.
# File 'ext/llama_cpp/dummy.rb', line 1418
def only_copy=(flag); end
#pure=(flag) ⇒ Object
Sets the flag to disable k-quant mixtures and quantize all tensors to the same type.
# File 'ext/llama_cpp/dummy.rb', line 1426
def pure=(flag); end
#pure ⇒ Boolean
Returns the flag to disable k-quant mixtures and quantize all tensors to the same type.
# File 'ext/llama_cpp/dummy.rb', line 1430
def pure; end
#quantize_output_tensor ⇒ Boolean
Returns the flag to quantize output.weight.
# File 'ext/llama_cpp/dummy.rb', line 1414
def quantize_output_tensor; end
#quantize_output_tensor=(flag) ⇒ Object
Sets the flag to quantize output.weight.
# File 'ext/llama_cpp/dummy.rb', line 1410
def quantize_output_tensor=(flag); end