Class: LLaMACpp::ModelQuantizeParams

Inherits:

Object


Defined in: ext/llama_cpp/dummy.rb

Overview

Class for model quantization parameters.
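As a rough illustration of how the accessors listed on this page fit together, the sketch below fills in one value per attribute. `QuantizeParamsSketch` is a hypothetical plain-Ruby stand-in so the snippet runs without the native extension; with the gem loaded, the same assignments would presumably be made on a `LLaMACpp::ModelQuantizeParams` instance. The value `2` for `ftype` is an assumption, not something this page specifies.

```ruby
# Hypothetical stand-in that mirrors the accessors documented on this page.
# It is not part of the gem; it only makes the example self-contained.
QuantizeParamsSketch = Struct.new(
  :n_thread, :ftype, :allow_quantization, :quantize_output_tensor,
  :only_copy, :pure, :keep_split
)

params = QuantizeParamsSketch.new
params.n_thread = 4                    # number of threads
params.ftype = 2                       # Integer file type (value assumed for illustration)
params.allow_quantization = false      # allow quantizing non-f32/f16 tensors
params.quantize_output_tensor = true   # quantize output.weight
params.only_copy = false               # only copy tensors
params.pure = false                    # keep k-quant mixtures enabled
params.keep_split = false              # quantize to the same number of shards

puts params.to_h
```

Every setter takes either an Integer or a Boolean, so a parameter object can be built and inspected before any quantization is started.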

Instance Method Summary

#allow_quantization ⇒ Boolean

Returns the flag to allow quantizing non-f32/f16 tensors.
#allow_quantization=(flag) ⇒ Object

Sets the flag to allow quantizing non-f32/f16 tensors.
#ftype ⇒ Integer

Returns the file type of the quantized model.
#ftype=(ftype) ⇒ Object

Sets the file type of the quantized model.
#keep_split ⇒ Boolean

Returns the flag to quantize to the same number of shards.
#keep_split=(flag) ⇒ Object

Sets the flag to quantize to the same number of shards.
#n_thread ⇒ Integer

Returns the number of threads.
#n_thread=(n_thread) ⇒ Object

Sets the number of threads.
#only_copy ⇒ Boolean

Returns the flag to only copy tensors.
#only_copy=(flag) ⇒ Object

Sets the flag to only copy tensors.
#pure=(flag) ⇒ Object

Sets the flag to disable k-quant mixtures and quantize all tensors to the same type.
#pure ⇒ Boolean

Returns the flag to disable k-quant mixtures and quantize all tensors to the same type.
#quantize_output_tensor ⇒ Boolean

Returns the flag to quantize output.weight.
#quantize_output_tensor=(flag) ⇒ Object

Sets the flag to quantize output.weight.

Instance Method Details

#allow_quantization ⇒ `Boolean`

Returns the flag to allow quantizing non-f32/f16 tensors.

Returns:

(Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1406
def allow_quantization; end

#allow_quantization=(flag) ⇒ `Object`

Sets the flag to allow quantizing non-f32/f16 tensors.

Parameters:

flag (Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1402
def allow_quantization=(flag); end

#ftype ⇒ `Integer`

Returns the file type of the quantized model.

Returns:

(Integer)

# File 'ext/llama_cpp/dummy.rb', line 1398
def ftype; end

#ftype=(ftype) ⇒ `Object`

Sets the file type of the quantized model.

Parameters:

ftype (Integer)

# File 'ext/llama_cpp/dummy.rb', line 1394
def ftype=(ftype); end
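Because `ftype` is a bare Integer, a small lookup table can make call sites easier to read. The sketch below is one possible approach: `FTYPE_SKETCH` and `ftype_for` are hypothetical names, and the numeric values are assumed to match the `llama_ftype` enum in llama.cpp, so they should be checked against the constants exposed by the installed gem version.

```ruby
# Hypothetical lookup table. The values are assumed to follow
# llama.cpp's llama_ftype enum and are not taken from this page.
FTYPE_SKETCH = {
  f32: 0,
  f16: 1,
  q4_0: 2,
  q4_1: 3,
  q8_0: 7,
  q5_0: 8,
  q5_1: 9
}.freeze

# Resolves a symbolic name to the Integer expected by #ftype=.
# Raises ArgumentError for names that are not in the table.
def ftype_for(name)
  FTYPE_SKETCH.fetch(name) do
    raise ArgumentError, "unknown file type: #{name.inspect}"
  end
end

puts ftype_for(:q8_0)
```

The result would then be passed to the setter, for example `params.ftype = ftype_for(:q8_0)`.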

#keep_split ⇒ `Boolean`

Returns the flag to quantize to the same number of shards.

Returns:

(Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1438
def keep_split; end

#keep_split=(flag) ⇒ `Object`

Sets the flag to quantize to the same number of shards.

Parameters:

flag (Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1434
def keep_split=(flag); end

#n_thread ⇒ `Integer`

Returns the number of threads.

Returns:

(Integer)

# File 'ext/llama_cpp/dummy.rb', line 1390
def n_thread; end

#n_thread=(n_thread) ⇒ `Object`

Sets the number of threads.

Parameters:

n_thread (Integer)

# File 'ext/llama_cpp/dummy.rb', line 1386
def n_thread=(n_thread); end
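This page does not say how to choose a thread count. One common approach, sketched below under that caveat, is to cap the requested value at the number of logical processors reported by Ruby's standard `Etc` module. `quantize_thread_count` is a hypothetical helper, not part of the gem.

```ruby
require 'etc'

# Hypothetical helper: returns a thread count between 1 and the number
# of logical processors. With no argument it returns the processor count.
def quantize_thread_count(requested = nil)
  available = Etc.nprocessors
  return available if requested.nil?

  Integer(requested).clamp(1, available)
end

puts quantize_thread_count(2)
```

The result is a plain Integer, so it can be assigned directly, for example `params.n_thread = quantize_thread_count(8)`.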

#only_copy ⇒ `Boolean`

Returns the flag to only copy tensors.

Returns:

(Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1422
def only_copy; end

#only_copy=(flag) ⇒ `Object`

Sets the flag to only copy tensors.

Parameters:

flag (Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1418
def only_copy=(flag); end

#pure=(flag) ⇒ `Object`

Sets the flag to disable k-quant mixtures and quantize all tensors to the same type.

Parameters:

flag (Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1426
def pure=(flag); end

#pure ⇒ `Boolean`

Returns the flag to disable k-quant mixtures and quantize all tensors to the same type.

Returns:

(Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1430
def pure; end

#quantize_output_tensor ⇒ `Boolean`

Returns the flag to quantize output.weight.

Returns:

(Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1414
def quantize_output_tensor; end

#quantize_output_tensor=(flag) ⇒ `Object`

Sets the flag to quantize output.weight.

Parameters:

flag (Boolean)

# File 'ext/llama_cpp/dummy.rb', line 1410
def quantize_output_tensor=(flag); end
