quantization_config Module¶
ThresholdSelectionMethod¶
Enum to select a method for quantization threshold selection:
- class model_compression_toolkit.ThresholdSelectionMethod(value)¶
Method for quantization threshold selection:
NOCLIPPING - Use min/max values as thresholds.
MSE - Use mean squared error for minimizing quantization noise.
MAE - Use mean absolute error for minimizing quantization noise.
KL - Use KL-divergence to make the quantized signal's distribution as similar as possible to the original.
Lp - Use the Lp-norm for minimizing quantization noise.
QuantizationConfig¶
Class to configure the quantization process of the model:
- class model_compression_toolkit.QuantizationConfig(activation_threshold_method=ThresholdSelectionMethod.MSE, weights_threshold_method=ThresholdSelectionMethod.MSE, activation_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM, weights_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM, activation_n_bits=8, weights_n_bits=8, relu_unbound_correction=False, weights_bias_correction=True, weights_per_channel_threshold=True, input_scaling=False, enable_weights_quantization=True, enable_activation_quantization=True, shift_negative_activation_correction=False, activation_channel_equalization=False, z_threshold=math.inf, min_threshold=MIN_THRESHOLD, l_p_value=2, shift_negative_ratio=0.25, shift_negative_threshold_recalculation=False)¶
Class that wraps all the different parameters according to which the library quantizes the input model.
- Parameters
activation_threshold_method (ThresholdSelectionMethod) – Which method to use from ThresholdSelectionMethod for activation quantization threshold selection.
weights_threshold_method (ThresholdSelectionMethod) – Which method to use from ThresholdSelectionMethod for weights quantization threshold selection.
activation_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for activation quantization.
weights_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for weights quantization.
activation_n_bits (int) – Number of bits to quantize the activations.
weights_n_bits (int) – Number of bits to quantize the coefficients.
relu_unbound_correction (bool) – Whether to use relu unbound scaling correction or not.
weights_bias_correction (bool) – Whether to use weights bias correction or not.
weights_per_channel_threshold (bool) – Whether to quantize the weights per-channel or not (per-tensor).
input_scaling (bool) – Whether to use input scaling or not.
enable_weights_quantization (bool) – Whether to quantize the model weights or not.
enable_activation_quantization (bool) – Whether to quantize the model activations or not.
shift_negative_activation_correction (bool) – Whether to use shifting negative activation correction or not.
activation_channel_equalization (bool) – Whether to use activation channel equalization correction or not.
z_threshold (float) – Value of z score for outliers removal.
min_threshold (float) – Minimum threshold to use during thresholds selection.
l_p_value (int) – The p value of L_p norm threshold selection.
shift_negative_ratio (float) – Ratio between the minimal negative value of a non-linearity's output and its activation threshold; above this ratio, negative-activation shifting is applied (if enabled).
shift_negative_threshold_recalculation (bool) – Whether or not to recompute the threshold after shifting negative activation.
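To illustrate the role of z_threshold (again a simplified sketch, not the library's code; the `remove_outliers` helper is hypothetical), values whose z-score exceeds the threshold are discarded before threshold selection, so a single extreme activation cannot inflate the quantization range:

```python
# Illustrative sketch only -- not model_compression_toolkit's code.
import numpy as np

def remove_outliers(x, z_threshold=3.0):
    # Drop values lying more than z_threshold standard deviations
    # from the mean before selecting a quantization threshold.
    mu, sigma = np.mean(x), np.std(x)
    return x[np.abs(x - mu) / sigma <= z_threshold]

rng = np.random.default_rng(1)
acts = rng.normal(size=10_000)
acts[0] = 100.0  # one extreme activation
filtered = remove_outliers(acts, z_threshold=3.0)
# The max-abs (NOCLIPPING) threshold shrinks from ~100 to a few units.
```

The default z_threshold of math.inf disables this filtering, so no values are removed unless a finite threshold is set explicitly.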
Examples
One may create a quantization configuration to quantize a model according to. For example, to quantize a model using 6 bits for activations and 7 bits for weights, with symmetric uniform quantization for both weights and activations, MSE for weights threshold selection, NOCLIPPING for activation threshold selection, relu_unbound_correction and weights_bias_correction enabled, and per-channel weights quantization, one can instantiate a quantization configuration:
>>> qc = QuantizationConfig(activation_n_bits=6,
...                         weights_n_bits=7,
...                         activation_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM,
...                         weights_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM,
...                         weights_threshold_method=ThresholdSelectionMethod.MSE,
...                         activation_threshold_method=ThresholdSelectionMethod.NOCLIPPING,
...                         relu_unbound_correction=True,
...                         weights_bias_correction=True,
...                         weights_per_channel_threshold=True)
The QuantizationConfig instance can then be passed to keras_post_training_quantization().