FrameworkInfo Class¶
The following API can be used to pass framework-related information that MCT uses when optimizing the network.
- class model_compression_toolkit.FrameworkInfo(kernel_ops, activation_ops, no_quantization_ops, activation_quantizer_mapping, weights_quantizer_mapping, kernel_channels_mapping, activation_min_max_mapping, layer_min_max_mapping)¶
A class to wrap all information about a specific framework that the library needs to quantize a model. Specifically, FrameworkInfo holds lists of layers grouped by how they should be quantized, and multiple mappings, such as a layer to its kernel channel indices and a layer to its min/max output values. The layer lists are divided into three groups:
kernel_ops: Layers that have coefficients and need to get quantized (e.g., Conv2D, Dense, etc.)
activation_ops: Layers whose outputs should get quantized (e.g., Add, ReLU, etc.)
no_quantization_ops: Layers that should not get quantized (e.g., Reshape, Transpose, etc.)
- Parameters
kernel_ops (list) – A list of operators that are in the kernel_ops group.
activation_ops (list) – A list of operators that are in the activation_ops group.
no_quantization_ops (list) – A list of operators that are in the no_quantization_ops group.
activation_quantizer_mapping (Dict[QuantizationMethod, Callable]) – A dictionary mapping from QuantizationMethod to a quantization function.
weights_quantizer_mapping (Dict[QuantizationMethod, Callable]) – A dictionary mapping from QuantizationMethod to a quantization function.
kernel_channels_mapping (DefaultDict) – Dictionary from a layer to a tuple of its kernel out/in channel indices.
activation_min_max_mapping (Dict[str, tuple]) – Dictionary from an activation function to its min/max output values.
layer_min_max_mapping (Dict[Any, tuple]) – Dictionary from a layer to its min/max output values.
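The two quantizer mappings tell MCT which function implements each QuantizationMethod for activations and for weights. Below is a minimal sketch of how they might be populated; my_power_of_two_quantizer is a hypothetical placeholder, and the callable signature MCT expects, as well as the import path of QuantizationMethod, may differ across MCT versions:
>>> from model_compression_toolkit import QuantizationMethod  # import path may vary across MCT versions
>>> def my_power_of_two_quantizer(*args, **kwargs):  # hypothetical placeholder, not MCT's actual quantizer signature
...     raise NotImplementedError
>>> activation_quantizer_mapping = {QuantizationMethod.POWER_OF_TWO: my_power_of_two_quantizer}
>>> weights_quantizer_mapping = {QuantizationMethod.POWER_OF_TWO: my_power_of_two_quantizer}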
Examples
When quantizing a Keras model, if we want to quantize only the kernels of Conv2D layers, whose kernel out/in channel indices we know are (3, 2) respectively, we can set:
>>> import tensorflow as tf
>>> from model_compression_toolkit.common.defaultdict import DefaultDict  # import path may vary across MCT versions
>>> kernel_ops = [tf.keras.layers.Conv2D]
>>> kernel_channels_mapping = DefaultDict({tf.keras.layers.Conv2D: (3, 2)})
Then, we can create a FrameworkInfo object:
>>> FrameworkInfo(kernel_ops, [], [], {}, {}, kernel_channels_mapping, {}, {})
and pass it to keras_post_training_quantization().
To quantize the activations of ReLU, we can create a new FrameworkInfo instance:
>>> activation_ops = [tf.keras.layers.ReLU]
>>> FrameworkInfo(kernel_ops, activation_ops, [], {}, {}, kernel_channels_mapping, {}, {})
If we don’t want to quantize a layer (e.g., Reshape), we can add it to the no_quantization_ops list:
>>> no_quantization_ops = [tf.keras.layers.Reshape]
>>> FrameworkInfo(kernel_ops, activation_ops, no_quantization_ops, {}, {}, kernel_channels_mapping, {}, {})
If an activation layer (tf.keras.layers.Activation) should be quantized and we know its min/max output range in advance, we can add its activation function to activation_min_max_mapping to save statistics collection time. For example:
>>> activation_min_max_mapping = {'softmax': (0, 1)}
>>> FrameworkInfo(kernel_ops, activation_ops, no_quantization_ops, {}, {}, kernel_channels_mapping, activation_min_max_mapping, {})
If a layer’s activations should be quantized and we know its min/max output range in advance, we can add it to layer_min_max_mapping to save statistics collection time. For example:
>>> layer_min_max_mapping = {tf.keras.layers.Softmax: (0, 1)}
>>> FrameworkInfo(kernel_ops, activation_ops, no_quantization_ops, {}, {}, kernel_channels_mapping, activation_min_max_mapping, layer_min_max_mapping)
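Putting the pieces together, here is a hedged end-to-end sketch that builds a FrameworkInfo and passes it to keras_post_training_quantization(). The fw_info keyword, the representative_data_gen input shape, and the MobileNet example model are assumptions for illustration; check your MCT version for the exact signature:
>>> import numpy as np
>>> import model_compression_toolkit as mct
>>> fw_info = FrameworkInfo(kernel_ops, activation_ops, no_quantization_ops, {}, {}, kernel_channels_mapping, activation_min_max_mapping, layer_min_max_mapping)
>>> def representative_data_gen():  # returns one calibration batch; the input shape is an assumption
...     return [np.random.randn(1, 224, 224, 3)]
>>> model = tf.keras.applications.MobileNet()
>>> quantized_model, quantization_info = mct.keras_post_training_quantization(model, representative_data_gen, fw_info=fw_info)  # parameter name is an assumption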