In this lesson, you will learn about the symmetric mode of linear quantization. You will also implement quantization at different granularities, such as per-tensor, per-channel, and per-group quantization. Finally, you will see how to perform inference on a quantized linear layer. Let's take a look.

There are two modes in linear quantization. The first one is the asymmetric mode: this is when you map [r_min, r_max] to [q_min, q_max], and we did just that in the previous lesson. The second one is the symmetric mode: this is when we map [-r_max, r_max] to [-q_max, q_max], where r_max is defined as the maximum of the absolute values of the tensor. In the symmetric mode, we do not need to store the zero point, since it is equal to zero. This happens because the floating-point range and the quantized range are symmetric with respect to zero. Hence, we can simplify the equation from the previous lesson to get the following: the quantized tensor is simply the original tensor divided by the scale, rounded, and cast to the data type of the quantized tensor, and the scale s is simply r_max / q_max.

Now let's code the scale. In this classroom, the libraries have already been installed for you, but if you are running this on your own machine, you can install the torch library by simply typing pip install torch. Since the libraries have already been installed, I won't be running this command and I'll just comment it out. Now let's import torch.

Now let's code the function that returns the scale for linear quantization in symmetric mode. We will call it get_q_scale_symmetric. This function will take the tensor, and we will also define the dtype, with a default value of torch.int8. As we saw in the slides, we need to get r_max and q_max. We can define r_max as the maximum of the absolute values of the tensor, so to get the absolute values of the tensor, we just need to call the abs method.
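The scale computation just described can be sketched as follows (a minimal sketch; the function and argument names follow the transcript):

```python
import torch

def get_q_scale_symmetric(tensor, dtype=torch.int8):
    # r_max: largest absolute value in the tensor, as a Python float
    r_max = tensor.abs().max().item()
    # q_max: largest value representable by the target integer dtype (127 for int8)
    q_max = torch.iinfo(dtype).max
    # symmetric scale: s = r_max / q_max
    return r_max / q_max

# quick test on a random 4x4 matrix, as in the lesson
test_tensor = torch.randn((4, 4))
print(test_tensor)
print(get_q_scale_symmetric(test_tensor))
```

Since the tensor is random, the printed scale will differ from run to run; it is always r_max / 127 for int8.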
Then, to get the maximum value, we just need to use the max method, and to finish, we also need to use the item method to get just the value and not the tensor. To get q_max, we will use torch.iinfo: we pass the dtype and, to get the maximum value, we just need to access its max attribute. We will return the scale, which is equal to r_max / q_max.

Now let's do a quick test. Let's define a test tensor as a random 4x4 matrix. We can print the test tensor and call this function on our test tensor to see the results. And this is the scale we get.

Now that we have the scale, we can get the quantized tensor based on the following equation. We will call the function linear_q_symmetric, and it will take as arguments the tensor and the dtype. We will set the default dtype to torch.int8. First we need to get the scale, so we will use the get_q_scale_symmetric function that we just coded, and we just need to pass it the tensor. Then, to get the quantized tensor, we will import a function that we coded in the last lab; all the functions we coded in the last lab are in the helper.py file. So from the helper file we will get the linear_q_with_scale_and_zero_point function. To get the quantized tensor, we just need to call this function, specifying the tensor and the scale we got just above. We also need to set the zero point to zero, since for symmetric quantization the zero point is equal to zero, and specify the last argument, which is the dtype. To finish, we just need to return the quantized tensor and the scale.

Now let's get the quantized tensor and the scale using the function that we just coded on our test tensor. Just like in the last lab, we will also get the dequantized tensor and produce a summary of the quantization process using the plot_quantization_error function. Let's import them. Now let's get the dequantized tensor.
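The quantization function described above might look like the sketch below. In the lesson, linear_q_with_scale_and_zero_point is imported from the course's helper.py; the inline version here is an assumed stand-in with the same name so the sketch is self-contained, and its exact signature in the course file may differ:

```python
import torch

def get_q_scale_symmetric(tensor, dtype=torch.int8):
    # symmetric scale: r_max / q_max
    return tensor.abs().max().item() / torch.iinfo(dtype).max

# Assumed stand-in for the helper.py function from the last lab:
# quantize, clamp to the dtype's range, and cast.
def linear_q_with_scale_and_zero_point(tensor, scale, zero_point, dtype=torch.int8):
    q = torch.round(tensor / scale + zero_point)
    q = q.clamp(torch.iinfo(dtype).min, torch.iinfo(dtype).max)
    return q.to(dtype)

def linear_q_symmetric(tensor, dtype=torch.int8):
    scale = get_q_scale_symmetric(tensor, dtype)
    # in symmetric mode the zero point is always 0
    quantized = linear_q_with_scale_and_zero_point(tensor, scale, 0, dtype)
    return quantized, scale

test_tensor = torch.randn((4, 4))
quantized_tensor, scale = linear_q_symmetric(test_tensor)
print(quantized_tensor, scale)
```

Returning both the quantized tensor and the scale mirrors the lesson: the scale is all you need later for dequantization, since the zero point is known to be zero.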
We will use the linear_dequantization function, and we just need to pass it the quantized tensor, the scale, and the zero point, which is equal to zero. Now we have everything we need for the summary. Using plot_quantization_error, we just pass the test tensor, the quantized tensor, and the dequantized tensor. And this is the result we get. As you can see, the original tensor is pretty much the same as the dequantized tensor, and the quantization error is pretty low. We can also print the quantization error that we defined in the last lab by passing the test tensor and the dequantized tensor to the quantization_error function. As you can see, the error is quite low.

The trade-offs between these two modes are as follows. First, the utilization of the quantization range. When you use asymmetric quantization, the quantization range is fully used. But when you use the symmetric mode, if the floating-point range is biased toward one side (for example, think of ReLU layers, where the output is always positive), part of the quantized range is dedicated to values that we will never see. The second trade-off is simplicity: the symmetric mode is much simpler compared to the asymmetric mode. And the last one is memory: for symmetric quantization, we don't need to store the zero point. In practice, we use symmetric quantization when we quantize to eight bits, but when we quantize to lower bit-widths such as two, three, or four bits, we often use asymmetric quantization.
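The dequantization step and the error metric used above can be sketched as follows. The name linear_dequantization follows the transcript; the choice of mean squared error inside quantization_error is an assumption about what the lab's helper computes:

```python
import torch

def linear_dequantization(quantized_tensor, scale, zero_point):
    # reverse the mapping: r = s * (q - z); z is 0 in symmetric mode
    return scale * (quantized_tensor.float() - zero_point)

def quantization_error(tensor, dequantized_tensor):
    # mean squared error between the original and reconstructed tensors
    # (assumed metric; the course helper may define the error differently)
    return (dequantized_tensor - tensor).square().mean()

# round trip on a small example: quantize with scale = 4/127, then dequantize
scale = 4.0 / 127
q = torch.tensor([[127, -64], [95, -127]], dtype=torch.int8)
r = linear_dequantization(q, scale, 0)
print(r)
print(quantization_error(torch.tensor([[4.0, -2.0], [3.0, -4.0]]), r))
```

The error stays small because each reconstructed value is off by at most half a quantization step, i.e. scale / 2.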