Now let's move on to per-channel quantization. We need to store a scale and a zero point for each row if we decide to quantize along the rows, or for each column if we decide to quantize along the columns. The memory needed to store all these extra linear parameters is pretty small. We usually use per-channel quantization when quantizing models in 8-bit, and you will see it used in the next lessons.

Now let's code per-channel quantization. Don't worry about this slide; we'll do everything in the notebook. To simplify the work, we will restrict ourselves to the symmetric mode of linear quantization. The function will be called linear_q_symmetric_per_channel, and we expect it to take as arguments the tensor, the dimension (that is, whether we want to quantize along the rows or along the columns if we are talking about a 2D matrix), and the dtype, whose default value we set to torch.int8. At the end, we expect to get back the quantized tensor and the scale. We don't need the zero point since we are doing it in symmetric mode.

Let's define a test tensor so that we can work through the code; we will use the test tensor that we defined previously. The first step is to know how big the scale tensor will be. Since we are doing per-channel quantization, we will have more than one scale, and we need to create a tensor to store these values. The size of the scale tensor is given by tensor.shape at the chosen dimension: we set the dimension to 0 if we want to quantize along the rows, and to 1 if we want to quantize along the columns. Let's check the output dimension: as you can see, we get 3, and indeed we need three scale values, one for each row.

Now we can create the scale tensor using torch.zeros. This creates a tensor of shape output_dim where each element is equal to zero; let's have a look, and indeed we get that. Now what we need to do is iterate over the rows and calculate the scale for each one of them. To do that, we loop over the output dimension and extract a sub-tensor, for example the first, second, or third row, using the select method, which takes two arguments: first the dimension and second the index. Just to be sure, let's check what the sub-tensors look like: we should get one tensor for each row, and indeed we were able to extract each row as a tensor.

Now that we manage to get each sub-tensor, all we need to do is apply the get_q_scale_symmetric function to that sub-tensor in order to get the scale for that row and store it inside the scale tensor. So we set the index position of the scale tensor to the scale of that particular sub-tensor, by calling get_q_scale_symmetric and passing it the sub-tensor. Let's check what the scale tensor looks like now: we did manage to store the scale of each row inside this tensor.

Now that we have stored all the scales, we need to do a little bit of processing to reshape the scale tensor, so that when we divide the original tensor by it, each row is divided by the correct scale. To do that, we define the shape that the scale tensor should have. Let's have a look at this scale_shape: it's full of ones.
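Here is a minimal sketch of the steps so far, under two assumptions: the 3-by-3 test tensor uses placeholder values (not the exact ones from the notebook), and the get_q_scale_symmetric helper from the previous lesson computes max(|tensor|) / q_max for the target dtype.

```python
import torch

# Assumed behavior of the helper from the previous lesson:
# symmetric scale = max(|tensor|) / q_max for the target dtype.
def get_q_scale_symmetric(tensor, dtype=torch.int8):
    r_max = tensor.abs().max().item()
    q_max = torch.iinfo(dtype).max
    return r_max / q_max

# Illustrative 3x3 test tensor (placeholder values).
test_tensor = torch.tensor([[1.0, -2.0, 3.0],
                            [10.0, 20.0, -30.0],
                            [100.0, -200.0, 300.0]])

dim = 0                                   # 0 -> quantize along rows, 1 -> along columns
output_dim = test_tensor.shape[dim]       # 3: one scale per row
scale = torch.zeros(output_dim)           # placeholder for the per-row scales

for index in range(output_dim):
    sub_tensor = test_tensor.select(dim, index)       # extract one row
    scale[index] = get_q_scale_symmetric(sub_tensor)  # scale for that row

scale_shape = [1] * test_tensor.dim()     # e.g. [1, 1]: all ones for now
```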
Then we need to set the scale_shape at index dim to be equal to -1, which gives us that. The last thing we need to do is reshape the scale using the view method with the scale_shape we just defined, and we get the following scale. This is the scale we need in order to divide the original tensor by the scale tensor so that each row is divided by its own scale value.

I know this is a bit complex, since it involves how to divide tensors by tensors in PyTorch, so let's have a look at an example in order to understand how view works and how to divide a tensor by a tensor in such a way that you divide each row or each column. Let's say we have the following matrix and the following scale. Just like in the previous example, the shape of the scale is 3. We can reshape that tensor using the view function, for example so that the first dimension is of size 1 and the second dimension is 3; as expected, we get a tensor of size 1 by 3. An alternative way to do that is to replace the 3 by -1: view will infer the right size wherever you put the -1. You can also reshape the scale so that the first dimension ends up being 3 and the last dimension 1.

Now let's try to divide the matrix M along the rows. The scale we need in order to divide each row is this one; as you can see, its shape is 3 in the first dimension and 1 in the second. Let's perform the division: we managed to divide along the rows. You can see that the first row is untouched since it is divided by 1, the second row is divided by 5, and the third row is divided by 10. If we use the following scale instead, with shape 1 by 3, and divide the matrix by it, we see that in this case each column is divided by the scales: the first column is untouched (we still have 1, 4 and 7), the second column is divided by 5, and the last column is divided by 10.

Now let's go back to quantizing our tensor. If you remember, the scale we got at the end was the following, and if we check its shape, it is the right shape to quantize each row. All we need to do now is quantize the tensor using the linear_q_with_scale_and_zero_point function that we coded in the previous lesson, passing the test tensor, the scale, and a zero point equal to 0, since we are doing symmetric quantization. And as you can see, we end up with the following quantized tensor.

Now let's put everything we did in a function called linear_q_symmetric_per_channel. As you can see, we get the output dimension, we create the scale tensor with shape output_dim, we iterate over the output dimension, and for each index we get the sub-tensor and store its scale at the index position. Then we reshape the scale, and lastly we get the quantized tensor using the linear_q_with_scale_and_zero_point function. And that's it: we get the quantized tensor and the scale. Now that we have our function, let's check that we were indeed able to quantize along a specific dimension.
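Putting it all together, here is a hedged sketch of what such a linear_q_symmetric_per_channel function could look like, assuming the helpers from the previous lessons behave as described (the exact names and signatures in the notebook may differ slightly).

```python
import torch

# Assumed helpers from the previous lessons (names/signatures are assumptions).
def get_q_scale_symmetric(tensor, dtype=torch.int8):
    return tensor.abs().max().item() / torch.iinfo(dtype).max

def linear_q_with_scale_and_zero_point(tensor, scale, zero_point, dtype=torch.int8):
    # scale, shift, round, then clamp to the dtype's representable range
    rounded = torch.round(tensor / scale + zero_point)
    q_min, q_max = torch.iinfo(dtype).min, torch.iinfo(dtype).max
    return rounded.clamp(q_min, q_max).to(dtype)

def linear_q_symmetric_per_channel(tensor, dim, dtype=torch.int8):
    output_dim = tensor.shape[dim]
    scale = torch.zeros(output_dim)
    for index in range(output_dim):
        sub_tensor = tensor.select(dim, index)
        scale[index] = get_q_scale_symmetric(sub_tensor, dtype=dtype)

    # Reshape so the scale broadcasts along the chosen dimension,
    # e.g. (3, 1) for dim=0 or (1, 3) for dim=1 on a 3x3 tensor.
    scale_shape = [1] * tensor.dim()
    scale_shape[dim] = -1
    scale = scale.view(scale_shape)

    quantized_tensor = linear_q_with_scale_and_zero_point(
        tensor, scale=scale, zero_point=0, dtype=dtype)
    return quantized_tensor, scale
```

The Python loop over output_dim mirrors the walkthrough above; the per-channel maxima could also be computed without a loop (for example with torch.amax over the other dimensions), but the loop keeps each step explicit.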
So here we reuse the test tensor that we defined earlier, and this time we will quantize along the first dimension and along the second dimension. We get quantized_tensor_0 and scale_0 by calling the linear_q_symmetric_per_channel function, passing the test tensor and specifying that the dimension we are quantizing along is 0. Let's do the same for the other dimension, calling the results quantized_tensor_1 and scale_1.

To get the summary, we also need to dequantize each tensor. Let's first do the case where the dimension is 0: dequantized_tensor_0 is equal to linear_dequantization applied to quantized_tensor_0, its scale scale_0, and a zero point of 0, since the zero point is 0 in symmetric mode. Now we have everything we need to get the summary using the plot_quantization_errors function. And that's it: as you can see, we indeed quantized along the rows. You can see the maximum quantized value, 127, appearing in each row, and the quantization was pretty good: the original tensor is pretty close to the dequantized tensor, and the quantization error tensor is not so bad.

Let's get a better metric by computing the quantization error: we get a quantization error of about 1.8. If you remember, when we did per-tensor symmetric linear quantization, the quantization error was around 2.5. Now let's check what happens if we quantize along the columns. We do the same thing as before, but with quantized_tensor_1: we define dequantized_tensor_1 using linear_dequantization, passing quantized_tensor_1 and scale_1, and then we plot the quantization errors, which gives us the following summary. As you can see, we indeed managed to quantize along the columns, and this time the quantization error is even lower. In both cases we get a lower quantization error than with per-tensor quantization. This is because an outlier value only impacts the channel it is in, instead of the entire tensor.
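As a usage sketch, reusing test_tensor and linear_q_symmetric_per_channel from the sketches above, and assuming linear_dequantization and the error metric behave as described in the lesson (scale times the shifted quantized values, and a mean squared error between original and dequantized tensors); with placeholder tensor values the printed errors will not match the notebook's 1.8 and 2.5.

```python
import torch

# Assumed helpers (dequantize = scale * (q - zero_point); error = MSE).
def linear_dequantization(quantized_tensor, scale, zero_point):
    return scale * (quantized_tensor.float() - zero_point)

def quantization_error(tensor, dequantized_tensor):
    return (dequantized_tensor - tensor).square().mean()

# Quantize along the rows (dim=0) and along the columns (dim=1).
quantized_tensor_0, scale_0 = linear_q_symmetric_per_channel(test_tensor, dim=0)
quantized_tensor_1, scale_1 = linear_q_symmetric_per_channel(test_tensor, dim=1)

# Zero point is 0 because we are in symmetric mode.
dequantized_tensor_0 = linear_dequantization(quantized_tensor_0, scale_0, 0)
dequantized_tensor_1 = linear_dequantization(quantized_tensor_1, scale_1, 0)

print(quantization_error(test_tensor, dequantized_tensor_0))  # per-row error
print(quantization_error(test_tensor, dequantized_tensor_1))  # per-column error
```

Because each row or column gets its own scale, an outlier only inflates the scale of its own channel, which is why both printed errors should come out smaller than the per-tensor error.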