The unpacking algorithm is simply the other way around: if you pack the tensor as before, you now want to unpack it to extract the real values of the 2-bit weights. And most importantly, you want to unpack these tensors so that you can also perform operations on top of them. So if you have, for example, this packed tensor, you would unpack it like this: 01, then 00, then 11, and then 10.

Let's do it step by step. For unpack_weights, the signature is going to be the same: you take in a packed uint8 tensor and the number of bits. To get the expected number of values, you can use this formula: it's simply the number of packed values multiplied by eight, divided by the number of bits. num_steps is also going to be the same: from each packed value we expect to extract eight divided by number-of-bits values. Let's initialize our unpacked tensor with torch.zeros of num_values, in uint8, and let's keep track of the index into the unpacked tensor.

We're going to go for a simple and naive approach: loop over the packed weights one by one and extract num_steps values from each packed value, so here, for two bits, four. Again, consider this example: this is our packed weight, and we want to extract this value in two bits step by step, then this one, and so on.

Let's consider the first iteration. For the first packed weight (in our simple case, a single value), uint8tensor[i] is this value. If we want to extract only the first two bits in an iterative manner, we shift it right by bits times j, where j starts from zero and goes up to three. For the first iteration the shift is zero, so we do nothing. But that's fine, because we just want to take the first two bits. We use the same bitwise OR operation on the unpacked tensor. Let's also write down the stages of the unpacked tensor here. num_values is four, so we have four values to fill. For the first index, we perform a bitwise OR between these all-zeros and this tensor shifted by zero, so basically just this tensor. In the first iteration the result is an exact replica of the packed tensor. You may be wondering, "What do we do to remove all the bits that come before 01?" Don't worry, you'll see that in a few moments. That's the first iteration; of course, don't forget to increment unpacked_idx.

Let's see what happens in the next iteration. This time we want to consider these two bits and put them here. Now unpacked_idx is going to be one and j is going to be one, while i is still zero, because we only have a single packed value. So this time, instead of shifting by zero, we shift by two (bits times j, with j equal to one). In the second iteration we shift right by two bits, and the shifted tensor looks like this: when you shift right, the vacated high bits are filled with zeros. Then we perform a bitwise OR; since this slot is all zeros, the unpacked tensor looks like this. In the third iteration, j is equal to two, so we shift by four, and it looks like this. (A sketch of this loop in code follows below.)
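To make the loop concrete before we finish the walkthrough, here is a minimal sketch of what we have so far, assuming PyTorch and a 1-D uint8 input. The name unpack_weights and the signature follow this lesson, but the exact variable names are my own shorthand, and the final masking step is still missing; we add it next.

```python
import torch

def unpack_weights(uint8tensor, bits):
    # Each packed uint8 holds 8 // bits low-precision values, so the
    # unpacked tensor has 8 // bits times as many entries.
    num_values = uint8tensor.shape[0] * 8 // bits
    num_steps = 8 // bits

    unpacked_tensor = torch.zeros(num_values, dtype=torch.uint8)
    unpacked_idx = 0

    # Naive approach: loop over the packed weights one by one and
    # extract num_steps values from each of them.
    for i in range(uint8tensor.shape[0]):
        for j in range(num_steps):
            # Shift right by bits * j so the two bits we want land in
            # the lowest positions, then OR into the all-zeros slot.
            unpacked_tensor[unpacked_idx] |= uint8tensor[i] >> (bits * j)
            unpacked_idx += 1

    # At this point the higher-order bits still pollute each entry;
    # the masking trick explained next cleans them up.
    return unpacked_tensor
```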
Back to the walkthrough: we again do a bitwise OR between this value and our shifted value. Perfect. And then the last iteration; we're almost there.

There is just one thing left. You may be wondering why we're not getting exactly the same result as our target tensor, which, again, would look like this: we still have these leading bits that we want to remove from our unpacked tensor. To do that, we perform a simple trick. The idea is to create a mask and use it in a bitwise operation that always gives zero for those positions while making sure the values we care about stay the same. If we take this mask and, say, this value, and perform a bitwise AND between the two, we get zeros everywhere except for the last two bits, which come through unchanged. So all the bits before the two bits we want to keep are correctly masked out to zero, and the two bits themselves are preserved. That's exactly what we want to achieve. You can try it on the other values to double-check, but this mask is sufficient to remove all the bits before the two bits we are interested in keeping.

There is a general rule for computing the value of this mask. Here the mask is simply three: in the case of two bits, it's two to the power of the number of bits, minus one, which corresponds to the maximum value you can reach within that number of bits. If you encode in binary in two bits, you can't go above three, because you can only encode zero, one, two, and three. This formula is of course extensible, so you can try it with four or eight bits using the same logic. We perform the masking operation at the end of everything: a bitwise AND to mask out those values, as explained. (A complete, self-contained sketch of the full round trip is included below.)

All right, let's try it out on, say, this tensor. This is the packed version of the tensor; if we unpack it, we should retrieve exactly the original tensor. Perfect, both tensors are exactly the same: 1, 0, 3, 2 and 3, 3, 3, 3.

So we just did it for two bits, and we applied a very naive approach with two for loops. We also didn't consider the case where the tensor has multiple dimensions; here we just considered a simple vector. Feel free to enhance this algorithm, make it faster, and extend it so that it works on arbitrary shapes as well. Battle-test everything and make sure you have understood all the internals I have explained here. To wrap up the lesson, we're going to look at the other challenges you face when quantizing large models, such as large language models, and with that we'll wrap up the whole course. So let's move on to the slides.
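As referenced above, here is a minimal, self-contained sketch of the full round trip, assuming PyTorch. The mask line is the only addition to the loop shown earlier. The pack_weights helper is my own minimal re-implementation of the packing loop from the previous lesson, included only so the check runs end to end; names and details may differ from the original notebook.

```python
import torch

def pack_weights(uint8tensor, bits):
    # Packing (previous lesson, re-implemented here): OR each group of
    # 8 // bits low-precision values into one uint8, shifting each
    # value left by bits * j. Inputs are assumed to fit in `bits` bits.
    num_values = uint8tensor.shape[0] * bits // 8
    num_steps = 8 // bits
    packed_tensor = torch.zeros(num_values, dtype=torch.uint8)
    unpacked_idx = 0
    for i in range(num_values):
        for j in range(num_steps):
            packed_tensor[i] |= uint8tensor[unpacked_idx] << (bits * j)
            unpacked_idx += 1
    return packed_tensor

def unpack_weights(uint8tensor, bits):
    num_values = uint8tensor.shape[0] * 8 // bits
    num_steps = 8 // bits
    unpacked_tensor = torch.zeros(num_values, dtype=torch.uint8)
    unpacked_idx = 0
    for i in range(uint8tensor.shape[0]):
        for j in range(num_steps):
            unpacked_tensor[unpacked_idx] |= uint8tensor[i] >> (bits * j)
            unpacked_idx += 1
    # 2 ** bits - 1 is the maximum value encodable in `bits` bits
    # (3 for 2 bits); AND-ing with it keeps only the low bits.
    mask = 2 ** bits - 1
    unpacked_tensor &= mask
    return unpacked_tensor

# Round trip: packing then unpacking recovers the original values.
original = torch.tensor([1, 0, 3, 2, 3, 3, 3, 3], dtype=torch.uint8)
packed = pack_weights(original, 2)    # tensor([177, 255], dtype=torch.uint8)
unpacked = unpack_weights(packed, 2)  # tensor([1, 0, 3, 2, 3, 3, 3, 3], ...)
assert torch.equal(original, unpacked)
```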