We now have all the building blocks we need for our quantizer. The quantizer is a quantization pipeline that iterates over all linear modules of the original model, replaces each of them with our new W8A16LinearLayer module, and calls quantize on the new module using the original weights. Let's do that step by step.

First, let's build a method called replace_linear_with_target that loops over the model, identifies the modules that are an instance of torch.nn.Linear, and replaces them with the new module. This is the signature of our method: it takes a module (or a model; since the method is recursive, I decided to call the argument module to make it clear that you can pass a model, but also any sub-module). target_class is the new class that we're going to set in replacement of the linear layer, and module_name_to_exclude lists the names of the modules we're going to exclude from this replacement logic. We'll see later that for language models it's usually better to keep the last module unquantized for better results, so this argument is useful for those specific use cases.

We simply loop over the module's named_children. If the sub-module is an instance of nn.Linear and its name doesn't match any of the names inside module_name_to_exclude, we move forward with the module replacement. We first get the bias of the sub-module, because we're going to use it to create our new target class. Then we create our new module as target_class(...): in_features and out_features should be the same as the original linear layer's; for the bias, we simply check whether old_bias is not None; and we use the same data type as the sub-module's weight. Then we call setattr on the parent module to replace the current attribute: setattr(module, name, new_module), where name is the name of the attribute we're modifying. This simply replaces the parent module's attribute named name with the new module. And if the old module has a bias, we explicitly set the bias of the new module to old_bias. As I said previously, the method is recursive: if we're not in the replacement case, we call the method again, this time on the child module, with the same arguments.

Let's try this method out. We create a dummy model for testing purposes, with two linear layers and one language-model head, which is usually the last module in a transformer model. Since the method modifies the model in place, we create two fresh models: one where we test the module_name_to_exclude feature, and another where we simply replace all linear-layer instances with the new one. For the first case, we call replace_linear_with_target on model 1 with our target class, and this time we don't want to replace the lm_head with the new class. Perfect, it worked: we replaced all linear layers with new ones, except for the lm_head. And if we pass an empty list instead, then, as expected, we replace all instances of linear layers with the target class.
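Here is a minimal sketch of what this looks like in code. It assumes the W8A16LinearLayer class built in the previous lessons, with a constructor taking in_features, out_features, a bias flag, and a dtype, as described above; DummyModel and the variable names are illustrative, not the course's exact code:

```python
import torch
import torch.nn as nn

def replace_linear_with_target(module, target_class, module_name_to_exclude):
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and not any(
            [x == name for x in module_name_to_exclude]
        ):
            old_bias = child.bias
            # Create the replacement with the same shape, bias setting,
            # and dtype as the original linear layer.
            new_module = target_class(
                child.in_features,
                child.out_features,
                old_bias is not None,
                child.weight.dtype,
            )
            setattr(module, name, new_module)
            if old_bias is not None:
                getattr(module, name).bias = old_bias
        else:
            # Recurse into non-linear (or excluded) children.
            replace_linear_with_target(child, target_class, module_name_to_exclude)

# An illustrative dummy model: two linear layers plus a language-model head.
class DummyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(1, 1)  # non-linear module, left untouched
        self.linear_1 = nn.Linear(1, 1)
        self.linear_2 = nn.Linear(1, 1, bias=False)
        self.lm_head = nn.Linear(1, 1, bias=False)

model_1 = DummyModel()
model_2 = DummyModel()

# Case 1: keep the lm_head unquantized.
replace_linear_with_target(model_1, W8A16LinearLayer, ["lm_head"])
print(model_1)

# Case 2: pass an empty exclusion list, so every linear layer is replaced.
replace_linear_with_target(model_2, W8A16LinearLayer, [])
print(model_2)
```

Note the exclusion check: with an empty list, any([...]) is False, so every nn.Linear gets replaced; excluded layers fall into the else branch, where the recursion is a no-op because nn.Linear has no children.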
Now let's tweak this method a bit: in addition to replacing all linear layers with target_class, we also quantize the new module once we have replaced the old module with the new one. I'm just going to copy the previous method and slightly modify it so that it quantizes the new module as well. Here we also retrieve the old weight. The quantization should happen right after the replacement: once we have replaced the module with the new module, we can get that module again with getattr(module, name), which at this point returns the new module, and call quantize on it, passing the old weight. Let's also update the recursive function call. Let's try it out again on a new dummy model, just to see if it works. Perfect, it worked. Great.
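A sketch of the tweaked method, under the same assumptions as before (W8A16LinearLayer exposes a quantize(weights) method, as built in the earlier lessons); the name replace_linear_with_target_and_quantize is my choice for the copied method:

```python
def replace_linear_with_target_and_quantize(
    module, target_class, module_name_to_exclude
):
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and not any(
            [x == name for x in module_name_to_exclude]
        ):
            old_bias = child.bias
            old_weight = child.weight  # keep the original weights for quantization

            new_module = target_class(
                child.in_features,
                child.out_features,
                old_bias is not None,
                child.weight.dtype,
            )
            setattr(module, name, new_module)

            # Quantize right after the swap: getattr now returns the new
            # module, which we calibrate with the original weights.
            getattr(module, name).quantize(old_weight)

            if old_bias is not None:
                getattr(module, name).bias = old_bias
        else:
            # The recursive call is updated to the new method.
            replace_linear_with_target_and_quantize(
                child, target_class, module_name_to_exclude
            )

# Try it out on a fresh dummy model.
model_3 = DummyModel()
replace_linear_with_target_and_quantize(model_3, W8A16LinearLayer, ["lm_head"])
print(model_3)
```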