Now you will use the CLIP model to classify images with zero-shot classification. It's called zero-shot because the model can classify an image from among any list of labels you give it. This is great because you don't have to fine-tune the model to recognize specific categories; you can just use the model out of the box. Let's try it out.

For zero-shot classification tasks, you will use the CLIP model from OpenAI. CLIP is a multimodal vision and language model. The zero-shot image classification task consists of classifying an image based on your own labels at inference time. For example, you can pass a list of labels such as plane, car, dog, and bird, together with the image you want to classify, and the model will choose the most likely label. In this case, it should classify the image as a photo of a dog. The photo is a little small, but it is indeed a dog. Let's see how to do that.

For this classroom, the libraries have already been installed for you. If you are running this on your own machine, you can install the Transformers library by running: pip install transformers. Since the libraries are already installed for us in this classroom, I won't be running this command.

Just like in the previous lessons, we need two things: the model and the processor. Let's first load the CLIP model from the Transformers library. To load the model, we will use the from_pretrained method, and we need to pass the correct checkpoint for this specific task. We do the same for the processor, loading it from the same checkpoint so that the inputs are prepared the way the model expects.

Now that the processor and the model are loaded, let's get an image. Just like in the previous lessons, we will use the PIL library and import the Image class. We will load the image using Image.open and specify the path to the image. Here's our image: two lovely kittens.

Let's try to classify this image. First we create the labels: "a photo of a cat" as the first label and "a photo of a dog" as the second. Then we need to create the inputs that will go into our model. To do that, we use the processor, passing the text (the labels) and the image, and asking it to return padded PyTorch tensors, so that we can feed everything straight into our model, as shown in the sketch below.
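Here is a minimal sketch of these steps, assuming the "openai/clip-vit-large-patch14" checkpoint and a local file named "kittens.jpeg"; both names are illustrative placeholders, so substitute whatever checkpoint and image path your setup uses:

```python
from transformers import CLIPModel, AutoProcessor
from PIL import Image

# Load the CLIP model and its matching processor from the same checkpoint.
# The checkpoint name is an assumption; any CLIP checkpoint should work here.
checkpoint = "openai/clip-vit-large-patch14"
model = CLIPModel.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

# Load the image we want to classify (hypothetical path to the kittens photo).
image = Image.open("kittens.jpeg")

# The candidate labels the model will choose between.
labels = ["a photo of a cat", "a photo of a dog"]

# The processor tokenizes the labels, preprocesses the image, and returns
# a dictionary of padded PyTorch tensors ready to be passed to the model.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
```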
Then we call the model on the inputs, using a double star to unpack them, since the inputs are a dictionary of arguments. Let's check what we get. We get a very big output, but the thing we are interested in is the logits per image, so let's print that. As we've seen in previous lessons, to get probabilities we need to pass these logits through a softmax layer. So let's get the probabilities by calling softmax on the logits per image and taking the first element, which is a single tensor. As you can see, we now have something that looks like a probability: the first element is the probability that the image is indeed a photo of a cat, and the second one is the probability that it is a photo of a dog. To conclude, we see that for the label "a photo of a cat" the probability is practically one hundred percent, whereas for the second label the probability is near zero.

Now I invite you to pause the video and try a couple of things. You can change the labels so that they have nothing to do with the image and see how the model responds to that, or you can upload a new image and adjust the labels accordingly. In the next lesson, Younes will show you how to deploy the BLIP model using Hugging Face Spaces.
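To recap, here is a minimal sketch of those last steps, continuing from the model, labels, and inputs prepared above; the print loop at the end is my own addition for displaying the results:

```python
# Run the model; ** unpacks the dictionary of inputs into keyword arguments.
outputs = model(**inputs)

# logits_per_image holds one similarity score per label for our single image.
logits_per_image = outputs.logits_per_image

# Softmax turns the logits into probabilities; take the first (and only) row.
probs = logits_per_image.softmax(dim=1)[0]

# Print the probability assigned to each label.
for label, prob in zip(labels, probs):
    print(f"{label}: {prob.item():.4f}")
```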