In this lesson, you're going to use Weaviate to build a multi-vector recommender system that can recommend items by searching over different representations of those items. All right, let's push the boundaries of innovation.

First, let's think about how search is different from recommendation. Search is objective: it doesn't depend on who is searching, and you don't need to worry about personalizing the results to the user. As long as the results match the query semantically, the search can be considered successful. So if a user searches for "basketball jersey", returning any of the following jerseys would be correct.

Recommendation, on the other hand, is subjective. It's very dependent on who is asking. Let's say your store sells basketball jerseys and a customer asks for a jersey. You need to know more about the customer to provide the right response. For example, if a Lakers fan comes over to your shop, then the best jerseys you can recommend are the Lakers shirts. On the other hand, if a Boston Celtics fan comes over, then the best recommendations are the Celtics jerseys.

Multimodal data offers a great way to personalize a recommendation system to a user, because it lets you capture their interests using all available modalities. So let's see an example. If I asked you about your perfect burger, you could write down a description for me in text, or you could show me an image, or you could give me nutritional facts about your perfect burger. I could use all of this information to identify what you like and dislike, and I could use these separate modalities to personalize the recommendations. To do this, you capture all of this information in a multi-vector representation, where each vector comes from a specialized model and allows you to search over that modality. This is important because some people might buy a product because of the way it looks, others might buy because of the detailed description, and someone else might pick it based on the nutritional information.

All right, so let's now see all of this in practice. In this lab, you'll load and embed multimodal movie data with two separate embedding models, then run text and multimodal queries in different vector spaces. Great, let's go.

Like in all the previous lessons, you're going to start by ignoring all the unnecessary warnings, and again we're going to load the necessary API keys. This time, though, we're going to use two different keys: one for the multimodal embeddings, which come from the same model as we used before, and one for the text embeddings from OpenAI. Then we load them just like this, and you connect to an embedded instance of Weaviate again. This time we're going to use two different modules, one for text and one for multimodal, and we have to pass both of those API keys as part of the headers. By running this, you can connect to an embedded Weaviate instance; a minimal sketch of this setup is shown below.
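Here's a rough sketch of that setup, assuming the Weaviate Python v4 client, text2vec-openai for the text module, and multi2vec-palm for the multimodal module. The module names, environment-variable names, and header keys are assumptions based on the earlier lessons, so adjust them to match your environment:

```python
import os
import warnings

import weaviate

warnings.filterwarnings("ignore")  # silence the unnecessary warnings, as in earlier lessons

# Two API keys this time: one for the multimodal embedding model,
# one for OpenAI's text embeddings (variable names are assumptions).
EMBEDDING_API_KEY = os.environ["EMBEDDING_API_KEY"]
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

client = weaviate.connect_to_embedded(
    # Enable one text module and one multimodal module.
    environment_variables={"ENABLE_MODULES": "text2vec-openai,multi2vec-palm"},
    headers={
        "X-PALM-Api-Key": EMBEDDING_API_KEY,  # key for the multimodal model
        "X-OpenAI-Api-Key": OPENAI_API_KEY,   # key for the text model
    },
)
print(client.is_ready())  # True once the embedded instance is up
```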
Now that you're connected to Weaviate, you're going to create a new collection called "Movies". To help you, the properties are already predefined here, so we can expect title, overview, vote_average, release_year, and so on, and then this really important one: the property where the images will go. Because this is a multi-vector collection, you first need to configure a named vector for your text embeddings, and you're going to call this vector space "text_vector". Specifically for this named vector, you want to vectorize the title and overview and ignore all the other properties.

For the second vector space, you add a second named vector. This one will be the multimodal vector, and we want to run it on the image fields, which in this case means the poster field, together with the specific model that we need to run and all its additional configuration. Just to clarify, this named vector space will be called "poster_vector". With this, we can execute it, and it will create the new collection for you.

In case you need to rerun and recreate this collection, you can uncomment the line that deletes the collection, and then recreate it. Just be careful with this, because if you do run it, you also lose all the data that was in that collection at the time.

Now that you have a ready collection, you can start looking into importing some data. Inside this project there is a movies data JSON file, which lists 20 different movies together with their posters. So let's have a look at what we can see in there. You can see that there is a link to the poster, which in here is called backdrop_path, and if I scroll through the columns, you can see that you already have all the necessary data, like the original title, the overview, the popularity, and then the really important one, the poster_path, and so on.

Next, you need to load the helper function, which takes a file path and converts it to a base64 representation; this is something you've already done in previous lessons.

Now that you have all the building blocks, it's time to import the data. One of the first things you need to do is grab the Movies collection. Then, on that collection, you create a batch object with a rate limit of 20, which basically means that you only upload 20 objects per minute; this helps us stay within the rate limits of the embedding service. Then, using the data frame, we can iterate through all the movies in the preloaded dataset. There are only 20, so that also matches the rate limit we're after.

This next statement is quite helpful: using generate_uuid5, which is imported at the top, we can derive a UUID from the movie ID, and if an object with that ID already exists in our collection, we can skip it. So even if you rerun this cell, you're not going to import the same movie twice, which is super helpful.

Next, as you iterate through all the objects in the data frame, you take the path to the poster file, which is inside the posters folder, and one by one convert each poster into its base64 representation. Using this information, you construct a movie object which contains the title, overview, vote_average, ID, poster_path, and the base64-encoded poster. The final thing you need to do is call batch.add_object, passing the movie object as the properties and a UUID generated from the movie ID, which is what the earlier check uses to make sure we never import the same thing twice. With this, we can run it and import all our images; both steps are sketched below.
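Here's a minimal sketch of the collection definition, assuming the v4 Python client. The text module is text2vec-openai as described; for the multimodal named vector I'm showing multi2vec-palm with placeholder settings, since the exact module and its parameters depend on the model used in the earlier lessons:

```python
from weaviate.classes.config import Configure, DataType, Property

client.collections.create(
    name="Movies",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="overview", data_type=DataType.TEXT),
        Property(name="vote_average", data_type=DataType.NUMBER),
        Property(name="release_year", data_type=DataType.INT),
        Property(name="movie_id", data_type=DataType.INT),
        Property(name="poster_path", data_type=DataType.TEXT),
        Property(name="poster", data_type=DataType.BLOB),  # the base64 image data lives here
    ],
    vectorizer_config=[
        # Named vector 1: text embeddings over title and overview only.
        Configure.NamedVectors.text2vec_openai(
            name="text_vector",
            source_properties=["title", "overview"],
        ),
        # Named vector 2: multimodal embeddings over the poster image.
        Configure.NamedVectors.multi2vec_palm(
            name="poster_vector",
            image_fields=["poster"],
            project_id="your-gcp-project",       # placeholder
            location="us-central1",              # placeholder
            model_id="multimodalembedding@001",  # placeholder
        ),
    ],
)

# To recreate the collection from scratch, uncomment this line first.
# Careful: it deletes the collection and all the data stored in it.
# client.collections.delete("Movies")
```

And the import loop, with the rate limit, the rerun-safe UUID check, and batch.add_object; the file name, folder layout, and column names are assumptions:

```python
import base64

import pandas as pd
from weaviate.util import generate_uuid5

# Helper from earlier lessons: read a file and return its base64 string.
def to_base64(path: str) -> str:
    with open(path, "rb") as file:
        return base64.b64encode(file.read()).decode("utf-8")

df = pd.read_json("movies_data.json")  # the 20 movies plus their poster paths

movies = client.collections.get("Movies")

# Rate-limit the batch to 20 objects per minute to stay within the
# embedding service's limits; with 20 movies, that is one batch.
with movies.batch.rate_limit(requests_per_minute=20) as batch:
    for _, row in df.iterrows():
        # Skip movies that were already imported, so reruns are safe.
        if movies.data.exists(generate_uuid5(row["id"])):
            continue
        poster_b64 = to_base64(f"./posters/{row['id']}_poster.jpg")  # path is an assumption
        movie_obj = {
            "title": row["title"],
            "overview": row["overview"],
            "vote_average": row["vote_average"],
            "movie_id": row["id"],
            "poster_path": row["poster_path"],
            "poster": poster_b64,
        }
        batch.add_object(properties=movie_obj, uuid=generate_uuid5(row["id"]))
```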
So what's happening right now? All 20 movies have been sent to two different vectorizers: the posters go to the multimodal model, which vectorizes all of them, and the titles and overviews go to the text model from OpenAI. This should take about a minute, and after that, you should be good to start querying this dataset.

As a final step, you need to check that the import went well. The way you can do it is to go into the movies batch object and look for failed objects. Because this is a list, if there are any failed objects at all, you can print it, or maybe just print the first one. If there are no errors, we should get a happy "import complete with no errors" message, which is great.

And now it's time for you to get to the fun part, where you can run some queries on the dataset. So let's search for some movies about lovable, cute pets. This query is very similar to what you've run before; however, because the collection is configured with two different named vectors, you need to specify which of the vector spaces you want to search through. In this case, for this query, you want to search across the text_vector space. When that executes, you want to print the results: it's good to print the title and the overview, and also display the poster of each object that came back matching the query. Please note that this specific query only tries to match on what's in the title and overview, because that's what the text vector was constructed from.

All right, let's run this. You can see the first result for me is 101 Dalmatians, together with the description and the poster from the movie. I also got the poster for Stuart Little 2, together with its title and description, and the third response was A Bug's Life, again with its description and the poster from that movie.

Now let's try another query on the same vector space, but this time let's look for epic superheroes. You can see we got Iron Man, Incredibles 2, and Shazam! One thing that you're probably thinking right now is: okay, this query ran on the text vector embeddings, but we're still getting posters back. The thing is, yes, that is true. The query still only runs on the title and overview; however, the objects that we get back contain the information about the poster, and that's how we were able to display them. The posters were not used for the query just yet; that's coming in the next step.

To actually search across the posters, what you need to do is run a similar query, but on the poster_vector space. By running this, what we'll be trying to match are movies about lovable cute pets whose posters reflect that query. So let's run this and see what we get back. You can see here that the results are very similar, but they're not exactly the same. However, they do match the original query really well: the lovable cute pets are visible across all three posters that we got back.

So let's try another query against the poster_vector space, but this time looking for epic superheroes, so that you can compare the results like for like with what you got back when you searched through the text_vector space. Again, the results are very similar, but once more not exactly the same, because the information in the posters is slightly different from what we have in the title and description of the movie. You'd probably see even bigger differences between the two, but in the case of this dataset we only have 20 different images, so really there are about five posters with superheroes, five with animals, and so on, and there's a high likelihood that we repeat those. If you had hundreds of thousands of movies, both spaces would probably still match the query really well, but they wouldn't necessarily return exactly the same objects. A sketch of the import check and these text queries is shown below.
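Here's a minimal sketch of the import check and the named-vector text query; switching target_vector is all it takes to run the same query in the other space:

```python
# movies is the same collection handle used during the import step.
# Verify the import: the batch object keeps a list of any failed objects.
if len(movies.batch.failed_objects) > 0:
    print(f"Number of failed imports: {len(movies.batch.failed_objects)}")
    print(movies.batch.failed_objects[0])
else:
    print("Import complete with no errors")

# A semantic text query; target_vector selects which named vector space
# to search in, since this collection has two of them.
response = movies.query.near_text(
    query="movie about lovable cute pets",
    target_vector="text_vector",  # use "poster_vector" to match on the posters instead
    limit=3,
)

for obj in response.objects:
    print(obj.properties["title"])
    print(obj.properties["overview"])
    # obj.properties["poster"] holds the base64 poster, which the notebook
    # decodes back into an image to display next to the text fields.
```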
Another thing that you can do with this dataset is use images to search across the posters in the poster_vector space. So let's start with this image of a spooky situation. To search using the spooky image, what you need to do is call near_image with the spooky image converted to base64, and search across the poster_vector space. When that's finished, let's display just the title and the poster to make the output a bit cleaner, and then execute it. What you get back are posters for spooky movies like The Nightmare Before Christmas, Hocus Pocus, and Scream.

As a final example, let's try to search for some superheroes based on this picture. The search is super similar, but this time you just use a different image. And what we get back here, yeah, is Batman v Superman. You also get Iron Man and Ice Age. Well, let's face it, the squirrel chasing the nut is definitely a superhero of mine. Both image queries follow the same pattern, sketched at the end of this lesson.

And now that you're done with all the querying, you can close the client and complete the lesson.

Well, that wraps up this lesson and the course. In this lesson, what you learned is how to use a multi-vector collection and search across two different vector spaces, one being the text vector space and the other the multimodal space, and in that way understand your data from different perspectives. As you could see, some of the results can be similar across different vector spaces, but each vector space and each model can also give you different types of results, which can be very powerful and very useful. And I'm pretty sure you are going to come up with many ideas and many use cases for how to take this and make the most of it.
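For reference, a minimal sketch of the image-based query and the final cleanup, reusing the to_base64 helper from the import step; the image path is a hypothetical example:

```python
# Image-to-image search: the multimodal model embeds the query image,
# and we search the poster_vector space with it.
response = movies.query.near_image(
    near_image=to_base64("./images/spooky.jpg"),  # hypothetical image path
    target_vector="poster_vector",
    return_properties=["title", "poster"],  # keep the output a bit cleaner
    limit=3,
)

for obj in response.objects:
    print(obj.properties["title"])

# All done querying: close the client to complete the lesson.
client.close()
```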