RAG systems start by using vector representations of text to match your prompt to relevant sections of unstructured data. So, in order to find relevant text in a knowledge graph in the same way, you'll need to create embeddings of the text fields in your graph. Let's take a look at how to do this.

To get started, you'll import some packages as we did in the last notebook, and we'll also set up Neo4j. You'll load the same environment variables as in the previous notebook, but now including a new variable called OPENAI_API_KEY, which we'll use for calling the OpenAI embeddings model. And finally, as before, we'll use the Neo4jGraph class to create a connection to the knowledge graph so we can send it queries.

The first step for enabling vector search is to create a vector index. In this very first line, we're creating a vector index and giving it a name, movie_tagline_embeddings. We're also specifying that this index should be created only if it doesn't already exist. We're creating the index for nodes that we'll call m and that have the label Movie, and on those nodes, for the tagline property of the movies, we're going to create embeddings and store them.

We have some options while setting up the index as well, which we're passing in as this indexConfig object right here. Two things here are important. The first is how big the vectors themselves are, that is, the dimensions of the vectors. Here it's 1536, which is the default size for OpenAI's embedding model. The second is the similarity function: OpenAI recommends using cosine similarity, so we're specifying that here.

That Cypher query is nice and straightforward; it looks like this. In the output, we can see the name that we specified before, we can see that it's ready to go, and that it's a vector index. So, fantastic.

Next, we're going to match nodes that have the label Movie and where movie.tagline is not null.
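As a rough sketch, the index-creation step described above might look like this in the notebook. This assumes a `kg` object with a `.query()` method (for example, LangChain's `Neo4jGraph`) is already connected; the index name and the `taglineEmbedding` property name here follow the lesson's conventions but are shown as assumptions.

```python
# Cypher for creating the vector index described above: one index named
# movie_tagline_embeddings, on the taglineEmbedding property of Movie nodes,
# with 1536-dimensional vectors compared by cosine similarity.
CREATE_INDEX_CYPHER = """
CREATE VECTOR INDEX movie_tagline_embeddings IF NOT EXISTS
FOR (m:Movie) ON (m.taglineEmbedding)
OPTIONS { indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
}}
"""

# With a live connection you would run (hypothetical `kg` object):
# kg.query(CREATE_INDEX_CYPHER)
# kg.query("SHOW VECTOR INDEXES")   # confirm the index exists and is ONLINE
```

The `IF NOT EXISTS` clause makes the statement safe to re-run, which is convenient in a notebook you may execute top to bottom more than once.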
In this next line, we take each movie and also calculate an embedding for its tagline. We do that by calling a function called genai.vector.encode. We pass in the parameter which is the value we want to encode; here that's movie.tagline. We also specify which embedding provider we want to use: that's OpenAI. And because OpenAI requires a key, we pass in a little bit of configuration here as well, saying: here's the token for OpenAI, and it's going to be this OpenAI API key. This value here is what we call a query parameter.

This query may take a few seconds to run because it calls out to the OpenAI API to calculate a vector embedding for each movie in the dataset.

Let's pull out just the tagline from that result so you can see what it is. Looking at the first movie we pulled back, its tagline is "Welcome to the Real World." Super. And let's also take a look at what the embedding looks like. I'm not going to show the entire embedding; we'll just get the first 10 values out of it. Okay, great, that looks like a good embedding to me. For the last step in verifying the embeddings, we'll make sure they're the right size. We're expecting them to be 1536, and, great, the vector size is 1536, just as we expected.

So now, we can actually query the database. We'll start by specifying the question we want to ask, and we'll find similar movies that might match that question. Remember, we've created the vector index on the taglines. Here, we start by calculating an embedding for the question using that same function we had before, genai.vector.encode, with a parameter for the question that we'll pass in. We want to calculate the embedding using the OpenAI model, and OpenAI of course needs an API key, so we're going to pass that in as well.
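The embedding-population and verification steps above can be sketched like this. The `kg` connection object and the `taglineEmbedding` property name are assumptions carried over from the earlier setup; `genai.vector.encode` is Neo4j's GenAI function, and `db.create.setNodeVectorProperty` is one way to store the resulting vector on the node (the exact storage procedure may differ by Neo4j version).

```python
# Cypher that computes an OpenAI embedding for each movie's tagline and
# stores it on the node. $openAiApiKey is a query parameter, so the key
# never appears in the query text itself.
POPULATE_EMBEDDINGS_CYPHER = """
MATCH (movie:Movie) WHERE movie.tagline IS NOT NULL
WITH movie, genai.vector.encode(
    movie.tagline,
    "OpenAI",
    { token: $openAiApiKey }) AS vector
CALL db.create.setNodeVectorProperty(movie, "taglineEmbedding", vector)
"""

# With a live connection (hypothetical `kg` and OPENAI_API_KEY):
# kg.query(POPULATE_EMBEDDINGS_CYPHER, params={"openAiApiKey": OPENAI_API_KEY})
#
# Verification as in the lesson: inspect one movie's tagline, the first
# ten values of its embedding, and the embedding's length (expect 1536).
# kg.query("""
#     MATCH (m:Movie) WHERE m.tagline IS NOT NULL
#     RETURN m.tagline AS tagline,
#            m.taglineEmbedding[0..10] AS first_ten,
#            size(m.taglineEmbedding) AS dims
#     LIMIT 1
# """)
```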
The result of that function call we assign to something we call question_embedding. We then call another function for actually doing the vector similarity search itself, passing in the name of the index that we created earlier. Another interesting parameter here is top k: we just want the top k results. And then, of course, we pass in the embedding that we just calculated, do the similarity search, and get those results back.

From the results, we want to yield the nodes that we found, which we rename as movie, along with the similarity score. With that, we return the movie title, the movie tagline, and the score. We're passing in some query parameters: the OpenAI API key itself, the question that we asked, which will be turned into an embedding, and here top k is 5, so we only want the 5 closest embeddings.

Cool. We've got movie titles like Joe Versus the Volcano, and you can see through all of these taglines that this is a pretty good match for movies that are about love. Now let's change the question and run the query again. Oh yeah, Cast Away, Ninja Assassin; those sound like something adventurous. And Joe Versus the Volcano, apparently, is about love and adventure. Maybe that's a good one to have on your Netflix list.

This is a good point to pause the video and try changing that question to explore the movie dataset yourself, asking for movies with different qualities and seeing what kind of results you get.

Now, in all the examples so far, you've been working with an existing database. But to build your own RAG applications, you'll need to build one up from scratch to represent and store your data. Let's take a look at how to do that in the next lesson.
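Putting the pieces above together, the similarity-search query might be sketched as follows. `db.index.vector.queryNodes` is Neo4j's vector-index query procedure; the index name, the example question, and the `kg` connection object are assumptions carried over from the earlier steps.

```python
# Embed the question with the same genai.vector.encode call used for the
# taglines, then query the vector index for the top-k nearest movies.
SIMILARITY_SEARCH_CYPHER = """
WITH genai.vector.encode(
    $question,
    "OpenAI",
    { token: $openAiApiKey }) AS question_embedding
CALL db.index.vector.queryNodes(
    'movie_tagline_embeddings',
    $top_k,
    question_embedding) YIELD node AS movie, score
RETURN movie.title, movie.tagline, score
"""

params = {
    "question": "What movies are about love?",  # change this to explore
    "top_k": 5,                                  # number of nearest neighbors
}

# With a live connection (hypothetical `kg` and OPENAI_API_KEY):
# params["openAiApiKey"] = OPENAI_API_KEY
# rows = kg.query(SIMILARITY_SEARCH_CYPHER, params=params)
```

Because the question is a query parameter rather than baked into the Cypher text, exploring the dataset is just a matter of changing `params["question"]` and re-running the query.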