In this lesson you'll practice implementing RAG with Mistral models. Retrieval Augmented Generation is an AI framework that combines the capabilities of large language models and information retrieval systems. It's useful for answering questions or generating content by leveraging external knowledge. Let's check it out.

So why do we need RAG? Large language models face several challenges. For example, they don't have access to your internal documents, they may not have the most up-to-date information, and they can hallucinate. One potential solution to these problems is RAG. At a high level, here's how RAG works. When a user asks a question about an internal document or a knowledge base, we retrieve relevant information from the knowledge base, where all the text embeddings are stored in a vector store. This step is called retrieval. Then, in the prompt, we include both the user's query and the relevant information, so that our model can generate output based on the relevant context. This second step is called generation.

In this lesson, let's take a look at how we can do RAG from scratch. Let's first get an article from The Batch. This is the link to the article we're interested in, and we use an HTML parser called BeautifulSoup to find the main text of the article. Next, let's split this document into chunks. It's crucial to do so in a RAG system so that we can more effectively identify and retrieve the most relevant pieces of information. In this example, we simply split our text by character, combining 512 characters into each chunk, and we get eight chunks. Depending on your specific use case, it may be necessary to customize or experiment with different chunk sizes. There are also various options for how you split the text: you could split by tokens, sentences, HTML headers, and so on, depending on your application.

Now, with these eight text chunks, let's create embeddings for each of them. Again, we use a helper function to load our API key. You can replace this with your own API key outside of the course environment. We define a get_text_embedding function that uses the Mistral embeddings API endpoint to get the embedding of a single text chunk. Then we use a list comprehension to get text embeddings for all the text chunks. Let's take a look at how this looks. The resulting text embeddings are numerical vectors representing the text in the vector space. If we look at the length of the first embedding vector, it returns 1024, which means our embedding dimension is 1024.

Once we have the embeddings, a common practice is to store them in a vector database for efficient processing and retrieval. There are several vector databases to choose from. In our simple example, we use Faiss, an open-source vector search library. With Faiss, we define an instance of the index class with the embedding dimension as the argument, and then add the text embeddings to this index structure. When a user asks a question, we also need to create an embedding for that question using the same embedding model as before. Here we get the question_embeddings. Now we can retrieve the text chunks from the vector database that are most similar to the question we asked. We perform a search on the vector database with index.search. This function returns the distances and the indices of the k most similar vectors to the question vector in the vector database. Then, based on the returned indices, we can retrieve the actual text chunks that correspond to those indices.
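To make this concrete, here is a minimal end-to-end sketch of the workflow we just walked through. A few assumptions: it uses the mistralai Python client in its v1-style API (client.embeddings.create and client.chat.complete), which may differ slightly from the helper functions in the course notebook; it assumes the article's main text has already been scraped into a string called text; and the chunk size, the value of k, the example question, the prompt template, and the model names are illustrative choices, not the notebook's exact values.

```python
import os

import faiss
import numpy as np
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# `text` is assumed to hold the article body scraped earlier
# (e.g. with requests + BeautifulSoup, as described in the lesson).
text = "...article text scraped from The Batch..."

# 1. Split the article into fixed-size character chunks.
chunk_size = 512
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# 2. Embed each chunk with the Mistral embeddings endpoint.
def get_text_embedding(txt):
    resp = client.embeddings.create(model="mistral-embed", inputs=[txt])
    return resp.data[0].embedding

text_embeddings = np.array([get_text_embedding(c) for c in chunks], dtype="float32")

# 3. Store the embeddings in a Faiss index (L2 distance; 1024-dim for mistral-embed).
d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

# 4. Retrieval: embed the question and look up the k most similar chunks.
question = "What are the main points of this article?"
question_embeddings = np.array([get_text_embedding(question)], dtype="float32")
distances, indices = index.search(question_embeddings, 2)  # k = 2
retrieved_chunks = [chunks[i] for i in indices[0]]

# 5. Generation: include both the retrieved context and the question in the prompt.
context = "\n".join(retrieved_chunks)
prompt = f"""
Context information is below.
---------------------
{context}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

response = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Whatever client version or prompt wording you use, the shape is the same: chunk, embed, index, retrieve, then generate with the retrieved context in the prompt.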
As you can see here, we get two text chunks, because we defined k equals two to retrieve the two most similar vectors in the vector database. Note that there are a lot of different retrieval strategies. In our example we used a simple similarity search with embeddings. Depending on your use case, you might want to perform metadata filtering first, or weight the retrieved documents, or even retrieve a larger parent chunk that the originally retrieved child chunks belong to. Finally, we can provide the retrieved text chunks as context information within the prompt. Here's a prompt template where we include both the retrieved text chunks and the user question. Let's again use the Mistral function we have seen before. With this prompt, we get a response. So this is how RAG works from scratch. Feel free to use another Batch article, or combine multiple Batch articles and ask questions about them.

Also, we just went through a very basic RAG workflow. If you're interested in more advanced RAG strategies, there are several other courses you can learn from. If you're developing a complex application where RAG is one of the tools you can call, or if you have multiple RAG pipelines as multiple tools you can call, then you may consider using RAG in a function-calling setup. Let's take a look at a simple example. Let's wrap the RAG logic we defined above in a function called qa_with_context. Now we organize this function into a dictionary called names_to_functions, as we have seen in the previous lesson. This might not look that useful with just one function, but if you have multiple tools or functions, it is very useful to organize them into one dictionary. Next, we outline the function spec with a JSON schema to tell our model what this function is about: the function name is qa_with_context, and the required argument is the user question. Now we pass the user question and the tool to the model. We get a tool call result with the function name qa_with_context and our user question as the argument. Let's extract the function information from the model response. We get the function name and the function arguments. Then we execute the function to get the function result. As an exercise, feel free to write another RAG function that answers questions about a different Batch article and provide both of them as tools to our Mistral model.

Just as an exercise, what if we change the user query to "write a Python function to sort the letters in a string"? What would happen? It shouldn't use our qa_with_context tool, right? Because this question has nothing to do with that tool. So why does this happen? It's because we set tool_choice to "any", which forces tool use. Now let's change it to "auto", which means the model decides whether or not to use a tool. But it still uses the qa_with_context tool. Okay, so maybe this is because our description of the tool is too general. We need to make the tool description more specific. Let's add some detail to the description: "You answer user questions about AI by retrieving relevant context." Let's run this. That did not work. Let's change our description to: "Answer a user question about an AI article by retrieving relevant context about the article." Now the description is more specific. When we run this again, as we can see, it returns the Python function we asked for in the message content and does not return a tool call. And this is exactly what we wanted.
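Here is a minimal sketch of that function-calling setup, reusing client, chunks, index, and get_text_embedding from the sketch above. The tool description, the example question, and the model names are illustrative, and the exact notebook code may differ.

```python
import json

import numpy as np

# Wrap the RAG logic above in a single function the model can call as a tool.
def qa_with_context(question):
    q_emb = np.array([get_text_embedding(question)], dtype="float32")
    _, idx = index.search(q_emb, 2)
    context = "\n".join(chunks[i] for i in idx[0])
    prompt = (
        "Context information is below.\n---------------------\n"
        f"{context}\n---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\nAnswer:"
    )
    resp = client.chat.complete(
        model="mistral-small-latest",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Map tool names to callables so any returned tool call can be dispatched.
names_to_functions = {"qa_with_context": qa_with_context}

# JSON-schema function spec telling the model what the tool does and what it needs.
tools = [
    {
        "type": "function",
        "function": {
            "name": "qa_with_context",
            "description": (
                "Answer a user question about an AI article by retrieving "
                "relevant context about the article."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {"type": "string", "description": "The user question"}
                },
                "required": ["question"],
            },
        },
    }
]

# tool_choice="auto" lets the model decide whether to call the tool;
# "any" would force a tool call, and "none" would forbid one.
user_question = "What are the main points of this article?"
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": user_question}],
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
if message.tool_calls:
    # Extract the function name and arguments, then execute the function.
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(names_to_functions[tool_call.function.name](**args))
else:
    # The model answered directly, without using the tool.
    print(message.content)
```

With "auto" and a sufficiently specific tool description, a question like the Python-sorting one is answered directly in the message content, while article questions trigger the tool call, which is the behavior we saw above.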
And of course, if you know that a question is not supposed to use a tool, we can set tool_choice to "none", and that guarantees that we're not going to call any tools or functions. Now let's try "any" again. Remember that "any" forces a function call. As you can see, even though we changed the function description, it is still making the function call, because our tool_choice is "any", which forces function calling. Okay. So the default behavior is "auto", and I recommend using "auto" for tool_choice.

Just a side note: you can also use Mistral to do RAG with other tools like LangChain, LlamaIndex, and Haystack. Check out our documentation to see how that works. In the next lesson, we'll learn how to create simple UI interfaces with Mistral models and Panel. See you in the next lesson!