In a basic RAG pipeline, LLMs are only used for synthesis. The previous lesson showed you how to use an LLM to make a decision by picking between different query pipelines. This is a simplified form of tool calling, and in this lesson, we'll show you how to use an LLM to not only pick a function to execute, but also infer the arguments to pass to that function. All right, let's start coding.

One of the promises of LLMs is their ability to take actions and interact with their external environment, and a necessary component to make this possible is a good interface for the LLM to use. We call this tool calling. Tool calling allows the LLM to figure out how to use a vector database instead of just consuming its outputs, and the final result is that users are able to ask more questions and get back more precise results than with standard RAG techniques. So let's get started.

Similar to before, we first set up our OpenAI key. We also import and apply the nest_asyncio module. Next, we'll give you a basic overview and introduction to tool calling. We'll show you how to define a tool interface from a Python function, and the LLM will automatically infer the parameters from the signature of that Python function using LlamaIndex abstractions.

To illustrate this, let's first define two toy calculator functions and show you how tool calling works. We'll define an add function and also a mystery function. The core abstraction in LlamaIndex is the function tool, which wraps any Python function that you feed it. Here we see that we create function tools from both the add function defined here and the mystery function, which is just (x + y) times (x + y). Notice that both add and mystery have type annotations for the x and y variables, as well as a docstring. This is not just for stylistic purposes; it's actually important, because these annotations will be used as part of the prompt for the LLM.

Function tools integrate natively with the function calling capabilities of many LLMs, including OpenAI's. To pass the tools to an LLM, you import the LLM module and then call predict_and_call. This code snippet imports the OpenAI module explicitly, using the GPT-3.5 Turbo model, and then calls predict_and_call on the LLM. What predict_and_call does is take in a set of tools as well as an input prompt string (or a series of chat messages), and then both decide which tool to call and call the tool itself to get back the final response. In the intermediate steps, we see it calling the mystery function with arguments x equals 2 and y equals 9. So the LLM calls the right tool and also infers the right parameters. The output is 121, and 11 times 11 is 121, so we got back the right answer.

Note that this simple example is effectively an expanded version of the router: not only does the LLM pick the tool, it also decides what arguments to give to the tool. Let's use this key concept to define a slightly more sophisticated agentic layer on top of vector search.
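Before moving on, here is a minimal sketch of what the toy tool-calling example above might look like, assuming the llama-index package with its OpenAI integration is installed and an OPENAI_API_KEY is available in the environment; the exact prompt string is illustrative:

```python
import nest_asyncio
nest_asyncio.apply()

from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI


def add(x: int, y: int) -> int:
    """Adds two integers together."""
    return x + y


def mystery(x: int, y: int) -> int:
    """Mystery function that operates on top of two numbers."""
    return (x + y) * (x + y)


# Wrap the plain Python functions as tools; the name, type annotations,
# and docstring become part of the prompt the LLM sees.
add_tool = FunctionTool.from_defaults(fn=add)
mystery_tool = FunctionTool.from_defaults(fn=mystery)

llm = OpenAI(model="gpt-3.5-turbo")

# predict_and_call both picks a tool and executes it with inferred arguments.
response = llm.predict_and_call(
    [add_tool, mystery_tool],
    "Tell me the output of the mystery function on 2 and 9",
    verbose=True,
)
print(str(response))
```

The docstrings and type annotations here are what the LLM actually reads when deciding which tool to call and which arguments to pass, which is why they matter beyond style.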
Not only can the LLM choose vector search, we can also get it to infer metadata filters, which are a structured list of tags that help return a more precise set of search results. We'll use the same paper, MetaGPT, as before, and this time let's pay attention to the nodes, or chunks, themselves, because we'll take a look at the actual metadata attached to them.

Similar to the last lesson, we'll use SimpleDirectoryReader from LlamaIndex to load in the parsed representation of this PDF file. Next, also as in the last lesson, we'll use the sentence splitter to split these documents into a set of even chunks with a chunk size of 1024. Each node here represents a chunk, so let's take a look at the content of an example chunk, starting with the very first one. We can do this by calling node.get_content with metadata_mode set to all, a special setting that prints not just the content of the node itself, but also the metadata attached to the document, which is propagated to every node. Once we print this out, we not only get back a parsed representation of the front page of the paper, we can also see the metadata attached at the very top. This includes a few things: the page label (equal to 1), the file name (metagpt.pdf), the file type, the file size, and the creation dates. We'll pay special attention to the page labels, because if we try a different node, we get back a different page number. So every chunk carries a page-number annotation.

Next we'll define a vector store index over these nodes, similar to the last lesson. This builds a RAG indexing pipeline over the nodes: it adds an embedding for each node, and from it we get back a query engine. Differently from last time, we can try querying this RAG pipeline via metadata filters, just to show how metadata filtering works. We import an object called MetadataFilters, and then we simply specify a filter where the page label is equal to the value 2, in addition to top k equals 2. Once we define this as a query engine and ask "What are some high-level results of MetaGPT?", we get back a response. We'll first take a look at the response string, which outlines the overall results of MetaGPT. But crucially, we'll also take a look at the page numbers: as we iterate through the source nodes, we can print out the metadata attached to them, and we see that the page label is equal to 2 for each one. So the query engine is able to properly filter by page number and restrict the search to the set of pages where the page label is equal to 2.

The last section of this lesson wraps this overall retrieval tool into a function. This function takes in both the query string and the page numbers to filter by. The LLM can then infer the page numbers for a user query, instead of having the user manually specify the metadata filters. A quick note here: the metadata is not limited to page numbers. You can define whatever metadata you want, like section IDs, headers, footers, or anything else, through LlamaIndex abstractions.
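A sketch of how the indexing and metadata-filtered querying steps described above might look in code, assuming the paper has been downloaded locally as metagpt.pdf (the file path and query string are illustrative):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.vector_stores import MetadataFilters

# Load the parsed representation of the PDF (local path is an assumption).
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

# Split into even chunks of size 1024; each node represents one chunk.
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Inspect a chunk together with its attached metadata (page_label, file_name, ...).
print(nodes[0].get_content(metadata_mode="all"))

# Build a vector store index over the nodes and query it with a page-label filter.
vector_index = VectorStoreIndex(nodes)
query_engine = vector_index.as_query_engine(
    similarity_top_k=2,
    filters=MetadataFilters.from_dicts([{"key": "page_label", "value": "2"}]),
)

response = query_engine.query("What are some high-level results of MetaGPT?")
print(str(response))
for n in response.source_nodes:
    print(n.metadata)  # every returned source node should have page_label "2"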
The ability to use many metadata filters is especially prominent in better models like GPT-4, so we highly encourage you to check that out. Here we'll define a Python function that encapsulates this. We define a function called vector_query, which takes in the query as well as page numbers. It performs a vector search over an index, with the page numbers specified as a metadata filter. At the very end, we define a vector query tool via FunctionTool.from_defaults, passing in the vector_query function, which allows us to use it with a language model.

So let's try calling this tool with an LLM, specifically GPT-3.5 Turbo. We'll find that the LLM is able to infer both the query string and the metadata filters. We call predict_and_call on the vector query tool and ask the same question: the high-level results of MetaGPT as described on page 2. Here we see that the LLM is able to formulate the right query, "high-level results of MetaGPT", as well as specify the page numbers, which is 2, and we get back the correct answer. As before, we can quickly verify the source nodes: there is one source node returned, and it has a page label of 2.

Finally, we can bring in the summary tool from the router example in the first lesson and combine it with the vector tool to create this overall tool-picking system. This code just sets up a summary index over the same set of nodes and wraps it in a summary tool, similar to lesson one. Now let's try tool calling again, where the LLM has a slightly harder task: picking the right tool in addition to inferring the function parameters. We ask what the comparisons with ChatDev are, as described on page 8. We see that it still calls the vector tool, with the page number equal to 8, and it's still able to give back the right answer, which we can verify by printing out the sources. Lastly, we ask "What is a summary of the paper?" to show that the LLM can still pick the summary tool when necessary, and we see that it gives back the right response.

So that's it for lesson two. In the next lesson, we'll show you how to build a full agent over a document.
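As a recap of the combined setup walked through above, here is a hedged sketch of the vector query tool and the summary tool being passed to predict_and_call. It assumes the nodes and vector_index objects from the earlier snippet; the helper name vector_query, the prompt strings, and the tool description follow the walkthrough but may differ in detail from the lesson notebook:

```python
from typing import List

from llama_index.core import SummaryIndex
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import FilterCondition, MetadataFilters
from llama_index.llms.openai import OpenAI


def vector_query(query: str, page_numbers: List[str]) -> str:
    """Answer a question over the MetaGPT paper, restricted to the given pages.

    query: the string to embed and search for.
    page_numbers: page labels to filter by, e.g. ["2"].
    """
    # Build one metadata filter per requested page, OR-ed together.
    page_filters = [{"key": "page_label", "value": p} for p in page_numbers]
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(page_filters, condition=FilterCondition.OR),
    )
    return str(query_engine.query(query))


# The LLM infers both the query string and the page_numbers argument.
vector_query_tool = FunctionTool.from_defaults(fn=vector_query)

# Summary tool over the same nodes, as in the router example from lesson one.
summary_index = SummaryIndex(nodes)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
    description="Useful for summarization questions about the MetaGPT paper.",
)

llm = OpenAI(model="gpt-3.5-turbo")

# The LLM now has the harder task of picking the right tool and its arguments.
response = llm.predict_and_call(
    [vector_query_tool, summary_tool],
    "What are the high-level results of MetaGPT as described on page 2?",
    verbose=True,
)
print(str(response))
```

The same pattern is then reused for the page-8 comparison question and the summary question, with the verbose output showing which tool was selected and which arguments were inferred.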