In the previous lesson, we built an agent that can reason over a single document and answer complex questions about it while maintaining memory. In this lesson, you will learn how to extend that agent to handle multiple documents and increasing degrees of complexity. Let's get coding.

We'll start with a three-document use case, and then we'll expand to an eleven-document use case. We'll set up our OpenAI key as well as import nest_asyncio. The first task is to set up our function calling agent over three papers. We do this by combining the vector and summary tools for each document into a list and passing it to the agent, so that the agent actually has six tools in total. We'll download three papers from ICLR 2024: this includes MetaGPT from last time, but also LongLoRA as well as Self-RAG.

Next, we'll convert each paper into a tool. If you remember from lesson three, we have a helper function called get_doc_tools, which automatically builds both a vector index tool as well as a summary index tool over a given paper. The vector tool performs vector search, and the summary tool performs summarization over the entire document. For each paper, we get back both the vector tool and the summary tool, and we put them into an overall dictionary, mapping each paper name to its vector tool and summary tool. Next, we'll simply collect these tools in a flat list. We'll use GPT-3.5 Turbo from OpenAI as our LLM of choice. If we quickly take a look at the number of tools that are going to be passed to the agent, we'll see that the number is six. That's because we have three papers, and we have two tools for each paper: a vector tool and a summary tool.

The next step is to construct our overall agent worker. This agent worker includes the six tools as well as the LLM that we pass in. Now we're able to ask questions across these three documents or within a single document. For now, let's quickly ask a question about LongLoRA: "Tell me about the eval dataset used in LongLoRA, and then tell me about the eval results." We get back the answer that one of the eval datasets used is the PG19 test split, and we're able to look at the eval results for the LongLoRA models.

The next question we can ask is "Give me a summary of both Self-RAG and LongLoRA." This will actually allow us to do summarization across two papers. First, the agent calls the summary tool for Self-RAG with the paper's name as input and gets back an output describing what the paper is about. We see that the agent then calls the LongLoRA summary tool with the input "LongLoRA," and we get back an overall summary of LongLoRA. The final response contains both a summary of Self-RAG as well as a summary of LongLoRA. If you want to try out some queries on your own, you can try any combination of these two papers, or even all three papers, and ask for both summaries as well as specific information within the papers, to see whether or not the agent is able to reason about the summary and vector tools for each document.

Now, let's expand to a more advanced use case. Here we will actually have eleven ICLR papers, so we'll download eleven research papers from ICLR 2024. This includes papers like MetaGPT, LongLoRA, LoftQ, SWE-bench, Self-RAG, as well as a few others. Similar to the previous section, we will now build a dictionary mapping each paper to its vector and summary tools. This step can take a little bit of time, since we need to process, index, and embed eleven documents.
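To make the three-paper setup concrete, here is a minimal sketch. It assumes the course's `get_doc_tools` helper (from the accompanying utils module) and a recent LlamaIndex release; the file names are placeholders and your versions may differ slightly.

```python
# Minimal sketch of the three-paper function calling agent.
# Assumes the course's get_doc_tools helper and a recent LlamaIndex release.
from pathlib import Path

from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner
from llama_index.llms.openai import OpenAI

from utils import get_doc_tools  # course helper: builds a vector tool and a summary tool per paper

papers = ["metagpt.pdf", "longlora.pdf", "selfrag.pdf"]  # placeholder file names

# Map each paper to its (vector_tool, summary_tool) pair.
paper_to_tools_dict = {}
for paper in papers:
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

# Flatten into a single list of six tools (two per paper).
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]

llm = OpenAI(model="gpt-3.5-turbo")

# Build the agent worker over the six tools and wrap it in an agent runner.
agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools,
    llm=llm,
    verbose=True,
)
agent = AgentRunner(agent_worker)

response = agent.query(
    "Tell me about the eval dataset used in LongLoRA, "
    "and then tell me about the eval results."
)
print(str(response))
```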
Now let's collapse these tools into a flat list. This is the point at which we need a slightly more advanced agent and tool architecture. The issue is that if we index all eleven papers, the agent now has 22 tools (two per paper) to choose from. And we might want to index 100 papers, or 1,000 papers, or more. Even though LLM context windows are getting longer, stuffing too many tool selections into the LLM prompt leads to the following issues. One, the tools may not all fit in the prompt, especially if the number of documents is large and you're modeling each document as a separate tool or set of tools. Two, costs and latency will spike because you're increasing the number of tokens in your prompt. Three, the LLM can actually get confused: it may fail to pick the right tool when the number of choices is too large.

The solution is that when the user asks a query, we perform retrieval augmentation, not on the level of text, but on the level of tools. We first retrieve a small set of relevant tools, and then feed those relevant tools to the agent reasoning prompt instead of all the tools. This retrieval process is similar to the retrieval process used in RAG. At its simplest, it can just be top-k vector search, but of course you can add any advanced retrieval techniques you want to filter for the relevant set of results. Our agents let you plug in a tool retriever that allows you to accomplish exactly this.

So let's show you how to actually get this done. First, we'll want to index the tools. LlamaIndex already has extensive indexing capabilities over general text documents, but since these tools are actually Python objects, we need a way to convert and serialize these objects to a string representation and back. This is solved through the object index abstraction in LlamaIndex. We'll define an object index and retriever over these tools; a sketch follows at the end of this section. We see that we import VectorStoreIndex, which is our standard interface for indexing text, and then we wrap it with an ObjectIndex. To construct an ObjectIndex, we directly plug in these Python tools as input to the index. You can retrieve from an object index through an object retriever. This will call the underlying retriever from the index and return the output directly as objects; in this case, the objects will be tools.

Now that we've defined the object retriever, let's walk through a very simple example. Let's ask: "Tell me about the eval dataset used in MetaGPT and also in SWE-bench." Now let's take a look at the first tool in this list. We see that we directly retrieved a set of tools, and that the first tool is the summary tool for MetaGPT. If we take a look at the second tool, we see that it is a summary tool for a paper unrelated to MetaGPT and SWE-bench; of course, the quality of retrieval depends on your embedding model. However, we see that the last tool retrieved is indeed the summary tool for SWE-bench.

Now we are ready to set up our function calling agent. The setup is pretty similar to the setup in the last lesson. However, as an additional feature, we show that you can add a system prompt to the agent if you want. This is optional; you don't need to specify it, but you can use it to give the agent additional guidance, for example if you want it to output things in a certain way, or to take certain factors into account when it reasons over these tools. Now let's try asking some comparison queries.
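Here is a rough sketch of the tool-retrieval setup just described. It assumes `all_tools` is the flat list of 22 tools built above, and that you are on a recent LlamaIndex release; the exact import paths can vary slightly across versions, and the system prompt is just one possible wording.

```python
# Sketch: index the tools themselves, then let the agent retrieve
# only the most relevant ones per query (assumes `all_tools` from above).
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner
from llama_index.llms.openai import OpenAI

# Wrap the standard VectorStoreIndex with an ObjectIndex so the Python
# tool objects can be serialized, indexed, and retrieved back as objects.
obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

# Inspect which tools come back for a sample query.
retrieved_tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and also in SWE-bench."
)
print(retrieved_tools[0].metadata)

llm = OpenAI(model="gpt-3.5-turbo")

# Pass the retriever (rather than the full tool list) to the agent worker,
# along with an optional system prompt for additional guidance.
agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm,
    system_prompt=(
        "You are an agent designed to answer queries over a set of given papers. "
        "Please always use the tools provided to answer a question. "
        "Do not rely on prior knowledge."
    ),
    verbose=True,
)
agent = AgentRunner(agent_worker)
```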
We ask: "Tell me about the eval data set. Use in Meta GPT and compare it against Swebench." We see that it calls both the summary tool for Meta GPT as well as the summary tool for Swebench. Is able to get back results for both. And then it generates a final response. Here. Now, as a final example, let's compare and contrast the two Lora papers, Longlora, as well as LoftQ and analyze the approach in each paper first. We see that the agent is executing this query, and the first step it takes is it takes this input task and actually retrieves the set of input tools that help it fulfill this task. And so through the object retriever, the hope is that it actually retrieves longlora and Loftq query tools in order to help it fulfill its response. So if we take a look at the intermediate outputs of the agent, we see that it is able to have access to relevant tools from LongLora and also LoftQ. We see that first calls summary tool LongLora with arguments, approach on LongLora, and you able to get back a summary of the approach. Similarly, you're able to get back the approach and LoftQ you by calling summary tool LoftQ. The final LLM response is able to compare these two approaches by comparing the responses from these two tools and combining it to synthesize an answer that satisfies the user query. So that concludes our lesson. So now you should be equipped with the right tools to build agents not only over a single document, but also over multiple documents, enabling you to build more general, complex context augmented research assistance that can answer complex questions.