In this lesson, you will start with the simplest form of agentic RAG: a router. Given a query, the router will pick one of several query engines to execute the query, which gives you some dynamic query understanding capabilities. We'll show you how to build a very simple router over a single document that can handle both question answering as well as summarization over that document. The first three lessons show you how to build agentic capabilities over a single document, and the last lesson will show you how to build a multi-document agent. So let's get started.

The first thing we'll do is import our OpenAI key, and to do this we'll use a helper function. Next, we need to import a special module called nest_asyncio. The reason for this is that Jupyter runs an event loop behind the scenes, and a lot of our modules use async; to make async play nicely with Jupyter notebooks, we need to import this.

The next step in terms of setup is to load in a sample document. Here we use MetaGPT, a recently accepted oral paper at ICLR 2024 on a cool new multi-agent framework. You should definitely check out the paper if you're interested, or if you'd rather upload your own documents, please feel free to do so. Next, we'll import the SentenceSplitter from LlamaIndex. We also use our SimpleDirectoryReader module in LlamaIndex to read this PDF into a parsed document representation. We then split these documents into even-sized chunks, splitting on the order of sentences: we set the chunk size to 1024 and call splitter.get_nodes_from_documents to split the documents into nodes.

The next step is optional, and it allows us to define an LLM and embedding model. You can do this through a global config setting, where you specify the LLM and embedding model that you want to inject as part of the global config. By default, we use gpt-3.5-turbo and text-embedding-ada-002 in this course, but of course this gives you the groundwork to inject your own LLMs as well as embeddings. So we use the Settings object and set Settings.llm = OpenAI and Settings.embed_model = OpenAIEmbedding.

Now we're ready to start building some indexes. Here we define two indexes over these nodes: a summary index and a vector index. As a refresher, we can think of an index as a set of metadata over our data. You can query an index, and different indexes will have different retrieval behaviors. A vector index, for instance, indexes nodes via text embeddings; it's a core abstraction in LlamaIndex and a core abstraction for building any sort of RAG system. Querying a vector index will return the most similar nodes by embedding similarity. A summary index, on the other hand, is also a very simple index, but querying it will return all the nodes currently in the index, so it doesn't necessarily depend on the user query. To set up both the summary index and the vector index, it's just a matter of importing these two modules.
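Here is a minimal sketch of this setup in code, based on the steps described above. The file name "metagpt.pdf" is an assumption, and the snippet assumes your OpenAI API key is already available in the environment.

```python
import nest_asyncio
nest_asyncio.apply()  # let async code run inside Jupyter's existing event loop

from llama_index.core import SimpleDirectoryReader, Settings, SummaryIndex, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Assumes OPENAI_API_KEY is set in the environment (the course notebook loads it via a helper).

# Read the MetaGPT paper into a parsed document representation.
documents = SimpleDirectoryReader(input_files=["metagpt.pdf"]).load_data()

# Split the documents into even-sized, sentence-aligned chunks (nodes).
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)

# Optional: inject the LLM and embedding model via the global Settings config.
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

# Define a summary index and a vector index over the same nodes.
summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
```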
Now let's turn these indexes into query engines and then query tools. Each query engine represents an overall query interface over the data stored in its index, and combines retrieval with LLM synthesis. Each query engine is good for a certain type of question, and this is a great use case for a router, which can route dynamically between these different query engines. A query tool is just a query engine with metadata, specifically a description of what types of questions the tool can answer.

So here we define summary_query_engine = summary_index.as_query_engine() and also vector_query_engine = vector_index.as_query_engine(); you can see that each query engine is derived from its respective index. Note that for the summary query engine we set use_async=True to enable faster query generation by leveraging async capabilities. Next, we define a query tool for both the summary and vector query engines. In this code snippet, you see that the summary tool description is "useful for summarization questions related to MetaGPT", and the vector tool description is "useful for retrieving specific context from the MetaGPT paper".

Now that we have our query engines and tools, we're ready to define our router. LlamaIndex provides several different types of selectors to enable you to build a router, and each of these selectors has distinct attributes. The LLM selector is one option: it involves prompting an LLM to output JSON, which is then parsed so that the corresponding indexes can be queried. Another option is to use our Pydantic selectors: instead of directly prompting the LLM with text, we use the function calling APIs supported by models like OpenAI's to produce Pydantic selection objects, rather than parsing raw JSON. For each of these types of selectors, we also have dynamic capabilities that let you select either a single index to route to, or multiple.

Let's try an LLM-powered single selector called the LLMSingleSelector. We import two modules: the RouterQueryEngine as well as the LLMSingleSelector. You see that the RouterQueryEngine takes in a selector type as well as a set of query engine tools. The selector type here is just the LLMSingleSelector, which means it prompts the LLM to make a single selection, and the query engine tools include the summarization tool as well as the vector tool.

Now let's try testing out some queries. The first question we'll ask is "What is a summary of the document?" The verbose output allows us to view the intermediate steps that are being taken. We see that the output includes "Selecting query engine 0: Useful for summarization questions related to MetaGPT". This means that the first option, the summary tool, is picked in order to help answer this question, and as a result you're able to get back a response: the document introduces MetaGPT, a meta-programming framework for LLM-based multi-agent collaboration. This gives an overall summary of the paper and is synthesized over all the context in the paper.

The response also comes with sources. To inspect the sources, we can take a look at response.source_nodes. When we look at the length of response.source_nodes, we see that it is equal to 34, which is exactly the number of chunks in the entire document. So we see that the summary query engine must have been called, because querying the summary index returns all the chunks within that index.
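Continuing from the setup sketched above, here is a minimal sketch of the query engines, query tools, router, and this first query. The tool descriptions follow the ones mentioned in the lesson; any other wording is an assumption.

```python
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Turn each index into a query engine (retrieval + LLM synthesis).
summary_query_engine = summary_index.as_query_engine(
    use_async=True,  # speeds up summary generation by leveraging async calls
)
vector_query_engine = vector_index.as_query_engine()

# Wrap each query engine as a tool with a description the router can select on.
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions related to MetaGPT",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific context from the MetaGPT paper.",
)

# The router prompts the LLM to pick exactly one tool for each query.
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True,
)

response = query_engine.query("What is a summary of the document?")
print(str(response))
print(len(response.source_nodes))  # 34 for this document: one source node per chunk
```

Let's take a look at another example.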
We'll ask the question: "How do agents share information with other agents?" Let's ask this against the overall router query engine and take a look at both the verbose output as well as the response. Here we see that it actually selects query engine 1, and the LLM gives some reasoning as to why it picked the vector search tool as opposed to the summary tool: the focus is on retrieving specific context from the MetaGPT paper, and the answer to how agents share information with other agents is probably located within a paragraph of that paper. You see that it's able to find that context and generate a response: agents share information by utilizing a shared message pool where they can publish structured messages. And that basically brings us to the end of the walkthrough for lesson one.

To put everything together, all the code above can be consolidated into a single helper function that takes in a file path and builds a router query engine, with both vector search and summarization over it. We've included this as a helper function for you to try out, and if you want to load in your own PDFs, by all means do so and take a look at the results. It lives in a utils module as get_router_query_engine, and we just call query_engine = get_router_query_engine("metagpt.pdf"). We can then test out an example question. Our example question is: "Tell me about the ablation study results." We query the query engine and get back the response. In this case, you see that it again selects query engine 1, because the ablation study results reference specific context from the MetaGPT paper, and so we want to do a vector search. And we're able to get back a final answer.

So that's basically it for lesson one, and we'll see you in lesson two.
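For reference, here is a minimal sketch of using that helper. The utils module and function name come from the course materials; the file name is the same assumption as above.

```python
from utils import get_router_query_engine  # helper bundled with the course materials

# Build a router query engine (vector search + summarization) over a single PDF.
query_engine = get_router_query_engine("metagpt.pdf")

response = query_engine.query("Tell me about the ablation study results.")
print(str(response))
```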