So far, our queries have been handled in a single forward pass: given the query, call the right tool with the right parameters and get back a response. But this is still quite limiting. What if the user asks a complex question consisting of multiple steps, or a vague question that needs clarification? In this lesson, you will learn how to define a complete agent reasoning loop. Instead of calling a tool in a single-shot setting, an agent is able to reason over tools across multiple steps. You will use the function calling agent implementation, which is an agent that natively integrates with the function calling capabilities of LLMs. All right, let's have some fun.

To get started, we'll set up our OpenAI key, and then import and apply nest_asyncio so async code runs inside the notebook. For now we use the same MetaGPT paper as the previous two lessons; in the next lesson we'll expand to the multi-document setting. We will also set up the auto-retrieval vector search tool and the summarization tool from the last lesson. To keep this concise, we've packaged that setup into a single line of code that you can import from the utils module.

We now set up our function calling agent. As with the previous notebooks, we use GPT-3.5 Turbo as our LLM. We will then define our agent. In LlamaIndex, an agent consists of two main components: an agent worker and an agent runner. Think of the agent worker as the piece responsible for executing the next step of a given agent, and the agent runner as the overall task dispatcher, which is responsible for creating a task, orchestrating runs of the agent worker on top of that task, and returning the final response to the user. So we import both the FunctionCallingAgentWorker and the AgentRunner from LlamaIndex. For the function calling agent worker, we pass in the two tools, the vector tool as well as the summary tool. We also pass in the LLM and set verbose equal to true so we can look at the intermediate outputs. Think of the function calling agent worker's primary responsibility as: given the existing conversation history, memory, and any passed state, along with the current user input, use function calling to decide the next tool to call, call that tool, and decide whether or not to return a final response. The overall agent interface sits behind the agent runner, and that's what we're going to use to query the agent.

We'll first ask this question: "Tell me about the agent roles in MetaGPT, and then how they communicate with each other." Let's trace through the outputs of this agent. We see that the agent is able to break down this overall question into steps. The first part asks about agent roles in MetaGPT, and it calls the summary tool to answer it. A quick note: the summary tool isn't necessarily the most precise choice here. You could argue that the vector tool would actually give you back a more concise set of context that better represents the relevant pieces of text you're looking for. However, the summary tool is still a reasonable tool for the job. And of course, more powerful models like GPT-4 Turbo, or Claude 3 Sonnet or Opus, might be able to pick the more precise vector tool to help answer this question. In any case, we see that we get back the output: the agent roles in MetaGPT include Product Manager, Architect, Project Manager, QA Engineer, etc.
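As a reference, here is a minimal sketch of that setup in code. The get_doc_tools helper name is an assumption about what the course's utils module exposes for building the vector and summary tools over the paper; swap in whatever helper your environment provides.

```python
import nest_asyncio

nest_asyncio.apply()  # allow async agent calls inside a notebook event loop

from llama_index.llms.openai import OpenAI
from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner

# Hypothetical helper from the course's utils module that builds the
# vector search tool and the summary tool over the MetaGPT paper.
from utils import get_doc_tools

vector_tool, summary_tool = get_doc_tools("metagpt.pdf", "metagpt")

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)

agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True,  # print intermediate tool calls and outputs
)
agent = AgentRunner(agent_worker)

response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)
```

With verbose set to true, each intermediate tool call and its output are printed as the agent works through the question.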
It then uses this result to perform chain-of-thought reasoning and trigger the next question, which is about communication between agent roles in MetaGPT. We see we're able to get back an answer about that too: communication between agent roles in MetaGPT is structured and efficient. The agent then combines the entire conversation history to generate a final response: the agent roles in MetaGPT include the ones listed above, they collaborate in a sequential manner following standard operating procedures, and they use a shared message pool and subscription mechanism to communicate.

When you run a multi-step query like this, you want to make sure that you're actually able to trace the sources. Luckily, similar to the previous lessons, we can look at response.source_nodes and inspect the content of those nodes. This allows you to, for instance, inspect the content of the first source node retrieved, which is just the first page of the paper.

Calling agent.query allows you to query the agent in a one-off manner, but it does not preserve state. So now let's try maintaining conversation history over time. The agent is able to maintain chats in a conversational memory buffer. The memory module can be customized, but by default it's a flat list of items with a rolling buffer whose size depends on the context window of the LLM. Therefore, when the agent decides to use a tool, it uses not only the current chat but also the previous conversation history to take the next step or perform the next action.

So instead of query, we'll call agent.chat. The first thing we'll ask is "Tell me about the evaluation datasets used." Here we see that it uses the summary tool to ask what evaluation datasets are used in MetaGPT, and we see that the eval datasets include HumanEval, MBPP, and SoftwareDev. You'll see an example of this ability to maintain conversation history when we ask a follow-up question: "Tell me the results over one of the above datasets." Obviously, to know what "the above datasets" are, the agent has to have them stored in the conversation history somewhere. So let's run this, and it's able to translate this query plus the conversation history into a query on the vector tool, asking for results over the HumanEval dataset, which is one of the eval datasets used. And it's able to give you back a final answer.

So we've just shown a nice high-level interface for interacting with an agent. The next section will show you capabilities that let you step through and control the agent in a much more granular fashion. This allows you to not only create a higher-level research assistant over your RAG pipelines, but also debug and control it. Some of the benefits include greater debuggability into the execution of each step, as well as steerability by allowing you to inject user feedback. Having this low-level agent interface is powerful for two main reasons. The first is debuggability. If you're a developer building an agent, you might want greater transparency and visibility into what's actually going on under the hood, especially if your agent isn't working the first time around. Then you can go in, trace through the execution of the agent, see where it's failing, and try out different inputs to see whether that steers the agent execution toward a correct response. The second reason is that it enables richer UXs, where you're building a product experience around this core agent capability.
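A quick sketch of both of those calls, continuing from the agent defined above. The exact wording of the responses will vary from run to run.

```python
# Inspect the sources behind the multi-step answer
# (e.g. the first retrieved node, with its metadata).
print(response.source_nodes[0].get_content(metadata_mode="all"))

# agent.chat preserves conversation history in the agent's memory,
# so the follow-up question can refer back to "the above datasets".
response = agent.chat("Tell me about the evaluation datasets used.")
response = agent.chat("Tell me the results over one of the above datasets.")
print(str(response))
```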
For instance, let's say you want to listen to human feedback in the middle of agent execution, as opposed to only after the agent has completed a given task. You can imagine creating some sort of async queue where you listen for human input throughout the agent's execution. If human input comes in, you can interrupt and modify the execution of the agent as it works through a larger task, instead of having to wait until the task is complete.

So we'll start by defining our agent again through the FunctionCallingAgentWorker and AgentRunner setup, and then we'll start using the low-level API. We'll first create a task object from the user query, and then run through the steps, even injecting our own input along the way. So let's create a task for this agent, using the same question from the very first part of this lesson: "Tell me about the agent roles in MetaGPT, and then how they communicate with each other." This returns a task object which contains the input as well as additional state.

Now let's try executing a single step of this task. We call agent.run_step(task.task_id), and the agent executes one step of that task, identified by its task ID, and gives back a step output. We see that it calls the summary tool with the input "agent roles in MetaGPT", which is the very first part of the question, and then it stops there. When we inspect the logs and the output of the agent, we see that the first part was indeed executed. We call agent.get_completed_steps with the task ID and look at the number of completed steps for the task: one step has been completed, and we can see the current output so far. We can also take a look at any upcoming steps for the agent through agent.get_upcoming_steps. Again we pass in task.task_id, and we can print out the number of upcoming steps for the task. We see that it's also one, and we get back a TaskStep object with a task ID as well as an existing input. The input is currently "None", because the agent simply auto-generates the next action from the conversation history and doesn't need an additional external input.

The nice thing about this debugging interface is that if you want to pause execution now, you can: you can take the intermediate results without completing the agent flow. But let's keep going, run the next two steps, and actually try injecting user input. We'll pass "What about how agents share information?" as user input. This was not part of the original task query, but by injecting it we can modify the agent's execution to give back the result we want. We see that the user message is added to memory, and that the next tool call asks how agents actually share information in MetaGPT; it's able to give back a response. The overall task is now roughly complete, and we just need to run one final step to synthesize the answer. To double check that this output is the last step, we check step_output.is_last. So we get back the answer about how agents in MetaGPT share information, and this is indeed the last step.
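Here is a condensed sketch of that low-level flow using the AgentRunner task API (create_task, run_step, get_completed_steps, get_upcoming_steps). The printed counts and outputs are things you'd inspect, not guaranteed values.

```python
# Re-create the agent, then drive it step by step via the low-level task API.
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], llm=llm, verbose=True
)
agent = AgentRunner(agent_worker)

task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

# Execute a single step of the task.
step_output = agent.run_step(task.task_id)

# Inspect what has run so far and what is queued next.
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")

upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")

# Inject extra user input that was not part of the original task.
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

# Run the final step and confirm the task is done.
step_output = agent.run_step(task.task_id)
print(step_output.is_last)
```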
To translate this into an agent response, similar to what we've seen in the previous notebook cells, all you have to do is call response = agent.finalize_response(task.task_id), and we get back the final answer. So that's it for lesson three. You've learned about both the high-level interface for an agent and the low-level debugging interface. In the next lesson, we'll show you how to build an agent over multiple documents.