but discovered at the end that it did not have memory of past questions or results. In this next lesson, you'll learn about some techniques you can use to remedy that by adding conversational capabilities to your chain. This can get a bit involved, so we'll go through the steps here before jumping into the code.

The chain you built in the last lesson goes through the steps shown in the diagram here. First, the chain queries a vector database with a question. The vector database returns four relevant documents, referred to as context, and that context, along with the question, is added to the prompt and sent to the LLM. Here, we asked for the prerequisites of the course, and it provided a good answer grounded by the retrieved context. Then we tried a new question: "Can you list them in bullet point form?" And the LLM replied that it didn't see a specific question or prompt.

So what went wrong here? Well, the question references past information, but the LLM has no memory of that past information and can't answer. So how do we fix that? One way is to rewrite or rephrase the new question, while being aware of the past chat history, into a standalone question free of external references. If we did that, the LLM might rewrite the question to be something like, "Can you list the prerequisites for this course in bullet point form?" The chain above could fetch the correct data from the vector DB given that input, and the LLM could generate a suitable answer.

So how do we do this? The first step is to save the chat history. You'll see how to do this in code a bit later, but the basic idea is that each time we pass through the chain, you'll store in a history variable the question being asked as a human message and the LLM's formatted response as an AI message. Later, you can make this available in a prompt for the LLM to use as additional context.

Next, you'll need to add some logic to rephrase the question. You'll do this with an LLM, of course. Let's first clean this up and make room for the new start to this chain. Now, you'll add a prompt, LLM, and output parsing chain that will form a standalone question. The prompt will be something like, "Given the following conversation and a follow-up question, rephrase the follow-up question to be a standalone question." This prompt includes our chat history and the original question. Now, when asked, "Can you list them in bullet point form?", it can reply with a new rephrased standalone question: "Could you please list the prerequisites for this course in bullet point form?" This is provided to our original retrieval chain, which has a slightly modified prompt that includes the history and the rephrased standalone question. Now, given an input like "Can you list them in bulleted form?", it can provide an answer like the one below.

Note here that we're creating a chain we can use repeatedly, question after question, even for the first question where we don't have any history yet. Our steps will still work on that first question; there just won't be much to reformat.

OK, let's jump into the code and get started. To start with, you'll need to rebuild some of the components from your previous lab. You'll start by loading your configuration. Then you'll load your text splitter. Then you'll load your vector store with your documents, and then you'll define your retriever. And then you'll build a document retrieval chain that extracts the input question, sends it to the retriever, and finally converts the output to a string.
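If you're following along outside the lesson notebook, a minimal sketch of that setup might look like the following. It assumes OpenAI models and an in-memory vector store, and that `rawDocuments` has already been loaded (for example, with a document loader); the helper name `convertDocsToString` and the splitter settings are illustrative.

```ts
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { RunnableSequence } from "@langchain/core/runnables";
import { Document } from "@langchain/core/documents";

// Split the raw documents (assumed already loaded) into overlapping chunks.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1536,
  chunkOverlap: 128,
});
const splitDocs = await splitter.splitDocuments(rawDocuments);

// Embed the chunks into an in-memory vector store and get a retriever.
const vectorstore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  new OpenAIEmbeddings()
);
const retriever = vectorstore.asRetriever();

// Join retrieved documents into a single string for the prompt.
const convertDocsToString = (documents: Document[]): string => {
  return documents
    .map((document) => `<doc>\n${document.pageContent}\n</doc>`)
    .join("\n");
};

// Extract the question, retrieve relevant documents, format as a string.
const documentRetrievalChain = RunnableSequence.from([
  (input: { question: string }) => input.question,
  retriever,
  convertDocsToString,
]);
```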
Then you can build your retrieval chain, which will call an LLM with the question and the results of the vector database lookup. First, you'll build a prompt template. Notice it has an input variable for context from the vector database and a spot for the question. Now you initialize your model, and then you build your retrieval chain. It gets the context from the document retrieval chain and the input question, passes all of that to the prompt, and finally to the model and the output parser.

Great! So now we're ready to make our new chain. This chain will be purely responsible for rephrasing the user's input, which may contain references to past chat history, into a question free of references that both our vector store and later LLM calls can follow. To do that, we'll define a new prompt using something called a messages placeholder, and this is what we'll use to pass around history. We're using a more complicated way of declaring our template, fromMessages, because we want to pass history in as a series of messages.

You can see here that we've declared a system prompt template up top: "Given the following conversation and a follow-up question, rephrase the follow-up question to be a standalone question." Then we put a placeholder where we're going to pass actual chat messages as history. And then we have a small human prompt where we ask the model to rephrase that question as a standalone question and pass the question itself.

Great. Now let's create a chain that uses this prompt. You'll see our familiar RunnableSequence.from method here, and we'll paste in our prompt from above, a model, and an output parser.

Cool. Now let's try running this chain on our follow-up question from before. Note that the messages placeholder is itself a parameter in our prompt, named history, to which we'll need to pass the chat messages making up that history. So to start, let's import the message classes we need and ask an original question, "What are the prerequisites for this course?", as above. We'll store the answer in an originalAnswer variable and log it, and we see that we get the prerequisites of the course as outlined by the instructor, including familiarity with basic probability, etc. A pretty good answer.

Now let's create a chat history. It's going to be an array, because these are the messages that get injected into that messages placeholder you saw earlier. It will have two messages: we, as the human, asked the first question, "What are the prerequisites for this course?", and the AI responded with its first answer. Then we'll take this chat history and invoke our rephrase question chain with both our follow-up question, which, if you recall, was "Can you list them in bullet point form?", and that chat history as history.

As you can see, our rephrase question chain has taken the input "Can you list them in bullet point form?" and rephrased it to say, "List the prerequisites for this course in bullet point form." And that's something that LLMs and vector stores can handle on their own.
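For reference, here's a rough sketch of what that first retrieval chain might look like, continuing from the setup snippet above. The prompt wording is paraphrased from the lesson, and the model settings are illustrative.

```ts
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatOpenAI } from "@langchain/openai";

// Prompt with slots for retrieved context and the user's question.
const TEMPLATE_STRING = `You are an experienced researcher, expert at
interpreting and answering questions based on provided sources.
Using the provided context, answer the user's question to the best
of your ability using only the resources provided.

<context>
{context}
</context>

Now, answer this question using the above context:

{question}`;

const answerPrompt = ChatPromptTemplate.fromTemplate(TEMPLATE_STRING);

const model = new ChatOpenAI({ temperature: 0.1 });

// Gather context from the document retrieval chain plus the raw question,
// then run the prompt, the model, and a string output parser in sequence.
const retrievalChain = RunnableSequence.from([
  {
    context: documentRetrievalChain,
    question: (input: { question: string }) => input.question,
  },
  answerPrompt,
  model,
  new StringOutputParser(),
]);
```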
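And here's a sketch of the rephrase question chain and the history demo just described, again continuing from the snippets above; variable names are illustrative.

```ts
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";
import { HumanMessage, AIMessage } from "@langchain/core/messages";

const REPHRASE_QUESTION_SYSTEM_TEMPLATE = `Given the following conversation
and a follow up question, rephrase the follow up question to be a
standalone question.`;

const rephraseQuestionChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", REPHRASE_QUESTION_SYSTEM_TEMPLATE],
  // Chat history is injected here as a list of actual chat messages.
  new MessagesPlaceholder("history"),
  [
    "human",
    "Rephrase the following question as a standalone question:\n{question}",
  ],
]);

const rephraseQuestionChain = RunnableSequence.from([
  rephraseQuestionChainPrompt,
  model,
  new StringOutputParser(),
]);

// Ask the original question and save the answer.
const originalQuestion = "What are the prerequisites for this course?";
const originalAnswer = await retrievalChain.invoke({
  question: originalQuestion,
});

// Build a two-message history: the human question and the AI answer.
const chatHistory = [
  new HumanMessage(originalQuestion),
  new AIMessage(originalAnswer),
];

// The follow-up gets rewritten into a standalone question.
const standaloneQuestion = await rephraseQuestionChain.invoke({
  question: "Can you list them in bullet point form?",
  history: chatHistory,
});
```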
Now, let's put all of that together with a new chain. As a reminder, here's our previous document retrieval and formatting chain that we defined earlier.

Then let's define our answer generation prompt to also use a messages placeholder for chat history. This is going to look pretty similar to our existing answer chain, but with a system template, and we're going to use that same fromMessages method so we can have a placeholder for the chat history. This lets our final answer generation chain also take chat history into account. So you can see: you're an experienced researcher, like before, but this time, using the provided context and chat history, answer the user's question to the best of your ability. Then we have our messages placeholder for chat messages, and a human prompt at the end for the final question. You'll also note that this time around, this final prompt, the one we're going to use for our final generation, requires three inputs instead of two.

Let's try running that on its own to get a sense of what we need to pass this final answer generation chain to get a good output. We'll format the prompt with some dummy context, a pretend output from the rephrase question chain, and some sample chat history. And when we invoke it, we get a list of messages. If you recall from lesson one, these are the inputs that the chat model expects, formatted. So: you're an expert researcher answering based on provided sources. We've injected a few messages for the chat history, so you can imagine a conversation that starts with "How are you?" and "Fine, thank you!" And then: now answer the following question using the previous context and chat history, why is the sky blue? This is roughly the data we need to pass to our final generation chain.

So let's assemble our conversationally capable retrieval chain by passing that history cleanly through until the final prompt. We'll be using a new type of runnable here, and it's one that's nice to describe visually. There are times when a processing step in the chain may want to pass through some of its inputs unchanged to the next step. One method of doing this is with the convenience RunnablePassthrough.assign method. In the diagram, step one takes a history and a question and outputs a standalone question, but our prompt in step two also needs to receive the original input history as well as the revised standalone question that was the output of step one. Essentially, we want to pass through the old properties while assigning a new one to the current state of the chain.

Let's describe this again looking at the code. You'll notice we're using a couple of new things here. There's this RunnablePassthrough.assign method, which we'll need to import. The pattern of extracting one property from the original input and passing it as an object property to the next step, here our document retrieval chain, while keeping everything else intact, is so common that this method exists as a shortcut. You can think of it as just taking the original input and adding one additional field to it. A useful little shortcut.
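Continuing the sketch, the history-aware answer generation prompt and the dummy formatting check might look like this; the prompt text is paraphrased from the lesson, and the placeholder values are the ones described above.

```ts
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";
import { HumanMessage, AIMessage } from "@langchain/core/messages";

const ANSWER_CHAIN_SYSTEM_TEMPLATE = `You are an experienced researcher,
expert at interpreting and answering questions based on provided sources.
Using the provided context and chat history, answer the user's question
to the best of your ability using only the resources provided.

<context>
{context}
</context>`;

// Three inputs: context, history (as messages), and the standalone question.
const answerGenerationChainPrompt = ChatPromptTemplate.fromMessages([
  ["system", ANSWER_CHAIN_SYSTEM_TEMPLATE],
  new MessagesPlaceholder("history"),
  [
    "human",
    "Now, answer this question using the previous context and chat history:\n{standalone_question}",
  ],
]);

// Format the prompt with dummy inputs to inspect the exact list of
// messages the chat model will receive.
const formattedMessages = await answerGenerationChainPrompt.formatMessages({
  context: "fake retrieved content",
  standalone_question: "Why is the sky blue?",
  history: [
    new HumanMessage("How are you?"),
    new AIMessage("Fine, thank you!"),
  ],
});
console.log(formattedMessages);
```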
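To make the pass-through behavior concrete, here's a tiny self-contained illustration of `RunnablePassthrough.assign`; the field names are made up for the example.

```ts
import { RunnablePassthrough } from "@langchain/core/runnables";

// .assign() runs each supplied function (or runnable) on the input,
// then returns the original input object with the new fields merged in.
const addDoubledField = RunnablePassthrough.assign({
  doubled: (input: { value: number }) => input.value * 2,
});

await addDoubledField.invoke({ value: 21 });
// => { value: 21, doubled: 42 }
```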
So we're going to go through the steps here where we first rephrase our question into a standalone question free of references to chat history, then pass that dereferenced question into our vector store to get the context documents relevant to the query, and then take all of that and generate an answer with this last step here.

We could pass history back and forth manually, but we can instead streamline chat history tracking and sessions using a message history object, then wrap our chain in a manager that will automatically update the history, called RunnableWithMessageHistory. Let's take a closer look at this RunnableWithMessageHistory class. It wraps another LCEL chain and adds persistence by both updating and injecting chat history. It automatically updates the chat history by saving part of the input to the chain, which in your case is the user-defined question field in the input, as a new human message. It also saves the output of the chain as a new AI message. Additionally, it adds the current history messages to the input of the wrapped chain under a history messages key, which we'll call history. Note that in a RAG application, the results of the vector DB lookup are not stored in the chat history.

Let's describe this again, looking at the code. So let's import that class, plus a chat history object, which you can, again, think of as tracking chat sessions. We'll initialize it like this, and then we'll take this conversational retrieval chain that we just defined and wrap it in this new class. It'll automatically add an additional property given by this historyMessagesKey, which is history, which, as you'll recall, is what our answer generation prompt expects for history. And it stores and updates the chat history in this chat message history object with the value passed in as the inputMessagesKey, in this case question. So the question will be appended to the chat history as a human message, and the final output of the chain will be appended as an AI message in this message history object. And then getMessageHistory here is a function that returns a chat history object based on the passed session ID, which is here. In this case, for the demo, we're just going to use the same message history object every single time. But in production environments, you'd want to return a new object for each session to avoid mixing up conversation histories, because many people could be using your endpoint at the same time.

Cool, so let's try out the finished version. You'll take your original question, "What are the prerequisites for this course?", and invoke our final retrieval chain to get an answer to that original question. Because we're using this RunnableWithMessageHistory class, we need to give it a session ID, even though we're not really using it quite yet, so we'll just put "test". Then we'll invoke it again with a follow-up question, "Can you list them in bullet point form?", and log the result. It's going to be thinking for a bit, since there are a few model calls in a row now, but eventually we get familiarity with basic probability and statistics, linear algebra, and some programming experience, all in nice bullet point form.

This is an example of where tracing really comes in handy. To get a visual sense of what is going on here, because we have multiple calls going back and forth and a lot happening behind the scenes, we can look at a LangSmith trace to explore visually.
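Putting the pieces together, a sketch of the conversational retrieval chain might look like the following. One wiring choice here is an assumption: the retrieval step explicitly reads the new `standalone_question` field rather than the original `question`, so the vector store sees the dereferenced question.

```ts
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

const conversationalRetrievalChain = RunnableSequence.from([
  // Step 1: add a standalone_question field; question and history pass through.
  RunnablePassthrough.assign({
    standalone_question: rephraseQuestionChain,
  }),
  // Step 2: retrieve context for the standalone question; everything else
  // continues to pass through unchanged.
  RunnablePassthrough.assign({
    context: RunnableSequence.from([
      (input: { standalone_question: string }) => input.standalone_question,
      retriever,
      convertDocsToString,
    ]),
  }),
  // Step 3: generate the final answer from context, history, and question.
  answerGenerationChainPrompt,
  model,
  new StringOutputParser(),
]);
```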
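And here's a sketch of wrapping that chain in RunnableWithMessageHistory and running the two-question demo. The `ChatMessageHistory` import path is an assumption that may vary across LangChain.js versions.

```ts
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { ChatMessageHistory } from "langchain/stores/message/in_memory";

// One shared history object for this demo; in production, return a
// separate history per session ID so conversations don't get mixed up.
const messageHistory = new ChatMessageHistory();

const finalRetrievalChain = new RunnableWithMessageHistory({
  runnable: conversationalRetrievalChain,
  getMessageHistory: (_sessionId) => messageHistory,
  inputMessagesKey: "question", // saved to history as a human message
  historyMessagesKey: "history", // injected into the prompt's placeholder
});

const originalQuestion = "What are the prerequisites for this course?";

const originalAnswer = await finalRetrievalChain.invoke(
  { question: originalQuestion },
  { configurable: { sessionId: "test" } }
);

// The follow-up can now reference the stored history.
const finalResult = await finalRetrievalChain.invoke(
  { question: "Can you list them in bullet point form?" },
  { configurable: { sessionId: "test" } }
);

console.log(finalResult);
```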
So here you can see visually what's going on internally. We have this RunnableWithMessageHistory class wrapping our runnable sequence, which is our conversational retrieval chain, and it's going to insert and load those history messages as parameters to that chain.

The first thing we do is the rephrase question step, where we take the history loaded by the manager, RunnableWithMessageHistory, as a reminder. You can see the first couple of messages: the first question, "What are the prerequisites for this course?", the initial answer, and then our follow-up question, "Can you list them in bullet point form?" Let's shrink that down. You can see the user's question as well, and you can see that the output from this rephrase step is, "Could you provide a bullet point list of the prerequisites for this course?", which is free of references to the chat history and is something that our vector store and LLM can handle.

Then we go into the next step of our chain, which is retrieving the documents relevant to that standalone question. You've already seen the history; the question was "Can you list them in bullet point form?", rephrased to "Could you provide a bullet point list of the prerequisites for this course?", and you can see that here, along with some documents relevant to the prerequisites for this course, mentioning things like linear algebra, matrices and vectors, etc. And finally, we have a synthesis of our final result, where we pass all this information in. So here's the context from the vector store with all the documents, the chat history, and then the final generation.

Retrieval is a very deep topic, and there's no one-size-fits-all approach for loading, splitting, and querying your data. It really depends on the format, how information-dense it is, and other factors, so you're encouraged to modify the above prompts and parameters for different models and data types. In the next lesson, we'll show how to put this retrieval chain into production, including showing off some interactions with common web APIs and HTTP.