In this lesson, we'll build a conversational question-answering LLM application capable of using external data as context. Let's go!

So our chain will retrieve chunks that are most similar to the input query via vector similarity search, and then we'll present them to the LLM as context to ground the LLM's generation of a final answer.

To start, let's load our environment variables, of course, and then load and split the CS229 lesson PDF transcript from earlier. For brevity, I factored out the vector store initialization code into a helper function that takes arguments for chunk size and chunk overlap. We'll use bigger chunks this time to more closely simulate a production environment: a chunk size of 1536 characters with an overlap of 128 characters.

Now let's load those docs into a vector store, the same way we did in the previous lesson, using OpenAI embeddings. Even though we're not explicitly initializing the OpenAI embeddings here, that's all happening inside the helper function when it initializes the vector store with the documents we split in the previous cell. Then we'll create a retriever from that vector store, as before, that will fetch documents for a given natural language query. Awesome.

So now we're ready to start constructing our retrieval chain, and there are a few pieces here. The first we'll get into is a wrapper around our retriever that formats the inputs and outputs for the other steps. We'll call this step the document retrieval chain. Again, retrievers take string natural language input, but we often find it convenient to have chains take an object parameter for flexibility. So let's start by creating a simple sequence that will take an object with a field called question as input and then format the resulting documents' page content as a string.

We'll import a few things here. You might recall RunnableSequence from our first lesson. We'll also define a small function that formats the document content into output that's easily parsable by an LLM: for each document, we'll surround its content with XML-like tags to separate the documents and help the LLM distinguish between different ideas.

Now let's define our chain. Again, we want our chain to take an object as input, but our retriever takes a string. To get around this, we'll add a little extraction function, a small lambda that takes the object input and pulls out the question, and pass its result directly to the retriever. Now the retriever can take that string input, extracted from the question field of the input object, do its thing, and return some documents. Then we'll pipe that output into our little formatting helper from above. Cool. So that's our chain.

Let's try invoking it now. We can see we get some document content separated by these doc tags, each chunk containing some prerequisites for the CS229 course. There's some familiarity with basic probability and statistics, maybe Stat 116, knowing a bit about what random variables are, and some other requirements here. Cool. So this looks like it contains the information that we need.
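Here's a condensed sketch of everything so far, from loading the PDF to invoking the document retrieval chain. The PDF path and import paths are assumptions that may vary with your LangChain.js version and project layout, and the vector store setup is inlined here rather than living in the factored-out helper described in the lesson.

```ts
// A condensed sketch of the steps above. The PDF path and import paths are
// assumptions; the vector store setup is inlined instead of factored out.
import "dotenv/config";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";
import { RunnableSequence } from "@langchain/core/runnables";
import { Document } from "@langchain/core/documents";

// Load the CS229 transcript and split it into bigger chunks than last time.
const rawDocs = await new PDFLoader("./data/machine_learning_lecture_01.pdf").load();
const splitDocs = await new RecursiveCharacterTextSplitter({
  chunkSize: 1536,
  chunkOverlap: 128,
}).splitDocuments(rawDocs);

// Embed the chunks with OpenAI embeddings and create a retriever.
const vectorstore = await MemoryVectorStore.fromDocuments(
  splitDocs,
  new OpenAIEmbeddings(),
);
const retriever = vectorstore.asRetriever();

// Surround each retrieved document with tags so the LLM can tell them apart.
const convertDocsToString = (documents: Document[]): string =>
  documents.map((doc) => `<doc>\n${doc.pageContent}\n</doc>`).join("\n");

// Take an object with a `question` field, extract the string for the
// retriever, then format the retrieved documents as a single string.
const documentRetrievalChain = RunnableSequence.from([
  (input: { question: string }) => input.question,
  retriever,
  convertDocsToString,
]);

console.log(
  await documentRetrievalChain.invoke({
    question: "What are the prerequisites for this course?",
  }),
);
```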
Now, let's construct a chain that synthesizes all of this into a human-legible response. We'll start with a prompt, so we'll import our familiar ChatPromptTemplate and define a quick template. It reads roughly: you are an experienced researcher, expert at interpreting and answering questions based on provided sources; answer the question using only those sources. Cool. And we'll wrap this in a chat prompt template using the fromTemplate method from earlier.

One thing to note here is that our prompt requires an object with context and question properties as an argument, while our previously defined document retrieval chain actually outputs a string. So there's some input and output coercion we need to deal with, and to do that, we'll use something called a RunnableMap. To show it off, we'll import the class. When a RunnableMap is invoked, it calls all the runnables or runnable-like objects it has as properties, here context and question, corresponding to our document retrieval chain and a simple extraction function, in parallel. It calls each of those functions or runnables with the same input, and the output is an object whose properties are the results of those calls.

To show this off, let's try it with "What are the prerequisites for this course?" We can see in the output that the question has been preserved, so we can pass it along to a later step in the chain, and that the context is the output of the document retrieval chain, the same thing you saw above. And that's exactly the format we need to pass to our prompt.

So let's see what this looks like in sequence. This is the augmented generation step. We'll take the map we defined above and combine it with a model and an output parser to make things a little clearer. We'll import a few classes from earlier, initialize our model, and then construct a sequence using our map from before and the answer generation prompt that we declared above. You might notice that we're not wrapping our object in a RunnableMap constructor this time, and that's because plain objects are automatically converted into RunnableMaps when they appear in a RunnableSequence.from initializer.

We also want to pass our question all the way through to our answer generation prompt, which, if you recall, requires variables for both context and question. So we extract the question from the input, pass the entire object output by the map into our prompt, pass the resulting messages to the model, and then parse the output as a string.

All right, now let's try invoking it. We'll ask our new retrieval chain "What are the prerequisites for this course?", log the answer, and give it a shot. We get that the prerequisites for this course include familiarity with basic probability and statistics, as well as basic linear algebra, and that the course assumes you're familiar with various concepts. That seems pretty reasonable, and it's a much more human-readable response than the list of documents we got before. Awesome.

But what if we want to ask a follow-up question? Let's say the answer is a little more readable, but still not quite as readable as we'd like, so we ask, "Can you list them in bullet point form?", with "them" referring to the prerequisites of the course. Let's see what we get. The response is something like: based on the previous context, the information does not specify a specific list to be organized in bullet point form. It didn't do so well there, did it?
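Putting that together, the augmented generation step might look roughly like the sketch below, continuing from the documentRetrievalChain defined earlier. The prompt wording, model settings, and import paths are illustrative assumptions based on the description above, not the lesson's verbatim notebook code.

```ts
// A sketch of the augmented generation step, continuing from the
// documentRetrievalChain defined above. The prompt wording and model
// settings are illustrative assumptions, not the lesson's exact code.
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableMap, RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatOpenAI } from "@langchain/openai";

const TEMPLATE_STRING = `You are an experienced researcher, expert at
interpreting and answering questions based on provided sources. Answer the
user's question using only those sources.

Context:
{context}

Question:
{question}`;

const answerGenerationPrompt = ChatPromptTemplate.fromTemplate(TEMPLATE_STRING);

// A RunnableMap calls each of its properties with the same input, in
// parallel, and returns an object of the results.
const runnableMap = RunnableMap.from({
  context: documentRetrievalChain,
  question: (input: { question: string }) => input.question,
});
console.log(
  await runnableMap.invoke({
    question: "What are the prerequisites for this course?",
  }),
);
// -> { context: "<doc>...</doc>\n<doc>...</doc>", question: "What are the prerequisites for this course?" }

// The full chain. Plain objects inside RunnableSequence.from are coerced
// into RunnableMaps automatically, so no explicit constructor is needed.
const model = new ChatOpenAI(); // any chat model will do here
const retrievalChain = RunnableSequence.from([
  {
    context: documentRetrievalChain,
    question: (input: { question: string }) => input.question,
  },
  answerGenerationPrompt,
  model,
  new StringOutputParser(),
]);

const answer = await retrievalChain.invoke({
  question: "What are the prerequisites for this course?",
});
console.log(answer);

// Without chat history, a follow-up like this has nothing to resolve "them" against:
const followupAnswer = await retrievalChain.invoke({
  question: "Can you list them in bullet point form?",
});
console.log(followupAnswer);
```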
This occurs because LLMs do not have an innate sense of memory, and since we're not passing in any chat history as context, the LLM doesn't know what "them" refers to. You could update the prompt to take chat history into account as well, but then you have a more fundamental problem: our vector store also needs to return relevant documents for this follow-up query, which contains an unresolved reference.

To illustrate the problem, here's what happens if we try to query our vector store with that follow-up question directly, using our document retrieval chain. We get some docs that don't look like they have anything to do with prerequisites: some stuff about supervised learning, something about selling a house in Portland, Oregon, and other information that really isn't relevant to the prerequisites of the machine learning course. And that's because the vector store itself has no concept of what "them" is either.

All right, you've just gone over the basics of LLM-powered question answering augmented by retrieval. We'll look at how we can fix some of these shortcomings around conversation history in the next lesson on conversational question answering. I'll see you there.