One of the most common complex applications people are building with LLMs is a system that can answer questions about a document. Given a piece of text, maybe extracted from a PDF file, a webpage, or a company's internal document collection, can you use an LLM to answer questions about the content of those documents, so users can gain a deeper understanding and get access to the information they need? This is really powerful because it starts to combine language models with data they weren't originally trained on, which makes them much more flexible and adaptable to your use case. It's also really exciting because we'll start to move beyond language models, prompts, and output parsers and introduce more of the key components of LangChain, such as embedding models and vector stores. As Andrew mentioned, this is one of the more popular chains that we've got, so I hope you're excited. In fact, embeddings and vector stores are some of the most powerful modern techniques, so if you have not seen them yet, they are very much worth learning about. So with that, let's dive in! Let's do it!

We're going to start by importing the environment variables as we always do. Then we'll import some things that will help us build this chain. We're going to import the RetrievalQA chain, which does retrieval over some documents. We're going to import our favorite ChatOpenAI language model. We're going to import a document loader; this will be used to load some proprietary data that we're going to combine with the language model. In this case the data is in a CSV, so we're going to import the CSVLoader. Finally, we're going to import a vector store. There are many different types of vector stores, and we'll cover what exactly these are later on, but we're going to get started with the "DocArrayInMemorySearch" vector store. This is really nice because it's an in-memory vector store and it doesn't require connecting to an external database of any kind, so it's really easy to get started with. We're also going to import display and Markdown, two common utilities for displaying information in Jupyter notebooks.

We've provided a CSV of outdoor clothing products that we're going to combine with the language model. Here we initialize the CSVLoader with a path to this file. Next we're going to import an index, the "VectorstoreIndexCreator", which will help us create a vector store really easily. As we can see below, it only takes a few lines of code. To create it, we specify two things. First, we specify the vector store class; as mentioned before, we're going to use "DocArrayInMemorySearch" because it's a particularly easy one to get started with. After it's been created, we then call "from_loaders", which takes in a list of document loaders. We've only got one loader that we really care about, so that's what we pass in here.

The index has now been created, and we can start to ask questions about it. Below we'll cover what exactly happened under the hood, so let's not worry about that for now. Here, we'll start with a query. We then create a response using "index.query" and pass in this query. Again, we'll cover what's going on under the hood down below. For now, we'll just wait for it to respond. After it finishes, we can take a look at what exactly was returned.
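Putting those imports, the loader, the index creation, and the query together, a minimal sketch might look like the following. It assumes the pre-1.0 `langchain` package layout used in this lesson and a placeholder CSV filename; adjust both to your environment.

```python
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch  # requires the `docarray` package
from langchain.indexes import VectorstoreIndexCreator
from IPython.display import display, Markdown

# Placeholder path: point this at the provided outdoor clothing catalog CSV.
file = "OutdoorClothingCatalog_1000.csv"
loader = CSVLoader(file_path=file)

# Build an in-memory vector store index from the single loader.
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

# Ask a question against the indexed documents and render the markdown answer.
query = ("Please list all your shirts with sun protection in a table "
         "in markdown and summarize each one.")
response = index.query(query)
display(Markdown(response))
```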
We've gotten back a table in markdown with names and descriptions for all the shirts with sun protection, along with a nice little summary that the language model has provided. So we've gone over how to do question answering over your documents, but what exactly is going on under the hood?

First, let's think about the general idea. We want to take language models and combine them with many of our documents. But there's a key issue: language models can only inspect a few thousand words at a time. So if we have really large documents, how can we get the language model to answer questions about everything that's in there? This is where embeddings and vector stores come into play.

First, let's talk about embeddings. Embeddings create numerical representations for pieces of text. This numerical representation captures the semantic meaning of the piece of text it's been run over. Pieces of text with similar content will have similar vectors, which lets us compare pieces of text in the vector space. In the example below, we have three sentences. The first two are about pets, while the third is about a car. If we look at the representation in the numeric space, we can see that the two vectors for the sentences about pets are very similar, while comparing either of them to the one about a car shows they're not similar at all. This lets us easily figure out which pieces of text are like each other, which will be very useful when we decide which pieces of text to pass to the language model to answer a question.

The next component we're going to cover is the vector database. A vector database is a way to store the vector representations we created in the previous step. We populate it with chunks of text coming from incoming documents. When we get a big incoming document, we first break it up into smaller chunks. This creates pieces of text that are smaller than the original document, which is useful because we may not be able to pass the whole document to the language model. We want these small chunks so we can pass only the most relevant ones to the language model. We then create an embedding for each chunk and store those embeddings in the vector database. That's what happens when we create the index.

Now that we've got this index, we can use it at runtime to find the pieces of text most relevant to an incoming query. When a query comes in, we first create an embedding for that query. We then compare it to all the vectors in the vector database and pick the n most similar. These are returned, and we can pass them in the prompt to the language model to get back a final answer.

So above, we created this chain in only a few lines of code. That's great for getting started quickly, but let's now do it a bit more step-by-step and understand what exactly is going on under the hood. The first step is similar to above: we create a document loader, loading from that CSV with all the descriptions of the products we want to do question answering over. We can then load documents from this document loader. If we look at the individual documents, we can see that each one corresponds to one of the products in the CSV. Previously, we talked about creating chunks. Because these documents are already so small, we actually don't need to do any chunking here.
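As a concrete sketch of this loading step (same assumed setup and placeholder filename as above), loading the CSV yields one Document per product row, which is why no extra chunking is needed:

```python
# Load the catalog into Document objects; each CSV row becomes one Document.
loader = CSVLoader(file_path=file)
docs = loader.load()

print(len(docs))   # number of products in the catalog
print(docs[0])     # page_content holds that product's name and description
# These per-product documents are already small, so no further chunking is needed.
```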
So we can create embeddings directly. To create embeddings, we're going to use OpenAI's embedding class. We can import it and initialize it here. If we want to see what these embeddings do, we can take a look at what happens when we embed a particular piece of text. Let's use the "embed_query" method on the embeddings object to create an embedding for a particular piece of text, in this case the sentence "Hi, my name is Harrison." If we take a look at this embedding, we can see that there are over a thousand different elements. Each of these elements is a different numerical value, and combined, they create the overall numerical representation for this piece of text.

We want to create embeddings for all the pieces of text we just loaded and then store them in a vector store. We can do that by using the "from_documents" method on the vector store. This method takes in a list of documents and an embedding object, and creates an overall vector store. We can now use this vector store to find pieces of text similar to an incoming query. Let's look at the query "Please suggest a shirt with sunblocking". If we use the similarity search method on the vector store and pass in this query, we get back a list of documents. We can see that it returns four documents, and if we look at the first one, we can see that it is indeed a shirt about sun blocking.

So, how do we use this to do question answering over our own documents? First, we need to create a retriever from this vector store. A retriever is a generic interface that can be underpinned by any method that takes in a query and returns documents. Vector stores and embeddings are one such method, although there are plenty of others, some less advanced, some more advanced. Next, because we want to do text generation and return a natural language response, we're going to import a language model, and we're going to use ChatOpenAI. If we were doing this by hand, we would combine the documents into a single piece of text. So we'd do something like this, where we join all the page content in the documents into a variable and then pass this variable, along with a question like "Please list all your shirts with sun protection in a table in markdown and summarize each one.", into the language model. If we print out the response here, we can see that we get back a table exactly as we asked for.

All of those steps can be encapsulated with a LangChain chain. So here we can create a RetrievalQA chain. This does retrieval and then does question answering over the retrieved documents. To create such a chain, we'll pass in a few different things. First, we'll pass in the language model; this will be used for the text generation at the end. Next, we'll pass in the chain type. We're going to use "stuff", the simplest method, which just stuffs all the documents into the context and makes one call to the language model. There are a few other methods you can use for question answering that I'll touch on at the end, but we're not going to look at them in detail. Third, we're going to pass in a retriever. The retriever we created above is just an interface for fetching documents; it will be used to fetch the documents and pass them to the language model. And finally, we're going to set "verbose=True". Now, we can create a query and run the chain on this query.
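A sketch of these step-by-step pieces, again assuming the pre-1.0 LangChain API used in the course (including the older `call_as_llm` convenience method on the chat model), might look like this:

```python
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Embed a single sentence to see what an embedding looks like.
embed = embeddings.embed_query("Hi, my name is Harrison")
print(len(embed))    # over a thousand numbers (typically 1536 for OpenAI's ada-002 model)
print(embed[:5])     # each element is just a float

# Embed all the loaded documents and store them in the in-memory vector store.
db = DocArrayInMemorySearch.from_documents(docs, embeddings)

# Find documents similar to a query.
query = "Please suggest a shirt with sunblocking"
similar_docs = db.similarity_search(query)
print(len(similar_docs))   # returns the 4 most similar documents by default
print(similar_docs[0])

# Expose the vector store through the generic retriever interface.
retriever = db.as_retriever()
llm = ChatOpenAI(temperature=0.0)

# Doing it "by hand": join the retrieved documents and pass them with the question.
qdocs = "".join(d.page_content for d in similar_docs)
response = llm.call_as_llm(
    f"{qdocs} Question: Please list all your shirts with sun protection "
    "in a table in markdown and summarize each one."
)

# Or encapsulate retrieval + question answering in a RetrievalQA chain.
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",      # stuff all retrieved documents into one prompt
    retriever=retriever,
    verbose=True,
)
```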
When we get the response, we can again display it using the display and Markdown utilities. You can pause the video here and try it out with a bunch of different queries.

So that's how you do it in detail, but remember that we can still do it pretty easily with just the one line we had up above. These two approaches give the same result, and that's part of the interesting stuff about LangChain: you can do it in one line, or you can break it down into the five more detailed steps. The five detailed steps let you set more specifics about what exactly is going on, but the one-liner is easy to get started with. So it's up to you how you'd prefer to go forward.

We can also customize the index when we're creating it. If you remember, when we created it by hand, we specified an embedding. We can specify an embedding here as well, which gives us flexibility over how the embeddings themselves are created. We can also swap out the vector store for a different type of vector store. So the same level of customization you had when creating it by hand is also available when you create the index here.

We used the "stuff" method in this notebook. The stuff method is really nice because it's pretty simple: you just put all of the documents into one prompt, send that to the language model, and get back one response. So it's quite simple to understand what's going on, it's quite cheap, and it works pretty well. But it doesn't always work. If you remember, when we fetched the documents in the notebook, we only got four documents back and they were relatively small. But what if you wanted to do the same type of question answering over lots and lots of chunks? Then there are a few different methods we can use.

The first is "Map_reduce". This basically takes all the chunks, passes them along with the question to a language model, gets back a response, and then uses another language model call to summarize all of the individual responses into a final answer. This is really powerful because it can operate over any number of documents, and because the individual calls can be done in parallel. But it does take a lot more calls, and it does treat all the documents as independent, which may not always be what you want.

"Refine", another method, also loops over many documents, but it does so iteratively, building upon the answer from the previous document. This is really good for combining information and building up an answer over time. It will generally lead to longer answers, and it's also not as fast, because now the calls aren't independent; they depend on the result of previous calls. This means it often takes a good while longer and takes basically just as many calls as "Map_reduce".

"Map_rerank" is a pretty interesting and a bit more experimental one, where you do a single call to the language model for each document, ask it to also return a score, and then select the highest score. This relies on the language model knowing what the score should be, so you often have to tell it, "Hey, it should be a high score if it's relevant to the question," and really refine the instructions there. Similar to "Map_reduce", all the calls are independent, so you can batch them and it's relatively fast. But again, you're making a bunch of language model calls, so it will be a bit more expensive.
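To make the comparison concrete, here is a hedged sketch of running the detailed chain, the equivalent one-liner with a customized embedding, and swapping the chain type (same assumed older API; the `map_reduce` variant is shown purely as an illustration):

```python
query = ("Please list all your shirts with sun protection in a table "
         "in markdown and summarize each one.")

# The detailed, step-by-step chain built above.
response = qa_stuff.run(query)
display(Markdown(response))

# The one-line index version, customized with an explicit embedding model.
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,  # swap in a different vector store here
    embedding=embeddings,                    # control how the embeddings are created
).from_loaders([loader])
response = index.query(query, llm=llm)       # optionally pass the chat model explicitly

# Using a different chain type, e.g. map_reduce instead of stuff.
qa_map_reduce = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="map_reduce",
    retriever=retriever,
)
```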
The most common of these methods is the "stuff" method, which we used in the notebook and which combines everything into one prompt. The second most common is the "Map_reduce" method, which takes the chunks and sends them to the language model. These methods, stuff, map_reduce, refine, and map_rerank, can also be used for lots of other chains besides question answering. For example, a really common use case of the "Map_reduce" chain is summarization, where you have a really long document and you want to recursively summarize the pieces of information in it.

That's it for question answering over documents. As you may have noticed, there's a lot going on in the different chains we have here. In the next section, we'll cover ways to better understand what exactly is going on inside all of these chains.
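As a hedged illustration of that summarization use case (it isn't part of this notebook), a map_reduce summarization chain over a long document might look roughly like this, assuming the same older LangChain interfaces:

```python
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split a long document into overlapping chunks, summarize each chunk,
# then combine the chunk summaries into one final summary.
splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

summarize_chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = summarize_chain.run(chunks)
print(summary)
```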