In this lesson, we'll add a generation step using an LLM at the end of the search pipeline. This way, we can get an answer instead of search results, for example. This is a cool method to build apps where a user can chat with a document or a book or, as we'll see in this lesson, an article.

Large language models are great at many things. There are, however, use cases where they require some help. Let's take an example. Say you have a question like: are side projects important when you are starting to learn about AI? You can ask this to a large language model, and some of them might give you interesting answers, but what is more interesting is if you can ask an expert, or consult the writings of an expert. In this case, you could ask Andrew Ng, or consult some of Andrew's writings about a question like this. Luckily, we have access to some of Andrew's writings. On DeepLearning.AI, there's a newsletter called The Batch, and in it you can find a series of articles called "How to Build a Career in AI". It spans multiple articles. We'll use what we've learned in this course to search these articles and then generate an answer using a generative large language model.

Let's visualize this and describe exactly what we mean. We can ask a large language model a question, and it's able to answer many questions. But sometimes we want it to answer from a specific document or archive. This is where you can add a search component before the generation step to improve those generations. When you rely on a large language model's direct answer, you're relying on the world information it has stored inside of it. But you can provide it with context by running a search step beforehand. When you include that context in the prompt, it leads to better generations for cases where you want to anchor the model to a specific domain, article, document, or text archive in general. This also improves factual generation: in a lot of cases where you want facts to be retrieved from the model, augmenting it with context like this improves the probability of the model being more factual in its generation.

The difference between the two approaches is that, instead of just asking the question to a generative model and seeing the result it prints out, we first present the question to a search system, exactly like the ones we built earlier in this course. Then we retrieve some of those results, we place them in the prompt to the generative model in addition to the question, and we get a response that is informed by the context. We'll look at exactly how to do that in the code example next.

So this is our question. Let's build our text archive. For this use case, we'll simply open these articles and copy the text. We can just copy it and paste it into a variable we can call text, and dump all of them in there. We can copy three; so this is the second article. And here we have a variable that contains the text of three articles. You can do more; the series is a great read, and I think it's in seven or eight parts, but we can do this example with three. Next comes some familiar code that you've seen in the past to set up the environment, which we run here, and some more familiar code to import Cohere, because next we will be chunking this text, embedding it, and then building our semantic search index. So this is where we've chunked it. Let's look at what the text looks like now.
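Here's a minimal sketch of what that setup and chunking step might look like. The variable names ("text", "texts"), the paragraph-based split, and the environment-variable lookup are assumptions based on the narration rather than the exact notebook code.

```python
# Sketch of the setup and chunking step described above.
# Assumes the Cohere API key is available in the COHERE_API_KEY environment variable.
import os
import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

# Paste the text of the three articles into one string (truncated here).
text = """
The rapid rise of AI has led to a rapid rise in AI jobs...
"""

# Chunk the pasted articles into paragraph-sized pieces and drop empty lines.
texts = text.split("\n\n")
texts = [t.strip(" \n") for t in texts if t.strip()]
```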
Now, let's look at the first three examples. These are the first three chunks, starting with "The rapid rise of AI has led to a rapid rise in AI jobs" and "Three key steps for career growth". So these are three passages, three paragraphs from Andrew's articles. Next, we can proceed to setting up the Cohere SDK and embedding the texts. Here, we're sending them to embed and getting the embeddings back.

Now, let's build our semantic search index. We do a few imports; we've seen all of these before. This is Annoy, which is the vector search library, plus NumPy and Pandas. We will not be using regular expressions, but it's always good to have them handy when dealing with text in general. The same code goes here; to run through it, we're just turning the embeddings we got back into a NumPy array, so these are the vectors. We create a new vector index, insert the vectors into it, and then we build it and save it to file. Now we have our vector search index.

Let's now define a function. Let's call this one "search_andrews_article". We give it a query, and it runs a search on this dataset. The steps are exactly what we've seen in the past: we embed the query, we do a vector search against the archive, so we compare the query to the embedding of every paragraph in the text, and then we return the results. Now we can ask this search system a question, kind of like this: are side projects a good idea when trying to build a career in AI? I wonder what Andrew would say about this. Here we return the first result. It's a long paragraph, and it's the closest match to this question. If you look somewhere here, you'll see: "develop a side hustle, even if you have a full-time job. A fun project that may or may not develop into something bigger can stir the creative juices." So the answer is buried right in the middle of this big block of text. This is a great case for using a large language model: we can give it this text and have it extract the relevant piece of information for us. So let's do that next.

Instead of searching, we want to define a new function called "ask_andrews_article", and here we give it a question and, let's say, "num_generations=1". There are a few things to do here. The first step, before we do anything, is to search, so we get the relevant context from the article. We can take the top result; this is a design choice for you, whether you want to inject one result into the prompt, or two, or three, but we'll use one because that's the simplest thing to do here. The prompt that we use can look like this: "Excerpt from the article titled How to Build a Career in AI by Andrew Ng". This follows a general prompt engineering tip: the more context we provide to the model, the better it's able to tackle the task. Then, in here, we inject the context that we retrieved, so this is the paragraph from the article. Then we pose the question to it, and we give the instruction or command to the model: extract the answer from the text provided, and if it's not there, tell us that it's not available. That gives us the prompt we need to send to the model. Now that we have our prompt, we say prediction equals "co.generate", with "prompt=prompt" and "max_tokens", let's say 70, since some of these answers tend to be on the longer side. We want to use a model called command-nightly. This is the generation model from Cohere that is most rapidly updated, so if you're using command-nightly, you're using the latest models that are available on the platform.
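To make this walkthrough concrete, here's a sketch of the embed, index, search, and generate steps, continuing from the earlier setup (it reuses the "co" client and the "texts" chunks). The prompt wording, the Annoy settings, and the helper names follow the narration, but the details are assumptions rather than the course's exact notebook, and it assumes the Cohere SDK version that exposes "co.generate".

```python
# Sketch of the search-then-generate pipeline described in this lesson.
# Reuses `co` (Cohere client) and `texts` (list of paragraph chunks) from the setup sketch.
import numpy as np
from annoy import AnnoyIndex

# Embed the chunks and turn the embeddings into a NumPy array.
embeds = np.array(co.embed(texts=texts).embeddings)

# Build the vector index, insert the vectors, then build and save it.
search_index = AnnoyIndex(embeds.shape[1], 'angular')
for i, vec in enumerate(embeds):
    search_index.add_item(i, vec)
search_index.build(10)  # 10 trees
search_index.save('andrews_articles.ann')

def search_andrews_article(query):
    # Embed the query and compare it against every paragraph's embedding.
    query_embed = co.embed(texts=[query]).embeddings
    nearest_ids, _ = search_index.get_nns_by_vector(
        query_embed[0], 10, include_distances=True)
    return [texts[i] for i in nearest_ids]

def ask_andrews_article(question, num_generations=1):
    # Step 1: search, then take the top result as the context.
    context = search_andrews_article(question)[0]

    # Step 2: build the prompt with the retrieved excerpt and the question.
    prompt = f"""
Excerpt from the article titled "How to Build a Career in AI" by Andrew Ng:
{context}

Question: {question}

Extract the answer to the question from the text provided.
If the text doesn't contain the answer, reply that the answer is not available."""

    # Step 3: send the prompt to the generation model.
    prediction = co.generate(
        prompt=prompt,
        max_tokens=70,
        model='command-nightly',
        num_generations=num_generations)
    return prediction.generations
```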
So these tend to be somewhat experimental models, but they're the latest and generally the greatest. We can stop here; we're not using "num_generations" yet, but we'll use that later. Then we return "prediction.generations". That is our code.

Now let's pose exactly this question here, and instead of this being a search exercise, we want it to be a conversational exercise informed by search and posed to a language model. If we execute it, we get this answer: "Yes, side projects are a good idea when trying to build a career in AI. They can help you develop your skills and knowledge and can also be a good way to network with other people. However, you should be careful not to create a conflict with your employer and you should make sure that you're not violating any" and then we ran out of tokens here, so we can just increase the number of tokens if we want a longer answer.

So this is a quick demo of how that works. You can take it for a spin and ask it a few questions. Some of these might need a little bit of prompt engineering, but this is a high-level overview of these applications. There are a bunch of people doing interesting things like this, for example "ask the Lex Fridman podcast anything", which does exactly this flow: semantic search against the transcripts of the entire podcast. Somebody did that with Andrew Huberman's podcast as well. You see this with transcripts of YouTube videos, and with books as well. So this is a common thing that people are building with large language models, and it's usually powered by this one-two step of search and then generate, and you can throw a rerank in there to improve the search component as well. Feel free to pause here and try this for yourself: run the code up until this point and change the questions that you send to the model, or get another dataset that you're interested in. You don't always have to copy and paste the text; this is just a very quick example. You can use tools like LlamaIndex and LangChain to import text from PDFs if you want to work at a more industrial scale.

Now, remember this "num_generations" parameter. This is a cool tip for when you're developing and you want to test the behavior of the model on a prompt multiple times every time you hit the API. It's a parameter that we can pass to "co.generate", so we can say "num_generations=num_generations". Then, when we're asking the question here, we can say "num_generations=3". And we don't want to print just one result; we want to print multiple. What happens here is that this question is given to the language model, and the language model is asked to give us three different generations at the same time, not just one, so it runs them as a batch. Then we can say for gen in results, "print(gen)", gen for generation, basically. This print is just for us to see, because when you debug model behavior, you want to be able to quickly check whether the model is answering this question, or responding to this prompt, correctly multiple times, without having to run it one after the other. You can ask for three, or I think up to five, generations here. Each of these is a generation from the model, and they're all in response to the same prompt. That's one way for you to do prompt engineering and get a sense of the model's behavior in response to the prompt that you're using, at a glance, multiple times.
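As a usage example under the same assumptions, asking for three generations at once and printing each one might look like this, reusing the "ask_andrews_article" sketch from earlier:

```python
# Hypothetical usage of the ask_andrews_article sketch with multiple generations.
question = "Are side projects a good idea when trying to build a career in AI?"
results = ask_andrews_article(question, num_generations=3)

# Print each generation so we can compare how the model responds to the same prompt.
for gen in results:
    print(gen.text)
    print("-" * 30)
```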