by augmenting the query we send with an LLM. In this lesson, we're going to use a technique called cross-encoder re-ranking to score the relevancy of our retrieved results for the query that we sent. Let's dig in.

Re-ranking is a way to order and score results according to their relevancy to a particular query. So let's take a look at how this works underneath. In re-ranking, after you retrieve results for a particular query, you pass those results, along with your query, to a re-ranking model. This allows you to re-rank the output so that the most relevant results have the highest rank. Another way to think about this is that the re-ranking model scores each of the results conditioned on the query, and those with the highest scores are the most relevant. Then you can simply select the top-ranking results as the most relevant to your particular query.

So let's take a look at how to do this in practice. First, we import our helper functions as before, and we load the data into Chroma. One use of re-ranking is to get more information out of the long tail of query results. So let's take a look at a query that we've already covered once before: what has been the investment in research and development? So far we've been asking for five results in return for our query, but now we're going to ask for 10. That means we're going to get a longer tail of possibly useful results, and again we're going to include documents and embeddings. So let's retrieve the documents. Taking a look at what we get, we see the same first five results as before, because retrieval is deterministic, but we also have five new results, which might contain information relevant to our question. The trick is to figure out which of these results are actually relevant to our specific query, instead of just being the nearest neighbors in embedding space. And the way we do that is with cross-encoder re-ranking.

We're going to use the sentence transformers cross-encoder, and we're going to instantiate it with a particular model. So what is a cross-encoder model? Sentence transformers are made up of two kinds of models. There's something called a bi-encoder, which encodes the query and the documents separately, and then we use cosine similarity between those encodings to find the nearest neighbors. In contrast, a BERT cross-encoder takes both the query and a document together and passes them through a classifier, which outputs a score. In this way, we can use the cross-encoder to score our retrieved results: we pass in the original query together with each retrieved document, and use the resulting score as a relevancy or ranking score for that result.

So we've instantiated our cross-encoder. The first thing we're going to do is create pairs: each pair consists of our query and one of the retrieved documents. Then we simply ask the cross-encoder to score each pair. Let's print out our scores. While the first two documents have high scores for our query, we notice, first of all, that the second retrieved document actually has a much higher score than the first one, and also that some documents in the longer tail of retrieved results score higher than some of the documents in the first five. So what would it look like if we were to reorder our documents according to score?
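As a concrete reference, here's a minimal sketch of this single-query re-ranking flow. It assumes a `chroma_collection` already populated with the document chunks loaded earlier via the helper functions; the variable name and the MS MARCO MiniLM cross-encoder are typical choices, not ones fixed by the lesson.

```python
import numpy as np
from sentence_transformers import CrossEncoder

query = "What has been the investment in research and development?"

# Retrieve a longer tail of 10 results instead of the usual 5.
# `chroma_collection` stands for the collection populated earlier in the lesson.
results = chroma_collection.query(
    query_texts=[query],
    n_results=10,
    include=["documents", "embeddings"],
)
retrieved_documents = results["documents"][0]

# A small cross-encoder re-ranking model: it scores each (query, document) pair jointly.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

pairs = [[query, doc] for doc in retrieved_documents]
scores = cross_encoder.predict(pairs)
print("Scores:", scores)

# Reorder the results so the highest cross-encoder score comes first.
print("New ordering (original positions):")
for rank, idx in enumerate(np.argsort(scores)[::-1], start=1):
    print(rank, "->", idx + 1)
```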
We see that the second document is now ranked first, and the first document is ranked second. Something from the long tail actually makes it into the top five; in fact, the top five re-ranked results include what were originally the sixth and seventh results, while the original fourth and fifth results are now ranked quite low. So in this way, we've used the cross-encoder and the scores it produces to re-rank our results, and if we were to cut this down to the top five, the results should be much more relevant than what we had before, because we've mined more of the long tail for information that's actually relevant to our question.

Now, you might already see where I'm going with this. Given the number of results that we get with query expansion, and the way each generated query addresses a different part of a complex question, we can use the cross-encoder re-ranking technique to select only the best results for the original query from among the expanded queries' results, instead of just sending all of them to the LLM. Here's how we do that (a code sketch of this flow appears at the end of this section). From the previous lab, this is just our original query and the generated queries, which I've saved as text here for you. Then we do the same thing: we concatenate the original and generated queries together and retrieve the results, and, as last time, we deduplicate the retrieved results. Now we create pairs, just as we did in the previous example, where each pair is the original query and one of the retrieved documents. In this way, we can compute the relevancy of the results retrieved for the augmented queries to the original query, and select among them the five best that we actually want to pass to the LLM. So let's create those pairs and score them. One great thing about using a cross-encoder model like this one is that it's extremely lightweight and runs completely locally. So here are the scores of all of our retrieved results. We can use these scores to order our results and give us a new ordering, and then we can pass the top five of these new results to the LLM, getting the most relevant information out of the long tail of results retrieved for our expanded queries.

So, in this lab, we learned how to use a cross-encoder as a re-ranking model, and we've seen how we can apply re-ranking both to get more out of the long tail of a single query, and to filter the results of an expanded query down to only those relevant to the original query itself. This is a really powerful technique and worth experimenting with some more. It's also a good idea to build an intuition for how the re-ranking scores might change depending on the query, even when the results your retrieval system returns are the same. This is because the cross-encoder re-ranker can emphasize different parts of the query than the embedding model does, so the ranking it provides is much more strongly conditioned on the specific query than what is naively returned by the retrieval system itself.

In the next lab, we'll talk about query adapters. Query adapters are a way to directly alter or augment the query embedding itself, using user feedback or other types of data, to get better query results.
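And here is the sketch of the expansion-plus-re-ranking flow referenced above. The generated queries shown are illustrative placeholders, not the exact ones saved in the lab, and it again assumes the `chroma_collection` and cross-encoder model used earlier.

```python
import numpy as np
from sentence_transformers import CrossEncoder

# Placeholder original query and LLM-generated expansion queries
# (standing in for the ones saved as text in the lab).
original_query = "What were the most important factors that contributed to increases in revenue?"
generated_queries = [
    "What were the major drivers of revenue growth?",
    "Did any new products or services contribute to revenue?",
    "Did any acquisitions or partnerships increase revenue?",
]
queries = [original_query] + generated_queries

# Retrieve results for all queries at once, then deduplicate the documents.
results = chroma_collection.query(
    query_texts=queries,
    n_results=10,
    include=["documents"],
)
unique_documents = set()
for documents in results["documents"]:
    unique_documents.update(documents)
unique_documents = list(unique_documents)

# Score every retrieved document against the *original* query only.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pairs = [[original_query, doc] for doc in unique_documents]
scores = cross_encoder.predict(pairs)

# Keep the five highest-scoring documents to pass to the LLM.
top_indices = np.argsort(scores)[::-1][:5]
top_documents = [unique_documents[i] for i in top_indices]
```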