Knowledge Graphs for RAG - DeepLearning.AI

Let's get started. Here in the notebook, the first step, as usual, is to import some Python packages and then set up some global variables that we will use in the notebook. For all the queries that we want to send to Neo4j, we'll again use the LangChain integration called Neo4jGraph.
You already have chunk nodes. You'll want to create a new node to represent the 10-K form itself. This 10-K form node will have a Form label and the following properties. It'll have a form ID, which is a unique identifier for the form. It'll have a source property, which will be a link back to the original 10-K document from the SEC. There's also a CIK number, which is the Central Index Key from the SEC, and a CUSIP6 code.
Each of the chunks has the information that we need to create the form node. So, with a chunk in hand, just return one of those nodes, and then use this special notation to pull specific properties from it. You can see that we have all the information that we need here. When you create the form node, it'll be with a parameterized query. And if you remember, a parameterized query can take in a dictionary as one of the parameters. So here, we've got the perfect dictionary that we need for creating the form node. Let's save that into a Python variable. Perfect.
We'll now use that dictionary to create a form node. You can see down here, where we call the query, that we pass in a form info parameter whose value is the dictionary. That dictionary, under the name form info param, will be available inside the query. When we create the node, we'll set the names, the source, the CIK, and also the CUSIP6 ID based on the passed-in parameter.
As always, we'll do some sanity checking to make sure that we did the right thing. We'll do a match for all forms and return a count of those forms. We're expecting there to be only one. Form count is one, just as we wanted. Perfect.
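As a rough sketch of the step above, the form node can be created by passing one dictionary as a single query parameter. The property spellings (`formId`, `names`, `source`, `cik`, `cusip6`), the parameter name `formInfoParam`, and every value in the dictionary are assumptions made for illustration. `RecordingGraph` is a hypothetical stand-in that only records the call, so the example runs without a database; with LangChain's `Neo4jGraph`, the same `query(cypher, params=...)` call would be sent to Neo4j instead.

```python
# Cypher that creates (or reuses) a single Form node from one dictionary parameter.
# Property and parameter names are assumed spellings, not taken from the lesson text.
CREATE_FORM_CYPHER = """
MERGE (f:Form {formId: $formInfoParam.formId})
  ON CREATE SET
    f.names = $formInfoParam.names,
    f.source = $formInfoParam.source,
    f.cik = $formInfoParam.cik,
    f.cusip6 = $formInfoParam.cusip6
"""


class RecordingGraph:
    """Hypothetical stand-in for a graph client: records queries instead of running them."""

    def __init__(self):
        self.calls = []

    def query(self, query, params=None):
        self.calls.append((query, params or {}))
        return []


def create_form_node(graph, form_info):
    """Send the parameterized query, exposing form_info inside it as $formInfoParam."""
    return graph.query(CREATE_FORM_CYPHER, params={"formInfoParam": form_info})


# Placeholder values only; a real run would read these from one of the chunk nodes.
form_info = {
    "formId": "example-form-id",
    "names": ["Example Corp"],
    "source": "https://example.com/10-K",
    "cik": "0000000",
    "cusip6": "000000",
}

graph = RecordingGraph()
create_form_node(graph, form_info)
sent_query, sent_params = graph.calls[0]
```

Using `MERGE` rather than `CREATE` is a design choice in this sketch: running the cell twice would still leave one form node, which is what the count check expects.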
Our goal is to add relationships to improve the context around each chunk. You will connect chunks to each other and to that newly created form node. The result will reflect the original structure of the document.
You can start by creating a linked list of nodes for each section. First, let's just find all the chunks that belong together. So we'll match all the chunks that come from the same form ID, and we're going to pass in the form ID as a parameter. Here, you only have one form, but I'm showing you a query that will work even if you had chunks from multiple forms. The form ID is passed in as a query parameter. It is then used in the WHERE clause to check that the chunks have the same form ID.
Looking at the result, you can see that each of these comes from the same form ID. They have different chunk IDs and different sequences. So that looks pretty good. In particular, notice that we're returning the chunk sequence ID, which seems to be incrementing: 0, 1, 2, 3, and 4. That looks good. We're going to be using that to make sure that we have all the chunks in the right order.
Let's change this query to make sure that we actually have them all in order. Okay, looking at the results, they're still all from the same form. That's good. And if we look at the sequence ID to make sure it's in order, we see zero and then another zero. Okay, so that's not good. Why is that? Looking more closely, we can see that we've got chunks from different sections of the form. Here, this is from section 7A and this is from section 7, and they both have a chunk with sequence ID zero. That's not what we want. We just want a sequence of chunks from the same section.
So now, let's refine this a little bit more. We've added a new query parameter called the F10K item parameter. That's the name of the section that we want the chunks to be from. Okay, checking the results again, everything is from the same form. That's good.
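The filtering and ordering described above can be modeled in plain Python. This is only a sketch of the logic, not the Cypher from the notebook: the dictionary keys (`formId`, `f10kItem`, `chunkSeqId`, `chunkId`), the sample values, and the helper names are all assumptions made for illustration.

```python
# Each dict models a Chunk node; keys and values are assumed, illustrative only.
chunks = [
    {"formId": "form-a", "f10kItem": "item7", "chunkSeqId": 0, "chunkId": "item7-chunk0000"},
    {"formId": "form-a", "f10kItem": "item1", "chunkSeqId": 1, "chunkId": "item1-chunk0001"},
    {"formId": "form-a", "f10kItem": "item7a", "chunkSeqId": 0, "chunkId": "item7a-chunk0000"},
    {"formId": "form-a", "f10kItem": "item1", "chunkSeqId": 0, "chunkId": "item1-chunk0000"},
    {"formId": "form-a", "f10kItem": "item1", "chunkSeqId": 2, "chunkId": "item1-chunk0002"},
]


def chunks_in_order(chunks, form_id, f10k_item=None):
    """Filter by form (and optionally by section), then sort by sequence ID."""
    selected = [
        c
        for c in chunks
        if c["formId"] == form_id and (f10k_item is None or c["f10kItem"] == f10k_item)
    ]
    return sorted(selected, key=lambda c: c["chunkSeqId"])


def linked_list_pairs(ordered_chunks):
    """Pair each chunk with its successor, which is what a linked list stores."""
    ids = [c["chunkId"] for c in ordered_chunks]
    return list(zip(ids, ids[1:]))


# Filtering by form alone mixes sections, so sequence ID 0 repeats.
whole_form_seq = [c["chunkSeqId"] for c in chunks_in_order(chunks, "form-a")]

# Adding the section filter gives one clean, increasing sequence to link.
item1 = chunks_in_order(chunks, "form-a", "item1")
item1_seq = [c["chunkSeqId"] for c in item1]
item1_links = linked_list_pairs(item1)
```

Comparing `whole_form_seq` with `item1_seq` shows why the section filter matters: only the second is a sequence that can be linked end to end.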
In the results, every chunk is also from the same form item (item one, item one, item one), and we've got the sequence we want: 0, 1, 2, 3, incrementing. Perfect. There's a new line here, and the slash-slash at the end is just a way of putting a comment at the end of a line. Cool. And the chunks are in order by their sequence ID.
We can check the graph schema to see that there's a new relationship type. We see that we have nodes with properties, and the relationships are the following: chunks are connected by a NEXT relationship to other chunks. Perfect. Because we set avoid duplicates to true, even if we try to do item 1 again, we're not going to create a new linked list.
Next, you can connect the chunks to the form they're part of. Match a chunk and a form where they have the same form ID, then merge a new PART_OF relationship between them. You can see we've created 23 new relationships. As a reminder, we're working with a small collection of chunks here to keep the notebook running smoothly. In the full sample 10-K form, there are several hundred chunks.
You can add one more relationship to the graph, connecting the form to the first chunk of each section. This is similar to the previous query, but it also checks that the chunk has sequence ID 0. The SECTION relationship that connects the form to the first chunk will also get an F10K item property, which takes its value from the chunk. You can see that happening in the merge here. This is a kindness for humans looking at the knowledge graph, enabling them to easily navigate from a form to the beginning of a particular section. We've got four sections, so of course we created four SECTION relationships.
We can now try some example Cypher queries to explore the graph. For example, you can get the first chunk of a section using a pattern match from a form to the chunk connected by a SECTION relationship.
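The two form-to-chunk relationships described above, and the lookup of a section's first chunk, can be modeled in plain Python. This is a sketch of the matching rules only, not the notebook's Cypher. The dictionary keys, sample values, and helper names are assumptions made for illustration.

```python
# Chunk and Form nodes modeled as dicts; keys and values are illustrative assumptions.
chunks = [
    {"formId": "form-a", "f10kItem": "item1", "chunkSeqId": 0, "chunkId": "item1-chunk0000"},
    {"formId": "form-a", "f10kItem": "item1", "chunkSeqId": 1, "chunkId": "item1-chunk0001"},
    {"formId": "form-a", "f10kItem": "item7", "chunkSeqId": 0, "chunkId": "item7-chunk0000"},
]
forms = [{"formId": "form-a"}]


def part_of_edges(chunks, forms):
    """One (chunkId, formId) edge for every chunk whose formId matches a form."""
    form_ids = {f["formId"] for f in forms}
    return [(c["chunkId"], c["formId"]) for c in chunks if c["formId"] in form_ids]


def section_edges(chunks, forms):
    """One edge per section: form -> first chunk, carrying the chunk's f10kItem."""
    form_ids = {f["formId"] for f in forms}
    return [
        {"formId": c["formId"], "f10kItem": c["f10kItem"], "chunkId": c["chunkId"]}
        for c in chunks
        if c["formId"] in form_ids and c["chunkSeqId"] == 0
    ]


def first_chunk_of_section(edges, form_id, f10k_item):
    """Follow the section edge that matches both the form and the section name."""
    for edge in edges:
        if edge["formId"] == form_id and edge["f10kItem"] == f10k_item:
            return edge["chunkId"]
    return None


part_of = part_of_edges(chunks, forms)
sections = section_edges(chunks, forms)
first_item7 = first_chunk_of_section(sections, "form-a", "item7")
```

Copying the section name onto the edge is what lets the lookup filter on the relationship itself, without inspecting every chunk.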
To get the first chunk of a section, you'll use the WHERE clause to make sure that the form has the form ID we want and that the relationship is for the section we want. We're going to pass those in as query parameters. There's the chunk ID for the chunk that is the first one in the section.
With information about the first chunk, you could then get the next chunk in the section by following the NEXT relationship. And you can see that this section continues to talk about our favorite company, NetApp. To sanity check the work that we've been doing, we'll take a look and make sure that we actually do have two chunks that are in sequence. And this one is chunk 0001. Perfect.
To find a window of chunks, you can use a pattern match using the NEXT relationships between three nodes. You only need to specify the chunk ID of one of the chunks. Because of the relationships, you know you will get all three chunks back. Here you can see that we've got chunks C1, C2, and C3, and that they've got chunk IDs ending in 0000, 0001, and 0002. This is great! You're starting to see the advantage of having a graph. Once you have a place to start in the graph, you can easily get to connected information. With a RAG application, you might discover a node using semantic search.
With a graph, information is stored in nodes and relationships. There's also information in the structure of the graph. You just matched a pattern with three nodes and two relationships. The thing you found is called a path. Paths are powerful features of a graph, famous in algorithms like finding the shortest path between two nodes. You can capture the entire matching path as a variable by assigning it at the beginning of the pattern, like this. Paths are measured by the number of relationships in the path. For a three-node path, we'd expect the length to be 2. Let's take a look. Perfect. You'll be using paths in the next couple of queries. It's going to be a lot of fun.
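Here is a small sketch of the three-chunk window and its path length. The Cypher string shows one way such a pattern could be written; the variable names, the `chunkId` property, and the `$chunkIdParam` parameter are assumed spellings. The plain-Python functions below it model the same idea on a list, with placeholder chunk IDs, so the example runs without a database.

```python
# Cypher sketch: capture the whole matching path in `window` and measure it.
# Names here are assumptions for illustration, not confirmed by the lesson text.
CHUNK_WINDOW_CYPHER = """
MATCH window = (c1:Chunk)-[:NEXT]->(c2:Chunk)-[:NEXT]->(c3:Chunk)
  WHERE c2.chunkId = $chunkIdParam
RETURN length(window) AS windowPathLength
"""

# A section's linked list, modeled as chunk IDs in NEXT order (placeholder IDs).
section_chain = ["chunk0000", "chunk0001", "chunk0002", "chunk0003"]


def window_of_three(chain, middle_id):
    """Return [previous, middle, next], or None when either neighbor is missing."""
    i = chain.index(middle_id)
    if i == 0 or i == len(chain) - 1:
        return None
    return chain[i - 1 : i + 2]


def path_length(nodes):
    """A path's length is its number of relationships: one fewer than its nodes."""
    return len(nodes) - 1


window = window_of_three(section_chain, "chunk0001")
window_length = path_length(window)
```

Only the middle chunk's ID is supplied; its neighbors come from the structure of the chain, just as they come from the relationships in the graph.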
Notice that the chunk window pattern is around the second chunk in the list, taken from the next chunk info. What happens to the pattern match if we look for a window around the first chunk instead? Let's see. Change next chunk info into first chunk info, and then let's run. Nothing comes back. That's because the first chunk doesn't have a previous chunk, as required by the pattern.
We can change the pattern to look for what's called a variable-length path. With a variable-length path, you can specify a range of relationships to match. You can see the notation here where we specify the relationship type. It's a colon, the relationship type, then an asterisk, and then the range. By using a variable length both at the beginning and at the end of this pattern, we can match the boundary conditions of a linked list, whether you're looking at the very first item in the linked list or the very last.
Notice that when we ran the query, two patterns were actually matched. The first has a length of zero and the second has a length of one, meaning that it has two nodes and one relationship. If we're going to look for a window of chunks around a node, we want the longest possible path. Let's see how to do that.
This is like the query that we just did. But now we're going to look for the longest chunk window by ordering all those paths by their length, descending, and limiting that to just one. That should be the longest path that matches the pattern. As we would have hoped, the longest path has length one. So that looks correct.
This is a pretty good time to pause the video and try out variations on this query. For example, you might want to try looking for two chunks to either side. Try different variations and see what you get.
You can now create a question and answer chain. If you look at this Cypher query here, this is an extension of the vector search Cypher.
What you see at the beginning is that there are two variables that get passed in, node and score. Those come from the vector similarity search itself. We take those, and then we return that extra text prepended to whatever the text of the node, or chunk, was. We're going to call that text, return the score, and then also return some metadata about the result. This is the smallest bit of Cypher that we can run that will illustrate what happens.
Let's go ahead and build a LangChain workflow that uses this query. The new part that I'll highlight here is that we're passing in a parameter called retrieval query. For that parameter, we're going to pass in the Cypher query that we just defined above. That's what's going to extend the vector search with whatever extra bit of querying we want to do.
Okay. Well, apparently I know about a lot of things. Not only do I know about Cypher, I also know about natural disasters and catastrophes. But we know this isn't the case, right? So we might be able to change the question a little bit. Maybe: what single topic does Andreas know about?
Okay, you now know how to customize the results of a vector search by extending it with Cypher. You could use this capability to expand the context around a chunk with the chunk window query. Let's try this out and compare the results both with the extra window and without it, using just the chunk that the vector search finds.
First, create a chain that uses the default Cypher query included with Neo4jVector. Call it windowless chain. Now, create another chain that uses the chunk window query. Your goal here is to expand the context with adjacent chunks, which may be relevant to providing a complete answer. To do that, use the chunk window query, then pull out the text from each chunk that is in the window. Finally, all that text will be concatenated together to provide a complete context for the LLM. With that query in hand, we can now create another vector store.
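The window expansion described above can be modeled in plain Python: collect every window that reaches zero or one chunk before and after the matched chunk, keep the longest, and join the text. This sketches the logic only and is not the retrieval query itself. The chunk IDs, the text values, the separator, and the helper names are assumptions made for illustration.

```python
# A section's linked list in order; IDs and text are placeholders.
section_chain = [
    {"chunkId": "chunk0000", "text": "first"},
    {"chunkId": "chunk0001", "text": "second"},
    {"chunkId": "chunk0002", "text": "third"},
]


def candidate_windows(chain, center_id, max_hops=1):
    """All windows reaching 0..max_hops chunks before and after the center chunk."""
    i = next(k for k, c in enumerate(chain) if c["chunkId"] == center_id)
    windows = []
    for before in range(max_hops + 1):
        for after in range(max_hops + 1):
            start, end = i - before, i + after
            # Windows that would run off either end of the list simply do not match.
            if start >= 0 and end < len(chain):
                windows.append(chain[start : end + 1])
    return windows


def longest_window(chain, center_id, max_hops=1):
    """Keep the candidate with the most chunks, i.e. the longest path."""
    return max(candidate_windows(chain, center_id, max_hops), key=len)


def window_text(window, separator=" \n "):
    """Concatenate the text of every chunk in the window into one context string."""
    return separator.join(c["text"] for c in window)


first_window = longest_window(section_chain, "chunk0000")
middle_window = longest_window(section_chain, "chunk0001")
context = window_text(middle_window)
```

Raising `max_hops` to 2 is the list-based equivalent of looking for two chunks to either side.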
Notice that as we create this vector store, we pass the chunk window query in as the retrieval query. Okay, let's give that a try and compare the results with the chunk window and without it. When we've run both of these chains, we'll do a little extra formatting to make the output nicer to read.
That seems like a pretty good summary. Okay, not bad. Let's try with the chunk window. Okay, you can see that these two answers are pretty similar. The one difference that I can spot is that with the expanded context, it actually highlighted NetApp's Keystone, which is their premier product.
You can pause here to try out variations of the chunk window query. Also, try asking different questions to see what you get. When you're ready, join me in the next lesson, where you will expand the context with data from an additional SEC form that contains information about NetApp's investors.
