In this lesson, you'll bring in a second SEC dataset to expand the context of the original filing forms. This second set of forms provides information about institutional investment managers and the interests they hold in companies. By adding this data to the graph, you'll be able to ask more complex questions of the combined datasets to help you understand market dynamics. Let's take a look.
You will start in the usual way by importing some libraries, then defining some global variables, and of course you need a Neo4j graph instance to connect to the Neo4j database.
SEC Form 13 is filed by institutional investment management firms to report which public companies they have invested in. The forms are available as XML files. During data preparation, particular fields were pulled out of the XML and added as a row in a CSV file. To begin, you can read the CSV file using csv.DictReader, which parses each row and turns it into a dictionary, using the CSV header row for keys.
Let's take a quick look at what the rows look like, maybe just the first five. Okay, you can see that each of these firms has invested in the same company. If you look at the company name, there's NetApp, and there's NetApp again. All of these management companies have different names, but they're all investors in NetApp. You can see that there are details about the firm itself: the manager name, the address of the manager, and the central index key, or CIK. There's also information about the particular investment they've made: the report calendar or quarter, the value, and the number of shares. Here, the value is a monetary value.
Let's see how many rows there are. We'll just check the length. Okay, there are 561 rows, so we'll expect 561 management companies to be created.
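As a rough illustration of the loading step, the sketch below parses a tiny in-memory CSV with csv.DictReader. The column names and both sample rows are made up for the example; the real Form 13 file has its own header and 561 rows.

```python
import csv
import io

# Made-up sample standing in for the Form 13 CSV; the column names
# here are assumptions, not necessarily the real file's header.
SAMPLE_CSV = """managerCik,managerName,managerAddress,reportCalendarOrQuarter,cusip6,cusip,companyName,value,shares
1,Example Manager A,"1 Example St, Springfield",2023-06-30,ABC123,ABC123456,EXAMPLE CO,98765.0,1200
2,Example Manager B,"2 Sample Ave, Shelbyville",2023-06-30,ABC123,ABC123456,EXAMPLE CO,4321.0,50
"""


def read_form13_rows(csv_file):
    """Turn each CSV row into a dict keyed by the header row."""
    return list(csv.DictReader(csv_file))


all_form13s = read_form13_rows(io.StringIO(SAMPLE_CSV))

# Peek at the first few rows, then check how many were read.
for row in all_form13s[:5]:
    print(row["managerName"], "invested in", row["companyName"])
print(len(all_form13s))
```

With a real file, the same function would take the object returned by open(path, newline="") instead of the io.StringIO wrapper. Note that every value comes back as a string, so numeric fields such as the value and the share count need converting before they are used as numbers.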
The management company nodes will have a Manager label. They'll be unique based on the central index key from the SEC, and they'll also have a manager name property. The company nodes will have a Company label. They will be unique based on the CUSIP 6 identifier. The company nodes will also get a company name and the full CUSIP number as properties from the Form 13 data.
Let's start by creating the company nodes. Merge in the company nodes with a Company label that is unique by the CUSIP 6 identifier. On creation, we set the company name and then the CUSIP number itself. A quick bit of sanity checking: we expect that the company that was created is NetApp. Of course NetApp is there, perfect.
You already have a Form 10-K for NetApp in the knowledge graph. You can match the newly created company node to the related Form 10-K by finding the pair based on the CUSIP 6 identifier, and then just return those two nodes. You can run that match again, but this time take values from the form and pull them over to the company node to enrich it. And we'll take one more step: with that same pairing from company to form, we'll connect the two nodes with a relationship recording that the company filed the form.
Let's create the investment manager nodes next. We've got a manager with a Manager label, and we want it to be unique based on the manager CIK number. On creation, we set the manager name and the manager address. As before, we'll pass in a dictionary as the manager parameter, which is the query parameter used inside this query to create the nodes. We'll do this first just for the first Form 13 row. And there's the manager node. We've got the name, Royal Bank of Canada, and their address. Looks good. There will be many management companies, up to 561, right?
Also, we can create a full-text index on the manager nodes. The full-text index is useful for keyword search.
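The node-creation and index steps in this part of the lesson can be sketched roughly as follows. The property names (cusip6, managerCik and so on), the index name fullTextManagerNames, and the run_query callable are all assumptions made for illustration; run_query stands in for whatever executes Cypher in your setup, such as the graph instance's query method.

```python
# Cypher sketches; label, property, and index names are assumptions.
MERGE_COMPANY = """
MERGE (com:Company {cusip6: $cusip6})
  ON CREATE SET com.companyName = $companyName,
                com.cusip = $cusip
"""

MERGE_MANAGER = """
MERGE (mgr:Manager {managerCik: $managerParam.managerCik})
  ON CREATE SET mgr.managerName = $managerParam.managerName,
                mgr.managerAddress = $managerParam.managerAddress
"""

CREATE_MANAGER_NAME_INDEX = """
CREATE FULLTEXT INDEX fullTextManagerNames IF NOT EXISTS
FOR (mgr:Manager) ON EACH [mgr.managerName]
"""

# Direct keyword lookup against the full-text index.
SEARCH_MANAGER_NAMES = """
CALL db.index.fulltext.queryNodes("fullTextManagerNames", $searchText)
YIELD node, score
RETURN node.managerName AS managerName, score
"""


def company_params(row):
    """Pick out the three values the company MERGE needs."""
    return {
        "cusip6": row["cusip6"],
        "companyName": row["companyName"],
        "cusip": row["cusip"],
    }


def create_nodes(rows, run_query):
    """Merge one Company and one Manager per row, then add the index.

    run_query is a hypothetical callable taking (cypher, params).
    MERGE makes repeated companies harmless: an existing node is reused.
    """
    for row in rows:
        run_query(MERGE_COMPANY, company_params(row))
        run_query(MERGE_MANAGER, {"managerParam": row})
    run_query(CREATE_MANAGER_NAME_INDEX, {})
```

Because MERGE only creates a node when no match exists, running this over all rows should leave a single company node when every row refers to the same company, and one manager node per distinct CIK.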
Where a vector index lets you search based on similar concepts, a full-text index lets you search based on similar-looking strings. You can try querying the full-text index directly, the same way that you can directly query the vector index. That query returns a node and a score, very much like the vector search does. If there's a match, we return the node's manager name and the score.
Now just use Python to loop through all the rows. In params, the manager parameter is set to each Form 13 row in turn. We know that's a dictionary because the Form 13 data is a list of dictionaries. Again, sanity checking is always a good idea. We're expecting 561. Perfect.
You can now find pairs of manager nodes and company nodes using information from the Form 13 CSV. You can see in this query that we pass in a query parameter called investment parameter. Cool. If you remember our first row, it was Royal Bank of Canada, and of course they invested in NetApp, so this all looks correct. You can find a manager node and the company they invested in. That's great.
You can now connect those nodes together. This is something you've done before in the course, but the query gets a little long, so let's go through it one line at a time. The first line is the exact same match we had before, so now we have a manager and the related company that they've invested in. We'll merge from that manager through an OWNS_STOCK_IN relationship over to the company. We want each OWNS_STOCK_IN relationship to be unique in case a manager has made multiple investments. If you recall, looking back at that CSV file, some of the rows had a report calendar or quarter value. Let's use that as the unique property on the OWNS_STOCK_IN relationship we're about to create. The property will be called report calendar or quarter, and we'll grab it from the query parameter. Okay. And now, you've seen this before: this is ON CREATE.
Okay, let's close that out. When we call this with kg.query, the parameter that gets passed in is called owns param. That's what we've used throughout here. And we'll use just the first Form 13 row for now. Okay, before we run this, it looks like I've missed one thing. When it's created, we want this to be the owns relationship. Okay. There's our good friend Royal Bank of Canada, and NetApp. Awesome. We'll just run a quick query to sanity check again and make sure that the relationship actually exists. We'll grab the relationship here in the pattern. Cool.
Having done that once, you can now loop through all the rows of the CSV file to create an OWNS_STOCK_IN relationship for each one. Of course, the company will always be NetApp. Another quick check to make sure that we did the right thing: we're expecting that 561 relationships were created. Perfect.
We have changed the knowledge graph quite a bit from when we first started. Let's take a look at the schema of the knowledge graph. We can do that by refreshing the schema on the knowledge graph and then just printing it out. We'll take advantage of textwrap here to try to get some good formatting. Okay, first the nodes that we've got inside of the knowledge graph. You can see that we've got a chunk node with its properties. And over here, there are the form node properties. There's the manager that we created, the company we created, and all their properties. That's awesome.
The bottom half is relationships. We can go down to here and see that chunks have a next relationship to other chunks. That's the linked list of chunks. You can also go from the form through a section relationship to a chunk. That's how we found the beginning of the linked list, right? Finally, we've got that the manager owns stock in a company and that the company has filed a form. All that together ends up being the knowledge graph we just created.
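The relationship-creation query walked through above might look something like this sketch. The relationship type OWNS_STOCK_IN, the property names, the parameter name ownsParam, and the run_query callable are assumptions for illustration rather than the lesson's exact code.

```python
# Cypher sketch; names are assumptions. The report period is part of
# the MERGE pattern, so it is what makes each relationship unique.
MERGE_OWNS_STOCK_IN = """
MATCH (mgr:Manager {managerCik: $ownsParam.managerCik}),
      (com:Company {cusip6: $ownsParam.cusip6})
MERGE (mgr)-[owns:OWNS_STOCK_IN {
    reportCalendarOrQuarter: $ownsParam.reportCalendarOrQuarter
}]->(com)
  ON CREATE SET owns.value = toFloat($ownsParam.value),
                owns.shares = toInteger($ownsParam.shares)
RETURN mgr.managerName AS manager,
       com.companyName AS company,
       owns.shares AS shares
"""

# Sanity check: how many ownership relationships exist afterwards.
COUNT_INVESTMENTS = """
MATCH (:Manager)-[owns:OWNS_STOCK_IN]->(:Company)
RETURN count(owns) AS investments
"""


def connect_investments(rows, run_query):
    """Merge one relationship per row, then return the count result.

    run_query is a hypothetical callable taking (cypher, params).
    """
    for row in rows:
        run_query(MERGE_OWNS_STOCK_IN, {"ownsParam": row})
    return run_query(COUNT_INVESTMENTS, {})
```

Putting the report period inside the MERGE pattern and the value and share count under ON CREATE means a re-run with the same rows should match the existing relationships instead of duplicating them.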
What I love about graphs is that they are awesome for exploration. Let's have some fun looking around to see what we can find. To begin, find a random chunk to use in later queries. Cool, there's our chunk ID. Let's store that so we can use it later. You can see that the result is a list with a dictionary inside. So first, let's grab the first item out of the list, so now we have just the dictionary. From that first chunk row, we'll grab the chunk ID and store it in ref chunk ID for later queries.
We'll take one more step in this pattern, and we'll return the company name. As you'd expect, it's our good friend NetApp. Okay, we'll extend that one more step. Here, com is the company. All these variables have to match up: this is one big pattern broken up into three sections. We'll return the company name and a count of how many managers, which we'll call the number of investors, have invested in that company. This is great validation. Of course the company is NetApp, and as expected, there are 561 investors in that company. We've done good work.
You're starting to see some of the fun things that are possible with the knowledge graph. That pattern you just created, from a chunk all the way to an investor, is useful information. You can use it to expand the context provided to an LLM. For example, you can find investors for a company. You'll use the same match you had before, the same pattern. The rest is mostly just concatenating strings together. But here, we're taking the value, and since we know it represents dollars, we want it to be nicely formatted. Let's save that into results and see if we can pull out the sentences so you can read them a bit better. Okay, looking at just the first sentence we created, you can see that this company with the fun name owns a lot of shares of NetApp, at a value of, well, gosh, quite a lot as well.
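To make the sentence-building step concrete, here is a minimal sketch. The lesson assembles the sentence inside the Cypher query by string concatenation; this version instead returns the raw fields and does the formatting in Python, which is a swapped-in approach for illustration. The relationship types PART_OF and FILED, the property names, and the parameter name chunkIdParam are assumptions.

```python
# One pattern in three sections: chunk -> form, company -> form,
# manager -> company. Names are assumptions for illustration.
INVESTMENTS_FOR_CHUNK = """
MATCH (:Chunk {chunkId: $chunkIdParam})-[:PART_OF]->(f:Form),
      (com:Company)-[:FILED]->(f),
      (mgr:Manager)-[owns:OWNS_STOCK_IN]->(com)
RETURN mgr.managerName AS manager,
       owns.shares AS shares,
       com.companyName AS company,
       owns.value AS value
LIMIT 10
"""


def investment_sentence(record):
    """Turn one result record into a readable sentence.

    Shares and value get thousands separators; value is treated as
    whole dollars.
    """
    shares = int(record["shares"])
    dollars = int(float(record["value"]))
    return (
        f"{record['manager']} owns {shares:,} shares of "
        f"{record['company']} at a value of ${dollars:,}"
    )


# Made-up record shaped like one row of the query result.
example = {
    "manager": "Example Manager A",
    "shares": 1200,
    "company": "EXAMPLE CO",
    "value": 98765.0,
}
print(investment_sentence(example))
```

Doing the formatting in Python keeps the query simple and makes it easy to experiment with different sentence templates before feeding them to an LLM.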
You see the kinds of things you can do with pattern matching and turning the values from those patterns into sentences. Let's put that to work inside of a RAG workflow. We'll set up two different LangChain chains as we did before. One will be just a regular vector lookup, and the other will add the investment information as extra context. We'll test them both out on some questions.
The first chain, which we'll call plain chain, is the one that doesn't have anything other than the strict vector search. Then we define a Cypher query. This pattern should look familiar at this point. We start from a particular node; remember, this comes from the vector search, which gives us a node that is similar to the question that was asked. We go from that node, which is part of a form. From the form, we know that it was filed by some company, and there's a manager who owns stock in that company. The arrows point in the other direction, so you have to read those backwards this time. From the original node, we also take the score, the manager, the owns relationship, and then the company. We order all of that by the investment shares, descending, and limit it to 10.
As in previous lessons, we'll create a new vector store, but now one that uses this Cypher query. Let's start with an obvious question, since we know this is all about NetApp: in a single sentence, tell me about NetApp. We'll use the plain chain that we just defined, the one that just does the vector search. We'll run that first. You can see here that NetApp is a global cloud-led company. Yeah, this all makes sense. Let's try the same thing with the investment chain and see if that extra context changes the summary of what NetApp does. That actually looks pretty similar, which isn't unexpected. The LLM ignored the extra information about investments because we didn't really ask about that.
Let's try a question that does ask about investors. Again, we'll start with plain chain.
That's only going to do the vector search, and we'll see what answer we get. Okay, it thinks that the investors are a diversified customer base. It's just kind of putting things together here, trying to come up with some kind of an answer. So now, we'll try that same question, but with the investment chain.
This is actually a great place to start tinkering a little. You can change a couple of different things: change the sentences that we're creating out of the investments and see how that affects what you get, or change the question that you're asking here at the end and see how different prompts adjust what you get out of the LLM. There's still a bit of an art involved in getting the LLM to understand the information you're giving it. We'll explore a lot more of that in lesson seven, so let's go to that next.
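For tinkering, the retrieval query described in this lesson might look something like the sketch below. It assumes the node and score variables are supplied by the vector search, that the query must return text, score, and metadata columns (the shape LangChain's Neo4jVector expects from a custom retrieval_query), and that the relationship types and property names match your graph; all of those names are assumptions. The small Python helper only mirrors what the query's text column would contain, so you can experiment with sentence wording without a database.

```python
# Raw string so that \n reaches Cypher as an escape sequence.
INVESTMENT_RETRIEVAL_QUERY = r"""
MATCH (node)-[:PART_OF]->(f:Form),
      (f)<-[:FILED]-(com:Company),
      (com)<-[owns:OWNS_STOCK_IN]-(mgr:Manager)
WITH node, score, mgr, owns, com
  ORDER BY owns.shares DESC LIMIT 10
WITH collect(
       mgr.managerName + " owns " + toString(owns.shares) +
       " shares in " + com.companyName +
       " at a value of $" + toString(toInteger(owns.value)) + "."
     ) AS investment_statements, node, score
RETURN reduce(acc = "", s IN investment_statements | acc + s + "\n")
         + node.text AS text,
       score,
       {source: node.source} AS metadata
"""


def combine_context(investment_statements, chunk_text):
    """Python mirror of the query's text column: one investment
    sentence per line, followed by the chunk's own text."""
    return "".join(s + "\n" for s in investment_statements) + chunk_text


# Made-up statements to preview what the LLM would receive.
preview = combine_context(
    ["Example Manager A owns 1200 shares in EXAMPLE CO at a value of $98765."],
    "Text of the matching chunk.",
)
print(preview)
```

Changing the sentence template inside collect, or the LIMIT, is one way to see how the extra context shifts the answers from the investment chain.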