I think text embeddings are really fun to play with, so let's dive in and take a look at some examples of text embeddings. Here's my empty Jupyter notebook. If you're running this locally on your own computer, you need to have the Google Cloud AI Platform library installed, so you'd run pip install google-cloud-aiplatform. I already have that on this computer, so I don't need to do that.

Let me start by authenticating myself to the Google Cloud AI Platform. I'm going to use this authenticate function, which is a helper function I'm using just for this particular Jupyter notebook environment to load my credentials and the project ID. If you want, you can print out these objects. The project ID is just a string that specifies what project you're on. I'm going to run my commands using a server in the region us-central1. Then, lastly, I import vertexai and initialize it by specifying my project ID, the region where my API calls are served, and the credentials, which contain the secret authentication key that authenticates me to the Vertex AI platform. If you set up your own account on Google Cloud, there are a few steps needed to register an account, then set up a project and pull out the project ID, which becomes the string that you copy in here. Then you can select the region; us-central1 will work fine for many people, or you can choose a region closer to wherever you are. We also have an optional Jupyter notebook later that steps through in detail how to get your own credentials for the Google Cloud platform. For now, I'd say don't worry about this; this is all you need to get through this short course. And if you want to run this locally on your own machine, you can follow the instructions in that later optional notebook to figure out how to get your own credentials, project ID, and so on.

The main topic for this short course is using text embedding models, so I'm going to import the text embedding model like so. Next, I'm going to specify this particular textembedding-gecko@001 model, which is the model we'll use today. What this command does is save an embedding model here. Now, to actually compute an embedding, this is what you do: set embedding equal to a call to the embedding model to get an embedding. Let's start off with the single-word string "life". Next, let's set vector equal to embedding[0].values; this just extracts the values out of the embedding. Let's print the length of vector and the first 10 elements. So here, vector is a 768-dimensional vector, and these are the first 10 elements of the embedding. Feel free to print out more elements if you want to take a look at all these numbers.

So, what we just did was take the single word "life", really the text string with the word life, and compute an embedding of it. Let's look at a different example. I can also pass in a question, "What is the meaning of life?", take this text string, and compute its embedding. Once again, we end up with a 768-dimensional vector that captures 768 different, sort of, features for this sentence, and these are the first 10 elements. Because each of these embeddings is a lot of numbers, it's difficult to look at them to understand what they mean. But it turns out one of the most useful applications of these embeddings is deciding how similar two different sentences, phrases, or paragraphs of text are.
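Here's a minimal sketch of the setup and first embedding calls described above. The authenticate helper is specific to this course's notebook environment (I'm assuming a utils module that provides it); if you're running elsewhere, substitute your own credentials and project ID.

```python
# If running locally and the SDK isn't installed yet:
# pip install google-cloud-aiplatform

import vertexai
from vertexai.language_models import TextEmbeddingModel

# Course-specific helper (assumed): loads stored credentials and the project ID.
from utils import authenticate
credentials, PROJECT_ID = authenticate()

REGION = "us-central1"
vertexai.init(project=PROJECT_ID, location=REGION, credentials=credentials)

# Load the embedding model used in this lesson.
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

# Embed a single word...
embedding = embedding_model.get_embeddings(["life"])
vector = embedding[0].values
print(len(vector))   # 768
print(vector[:10])   # first 10 of the 768 values

# ...and a full question.
embedding = embedding_model.get_embeddings(["What is the meaning of life?"])
print(embedding[0].values[:10])
```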
So let's take a look at some more examples and how similar different embeddings are. For this, I'm going to use scikit-learn's cosine_similarity measure. What this does is basically take two vectors, normalize them to have length one, and then compute their dot product. This gives one way to measure how similar two different 768-dimensional, or really any other dimensional, vectors are. I'm going to compute three embeddings. For the first sentence, I'm going to embed "What is the meaning of life? Is it 42, or is it something else?" If you don't know the 42 reference, it's a reference to one of my favorite novels; you can search online for the number 42 if you're interested. Let's also embed "How does one spend their time well on Earth?", which, you know, seems a little bit like asking what the meaning of life is. And then sentence three is "Would you like a salad?" I hope the meaning of my life is much more than eating salads, so hopefully sentence three has maybe a little bit, but not too much, to do with sentence one. Then, similar to above, let's just pull out the vectors of these embeddings. Now, let me compute and print out the similarity of all three pairs of sentences. So, let me add this over here and rerun it. We see that the similarity between vec_1 and vec_2, the first two sentences, is the highest at 0.655. So "What is the meaning of life?" is judged by this embedding to be more similar to "How does one spend their time well on Earth?" The similarity between sentences 2 and 3 is 0.52, and between 1 and 3 it's 0.54. So this accurately judges that the first two sentences are more similar in meaning than 1 and 3 or 2 and 3, and it accomplishes this even though there are no words in common between the first sentence and the second sentence.

What I'd encourage you to do is pause this video and, in the Jupyter notebook on the left, type in some other sentences. Maybe write some sentences about your favorite programming language or your favorite algorithm, and maybe your favorite animals or your favorite weekend activities; plug in a few different sentences and see if it accurately judges whether different sentences are more or less similar to each other.

I do want to point out one thing, which is that you might notice these numbers all look like they're in a pretty similar range. Cosine similarity in theory can range anywhere from -1 to 1. But it turns out that because these are very high-dimensional vectors, 768-dimensional vectors, the cosine similarity values you get will tend to fall within a relatively narrow range; you probably won't ever get values near 0 or near 1.0. But even though these numbers fall in a relatively narrow range, the relative values between them are still very helpful. And again, if you plug in different sentences and play with this yourself, hopefully you'll get a better sense of what these similarity measures are like.
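As a rough sketch, the comparison above might look like this in the notebook, reusing the embedding_model from the earlier setup (the exact sentence wording and the printed values may differ slightly from what's shown in the video):

```python
from sklearn.metrics.pairwise import cosine_similarity

in_1 = "What is the meaning of life? Is it 42 or is it something else?"
in_2 = "How does one spend their time well on Earth?"
in_3 = "Would you like a salad?"

embedding_1 = embedding_model.get_embeddings([in_1])
embedding_2 = embedding_model.get_embeddings([in_2])
embedding_3 = embedding_model.get_embeddings([in_3])

# cosine_similarity expects 2-D inputs, so wrap each vector in a list.
vec_1 = [embedding_1[0].values]
vec_2 = [embedding_2[0].values]
vec_3 = [embedding_3[0].values]

print(cosine_similarity(vec_1, vec_2))  # highest: both ask about the meaning of life
print(cosine_similarity(vec_2, vec_3))
print(cosine_similarity(vec_1, vec_3))
```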
Let's take a deeper look at why sentence embeddings are, I think, more powerful than word embeddings. Let's look at two different inputs. The first input is "The kids play in the park." You know, during recess, the kids play in the park. And the second input is "The play was for kids in the park." So, someone puts on a play, that is, a show, for a bunch of kids to watch.

If you were to remove what are called stop words, words like "the", "in", "for", and "was" that are often perceived to carry less semantic meaning in English, you really end up with an identical set of three words from both sentences: kids, play, park and play, kids, park. Now, let's compute the embedding of the words in the first input. I'm going to do a little bit of data wrangling in a second, so I'm going to import the NumPy library. Then, let me use this code snippet to call the embedding model on the first input: kids, play, park. The rest of this code, using an iterator and then np.stack, is just a little bit of data wrangling to reformat the outputs of the embedding model into a 3-by-768 array. So that just takes the three embeddings and puts them in an array like that. If you want, feel free to pause the video and print out the intermediate values to see what this is doing. Now, let me do this as well for the second input, so embedding_array_2 is another 3-by-768 array. There are three rows because there are three embeddings, one for each of the three words.

One way that many people used to build sentence-level embeddings is to take these three embeddings for the different words and average them together. So, to get the embedding for my first input, "the kids play in the park" after stop word removal, that is, kids, play, park, I'm going to take embedding_array_1 and take the mean along axis 0; that just averages across the three words we have. And, you know, I do the same for my second embedding. If I then print out the two embedding vectors, not surprisingly, you end up with the same values. Because these two lists have exactly the same words, when you embed the words and then average the embeddings of the individual words, you end up with the same values. Here I'm printing out just the first four elements of each array; feel free to check that all 768 elements are identical. In contrast, if you were to call the embedding model on the original input sentences like so, then if you print out the values of the embeddings, you can see that they're very different. That's because the embedding model, in addition to not ignoring stop words, common words like is, a, of, the, is also much more sophisticated in understanding word order, so it understands that the semantics, the meaning, of "the kids play in the park" is very different from "the play was for kids in the park". So, I do strongly encourage you to pause this video and play with this yourself. Plug in different sentences, see what embeddings you get, look through the lines of code, make sure they make sense to you, and play with these embeddings.
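Here's a rough sketch of what those steps might look like in the notebook, assuming the same embedding_model as before; the variable names are illustrative and may not match the notebook exactly.

```python
import numpy as np

in_1 = "The kids play in the park."
in_2 = "The play was for kids in the park."

# After stop word removal, both inputs contain the same three content words.
in_pp_1 = ["kids", "play", "park"]
in_pp_2 = ["play", "kids", "park"]

# Embed each word separately and stack the results into a 3 x 768 array.
embeddings_1 = [emb.values for emb in embedding_model.get_embeddings(in_pp_1)]
embedding_array_1 = np.stack(embeddings_1)

embeddings_2 = [emb.values for emb in embedding_model.get_embeddings(in_pp_2)]
embedding_array_2 = np.stack(embeddings_2)
print(embedding_array_1.shape)  # (3, 768)

# Averaging the word embeddings gives identical vectors for the two inputs...
emb_1_mean = embedding_array_1.mean(axis=0)
emb_2_mean = embedding_array_2.mean(axis=0)
print(emb_1_mean[:4])
print(emb_2_mean[:4])

# ...whereas embedding the full sentences gives clearly different vectors,
# because the model takes word order (and stop words) into account.
embedding_1 = embedding_model.get_embeddings([in_1])
embedding_2 = embedding_model.get_embeddings([in_2])
print(embedding_1[0].values[:4])
print(embedding_2[0].values[:4])
```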
So, before wrapping up this video, just to go over the key pieces of syntax we used in this course: you saw me use import vertexai. Vertex AI is the name of the Google Cloud machine learning platform. Then we used vertexai.init, which requires specifying the project ID, which references your Google Cloud project; the location where the service will run, so which data center this runs in; and the secret credentials for authentication. After setting up Vertex AI, this was the syntax to get an embedding: we import the TextEmbeddingModel class, specify the textembedding-gecko@001 model and load it into embedding_model, and then simply call get_embeddings on a piece of text.

With that, I hope you'll pause this video, go back to the Jupyter notebook, and plug in other pieces of text. Write something fun, or write something not fun if you insist, but plug in different pieces of text, see what embeddings and what results you get, and I hope you have fun with it. When you're done, let's go on to the next video, where we'll dive into a deeper conceptual understanding of what embeddings are and how they work. I look forward to seeing you in the next video.