In this video we'll talk about sampling. We'll get into the details of it and how it works across multiple iterations. Before we talk about how to train this neural network, let's talk about sampling, or what we do with the neural network after it's trained, at inference time. What happens is you have that noise sample. You put it through your trained neural network, which has learned what a sprite looks like, and what it does is predict noise. It predicts noise as opposed to the sprite, and then we subtract that predicted noise from the noise sample to get something a little bit more sprite-like. Now, realistically, that is just a prediction of noise, and it doesn't fully remove all the noise, so you need multiple steps to get high-quality samples. After 500 iterations, we're able to get something that looks very sprite-like.

So now let's step through this algorithmically. First, you sample a random noise sample; that's the original noise you start with. Then you step through time, and you're actually stepping backwards through time, all the way from the last iteration, 500, where it's completely noisy, down to one. Just think of your ink drop: you're going backwards in time, from fully diffused all the way back to when it was first dropped into the water. Next, you'll sample some extra noise. We'll touch on this in a minute, so don't worry about it just yet. Here is where you actually pass that original noise, that sample, back into your neural network and get some predicted noise. And this predicted noise is the noise the trained neural network wants to subtract from the original noise to get something that looks more sprite-like. Finally, there's a sampling algorithm called DDPM, which stands for Denoising Diffusion Probabilistic Models, from a paper written by Jonathan Ho, Ajay Jain, and one of my good friends, Pieter Abbeel. This sampling algorithm essentially computes a few scaling factors. Those aren't super important; what is important is that this is where you actually subtract the predicted noise from the original noise sample. And again, you add a little extra noise back in, which we'll return to in a moment.

All right, let's jump to the notebook. You'll see some setup code here. All that's really important is that we're importing PyTorch and a lot of utilities from PyTorch. We also import some helper functions we had written for the neural network. I'm just going to hit shift-enter to run that cell so that we import everything. Next is the setup for the neural network we're going to use for sampling. We'll go into the details of this later, so I'm just going to run it; no need to follow everything that's going on there just yet. Here we're setting up some hyperparameters, and that includes those time steps you've seen: the 500 time steps. Beta1 and beta2 are just some hyperparameters for DDPM, and here you can also see the height; this is the 16 by 16 image, and again, it's just a square image. So I'm going to run this, shift-enter again. And this is the noise schedule defined in the DDPM paper. All a noise schedule does is determine what level of noise to apply to the image at a given time step. This part is constructing some of the parameters for the DDPM algorithm, the scaling factors you'll remember from earlier.
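To make that concrete, here's a minimal sketch of the kind of linear noise schedule the DDPM paper describes, in PyTorch. The variable names (b_t, a_t, ab_t) and the beta endpoints are illustrative assumptions, not necessarily the exact names or values used in the notebook:

```python
import torch

# Hyperparameters described above; the beta endpoints are typical DDPM
# defaults and are assumed here for illustration.
timesteps = 500             # the 500 time steps
beta1, beta2 = 1e-4, 0.02   # endpoints of the linear noise schedule (assumed values)

# Linear noise schedule: beta_t grows from beta1 to beta2 as t goes from 0 to T.
b_t = torch.linspace(beta1, beta2, timesteps + 1)  # beta_t for t = 0 .. T
a_t = 1.0 - b_t                                    # alpha_t
ab_t = torch.cumprod(a_t, dim=0)                   # alpha_bar_t, cumulative product of the alphas
ab_t[0] = 1.0                                      # t = 0 means no noise at all
```

These per-timestep tensors are where the scaling values used during sampling come from.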
Those scaling values, S1, S2, and S3, are computed here in the noise schedule. And it's called a schedule because it depends on the time step. Remember, you're looping through 500 time steps because you're going through those 500 iterations of slowly removing noise. So I'm just going to run that here; the scaling is just dependent on the time step we're on. Next, I'm going to instantiate the model, that UNet, which we will come back to. And then here is the sampling algorithm, the "denoise_add_noise" function you saw previously. The most important part is that it removes the predicted noise, which is what the model thinks is not a sprite, from the original noise. So we can run that, shift-enter, and shift-enter again to load the model. And then here is the sampling algorithm we just stepped through. Specifically, here is running the model to get the predicted noise, and then running the denoise step. Now let's visualize what sampling looks like over time. This may take a few minutes, depending on what kind of hardware you're running on, and we're going to speed this up in the video. In the next video, you'll also see a more efficient sampling technique. All right, let's see it in action. Wow, look at those sprites! So you should definitely pause and try these yourself.

Alright, so there's one more extra detail. Right now you have your neural network that's predicting noise from your original noise sample. You subtract it out. Great. And you get something a little bit more sprite-like. But the thing is, this neural network expects a noisy sample, a normally distributed noisy sample, as input. And once you've denoised it like this, it's no longer distributed in that way. So what you have to do after each step, and before the next one, is add in additional noise, scaled based on what time step you're at, and pass that in as the sample for the next iteration of your trained neural network. Empirically, this actually helps stabilize the neural network so it doesn't collapse to something close to the average of the dataset, meaning it doesn't look like this thing on the left. When we don't add that noise back in, the neural network just produces these average-looking blobs of sprites, whereas when we do add it back in, it's able to produce these beautiful images of sprites. So here in the algorithm is where this happens: we sample random noise again at each time step, and then down here we add it back in with that scaling factor S3.

So now let's take a look at the notebook. In this function "denoise_add_noise", we're now talking about the "add noise" part. What that is, is this Z that you randomly sample; that's the extra noise. You scale it by some factor, and then you actually do add it back in. And again, that all happens down here in your main algorithm. All right, so let's pick up where we left off. For the incorrect way, where we don't add the noise back in, we actually just set Z to zero and pass that in. It only subtracts the predicted noise from the original noise and doesn't add any extra noise back in. Let's run this with shift-enter. And again, this will take a couple of minutes. All right, so let's take a look at what this does instead. Oh no, blobs. So this is obviously not what you want. So make sure you add that extra noise back in.
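To tie the pieces together, here's a rough sketch of what a denoise-and-add-noise step and the surrounding sampling loop could look like, reusing the schedule tensors from the sketch above. The trained network nn_model, the way it takes the normalized time step, and the sample shape are all assumptions for illustration, not the notebook's exact code:

```python
import torch

# b_t, a_t, ab_t come from the noise-schedule sketch above.
# nn_model is the trained noise-prediction network and is assumed to exist.

@torch.no_grad()
def denoise_add_noise(x, t, pred_noise, z=None):
    """One reverse step: subtract the scaled predicted noise from x,
    then add back a little fresh noise z, scaled for time step t (the S3 term)."""
    if z is None:
        z = torch.randn_like(x)
    noise = b_t[t].sqrt() * z
    mean = (x - pred_noise * ((1 - a_t[t]) / (1 - ab_t[t]).sqrt())) / a_t[t].sqrt()
    return mean + noise

# Sampling loop: start from pure noise and walk backwards through time.
n_sample, height = 4, 16                                # four 16x16 samples (illustrative)
samples = torch.randn(n_sample, 3, height, height)      # completely noisy at t = T
for i in range(timesteps, 0, -1):
    t = torch.tensor([i / timesteps])[:, None, None, None]  # normalized time step for the model
    z = torch.randn_like(samples) if i > 1 else 0            # no extra noise on the very last step
    eps = nn_model(samples, t)                               # predicted noise to remove
    samples = denoise_add_noise(samples, i, eps, z)          # subtract it, then add scaled noise back
```

Setting z to zeros at every step of this loop is exactly the "incorrect" variant described above, and it's what produces those average-looking blobs.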
So you should definitely pause and try this yourself, and compare the results to the other method, where you do add that extra noise back in. And in the next video, we're going to cover that neural network architecture, the UNet.