In this video, we'll discuss how to train the UNet and get it to predict noise. The goal of the neural network is to predict noise, and in doing so it really learns the distribution of what is noise on the image, but also what is not noise: what is sprite-likeness. The way we do that is we take a sprite from our training data and add noise to it. We give the noised sprite to the neural network and ask it to predict that noise. Then we compare the predicted noise against the actual noise that was added to the image, and that's how we compute the loss. That loss backpropagates through the neural network, so the network learns to predict the noise better.

So how do you determine what this noise is? You could step through time and sampling in order, giving the network each noise level one after another. But realistically, during training we don't want the network to be looking at the same sprite at the same noise level all the time. Training is more stable if the network sees different noise levels across different sprites within an epoch, so the coverage is more uniform. So what we actually do is randomly sample a time step, get the noise level appropriate to that time step, add that noise to the image, and have the neural network predict it. We then take the next sprite image in our training data and again sample a random time step. It could be totally different, like you see here. We add the corresponding noise to that sprite, and again the network predicts the noise that was added. This results in a much more stable training scheme.

So what does training actually look like? Here is a wizard hat sprite, and here is what a noised input would look like. When you first put it into the neural network at epoch 0, the network hasn't really learned what a sprite is yet. So the predicted noise doesn't quite capture what was added, and when it's subtracted out, the result looks about the same as the input. But by the time you get to epoch 31, the network has a better understanding of what this sprite looks like, so the noise it predicts, when subtracted from the input, produces something that does look like the wizard hat sprite. Cool, so that was for one sample. Here is what that looks like for multiple different samples, multiple different sprites, across many epochs. As you can see, in this first epoch the results are quite far from sprites, but by the time you get to epoch 32 here, they look quite like little video game characters, and even a bit before that.

Alright, so now we'll go through the training algorithm with some code. First you want to sample a training image. Here we are loading all the data into the data loader, and wrapping it in a progress bar so that we can visualize it. We then iterate through all of the data samples, so "x" here is a training image. Within this loop, we sample a time step "t", and this determines the level of noise. We're not going through all the time steps, just sampling a single "t". We sample a noise, we add that noise to the image based on that time step, and then we input that noised image into the neural network. We also pass in the time step, because we want that time embedding added in, and the neural network predicts the noise as output.
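Putting that whole loop together, including the loss computation described next, here's a minimal sketch of what the training step could look like in PyTorch. This is an illustration under assumptions, not the notebook's exact code: names like `perturb_input`, `nn_model`, `ab_t` (the precomputed cumulative noise-schedule products), `optim`, `dataloader`, and `timesteps` stand in for whatever the notebook actually defines.

```python
import torch
import torch.nn.functional as F

# Forward-diffusion helper (assumed name): adds the right amount of noise
# for time step t, following x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * noise,
# where ab_t is the cumulative product over the noise schedule.
def perturb_input(x, t, noise, ab_t):
    return (
        ab_t.sqrt()[t, None, None, None] * x
        + (1 - ab_t[t, None, None, None]).sqrt() * noise
    )

# One epoch of the training loop; nn_model, optim, dataloader, device,
# ab_t, and timesteps are assumed to be set up earlier in the notebook.
for x, _ in dataloader:                     # x: batch of sprite images
    optim.zero_grad()
    x = x.to(device)

    # Sample a random time step per image, and matching Gaussian noise.
    t = torch.randint(1, timesteps + 1, (x.shape[0],), device=device)
    noise = torch.randn_like(x)

    # Noise the images, then ask the network to predict that noise.
    # The (normalized) time step is passed in for the time embedding.
    x_pert = perturb_input(x, t, noise, ab_t)
    pred_noise = nn_model(x_pert, t / timesteps)

    # Mean squared error between predicted and actual noise, then backprop.
    loss = F.mse_loss(pred_noise, noise)
    loss.backward()
    optim.step()
```

Note that sampling `t` uniformly per image means each batch mixes many noise levels, which is exactly what makes this scheme more stable than sweeping through the time steps in order.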
Comparing that predicted noise with the noise we actually added, we can compute the loss using mean squared error, MSE. Then all we have to do is backprop and learn, and the model learns what is noise and what is sprite.

Cool, so on to our training notebook. This is all the same as before: just hitting Shift+Enter, setting things up with the UNet. What's interesting here are our training hyperparameters: a batch size of 100, 32 epochs, and our learning rate. I'm just going to hit Shift+Enter there to run it. Again, similar setup here for the model and the noise schedule with those scaling parameters.

Now here's where you get into training. You load your dataset, and we're loading it into that data loader here; it's this 16x16 sprites dataset. We're also loading up our optimizer. And here's the function that perturbs our input, meaning it takes the image, adds the right noise level for that specific time step, and returns the result. So I can hit Shift+Enter here. Here we won't actually step through training, because it takes many hours on CPU, and that's where these notebooks are hosted. But I really recommend that you go through this; it's the exact code we just looked at together. What we did instead is train this model and save it at different epochs, so that you can run sampling and see how it does at each epoch.

So this is, again, the same sampling code you saw before; I'm just going to breeze through it. And here is where you would load the model for epoch 0: this is the path to that model checkpoint, model_0, for epoch 0. I'm going to load that model, and then here you can just visualize the samples (a sketch of what this load-and-sample step could look like appears at the end of this section). Again, this runs the same sampling method as before, DDPM, which you learned in the previous video. This takes a couple of minutes, and we'll speed it up in the video.

Great! So we can hit play here. A bit amorphous still, but, you know, it's starting to capture the general outlines of sprites; it's not pure noise. We also have epoch 4 here for you to see, so you can see the model improving: these look a little bit more like sprites. And epoch 8, a little bit more; I see some books in there, actually. And finally, epoch 31, or this might actually be 32 when we index from zero. These look a lot more like sprites. You can see a sword here, this is probably the wizard hat, and I see a potion here. But of course, there are still some blobs here and there, and some people here, so it's not perfect, and it could keep going. In the next video, you'll get to control what you generate, meaning you can tell it to generate objects or these people, for example.
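For reference, loading one of those saved checkpoints and sampling from it could look something like the sketch below. The file path, the `sample_ddpm` helper, and its `n_sample` argument are assumptions standing in for the notebook's actual names; only the overall pattern of load, switch to eval mode, and sample is the point here.

```python
import torch

# Load the weights saved at a given epoch (the path and file name are hypothetical).
nn_model.load_state_dict(torch.load("weights/model_0.pth", map_location=device))
nn_model.eval()

# Generate samples with the DDPM sampler from the previous video
# (assumed helper; it runs the denoising steps from t = T down to 1).
with torch.no_grad():
    samples = sample_ddpm(n_sample=32)
```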