In this video, we'll go over the neural network architecture and how we can incorporate additional information into it. The neural network architecture that we use for diffusion models is a UNet. And the most important thing you need to know about a UNet is that it takes this image as input and produces as output something of the same size as that image; here, that output is the predicted noise.

UNets have been around for a long time, since 2015, and they were first used for image segmentation. For example, a UNet can take an image and segment it into classes such as pedestrian or car, so they're used a lot in self-driving car research. What's special about UNets is that their input and output are the same size. A UNet first embeds information about its input, down-sampling with a lot of convolutional layers into an embedding that compresses all that information into a small amount of space. It then up-samples with the same number of up-sampling blocks back to the output size for its task, and in this case that task is to predict the noise that was applied to this image. If you want to look a little deeper, which we'll do together, each of the named blocks in this diagram also appears in the code with the same name. And the predicted noise has the same dimensions as the original input image: 16x16x3.

What's also great about this UNet is that it can take in additional information. So it has compressed the image to understand what's going on, but it can also take in more information. And what information do we want to include? Well, one thing that's really important for these models is the time embedding. This is an embedding that tells the model what the time step is, and therefore what noise level we need. All you have to do for this time embedding is embed the time step into some kind of vector, and you can add it into these up-sampling blocks.

Another piece of information that can be useful is a context embedding. We'll do more with this later, but all the context embedding does is help you control what the model generates, for example with a text description (you really want it to be Bob) or some kind of factor (it needs to be a certain color). We'll discuss this a bit more later. And for the context embedding, you can just multiply it in.

Great, so what does that look like in code? Here you can see a context embedding (this is just one of them), and then here you see the time embedding. And in the up-sampling block, all you have to do, again, just like in the diagram, is multiply the context embedding with the up-sampling block's output and add the time embedding; there's a small sketch of this below.

Cool, so now in the notebook, in the forward pass of the model, that is, running the model, you can see the down blocks and then also the up blocks, and again, here are your context and time embeddings. We have two of each here, one pair for each of those up blocks; the forward pass is also sketched below. How these down and up blocks are defined is up here, in the initialization of the UNet. And for the down path, this is what a UnetDown block looks like. We actually do have these in our helper functions if you want to look at them in greater detail, but they are just convolutional blocks (a sketch of one follows below). And in the next video, you'll learn how to train this neural network.
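To make the embeddings concrete, here's a minimal PyTorch sketch. It's not the course's exact helper code; the module name EmbedFC, the layer sizes, and the dimensions are illustrative. It shows a small MLP that turns a time step (or a context vector) into an embedding shaped so it can broadcast over an up-sampling block's feature map, followed by the "multiply context in, add time in" fusion described above:

```python
import torch
import torch.nn as nn

class EmbedFC(nn.Module):
    """Tiny MLP that turns a scalar time step (or a context vector)
    into an embedding that can broadcast over a feature map."""
    def __init__(self, input_dim: int, emb_dim: int):
        super().__init__()
        self.input_dim = input_dim
        self.emb_dim = emb_dim
        self.model = nn.Sequential(
            nn.Linear(input_dim, emb_dim),
            nn.GELU(),
            nn.Linear(emb_dim, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.view(-1, self.input_dim)
        # Reshape to (batch, emb_dim, 1, 1) so it broadcasts over (batch, C, H, W).
        return self.model(x).view(-1, self.emb_dim, 1, 1)

# Example: fuse the embeddings into an up-sampling block's features.
B, C, H, W = 4, 64, 8, 8
up_feat = torch.randn(B, C, H, W)           # output of an up-sampling block
temb = EmbedFC(1, C)(torch.rand(B, 1))      # time embedding (noise level)
cemb = EmbedFC(10, C)(torch.randn(B, 10))   # context embedding (e.g. a class vector)
fused = cemb * up_feat + temb               # multiply context in, add time in
```

The reshape to (batch, channels, 1, 1) is what lets a single embedding vector scale or shift every spatial location of the feature map at once.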
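And here's how those pieces might fit together in a forward pass. This is a simplified sketch of the down / bottleneck / up structure described above, not the notebook's exact model: it reuses the EmbedFC sketch from the previous block, and the channel counts and skip connections are chosen just to make the shapes work out for a 16x16x3 input.

```python
class TinyContextUnet(nn.Module):
    """Simplified UNet: down-sample to a compressed bottleneck, then
    up-sample back, injecting context (multiply) and time (add) embeddings."""
    def __init__(self, in_ch: int = 3, n_feat: int = 64, ctx_dim: int = 10):
        super().__init__()
        self.init_conv = nn.Sequential(nn.Conv2d(in_ch, n_feat, 3, padding=1), nn.GELU())
        # Down-sampling path: 16x16 -> 8x8 -> 4x4
        self.down1 = nn.Sequential(nn.Conv2d(n_feat, n_feat, 3, padding=1), nn.GELU(), nn.MaxPool2d(2))
        self.down2 = nn.Sequential(nn.Conv2d(n_feat, 2 * n_feat, 3, padding=1), nn.GELU(), nn.MaxPool2d(2))
        self.to_vec = nn.Sequential(nn.AvgPool2d(4), nn.GELU())  # 4x4 -> 1x1 bottleneck
        # One (context, time) embedding pair per up-sampling block
        self.ctxembed1, self.timeembed1 = EmbedFC(ctx_dim, 2 * n_feat), EmbedFC(1, 2 * n_feat)
        self.ctxembed2, self.timeembed2 = EmbedFC(ctx_dim, n_feat), EmbedFC(1, n_feat)
        # Up-sampling path: 1x1 -> 4x4 -> 8x8 -> 16x16, with skip connections
        self.up0 = nn.Sequential(nn.ConvTranspose2d(2 * n_feat, 2 * n_feat, 4, 4), nn.GELU())
        self.up1 = nn.Sequential(nn.ConvTranspose2d(4 * n_feat, n_feat, 2, 2), nn.GELU())
        self.up2 = nn.Sequential(nn.ConvTranspose2d(2 * n_feat, n_feat, 2, 2), nn.GELU())
        self.out = nn.Conv2d(2 * n_feat, in_ch, 3, padding=1)

    def forward(self, x, t, c):
        h0 = self.init_conv(x)  # (B, F, 16, 16)
        d1 = self.down1(h0)     # (B, F, 8, 8)
        d2 = self.down2(d1)     # (B, 2F, 4, 4)
        hv = self.to_vec(d2)    # (B, 2F, 1, 1): the compressed embedding
        u0 = self.up0(hv)       # (B, 2F, 4, 4)
        # Multiply the context embedding in, add the time embedding in, then
        # up-sample together with the skip connection from the matching down block.
        u1 = self.up1(torch.cat([self.ctxembed1(c) * u0 + self.timeembed1(t), d2], dim=1))
        u2 = self.up2(torch.cat([self.ctxembed2(c) * u1 + self.timeembed2(t), d1], dim=1))
        return self.out(torch.cat([u2, h0], dim=1))  # predicted noise, same size as x

x = torch.randn(4, 3, 16, 16)            # a batch of noised 16x16x3 images
t = torch.rand(4, 1)                     # time steps (noise levels)
c = torch.randn(4, 10)                   # context vectors
noise_pred = TinyContextUnet()(x, t, c)  # same shape as x: (4, 3, 16, 16)
```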
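Finally, here's a minimal sketch of what a UnetDown-style stage might look like, assuming it's just convolutional blocks followed by a max-pool as described above; the exact layers in the course's helper functions may differ.

```python
class UnetDown(nn.Module):
    """One down-sampling stage: convolutional blocks followed by a
    max-pool that halves the spatial resolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.model = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.GELU(),
            nn.MaxPool2d(2),  # halve height and width
        )

    def forward(self, x):
        return self.model(x)

# A 16x16 feature map comes out as 8x8 with more channels:
feat = UnetDown(64, 128)(torch.randn(4, 64, 16, 16))  # -> (4, 128, 8, 8)
```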