All right, you made it to our last lesson. This one covers some practical considerations for getting started, some practical tips, and a bit of a sneak preview of more advanced training methods.

First, some practical steps to fine-tuning. Just to summarize: figure out your task, collect data related to your task's inputs and outputs, and structure it as such. If you don't have enough data, no problem, you can generate some or use a prompt template to create more. Then fine-tune a small model first. I recommend something in the 400 million to 1 billion parameter range, just to get a sense of where the performance is with that model. You should also vary the amount of data you give the model, to understand how much the data actually influences where the model goes. Then evaluate your model to see what's going well or not, and finally, collect more data to improve the model based on that evaluation. There's a code sketch of these steps a bit further down.

From there, you can increase your task complexity, so you can make the task much harder, and you can also increase the model size to get the performance you need on that more complex task.

In terms of tasks to fine-tune on, you learned about reading tasks and writing tasks. Writing tasks are a lot harder. These are the more expansive tasks like chatting, writing emails, and writing code, and that's because the model has to produce more tokens, so it's a harder task for the model in general. Harder tasks tend to need larger models to handle them. Another way a task gets harder is when it's a combination of tasks rather than just one, for example asking an agent to be flexible and do several things at once, or in one step as opposed to multiple steps.

So now that you have a sense of the model size you need for your task complexity, there's also a compute requirement, basically the hardware you need to run your models. In the labs, you ran 70 million parameter models on CPU. They weren't the best models out there, and I recommend starting with something a bit more performant in general. In the first row of this table, I want to call out a "1 V100" GPU, which is available on AWS, for example, and on any other cloud platform. It has 16 gigabytes of memory, which means it can run a 7 billion parameter model for inference. Training needs far more memory than inference, because it has to store the gradients and the optimizer state, so for training that same GPU can only fit a 1 billion parameter model. If you want to fit a larger model, you can see some of the other options available here.
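To see roughly where those inference and training numbers come from, here's some back-of-the-envelope arithmetic. This is only a sketch, assuming fp16 weights at 2 bytes per parameter for inference and roughly 16 bytes per parameter for mixed-precision training with an Adam-style optimizer; the real footprint also depends on batch size, sequence length, and activations.

```python
# Back-of-the-envelope memory math for one 16 GB V100.
# Assumptions (rough, not exact): fp16 weights at 2 bytes/parameter for
# inference, and ~16 bytes/parameter for mixed-precision training with an
# Adam-style optimizer (weights + gradients + fp32 master copy + two moments).

gpu_gb = 16

inference_gb = 7e9 * 2 / 1e9    # 7B params, weights only
training_gb = 1e9 * 16 / 1e9    # 1B params, weights + grads + optimizer state

print(f"7B inference: ~{inference_gb:.0f} GB (fits in {gpu_gb} GB: {inference_gb <= gpu_gb})")
print(f"1B training:  ~{training_gb:.0f} GB (right at the {gpu_gb} GB limit)")
```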
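And here's what those first practical steps can look like in code: a minimal sketch using the Hugging Face Trainer. The Pythia model name is just one example of a 400M-scale starting point, and the dataset and hyperparameters are placeholder values you'd swap for your own task data and settings.

```python
# Minimal fine-tuning sketch with Hugging Face transformers and datasets.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/pythia-410m"   # one example of a ~400M model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Your task data, structured as input/output pairs and rendered through a
# prompt template (these examples are hypothetical placeholders).
pairs = [
    {"question": "What is fine-tuning?",
     "answer": "Training a base model further on your own data."},
    {"question": "Why start with a small model?",
     "answer": "To iterate quickly and see how data affects performance."},
]
texts = [f"Question: {p['question']}\nAnswer: {p['answer']}" for p in pairs]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = Dataset.from_dict({"text": texts}).map(
    tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out",
                           num_train_epochs=3,
                           per_device_train_batch_size=1,
                           learning_rate=1e-5),
    train_dataset=dataset,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Vary the number of examples you feed in here and re-run your evaluation each time; that's the cheapest way to see how much data is actually moving the model.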
Now, maybe you thought that wasn't enough for you and you want to work with much larger models. Well, there's something called PEFT, or parameter-efficient fine-tuning, which is a set of methods that help you do just that: be much more efficient in how you use your parameters and train your models. One that I really like is LoRA, which stands for low-rank adaptation. What LoRA does is reduce the number of weights you have to train by a huge amount. For GPT-3, for example, the authors found they could reduce the trainable parameters by 10,000x, which resulted in 3x less GPU memory needed. And while you do get slightly lower accuracy than full fine-tuning, this is still a far more efficient way of getting there, and you get the same inference latency at the end.

So what exactly is happening with LoRA? You're actually training new weights in some of the layers of the model while freezing the main pre-trained weights, which you see here in blue. Those are all frozen, and you have these new orange weights: the LoRA weights. The new weights, and this gets a little bit mathy, are rank decomposition matrices of the change to the original weights. What's important is less the math behind that, and more that you can train these separately from the pre-trained weights, and then at inference time merge them back into the main pre-trained weights, so you get that fine-tuned model far more efficiently.

What I'm really excited about using LoRA for is adapting to new tasks. That means you could train a model with LoRA on one customer's data, then train another one on another customer's data, and merge each of them in at inference time when you need them.
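To make that rank-decomposition idea concrete without the heavy math, here's a toy numpy sketch. The dimensions are made up for illustration: instead of learning a full d-by-d update to a frozen weight matrix W, LoRA learns two thin matrices B and A (with B initialized to zero, as in the LoRA paper) whose product has the same shape as W.

```python
import numpy as np

d = 4096   # hypothetical hidden dimension of one layer
r = 8      # LoRA rank, much smaller than d

W = np.random.randn(d, d)    # frozen pre-trained weight (the blue part)
B = np.zeros((d, r))         # trainable LoRA weight (orange), zero-initialized
A = np.random.randn(r, d)    # trainable LoRA weight (orange)

# During training only A and B receive gradient updates; W stays frozen.
# At inference time the update merges back into the original weight:
W_merged = W + B @ A         # same shape as W, so same inference latency

full_params = W.size              # 16,777,216 weights in a full update
lora_params = A.size + B.size     # 65,536 weights, ~256x fewer for this layer
print(full_params, lora_params)
```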
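In practice you don't have to write that yourself. Here's a sketch of the train-one-adapter-per-customer pattern using the Hugging Face peft library; the adapter paths and customer names are hypothetical, and the rank and target modules are just illustrative choices.

```python
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM

# --- Training time: wrap the frozen base model with trainable LoRA weights.
base = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["query_key_value"])  # Pythia attention layers
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of weights train

# ... fine-tune as usual, then save just the small adapter (hypothetical path):
model.save_pretrained("adapter-customer-a")

# --- Inference time: reload the base once and attach per-customer adapters.
base = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-410m")
serving = PeftModel.from_pretrained(base, "adapter-customer-a",
                                    adapter_name="customer_a")
serving.load_adapter("adapter-customer-b", adapter_name="customer_b")
serving.set_adapter("customer_a")   # pick the right customer's adapter

# Or fold the active adapter back into the base weights, recovering the
# original model's inference latency:
merged = serving.merge_and_unload()
```

Because each saved adapter is tiny compared to the base model, you can keep one shared base model in memory and swap customer adapters on top of it, which is exactly the multi-customer setup described above.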