
Commit 64e0724

update
1 parent 332285e commit 64e0724

File tree

1 file changed (+9, -6 lines)
  • 3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development


3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/README.md

+9 -6
@@ -1,12 +1,15 @@
# End-to-End LLama3-70B model development with Torchtune <!-- omit in toc -->

-In this tutorial, you will see how to:
+This tutorial guides you through the following LLM model development steps using Llama3-70B:
+
* Continuous Pretraining
* Instruction Finetuning
* Alignment
* Evaluation
* Deployment

+For details of each step, refer to the [overview documentation](../../README.md).
+
## 1. Prerequisites
Before starting, ensure you have requested access to Meta-Llama-3-70B by visiting [Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on Hugging Face and following the access request instructions. Additionally, make sure all prerequisites described in the [slurm](..) directory are set up.
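Note: downloading a gated model such as Meta-Llama-3-70B generally requires an authenticated Hugging Face session. The following is an illustrative sketch only, not a documented step of this test case; `HF_TOKEN` is a hypothetical variable holding the access token created on huggingface.co:

```bash
# Install the Hugging Face CLI and authenticate so gated downloads work.
# HF_TOKEN is assumed to hold your Hugging Face access token.
pip install -U "huggingface_hub[cli]"
huggingface-cli login --token "$HF_TOKEN"
```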

@@ -22,8 +25,6 @@ Navigate to the [test case path](..) and prepare your environment by sourcing th
source .env
```

-This step is crucial for configuring the necessary paths and credentials for accessing and working with the Llama3-70B model.
-
### Fetching the Model Weights and Tokenizer

Execute the `download_hf_model.sh` script with the model identifier as an argument to download the model weights and tokenizer:
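As an illustration of that step, an invocation might look like the following. This is a sketch only; the working directory and exact calling convention of `download_hf_model.sh` are assumptions, so check the script's usage in the test case:

```bash
# Hypothetical example: fetch weights and tokenizer for the gated Llama3-70B model,
# passing the Hugging Face model identifier as the single argument.
bash download_hf_model.sh meta-llama/Meta-Llama-3-70B
```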
@@ -67,13 +68,15 @@ By following these steps, you ensure that the necessary model components are in

## 3. Continuous Pretraining

-In this step, you will fine-tune the Llama model. Specifically, the finetune process in this step is called Full-parameter finetuning, which will update all the parameters in the original model.
+In this step, you will fine-tune the Llama3 model starting from the original checkpoint. Specifically, the finetuning process in this step is called full-parameter finetuning, which updates all the parameters of the original model. One of the problems encountered in such training is memory consumption. A typical model trained in mixed precision with AdamW requires about 18 bytes per model parameter plus activation memory (6 bytes per parameter for mixed-precision weights, 8 bytes for the AdamW optimizer states, and 4 bytes for the gradients). For more details of this anatomy, see the [Hugging Face blog post](https://huggingface.co/docs/transformers/model_memory_anatomy). This means that training a 70B-parameter model would require more than 1.2 TB of accelerator memory, far exceeding the 80 GB available on a single H100 GPU. To tackle this problem, `torchtune` integrates PyTorch Fully Sharded Data Parallel (FSDP), a distributed training feature designed to handle large model training efficiently by sharding model parameters, gradients, and optimizer states across multiple devices. This approach significantly reduces per-device memory consumption and optimizes resource utilization, making it possible to train models that are too large to fit on a single GPU.

```bash
-sbatch tutorials/e2e-llama3-70b-development/pretrain.sbatch
+sbatch tutorials/e2e-llama3-70b-development/full_finetune_distributed.sbatch
```


+
+
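As a quick sanity check of the memory estimate quoted in the paragraph above, here is a back-of-the-envelope sketch; it ignores activations and any sharding overhead:

```bash
# Rough memory estimate for full-parameter training of a 70B model with AdamW
# in mixed precision: ~18 bytes/parameter = 6 (weights) + 8 (optimizer states) + 4 (gradients).
PARAMS=70000000000
BYTES_PER_PARAM=18
echo "$(( PARAMS * BYTES_PER_PARAM / 10**9 )) GB needed vs. 80 GB on a single H100"   # ~1260 GB
```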
## 4. Instruction-tuning
In this step, you will fine-tune the LLaMA model using Low-Rank Adaptation (LoRA) with the Alpaca dataset. We will first cover the basic concepts and relevant configurations found in the [config file](configs/lora_finetune_distributed.yaml), followed by a detailed fine-tuning tutorial.
@@ -82,7 +85,7 @@ In this step, you will fine-tune the LLaMA model using Low-Rank Adaptation (LoRA
### Basic Concepts and Relevant Configurations

**Low-Rank Adaptation (LoRA)** is a method for fine-tuning large language models efficiently. It is a Parameter-efficient Fine-tuning (PEFT) technique that modifies a small, low-rank subset of a model's parameters, significantly reducing the computational cost and time required for fine-tuning. LoRA operates on the principle that large models, despite their size, inherently possess a low-dimensional structure, allowing significant changes to be represented with fewer parameters. This method involves decomposing large weight matrices into smaller matrices, drastically reducing the number of trainable parameters and making the adaptation process faster and less resource-intensive. It leverages the concept of lower-rank matrices to efficiently train models, making it a cost-effective solution for fine-tuning large language models.
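To make the parameter savings concrete, here is a small illustrative calculation; the matrix size and rank are arbitrary examples, not values taken from this tutorial's config:

```bash
# Illustrative only: trainable parameters for one 8192x8192 projection,
# full fine-tuning vs. a rank-16 LoRA decomposition (W ~= W0 + B*A).
D=8192; K=8192; R=16
echo "full matrix: $(( D * K )) parameters"          # 67108864
echo "LoRA A + B:  $(( D * R + R * K )) parameters"  # 262144 (~0.4% of the full matrix)
```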
-
+![lora](./docs/lora.png)
In the config, we have the following relevant section:

```yaml
