Commit 332285e

committed
update
1 parent c0397a4 commit 332285e

File tree

7 files changed: +183 -56 lines changed


3.test_cases/torchtune/README.md

Lines changed: 3 additions & 3 deletions
@@ -4,11 +4,11 @@ This guide demonstrates the comprehensive process of developing a Large Language
 
 ![LLMOps](docs/LLMOps.png)
 
-1. **Data Preparation**: The journey begins with the collection and preparation of data for training. This step is crucial as it involves exploring the data's characteristics, performing necessary cleaning, and applying preprocessing techniques to ensure the data is in the right shape for model training.
+1. **(Continuous) Pretraining the Language Model**: The language model first undergoes pretraining on a vast corpus of text data. This step can be bypassed if starting with an already pretrained model. Pretraining is essential for the model to learn the general patterns and structures of language. Refer to the `torchtitan` test case for large-scale pretraining with the latest techniques, such as 3D parallelism and `torch.compile`.
 
-2. **Pretraining the Language Model**: Next, the language model undergoes pretraining on a vast corpus of text data. This step can be bypassed if starting with an already pretrained model. Pretraining is essential for the model to learn the general patterns and structures of language. Refer `torchtitan` test case for the large scale pretraining with the latest techniques such as 3D parallelism and `torch.compile`.
+2. **Instruction Tuning**: The pretrained model is then fine-tuned to cater to specific tasks by updating its parameters with a new dataset. This process involves partially retraining the model with samples that exemplify the desired behavior, thus refining the model weights for the particular application.
 
-3. **Fine-Tuning**: The pretrained model is then fine-tuned to cater to specific tasks by updating its parameters with a new dataset. This process involves partially retraining the model with samples that exemplify the desired behavior, thus refining the model weights for the particular application.
+3. **Alignment**: The instruction-tuned model is then aligned with human preferences, typically with techniques such as RLHF or DPO, refining its behavior on samples that exemplify the desired responses.
 
 4. **Evaluation**: Evaluating the LLM's performance is a critical step. It involves using various metrics to assess the model's accuracy and effectiveness. This step is vital for validating new techniques and objectively comparing different model releases.

(Binary image file changed: 11.3 KB — preview not included)

3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/README.md

Lines changed: 31 additions & 8 deletions
@@ -1,10 +1,11 @@
 # End-to-End LLama3-70B model development with Torchtune <!-- omit in toc -->
 
 In this tutorial, you will see how to:
-* Pretrain
-* Finetune
-* Evaluate
-* Deploy
+* Continuous Pretraining
+* Instruction Finetuning
+* Alignment
+* Evaluation
+* Deployment
 
 ## 1. Prerequisites
 Before starting, ensure you have requested access to Meta-Llama-3-70B by visiting [Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on Hugging Face and following the access request instructions. Additionally, make sure all prerequisites described in the [slurm](..) directory are set up.
@@ -64,16 +65,16 @@ This output confirms that the `torchtune download` command has been executed wit
 By following these steps, you ensure that the necessary model components are in place, setting the stage for subsequent tasks such as pretraining, finetuning, evaluation, and deployment.
 
 
-## 3. Full-parameter finetuning
+## 3. Continuous Pretraining
 
-WIP In this step, you will author Llama3 model using c4 dataset.
+In this step, you will continue pretraining the Llama model. Specifically, this step performs full-parameter training, which updates all the parameters in the original model.
 
 ```bash
 sbatch tutorials/e2e-llama3-70b-development/pretrain.sbatch
 ```
 
 
-## 4. Lora parameter efficient finetuning
+## 4. Instruction-tuning
 
 In this step, you will fine-tune the LLaMA model using Low-Rank Adaptation (LoRA) with the Alpaca dataset. We will first cover the basic concepts and relevant configurations found in the [config file](configs/lora_finetune_distributed.yaml), followed by a detailed fine-tuning tutorial.
 
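As background for the LoRA configuration referenced in the hunk above, here is a minimal, self-contained sketch of the low-rank update LoRA adds to a frozen linear layer. It is illustrative only: the dimensions are made up and this is not torchtune's implementation (the `lora_finetune_distributed` recipe handles all of this internally).

```python
# Illustrative LoRA sketch: a frozen weight W is augmented with a low-rank
# update B @ A scaled by alpha / r, and only A and B receive gradients.
import torch

def lora_linear(x, W, A, B, alpha=16.0, r=8):
    # x: (batch, in_dim), W: (out_dim, in_dim) frozen pretrained weight
    # A: (r, in_dim), B: (out_dim, r) trainable low-rank factors
    base = x @ W.T                    # frozen projection
    update = (x @ A.T) @ B.T          # low-rank path of rank r
    return base + (alpha / r) * update

x = torch.randn(2, 4096)
W = torch.randn(8192, 4096)           # frozen pretrained weight
A = torch.randn(8, 4096) * 0.01       # small random init
B = torch.zeros(8192, 8)              # zero init, so training starts from the base model
print(lora_linear(x, W, A, B).shape)  # torch.Size([2, 8192])
```

Because only `A` and `B` are trained, the number of updated parameters is a small fraction of the full weight matrix, which is what makes LoRA parameter-efficient.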

@@ -111,6 +112,10 @@ dataset:
 
 As the config suggests, we use a predefined dataset class prepared in torchtune.
 
+## 5. Alignment
+
+
+
 ### Submit Finetuning job
 
 You can submit the finetuning job with the following command:
@@ -226,15 +231,33 @@ quantizer:
   groupsize: 256
 ```
 
-`Int4WeightOnlyQuantizer` performs per-axis group quantization, which means it quantizes weights in groups rather than individually. This helps maintain a balance between compression and model accuracy.
+`Int4WeightOnlyQuantizer` performs per-axis group quantization, which means it quantizes weights in groups rather than individually. By adjusting the `groupsize`, one can control the trade-off between compression ratio and accuracy: smaller group sizes typically lead to higher accuracy but lower compression, while larger group sizes achieve higher compression at the potential cost of accuracy (a short illustrative sketch follows this file's diff).
 
 ```bash
 sbatch quantize.sbatch
 ```
 
 
+```bash
+Executing following command:
+torchtune run quantize --config /fsx/ubuntu/awsome-distributed-training/3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/configs/quantize.yaml tokenizer.path=/fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B/original/tokenizer.model checkpointer.checkpoint_dir=/fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-tuned checkpointer.output_dir=/fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-quantized
+```
+
+The resulting quantized weights are saved as follows:
+
+```bash
+0: 2024-05-31:02:10:46,964 DEBUG [seed.py:60] Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
+0: 2024-05-31:02:18:17,728 INFO [quantize.py:90] Model is initialized with precision torch.bfloat16.
+0: 2024-05-31:02:20:33,576 INFO [quantize.py:98] Time for quantization: 133.08 sec
+0: 2024-05-31:02:20:33,577 INFO [quantize.py:99] Memory used: 40.03 GB
+0: 2024-05-31:02:21:18,609 INFO [quantize.py:112] Model checkpoint of size 37.94 GB saved to /fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-quantized/hf_model_0001_0-4w.pt
+```
+
+
 ## 7. Generation
 
+Now that you have a production-ready quantized model, this last step tests text generation with it.
+
 ```bash
 sbatch 7.generate.sbatch --config configs/generate_llama3.yaml --prompt "Hello, my name is"
 ```
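To make the `groupsize` trade-off discussed above concrete, below is a small, self-contained sketch of symmetric per-group 4-bit weight quantization. It is not torchao's `Int4WeightOnlyQuantizer`; the scheme and shapes are simplified for illustration.

```python
# Group-wise int4 quantization sketch: each group of `groupsize` weights shares
# one scale, so smaller groups store more scales (less compression, lower error).
import torch

def quantize_int4_groupwise(w: torch.Tensor, groupsize: int = 256):
    out_dim, in_dim = w.shape
    assert in_dim % groupsize == 0
    groups = w.reshape(out_dim, in_dim // groupsize, groupsize)
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0  # int4 range [-8, 7]
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (q.float() * scale).reshape(q.shape[0], -1)

w = torch.randn(128, 1024)
for gs in (64, 256, 1024):
    q, scale = quantize_int4_groupwise(w, groupsize=gs)
    err = (dequantize(q, scale) - w).abs().mean().item()
    print(f"groupsize={gs:5d}  scales stored={scale.numel():5d}  mean abs error={err:.4f}")
```

Larger groups share one scale across more weights, which is why reconstruction error tends to grow as `groupsize` increases.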

3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/configs/quantize.yaml

Lines changed: 32 additions & 31 deletions
@@ -12,42 +12,43 @@ checkpointer:
   _component_: torchtune.utils.FullModelHFCheckpointer
   checkpoint_dir: ${MODEL_PATH}/${HF_MODEL}
   checkpoint_files: [
-    model-00001-of-00030.safetensors,
-    model-00002-of-00030.safetensors,
-    model-00003-of-00030.safetensors,
-    model-00004-of-00030.safetensors,
-    model-00005-of-00030.safetensors,
-    model-00006-of-00030.safetensors,
-    model-00007-of-00030.safetensors,
-    model-00008-of-00030.safetensors,
-    model-00009-of-00030.safetensors,
-    model-00010-of-00030.safetensors,
-    model-00011-of-00030.safetensors,
-    model-00012-of-00030.safetensors,
-    model-00013-of-00030.safetensors,
-    model-00014-of-00030.safetensors,
-    model-00015-of-00030.safetensors,
-    model-00016-of-00030.safetensors,
-    model-00017-of-00030.safetensors,
-    model-00018-of-00030.safetensors,
-    model-00019-of-00030.safetensors,
-    model-00020-of-00030.safetensors,
-    model-00021-of-00030.safetensors,
-    model-00022-of-00030.safetensors,
-    model-00023-of-00030.safetensors,
-    model-00024-of-00030.safetensors,
-    model-00025-of-00030.safetensors,
-    model-00026-of-00030.safetensors,
-    model-00027-of-00030.safetensors,
-    model-00028-of-00030.safetensors,
-    model-00029-of-00030.safetensors,
-    model-00030-of-00030.safetensors,
+    hf_model_0001_0.pt,
+    hf_model_0002_0.pt,
+    hf_model_0003_0.pt,
+    hf_model_0004_0.pt,
+    hf_model_0005_0.pt,
+    hf_model_0006_0.pt,
+    hf_model_0007_0.pt,
+    hf_model_0008_0.pt,
+    hf_model_0009_0.pt,
+    hf_model_0010_0.pt,
+    hf_model_0011_0.pt,
+    hf_model_0012_0.pt,
+    hf_model_0013_0.pt,
+    hf_model_0014_0.pt,
+    hf_model_0015_0.pt,
+    hf_model_0016_0.pt,
+    hf_model_0017_0.pt,
+    hf_model_0018_0.pt,
+    hf_model_0019_0.pt,
+    hf_model_0020_0.pt,
+    hf_model_0021_0.pt,
+    hf_model_0022_0.pt,
+    hf_model_0023_0.pt,
+    hf_model_0024_0.pt,
+    hf_model_0025_0.pt,
+    hf_model_0026_0.pt,
+    hf_model_0027_0.pt,
+    hf_model_0028_0.pt,
+    hf_model_0029_0.pt,
+    hf_model_0030_0.pt,
   ]
   recipe_checkpoint: null
   output_dir: ${MODEL_PATH}/${HF_MODEL}-quantized
   model_type: LLAMA3
 
-device: cuda
+device: cpu
 dtype: bf16
 seed: 1234
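A practical note on the change above: quantization now reads the torchtune-format checkpoints written by the finetuning run (`hf_model_0001_0.pt` through `hf_model_0030_0.pt`) instead of the original Hugging Face `safetensors` shards, and the device is switched from `cuda` to `cpu`. If you prefer not to maintain the 30-entry list by hand, a small helper along these lines (hypothetical, not part of the repository) could generate it from the tuned checkpoint directory:

```python
# Hypothetical helper (not part of this repository) for generating the
# checkpoint_files list in quantize.yaml from a tuned checkpoint directory.
import pathlib

def list_checkpoint_files(ckpt_dir: str) -> list[str]:
    """Return sorted torchtune shard names such as hf_model_0001_0.pt."""
    return sorted(p.name for p in pathlib.Path(ckpt_dir).glob("hf_model_*_0.pt"))

if __name__ == "__main__":
    tuned_dir = "/fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-tuned"  # path used in this tutorial
    for name in list_checkpoint_files(tuned_dir):
        print(f"    {name},")
```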

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
#!/bin/bash

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

#SBATCH --job-name=full-finetuning
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --gpus-per-node=8 # Number of GPU per node
#SBATCH --output=logs/%x_%j.out # logfile for stdout
#SBATCH --error=logs/%x_%j.err # logfile for stderr, remove it to merge both outputs
#SBATCH --wait-all-nodes=1
#SBATCH --exclusive
set -euxo pipefail

##################################################################
########### Check current working directory ######################
##################################################################
if [ $(basename $(pwd)) != "slurm" ]
then
echo "Please run this script from the slurm directory"
exit 1
fi
##################################################################
############# Load environment variables #########################
##################################################################
# Load environment variables
if [ ! -f .env ]
then
echo "Please create a .env file with the required environment variables"
exit 1
else
source .env
fi

##################################################################
######### Define EFA/NCCL/Slurm environment variables ############
##################################################################
## EFA settings
export FI_LOG_LEVEL=1
export FI_PROVIDER=efa # change to eth if you want to use ENA for comparisons
export FI_EFA_USE_HUGE_PAGE=0
# https://discuss.pytorch.org/t/nccl-network-is-unreachable-connection-refused-when-initializing-ddp/137352
# https://github.com/pytorch/pytorch/issues/68893
export NCCL_SOCKET_IFNAME=en
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export NCCL_DEBUG=INFO
export HOSTNAMES=`scontrol show hostnames "$SLURM_JOB_NODELIST"`
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export COUNT_NODE=`scontrol show hostnames "$SLURM_JOB_NODELIST" | wc -l`
export NODES=( $( scontrol show hostnames $SLURM_JOB_NODELIST ) )
export NODES_ARRAY=($NODES)
export HEAD_NODE=${NODES_ARRAY[0]}
export MASTER_ADDR=$(hostname --ip-address)
export MASTER_PORT=$RANDOM
export NNODES=$SLURM_JOB_NUM_NODES
export NPROC=$SLURM_GPUS_PER_NODE
export WORLD_SIZE=$(( $NNODES * $NPROC ))

##################################################################
############# Set training arguments #############################
##################################################################
export HF_MODEL="meta-llama/Meta-Llama-3-70B"
: "${CONTAINER_MOUNT:=$FSX_PATH:$FSX_PATH}"
declare -a SRUN_ARGS=(
--container-image $ENROOT_IMAGE
--container-mounts $CONTAINER_MOUNT
)
declare -a TORCHRUN_ARGS=(
--master_addr $MASTER_ADDR
--master_port $MASTER_PORT
# change this to match the number of GPUs per node:
--nproc_per_node=8
--nnodes=$SLURM_JOB_NUM_NODES
--rdzv_backend=c10d
--rdzv_endpoint=$(hostname)
)
declare -a TRAIN_ARGS=(
--config ${PWD}/tutorials/e2e-llama3-70b-development/configs/lora_finetune_distributed.yaml
tokenizer.path=${MODEL_PATH}/${HF_MODEL}/original/tokenizer.model
checkpointer.checkpoint_dir=${MODEL_PATH}/${HF_MODEL}
checkpointer.output_dir=${MODEL_PATH}/${HF_MODEL}-tuned
output_dir=${MODEL_PATH}/${HF_MODEL}-tuned/log
metric_logger.log_dir=${MODEL_PATH}/${HF_MODEL}-tuned/log/metrics
)
##################################################################
################# Run torchtune ##################################
##################################################################
export PYTHONPATH=${PWD}/torchtune
export TORCHTUNE=${PWD}/torchtune/torchtune/_cli/tune.py
export TORCHTUNE_COMMAND="full_finetune_distributed"
echo "Executing following command:"
echo "torchtune" "run" "${TORCHRUN_ARGS[@]}" "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} run "${TORCHRUN_ARGS[@]}" "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
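For readers unfamiliar with the rendezvous variables exported above (`MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`), the sketch below shows how a PyTorch worker typically consumes them. torchrun and the torchtune recipe do this internally; it is shown only to make the role of those environment variables concrete.

```python
# Minimal sketch: the "env://" init method reads MASTER_ADDR, MASTER_PORT,
# RANK and WORLD_SIZE from the environment prepared by Slurm/torchrun.
import os
import torch.distributed as dist

def init_distributed() -> tuple[int, int]:
    dist.init_process_group(backend="nccl", init_method="env://")
    rank, world_size = dist.get_rank(), dist.get_world_size()
    print(f"rank {rank}/{world_size} rendezvous at "
          f"{os.environ['MASTER_ADDR']}:{os.environ['MASTER_PORT']}")
    return rank, world_size
```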

3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/quantize.sbatch

Lines changed: 22 additions & 14 deletions
@@ -13,6 +13,14 @@
 #SBATCH --exclusive
 set -euxo pipefail
 
+##################################################################
+########### Check current working directory ######################
+##################################################################
+if [ $(basename $(pwd)) != "slurm" ]
+then
+echo "Please run this script from the slurm directory"
+exit 1
+fi
 ##################################################################
 ############# Load environment variables #########################
 ##################################################################
@@ -50,26 +58,26 @@ export NPROC=$SLURM_GPUS_PER_NODE
 export WORLD_SIZE=$(( $NNODES * $NPROC ))
 
 ##################################################################
-############### Create train config ##############################
-##################################################################
-if [ ! -d ${FSX_PATH}/tmp ]; then
-mkdir -p ${FSX_PATH}/tmp
-fi
-cat ${PWD}/train_configs/quantize_llama3.yaml | envsubst > ${FSX_PATH}/tmp/quantize_llama3.yaml
-##################################################################
-################# Set arguments ##################################
+############# Set training arguments #############################
 ##################################################################
+export HF_MODEL="meta-llama/Meta-Llama-3-70B"
 : "${CONTAINER_MOUNT:=$FSX_PATH:$FSX_PATH}"
 declare -a SRUN_ARGS=(
 --container-image $ENROOT_IMAGE
 --container-mounts $CONTAINER_MOUNT
 )
 declare -a TRAIN_ARGS=(
---config ${FSX_PATH}/tmp/quantize_llama3.yaml
+--config ${PWD}/tutorials/e2e-llama3-70b-development/configs/quantize.yaml
+tokenizer.path=${MODEL_PATH}/${HF_MODEL}/original/tokenizer.model
+checkpointer.checkpoint_dir=${MODEL_PATH}/${HF_MODEL}-tuned
+checkpointer.output_dir=${MODEL_PATH}/${HF_MODEL}-quantized
 )
-
-export TORCHTUNE=${PWD}/torchtune/torchtune/_cli/tune.py
+##################################################################
+################# Run torchtune ##################################
+##################################################################
 export PYTHONPATH=${PWD}/torchtune
-
-#srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} cp generation /fsx/tmp/generate_llama3.yaml
-srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} run quantize "${TRAIN_ARGS[@]}"
+export TORCHTUNE=${PWD}/torchtune/torchtune/_cli/tune.py
+export TORCHTUNE_COMMAND="quantize"
+echo "Executing following command:"
+echo "torchtune" "run" "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
+srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} run "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
