zenml-io
diff --git a/.github/actions/llm_finetuning_template_test/action.yml b/.github/actions/llm_finetuning_template_test/action.yml
Lines changed: 1 addition & 1 deletion
diff --git a/copier.yaml b/copier.yaml
Lines changed: 4 additions & 0 deletions
diff --git a/template/README.md b/template/README.md
Lines changed: 36 additions & 20 deletions
diff --git a/template/configs/orchestrator_finetune.yaml b/template/configs/orchestrator_finetune.yaml
Lines changed: 5 additions & 2 deletions
diff --git a/template/configs/remote_finetune.yaml b/template/configs/remote_finetune.yaml
Lines changed: 17 additions & 2 deletions
diff --git a/template/pipelines/train.py b/template/pipelines/train.py
Lines changed: 35 additions & 12 deletions
diff --git a/template/pipelines/train_accelerated.py b/template/pipelines/train_accelerated.py
Lines changed: 86 additions & 0 deletions
@@ -78,7 +78,7 @@ runs:
     - name: Run pytests
       shell: bash
       run: |
-        pytest ./local_checkout/tests
+        pytest -s ./local_checkout/tests
 
     - name: Clean-up
       shell: bash
 
@@ -62,6 +62,10 @@ steps_of_finetuning:
     type: int
     help: The number of steps of finetuning job.
     default: 300
+use_fast_tokenizer:
+    type: bool
+    help: Whether to use the fast tokenizer. Make sure your base model supports it.
+    default: false
 cuda_version:
     type: str
     help: The available cuda version. (Only relevant when using a remote orchestrator)
 
@@ -34,6 +34,11 @@ pip install -r requirements.txt
 
 ### 👷 Combined feature engineering and finetuning pipeline
 
+> [!WARNING]  
+> Every step of this pipeline calls `clean_gpu_memory(force=True)` at the beginning to make sure that GPU memory left over from previous steps is released.
+>
+> This can affect other GPU processes running in the same environment. If you don't want GPU memory to be cleaned between steps, delete these utility calls from all steps.
+
 The easiest way to get started with just a single command is to run the finetuning pipeline with the `orchestrator_finetune.yaml` configuration file, which will do data preparation, model finetuning, evaluation with [Rouge](https://huggingface.co/spaces/evaluate-metric/rouge) and promotion:
 
 ```shell
@@ -50,6 +55,17 @@ When running the pipeline like this, the trained model will be stored in the Zen
   <br/>
 </div>
 
+### ⚡ Accelerate your finetuning
+
+Do you want to benefit from multi-GPU training with Distributed Data Parallelism (DDP)? Then you can use the configuration files prepared for this purpose.
+For example, `orchestrator_finetune.yaml` can finetune [`{{ model_repository }}`](https://huggingface.co/{{ model_repository }}) with [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/en/index) on all GPUs available in the environment. To do so, run:
+
+```shell
+python run.py --config orchestrator_finetune.yaml --accelerate
+```
+
+Under the hood, the finetuning step launches an Accelerate job from the step code, and that job runs on all available GPUs.
+
 ## ☁️ Running with a step operator in the stack
 
 To finetune an LLM on remote infrastructure, you can either use a remote orchestrator or a remote step operator. Follow these steps to set up a complete remote stack:
@@ -80,26 +96,26 @@ The project loosely follows [the recommended ZenML project structure](https://do
 
 ```
 .
-├── configs                         # pipeline configuration files
-│   ├── orchestrator_finetune.yaml  # default local or remote orchestrator
-│   └── remote_finetune.yaml        # default step operator configuration
+├── configs                                       # pipeline configuration files
+│   ├── orchestrator_finetune.yaml                # default local or remote orchestrator configuration
+│   └── remote_finetune.yaml                      # default step operator configuration
 ├── materializers
-│   └── directory_materializer.py   # custom materializer to push whole directories to the artifact store and back
-├── pipelines                       # `zenml.pipeline` implementations
-│   └── train.py                    # Finetuning and evaluation pipeline
-├── steps                           # logically grouped `zenml.steps` implementations
-│   ├── evaluate_model.py           # evaluate base and finetuned models using Rouge metrics
-│   ├── finetune.py                 # finetune the base model
-│   ├── prepare_datasets.py         # load and tokenize dataset
-│   └── promote.py                  # promote good models to target environment
-├── utils                           # utility functions
-│   ├── callbacks.py                # custom callbacks
-│   ├── cuda.py                     # helpers for CUDA
-│   ├── loaders.py                  # loaders for models and data
-│   ├── logging.py                  # logging helpers
-│   └── tokenizer.py                # load and tokenize
+│   └── directory_materializer.py                 # custom materializer to push whole directories to the artifact store and back
+├── pipelines                                     # `zenml.pipeline` implementations
+│   └── train.py                                  # Finetuning and evaluation pipeline
+├── steps                                         # logically grouped `zenml.steps` implementations
+│   ├── evaluate_model.py                         # evaluate base and finetuned models using Rouge metrics
+│   ├── finetune.py                               # finetune the base model
+│   ├── log_metadata.py                           # helper step to ensure that model metadata is always logged
+│   ├── prepare_datasets.py                       # load and tokenize dataset
+│   └── promote.py                                # promote good models to target environment
+├── utils                                         # utility functions
+│   ├── callbacks.py                              # custom callbacks
+│   ├── loaders.py                                # loaders for models and data
+│   ├── logging.py                                # logging helpers
+│   └── tokenizer.py                              # load and tokenize
 ├── .dockerignore
-├── README.md                       # this file
-├── requirements.txt                # extra Python dependencies 
-└── run.py                          # CLI tool to run pipelines on ZenML Stack
+├── README.md                                     # this file
+├── requirements.txt                              # extra Python dependencies 
+└── run.py                                        # CLI tool to run pipelines on ZenML Stack
 ```
@@ -14,14 +14,18 @@ settings:
     parent_image: pytorch/pytorch:2.2.2-{{ cuda_version }}-cudnn8-runtime
     requirements: requirements.txt
     python_package_installer: uv
+    python_package_installer_args:
+      system: null
+    apt_packages: 
+      - git
     environment:
       PJRT_DEVICE: CUDA
       USE_TORCH_XLA: "false"
       MKL_SERVICE_FORCE_INTEL: "1"
 
 parameters:
   base_model_id: {{ model_repository }}
-  use_fast: False
+  use_fast: {{ use_fast_tokenizer }}
   load_in_4bit: True
   system_prompt: |
       {{ system_prompt.split("\n") | join("\n      ") }}
@@ -32,7 +36,6 @@ steps:
       dataset_name: {{ dataset_name }}
 
   finetune:
-    enable_step_logs: False
     parameters:
       max_steps: {{ steps_of_finetuning }}
       eval_steps: {{ steps_of_finetuning // 10 }}
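
Note on the `system_prompt` line in the hunk above: the Jinja expression `system_prompt.split("\n") | join("\n      ")` re-indents every continuation line of a multi-line prompt so that it stays inside the YAML block scalar. A minimal Python sketch of the same transformation follows. The helper name `indent_for_block_scalar` and the sample prompt are hypothetical and only serve as illustration; they are not part of the template.

```python
# Hypothetical helper mirroring the Jinja expression used in the config template.
def indent_for_block_scalar(system_prompt: str, indent: str = "      ") -> str:
    # Split on newlines and re-join so that every line after the first
    # starts with the same indentation as the first line of the block scalar.
    return ("\n" + indent).join(system_prompt.split("\n"))


# Illustrative prompt, not taken from the template.
prompt = "You are a helpful assistant.\nAnswer briefly."
rendered = "  system_prompt: |\n      " + indent_for_block_scalar(prompt)
print(rendered)
```

Without the re-indentation, the second line of the prompt would start at column 0 and end the block scalar early, which would produce invalid or misparsed YAML.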
 
@@ -14,14 +14,18 @@ settings:
     parent_image: pytorch/pytorch:2.2.2-{{ cuda_version }}-cudnn8-runtime
     requirements: requirements.txt
     python_package_installer: uv
+    python_package_installer_args:
+      system: null
+    apt_packages: 
+      - git
     environment:
       PJRT_DEVICE: CUDA
       USE_TORCH_XLA: "false"
       MKL_SERVICE_FORCE_INTEL: "1"
 
 parameters:
   base_model_id: {{ model_repository }}
-  use_fast: False
+  use_fast: {{ use_fast_tokenizer }}
   load_in_4bit: True
   system_prompt: |
       {{ system_prompt.split("\n") | join("\n      ") }}
@@ -32,17 +36,28 @@ steps:
       dataset_name: {{ dataset_name }}
 
   finetune:
-    enable_step_logs: False
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     step_operator: {{ step_operator }}
     parameters:
       max_steps: {{ steps_of_finetuning }}
       eval_steps: {{ steps_of_finetuning // 10 }}
       bf16: {{ bf16 }}
 
   evaluate_finetuned:
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     step_operator: {{ step_operator }}
 
   evaluate_base:
+    retry:
+      max_retries: 3
+      delay: 10
+      backoff: 2
     step_operator: {{ step_operator }}
 
   promote:
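
Note on the `retry` blocks in the hunk above: with `max_retries: 3`, `delay: 10`, and `backoff: 2`, the waiting time between attempts presumably grows multiplicatively. The sketch below computes the resulting schedule under that assumption. The function `retry_delays` is hypothetical and is not a ZenML API; the exact retry semantics should be confirmed against the ZenML documentation.

```python
# Hypothetical illustration, assuming the delay is multiplied by `backoff`
# after every failed attempt (this is an assumption, not confirmed ZenML behavior).
def retry_delays(max_retries: int, delay: float, backoff: float) -> list[float]:
    delays = []
    current = float(delay)
    for _ in range(max_retries):
        delays.append(current)  # seconds to wait before this retry
        current *= backoff      # grow the delay for the next retry
    return delays


print(retry_delays(max_retries=3, delay=10, backoff=2))  # [10.0, 20.0, 40.0]
```

Under this assumption, a step that keeps failing would wait roughly 70 seconds in total before the run is marked as failed.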
 
@@ -1,7 +1,7 @@
 # {% include 'template/license_header' %}
 
 
-from steps import evaluate_model, finetune, prepare_data, promote
+from steps import evaluate_model, finetune, prepare_data, promote, log_metadata_from_step_artifact
 from zenml import pipeline
 
 
@@ -13,7 +13,7 @@ def {{ product_name.replace("-","_") }}_full_finetune(
     load_in_8bit: bool = False,
     load_in_4bit: bool = False,
 ):
-    """Pipeline for finetuning an LLM with peft.
+    """Pipeline for finetuning an LLM with PEFT.
     
     It will run the following steps:
 
@@ -22,36 +22,59 @@ def {{ product_name.replace("-","_") }}_full_finetune(
     - evaluate_model: evaluate the base and finetuned model
     - promote: promote the model to the target stage, if evaluation was successful
     """ 
+    if not load_in_8bit and not load_in_4bit:
+        raise ValueError(
+            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
+        )
+    if load_in_4bit and load_in_8bit:
+        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")
+
     datasets_dir = prepare_data(
-        base_model_id=base_model_id, 
+        base_model_id=base_model_id,
         system_prompt=system_prompt,
         use_fast=use_fast,
     )
-    ft_model_dir = finetune(
+
+    evaluate_model(
         base_model_id,
+        system_prompt,
         datasets_dir,
+        None,
         use_fast=use_fast,
-        load_in_4bit=load_in_4bit,
         load_in_8bit=load_in_8bit,
+        load_in_4bit=load_in_4bit,
+        id="evaluate_base",
     )
-    evaluate_model(
+    log_metadata_from_step_artifact(
+        "evaluate_base",
+        "base_model_rouge_metrics",
+        after=["evaluate_base"],
+        id="log_metadata_evaluation_base"
+    )
+
+    ft_model_dir = finetune(
         base_model_id,
-        system_prompt,
         datasets_dir,
-        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_finetuned",
     )
+
     evaluate_model(
         base_model_id,
         system_prompt,
         datasets_dir,
-        None,
+        ft_model_dir,
         use_fast=use_fast,
         load_in_8bit=load_in_8bit,
         load_in_4bit=load_in_4bit,
-        id="evaluate_base",
+        id="evaluate_finetuned",
     )
-    promote(after=["evaluate_finetuned", "evaluate_base"])
+    log_metadata_from_step_artifact(
+        "evaluate_finetuned",
+        "finetuned_model_rouge_metrics",
+        after=["evaluate_finetuned"],
+        id="log_metadata_evaluation_finetuned"
+    )
+
+    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
@@ -0,0 +1,86 @@
+# {% include 'template/license_header' %}
+
+from steps import (
+    evaluate_model,
+    finetune,
+    prepare_data,
+    promote,
+    log_metadata_from_step_artifact,
+)
+from zenml import pipeline
+from zenml.integrations.huggingface.steps import run_with_accelerate
+
+
+@pipeline
+def {{ product_name.replace("-","_") }}_full_finetune(
+    system_prompt: str,
+    base_model_id: str,
+    use_fast: bool = True,
+    load_in_8bit: bool = False,
+    load_in_4bit: bool = False,
+):
+    """Pipeline for finetuning an LLM with PEFT powered by Accelerate.
+
+    It will run the following steps:
+
+    - prepare_data: prepare the datasets and tokenize them
+    - finetune: finetune the model
+    - evaluate_model: evaluate the base and finetuned model
+    - promote: promote the model to the target stage, if evaluation was successful
+    """
+    if not load_in_8bit and not load_in_4bit:
+        raise ValueError(
+            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
+        )
+    if load_in_4bit and load_in_8bit:
+        raise ValueError("Only one of `load_in_8bit` and `load_in_4bit` can be True.")
+
+    datasets_dir = prepare_data(
+        base_model_id=base_model_id,
+        system_prompt=system_prompt,
+        use_fast=use_fast,
+    )
+
+    evaluate_model(
+        base_model_id,
+        system_prompt,
+        datasets_dir,
+        None,
+        use_fast=use_fast,
+        load_in_8bit=load_in_8bit,
+        load_in_4bit=load_in_4bit,
+        id="evaluate_base",
+    )
+    log_metadata_from_step_artifact(
+        "evaluate_base",
+        "base_model_rouge_metrics",
+        after=["evaluate_base"],
+        id="log_metadata_evaluation_base"
+    )
+
+    ft_model_dir = run_with_accelerate(finetune)(
+        base_model_id=base_model_id,
+        dataset_dir=datasets_dir,
+        use_fast=use_fast,
+        load_in_8bit=load_in_8bit,
+        load_in_4bit=load_in_4bit,
+    )
+
+    evaluate_model(
+        base_model_id,
+        system_prompt,
+        datasets_dir,
+        ft_model_dir,
+        use_fast=use_fast,
+        load_in_8bit=load_in_8bit,
+        load_in_4bit=load_in_4bit,
+        id="evaluate_finetuned",
+    )
+    log_metadata_from_step_artifact(
+        "evaluate_finetuned",
+        "finetuned_model_rouge_metrics",
+        after=["evaluate_finetuned"],
+        id="log_metadata_evaluation_finetuned"
+    )
+
+    promote(after=["log_metadata_evaluation_finetuned", "log_metadata_evaluation_base"])
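
Note on the quantization check added at the top of both pipelines: the two `ValueError` branches together require that exactly one of `load_in_8bit` and `load_in_4bit` is `True`. The sketch below isolates that logic so it can be tested without running a pipeline. The function name `check_quantization_flags` is hypothetical; the template performs the check inline in the pipeline body.

```python
# Hypothetical standalone version of the inline check from the pipelines.
def check_quantization_flags(load_in_8bit: bool, load_in_4bit: bool) -> None:
    # Neither flag set: the pipeline refuses to run without quantization.
    if not load_in_8bit and not load_in_4bit:
        raise ValueError(
            "At least one of `load_in_8bit` and `load_in_4bit` must be True."
        )
    # Both flags set: the two quantization modes are mutually exclusive.
    if load_in_4bit and load_in_8bit:
        raise ValueError(
            "Only one of `load_in_8bit` and `load_in_4bit` can be True."
        )


check_quantization_flags(load_in_8bit=False, load_in_4bit=True)  # passes silently
```

Note that both pipeline signatures default the two flags to `False`, so a run that does not set either flag in its configuration would fail this check. The provided config files set `load_in_4bit: True`.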