This repository contains the official implementation for the paper: "The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training".
Recent large language models (LLMs) exhibit impressive reasoning but often overthink, generating excessively long responses that hinder efficiency. We introduce DIfficulty-AwarE Training (DIET), a framework that systematically cuts these "token calories" by integrating on-the-fly problem difficulty into the reinforcement learning (RL) process. DIET dynamically adapts token compression strategies by modulating token penalty strength and conditioning target lengths on estimated task difficulty, optimizing the performance-efficiency trade-off. We also theoretically analyze the pitfalls of naive reward weighting in group-normalized RL algorithms like GRPO and propose an Advantage Weighting technique, which enables stable and effective implementation of these difficulty-aware objectives. Experimental results demonstrate that DIET significantly reduces token counts while simultaneously improving reasoning performance. Beyond raw token reduction, we show two crucial benefits largely overlooked by prior work: (1) DIET enhances the natural positive correlation between response length and problem difficulty, ensuring verbosity is appropriately allocated, unlike many existing compression methods that disrupt this relationship. (2) Critically, DIET leads to superior inference scaling; by maintaining high per-sample quality with fewer tokens, it enables better aggregate performance (e.g., via majority voting) under fixed computational budgets, an area where other methods falter. Our analyses provide a principled and effective framework for developing more efficient, practical, and high-performing LLMs.
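For intuition, the sketch below shows one way a difficulty-aware length penalty can be applied as a weight on GRPO-style group-normalized advantages instead of being folded into the raw reward. It is a minimal illustration under our own assumptions: the function name, the hyperparameters `target_len` and `penalty_scale`, and the exact functional forms are hypothetical, and the actual objective lives in the `verl` training code.

```python
import numpy as np

def diet_advantages(correct, lengths, target_len=2048, penalty_scale=0.5):
    """Toy sketch of difficulty-aware advantage weighting for one prompt's
    group of sampled responses. Hyperparameters and functional forms are
    illustrative assumptions, not the paper's exact objective."""
    correct = np.asarray(correct, dtype=float)   # 1.0 if the answer is correct
    lengths = np.asarray(lengths, dtype=float)   # response lengths in tokens

    # On-the-fly difficulty estimate: fraction of failed samples in the group.
    difficulty = 1.0 - correct.mean()

    # Group-normalized advantages computed from the task reward alone, so
    # length terms do not distort the normalization statistics.
    adv = (correct - correct.mean()) / (correct.std() + 1e-6)

    # Difficulty-aware compression pressure: weaker penalty on hard problems,
    # stronger on easy ones.
    strength = penalty_scale * (1.0 - difficulty)
    length_weight = 1.0 - strength * np.clip(lengths / target_len, 0.0, 1.0)

    # Advantage weighting: down-weight the positive advantages of overly long
    # correct responses instead of adding a length term to the raw reward.
    return np.where(adv > 0, adv * length_weight, adv)

# Example: two correct and two incorrect samples of varying lengths.
print(diet_advantages([1, 1, 0, 0], [600, 1800, 2500, 3200]))
```

The reason for weighting in advantage space is that the group statistics used for normalization come from the task reward alone, so a length term cannot wash out the correctness signal, which is the pitfall of naive reward weighting analyzed in the paper.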
- Create Conda Environment:

  ```bash
  conda create -n diet python==3.11
  conda activate diet
  ```
- Install the `verl` library and dependencies. The `verl` directory contains the core code for our reinforcement learning framework:

  ```bash
  git clone [email protected]:thunlp/DIET.git
  cd DIET
  cd verl
  pip install -e .  # Installs 'verl' in editable mode
  cd ..
  pip install -r requirements.txt
  ```
Follow these steps to download the necessary dataset and base model.
The DeepScaleR dataset will be downloaded to `verl/dataset/deepscaler_dataset/`.

```bash
huggingface-cli download --repo-type dataset --resume-download agentica-org/DeepScaleR-Preview-Dataset --local-dir verl/dataset/deepscaler_dataset --local-dir-use-symlinks False
```
The base model will be downloaded to `checkpoints/r1-distilled-qwen-1.5b/`.

```bash
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir checkpoints/r1-distilled-qwen-1.5b --local-dir-use-symlinks False
```
This step preprocesses the DeepScaleR dataset.
```bash
cd verl
python examples/data_preprocess/deepscaler.py \
    --input_path dataset/deepscaler_dataset/
cd ..
```
Note: The `--input_path` is relative to the `verl` directory after `cd verl`.
To train our main DIET model:
```bash
cd verl
bash bash_scripts/deepscaler_r1_distilled_qwen_1.5b/diet.sh
```
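The inference-scaling comparison described in the paper (majority voting under a fixed token budget) is independent of the training script above. The snippet below is only a hypothetical illustration of budget-matched voting, not part of the released pipeline: shorter per-sample responses leave room for more votes within the same budget.

```python
from collections import Counter

def budget_matched_vote(answers, token_costs, budget):
    """Hypothetical helper: majority vote over as many sampled answers as
    fit within a fixed token budget."""
    kept, spent = [], 0
    for answer, cost in zip(answers, token_costs):
        if spent + cost > budget:
            break
        kept.append(answer)
        spent += cost
    # Return the most common answer among the samples that fit the budget.
    return Counter(kept).most_common(1)[0][0] if kept else None
```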