This repository contains the official implementation for the paper: "The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training".
Recent large language models (LLMs) exhibit impressive reasoning but often overthink, generating excessively long responses that hinder efficiency. We introduce DIfficulty-AwarE Training (DIET), a framework that systematically cuts these "token calories" by integrating on-the-fly problem difficulty into the reinforcement learning (RL) process. DIET dynamically adapts token compression strategies by modulating token penalty strength and conditioning target lengths on estimated task difficulty, optimizing the performance-efficiency trade-off. We also theoretically analyze the pitfalls of naive reward weighting in group-normalized RL algorithms like GRPO and propose an Advantage Weighting technique, which enables stable and effective implementation of these difficulty-aware objectives. Experimental results demonstrate that DIET significantly reduces token counts while simultaneously improving reasoning performance. Beyond raw token reduction, we show two crucial benefits largely overlooked by prior work: (1) DIET enhances the natural positive correlation between response length and problem difficulty, ensuring verbosity is appropriately allocated, unlike many existing compression methods that disrupt this relationship. (2) Critically, DIET leads to superior inference scaling; by maintaining high per-sample quality with fewer tokens, it enables better aggregate performance (e.g., via majority voting) under fixed computational budgets, an area where other methods falter. Our analyses provide a principled and effective framework for developing more efficient, practical, and high-performing LLMs.
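For intuition, the sketch below shows one way a difficulty-aware length penalty can be applied as a weight on GRPO-style group-normalized advantages instead of being folded into the raw reward. It is a minimal illustration under our own assumptions: the function name, the hyperparameters `target_len` and `penalty_scale`, and the exact functional forms are hypothetical, and the actual objective lives in the `verl` training code.

```python
import numpy as np

def diet_advantages(correct, lengths, target_len=2048, penalty_scale=0.5):
    """Toy sketch of difficulty-aware advantage weighting for one prompt's
    group of sampled responses. Hyperparameters and functional forms are
    illustrative assumptions, not the paper's exact objective."""
    correct = np.asarray(correct, dtype=float)   # 1.0 if the answer is correct
    lengths = np.asarray(lengths, dtype=float)   # response lengths in tokens

    # On-the-fly difficulty estimate: fraction of failed samples in the group.
    difficulty = 1.0 - correct.mean()

    # Group-normalized advantages computed from the task reward alone, so
    # length terms do not distort the normalization statistics.
    adv = (correct - correct.mean()) / (correct.std() + 1e-6)

    # Difficulty-aware compression pressure: weaker penalty on hard problems,
    # stronger on easy ones.
    strength = penalty_scale * (1.0 - difficulty)
    length_weight = 1.0 - strength * np.clip(lengths / target_len, 0.0, 1.0)

    # Advantage weighting: down-weight the positive advantages of overly long
    # correct responses instead of adding a length term to the raw reward.
    return np.where(adv > 0, adv * length_weight, adv)

# Example: two correct and two incorrect samples of varying lengths.
print(diet_advantages([1, 1, 0, 0], [600, 1800, 2500, 3200]))
```

The reason for weighting in advantage space is that the group statistics used for normalization come from the task reward alone, so a length term cannot wash out the correctness signal, which is the pitfall of naive reward weighting analyzed in the paper.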
- Create Conda Environment:

  ```bash
  conda create -n diet python==3.11
  conda activate diet
  ```
- Install the `verl` library and dependencies. The `verl` directory contains the core code for our reinforcement learning framework:

  ```bash
  git clone [email protected]:thunlp/DIET.git
  cd DIET
  cd verl
  pip install -e .  # Installs 'verl' in editable mode
  cd ..
  pip install -r requirements.txt
  ```
Follow these steps to download the necessary dataset and base model.
The DeepScaleR dataset will be downloaded to `verl/dataset/deepscaler_dataset/`.

```bash
huggingface-cli download --repo-type dataset --resume-download agentica-org/DeepScaleR-Preview-Dataset --local-dir verl/dataset/deepscaler_dataset --local-dir-use-symlinks False
```
The base model will be downloaded to `checkpoints/r1-distilled-qwen-1.5b/`.

```bash
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir checkpoints/r1-distilled-qwen-1.5b --local-dir-use-symlinks False
```
This step preprocesses the DeepScaleR dataset.
```bash
cd verl
python examples/data_preprocess/deepscaler.py \
    --input_path dataset/deepscaler_dataset/
cd ..
```
Note: The `--input_path` is relative to the `verl` directory after `cd verl`.
To train our main DIET model:
```bash
cd verl
bash bash_scripts/deepscaler_r1_distilled_qwen_1.5b/diet.sh
```
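The inference-scaling comparison described in the paper (majority voting under a fixed token budget) is independent of the training script above. The snippet below is only a hypothetical illustration of budget-matched voting, not part of the released pipeline: shorter per-sample responses leave room for more votes within the same budget.

```python
from collections import Counter

def budget_matched_vote(answers, token_costs, budget):
    """Hypothetical helper: majority vote over as many sampled answers as
    fit within a fixed token budget."""
    kept, spent = [], 0
    for answer, cost in zip(answers, token_costs):
        if spent + cost > budget:
            break
        kept.append(answer)
        spent += cost
    # Return the most common answer among the samples that fit the budget.
    return Counter(kept).most_common(1)[0][0] if kept else None
```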