# Awesome Process Reward Models


## πŸ”— Table of Contents

- πŸ“ PRMs for Mathematical Tasks
- πŸ’» PRMs for Other Tasks
- πŸ” Other Process-Supervised Models
- πŸŒ‡ Multimodal PRMs
- πŸ“Š Benchmarks
- πŸ’ͺ Contributing
- πŸ“ Citation

## πŸ“ PRMs for Mathematical Tasks

## πŸ’» PRMs for Other Tasks

- (**MT-RewardTree**) MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling [arXiv 2025.03] [Code] [Website] [Model] [Data]
- (**GraphPRM**) Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners [arXiv 2025.03] [Model] [Data]
- (**ASPRM**) AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [arXiv 2025.02] [Code] [Model] [Data]
- (**AgentPRM**) Process Reward Models for LLM Agents: Practical Framework and Directions [arXiv 2025.02] [Code]
- (**VersaPRM**) VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data [arXiv 2025.02] [Code] [Model] [Data]
- (**MedS$^3$**) MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking [arXiv 2025.01] [Code] [Model] [Data]
- (**OpenPRM**) OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees [ICLR 2025]
- (**o1-Coder**) o1-Coder: an o1 Replication for Coding [arXiv 2024.12] [Code]
- Process Supervision-Guided Policy Optimization for Code Generation [arXiv 2024.10]

πŸ” Other Process-Supervised Models

  • Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators [arXiv 2025.03] [Code]

## πŸŒ‡ Multimodal PRMs

## πŸ“Š Benchmarks

## πŸ’ͺ Contributing

If you find a paper that should be included but is missing, feel free to create an issue or submit a pull request. Please use the following format when contributing:

```
- (**Method Name**) Title [[Journal/Conference](Link)] [[arXiv Year.Month](Link)] [[Code](Link)] [[Website](Link)] [[Model](Link)] [[Data](Link)]
```

πŸ“ Citation

If you find this work helpful, please consider citing the repository:

```bibtex
@misc{Awesome-Process-Reward-Models,
    title        = {Awesome Process Reward Models},
    author       = {Runze Liu and Jian Zhao and Kaiyan Zhang and Zhimu Zhou and Junqi Gao and Dong Li and Jiafei Lyu and Zhouyi Qian and Biqing Qi and Xiu Li and Bowen Zhou},
    howpublished = {\url{https://github.com/RyanLiu112/Awesome-Process-Reward-Models}},
    note         = {GitHub repository},
    year         = {2025}
}
```

Our recent work on PRM test-time scaling:

```bibtex
@article{zhao2025genprm,
    title   = {GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning},
    author  = {Jian Zhao and Runze Liu and Kaiyan Zhang and Zhimu Zhou and Junqi Gao and Dong Li and Jiafei Lyu and Zhouyi Qian and Biqing Qi and Xiu Li and Bowen Zhou},
    journal = {arXiv preprint arXiv:2504.00891},
    year    = {2025}
}
```

Our recent work on LLM test-time scaling with PRMs:

```bibtex
@article{liu2025can,
    title   = {Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling},
    author  = {Runze Liu and Junqi Gao and Jian Zhao and Kaiyan Zhang and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    journal = {arXiv preprint arXiv:2502.06703},
    year    = {2025}
}
```
