- (ThinkPRM) Process Reward Models That Think [arXiv 2025.04] [Code] [Model] [Data]
- (ActPRM) Efficient Process Reward Model Training via Active Learning [arXiv 2025.04] [Code] [Model] [Data]
- (GenPRM) GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning [arXiv 2025.04] [Code] [Website] [Model] [Data]
- (EDU-PRM) Process Reward Modeling with Entropy-Driven Uncertainty [arXiv 2025.03]
- (R-PRM) R-PRM: Reasoning-Driven Process Reward Modeling [arXiv 2025.03] [Blog] [Code] [Model] [Data]
- (RetrievalPRM) Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning [arXiv 2025.02] [Code] [Model] [Data]
- (Multilingual PRM) Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [arXiv 2025.02] [Code] [Data]
- (Universal PRM) AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification [arXiv 2025.02] [Website] [Model]
- (Dyve) Dyve: Thinking Fast and Slow for Dynamic Process Verification [arXiv 2025.02] [Code] [Model] [Data]
- (PURE PRM) Stop Gamma Decay: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning [Blog] [Code] [Model] [Data]
- (CFPRM) Coarse-to-Fine Process Reward Modeling for Mathematical Reasoning [arXiv 2025.01]
- (Qwen2.5-Math PRM) The Lessons of Developing Process Reward Models in Mathematical Reasoning [arXiv 2025.01] [Website] [Model]
- (PPM) rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking [arXiv 2025.01] [Code]
- (ER-PRM) Entropy-Regularized Process Reward Model [arXiv 2024.12] [Code] [Website] [Model] [Data]
- (Implicit PRM) Free Process Rewards without Process Labels [arXiv 2024.12] [Code] [Model] [Data]
- (Skywork PRM) Skywork-o1 Open Series [Model]
- (RLHFlow PRM) An Implementation of Generative PRM [Code] [Model] [Data]
- (PQM) Process Reward Model with Q-Value Rankings [ICLR 2025] [arXiv 2024.10] [Code] [Model]
- (Math-psa) OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models [arXiv 2024.10] [Code] [Website] [Model] [Data]
- (PAV) Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning [ICLR 2025] [arXiv 2024.10]
- (FG-PRM) FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning [arXiv 2024.10]
- (Tree-PLV) Advancing Process Verification for Large Language Models via Tree-Based Preference Learning [arXiv 2024.07]
- (OmegaPRM) Improve Mathematical Reasoning in Language Models by Automated Process Supervision [arXiv 2024.06] [Code (Third Party)]
- AlphaMath Almost Zero: Process Supervision without Process [arXiv 2024.05]
- (Math-Shepherd) Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations [ACL 2024] [arXiv 2023.12] [Model] [Data]
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [arXiv 2023.10]
- Let's Verify Step by Step [ICLR 2024] [arXiv 2023.05] [Data] [Blog]
- Solving math word problems with process- and outcome-based feedback [arXiv 2022.11]
- (MT-RewardTree) MT-RewardTree: A Comprehensive Framework for Advancing LLM-Based Machine Translation via Reward Modeling [arXiv 2025.03] [Code] [Website] [Model] [Data]
- (GraphPRM) Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners [arXiv 2025.03] [Model] [Data]
- (ASPRM) AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [arXiv 2025.02] [Code] [Model] [Data]
- (AgentPRM) Process Reward Models for LLM Agents: Practical Framework and Directions [arXiv 2025.02] [Code]
- (VersaPRM) VersaPRM: Multi-Domain Process Reward Model via Synthetic Reasoning Data [arXiv 2025.02] [Code] [Model] [Data]
- (MedS$^3$) MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking [arXiv 2025.01] [Code] [Model] [Data]
- (OpenPRM) OpenPRM: Building Open-domain Process-based Reward Models with Preference Trees [ICLR 2025]
- (o1-Coder) o1-Coder: an o1 Replication for Coding [arXiv 2024.12] [Code]
- Process Supervision-Guided Policy Optimization for Code Generation [arXiv 2024.10]
- Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators [arXiv 2025.03] [Code]
- (MM-PRM) MM-PRM: An open implementation of OmegaPRM and its corresponding training pipeline [Blog] [Model]
- (ViLPRM) ViLBench: A Suite for Vision-Language Process Reward Modeling [arXiv 2025.03] [Website] [Data]
- (VisualPRM) VisualPRM: An Effective Process Reward Model for Multimodal Reasoning [arXiv 2025.03] [Website] [Model] [Data]
- (URSA) URSA: Understanding and Verifying Chain-of-Thought Reasoning in Multimodal Mathematics [arXiv 2025.01] [Code] [Website] [Model] [Data]
- (M-STAR) Diving into Self-Evolving Training for Multimodal Reasoning [arXiv 2024.12] [Website] [Model]
- (ViLBench) ViLBench: A Suite for Vision-Language Process Reward Modeling [arXiv 2025.03] [Website] [Data]
- (MPBench) MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification [arXiv 2025.03] [Code] [Website] [Data]
- (PRMBench) PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models [arXiv 2025.01] [Code] [Website] [Data]
- (ProcessBench) ProcessBench: Identifying Process Errors in Mathematical Reasoning [arXiv 2024.12] [Code] [Model] [Data]
If you find a paper that should be included but is missing, feel free to create an issue or submit a pull request. Please use the following format to contribute:
- (**Method Name**) Title [[Journal/Conference](Link)] [[arXiv Year.Month](Link)] [[Code](Link)] [[Website](Link)] [[Model](Link)] [[Data](Link)]
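For example, the GenPRM entry above would be written as follows (the arXiv link is taken from the citation below; the remaining `Link` placeholders stand in for the actual Code, Website, Model, and Data URLs):
- (**GenPRM**) GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning [[arXiv 2025.04](https://arxiv.org/abs/2504.00891)] [[Code](Link)] [[Website](Link)] [[Model](Link)] [[Data](Link)]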
If you find this work helpful, please consider citing the repository:
@misc{Awesome-Process-Reward-Models,
    title = {Awesome Process Reward Models},
    author = {Runze Liu and Jian Zhao and Kaiyan Zhang and Zhimu Zhou and Junqi Gao and Dong Li and Jiafei Lyu and Zhouyi Qian and Biqing Qi and Xiu Li and Bowen Zhou},
    howpublished = {\url{https://github.com/RyanLiu112/Awesome-Process-Reward-Models}},
    note = {GitHub repository},
    year = {2025}
}
Our recent work on PRM test-time scaling:
@article{zhao2025genprm,
    title = {GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning},
    author = {Jian Zhao and Runze Liu and Kaiyan Zhang and Zhimu Zhou and Junqi Gao and Dong Li and Jiafei Lyu and Zhouyi Qian and Biqing Qi and Xiu Li and Bowen Zhou},
    journal = {arXiv preprint arXiv:2504.00891},
    year = {2025}
}
Our recent work on LLM test-time scaling with PRMs:
@article{liu2025can,
    title = {Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling},
    author = {Runze Liu and Junqi Gao and Jian Zhao and Kaiyan Zhang and Xiu Li and Biqing Qi and Wanli Ouyang and Bowen Zhou},
    journal = {arXiv preprint arXiv:2502.06703},
    year = {2025}
}