[Question] Policy collapse #3267
Unanswered
AntonioClaudiossf asked this question in Q&A
Replies: 1 comment 1 reply
Thank you for posting this. It is a great question for our Discussions section, so I'll move the post there for follow-up by the team and others. In the meantime, you may want to consider the following:
Question
Hi everyone,
I am currently working on a peg-insertion task using PPO in Isaac Lab. The training starts well: the agent improves and the reward increases. But after a certain point the performance suddenly collapses and becomes unstable.
I will attach the TensorBoard plots of the loss functions and the reward so you can better see what is happening. I have already tried tuning several PPO parameters, such as lowering the `kl_threshold`, changing the `value_loss_scale`, adjusting the `entropy_loss_scale`, modifying the `learning_rate`, and even testing different model sizes. However, the problem still occurs.

[Plot: the loss during training]
[Plot: the max, mean, and min rewards]
[Config: PPO parameters]
[Config: model]
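For reference, the kind of adaptive learning-rate rule that `kl_threshold` typically drives in skrl-style PPO configs can be sketched roughly like this. This is a standalone illustration, not the actual library code; the scaling factor and the learning-rate bounds are assumed values:

```python
def adapt_lr(lr, kl, kl_threshold=0.008, factor=1.5,
             lr_min=1e-6, lr_max=1e-2):
    """Adjust the learning rate based on the measured policy KL divergence.

    Shrinks the rate when an update overshoots the KL target (a common
    precursor to policy collapse) and grows it when updates are overly
    conservative. The 2x hysteresis band and factor of 1.5 are assumptions
    for illustration, not values taken from any specific library.
    """
    if kl > kl_threshold * 2.0:
        # Policy moved too far in one update: cool the optimizer down.
        lr = max(lr / factor, lr_min)
    elif kl < kl_threshold / 2.0:
        # Policy barely moved: allow larger steps.
        lr = min(lr * factor, lr_max)
    return lr
```

If your runs show the KL spiking right before the reward collapses, a rule like this (or simply a lower fixed `learning_rate` with a tighter `kl_threshold`) tends to keep updates inside the trust region.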
Has anyone faced similar issues with peg-insertion or other high-precision tasks? Do you have any ideas on why the training might be collapsing, or suggestions for further adjustments/debugging steps?
Any insights would be greatly appreciated.
Thanks in advance!