Fix test_elastic_training_agent.py for torch version 2.4 and above. #1531

zhengchenyu · 2025-05-06T06:50:56Z

What changes were proposed in this pull request?

There are two changes:
(1) After pytorch/pytorch@67d3e4f, the parameters redirects and tee were removed. Although #1130 solved this problem, it seems that the unit test was not updated to match.
(2) The unit test hangs. #1279 added compatibility with torch-2.4, but since that change the unit test has to wait for all workers to finish writing data to torchelastic/role_info/{i}. The unit test does not mock this, so it is stuck until the timeout.
In fact, we don't need to process the data as in pytorch/pytorch@dc4c75b, because we already have the rank list.
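
For illustration only, here is a minimal sketch of one way a test can stay compatible when a constructor stops accepting redirects and tee: pass only the keyword arguments the target still accepts. This is not necessarily how this PR does it. The helper supported_kwargs and the classes OldSpec and NewSpec are hypothetical stand-ins, not torch or dlrover APIs.

```python
import inspect


def supported_kwargs(factory, **candidates):
    """Return only the keyword arguments that `factory` accepts."""
    params = inspect.signature(factory).parameters
    # A **kwargs parameter means the factory accepts anything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(candidates)
    return {name: value for name, value in candidates.items() if name in params}


# Hypothetical stand-in for a spec class that still takes redirects and tee.
class OldSpec:
    def __init__(self, role, redirects=None, tee=None):
        self.role = role
        self.redirects = redirects
        self.tee = tee


# Hypothetical stand-in for a spec class after the parameters were removed.
class NewSpec:
    def __init__(self, role):
        self.role = role


def build_spec(spec_cls):
    """Construct either spec class without a TypeError on removed parameters."""
    kwargs = supported_kwargs(spec_cls, role="worker", redirects=0, tee=0)
    return spec_cls(**kwargs)
```

With this approach the same test body can construct either class, so no explicit torch version check is needed at the call site.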

Why are the changes needed?

Fix test_elastic_training_agent.py for torch version 2.4 and above.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Tested with pytorch-2.6, pytorch-2.3, and pytorch-2.2.

zhengchenyu added 2 commits May 6, 2025 14:13

Fix test_elastic_training_agent.py for torch version 2.4 and above.

3c9529b

update

8994d35

zhengchenyu requested review from workingloong, samplise, BalaBalaYi and majieyue as code owners May 6, 2025 06:50
