How to run distributed training with Omni? #210

Open
zlf0307 opened this issue Apr 8, 2025 · 0 comments
zlf0307 commented Apr 8, 2025

I am running multi-GPU distributed training with the following command:
CUDA_VISIBLE_DEVICES=5,6 python -m torch.distributed.run \
    main.py \
    --data_root ./text_spotting_datasets/ \
    --output_folder ./output/pretrain/stage1/ \
    --train_dataset totaltext_train mlt_train ic13_train ic15_train syntext1_train syntext2_train \
    --lr 0.0005 \
    --max_steps 400000 \
    --warmup_steps 5000 \
    --checkpoint_freq 10000 \
    --batch_size 6 \
    --tfm_pre_norm \
    --train_max_size 768 \
    --rec_loss_weight 2 \
    --use_fpn \
    --use_char_window_prompt
However, only GPU 5 is actually training; GPU 6 shows no memory usage at all.
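
A likely cause, though not confirmed in this thread: torch.distributed.run defaults --nproc_per_node to 1, so only a single worker process is launched and it lands on the first visible GPU. Below is a minimal sketch of the same command with the worker count set explicitly, assuming main.py initializes its process group from the environment variables the launcher provides (RANK, LOCAL_RANK, WORLD_SIZE):

# Launch one worker process per visible GPU (2 here); the launcher
# assigns each process its own RANK / LOCAL_RANK / WORLD_SIZE.
CUDA_VISIBLE_DEVICES=5,6 python -m torch.distributed.run --nproc_per_node=2 \
    main.py \
    --data_root ./text_spotting_datasets/ \
    --output_folder ./output/pretrain/stage1/ \
    --train_dataset totaltext_train mlt_train ic13_train ic15_train syntext1_train syntext2_train \
    --lr 0.0005 \
    --max_steps 400000 \
    --warmup_steps 5000 \
    --checkpoint_freq 10000 \
    --batch_size 6 \
    --tfm_pre_norm \
    --train_max_size 768 \
    --rec_loss_weight 2 \
    --use_fpn \
    --use_char_window_prompt

If main.py already wraps the model in DistributedDataParallel and places it on the device given by LOCAL_RANK, both GPUs should then show memory usage; if the script only supports single-process training, the second GPU will stay idle regardless of how the job is launched.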
