8 x 4090 run MAGI-1-24B-distill+fp8_quant out of memory #83

Open · PMPBinZhang opened this issue May 29, 2025 · 6 comments

@PMPBinZhang commented May 29, 2025

Thanks for your work, but when I try to use RTX 4090 × 8 to run MAGI-1-24B-distill+fp8_quant, every GPU gets an error like this:

```
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 192.00 MiB. GPU 7 has a total capacity of 23.64 GiB of which 71.62 MiB is free. Including non-PyTorch memory, this process has 23.57 GiB memory in use. Of the allocated memory 22.97 GiB is allocated by PyTorch, and 15.18 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

Waiting for your reply.
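For reference, the allocator hint in the error message can be applied like this before torch initializes CUDA; note it only mitigates fragmentation and cannot recover the ~24 GB capacity limit (a minimal sketch):

```python
# Minimal sketch: apply the allocator hint from the OOM message.
# PYTORCH_CUDA_ALLOC_CONF is read when the CUDA caching allocator
# initializes, so set it before importing torch (or export it in
# the shell that launches the inference script).
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported only after the allocator config is in place

print(torch.cuda.is_available())
```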

@PMPBinZhang (Author)

I changed 24B_distill_quant_config.json as follows, reducing video_size_h, video_size_w, and num_frames:

```json
"clean_chunk_kvrange": 1,
"clean_t": 0.9999,
"seed": 1234,
"num_frames": 64,
"video_size_h": 480,
"video_size_w": 720,
"num_steps": 16,
"window_size": 4,
"fps": 24,
"chunk_width": 6,
"load": "./downloads/24B_distill_quant",
"t5_pretrained": "./downloads/t5_pretrained",
"t5_device": "cuda",
"vae_pretrained": "./downloads/vae",
"scale_factor": 0.18215,
"temporal_downsample_factor": 4
},
```
I got the error, partially as follows:

```
Traceback (most recent call last):
[rank4]: File "/home/fusion/work/gen_video/MAGI-1/inference/pipeline/entry.py", line 54, in
[rank4]: main()
[rank4]: File "/home/fusion/work/gen_video/MAGI-1/inference/pipeline/entry.py", line 45, in main
[rank4]: pipeline.run_image_to_video(prompt=args.prompt, image_path=args.image_path, output_path=args.output_path)
[rank4]: File "/home/fusion/work/gen_video/MAGI-1/inference/pipeline/pipeline.py", line 39, in run_image_to_video
[rank4]: self._run(prompt, prefix_video, output_path)
[rank4]: File "/home/fusion/work/gen_video/MAGI-1/inference/pipeline/pipeline.py", line 47, in _run
[rank4]: dit = get_dit(self.config)
[rank4]: File "/home/fusion/work/gen_video/MAGI-1/inference/model/dit/dit_model.py", line 654, in get_dit
[rank4]: model = load_checkpoint(model)
[rank4]: File "/home/fusion/work/gen_video/MAGI-1/inference/infra/checkpoint/checkpointing.py", line 163, in load_checkpoint
[rank4]: missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False, assign=True)
[rank4]: File "/home/fusion/anaconda3/envs/magi/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2215, in load_state_dict
[rank4]: raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
[rank4]: RuntimeError: Error(s) in loading state_dict for VideoDiTModel:
[rank4]: size mismatch for t_embedder.mlp.0.weight: copying a param with shape torch.Size([1536, 256]) from checkpoint, the shape in current model is torch.Size([7680, 256]).
[rank4]: size mismatch for t_embedder.mlp.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([7680]).
[rank4]: size mismatch for t_embedder.mlp.2.weight: copying a param with shape torch.Size([1536, 1536]) from checkpoint, the shape in current model is torch.Size([7680, 7680]).
[rank4]: size mismatch for t_embedder.mlp.2.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([7680]).
[rank4]: size mismatch for y_embedder.y_proj_adaln.0.weight: copying a param with shape torch.Size([1536, 4096]) from checkpoint, the shape in current model is torch.Size([7680, 4096]).
[rank4]: size mismatch for y_embedder.y_proj_adaln.0.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([7680]).
[rank4]: size mismatch for videodit_blocks.layers.0.ada_modulate_layer.proj.0.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 7680]).
[rank4]: size mismatch for videodit_blocks.layers.1.ada_modulate_layer.proj.0.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 7680]).
[rank4]: size mismatch for videodit_blocks.layers.2.ada_modulate_layer.proj.0.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 7680]).
[rank4]: size mismatch for videodit_blocks.layers.3.ada_modulate_layer.proj.0.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 7680]).
[rank4]: size mismatch for videodit_blocks.layers.4.ada_modulate_layer.proj.0.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 7680]).
[rank4]: size mismatch for videodit_blocks.layers.5.ada_modulate_layer.proj.0.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current model is torch.Size([12288, 7680]).
[rank4]: size mismatch for videodit_blocks.layers.6.ada_modulate_layer.proj.0.weight: copying a param with shape torch.Size([12288, 1536]) from checkpoint, the shape in current mo
```

@walt008 commented May 29, 2025

Me too

@levi131 (Collaborator) commented May 29, 2025

Thank you for your attention to our work. The default config is for 8 × H100 cards. On 8 × 4090 cards, please modify the following configurations:
[Image: screenshot of the suggested configuration changes]


@levi131 (Collaborator) commented May 29, 2025

> I changed 24B_distill_quant_config.json as follows […] RuntimeError: Error(s) in loading state_dict for VideoDiTModel: size mismatch for t_embedder.mlp.0.weight […]

This log shows that a shape mismatch occurred while loading the model weights. The 1536 from the checkpoint is correct; the 7680 in the current model is wrong. I suspect you modified the model_config incorrectly, or some code change caused this error.
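To see exactly which parameters disagree, a minimal diagnostic sketch like the one below can help (assuming the checkpoint loads as a plain state_dict via torch.load; the helper name and loading details are hypothetical, not the repo's API):

```python
import torch
from torch import nn

def report_shape_mismatches(model: nn.Module, ckpt_path: str) -> None:
    """Print every parameter whose checkpoint shape disagrees with the model."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    for name, param in model.state_dict().items():
        if name in state_dict and tuple(state_dict[name].shape) != tuple(param.shape):
            print(f"{name}: checkpoint {tuple(state_dict[name].shape)} "
                  f"!= model {tuple(param.shape)}")
```

Every line it prints should point back at the config field (e.g. a hidden size) that produced 7680 where the checkpoint expects 1536.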

@Issues-maker

Wondering if I can run it on 4 × RTX 3090? I could also add 2 more GPUs, for 6 × RTX 3090. Or are 8 cards strictly needed, with the 4090 as the minimum?

@levi131 (Collaborator) commented May 29, 2025

> Wondering if I can run it on 4 × RTX 3090? I could also add 2 more GPUs, for 6 × RTX 3090. Or are 8 cards strictly needed, with the 4090 as the minimum?

Memory is the major limiting factor. The 3090, like the 4090, has only 24 GB of memory per card, so it also requires at least pp_size=2 to run the 24B model. The number of cards is not strictly limited to 8, but with 6 cards (cp_size=3, pp_size=2) it may be necessary to reduce the size of some input images.
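To make the card-count arithmetic explicit, here is a tiny illustrative sketch (the helper is hypothetical; the pp_size >= 2 floor just encodes the 24 GB rule of thumb above, and actual memory use also depends on resolution and frame count):

```python
# Hypothetical helper: enumerate (cp_size, pp_size) layouts for a given
# GPU count, requiring cp_size * pp_size == num_gpus and, per the rule
# of thumb for 24 GB cards and the 24B model, pp_size >= 2.
def candidate_layouts(num_gpus: int, min_pp_size: int = 2) -> list[tuple[int, int]]:
    """Return (cp_size, pp_size) pairs whose product equals num_gpus."""
    return [
        (num_gpus // pp, pp)
        for pp in range(min_pp_size, num_gpus + 1)
        if num_gpus % pp == 0
    ]

print(candidate_layouts(6))  # [(3, 2), (2, 3), (1, 6)]
print(candidate_layouts(8))  # [(4, 2), (2, 4), (1, 8)]
```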
