The Distributed Model Training Bootcamp takes a practical, real-world approach to using GPUs efficiently for distributed model training. Attendees walk through the system topology to understand the dynamics of multi-GPU and multi-node connections and architecture. Using the PyTorch framework, they learn state-of-the-art training strategies, including Distributed Data Parallel (DDP), Fully Sharded Data Parallel (FSDP), model parallelism, pipeline parallelism, and tensor parallelism. Attendees also learn to profile code and analyze performance with NVIDIA® Nsight™ Systems, a tool that helps identify optimization opportunities and improve the performance of applications running on systems with multiple CPUs and GPUs. A rough illustration of one of these strategies, DDP, is sketched below.
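
To give a sense of what a DDP workflow looks like, here is a minimal sketch of a PyTorch DDP training loop. It is an illustration only, not bootcamp material: the toy linear model, the synthetic dataset, and the hyperparameters are placeholders, and the script assumes a single node with one process per GPU launched via `torchrun`.

```python
# Minimal DDP sketch: each process holds a model replica on its own GPU,
# DistributedSampler shards the data across ranks, and gradients are
# all-reduced automatically during backward().
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model and synthetic data stand in for the real workload.
    model = nn.Linear(128, 10).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)  # gives each rank a distinct shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            loss.backward()  # gradients are all-reduced across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

A script like this would be launched with one process per GPU, for example `torchrun --nproc_per_node=4 train_ddp.py`, and the same command can be wrapped with `nsys profile` to capture a timeline for analysis in Nsight Systems.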