https://github.com/aws-samples/awsome-distributed-training/tree/main/3.test_cases/pytorch/picotron/SmolLM-1.7B/ec2