nvshmem #599
Thanks @pbelevich - is this PR ready for review?
Observation: EFA environment variables do not affect the performance of NVSHMEM (compiled with NCCL).
Run without the variables:
srun --mpi=pmix --cpu-bind=none --container-image ./nvshmem.sqsh --nodes=2 --ntasks-per-node=1 bash -c "/opt/nvshmem/bin/perftest/device/pt-to-pt/shmem_put_bw"
Run with the EFA and NCCL variables set:
srun --mpi=pmix --cpu-bind=none --container-image ./nvshmem.sqsh --nodes=2 --ntasks-per-node=1 bash -c "FI_PROVIDER=efa FI_EFA_USE_DEVICE_RDMA=1 FI_EFA_FORK_SAFE=1 NCCL_BUFFSIZE=8388608 NCCL_P2P_NET_CHUNKSIZE=524288 NCCL_TUNER_PLUGIN=/opt/aws-ofi-nccl/install/lib/libnccl-ofi-tuner.so /opt/nvshmem/bin/perftest/device/pt-to-pt/shmem_put_bw"
@nghtm yes, please review
I am tight on bandwidth to review this PR this week. Requesting @amanshanbhag to take a look.
Thanks @pbelevich! I will also take a look at it on Friday (right now I am up to my ears in prep for the upcoming TFC summit session).
Overall LGTM (thank you so much!), but we might want to add a description for the sbatch files (see #654, and feel free to merge it into this branch before merging this to main if that makes sense).
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.