Why does my example run successfully on 4 compute nodes but fail on 8 compute nodes when I want to use GPU-aware MPI? #609
Unanswered
Terence-iscas asked this question in Q&A
Replies: 0 comments
Hi, I'm running a pipe example with nearly 650K hexahedral elements (polynomialOrder = 7, NekRS 23.0). Each of my compute nodes has 4 CPUs and 4 AMD GPUs. On 4 compute nodes the example runs correctly with openmpi-4.1.5 built against UCX: during the "timing gs" phase, the pw+early+device, pw+device, and pw+host modes all appeared.
To test the strong scalability of the example, I then ran it on 8 compute nodes of the same architecture. Unfortunately, it stopped at the first "timing gs:" line.
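(Side note for anyone debugging a similar setup: one thing worth ruling out before digging into NekRS itself is whether the UCX build visible on the failing nodes actually exposes the ROCm transports. A quick check, assuming `ucx_info` from the same UCX installation is on the PATH, is:)

```shell
# Show how UCX was configured; look for --with-rocm in the configure line.
ucx_info -v

# List available transports; rocm_copy / rocm_ipc should appear
# if this UCX build has ROCm support on these nodes.
ucx_info -d | grep -i rocm
```

If the output differs between the 4-node and 8-node allocations (for example, a different UCX module is picked up on some nodes), that alone can explain a failure that only appears at larger scale.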
The error log looks like this:
I have located the relevant function:
The variable oogs_mode_list appears to have the following 5 values:

When I use openmpi with UCX, gsMode seems to be OOGS_AUTO, so the program benchmarks the communication bandwidth of each mode. I have also tried forcing gsMode to each of the other four options; with OOGS_DEVICEMPI the same error occurred again. So my question is: is there a bug here, or are my MPI/UCX parameters set incorrectly (see below)?

Launch Command:
Besides, I have read the similar topics #594, #578, and #568; they gave me some ideas but did not solve the problem, so I opened this topic.
Thank you in advance for your kind help! Best wishes!