hwloc-calc --local-memory and -I numa produce different results #716
Comments
I think I understand at least part of the issue. When you pass core:0 core:16, hwloc-calc accumulates them into a single locality (a cpuset) and then looks for the corresponding local NUMA nodes. The API that finds local NUMA nodes takes flags saying whether we want NUMA nodes whose locality exactly matches, is smaller, or is larger. The problem here is that no NUMA node is larger or smaller than core:0 + core:16. They just intersect. So I need to rethink those flags: https://www.open-mpi.org/projects/hwloc/doc/v2.12.0/a00168.php#gab9c963ca37255da71b00d94e1b106f9d
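To make the matching rule described above concrete, here is a small sketch in Python. It is a toy model, not the hwloc implementation: cpusets are plain Python sets, the two-node layout (numa 0 = cores 0-15, numa 1 = cores 16-31) is an assumption based on the report, and the function names `local_numa` and `intersecting_numa` are hypothetical.

```python
# Toy model of local-NUMA matching; NOT the hwloc implementation.
# Assumed (hypothetical) layout: numa 0 = cores 0-15, numa 1 = cores 16-31.
NUMA_CPUSETS = {
    0: frozenset(range(0, 16)),
    1: frozenset(range(16, 32)),
}

def local_numa(locality, larger=True, smaller=True):
    """Return NUMA indexes whose cpuset equals the locality, contains it
    (larger), or is contained in it (smaller)."""
    locality = frozenset(locality)
    found = []
    for index, cpuset in sorted(NUMA_CPUSETS.items()):
        if (cpuset == locality
                or (larger and cpuset >= locality)
                or (smaller and cpuset <= locality)):
            found.append(index)
    return found

def intersecting_numa(locality):
    """Return NUMA indexes whose cpuset shares at least one core with the locality."""
    locality = frozenset(locality)
    return [i for i, c in sorted(NUMA_CPUSETS.items()) if c & locality]

print(local_numa({0}))             # [0]: numa 0 is larger than core 0
print(local_numa({0, 16}))         # []: no node is equal, larger, or smaller
print(intersecting_numa({0, 16}))  # [0, 1]: both nodes intersect the locality
```

Under these assumptions, the combined locality core:0 + core:16 is neither a subset nor a superset of either node, which would explain the empty result in the report.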
If the input locality is a single core of multiple packages, no NUMA is smaller or larger, but they intersect. Refs open-mpi#716 Note that intersect imply smaller and larger too. Signed-off-by: Brice Goglin <[email protected]>
Thanks to Antoine Morvan for the report Refs open-mpi#716 Signed-off-by: Brice Goglin <[email protected]>
Can you try with both commits on top of https://github.com/bgoglin/hwloc/commits/issue716/? (I can't provide a tarball yet, our CI is down.)
Tarball soon at https://ci.inria.fr/hwloc/view/all/job/bgoglin/611/
Hello, I just tested with the 2 patches and with the tarball; both solve the issue for my use case. Thanks.
If the input locality is a single core of multiple packages, no NUMA is smaller or larger, but they intersect. Refs #716 Note that intersect imply smaller and larger too. Signed-off-by: Brice Goglin <[email protected]> (cherry picked from commit c8b380d)
Thanks to Antoine Morvan for the report Refs #716 Signed-off-by: Brice Goglin <[email protected]> (cherry picked from commit e429e8f)
What version of hwloc are you using?
From 2.10.0 through the latest (using the --oo flag of hwloc-calc); built from source
Which operating system and hardware are you running on?
Tested on RHEL9 (kernel 5.14), on various systems (AMD Milan, Intel SPR, NVIDIA Grace Hopper)
Details of the problem
We use hwloc-calc --local-memory to find the memories nearest to a set of compute resources, so that we can bind allocations there. Typically:
The above works fine. But when we go to more complex situations, we observe behavior we cannot explain.
Empty results
The first issue is when querying the local memory of several cores:
In this example, we ask for the local memory of core 0 from numa 0 and core 16 from numa 1: we expect hwloc-calc --oo --local-memory core:0 core:16 to return numa 0 and 1, as the last command does. But it returns an empty string.
Missing results
The second issue is when querying the local memory of a range of cores:
The command hwloc-calc --oo --local-memory core:0-16 asks for a range covering the whole of numa:0 and the first core of numa:1; as -I numa outputs, we expect numa 0 and 1 to be part of the result, but only 0 appears.
Note that when we give a range covering all the cores of the 2nd numa, the output is correct:
But until it is fully covered by the range, the numa does not show up:
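The range behavior described above can be reproduced with a small Python sketch. This is a toy model and not hwloc-calc output: the layout (numa 0 = cores 0-15, numa 1 = cores 16-31) is assumed, and `matched_by_inclusion` and `matched_by_intersection` are hypothetical helper names standing in for "smaller or larger locality" matching and for intersection-based matching.

```python
# Toy model only; the two-node layout below is an assumption, not hwloc output.
NUMA_CPUSETS = {0: set(range(0, 16)), 1: set(range(16, 32))}

def matched_by_inclusion(cores):
    """NUMA nodes that contain the core set or are contained in it."""
    cores = set(cores)
    return [i for i, c in sorted(NUMA_CPUSETS.items())
            if c <= cores or c >= cores]

def matched_by_intersection(cores):
    """NUMA nodes that share at least one core with the core set."""
    cores = set(cores)
    return [i for i, c in sorted(NUMA_CPUSETS.items()) if c & cores]

# Grow the range one step at a time and compare both matching rules.
for last in (15, 16, 30, 31):
    cores = range(0, last + 1)
    print(f"core:0-{last}",
          "inclusion:", matched_by_inclusion(cores),
          "intersection:", matched_by_intersection(cores))
```

In this model, numa 1 only appears under inclusion-based matching once the range reaches core 31 and covers the whole node, while intersection-based matching reports it as soon as core 16 is included.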
When we do not add any flag or best-memattr, we expect --local-memory to behave like -I numa.
Additional information
N/A