
while inserting Group0 (cpuset 0xffffffff,,0xffffffff) at Package (P#0 cpuset 0xffffffff,0xffffffff) #712

ChaoHsin-fang opened this issue Apr 14, 2025 · 11 comments


@ChaoHsin-fang

What version of hwloc are you using?
2.7.0

Which operating system and hardware are you running on?
Ubuntu 22.04, KVM guest on Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (all PCIe devices passed through to the VM)

Details of the problem

[screenshot: lstopo output showing the Group0 insertion warning]

lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-31,64-95
NUMA node1 CPU(s): 32-63,96-127

numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 0 size: 352671 MB
node 0 free: 350266 MB
node 1 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 1 size: 352777 MB
node 1 free: 350764 MB
node distances:
node 0 1
0: 10 20
1: 20 10

@bgoglin
Contributor

bgoglin commented Apr 14, 2025

Hello
Looks like the NUMA information is invalid in this KVM. Assuming this happens with "lstopo --no-io" as well, please run "hwloc-gather-topology foo" and send us the generated foo.tar.bz2 so that we may debug remotely by looking at what's buggy in your /sys files.
If lstopo --no-io works fine but lstopo (without options) shows the warning, you'll need to pass "--io" to hwloc-gather-topology which will make the script slower and the tarball bigger.
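In shell form, the requested sequence is (run inside the VM; "foo" is just an output prefix):

lstopo --no-io            # does the Group0 warning appear without I/O discovery?
hwloc-gather-topology foo # dumps the /sys topology files into foo.tar.bz2
# only if the warning needs I/O discovery to trigger:
hwloc-gather-topology --io foo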

@bgoglin
Contributor

bgoglin commented Apr 14, 2025

Also, if you created the VM by specifying some core and NUMA topology, the bug might be there; it would be useful to see what you specified.

@ChaoHsin-fang
Author

> Hello. Looks like the NUMA information is invalid in this KVM. Assuming this happens with "lstopo --no-io" as well, please run "hwloc-gather-topology foo" and send us the generated foo.tar.bz2 so that we may debug remotely by looking at what's buggy in your /sys files. If lstopo --no-io works fine but lstopo (without options) shows the warning, you'll need to pass "--io" to hwloc-gather-topology which will make the script slower and the tarball bigger.

lstopo.log

foo.tar.gz

@ChaoHsin-fang
Author

ChaoHsin-fang commented Apr 14, 2025

> Also, if you created the VM by specifying some core and NUMA topology, the bug might be there; it would be useful to see what you specified.

I’ve tried configuring NUMA in KVM XML, but it’s not working as expected. Any clues?

<cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='2' cores='32' threads='2'/> 
    <numa>
      <cell id='0' cpus='0-31,64-95' memory='350' unit='GiB'/> <!-- NUMA0 350GB -->
      <cell id='1' cpus='32-63,96-127' memory='350' unit='GiB'/> <!-- NUMA1 350GB -->
    </numa>
  </cpu>
  <numatune>
    <memory nodeset='0,1' mode='strict'/>
    <memnode cellid='0' nodeset='0'/>
    <memnode cellid='1' nodeset='1'/>
  </numatune>
  <cputune>
  <!-- NUMA 0 -->
  <!-- vCPU 0-31 → host pCPU 0-31  -->
  <vcpupin vcpu='0' cpuset='0'/>
  <vcpupin vcpu='1' cpuset='1'/>
  <vcpupin vcpu='2' cpuset='2'/>
  <vcpupin vcpu='3' cpuset='3'/>
  <vcpupin vcpu='4' cpuset='4'/>
  <vcpupin vcpu='5' cpuset='5'/>
  <vcpupin vcpu='6' cpuset='6'/>
  <vcpupin vcpu='7' cpuset='7'/>
  <vcpupin vcpu='8' cpuset='8'/>
  <vcpupin vcpu='9' cpuset='9'/>
  <vcpupin vcpu='10' cpuset='10'/>
  <vcpupin vcpu='11' cpuset='11'/>
  <vcpupin vcpu='12' cpuset='12'/>
  <vcpupin vcpu='13' cpuset='13'/>
  <vcpupin vcpu='14' cpuset='14'/>
  <vcpupin vcpu='15' cpuset='15'/>
  <vcpupin vcpu='16' cpuset='16'/>
  <vcpupin vcpu='17' cpuset='17'/>
  <vcpupin vcpu='18' cpuset='18'/>
  <vcpupin vcpu='19' cpuset='19'/>
  <vcpupin vcpu='20' cpuset='20'/>
  <vcpupin vcpu='21' cpuset='21'/>
  <vcpupin vcpu='22' cpuset='22'/>
  <vcpupin vcpu='23' cpuset='23'/>
  <vcpupin vcpu='24' cpuset='24'/>
  <vcpupin vcpu='25' cpuset='25'/>
  <vcpupin vcpu='26' cpuset='26'/>
  <vcpupin vcpu='27' cpuset='27'/>
  <vcpupin vcpu='28' cpuset='28'/>
  <vcpupin vcpu='29' cpuset='29'/>
  <vcpupin vcpu='30' cpuset='30'/>
  <vcpupin vcpu='31' cpuset='31'/>

  <!-- vCPU 32-63 → host pCPU 64-95  -->
  <vcpupin vcpu='32' cpuset='64'/>
  <vcpupin vcpu='33' cpuset='65'/>
  <vcpupin vcpu='34' cpuset='66'/>
  <vcpupin vcpu='35' cpuset='67'/>
  <vcpupin vcpu='36' cpuset='68'/>
  <vcpupin vcpu='37' cpuset='69'/>
  <vcpupin vcpu='38' cpuset='70'/>
  <vcpupin vcpu='39' cpuset='71'/>
  <vcpupin vcpu='40' cpuset='72'/>
  <vcpupin vcpu='41' cpuset='73'/>
  <vcpupin vcpu='42' cpuset='74'/>
  <vcpupin vcpu='43' cpuset='75'/>
  <vcpupin vcpu='44' cpuset='76'/>
  <vcpupin vcpu='45' cpuset='77'/>
  <vcpupin vcpu='46' cpuset='78'/>
  <vcpupin vcpu='47' cpuset='79'/>
  <vcpupin vcpu='48' cpuset='80'/>
  <vcpupin vcpu='49' cpuset='81'/>
  <vcpupin vcpu='50' cpuset='82'/>
  <vcpupin vcpu='51' cpuset='83'/>
  <vcpupin vcpu='52' cpuset='84'/>
  <vcpupin vcpu='53' cpuset='85'/>
  <vcpupin vcpu='54' cpuset='86'/>
  <vcpupin vcpu='55' cpuset='87'/>
  <vcpupin vcpu='56' cpuset='88'/>
  <vcpupin vcpu='57' cpuset='89'/>
  <vcpupin vcpu='58' cpuset='90'/>
  <vcpupin vcpu='59' cpuset='91'/>
  <vcpupin vcpu='60' cpuset='92'/>
  <vcpupin vcpu='61' cpuset='93'/>
  <vcpupin vcpu='62' cpuset='94'/>
  <vcpupin vcpu='63' cpuset='95'/>

  <!-- NUMA 1 -->
  <!-- vCPU 64-95 → host pCPU 32-63 -->
  <vcpupin vcpu='64' cpuset='32'/>
  <vcpupin vcpu='65' cpuset='33'/>
  <vcpupin vcpu='66' cpuset='34'/>
  <vcpupin vcpu='67' cpuset='35'/>
  <vcpupin vcpu='68' cpuset='36'/>
  <vcpupin vcpu='69' cpuset='37'/>
  <vcpupin vcpu='70' cpuset='38'/>
  <vcpupin vcpu='71' cpuset='39'/>
  <vcpupin vcpu='72' cpuset='40'/>
  <vcpupin vcpu='73' cpuset='41'/>
  <vcpupin vcpu='74' cpuset='42'/>
  <vcpupin vcpu='75' cpuset='43'/>
  <vcpupin vcpu='76' cpuset='44'/>
  <vcpupin vcpu='77' cpuset='45'/>
  <vcpupin vcpu='78' cpuset='46'/>
  <vcpupin vcpu='79' cpuset='47'/>
  <vcpupin vcpu='80' cpuset='48'/>
  <vcpupin vcpu='81' cpuset='49'/>
  <vcpupin vcpu='82' cpuset='50'/>
  <vcpupin vcpu='83' cpuset='51'/>
  <vcpupin vcpu='84' cpuset='52'/>
  <vcpupin vcpu='85' cpuset='53'/>
  <vcpupin vcpu='86' cpuset='54'/>
  <vcpupin vcpu='87' cpuset='55'/>
  <vcpupin vcpu='88' cpuset='56'/>
  <vcpupin vcpu='89' cpuset='57'/>
  <vcpupin vcpu='90' cpuset='58'/>
  <vcpupin vcpu='91' cpuset='59'/>
  <vcpupin vcpu='92' cpuset='60'/>
  <vcpupin vcpu='93' cpuset='61'/>
  <vcpupin vcpu='94' cpuset='62'/>
  <vcpupin vcpu='95' cpuset='63'/>

  <!-- vCPU 96-127 → host pCPU 96-127  -->
  <vcpupin vcpu='96' cpuset='96'/>
  <vcpupin vcpu='97' cpuset='97'/>
  <vcpupin vcpu='98' cpuset='98'/>
  <vcpupin vcpu='99' cpuset='99'/>
  <vcpupin vcpu='100' cpuset='100'/>
  <vcpupin vcpu='101' cpuset='101'/>
  <vcpupin vcpu='102' cpuset='102'/>
  <vcpupin vcpu='103' cpuset='103'/>
  <vcpupin vcpu='104' cpuset='104'/>
  <vcpupin vcpu='105' cpuset='105'/>
  <vcpupin vcpu='106' cpuset='106'/>
  <vcpupin vcpu='107' cpuset='107'/>
  <vcpupin vcpu='108' cpuset='108'/>
  <vcpupin vcpu='109' cpuset='109'/>
  <vcpupin vcpu='110' cpuset='110'/>
  <vcpupin vcpu='111' cpuset='111'/>
  <vcpupin vcpu='112' cpuset='112'/>
  <vcpupin vcpu='113' cpuset='113'/>
  <vcpupin vcpu='114' cpuset='114'/>
  <vcpupin vcpu='115' cpuset='115'/>
  <vcpupin vcpu='116' cpuset='116'/>
  <vcpupin vcpu='117' cpuset='117'/>
  <vcpupin vcpu='118' cpuset='118'/>
  <vcpupin vcpu='119' cpuset='119'/>
  <vcpupin vcpu='120' cpuset='120'/>
  <vcpupin vcpu='121' cpuset='121'/>
  <vcpupin vcpu='122' cpuset='122'/>
  <vcpupin vcpu='123' cpuset='123'/>
  <vcpupin vcpu='124' cpuset='124'/>
  <vcpupin vcpu='125' cpuset='125'/>
  <vcpupin vcpu='126' cpuset='126'/>
  <vcpupin vcpu='127' cpuset='127'/>
  </cputune>
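For reference, the 128 vcpupin lines above follow a regular pattern, so a small script like the following sketch (not part of the original config) could generate them from the intended vCPU-to-pCPU mapping:

#!/bin/sh
# Sketch: emit the <vcpupin> block from the mapping described above:
# vCPU 0-31 -> pCPU 0-31, 32-63 -> 64-95, 64-95 -> 32-63, 96-127 -> 96-127.
for v in $(seq 0 127); do
  if   [ "$v" -lt 32 ]; then p=$v           # vCPU 0-31   -> pCPU 0-31
  elif [ "$v" -lt 64 ]; then p=$((v + 32))  # vCPU 32-63  -> pCPU 64-95
  elif [ "$v" -lt 96 ]; then p=$((v - 32))  # vCPU 64-95  -> pCPU 32-63
  else p=$v                                 # vCPU 96-127 -> pCPU 96-127
  fi
  printf "  <vcpupin vcpu='%d' cpuset='%d'/>\n" "$v" "$p"
done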

@ChaoHsin-fang
Author

Thanks for your reply.
The issue seems to be with the NUMA topology in the KVM XML file. I tried this setup as well,

<cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='2' cores='32' threads='2'/> 
    <numa>
      <cell id='0' cpus='0-31,64-95' memory='350' unit='GiB'/> <!-- NUMA0 350GB -->
      <cell id='1' cpus='32-63,96-127' memory='350' unit='GiB'/> <!-- NUMA1 350GB -->
    </numa>
  </cpu>
  <numatune>
    <memory nodeset='0,1' mode='strict'/>
    <memnode cellid='0' nodeset='0'/>
    <memnode cellid='1' nodeset='1'/>
  </numatune>

but it's still not working.

@bgoglin
Contributor

bgoglin commented Apr 14, 2025

From the /sys point of view, there's a clear bug in the topology:

  • each package has 32 hyperthreaded cores
  • however, each NUMA node has 64 single-threaded cores

You just need to fix the CPU numbers in the NUMA config; replace this

<cell id='0' cpus='0-31,64-95' memory='350' unit='GiB'/>
<cell id='1' cpus='32-63,96-127' memory='350' unit='GiB'/>

with

<cell id='0' cpus='0-63' memory='350' unit='GiB'/>
<cell id='1' cpus='64-127' memory='350' unit='GiB'/>
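A minimal verification sketch, assuming the libvirt domain is named "vm1" (a placeholder):

virsh edit vm1        # apply the corrected <cell> lines, then restart the guest
# inside the guest afterwards:
lscpu | grep -i numa  # expect node0: 0-63 and node1: 64-127
lstopo --no-io        # the Group0 insertion warning should be gone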

@ChaoHsin-fang
Author

ChaoHsin-fang commented Apr 14, 2025

I replaced the NUMA configuration as suggested, but it's still not taking effect.

  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='2' cores='32' threads='2'/>
    <numa>
      <cell id='0' cpus='0-63' memory='350' unit='GiB'/> 
      <cell id='1' cpus='64-127' memory='350' unit='GiB'/> 
    </numa>
  </cpu>
  <numatune>
    <memory nodeset='0,1' mode='strict'/>
    <memnode cellid='0' nodeset='0'/>
    <memnode cellid='1' nodeset='1'/>
  </numatune>


(kvm) numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 352719 MB
node 0 free: 350173 MB
node 1 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 1 size: 352729 MB
node 1 free: 351282 MB
node distances:
node 0 1
0: 10 20
1: 20 10
(kvm) lscpu | grep -i numa
NUMA node(s): 2
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127

@bgoglin
Contributor

bgoglin commented Apr 14, 2025

Which issue are we supposed to see in your outputs? The hwloc warning seems to be gone, which is what I expected.

@ChaoHsin-fang
Author

My task is to ensure that the nvidia-smi topo output in KVM matches the bare-metal topology exactly.
The issue is that in the nvidia-smi topo -m output, the NUMA Affinity column inside the virtual machine looks different from a normal bare-metal host: the NUMA node binding in KVM didn't take effect.

numactl -H output in kvm

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 0 size: 352719 MB
node 0 free: 350173 MB
node 1 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
node 1 size: 352729 MB
node 1 free: 351282 MB
node distances:
node 0 1
0: 10 20
1: 20 10

lscpu | grep -i numa output in kvm

NUMA node(s): 2
NUMA node0 CPU(s): 0-63
NUMA node1 CPU(s): 64-127
Normal topology (from the bare-metal host): [screenshot showing devices split across NUMA 0 and NUMA 1]

But the nvidia-smi topo -m output in KVM is:

[screenshot: nvidia-smi topo -m in the VM, with the NUMA Affinity column differing from bare metal]

Could the topology problems shown in nvidia-smi topo -m be connected to hwloc's NUMA identification issues?

Expected topology in KVM (0-63, 64-127 is also acceptable):
[screenshot: expected nvidia-smi topo -m output]

@bgoglin
Contributor

bgoglin commented Apr 15, 2025

Ah, I see. I am not a KVM expert, but I don't understand why you're talking about NUMA identification instead of PCI NUMA affinity here. Each PCI root complex reports a local NUMA node through ACPI tables, but your VM doesn't seem to specify any, hence the GPU is attached to the entire machine (all CPUs, and no specific NUMA node). I think you're just missing that in your KVM config.

hwloc just reads files such as these:
/sys/bus/pci/devices//local_cpulist
/sys/bus/pci/devices//numa_node
As long as those files differ between bare metal and the VM, there's no way the CPU affinity and NUMA affinity columns will match in nvidia-smi.
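For example, a quick comparison on bare metal vs. in the VM (the BDF 0000:17:00.0 is a made-up placeholder; take the real address from lspci):

BDF=0000:17:00.0
cat /sys/bus/pci/devices/$BDF/numa_node      # -1 means no NUMA affinity reported
cat /sys/bus/pci/devices/$BDF/local_cpulist  # CPUs the CPU Affinity column is derived from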

A quick search turns up similar issues such as kubevirt/kubevirt#13926, but I don't know whether this relates to PCI passthrough.
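One possible direction (a hypothetical sketch, not verified with this passthrough setup): libvirt can report a guest NUMA node for a PCI hierarchy through a pcie-expander-bus controller, with the hostdev then addressed behind a root port on that bus; the busNr value below is illustrative:

<controller type='pci' model='pcie-expander-bus'>
  <target busNr='180'>
    <node>1</node>  <!-- guest NUMA node reported for devices behind this bus -->
  </target>
</controller>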

@ChaoHsin-fang
Author

Appreciate the help! I'll dig deeper into what's causing this problem. Thanks!
