
Reduce_local segmentation fault when running IMB-MPI1 built for GPU #12620

@eliekozah

Description


Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.3, built with CUDA support.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Open MPI was installed from a source distribution tarball and configured with CUDA support for GPU buffers.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

NA

Please describe the system on which you are running

Operating System/Version:
Operating System: Red Hat Enterprise Linux release 8.6 (Ootpa)

Computer Hardware:
Architecture: x86_64
CPU: AMD EPYC 7252 8-Core Processor, 16 CPUs online, each core running at 3048.274 MHz.
Memory: 127863 MB total, 115327 MB free.

Ethernet (eth0):
Speed: 1000Mb/s


Details of the problem

I am encountering a segmentation fault in the Reduce_local benchmark of IMB-MPI1-GPU when using Open MPI v5.0.3 with CUDA. The fault occurs regardless of whether GDRCopy is enabled, and in both single- and multi-GPU configurations.

Steps to Reproduce

/home/eelkozah/openmpi-5.0.3-withcuda-0/bin/mpiexec -np 1 --mca btl_ofi_disable_sep 1 --mca mtl_ofi_enable_sep 0 --mca osc ^ucx --mca pml ^ucx --mca mtl ofi --mca btl vader,self -x LD_LIBRARY_PATH=/home/eelkozah/code/libfabric-internal/buildout/lib:/usr/local/cuda:${LD_LIBRARY_PATH} -x MPIR_CVAR_CH4_OFI_ENABLE_AV_TABLE=0 -x MPIR_CVAR_CH4_OFI_ENABLE_MR_SCALABLE=1 -x MPIR_CVAR_CH4_OFI_ENABLE_ATOMICS=1 -x MPIR_CVAR_CH4_OFI_ENABLE_RMA=1 -x MPIR_CVAR_ENABLE_GPU=1 -x MPIR_CVAR_CH4_OFI_ENABLE_HMEM=1 -x FI_OPX_EXPECTED_RECEIVE_ENABLE=0 -x FI_PROVIDER=opx -x FI_OPX_UUID=${RANDOM} gdb --args ./IMB-MPI1-GPU Reduce_local
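
For reference, the failing call pattern can be reduced to a few lines outside of IMB. The following is a minimal sketch (not the IMB source), assuming a CUDA-aware Open MPI build and ordinary cudaMalloc device buffers; the file name and variables are illustrative only:

/* repro_reduce_local.c -- minimal sketch, not the IMB source.
 * Assumes a CUDA-aware Open MPI build; names here are illustrative.
 * Build (example): mpicc repro_reduce_local.c -I/usr/local/cuda/include \
 *                  -L/usr/local/cuda/lib64 -lcudart -o repro_reduce_local
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int count = 1;                 /* one float = 4 bytes, matching size=4 in the backtrace */
    float *d_in = NULL, *d_inout = NULL;

    cudaMalloc((void **)&d_in,    count * sizeof(float));
    cudaMalloc((void **)&d_inout, count * sizeof(float));
    cudaMemset(d_in,    0, count * sizeof(float));
    cudaMemset(d_inout, 0, count * sizeof(float));

    /* MPI_Reduce_local applies MPI_SUM directly to the two buffers; with
     * device pointers this is where the AVX2 op faults in the failing build. */
    MPI_Reduce_local(d_in, d_inout, count, MPI_FLOAT, MPI_SUM);

    printf("MPI_Reduce_local completed\n");

    cudaFree(d_in);
    cudaFree(d_inout);
    MPI_Finalize();
    return 0;
}

Running this single-rank program under the same mpiexec command line should exercise the same MPI_Reduce_local path as the size=4 iteration shown in the backtrace below.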

GDB Output
Under GDB, the program crashes with the following backtrace, which points into the AVX2-optimized floating-point addition operator:

(gdb) #0  0x00007ffff7471079 in ompi_op_avx_2buff_add_float_avx2 () from /home/eelkozah/openmpi-5.0.3-withcuda/lib/libmpi.so.40
#1  0x00007ffff735170c in mca_coll_base_reduce_local () from /home/eelkozah/openmpi-5.0.3-withcuda/lib/libmpi.so.40
#2  0x00007ffff7313da5 in PMPI_Reduce_local () from /home/eelkozah/openmpi-5.0.3-withcuda/lib/libmpi.so.40
#3  0x0000000000433ad2 in IMB_reduce_local (c_info=0x69bb90, size=4, ITERATIONS=0x69bca8, RUN_MODE=0x69bd34, time=0x7fffffffce70)
    at ../src_c/IMB_reduce_local.c:115
#4  0x000000000043bd5c in Bmark_descr::IMB_init_buffers_iter (this=0x69a4f0, c_info=0x69bb90, ITERATIONS=0x69bca8, Bmark=0x69bd18, BMODE=0x69bd34, iter=1,
    size=4) at helpers/helper_IMB_functions.h:608
#5  0x00000000004461fd in OriginalBenchmark<BenchmarkSuite<(benchmark_suite_t)0>, &IMB_reduce_local>::run (this=0x69bb60, item=...)
    at helpers/original_benchmark.h:191
#6  0x0000000000405c47 in main (argc=2, argv=0x7fffffffdaf8) at imb.cpp:329
