prov/verbs: Missing FI_RECV flag in fi_cq_err_entry for RECV operations. #10847
Comments
@dsciebu Is your program using FI_EP_RDM or FI_EP_MSG?
There was a bug with rxm in 2.0 regarding improper initialization of the recv entry flags so if you're using 2.0 and verbs with RDM, then my guess is that it's the same issue. Can you retest with upstream?
The correct link to the commit: https://github.com/ofiwg/libfabric/commits/9c667c918e09312ffef31b63a121e2ae9d730ded
Oh weird, not sure how that commit id linked to the issue instead... I'll fix it. Thanks!
Thank you for your answers - @aingerson it indeed seems to be the reason for my finding!
BTW, there is another interesting observation: besides fixing the missing FI_RECV flag, the patch you pointed to should also fix the missing FI_TAGGED flag. However, from what I observe, FI_TAGGED does show up in my CQ entries, just not consistently. At this stage I cannot describe precisely when it happens, but I noticed that the CQ entry flags are sometimes missing and sometimes present. Any hints?
We should backport that fix to the v2.0.x branch, but there is no immediate plan for a v2.0.1 release. v2.1.0 is the next official release that contains all the bug fixes.
Hmm, I have no idea what could be happening there but I can take a look. Could you share a reproducer if you have one?
I cannot share the code that reveals the problem, and the problem is sporadic. Maybe I can come up with something simpler on Monday...
**Describe the bug**
I am experiencing an issue with fi_cq_readfrom in my libfabric-based program. When I call fi_cq_readfrom on the completion queue bound to RECV operations, the returned completion queue entry never has the FI_RECV bit set (entry.comp.flags & FI_RECV is always zero). This issue started occurring after upgrading from libfabric v1.22.0 to v2.0.0.
**To Reproduce**
Steps to reproduce the behavior:
1. Bind a completion queue to RECV operations.
2. Run fi_cq_readfrom on the completion queue.
3. Check the completion queue entry flags masked with FI_RECV (i.e., entry.comp.flags & FI_RECV).
4. Observe that the masked value is zero (the FI_RECV bit is not set) in libfabric v2.0.0.
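
The flag check in the steps above can be sketched as follows. This is only an illustration, not the reporter's code: the helpers `is_recv_completion` and `recv_kind` are hypothetical, and the `#define` stand-ins exist only so the snippet compiles without libfabric installed. Their bit positions are assumed to match `<rdma/fabric.h>`; real code should include that header instead.

```c
#include <stdint.h>

/* Stand-ins so this sketch builds without libfabric.
 * Assumption: these bit positions match <rdma/fabric.h>.
 * Real code should include <rdma/fabric.h> and <rdma/fi_eq.h>. */
#ifndef FI_RECV
#define FI_MSG    (1ULL << 1)
#define FI_TAGGED (1ULL << 3)
#define FI_RECV   (1ULL << 10)
#endif

/* Hypothetical helper: returns 1 if the completion flags mark a receive. */
static int is_recv_completion(uint64_t flags)
{
    return (flags & FI_RECV) != 0;
}

/* Hypothetical helper: names the kind of receive, for logging while
 * chasing missing FI_RECV / FI_TAGGED bits. */
static const char *recv_kind(uint64_t flags)
{
    if (!(flags & FI_RECV))
        return "not-recv";
    if (flags & FI_TAGGED)
        return "tagged-recv";
    if (flags & FI_MSG)
        return "msg-recv";
    return "recv";
}
```

In a real program the argument would be `entry.comp.flags`, taken from an entry that fi_cq_readfrom reported as read. With the behavior described in this issue, `is_recv_completion` would return 0 on v2.0.0 for completions that are in fact receives.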
**Expected behavior**
The masking operation with FI_RECV should succeed, as it did in libfabric v1.22.0, allowing the correct identification of RECV completions.