Skip to content

[Darwin] MTRPerControllerStorageTests testSubscriptionPool randomly fails #38797

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bzbarsky-apple opened this issue May 7, 2025 · 1 comment · Fixed by #38803
Closed

[Darwin] MTRPerControllerStorageTests testSubscriptionPool randomly fails #38797

bzbarsky-apple opened this issue May 7, 2025 · 1 comment · Fixed by #38803

Comments

@bzbarsky-apple
Copy link
Contributor

Reproduction steps

See https://github.com/project-chip/connectedhomeip/actions/runs/14884913098/job/41803825878?pr=38790

2025-05-07T14:42:14.0843250Z /Users/runner/work/connectedhomeip/connectedhomeip/src/darwin/Framework/CHIPTests/MTRPerControllerStorageTests.m:2874: error: -[MTRPerControllerStorageTests testSubscriptionPoolManyDevices] : ((subscriptionRunningCount) less than or equal to (subscriptionPoolSize)) failed: ("3") is greater than ("2")

Platform

darwin

Platform Version(s)

No response

Type

Unit tested

(Optional) If manually tested please explain why this is only manually tested

No response

Anything else?

No response

@bzbarsky-apple
Copy link
Contributor Author

NOTE: this failure is in a build that includes #38779. Investigating that part.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue May 7, 2025
Two fixes here:

1. We were not consistently dispatching the unitTestSubscriptionPoolWorkComplete
   notification, because when _clearSubscriptionPoolWork was called from dealloc
   we had already cleared our delegate list before the call.  This was the
   proximate cause of the random failure: when we got deallocated before the
   work item for our subscription pool work got dequeued, we would end up with
   subscriptionRunningCount too high (since it would increment on enqueue but we
   would miss the decrement.

2. The fake device is not consistent about whether it actually dispatches the
   unitTestSubscriptionPoolDequeue notification: usually it does not (because it
   was dealloced before that happened), but sometimes it does.  When it does, we
   were ending up with more dequeue notifications than expected in the test. The
   fix for this is to not track the dequeue/complete bits in the test for the
   fake device at all.

Fixes project-chip#38797
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue May 8, 2025
Two fixes here:

1. We were not consistently dispatching the unitTestSubscriptionPoolWorkComplete
   notification, because when _clearSubscriptionPoolWork was called from dealloc
   we had already cleared our delegate list before the call.  This was the
   proximate cause of the random failure: when we got deallocated before the
   work item for our subscription pool work got dequeued, we would end up with
   subscriptionRunningCount too high (since it would increment on enqueue but we
   would miss the decrement.

2. The fake device is not consistent about whether it actually dispatches the
   unitTestSubscriptionPoolDequeue notification: usually it does not (because it
   was dealloced before that happened), but sometimes it does.  When it does, we
   were ending up with more dequeue notifications than expected in the test. The
   fix for this is to not track the dequeue/complete bits in the test for the
   fake device at all.

Fixes project-chip#38797
andy31415 pushed a commit that referenced this issue May 8, 2025
…ol (#38803)

Two fixes here:

1. We were not consistently dispatching the unitTestSubscriptionPoolWorkComplete
   notification, because when _clearSubscriptionPoolWork was called from dealloc
   we had already cleared our delegate list before the call.  This was the
   proximate cause of the random failure: when we got deallocated before the
   work item for our subscription pool work got dequeued, we would end up with
   subscriptionRunningCount too high (since it would increment on enqueue but we
   would miss the decrement.

2. The fake device is not consistent about whether it actually dispatches the
   unitTestSubscriptionPoolDequeue notification: usually it does not (because it
   was dealloced before that happened), but sometimes it does.  When it does, we
   were ending up with more dequeue notifications than expected in the test. The
   fix for this is to not track the dequeue/complete bits in the test for the
   fake device at all.

Fixes #38797
@github-project-automation github-project-automation bot moved this from Todo to Done in [Platform] Darwin May 8, 2025
gmarcosb pushed a commit to gmarcosb/connectedhomeip that referenced this issue May 9, 2025
…ol (project-chip#38803)

Two fixes here:

1. We were not consistently dispatching the unitTestSubscriptionPoolWorkComplete
   notification, because when _clearSubscriptionPoolWork was called from dealloc
   we had already cleared our delegate list before the call.  This was the
   proximate cause of the random failure: when we got deallocated before the
   work item for our subscription pool work got dequeued, we would end up with
   subscriptionRunningCount too high (since it would increment on enqueue but we
   would miss the decrement.

2. The fake device is not consistent about whether it actually dispatches the
   unitTestSubscriptionPoolDequeue notification: usually it does not (because it
   was dealloced before that happened), but sometimes it does.  When it does, we
   were ending up with more dequeue notifications than expected in the test. The
   fix for this is to not track the dequeue/complete bits in the test for the
   fake device at all.

Fixes project-chip#38797
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
1 participant