Bulkhead(Executor) does not always release permits #393
Would adding a release step to `failsafe/core/src/main/java/dev/failsafe/internal/BulkheadImpl.java` (lines 76 to 81 in 98bb496) help? For example, simply completing the future in that failure path:

```java
@Override
public boolean tryAcquirePermit(Duration maxWaitTime) throws InterruptedException {
  CompletableFuture<Void> future = acquirePermitAsync();
  if (future == NULL_FUTURE)
    return true; // a permit was available immediately

  try {
    future.get(maxWaitTime.toNanos(), TimeUnit.NANOSECONDS);
    return true;
  } catch (CancellationException | ExecutionException | TimeoutException e) {
    releaseSpecificPermit(future); // Release the specific permit
    return false;
  }
}

// Completes the waiter future, then returns a permit to the pool (capped at maxPermits).
private synchronized void releaseSpecificPermit(CompletableFuture<Void> future) {
  future.complete(null);
  if (permits < maxPermits) {
    permits += 1;
  }
}
```

I am not familiar with this code, though, so I am not sure whether this could break something or introduce other bugs.
This seems to me to be a major, even critical, issue. If there are more … Or am I missing something? That could well be the case, since the library seems to be widely used.
I believe the root cause is how the `FutureLinkedList` is used in `BulkheadImpl`.

When trying to acquire a permit from the `BulkheadImpl` with a `maxWaitTime`, if no permit is immediately available, a new `Future` is added to the `FutureLinkedList`. When that future is completed, the `Future` is removed from the `FutureLinkedList`. If a permit is immediately available, a completed future is returned.

The returned future is completed in `BulkheadImpl#releasePermit`, so unless `BulkheadImpl#releasePermit` is called, the future is never removed from the `FutureLinkedList`. This causes a memory leak (the node is never removed from the list), and over time the bulkhead can no longer issue new permits.

This is a big problem when using a bulkhead in a `FailsafeExecutor`. Permits are acquired with a `maxWaitTime` in `BulkheadExecutor#preExecute`. If you submit more work to the executor than there are permits, the permits requested by executions rejected with `BulkheadFullException` are never released. When that happens, the bulkhead enters an irrecoverable, broken state. The permits are never released because in this case neither `onSuccess` nor `onFailure` is called in `BulkheadExecutor`.

I wrote a small example showcasing how this bug can be triggered.
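Separately from the reporter's example, the bookkeeping described above can be modeled in a few lines. The sketch below is a hypothetical, heavily simplified `LeakyBulkhead` class (not Failsafe's `BulkheadImpl`; an `ArrayDeque` of waiter futures stands in for `FutureLinkedList`, and the 10 ms wait is arbitrary). It models only the permit/waiter bookkeeping, not `BulkheadExecutor`. A caller that times out leaves its waiter queued, so the next release hands the permit to a waiter nobody is listening to.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical, simplified model of a waiter-queue bulkhead; not Failsafe's BulkheadImpl.
class LeakyBulkhead {
  private final int maxPermits;
  private int permits;
  private final Deque<CompletableFuture<Void>> waiters = new ArrayDeque<>();

  LeakyBulkhead(int maxPermits) {
    this.maxPermits = maxPermits;
    this.permits = maxPermits;
  }

  // Take a free permit immediately, or enqueue a waiter that releasePermit() completes later.
  synchronized CompletableFuture<Void> acquirePermitAsync() {
    if (permits > 0) {
      permits -= 1;
      return CompletableFuture.completedFuture(null);
    }
    CompletableFuture<Void> waiter = new CompletableFuture<>();
    waiters.addLast(waiter);
    return waiter;
  }

  // On timeout the waiter is abandoned but left in the queue (the behavior reported above).
  boolean tryAcquirePermit(Duration maxWaitTime) throws InterruptedException {
    CompletableFuture<Void> future = acquirePermitAsync();
    try {
      future.get(maxWaitTime.toNanos(), TimeUnit.NANOSECONDS);
      return true;
    } catch (ExecutionException | TimeoutException e) {
      return false;
    }
  }

  // Hand the freed permit to the oldest queued waiter; only if none is queued does the pool grow.
  synchronized void releasePermit() {
    CompletableFuture<Void> next = waiters.pollFirst();
    if (next != null) {
      next.complete(null); // the permit goes to the waiter, even an abandoned one
    } else if (permits < maxPermits) {
      permits += 1;
    }
  }

  synchronized int availablePermits() {
    return permits;
  }

  public static void main(String[] args) throws InterruptedException {
    LeakyBulkhead bulkhead = new LeakyBulkhead(1);
    Duration wait = Duration.ofMillis(10);
    System.out.println(bulkhead.tryAcquirePermit(wait)); // true: the only permit is taken
    System.out.println(bulkhead.tryAcquirePermit(wait)); // false: times out, waiter stays queued
    bulkhead.releasePermit();                             // permit handed to the abandoned waiter
    System.out.println(bulkhead.tryAcquirePermit(wait)); // false, although nothing is executing
    System.out.println(bulkhead.availablePermits());     // 0
  }
}
```

With one permit, a single timed-out attempt is enough to leave the model permanently at zero available permits, which matches the "irrecoverable, broken state" described above.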
Note that this bug can also be reproduced with code from the documentation if a `maxWaitTime` is passed to `tryAcquirePermit`. Internally, `tryAcquirePermit` may create a future, but that future is never released, because `tryAcquirePermit` returned `false` and the caller therefore never calls `releasePermit`.