HDDS-13212. [DiskBalancer] Fix Inconsistent Health Check in DiskBalancer Status for Specific Hosts #8610

Gargi-jais11 · 2025-06-12T06:00:03Z

What changes were proposed in this pull request?

When no hosts are specified, DiskBalancerManager#getDiskBalancerStatus() returns only datanodes that are IN_SERVICE and HEALTHY. When specific hosts are requested, however, it returns all of them regardless of their state. This inconsistency can be confusing and can make monitoring inaccurate.
This JIRA proposes applying the same IN_SERVICE and HEALTHY filters to the specified hosts so that both paths behave consistently.
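A minimal sketch of the intended behavior follows. It uses hypothetical stand-in types (`StatusFilterSketch`, `OpState`, `Health`, `Node`) rather than the real SCM classes. One predicate decides eligibility, and both the all-nodes path and the specific-hosts path go through it, so a host that is not IN_SERVICE and HEALTHY is omitted either way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StatusFilterSketch {
  // Hypothetical stand-ins for a datanode's operational state and health.
  enum OpState { IN_SERVICE, ENTERING_MAINTENANCE, IN_MAINTENANCE, DECOMMISSIONING, DECOMMISSIONED }
  enum Health { HEALTHY, STALE, DEAD }

  record Node(String host, OpState opState, Health health) {}

  // Single eligibility rule shared by both query paths.
  static boolean isEligible(Node n) {
    return n.opState() == OpState.IN_SERVICE && n.health() == Health.HEALTHY;
  }

  // A null or empty host list means "query all nodes".
  static List<String> statusHosts(Map<String, Node> cluster, List<String> hosts) {
    List<Node> candidates;
    if (hosts == null || hosts.isEmpty()) {
      candidates = new ArrayList<>(cluster.values());
    } else {
      candidates = new ArrayList<>();
      for (String h : hosts) {
        Node n = cluster.get(h);
        // Unknown hosts are simply skipped here (an assumption of this sketch).
        if (n != null) {
          candidates.add(n);
        }
      }
    }
    // The same filter is applied no matter how the candidates were chosen.
    return candidates.stream()
        .filter(StatusFilterSketch::isEligible)
        .map(Node::host)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, Node> cluster = Map.of(
        "dn1", new Node("dn1", OpState.IN_SERVICE, Health.HEALTHY),
        "dn2", new Node("dn2", OpState.ENTERING_MAINTENANCE, Health.HEALTHY),
        "dn3", new Node("dn3", OpState.DECOMMISSIONING, Health.HEALTHY));
    System.out.println(statusHosts(cluster, List.of("dn2", "dn3", "dn1"))); // [dn1]
    System.out.println(statusHosts(cluster, List.of()));                    // [dn1]
  }
}
```

Routing both paths through one predicate is what keeps them from drifting apart again. The sorting step only makes the sketch's output deterministic, because `Map.of` has no defined iteration order.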

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13212

How was this patch tested?

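Tested manually on a Docker cluster in which ozone-ha-datanode-2 is entering maintenance and ozone-ha-datanode-3 is decommissioning (see the logs below). Only ozone-ha-datanode-1 is reported: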

bash-5.1$ ozone admin datanode diskbalancer status -d ozone-ha-datanode-2 -d ozone-ha-datanode-3 -d  ozone-ha-datanode-1
Status result:
Datanode                            Status          Threshold(%)    BandwidthInMB   Threads      SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB) EstTimeLeft(min)
ozone-ha-datanode-1.ozone-ha_default STOPPED         10.0000         10              5            0            0            0               0               0              

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.

Related SCM logs:

2025-06-12 10:55:11 2025-06-12 05:25:11,553 [IPC Server handler 14 on default port 9860] WARN node.DiskBalancerManager: Datanode ozone-ha-datanode-2.ozone-ha_default is not in optimal state for disk balancing. NodeStatus: ENTERING_MAINTENANCE(expiry: 1749816487s)-HEALTHY
2025-06-12 10:55:11 2025-06-12 05:25:11,553 [IPC Server handler 14 on default port 9860] WARN node.DiskBalancerManager: Datanode ozone-ha-datanode-3.ozone-ha_default is not in optimal state for disk balancing. NodeStatus: DECOMMISSIONING(no expiry)-HEALTHY
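Each skipped host produces one warning of the form `Datanode <host> is not in optimal state for disk balancing. NodeStatus: <status>`. In the output above, the status reads `<operational state>(<expiry>)-<health>`. The sketch below reproduces that text using a hypothetical `NodeStatusSketch` record. It mirrors the observed log output and is not the real NodeStatus#toString implementation.

```java
public class SkipLogSketch {
  // Hypothetical stand-in that mirrors the NodeStatus text seen in the SCM log.
  record NodeStatusSketch(String opState, long opStateExpirySeconds, String health) {
    @Override
    public String toString() {
      // In this sketch, 0 means "no expiry" (an assumption, not Ozone's encoding).
      String expiry = opStateExpirySeconds == 0
          ? "no expiry"
          : "expiry: " + opStateExpirySeconds + "s";
      return opState + "(" + expiry + ")-" + health;
    }
  }

  // Builds the warning text logged for a host that fails the eligibility check.
  static String skipMessage(String host, NodeStatusSketch status) {
    return "Datanode " + host
        + " is not in optimal state for disk balancing. NodeStatus: " + status;
  }

  public static void main(String[] args) {
    System.out.println(skipMessage("ozone-ha-datanode-2.ozone-ha_default",
        new NodeStatusSketch("ENTERING_MAINTENANCE", 1749816487L, "HEALTHY")));
    System.out.println(skipMessage("ozone-ha-datanode-3.ozone-ha_default",
        new NodeStatusSketch("DECOMMISSIONING", 0L, "HEALTHY")));
  }
}
```

Including the full NodeStatus in the warning makes it clear why a requested host is missing from the status table.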

github-actions · 2025-06-12T06:26:55Z

No such command: gemini. Available commands:

/close : Close pending pull request temporarily
/help : Show all the available comment commands
/label : add new label to the issue: /label <label>
/pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
/ready : Dismiss all the blocking reviews by github-actions bot
/retest : provide help on how to trigger new CI build

Gargi-jais11 · 2025-06-12T06:30:12Z

/gemini review

github-actions · 2025-06-12T06:30:25Z

No such command: gemini. Available commands:

/close : Close pending pull request temporarily
/help : Show all the available comment commands
/label : add new label to the issue: /label <label>
/pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
/ready : Dismiss all the blocking reviews by github-actions bot
/retest : provide help on how to trigger new CI build

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java

peterxcli

Thanks @Gargi-jais11 for updating the patch, and sorry for so many rounds of review.

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java

Gargi-jais11 · 2025-06-12T15:02:50Z

> Thanks @Gargi-jais11 for updating the patch, and sorry for so many rounds of review.

No problem, it's completely alright.

Gargi-jais11 · 2025-06-13T08:11:27Z

@peterxcli You can now review the patch with the updated integration test.

peterxcli

@Gargi-jais11 please fix the findbugs failure

...-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDiskBalancer.java

peterxcli

Thanks @Gargi-jais11 for updating the patch, the rest LGTM!

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java

improved status to be consistent for specific host which is not healthy

af9bb09

Gargi-jais11 marked this pull request as ready for review June 12, 2025 06:23

Gargi-jais11 marked this pull request as draft June 12, 2025 06:25

Gargi-jais11 marked this pull request as ready for review June 12, 2025 07:17

peterxcli requested changes Jun 12, 2025

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java (outdated, resolved)

refactor code

3766109

Gargi-jais11 requested a review from peterxcli June 12, 2025 10:51

peterxcli reviewed Jun 12, 2025

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java (outdated, resolved)

Gargi-jais11 requested a review from peterxcli June 12, 2025 11:25

peterxcli reviewed Jun 12, 2025

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java (outdated, resolved)

Gargi-jais11 requested a review from peterxcli June 13, 2025 08:12

peterxcli requested changes Jun 13, 2025

...-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDiskBalancer.java (outdated, resolved)

...-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/scm/node/TestDiskBalancer.java (resolved)

peterxcli self-requested a review June 13, 2025 08:31

peterxcli approved these changes Jun 13, 2025

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DiskBalancerManager.java (resolved)

added integration test and nodestatus to log

620b7e0

Gargi-jais11 force-pushed the HDDS-13212 branch from 8f72b25 to 620b7e0 June 13, 2025 10:12
