HDDS-13212. [DiskBalancer] Fix Inconsistent Health Check in DiskBalancer Status for Specific Hosts #8610

Open
wants to merge 3 commits into
base: HDDS-5713
from

Gargi-jais11
Contributor

@Gargi-jais11 Gargi-jais11 commented Jun 12, 2025

What changes were proposed in this pull request?

The DiskBalancerManager#getDiskBalancerStatus() method applies a health filter only when it queries all nodes. When specific hosts are requested, it returns status for every one of them regardless of health or operational state. This inconsistency can lead to confusion or inaccurate monitoring.
This PR applies the same IN_SERVICE and HEALTHY filters to explicitly specified hosts so that both code paths behave consistently.
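
For readers who want the idea in code, here is a minimal, self-contained sketch of that filter. It is not the actual patch: `StatusFilterSketch`, `OpState`, `Health`, `NodeState`, `isEligible`, and `filterHosts` are hypothetical names invented for illustration and do not correspond to Ozone's real classes or method signatures. It only shows the rule described above: a requested host is reported only if it is both IN_SERVICE and HEALTHY, and skipped hosts are logged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Ozone's classes.
class StatusFilterSketch {
  enum OpState { IN_SERVICE, ENTERING_MAINTENANCE, DECOMMISSIONING }
  enum Health { HEALTHY, STALE, DEAD }

  static final class NodeState {
    final OpState opState;
    final Health health;

    NodeState(OpState opState, Health health) {
      this.opState = opState;
      this.health = health;
    }

    @Override
    public String toString() {
      return opState + "-" + health;
    }
  }

  // A node qualifies only if it is known, IN_SERVICE, and HEALTHY.
  static boolean isEligible(NodeState state) {
    return state != null
        && state.opState == OpState.IN_SERVICE
        && state.health == Health.HEALTHY;
  }

  // Apply the same eligibility rule to an explicit host list,
  // preserving the order in which hosts were requested.
  static List<String> filterHosts(List<String> requested,
                                  Map<String, NodeState> cluster) {
    List<String> eligible = new ArrayList<>();
    for (String host : requested) {
      NodeState state = cluster.get(host);
      if (isEligible(state)) {
        eligible.add(host);
      } else {
        // Assumed behavior: warn and skip rather than fail the request.
        System.err.println("Skipping " + host + ", state: " + state);
      }
    }
    return eligible;
  }

  public static void main(String[] args) {
    Map<String, NodeState> cluster = new LinkedHashMap<>();
    cluster.put("ozone-ha-datanode-1",
        new NodeState(OpState.IN_SERVICE, Health.HEALTHY));
    cluster.put("ozone-ha-datanode-2",
        new NodeState(OpState.ENTERING_MAINTENANCE, Health.HEALTHY));
    cluster.put("ozone-ha-datanode-3",
        new NodeState(OpState.DECOMMISSIONING, Health.HEALTHY));

    List<String> requested = Arrays.asList(
        "ozone-ha-datanode-2", "ozone-ha-datanode-3", "ozone-ha-datanode-1");
    System.out.println(filterHosts(requested, cluster));
  }
}
```

With node states modeled on the manual test in this PR, only ozone-ha-datanode-1 survives the filter. Warning and skipping, rather than rejecting the whole request, is an assumption of this sketch.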

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13212

How was this patch tested?

Tested manually on a Docker cluster.

bash-5.1$ ozone admin datanode diskbalancer status -d ozone-ha-datanode-2 -d ozone-ha-datanode-3 -d ozone-ha-datanode-1
Status result:
Datanode                              Status    Threshold(%)  BandwidthInMB  Threads  SuccessMove  FailureMove  BytesMoved(MB)  EstBytesToMove(MB)  EstTimeLeft(min)
ozone-ha-datanode-1.ozone-ha_default  STOPPED   10.0000       10             5        0            0            0               0                   0

Note: Estimated time left is calculated based on the estimated bytes to move and the configured disk bandwidth.

Related logs:

2025-06-12 10:55:11 2025-06-12 05:25:11,553 [IPC Server handler 14 on default port 9860] WARN node.DiskBalancerManager: Datanode ozone-ha-datanode-2.ozone-ha_default is not in optimal state for disk balancing. NodeStatus: ENTERING_MAINTENANCE(expiry: 1749816487s)-HEALTHY
2025-06-12 10:55:11 2025-06-12 05:25:11,553 [IPC Server handler 14 on default port 9860] WARN node.DiskBalancerManager: Datanode ozone-ha-datanode-3.ozone-ha_default is not in optimal state for disk balancing. NodeStatus: DECOMMISSIONING(no expiry)-HEALTHY

@Gargi-jais11 Gargi-jais11 marked this pull request as ready for review June 12, 2025 06:23
@Gargi-jais11 Gargi-jais11 marked this pull request as draft June 12, 2025 06:25
No such command. gemini Available commands:

  • /close : Close pending pull request temporary
  • /help : Show all the available comment commands
  • /label : add new label to the issue: /label <label>
  • /pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
  • /ready : Dismiss all the blocking reviews by github-actions bot
  • /retest : provide help on how to trigger new CI build

@Gargi-jais11
Contributor Author

/gemini review

@Gargi-jais11 Gargi-jais11 marked this pull request as ready for review June 12, 2025 07:17
@Gargi-jais11 Gargi-jais11 requested a review from peterxcli June 12, 2025 10:51
@Gargi-jais11 Gargi-jais11 requested a review from peterxcli June 12, 2025 11:25
Member

@peterxcli peterxcli left a comment

Thanks @Gargi-jais11 for updating the patch and sorry for too many rounds of review.

@Gargi-jais11
Contributor Author

> Thanks @Gargi-jais11 for updating the patch and sorry for too many rounds of review.

No issue. It's completely alright.

@Gargi-jais11
Contributor Author

@peterxcli You can now review the patch with the updated integration test.

@Gargi-jais11 Gargi-jais11 requested a review from peterxcli June 13, 2025 08:12
Member

@peterxcli peterxcli left a comment

@peterxcli peterxcli self-requested a review June 13, 2025 08:31
Member

@peterxcli peterxcli left a comment

Thanks @Gargi-jais11 for updating the patch. Otherwise LGTM!

2 participants