
Breakdown undesired allocations by shard routing role #132235


Open: wants to merge 10 commits into base: main


nicktindall (Contributor) commented Jul 31, 2025

Add a count of undesired shard allocations broken down by shard routing role.

This lets us check whether there are undesired shard allocations in a specific tier in serverless. The code has no explicit concept of a tier (search/indexing), but shard routing role is a good-enough proxy for it.

Relates: ES-12221
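A minimal sketch of the per-role count described above, for illustration only. It assumes a simplified model: `Role` is a local stand-in for `ShardRouting.Role`, and `Allocation` and `countUndesired` are hypothetical names, not the PR's actual code. The result has the same `Map<Role, Long>` shape as the record component added in the diff. The shutting-down-node exclusion that the real counter applies is deliberately left out.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class UndesiredByRole {
    // Stand-in for ShardRouting.Role so the sketch compiles without Elasticsearch.
    public enum Role { DEFAULT, INDEX_ONLY, SEARCH_ONLY }

    // Hypothetical view of one assigned shard copy: its routing role and whether
    // it currently sits where the desired balance wants it.
    public record Allocation(Role role, boolean desired) {}

    // Counts undesired allocations per role. Every role is pre-seeded with 0 so
    // callers never see a missing key. (No shutting-down-node filter here.)
    public static Map<Role, Long> countUndesired(List<Allocation> allocations) {
        Map<Role, Long> counts = new EnumMap<>(Role.class);
        for (Role role : Role.values()) {
            counts.put(role, 0L);
        }
        for (Allocation allocation : allocations) {
            if (allocation.desired() == false) {
                counts.merge(allocation.role(), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Allocation> allocations = List.of(
            new Allocation(Role.INDEX_ONLY, true),
            new Allocation(Role.SEARCH_ONLY, false),
            new Allocation(Role.SEARCH_ONLY, false)
        );
        // prints {DEFAULT=0, INDEX_ONLY=0, SEARCH_ONLY=2}
        System.out.println(countUndesired(allocations));
    }
}
```

Pre-seeding every role with zero is one possible design choice: lookups stay null-free, at the cost of reporting roles that may never occur in a given deployment.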

elasticsearchmachine added the v9.2.0 and serverless-linked labels Jul 31, 2025
nicktindall requested a review from pxsalehi July 31, 2025 07:31
nicktindall added the >non-issue and :Distributed Coordination/Allocation labels Jul 31, 2025
nicktindall removed the serverless-linked label Aug 1, 2025
nicktindall changed the title from "Breakdown undesired allocations by tier" to "Breakdown undesired allocations by shard routing role" Aug 1, 2025
nicktindall marked this pull request as ready for review August 1, 2025 05:29
elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

elasticsearchmachine added the Team:Distributed Coordination label Aug 1, 2025
Review thread on these lines of the diff:

long unassignedShards,
long totalAllocations,
long undesiredAllocationsExcludingShuttingDownNodes,
Map<ShardRouting.Role, Long> undesiredAllocationsExcludingShuttingDownNodesByRole
A member commented:

If we're going to collect it all in a map of roles, shouldn't it also replace totalAllocations and undesiredAllocationsExcludingShuttingDownNodes, since those would be the values the "default" role would have in stateful? But then you have to decide what to do for the values returned from the desired balance API in serverless: there you'd have to sum up index_only and search_only, I guess, and sprinkle in a bunch of asserts, since index/search and default should be mutually exclusive in this map. Another option is to just add the specific breakdowns we need here: index/searchTierAllocations and index/searchTierUndesiredAllocations. I think you'd need both the total and the undesired count, since we're going to need the ratio in the autoscaler.
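The first option in this comment, keeping only the per-role map and deriving the aggregate from it, could look roughly like this sketch. It assumes the same three roles as ShardRouting.Role (modelled by a local enum) and interprets "mutually exclusive" as "not both non-zero"; `RoleTotals` and `total` are hypothetical names. An explicit check stands in for the `assert` statements mentioned, so it also fires when the JVM runs without `-ea`.

```java
import java.util.EnumMap;
import java.util.Map;

public class RoleTotals {
    // Stand-in for ShardRouting.Role.
    public enum Role { DEFAULT, INDEX_ONLY, SEARCH_ONLY }

    // Derives the single aggregate count from a per-role map. Before summing, it
    // checks the invariant from the review: DEFAULT (stateful) and INDEX_ONLY /
    // SEARCH_ONLY (serverless) should not both be non-zero in one cluster.
    public static long total(Map<Role, Long> byRole) {
        long defaultCount = byRole.getOrDefault(Role.DEFAULT, 0L);
        long splitCount = byRole.getOrDefault(Role.INDEX_ONLY, 0L)
            + byRole.getOrDefault(Role.SEARCH_ONLY, 0L);
        if (defaultCount > 0 && splitCount > 0) {
            throw new IllegalStateException("DEFAULT mixed with INDEX_ONLY/SEARCH_ONLY: " + byRole);
        }
        return defaultCount + splitCount;
    }

    public static void main(String[] args) {
        Map<Role, Long> serverless = new EnumMap<>(Role.class);
        serverless.put(Role.INDEX_ONLY, 3L);
        serverless.put(Role.SEARCH_ONLY, 4L);
        System.out.println(total(serverless)); // prints 7
    }
}
```

Under the reviewer's assumption, the DEFAULT entry alone would reproduce today's aggregate in stateful, and the INDEX_ONLY plus SEARCH_ONLY sum would do so in serverless.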

A member commented:

Also, how do we get these two stats (indexTierAllocations and indexTierUndesiredAllocations) out of the balancer and into the autoscaler? Is there a getter for them?

nicktindall (Contributor, Author) commented Aug 11, 2025:

> Another option is to just add the specific breakdowns we need here? index/searchTierAllocations and index/searchTierUndesiredAllocations?

As far as I can tell, we have no concept of "tier" in the ES codebase. There is role, but we stop short of defining it as equivalent to a tier. I don't quite understand why, but I don't want to make assumptions about what role means with regard to tiers (see co.elastic.elasticsearch.stateless.allocation.StatelessAllocationDecider#canAllocateShardToNode; it's not done even there).

If we add two specific fields, we bake in the assumptions that:

  • Role.INDEX_ONLY means the indexing tier, Role.SEARCH_ONLY means the search tier, and Role.DEFAULT means the only tier in a stateful deployment
  • the existing set of roles is fixed

So we'd probably need to add assertions that trip if these assumptions stop holding. I think a map of role to counts (possibly Role -> record RoleStats(int total, int undesired)) at least leaves the interpretation of the roles to the context in which they're used.

Maybe these are valid assumptions? Perhaps there's some history behind how this was modelled?
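The Role -> RoleStats(int total, int undesired) shape floated in this comment might be sketched as follows, assuming a hypothetical Allocation view of each shard copy; `RoleStatsSketch`, `Allocation` and `statsByRole` are illustrative names, and `Role` is a local stand-in for ShardRouting.Role. Keeping both counts per role lets a consumer compute the undesired ratio the autoscaler needs, while the map itself stays agnostic about which role corresponds to which tier.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class RoleStatsSketch {
    // Stand-in for ShardRouting.Role.
    public enum Role { DEFAULT, INDEX_ONLY, SEARCH_ONLY }

    // Total and undesired counts for one role, kept together.
    public record RoleStats(int total, int undesired) {
        public RoleStats {
            if (undesired < 0 || undesired > total) {
                throw new IllegalArgumentException("undesired must be in [0, total]");
            }
        }

        // Fraction of this role's allocations that are undesired; 0 when there are none.
        public double undesiredRatio() {
            return total == 0 ? 0.0 : (double) undesired / total;
        }
    }

    // Hypothetical allocation view: role plus whether the copy is where the desired balance wants it.
    public record Allocation(Role role, boolean desired) {}

    // Builds per-role stats; only roles that actually appear get an entry.
    public static Map<Role, RoleStats> statsByRole(List<Allocation> allocations) {
        Map<Role, int[]> counts = new EnumMap<>(Role.class); // {total, undesired}
        for (Allocation a : allocations) {
            int[] c = counts.computeIfAbsent(a.role(), r -> new int[2]);
            c[0]++;
            if (a.desired() == false) {
                c[1]++;
            }
        }
        Map<Role, RoleStats> result = new EnumMap<>(Role.class);
        counts.forEach((role, c) -> result.put(role, new RoleStats(c[0], c[1])));
        return result;
    }

    public static void main(String[] args) {
        Map<Role, RoleStats> stats = statsByRole(List.of(
            new Allocation(Role.SEARCH_ONLY, true),
            new Allocation(Role.SEARCH_ONLY, true),
            new Allocation(Role.SEARCH_ONLY, true),
            new Allocation(Role.SEARCH_ONLY, false),
            new Allocation(Role.INDEX_ONLY, true)));
        // The caller decides what a role means, e.g. reading SEARCH_ONLY as "search tier".
        System.out.println(stats.get(Role.SEARCH_ONLY).undesiredRatio()); // prints 0.25
    }
}
```

Because only observed roles get an entry, a stateful cluster would, under the assumption in the first bullet, yield a map containing just DEFAULT.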

nicktindall (Contributor, Author) commented:

> Also, how do we get these two stats (indexTierAllocations and indexTierUndesiredAllocations) out of the balancer and into the autoscaler? Is there a getter for them?

I added a getter and put up a serverless PR to illustrate that flow.
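The getter and the serverless PR are not shown in this thread, so the following is only a guess at the general shape of such a flow: the balancing side publishes an immutable per-role snapshot and a consumer such as the autoscaler reads it. `UndesiredAllocationsTracker`, `publish` and `getUndesiredAllocationsByRole` are hypothetical names, not the getter added in the PR, and `Role` is a local stand-in for ShardRouting.Role.

```java
import java.util.EnumMap;
import java.util.Map;

public class UndesiredAllocationsTracker {
    // Stand-in for ShardRouting.Role.
    public enum Role { DEFAULT, INDEX_ONLY, SEARCH_ONLY }

    // Latest published snapshot. It is replaced wholesale rather than mutated, and
    // the field is volatile so a reader on another thread sees the newest map.
    private volatile Map<Role, Long> undesiredByRole = Map.of();

    // Hypothetical hook: the balancing code calls this after recomputing counts.
    public void publish(Map<Role, Long> counts) {
        undesiredByRole = Map.copyOf(counts); // defensive, immutable copy
    }

    // Hypothetical getter that a consumer such as an autoscaler could poll.
    public Map<Role, Long> getUndesiredAllocationsByRole() {
        return undesiredByRole;
    }

    public static void main(String[] args) {
        UndesiredAllocationsTracker tracker = new UndesiredAllocationsTracker();
        Map<Role, Long> counts = new EnumMap<>(Role.class);
        counts.put(Role.SEARCH_ONLY, 2L);
        tracker.publish(counts);
        counts.put(Role.SEARCH_ONLY, 99L); // later edits to the source map are not visible
        System.out.println(tracker.getUndesiredAllocationsByRole().get(Role.SEARCH_ONLY)); // prints 2
    }
}
```

Publishing a copied, immutable map keeps the consumer from observing a half-updated set of counts; whether the real code needs that depends on how and where the stats are computed.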

elasticsearchmachine added the serverless-linked label Aug 11, 2025
nicktindall requested a review from pxsalehi August 11, 2025 07:46
Labels: :Distributed Coordination/Allocation, >non-issue, serverless-linked, Team:Distributed Coordination, v9.2.0

3 participants