Add cluster size validation marker #116

Merged on May 13, 2025 (3 commits)

Contributor

@frederiko frederiko commented Mar 27, 2025

This PR adds:

  • minimum cluster size of 0 members
  • generate CRD
  • add defaults to config/samples manifest
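
For illustration, a minimal sketch of what a minimum-size validation marker can look like on a spec field. The `+kubebuilder:validation:Minimum=0` marker is a standard controller-gen marker; the `EtcdClusterSpec` struct below is a simplified, hypothetical stand-in rather than the operator's actual type, and `validateSize` is an assumed helper that only mimics, in plain Go, the check the API server derives from the generated CRD schema. The error text is modeled on the message Kubernetes prints for invalid StatefulSet replicas and is not the exact text returned for a CRD.

```go
package main

import "fmt"

// EtcdClusterSpec is a simplified, hypothetical stand-in for the operator's
// spec type; only the Size field matters for this sketch.
type EtcdClusterSpec struct {
	// Size is the desired number of etcd members.
	// controller-gen reads the marker below when generating the CRD and
	// emits "minimum: 0" for this field in the OpenAPI schema.
	// +kubebuilder:validation:Minimum=0
	Size int `json:"size"`
}

// validateSize mirrors the schema check in plain Go. It is illustrative
// only: the real check runs in the API server, not in the operator.
func validateSize(spec EtcdClusterSpec) error {
	if spec.Size < 0 {
		return fmt.Errorf("spec.size: Invalid value: %d: must be greater than or equal to 0", spec.Size)
	}
	return nil
}

func main() {
	for _, size := range []int{-1, 0, 3} {
		err := validateSize(EtcdClusterSpec{Size: size})
		fmt.Printf("size=%d err=%v\n", size, err)
	}
}
```

With a marker like this in place, the CRD has to be regenerated for the API server to pick up the new constraint, which matches the "generate CRD" item above.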

@k8s-ci-robot

Hi @frederiko. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@frederiko
Contributor Author

/ok-to-test

@k8s-ci-robot

@frederiko: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/ok-to-test

@ahrtr
Member

ahrtr commented Mar 27, 2025

Thanks for the contribution, but I think it makes more sense to align with K8s's behaviour: set the minimum value as 0 instead of 1.

$ kc apply -f sts.yaml 
The StatefulSet "web" is invalid: spec.replicas: Invalid value: -2: must be greater than or equal to 0

@jberkus
Contributor

jberkus commented Mar 31, 2025

So, what happens right now if you apply a cluster of size 0 is that we have a cluster object with no pods, which seems desirable; I can think of a number of circumstances where you might (temporarily) want a cluster with no members.

So, +1 on @ahrtr's proposal

@frederiko
Contributor Author

@jberkus @ahrtr Not a problem at all. I just happened to have a different view from the user's perspective. If I create a cluster with 0 members and the resource is persisted, but I have no information about what happened, I wouldn't like to have to tap into controller log for extra information. I also couldn't see a use case where a user may need a cluster with no members. I will make the adjustments.

@ivanvc
Member

ivanvc commented Apr 2, 2025

I wouldn't like to have to tap into controller log for extra information.

In my opinion, we should start adding events to the EtcdCluster so the users can understand the current state of the CR.
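
As a rough sketch of the idea, the snippet below records an event when the desired size is 0 so a user could see why no members exist without reading controller logs. Everything here is hypothetical: `Event`, `Recorder`, `memoryRecorder`, and `reconcileSize` are invented for illustration and are not the operator's or client-go's API, and the reason and message strings are assumptions.

```go
package main

import "fmt"

// Event is a hypothetical, trimmed-down event record.
type Event struct {
	Type    string
	Reason  string
	Message string
}

// Recorder is a hypothetical interface for publishing events about a
// resource. It is NOT client-go's EventRecorder.
type Recorder interface {
	Record(e Event)
}

// memoryRecorder keeps events in memory so the sketch runs standalone.
type memoryRecorder struct {
	events []Event
}

func (m *memoryRecorder) Record(e Event) {
	m.events = append(m.events, e)
}

// reconcileSize reports what the controller intends to do for a given
// desired size, instead of leaving that information only in the logs.
func reconcileSize(r Recorder, size int) {
	if size == 0 {
		r.Record(Event{
			Type:    "Normal",
			Reason:  "ZeroMembers",
			Message: "cluster size is 0; no members will be created",
		})
		return
	}
	r.Record(Event{
		Type:    "Normal",
		Reason:  "Scaling",
		Message: fmt.Sprintf("reconciling towards %d member(s)", size),
	})
}

func main() {
	rec := &memoryRecorder{}
	reconcileSize(rec, 0)
	reconcileSize(rec, 3)
	for _, e := range rec.events {
		fmt.Printf("%s %s: %s\n", e.Type, e.Reason, e.Message)
	}
}
```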

@frederiko
Contributor Author

If that is not desirable, other mechanisms would entail to write an event or update the status. However, more logic would be required.

I wouldn't be opposed to that. :-)

@ahrtr
Member

ahrtr commented Apr 10, 2025

"If that is not desirable, other mechanisms would entail to write an event or update the status. However, more logic would be required.".

We will need to update/make use of the status regardless. What's the behaviour of an existing controller, e.g. StatefulSet, when the replica count is 0?

@frederiko
Contributor Author

"If that is not desirable, other mechanisms would entail to write an event or update the status. However, more logic would be required.".

We will need to update/make use of the status regardless. What's the behaviour of an existing controller, e.g. StatefulSet, when the replica count is 0?

tbh, I don't recall a controller that presents this behavior. I would have to check.

@ahrtr
Member

ahrtr commented Apr 15, 2025

tbh, I don't recall a controller that presents this behavior. I would have to check.

I mean the Kubernetes built-in controller, i.e. StatefulSet. As mentioned above in #116 (comment), the minimum replica count is 0. I'm not sure whether the status will have some message or warning to indicate 0 replicas. We will need to do some experiments.

@frederiko frederiko changed the title from "Fail to apply manifest when EtcdCluster is zero" to "Add cluster size validation marker" Apr 15, 2025
@frederiko frederiko requested a review from ahrtr April 15, 2025 16:20
@ahrtr
Member

ahrtr commented Apr 15, 2025

@frederiko this PR is based on a very old commit, please rebase it. thx

@frederiko frederiko force-pushed the etcd-minimum-size branch 2 times, most recently from deed49b to e6f3650 Compare April 22, 2025 00:09
@frederiko
Contributor Author

tbh, I don't recall a controller that presents this behavior. I would have to check.

I mean the Kubernetes built-in controller, i.e. StatefulSet. As mentioned above in #116 (comment), the minimum replica count is 0. I'm not sure whether the status will have some message or warning to indicate 0 replicas. We will need to do some experiments.

Yes, sts supports that. I meant more in the context of non-k8s controllers.

Member

@ahrtr ahrtr left a comment

Overall looks good with two minor comments, thx.

Also, it would be better if we had an e2e test to verify the minimum size.

@ahrtr
Member

ahrtr commented May 5, 2025

/ok-to-test

@ahrtr
Member

ahrtr commented May 5, 2025

@frederiko have you manually verified this PR and confirmed that a negative value (for the size) will raise an error? If yes, I think it's OK to merge it and add an e2e test in a follow-up PR; but it's also OK if you want to add the test in this PR. Please let me know your thoughts.

@frederiko frederiko force-pushed the etcd-minimum-size branch 2 times, most recently from 3e83dd2 to 987e9e3 Compare May 9, 2025 23:48
@frederiko frederiko force-pushed the etcd-minimum-size branch from 987e9e3 to 3b36917 Compare May 12, 2025 02:00
@frederiko
Contributor Author

@frederiko have you manually verified this PR and confirmed that a negative value (for the size) will raise an error? If yes, I think it's OK to merge it and add an e2e test in a follow-up PR; but it's also OK if you want to add the test in this PR. Please let me know your thoughts.

I have just added a basic test to ensure that, when an EtcdCluster is created with zero members, no StatefulSet is created. Let me know if you would like something else.

func TestZeroMemberCluster(t *testing.T) {
	feature := features.New("zero-member-cluster")
	etcdClusterName := "etcd-cluster-zero"
	size := 0
Member

A size of 0 should work, but it currently doesn't because of a (minor) bug, #125.

Since we explicitly require a minimum size of 0, I think we should have two cases (i.e. table-driven subtests):

  • create an EtcdCluster with a negative size (e.g. -1); it should fail.
  • create an EtcdCluster with size 0; it should succeed. But it doesn't work for now due to the bug mentioned above, so we comment it out for now (and link to the issue  Operator should support scaling down to 0 ... maybe? #125).
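
The two cases above could be laid out as a table, roughly as sketched below. This is a standalone illustration, not the PR's e2e test: `createCluster` is a hypothetical stand-in for creating an EtcdCluster against the API server, and `sizeCase`, `sizeCases`, and `runCases` are invented names. In a real Go test, each row would typically run under `t.Run(tc.name, ...)`.

```go
package main

import (
	"errors"
	"fmt"
)

var errNegativeSize = errors.New("size must be greater than or equal to 0")

// createCluster is a hypothetical stand-in for creating an EtcdCluster;
// it only applies the minimum-size rule discussed in this thread.
func createCluster(size int) error {
	if size < 0 {
		return errNegativeSize
	}
	return nil
}

// sizeCase is one row of the table.
type sizeCase struct {
	name    string
	size    int
	wantErr bool
}

func sizeCases() []sizeCase {
	return []sizeCase{
		{name: "negative size is rejected", size: -1, wantErr: true},
		{name: "zero size is accepted", size: 0, wantErr: false},
	}
}

// runCases returns the names of the cases whose outcome did not match
// the expectation in the table.
func runCases(cases []sizeCase) []string {
	var failed []string
	for _, tc := range cases {
		err := createCluster(tc.size)
		if (err != nil) != tc.wantErr {
			failed = append(failed, tc.name)
		}
	}
	return failed
}

func main() {
	if failed := runCases(sizeCases()); len(failed) > 0 {
		fmt.Println("failed cases:", failed)
		return
	}
	fmt.Println("all cases passed")
}
```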

Contributor Author

@frederiko frederiko May 12, 2025

@ahrtr yes, I am aware of the bug. What I tried to address in this PR was the 0 size behavior (aware of the bug). I am going to work on that bug later, on a separate PR, which should make this test go away (remember my first commit was actually removing that block of code, but I was asked to revert?).

As for the -1 size, do we really need to test it, since the api server will not persist the etcdcluster resource altogether? I can certainly add if this is a concern.

Member

but I was asked to revert?

Sorry, when I raised the comment #116 (comment), we didn't know about the bug (#125) yet.

I am going to work on that bug later, on a separate PR

Please feel free to deliver a PR for it. Thanks.

As for the -1 size, do we really need to test it, since the api server will not persist the etcdcluster resource altogether? I can certainly add if this is a concern.

The high-level idea is that when we create or apply an EtcdCluster with size -1, we should get an error, and that error might not be returned by etcd-operator's Reconcile method. I think it'd be better to have a test for each feature or restriction.
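
A toy model of that flow, to show why the error would not come from Reconcile: schema validation happens when the object is created, so an invalid object is never persisted and never reaches the reconciler. The `fakeAPIServer` type and its methods are hypothetical and exist only for this sketch; they do not reflect how the real API server or the operator is implemented.

```go
package main

import "fmt"

// cluster is a minimal stand-in for an EtcdCluster resource.
type cluster struct {
	name string
	size int
}

// fakeAPIServer is a hypothetical in-memory model of "validate, then
// persist, then notify the controller".
type fakeAPIServer struct {
	stored     map[string]cluster
	reconciled []string
}

func newFakeAPIServer() *fakeAPIServer {
	return &fakeAPIServer{stored: map[string]cluster{}}
}

// create rejects invalid objects before anything is stored, so the
// reconcile step below is never reached for them.
func (s *fakeAPIServer) create(c cluster) error {
	if c.size < 0 {
		return fmt.Errorf("EtcdCluster %q is invalid: size %d is below the minimum of 0", c.name, c.size)
	}
	s.stored[c.name] = c
	s.reconcile(c)
	return nil
}

// reconcile only records that the controller saw the object.
func (s *fakeAPIServer) reconcile(c cluster) {
	s.reconciled = append(s.reconciled, c.name)
}

func main() {
	s := newFakeAPIServer()
	fmt.Println("create size=-1:", s.create(cluster{name: "bad", size: -1}))
	fmt.Println("create size=0:", s.create(cluster{name: "empty", size: 0}))
	fmt.Println("stored:", len(s.stored), "reconciled:", s.reconciled)
}
```

This is why a test for the negative case has to assert on the error returned by the create or apply call itself rather than on anything the controller does afterwards.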

Contributor Author

So, I have added a test for negative cluster size as well. I have kept the 0 size cluster one, and I will address the bug later, if that's fine with you. Let me know your thoughts.

Member

OK, thx

Signed-off-by: Frederiko Costa <[email protected]>
@frederiko frederiko force-pushed the etcd-minimum-size branch from 2e25fd8 to 4f267dd Compare May 12, 2025 16:34
Member

@ahrtr ahrtr left a comment

LGTM with a minor comment

Thanks

@frederiko frederiko force-pushed the etcd-minimum-size branch from 4f267dd to 4b2835e Compare May 12, 2025 19:08
@ahrtr ahrtr requested review from ivanvc, hakman and jmhbnz May 12, 2025 19:17
Member

@jmhbnz jmhbnz left a comment

LGTM - Thanks for your work on this @frederiko

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, frederiko, jmhbnz

@ahrtr ahrtr merged commit 5d7e01c into etcd-io:main May 13, 2025
5 checks passed