
Compatibility issue with using a cloud provider and kubelet-csr-approver (any helm app) #12059


Closed
allidoiswin10 opened this issue Mar 21, 2025 · 2 comments · Fixed by #12141
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@allidoiswin10

Hi (creating this issue on the back of another issue, #11842).

Kubespray Version - v2.24.0
Python - 3.9.0
Ansible core - 2.15.13

Key vars:

## all.yml
cloud_provider: "external"
external_cloud_provider: "vsphere"

## hardening.yml
kubelet_rotate_server_certificates: true

I've run into a similar issue with kubespray whilst using vSphere + hardening.

I've done some digging and it seems to be a chicken-and-egg scenario: the NoSchedule taints are there because of kubelet's startup args. When you set cloud_provider: external, you're telling kubespray (and ultimately Kubernetes) to configure the node's kubelet.service to taint the node. Whilst that is generally fine for the control planes, doing it on the worker nodes before the cloud controller manager is deployed leaves them in a tainted state, so any workload you deploy, i.e. kubelet-csr-approver in this case, will fail to schedule!
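For illustration, this is roughly how that taint shows up in the node object (standard Kubernetes Node spec fields; just a sketch of what kubelet adds, check your own nodes with kubectl get node <name> -o yaml):

# Excerpt from a node object on a cluster started with --cloud-provider=external,
# before the cloud controller manager has initialized the node
spec:
  taints:
    - key: node.cloudprovider.kubernetes.io/uninitialized
      value: "true"
      effect: NoSchedule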

If you check the errors from the kubelet-csr-approver deployment, you'll most likely see something along the lines of:

Warning FailedScheduling pod/kubelet-csr-approver-6696cc5c47-flpmd 0/3 nodes are available: 3 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
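For reference, a pod would need a toleration like the one below to schedule despite that taint. This is a sketch using standard pod-spec fields; as far as I know the kubelet-csr-approver chart doesn't set it by default, and relying on it would bypass the cloud provider's initialization guarantee, so it's more of an illustration than a recommendation:

# Hypothetical toleration in a pod/deployment spec that would allow scheduling
# onto nodes still carrying the uninitialized taint
tolerations:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    operator: Exists
    effect: NoSchedule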

Note that the node roles occur way before the kubelet-csr-approver role. You can check the order here.

This has been further discussed here.

You can also see the cloud manager docs about this flag here.

Components that specify --cloud-provider=external will add a taint node.cloudprovider.kubernetes.io/uninitialized with an effect NoSchedule during initialization. This marks the node as needing a second initialization from an external controller before it can be scheduled work. Note that in the event that cloud controller manager is not available, new nodes in the cluster will be left unschedulable. The taint is important since the scheduler may require cloud specific information about nodes such as their region or type (high cpu, gpu, high memory, spot instance, etc).

As a workaround you can run cluster.yml twice: on the first run, skip the kubelet-csr-approver tag, and on the second run, run only that tag.

Something like this might work:

ansible-playbook -i /inventory.ini --become --become-user=root cluster.yml -e "@inventory/hardening.yaml" --ask-become-pass --skip-tags=kubelet-csr-approver
ansible-playbook -i /inventory.ini --become --become-user=root cluster.yml -e "@inventory/hardening.yaml" --ask-become-pass --tags=kubelet-csr-approver
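Between the two runs it's worth confirming the vSphere cloud controller manager has come up and cleared the taint from every node before re-running with --tags=kubelet-csr-approver. One way to check (the column names here are just my own labels):

kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
# No node should still list node.cloudprovider.kubernetes.io/uninitialized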

Wondering if anyone else has any other workarounds or a permanent solution.

@VannTen
Contributor

VannTen commented Mar 21, 2025

/kind bug
/triage accepted

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Mar 21, 2025
@tico88612
Member

Unless there is a particular reason not to, kubelet-csr-approver should be installed together with the other applications (e.g. MetalLB, etc.).

@VannTen wdyt?
