Skip to content

Fix: kubeadm secondary use file discovery validation #12132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

tico88612
Copy link
Member

@tico88612 tico88612 commented Apr 18, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

The validation step is moved to the end to avoid the loss of files that may lead to validation failure.

Which issue(s) this PR fixes:

Fixes #11835

Special notes for your reviewer:

I hope someone who has encountered this problem can help me review it.

Does this PR introduce a user-facing change?:

Fix: kubeadm secondary nodes use file discovery validation failed

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 18, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tico88612
Once this PR has been reviewed and has the lgtm label, please assign floryut for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 18, 2025
@tico88612
Copy link
Member Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Apr 18, 2025
@tico88612
Copy link
Member Author

/retest

@VannTen
Copy link
Contributor

VannTen commented May 12, 2025

@ledroide @schaenzer @huziahmetovsv any of you could test this fix to check if it fix #11835 ?

@huziahmetovsv
Copy link

huziahmetovsv commented May 14, 2025

@ledroide @schaenzer @huziahmetovsv any of you could test this fix to check if it fix #11835 ?

not working for me

TASK [kubernetes_sigs.kubespray.kubernetes/kubeadm : Create kubeadm client config] **************************************************
fatal: [k8s02-w03]: FAILED! => {"changed": false, "checksum": "bbdf0e8dd285f8aa7a683fdcba73b2f26ab27f3e", "exit_status": 3, "msg": "failed to validate", "stderr": "discovery.file.kubeConfigPath: Invalid value: \"/etc/kubernetes/cluster-info-discovery-kubeconfig.yaml\": not a valid HTTPS URL or a file on disk\nTo see the stack trace of this error execute with --v=5 or higher\n", "stderr_lines": ["discovery.file.kubeConfigPath: Invalid value: \"/etc/kubernetes/cluster-info-discovery-kubeconfig.yaml\": not a valid HTTPS URL or a file on disk", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "", "stdout_lines": []}

@tico88612
Copy link
Member Author

tico88612 commented May 14, 2025

@huziahmetovsv, can you provide your variables setting (the simpler the better) and OS? Or can you check /etc/kubernetes/cluster-info-discovery-kubeconfig.yaml exists in k8s02-w03?

@huziahmetovsv
Copy link

huziahmetovsv commented May 14, 2025

@huziahmetovsv, can you provide your variables setting and OS? Or can you check /etc/kubernetes/cluster-info-discovery-kubeconfig.yaml exists in k8s02-w03?

variables
kube_version: 1.31.4
override_system_hostname: false

kube_api_anonymous_auth: false
cluster_name: k8s02.example.com
ndots: 3
auto_renew_certificates: true
remove_anonymous_access: true
kube_proxy_metrics_bind_address: 0.0.0.0:10249
upgrade_node_confirm: true
upgrade_node_post_upgrade_confirm: true
os version
k8s02-w03:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble

no file /etc/kubernetes/cluster-info-discovery-kubeconfig.yaml exists in k8s02-w03

folder content
s02-w03:~$ ls -la /etc/kubernetes/
total 76
drwxr-xr-x   4 kube root  4096 May 14 10:25 .
drwxr-xr-x 116 root root 12288 May 14 10:22 ..
-rw-r-----   1 root root   490 May 14 10:19 kubeadm-client.conf
-rw-r-----   1 root root   367 Nov 22 17:29 kubeadm-client.conf.1418308.2024-11-23@15:54:17~
-rw-r-----   1 root root   367 Jan  1 15:11 kubeadm-client.conf.3013700.2025-03-12@09:55:56~
-rw-r-----   1 root root   488 Mar 12 09:55 kubeadm-client.conf.3059076.2025-05-14@10:19:43~
-rw-r-----   1 root root   490 May 14 10:19 kubeadm-client.conf.3083772.2025-05-14@10:25:49~
-rw-r-----   1 root root   367 Nov  7  2024 kubeadm-client.conf.358360.2024-11-22@17:29:19~
-rw-r-----   1 root root   476 Nov  7  2024 kubeadm-client.conf.45193.2024-11-07@18:54:27~
-rw-r-----   1 root root   367 Nov 23 15:54 kubeadm-client.conf.63568.2025-01-01@15:11:17~
-rw-------   1 root root  1084 May 14 10:18 kubelet-config.yaml
-rw-------   1 root root  1958 Nov  7  2024 kubelet.conf
-rw-------   1 root root  1963 Nov  7  2024 kubelet.conf.14923.2024-11-07@18:20:14~
-rw-------   1 root root   466 Mar 12 09:55 kubelet.env
-rw-------   1 root root   540 Nov  7  2024 kubelet.env.3011672.2025-03-12@09:55:34~
drwxr-xr-x   2 kube root  4096 May 14 10:18 manifests
lrwxrwxrwx   1 root root    19 Nov  7  2024 pki -> /etc/kubernetes/ssl
drwxr-xr-x   2 root root  4096 Nov  7  2024 ssl
huz@k8s02-w03:~$

@tico88612
Copy link
Member Author

@huziahmetovsv Is this a cluster upgrade? I'm not quite sure how you're doing it?

@huziahmetovsv
Copy link

@huziahmetovsv Is this a cluster upgrade? I'm not quite sure how you're doing it?

as i mentioned earlier:

ansible-playbook -b -i inventory/infra/inventory.yaml playbooks/kubespray/kubespray_upgrade-cluster.yaml --limit "k8s02-w03"
cat playbooks/kubespray/kubespray_upgrade-cluster.yaml 
---

- name: Graceful upgrade kubernetes cluster
  ansible.builtin.import_playbook: kubernetes_sigs.kubespray.upgrade_cluster.yml
cat requirements.yaml
collections:
  - name: https://github.com/tico88612/kubespray.git
    type: git
    version: fix/remove-anonymous-kubeadm-validation

@tico88612
Copy link
Member Author

@huziahmetovsv
I don't know why your machine doesn't have a /etc/kubernetes/cluster-info-discovery-kubeconfig.yaml file, I think it might have passed without validation, or it might have something to do with the fact that you didn't set the remove_anonymous_access: true condition on the original installation (playbooks/cluster.yml).

This may have caused a conflict with the condition not kubelet_conf.stat.exists which now generates cluster-info-discovery-kubeconfig.yaml.

- name: Copy discovery kubeconfig
copy:
dest: "{{ kube_config_dir }}/cluster-info-discovery-kubeconfig.yaml"
content: "{{ kubeconfig_file_discovery.stdout }}"
owner: "root"
mode: "0644"
when:
- ('kube_control_plane' not in group_names)
- not kubelet_conf.stat.exists
- kubeadm_use_file_discovery

I can add checking if /etc/kubernetes/cluster-info-discovery-kubeconfig.yaml exists, I don't think it will affect the original condition.

Might be like not kubelet_conf.stat.exists or not cluster_info_discovery_kubeconfig.stat.exists.

@tico88612
Copy link
Member Author

tico88612 commented May 15, 2025

Let me add that the original issue was an installation issue, not an upgrade issue.

I think the situation you mentioned should be a different issue, but I will try to fix that.

tico88612 added 2 commits May 15, 2025 18:03
The validation step is moved to the end to avoid the loss of files that
may lead to verification failure.

Signed-off-by: ChengHao Yang <[email protected]>
When installing or upgrading in the past, there was no validation
config. Check if the file exists first to prevent subsequent validation
errors.

Signed-off-by: ChengHao Yang <[email protected]>
@tico88612 tico88612 force-pushed the fix/remove-anonymous-kubeadm-validation branch from a81e9c1 to 3b7c0dc Compare May 15, 2025 10:03
@tico88612
Copy link
Member Author

@huziahmetovsv could you help me test this fix again?

@huziahmetovsv
Copy link

@huziahmetovsv could you help me test this fix again?

it worked. And now file cluster-info-discovery-kubeconfig.yaml available on node

file content
cat /etc/kubernetes/cluster-info-discovery-kubeconfig.yaml 
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ...
    server: https://HOST_IP:6443
  name: ""
contexts: null
current-context: ""
kind: Config
preferences: {}

may i ask? how to check if anonymous access is disabled now?

@tico88612
Copy link
Member Author

# Setting this value to false will fail
# For details, read this comment https://github.com/kubernetes-sigs/kubespray/pull/11016#issuecomment-2004985001
kube_api_anonymous_auth: true

I went to the anonymous related notes and it seems that the Kubespray installation does not disable anonymous access. You have to change the static pod of kube-apiserver to false to disable anonymous access.

#11016 (comment)

@VannTen
Copy link
Contributor

VannTen commented May 20, 2025

Should this be merged @tico88612 ? Before 2.28 ?

@tico88612
Copy link
Member Author

This PR will fix the installation of remove_anonymous_access: true, I tested it locally with vagrant, and it works, but if no one helps review it in the next few days, we can put it in the next version and then backport it.

Because of the inclusion of upgrade fixes, it has been changed to the merge method.

/label tide/merge-method-merge

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges. label May 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. tide/merge-method-merge Denotes a PR that should use a standard merge by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing file cluster-info-discovery-kubeconfig.yaml for role kubeadm
4 participants