bryan-cox
Contributor

@bryan-cox bryan-cox commented Apr 7, 2025

What type of PR is this?
/kind bug

What this PR does / why we need it:
Adds the ability to disable CAPZ components through a manager flag. Flags added for disabling ASO Secret Controller and disabling Azure JSON Machine Controller.

Which issue(s) this PR fixes:
Fixes #5472

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Adds the ability to disable CAPZ components through a manager flag. Flags added for disabling ASO Secret Controller and disabling Azure JSON Machine Controller.

@k8s-ci-robot k8s-ci-robot added the release-note, kind/bug, and cncf-cla: yes labels Apr 7, 2025
@k8s-ci-robot k8s-ci-robot added the size/S label Apr 7, 2025
@bryan-cox
Contributor Author

/assign @nawazkh

codecov bot commented Apr 7, 2025

Codecov Report

Attention: Patch coverage is 13.88889% with 31 lines in your changes missing coverage. Please review.

Project coverage is 52.82%. Comparing base (3a9eb5b) to head (f82a3ac).
Report is 146 commits behind head on main.

Files with missing lines    Patch %    Lines
main.go                     0.00%      31 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5552      +/-   ##
==========================================
- Coverage   53.27%   52.82%   -0.45%     
==========================================
  Files         272      279       +7     
  Lines       29522    29629     +107     
==========================================
- Hits        15727    15652      -75     
- Misses      12980    13160     +180     
- Partials      815      817       +2     

@nojnhuh
Contributor

nojnhuh commented Apr 8, 2025

> the AzureManagedControlPlane CR which is behind the ASOAPI feature gate.

This is incorrect. The ASOAPI feature gate only enables controllers for the AzureASOManagedControlPlane and the other AzureASO... resources. This ASO secret controller is necessary for all of the resource types that CAPZ manages with ASO, including resource groups and vnets which are created for every AzureCluster and AzureManagedControlPlane. I suspect that if you disable the ASOAPI feature gate in CI, then all of the e2e tests will blow up, not just the ones exercising the AzureASO... APIs.

I think I mentioned somewhere, maybe in #5099 or in a Slack thread related to that PR, that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags. Either something like that, or we explicitly disclaim all support for any installation that is not exactly equivalent to the CRDs and other manifests we publish for releases, since we generally assume the controller manager is running when all of the CRDs are installed. Changing the meaning of existing feature gates isn't a sustainable way to solve the general "I didn't install a CRD and now the controller manager is crashing" problem.
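
For illustration, the generic toggle proposed above might look roughly like the following Go sketch. It is not the code from this PR: the flag name `--disable-controllers`, the helper names, and the registry of setup functions are all hypothetical, and the real manager wires controllers through controller-runtime rather than plain functions. The only point is that a controller named in the flag is never set up, so it never tries to watch resources whose CRDs may be missing.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// setupFunc stands in for a controller's real setup call (hypothetical).
type setupFunc func() error

// parseDisabled turns a comma-separated flag value into a set of names.
func parseDisabled(raw string) map[string]bool {
	disabled := make(map[string]bool)
	for _, name := range strings.Split(raw, ",") {
		if name = strings.TrimSpace(name); name != "" {
			disabled[name] = true
		}
	}
	return disabled
}

// disabledFromArgs parses the hypothetical --disable-controllers flag.
func disabledFromArgs(args []string) (map[string]bool, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	raw := fs.String("disable-controllers", "", "comma-separated controller names to skip")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return parseDisabled(*raw), nil
}

// setupEnabled runs the setup function of every controller that is not
// disabled and returns the names it started, in the given order.
func setupEnabled(order []string, registry map[string]setupFunc, disabled map[string]bool) ([]string, error) {
	var started []string
	for _, name := range order {
		if disabled[name] {
			continue
		}
		setup, ok := registry[name]
		if !ok {
			return started, fmt.Errorf("no setup function registered for %s", name)
		}
		if err := setup(); err != nil {
			return started, fmt.Errorf("setting up %s: %w", name, err)
		}
		started = append(started, name)
	}
	return started, nil
}

func main() {
	disabled, err := disabledFromArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	noop := func() error { return nil }
	order := []string{"ASOSecretController", "AzureJSONMachineController"}
	registry := map[string]setupFunc{
		"ASOSecretController":        noop,
		"AzureJSONMachineController": noop,
	}
	started, err := setupEnabled(order, registry, disabled)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("started:", started)
}
```

Keeping the decision in one place (`setupEnabled` here) means a new controller only needs a registry entry to become toggleable, which is the property that makes the approach more sustainable than repurposing feature gates.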

@enxebre
Member

enxebre commented Apr 8, 2025

> that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags.

I agree. FWIW, I created #5294 some time ago to track that effort.

@nawazkh
Member

nawazkh commented Apr 8, 2025

> This is incorrect. The ASOAPI feature gate only enables controllers for the AzureASOManagedControlPlane and the other AzureASO... resources. This ASO secret controller is necessary for all of the resource types that CAPZ manages with ASO, including resource groups and vnets which are created for every AzureCluster and AzureManagedControlPlane. I suspect that if you disable the ASOAPI feature gate in CI, then all of the e2e tests will blow up, not just the ones exercising the AzureASO... APIs.

Thank you for adding more context on this, Jon.

> that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags.

> I agree. FWIW, I created #5294 some time ago to track that effort.

I agree as well. We could start by updating manager.yaml with a set of environment variables that enable the different controllers and webhooks.

So should we essentially close out this PR, @bryan-cox?

@k8s-ci-robot k8s-ci-robot added the size/L label and removed the size/S label Apr 8, 2025
@bryan-cox bryan-cox marked this pull request as draft April 8, 2025 18:07
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Apr 8, 2025
@bryan-cox bryan-cox changed the title from "Move ASO secret controller behind feature gate" to "Add support to disable CAPZ components through a manager flag" Apr 8, 2025
@bryan-cox
Contributor Author

/test all

@bryan-cox
Contributor Author

/test all

@bryan-cox
Contributor Author

@nojnhuh @nawazkh - when you have a moment, can I get another look at this PR? If we are good with moving in this direction, I can follow up with some docs on how this can be used.

@nawazkh
Member

nawazkh commented May 2, 2025

> @nojnhuh @nawazkh - when you have a moment, can I get another look at this PR? If we are good with moving in this direction, I can follow up with some docs on how this can be used.

I like the idea of being able to toggle controllers, so green flag from me on this approach.

However, we need to make sure the management cluster stays functional when a controller is turned off (maybe all of them in the future?), so maybe we should also add an e2e test to validate that. We could add that scenario as an optional test.

@bryan-cox, what do you say?

@nawazkh
Member

nawazkh commented May 30, 2025

Un-assigning myself for now since I won't be able to get to this immediately.
/unassign

@k8s-ci-robot k8s-ci-robot added the size/L label and removed the size/XL label Jul 11, 2025
@bryan-cox
Contributor Author

Hey @willie-yao 👋🏻 - I opened an issue, #5744, to keep track of adding an e2e test for the functionality introduced in this PR, as we discussed a few weeks ago in the CAPZ weekly sync.

Please let me know if there is anything further I can do to help get this PR merged.

@bryan-cox
Contributor Author

/test pull-cluster-api-provider-azure-e2e

@bryan-cox
Contributor Author

/retest

Contributor

@willie-yao willie-yao left a comment

Sorry for the delay on this @bryan-cox! Just one minor nit from me, otherwise lgtm!

/approve
/hold for squash


// NOTE: when adding a new DisableComponent, please also add it to the ValidDisableableComponents map.
const (
	// DisableASOController disables the ASOSecretController from being deployed.

Suggested change
-	// DisableASOController disables the ASOSecretController from being deployed.
+	// DisableASOSecretController disables the ASOSecretController from being deployed.
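
As a rough sketch of why the NOTE in that snippet matters: if flag values are validated against the ValidDisableableComponents map, a constant that is missing from the map would be rejected as unknown. The example below is an assumption-laden illustration, not the merged code. Only the map name and the DisableASOSecretController name come from this thread; the second constant, the string values, the map's value type, and the `validateDisabled` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const (
	// DisableASOSecretController disables the ASOSecretController from being deployed.
	DisableASOSecretController = "DisableASOSecretController"
	// DisableAzureJSONMachineController is an assumed name for the second toggle.
	DisableAzureJSONMachineController = "DisableAzureJSONMachineController"
)

// ValidDisableableComponents lists every value the flag accepts.
// The struct{} value type is an assumption made for this sketch.
var ValidDisableableComponents = map[string]struct{}{
	DisableASOSecretController:        {},
	DisableAzureJSONMachineController: {},
}

// validateDisabled (hypothetical helper) returns an error naming every
// requested component that is not in ValidDisableableComponents.
func validateDisabled(components []string) error {
	var unknown []string
	for _, c := range components {
		if _, ok := ValidDisableableComponents[c]; !ok {
			unknown = append(unknown, c)
		}
	}
	if len(unknown) == 0 {
		return nil
	}
	// Sort the valid names so the error message is deterministic.
	valid := make([]string, 0, len(ValidDisableableComponents))
	for name := range ValidDisableableComponents {
		valid = append(valid, name)
	}
	sort.Strings(valid)
	return fmt.Errorf("unknown components %q; valid values are %s",
		unknown, strings.Join(valid, ", "))
}

func main() {
	fmt.Println(validateDisabled([]string{DisableASOSecretController}))
	fmt.Println(validateDisabled([]string{"DisableEverything"}))
}
```

Failing fast on an unrecognized name at startup is one way to keep a typo in the flag from silently leaving a controller enabled.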

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold and approved labels Jul 14, 2025
@bryan-cox
Contributor Author

@willie-yao this should be ready for you again. I fixed the issue and squashed the commits.

Contributor

@willie-yao willie-yao left a comment

/lgtm
/approve
/hold cancel

@bryan-cox Thanks for your hard work and patience on this!

@k8s-ci-robot k8s-ci-robot added the lgtm label and removed the do-not-merge/hold label Jul 15, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 6285739c96f72a1f0997d6fd71726c0eea7789cd

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: willie-yao

@k8s-ci-robot k8s-ci-robot merged commit 5ae25f7 into kubernetes-sigs:main Jul 15, 2025
23 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jul 15, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in CAPZ Planning Jul 15, 2025
@bryan-cox bryan-cox deleted the 5472 branch July 16, 2025 00:59
@mboersma
Contributor

/cherry-pick release-1.20

@mboersma
Contributor

/cherry-pick release-1.19

@k8s-infra-cherrypick-robot

@mboersma: new pull request created: #5758

In response to this:

/cherry-pick release-1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-infra-cherrypick-robot

@mboersma: new pull request created: #5759

In response to this:

/cherry-pick release-1.19

Labels
approved, cncf-cla: yes, kind/bug, lgtm, release-note, size/L
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Self-managed infrastructure of CAPZ crash loops when AzureManagedControlPlane is not installed
8 participants