@marosset marosset commented Aug 21, 2025

I tested these changes in a personal Azure subscription.

This is related to kubernetes-sigs/cluster-api-provider-azure#5688. After these cache changes are in place, we'll update that PR to use the new ACR image references, which should prevent image throttling during large cluster deployments.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 21, 2025
@k8s-ci-robot k8s-ci-robot added area/infra Infrastructure management, infrastructure design, code in infra/ area/infra/azure Issues or PRs related to Kubernetes Azure infrastructure area/provider/azure Issues or PRs related to azure provider sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 21, 2025
@marosset
Contributor Author

/assign @jackfrancis @nojnhuh @willie-yao

images to avoid throttling during large cluster deployments
@marosset marosset force-pushed the add-cache-images-to-capz-acr branch from b2b6253 to e8412ac Compare August 21, 2025 20:29
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marosset
Once this PR has been reviewed and has the lgtm label, please ask for approval from jackfrancis. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marosset
Contributor Author

/hold
I'm working on some extra validation... will update...

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 22, 2025
…calico images to avoid throttling during large cluster deployments
@marosset
Contributor Author

I pushed some updates to the templates but am stuck at

terraform plan -out main.tfplan
Acquiring state lock. This may take a few moments...

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: building account: unable to configure ResourceManagerAccount: subscription ID could not be determined and was not specified
│
│   with provider["registry.terraform.io/hashicorp/azurerm"],
│   on main.tf line 17, in provider "azurerm":
│   17: provider "azurerm" {
│
╵
Releasing state lock. This may take a few moments...

I saw in the docs that the Terraform state is stored in an Azure storage account.
Is the expectation that we pull that down to validate updates with terraform init / plan?

@nojnhuh
Contributor

nojnhuh commented Aug 22, 2025

Is that error actually unrelated to the state, and it's only having trouble picking up your az login context?

> I saw in the docs that the Terraform state is stored in an Azure storage account.
> Is the expectation that we pull that down to validate updates with terraform init / plan?

It's been forever since I've actually run any terraform, but yes I think the general pattern would be that one canonical set of state is established and shared for all future operations. I haven't heard that we've set up any remote backend for that though, but I haven't been keeping super close tabs here either. Using a storage account in Azure might introduce a chicken-and-egg problem if the idea is that we should be able to migrate to a new sub.

@marosset
Contributor Author

> Is that error actually unrelated to the state, and it's only having trouble picking up your az login context?
>
>> I saw in the docs that the Terraform state is stored in an Azure storage account.
>> Is the expectation that we pull that down to validate updates with terraform init / plan?
>
> It's been forever since I've actually run any terraform, but yes I think the general pattern would be that one canonical set of state is established and shared for all future operations. I haven't heard that we've set up any remote backend for that though, but I haven't been keeping super close tabs here either. Using a storage account in Azure might introduce a chicken-and-egg problem if the idea is that we should be able to migrate to a new sub.

Is there a reason why we are keeping the state in the azure storage account instead of github?
Is there sensitive info in there?

@marosset
Contributor Author

> Is that error actually unrelated to the state, and it's only having trouble picking up your az login context?

I am logged into the correct tenant and account (verified with az account show), so I don't think it's an az login issue (but it might be).
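
One possible cause, offered as an assumption rather than something confirmed in this thread: azurerm provider 4.x no longer infers the subscription from the az CLI context, so it expects either subscription_id in the provider block or the ARM_SUBSCRIPTION_ID environment variable. If the templates here pin a 4.x provider, a workaround could look like the sketch below. The helper name ensure_subscription_id is hypothetical and not part of the repo.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the azurerm provider in use is 4.x, which this thread
# does not confirm.

# ensure_subscription_id is a hypothetical helper name.
# It keeps an ARM_SUBSCRIPTION_ID that is already set and otherwise falls back
# to the subscription selected in the current az CLI context.
ensure_subscription_id() {
  if [ -z "${ARM_SUBSCRIPTION_ID:-}" ]; then
    ARM_SUBSCRIPTION_ID="$(az account show --query id --output tsv)" || return 1
  fi
  # Export so that child processes such as terraform can read it.
  export ARM_SUBSCRIPTION_ID
}

# Usage, from the directory that contains main.tf:
#   ensure_subscription_id && terraform plan -out main.tfplan
```

If the same error still appears with the variable exported, the provider version assumption is probably wrong and the cause lies elsewhere.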

@nojnhuh
Contributor

nojnhuh commented Aug 22, 2025

> Is there a reason why we are keeping the state in the azure storage account instead of github?

If we're actually maintaining the state anywhere outside of our local workstations, that's news to me.

I don't see anything in the docs along the lines of "you should commit this to version control" and instead see mentions of other remote storage backends, which suggests adding it to the repo might be an anti-pattern.

> Is there sensitive info in there?

The docs mention encrypting the state, but I don't think any of what we have contains any secrets that would warrant encryption. I'd obviously want to look at it closely before pushing it anywhere.
