
Workload cluster creation/deletion with default flavor via Tilt fails on 1P subscription due to unexpected VNet peering #5586

@BabyCakes13

Description

/kind bug

[Before submitting an issue, have you checked the Troubleshooting Guide (self-managed & managed)?]
Yes

What steps did you take and what happened:
I'm encountering an issue while spinning up a workload cluster using the default flavor via Tilt (following this document: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/docs/book/src/developers/tilt-with-aks-as-mgmt-ilb.md). The Azure subscription I'm using to create these resources is a first-party (1P) subscription.

Cluster creation partially succeeds (many resources are provisioned), but workload cluster creation ultimately fails with the following error:

cluster.cluster.x-k8s.io/default-<random-id> created
azurecluster.infrastructure.cluster.x-k8s.io/default-<random-id> created
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/default-<random-id>-control-plane created
azuremachinetemplate.infrastructure.cluster.x-k8s.io/default-<random-id>-control-plane created
machinedeployment.cluster.x-k8s.io/default-<random-id>-md-0 created
azuremachinetemplate.infrastructure.cluster.x-k8s.io/default-<random-id>-md-0 created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/default-<random-id>-md-0 created
azureclusteridentity.infrastructure.cluster.x-k8s.io/cluster-identity-ci unchanged
--------Peering VNETs--------
1/4 aks-mgmt-<random-id>-vnet found 
2/4 default-<random-id>-vnet found 
3/4 mgmt-to-default-<random-id> peering created in aks-mgmt-<random-id>-vnet
4/4 default-<random-id>-to-mgmt peering created in default-<random-id>-vnet
Waiting for kubeconfig to be available
Kubeconfig for default-<random-id> created and saved in the local
Waiting for default-<random-id> API Server to be accessible
API Server of default-<random-id> is accessible
--------Creating private DNS zone--------
1/4 <private-dns-zone>.<region>.cloudapp.azure.com private DNS zone created in default-<random-id>
ERROR: (BadRequest) Virtual network resource not found for '/subscriptions/<subscription-id>/resourceGroups/default-<random-id>/providers/Microsoft.Network/virtualNetworks/default-<random-id>-vnet
'
Code: BadRequest
Message: Virtual network resource not found for '/subscriptions/<subscription-id>/resourceGroups/default-<random-id>/providers/Microsoft.Network/virtualNetworks/default-<random-id>-vnet
'
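The error names a VNet that the peering phase had just reported as found (step 2/4). A minimal diagnostic sketch for checking, outside of Tilt, whether that VNet is readable by the az CLI under the currently active subscription might look like the following. `check_vnet_visible` is a hypothetical helper, not part of the CAPZ tooling; pass it the resource group and VNet name taken from the error message.

```shell
# Hypothetical diagnostic helper (not part of the CAPZ Tilt tooling).
# Usage: check_vnet_visible <resource-group> <vnet-name>
# Returns 0 if the VNet can be read, 1 if not, 2 if az cannot report
# the active subscription.
check_vnet_visible() {
  rg="$1"
  vnet="$2"

  # Show which subscription the az CLI is currently targeting, so it can be
  # compared with the /subscriptions/<id>/ segment of the error message.
  sub_id=$(az account show --query id --output tsv) || return 2
  echo "Active subscription: ${sub_id}"

  # 'az network vnet show' exits non-zero when the VNet cannot be read.
  if az network vnet show --resource-group "${rg}" --name "${vnet}" \
      --output none 2>/dev/null; then
    echo "VNet ${vnet} found in ${rg}"
    return 0
  fi

  # Otherwise list the VNets the resource group does contain.
  echo "VNet ${vnet} NOT found in ${rg}; VNets in this group:"
  az network vnet list --resource-group "${rg}" --query "[].name" --output tsv
  return 1
}
```

For example, `check_vnet_visible default-<random-id> default-<random-id>-vnet`. A non-zero exit status means the VNet could not be read with the current credentials and subscription; comparing the printed subscription ID with the one in the error's resource ID is one quick thing to rule out.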

Attempting to delete the workload resources (via delete-all-workload-clusters) also fails due to the missing VNet peering, which is expected to be present.

Running cmd: sh -ec "./hack/tools/bin/kubectl delete clusters --all --wait=false;
    echo \"--------Clearing AKS MGMT VNETs Peerings--------\";
    az network vnet wait --resource-group ${AKS_RESOURCE_GROUP} --name ${AKS_MGMT_VNET_NAME} --created --timeout 180;
    echo \"VNet ${AKS_MGMT_VNET_NAME} found \";

    PEERING_NAMES=$(az network vnet peering list --resource-group ${AKS_RESOURCE_GROUP} --vnet-name ${AKS_MGMT_VNET_NAME} --query \"[].name\" --output tsv);
    for PEERING_NAME in ${PEERING_NAMES}; do echo \"Deleting peering: ${PEERING_NAME}\"; az network vnet peering delete --name ${PEERING_NAME} --resource-group ${AKS_RESOURCE_GROUP} --vnet-name ${AKS_MGMT_VNET_NAME}; done;
    echo \"All VNETs Peerings deleted in ${AKS_MGMT_VNET_NAME}\";
    "
cluster.cluster.x-k8s.io "default-<random-id>" deleted
cluster.cluster.x-k8s.io "default-4494" deleted
--------Clearing AKS MGMT VNETs Peerings--------
VNet aks-mgmt-<random-id>-vnet found 
Deleting peering: mgmt-to-default-<random-id>
ERROR: Operation returned an invalid status 'Bad Request'
Build Failed: Command "sh -ec \"./hack/tools/bin/kubectl delete clusters --all --wait=false;\\n    echo \\\"--------Clearing AKS MGMT VNETs Peerings--------\\\";\\n    az network vnet wait --resource-group \${AKS_RESOURCE_GROUP} --name \${AKS_MGMT_VNET_NAME} --created --timeout 180;\\n    echo \\\"VNet \${AKS_MGMT_VNET_NAME} found \\\";\\n\\n    PEERING_NAMES=\$(az network vnet peering list --resource-group \${AKS_RESOURCE_GROUP} --vnet-name \${AKS_MGMT_VNET_NAME} --query \\\"[].name\\\" --output tsv);\\n    for PEERING_NAME in \${PEERING_NAMES}; do echo \\\"Deleting peering: \${PEERING_NAME}\\\"; az network vnet peering delete --name \${PEERING_NAME} --resource-group \${AKS_RESOURCE_GROUP} --vnet-name \${AKS_MGMT_VNET_NAME}; done;\\n    echo \\\"All VNETs Peerings deleted in \${AKS_MGMT_VNET_NAME}\\\";\\n    \"" failed: exit status 1
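Because the cleanup command runs under `sh -e`, the first failing `az network vnet peering delete` aborts the whole command with a non-zero status, which Tilt reports as `Build Failed`. A possible, more tolerant variant of the same loop is sketched below: it attempts every peering, records the ones that fail, and returns a non-zero status only at the end. `clear_mgmt_peerings` is a hypothetical name; `AKS_RESOURCE_GROUP` and `AKS_MGMT_VNET_NAME` are the variables the original command already uses.

```shell
# Hypothetical, more tolerant variant of the peering cleanup loop.
# Reads AKS_RESOURCE_GROUP and AKS_MGMT_VNET_NAME, as the original command does.
clear_mgmt_peerings() {
  rg="${AKS_RESOURCE_GROUP:-}"
  vnet="${AKS_MGMT_VNET_NAME:-}"
  if [ -z "${rg}" ] || [ -z "${vnet}" ]; then
    echo "AKS_RESOURCE_GROUP and AKS_MGMT_VNET_NAME must be set" >&2
    return 2
  fi

  peerings=$(az network vnet peering list --resource-group "${rg}" \
    --vnet-name "${vnet}" --query "[].name" --output tsv) || return 1

  failed=""
  for peering in ${peerings}; do
    echo "Deleting peering: ${peering}"
    # A command tested by 'if' does not trigger 'set -e', so one failed
    # delete no longer aborts the loop before the other peerings are tried.
    if ! az network vnet peering delete --name "${peering}" \
        --resource-group "${rg}" --vnet-name "${vnet}"; then
      echo "WARNING: failed to delete peering ${peering}" >&2
      failed="${failed} ${peering}"
    fi
  done

  if [ -n "${failed}" ]; then
    echo "Peerings that could not be deleted from ${vnet}:${failed}" >&2
    return 1
  fi
  echo "All VNet peerings deleted in ${vnet}"
}
```

This does not address the root cause (per @nawazkh, VNet peering is not expected for the default flavor at all); it only keeps one bad peering from hiding the state of the others. Whether the Tilt resource should still fail when a peering cannot be deleted is a policy choice; the sketch keeps failing so the problem stays visible.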

What did you expect to happen:
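I expected the workload cluster to be created and deleted successfully with the default flavor. I've already discussed this with @nawazkh, who confirmed that VNet peering is not expected for the default flavor. This unexpected peering appears to be the root cause of why the workload cluster cannot be properly created or deleted.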
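Since VNet peering is not expected for the default flavor, one possible direction is to gate the peering steps (and the matching cleanup) on an explicit allowlist of flavors. This is a hypothetical sketch, not the CAPZ Tiltfile's behavior: `flavor_needs_peering`, `FLAVOR`, and `PEERING_FLAVORS` are illustrative names, and which flavors belong in the allowlist is an assumption left open.

```shell
# Hypothetical helper, not part of the CAPZ Tiltfile.
# Succeeds only if FLAVOR is listed in the space-separated ALLOWLIST of
# flavors that are assumed to need VNet peering.
flavor_needs_peering() {
  flavor="$1"
  allowlist="$2"
  for f in ${allowlist}; do
    if [ "${f}" = "${flavor}" ]; then
      return 0
    fi
  done
  return 1
}

# Illustrative use (FLAVOR and PEERING_FLAVORS are hypothetical variables):
#   if flavor_needs_peering "${FLAVOR}" "${PEERING_FLAVORS}"; then
#     ...existing peering steps...
#   else
#     echo "Skipping VNet peering for flavor ${FLAVOR}"
#   fi
```

An allowlist makes the set of peered flavors explicit, so a flavor that does not need peering, such as default, never reaches the peering or peering-cleanup steps.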

Environment:

  • cluster-api-provider-azure version: 1a13db0
  • Kubernetes version (use kubectl version): v1.30.2
  • OS (e.g. from /etc/os-release): Ubuntu 24.04.1 LTS

Metadata

Assignees

Labels

kind/bug (Categorizes issue or PR as related to a bug.)

Type

No type

Projects

Status

Done

Milestone

Relationships

None yet

Development

No branches or pull requests
