Gatekeeper fails to start with mutation-webhook operation only - enforcement point error #3928

Open
adiazny opened this issue Apr 24, 2025 · 5 comments
Labels: bug (Something isn't working)

adiazny commented Apr 24, 2025

What steps did you take and what happened:

Using Gatekeeper 3.18.2 on Kubernetes 1.32 (also observed on 1.29), when the operation is set to mutation-webhook only (to isolate mutation operations), the following error occurs:

{"level":"error","ts":1745424231.8622038,"logger":"setup","msg":"unable to set up OPA client","error":"unable to create client: must specify at least one enforcement point with client.EnforcementPoints","stacktrace":"main.setupControllers\n\t/go/src/github.com/open-policy-agent/gatekeeper/main.go:441\nmain.innerMain.func4\n\t/go/src/github.com/open-policy-agent/gatekeeper/main.go:319"}
{"level":"info","ts":1745424231.862269,"logger":"setup","msg":"disabling controllers..."}

What did you expect to happen:
I expected the Gatekeeper instance to start running without any issues when the operation is set to mutation-webhook only.

Anything else you would like to add:
As a workaround, adding both mutation-webhook and webhook operation flags allows Gatekeeper to start successfully:

- --operation=webhook
- --operation=mutation-webhook

This suggests that Gatekeeper requires at least one enforcement point even when only mutation capabilities are needed.

Environment:

  • Gatekeeper version: 3.18.2 & 3.19.0
  • Kubernetes version: (use kubectl version):
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.4
@adiazny adiazny added the bug Something isn't working label Apr 24, 2025
Author

adiazny commented May 6, 2025

Our current working configuration under gatekeeper 3.15 that we want to continue to support:

  • A gatekeeper mutation webhook configured with only mutation-webhook operation.
  • A gatekeeper validating webhook configured with only webhook operation.

To demonstrate a working Gatekeeper 3.15 and a non-working Gatekeeper 3.18, I created an iximiuz DevOps playground to showcase the deployments. Follow the instructions here: https://labs.iximiuz.com/playgrounds/my-custom-e733aca2-1848dc81

What you can do in the playgrounds:

  • Recreate Gatekeeper issue #3928
  • Launch separate playgrounds with either a 1.29.12 or 1.32.3 Kind cluster
  • Deploy Gatekeeper 3.15.1 (no defect)
  • Or deploy Gatekeeper 3.18.3 (defect)
  • Apply policy enforcement resources
  • Test policy enforcement with the sample resources

Observations

No defect observed for Gatekeeper 3.15:

  • Supports running separate operations for each webhook, one for validation and one for mutation.
  • Does not have an operation=generate for the audit webhook.
  • All mutation and validation policies work as expected.

Defect observed for Gatekeeper 3.18:

  • Cannot run separate operations for each webhook, one for validation and one for mutation.
    • As a potential workaround: add the webhook operation to the mutation Gatekeeper webhook in addition to the mutation-webhook operation.
  • The audit Gatekeeper webhook MUST add the generate operation.
  • All mutation and validation policies seem to work as expected.

Contributor

JaydipGabani commented May 14, 2025

The error actually originates from this line in the Gatekeeper codebase:

.

The OPA framework expects an enforcementPoint to be provided when initializing the client. Gatekeeper maps operations like webhook and audit to their corresponding enforcement points internally. So, if neither webhook nor audit is specified as an operation, Gatekeeper attempts to initialize the OPA client without any enforcement points, which causes the failure.
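
A minimal sketch of the failure mode described above. All names here (`enforcementPointsFor`, `newClient`, the `"webhook-ep"` and `"audit-ep"` labels) are hypothetical and chosen for illustration; they are not Gatekeeper's or the OPA framework's actual identifiers. The only behavior taken from this issue is that webhook and audit map to enforcement points, mutation-webhook does not, and client creation fails when the resulting list is empty.

```go
package main

import (
	"errors"
	"fmt"
)

// Operation names as they appear in the --operation flags in this issue.
const (
	opWebhook         = "webhook"
	opAudit           = "audit"
	opMutationWebhook = "mutation-webhook"
)

// errNoEnforcementPoints stands in for the error reported in the log above.
var errNoEnforcementPoints = errors.New("must specify at least one enforcement point")

// enforcementPointsFor maps enabled operations to enforcement points.
// Assumption: only webhook and audit contribute one; the labels are placeholders.
func enforcementPointsFor(operations []string) []string {
	var eps []string
	for _, op := range operations {
		switch op {
		case opWebhook:
			eps = append(eps, "webhook-ep")
		case opAudit:
			eps = append(eps, "audit-ep")
		}
	}
	return eps
}

// client is a stand-in for the policy client; it only records its enforcement points.
type client struct {
	enforcementPoints []string
}

// newClient rejects an empty enforcement point list, mirroring the reported behavior.
func newClient(eps []string) (*client, error) {
	if len(eps) == 0 {
		return nil, errNoEnforcementPoints
	}
	return &client{enforcementPoints: eps}, nil
}

func main() {
	configs := [][]string{
		{opMutationWebhook},            // mutation only: no enforcement points, so setup fails
		{opWebhook, opMutationWebhook}, // the workaround: webhook contributes one
	}
	for _, ops := range configs {
		_, err := newClient(enforcementPointsFor(ops))
		fmt.Printf("operations=%v err=%v\n", ops, err)
	}
}
```

Under these assumptions, the mutation-only configuration fails at client creation, while adding the webhook operation supplies one enforcement point and lets setup proceed, which matches the workaround reported earlier in the thread.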

Enforcement points are used by the framework to associate constraints with incoming review requests. This ensures that only constraints relevant to the specific enforcement point (e.g., audit or webhook) are evaluated. This mechanism is what enables use cases such as applying a particular constraint only during audit.
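
To illustrate the association described above, here is a small sketch in which each constraint lists the enforcement points it applies to and a review only evaluates the matching ones. The `constraint` type, the `constraintsFor` function, and the sample constraint names are hypothetical, not the framework's real API.

```go
package main

import "fmt"

// constraint is a simplified stand-in: a name plus the enforcement points it applies to.
type constraint struct {
	name              string
	enforcementPoints []string
}

// constraintsFor returns the names of the constraints relevant to one enforcement point.
func constraintsFor(ep string, all []constraint) []string {
	var names []string
	for _, c := range all {
		for _, p := range c.enforcementPoints {
			if p == ep {
				names = append(names, c.name)
				break
			}
		}
	}
	return names
}

func main() {
	all := []constraint{
		{name: "require-labels", enforcementPoints: []string{"webhook", "audit"}},
		{name: "audit-only-check", enforcementPoints: []string{"audit"}},
	}
	fmt.Println(constraintsFor("webhook", all)) // prints [require-labels]
	fmt.Println(constraintsFor("audit", all))   // prints [require-labels audit-only-check]
}
```

The second constraint shows the "only during audit" use case: it is skipped for webhook reviews and evaluated for audit reviews.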

There are three ways to fix it:

  1. Do not return an error when no enforcementPoints are passed to the new client; return an empty client instead. This requires making sure the client is not used later on, to avoid a nil pointer error.
  2. Update the enforcement point code so that it works when no enforcement points are passed while initializing a client.
  3. Refactor the GK code so that the OPA client is only required for operations that use it. Right now the OPA client dependency is injected/used no matter the operation, so this may be an opportunity to refactor and optimize the code.

@ritazh I am curious to know your thoughts on which way we should lean. I prefer the 3rd option, which requires a bigger change but also optimizes the code a little, since we do not need the OPA client if we are not using the webhook or audit operations.
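
A rough sketch of the idea behind the third option: decide up front whether any enabled operation needs the OPA client, and skip creating it otherwise. The `needsOPAClient` helper is hypothetical and only encodes the statement above that the client is needed for webhook and audit; the real refactor would touch far more of the setup code.

```go
package main

import "fmt"

// needsOPAClient reports whether any enabled operation evaluates constraints.
// Assumption from this thread: only webhook and audit require the client.
func needsOPAClient(operations []string) bool {
	for _, op := range operations {
		if op == "webhook" || op == "audit" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(needsOPAClient([]string{"mutation-webhook"}))            // prints false
	fmt.Println(needsOPAClient([]string{"webhook", "mutation-webhook"})) // prints true
}
```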

Member

ritazh commented May 15, 2025

Thanks for the summary @JaydipGabani. I agree we should refactor GK and only use the OPA client when needed as a long-term solution. However, this will require lots of code changes and testing to ensure backward compatibility and to prevent regressions. A change like that IMO isn't a patch release update. To remediate this issue and get a patch release out soon, with minimal code change so that we don't introduce more variables, I prefer option 2, since the enforcement code should handle the case where enforcement points are not provided.
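
One possible shape for option 2, sketched under assumptions: the constructor accepts an empty enforcement point list, and a review against a point that was never enabled returns an error instead of failing at startup. The `client`, `newClient`, and `review` names are hypothetical, and how the real enforcement code should behave with no enforcement points is not settled in this thread.

```go
package main

import "fmt"

// client is a simplified stand-in that tracks which enforcement points are enabled.
type client struct {
	enforcementPoints map[string]bool
}

// newClient accepts zero or more enforcement points and never fails on an empty list.
func newClient(eps ...string) *client {
	c := &client{enforcementPoints: make(map[string]bool)}
	for _, ep := range eps {
		c.enforcementPoints[ep] = true
	}
	return c
}

// review rejects requests for enforcement points that were not enabled on this client.
func (c *client) review(ep string) (string, error) {
	if !c.enforcementPoints[ep] {
		return "", fmt.Errorf("enforcement point %q is not enabled on this client", ep)
	}
	return "reviewed at " + ep, nil
}

func main() {
	c := newClient() // mutation-only deployment: no enforcement points
	if _, err := c.review("webhook"); err != nil {
		fmt.Println("review skipped:", err)
	}
}
```

The trade-off in this sketch is that the misconfiguration surfaces at review time rather than at startup, so callers would need to handle that error path.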

Contributor

JaydipGabani commented May 15, 2025

Makes sense. Let's focus on getting the fix out. I have created an issue to track refactoring the code: #3964.

@adiazny Let me know if/when you will be able to pick this up. I can help you with the scope of changes and codebase if needed.

Author

adiazny commented May 19, 2025

Sounds good @JaydipGabani. I'll connect with you this week to detail the scope of changes. I'm in the process of getting the Gatekeeper project reviewed and approved through my firm's open source contribution process.
