Skip to content

[release-3.6] Replace resolver.State.Addresses with resolver.State.Endpoint.Addresses #19782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: release-3.6
Choose a base branch
from

Conversation

k8s-infra-cherrypick-robot

This is an automated cherry-pick of #19780

/assign ahrtr

@k8s-ci-robot
Copy link

Hi @k8s-infra-cherrypick-robot. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ahrtr
Copy link
Member

ahrtr commented Apr 23, 2025

/ok-to-test

Copy link

codecov bot commented Apr 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.89%. Comparing base (52bddd0) to head (9b6738c).
Report is 2 commits behind head on release-3.6.

Additional details and impacted files
Files with missing lines Coverage Δ
client/v3/internal/resolver/resolver.go 85.18% <100.00%> (+1.18%) ⬆️
client/v3/naming/resolver/resolver.go 80.35% <100.00%> (+1.11%) ⬆️

... and 19 files with indirect coverage changes

@@               Coverage Diff               @@
##           release-3.6   #19782      +/-   ##
===============================================
- Coverage        68.92%   68.89%   -0.04%     
===============================================
  Files              420      420              
  Lines            35607    35612       +5     
===============================================
- Hits             24543    24535       -8     
- Misses            9638     9647       +9     
- Partials          1426     1430       +4     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52bddd0...9b6738c. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ahrtr
Copy link
Member

ahrtr commented Apr 23, 2025

@dfawley The patch #19780 works on etcd/main (depends on [email protected]), but not for etcd/release-3.6 (depending on [email protected]).

The error message is rpc error: code = DeadlineExceeded desc = latest balancer error: last resolver error: produced zero addresses. Refer to https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/etcd-io_etcd/19782/pull-etcd-e2e-amd64/1914971185293037568

It might be related to this grpc-go's commit grpc/grpc-go@aad8a12

After bumping grpc-go version to v1.71.1, then it's working now. Please note release-3.6 and release-3.5 are both our stable releases, usually we only bump dependencies for etcd's stable releases when there are critical or CVE fixes. can you please advise whether it's safe to bump grpc-go's version on etcd/release-3.6 and etcd/release-3.5?

@dfawley
Copy link

dfawley commented Apr 23, 2025

It might be related to this grpc-go's commit grpc/grpc-go@aad8a12

I don't think that should affect things, since that's in the xds tree and only deals with xds (envoy)-related support, which I believe you are not using?

In v1.71 (and with some fixes in v1.71.1), we fixed our support for HTTP[s]_PROXY, and I do see "Proxy" in your logs, so maybe that's related?

can you please advise whether it's safe to bump grpc-go's version on etcd/release-3.6 and etcd/release-3.5?

I don't see any reason why it wouldn't be safe to upgrade. Note that with Go's MVS, your users could already be running with that combination. But please check out our release notes for v1.71.0 and v1.71.1 so you can see the full list of changes: https://github.com/grpc/grpc-go/releases (and let me know if you have any questions). I don't believe any changes in there should impact name resolver support.

@ahrtr
Copy link
Member

ahrtr commented Apr 23, 2025

thx for the feedback.

since that's in the xds tree and only deals with xds (envoy)-related support, which I believe you are not using?

I don't think etcd use xDS.

In v1.71 (and with some fixes in v1.71.1), we fixed our support for HTTP[s]_PROXY, and I do see "Proxy" in your logs, so maybe that's related?

It might not be related to proxy. Please note that even the following simple unit test works on grpc-go v1.71.1, but not on 1.70.0, the error message was the same.

func TestAuthTokenBundleNoOverwrite(t *testing.T) {
// This call in particular changes working directory to the tmp dir of
// the test. The `etcd-auth-test:0` can be created in local directory,
// not exceeding the longest allowed path on OsX.
testutil.BeforeTest(t)
// Create a mock AuthServer to handle Authenticate RPCs.
lis, err := net.Listen("unix", "etcd-auth-test:0")
require.NoError(t, err)
defer lis.Close()
addr := "unix://" + lis.Addr().String()
srv := grpc.NewServer()
etcdserverpb.RegisterAuthServer(srv, mockAuthServer{})
go srv.Serve(lis)
defer srv.Stop()
// Create a client, which should call Authenticate on the mock server to
// exchange username/password for an auth token.
c, err := NewClient(t, Config{
DialTimeout: 5 * time.Second,
Endpoints: []string{addr},
Username: "foo",
Password: "bar",
})
require.NoError(t, err)
defer c.Close()
oldTokenBundle := c.authTokenBundle
// Call the public Dial again, which should preserve the original
// authTokenBundle.
gc, err := c.Dial(addr)
require.NoError(t, err)
defer gc.Close()
newTokenBundle := c.authTokenBundle
if oldTokenBundle != newTokenBundle {
t.Error("Client.authTokenBundle has been overwritten during Client.Dial")
}
}

I don't see any reason why it wouldn't be safe to upgrade. Note that with Go's MVS, your users could already be running with that combination.

OK, thx.

cc @fuweid @serathius @ivanvc

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backport LGTM.

@ahrtr
Copy link
Member

ahrtr commented Apr 23, 2025

@dfawley The patch #19780 works on etcd/main (depends on [email protected]), but not for etcd/release-3.6 (depending on [email protected]).

The error message is rpc error: code = DeadlineExceeded desc = latest balancer error: last resolver error: produced zero addresses. Refer to https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/etcd-io_etcd/19782/pull-etcd-e2e-amd64/1914971185293037568

After manual bi-sect, eventually I locked to commit grpc/grpc-go@e95a4b7, the tests passes with this or newer commit, fails with commits older than this one. @dfawley

@dfawley
Copy link

dfawley commented Apr 23, 2025

Ah, I believe what's happening is that if you are only setting Endpoints, then, until we switched round robin to use to the new pick first policy (the commit you found), we wouldn't actually use the Endpoints field. Our migration strategy was: 1. resolvers set both fields (with our ClientConn setting Endpoints based on Addresses if the resolver doesn't), 2. policies are updated to support Endpoints, 3. resolvers set only Endpoints. (2) was only completed in v1.71.

So I think everything is understood, and should work fine for your users as long as you update your dependency past that commit.

@ahrtr
Copy link
Member

ahrtr commented Apr 23, 2025

Thanks for the confirmation. Please approve #19783 if you get a second.

@fuweid @serathius pls let's know if you have any comment or concern. thx

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, fuweid, ivanvc, k8s-infra-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants