Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrated leader election to lease API #120

Merged
merged 1 commit into from
May 31, 2022

Conversation

NikhilSharmaWe
Copy link
Member

@NikhilSharmaWe NikhilSharmaWe commented Feb 3, 2022

Fixes #111

Action required: Updated leader election to use Endpoints + Lease. All provisioners based on this library must have RBAC permissions to create/update Lease objects in coordination.k8s.io/v1 API.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 3, 2022
@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 4, 2022

@aojea Why is the change here in wrong. I asked wojtekt (on slack) about how to approach this issue, he said that we have to just reuse the existing infrastructure now. So he suggested to change endpoints to enpointsleases first and later to leases.

So then why tests are failing here ?

Also can you make me understand the reason for this change @snOm3ad made in his PR and do we need this change here too ?

@aojea
Copy link

aojea commented Feb 4, 2022

E0203 13:21:18.190816   12050 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 47 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x150ae40, 0x247a6e0)
	/home/prow/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:74 +0xa3

The client-go vresion seems to old , 0.19.1, it has to bump it to a versions with the endpointleases

@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 4, 2022

@aojea Ok so do we need someone's permission before bumping the version (who is well informed about the repo) or we can do it ourselves to the latest version of https://pkg.go.dev/k8s.io/apimachinery which is v0.23.3?

@aojea
Copy link

aojea commented Feb 4, 2022

you have to update in the go.mod, see an example

a4a426b

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 4, 2022
@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 4, 2022

@aojea

  • I ran go get k8s.io/apimachinery in the repo. It update go.mod and go.sum file, still jobs are failing. What can be the reason ?

Apologies for asking for help time and time again.

@aojea
Copy link

aojea commented Feb 4, 2022

check the job logs, it gives you hints

../../../pkg/mod/sigs.k8s.io/[email protected]/internal/golang/encoding/json/encode.go:1255:18: sf.IsExported undefined (type reflect.StructField has no field or method IsExported)
note: module requires Go 1.16
# golang.org/x/net/http2
../../../pkg/mod/golang.org/x/[email protected]/http2/transport.go:417:45: undefined: os.ErrDeadlineExceeded
note: module requires Go 1.17

I think that you can run those tests manually and test locally too

@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 6, 2022

@aojea I am not able to make any sense of these logs and make out what is the cause of these jobs failing. Also you mentioned running these tests manually and locally, are you talking about running test/e2e/test.sh file locally.

@aojea
Copy link

aojea commented Feb 6, 2022

I pasted the logs errorrs

../../../pkg/mod/golang.org/x/[email protected]/http2/transport.go:417:45: undefined: os.ErrDeadlineExceeded
note: module requires Go 1.17

it is failing because the golang dependencies are not correct

go.mod Outdated
github.com/prometheus/client_golang v1.5.1
github.com/prometheus/client_model v0.2.0
golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d // indirect
golang.org/x/time v0.0.0-20191024005414-555d28b269f0
k8s.io/api v0.19.1
k8s.io/apimachinery v0.19.1
k8s.io/apimachinery v0.23.3
k8s.io/client-go v0.19.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should bump this one too, to have all k8s libs from the same ver.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsafrane As @aojea mentioned by seeing the failing logs after bumping k8s.io/client-go, we need to new golang version for io/fs package but we don't have an idea where it is configured.

Do you have any idea about this ? Please inform.

@aojea
Copy link

aojea commented Feb 7, 2022

you are progressing, it fails now in

sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller imports
	k8s.io/client-go/kubernetes imports
	k8s.io/client-go/rest imports
	k8s.io/client-go/plugin/pkg/client/auth/exec imports
	io/fs: malformed module path "io/fs": missing dot in first path element
make: *** [Makefile:18: dep] Error 1

@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 8, 2022

@aojea I am not sure about what to here.

As you mentioned this error is shown io/fs: malformed module path "io/fs": missing dot in first path element. So what we have to do here actually ?

Do we have to correct the module path manually somewhere or we have to import some dependencies to fix it ?

@aojea
Copy link

aojea commented Feb 8, 2022

A quick search with that error in Google shows it requires a newer golang version ... I don't know where is that configured in this repo

@NikhilSharmaWe
Copy link
Member Author

/retest

@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 9, 2022

@aojea, @jsafrane
Changed the version of go image in sig-storage-lib-external-provisioner.yaml to a newer version, still jobs are failing.

But now they are not showing any logs, build_log.txt is empty for pull-sig-storage-lib-external-provisioner-build and when I try to open details for pull-sig-storage-lib-external-provisioner-unit, it is just loading without showing any information about the fail.

What could be the reason behind it ?

@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 10, 2022

@aojea and @jsafrane Could you please give your thoughts on the previous comment.

@aojea
Copy link

aojea commented Feb 10, 2022

@aojea and @jsafrane Could you please give your thoughts on the previous comment.

try to build it locally with golang 1.17, and check if there are more changes needed to the go.mod ... because of that, it sasy go1.16 now

@NikhilSharmaWe
Copy link
Member Author

NikhilSharmaWe commented Feb 10, 2022

@aojea I have tried many times previously to build the repo locally but it seems like my system does not meet the minimum requirements. Is there any other way?

And what do you mean by ... because of that, it sasy go1.16 now?

@aojea
Copy link

aojea commented Feb 10, 2022

@aojea I have tried many times previously to build the repo locally but it seems like my system does not meet the minimum requirements.

what requirements?

And what do you mean by ... because of that, it sasy go1.16 now?

@jsafrane
Copy link
Contributor

it is just loading without showing any information about the fail.

The logs are too big, click on Artifacts on the top right to get to a text file.

@humblec
Copy link
Contributor

humblec commented Feb 10, 2022

@NikhilSharmaWe run go test locally and check it pass?

@yonatankahana
Copy link

@humblec Could you please help me what is the issue here. The script for gofmt is failing, I changed the code for it but still it is giving error.

@NikhilSharmaWe the problem is that the CI using golang >=1.17 where the build tag syntax changed.

for example, mount/mountinfo_unsupported.go:

// +build !windows,!linux,!freebsd,!solaris freebsd,!cgo solaris,!cgo

should now be:

//go:build (!windows && !linux && !freebsd && !solaris) || (freebsd && !cgo) || (solaris && !cgo)

better to fix it automatically in the entire repository by running gofmt -w . (with golang >=1.17 of course)

@NikhilSharmaWe
Copy link
Member Author

@yonatankahana Yes I also did that before but after that I was getting error related to that. Trying it again. Also do we need to change the go version in go.mod.

@yonatankahana
Copy link

yonatankahana commented May 31, 2022

@NikhilSharmaWe

now its failing but from a different reason. the script repo-infra/verify/boilerplate/boilerplate.py don't know about the //go:build syntax and it keep failing the test for both files you've fixed, you need to enhance the regex: go_build_constraints to find both kinds of tags, something like ^(//(go:| \+)build.*\n)+\n.

@NikhilSharmaWe
Copy link
Member Author

@yonatankahana If the the script repo-infra/verify/boilerplate/boilerplate.py don't know about the //go:build syntax then why are we adding it since the constraint is already added through // +build syntax. I believe it is due to when we upgrade the go version.

But I am not able to understand what type of tags are we searching for here. Could you please explain.
And apologies for spending too much time on this.

@yonatankahana
Copy link

yonatankahana commented May 31, 2022

But I am not able to understand what type of tags are we searching for here. Could you please explain.
And apologies for spending too much time on this.

@NikhilSharmaWe
no need to apologize, it's team work :)

the // +build tags are deprecated and replaced by //go:build tags.
the script boilerplate.py verify that every file with code (.go, .py, Makefile, etc...) have proper license header.
to do that, as can be seen in L84 he wants to remove build tags from the top of Go files. to do that, he uses regex pattern that remove any line starts with // +build, but he don't know to remove lines starts with //go:build yet.
changing the regex in L185 from ^(// \+build.*\n)+\n to ^(//(go:| \+)build.*\n)+\n should detect both kinds and the test will pass.
don't forget to edit the comment in L184 as well.

@yonatankahana
Copy link

yonatankahana commented May 31, 2022

@NikhilSharmaWe

ok, next. now he complains about missing python, maybe try to change all occurrences of python to python3? its a guess... but worth the try

@@ -181,8 +181,8 @@ def get_regexs():
# dates can be 2014, 2015, 2016, ..., CURRENT_YEAR, company holder names can be anything
years = range(2014, date.today().year + 1)
regexs["date"] = re.compile( '(%s)' % "|".join(map(lambda l: str(l), years)) )
# strip // +build \n\n build constraints
regexs["go_build_constraints"] = re.compile(r"^(// \+build.*\n)+\n", re.MULTILINE)
# strip (// go:build \n\n) and (// +build \n\n) build constraints

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be no space between // and go:build

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 31, 2022
@yonatankahana
Copy link

@NikhilSharmaWe ok now the actual go tests are failing.

running go test ./controller reproduce it for me on my machine. sounds like a real mistake in the code, looks like #120 (comment) or something similar.

@@ -865,7 +865,7 @@ func (ctrl *ProvisionController) Run(ctx context.Context) {
go ctrl.volumeStore.Run(ctx, DefaultThreadiness)

if ctrl.leaderElection {
rl, err := resourcelock.New("endpoints",
rl, err := resourcelock.New("endpointsleases",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use the constant resourcelock.EndpointsLeasesResourceLock here

@yonatankahana
Copy link

@NikhilSharmaWe

try to change controller.go L872 to:

ctrl.client.CoordinationV1(),

Copy link

@yonatankahana yonatankahana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can confirm that i've checked it with nfs-subdir-external-provisioner and it looks good.

@wongma7, @jsafrane - can you take a look at it?

@wongma7
Copy link
Contributor

wongma7 commented May 31, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 31, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: NikhilSharmaWe, wongma7, yonatankahana

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 31, 2022
@k8s-ci-robot k8s-ci-robot merged commit a4ce2c1 into kubernetes-sigs:master May 31, 2022
@humblec
Copy link
Contributor

humblec commented Jun 1, 2022

@yonatankahana++ for helping and getting this IN!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate leader election to use Lease API
8 participants