Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NodeLifecycleController treats node lease renewal as a heartbeat signal #69241

Merged
merged 1 commit into from
Oct 12, 2018

Conversation

wangzhen127
Copy link
Member

@wangzhen127 wangzhen127 commented Sep 28, 2018

What this PR does / why we need it:
This PR makes NodeLifecycleController treat node lease renewal as a heartbeat signal, in addition to NodeStatus update. This is part of KEP-0009, feature #589 and issue #14733.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

NodeLifecycleController: Now node lease renewal is treated as the heartbeat signal from the node, in addition to NodeStatus Update.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 28, 2018
@wangzhen127
Copy link
Member Author

Still work in progress. Need to add tests. Will notify once it's done. For visibility:
/cc @wojtek-t @wasylkowski-a

@k8s-ci-robot
Copy link
Contributor

@wangzhen127: GitHub didn't allow me to request PR reviews from the following users: wasylkowski-a.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Still work in progress. Need to add tests. Will notify once it's done. For visibility:
/cc @wojtek-t @wasylkowski-a

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123
Copy link
Member

/kind feature

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Sep 30, 2018
@wojtek-t
Copy link
Member

wojtek-t commented Oct 1, 2018

/assign

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable, but it requires tests.
Also, as I mentioned in one comment, if you could move renames to a separate PR, this PR would be so much easier to review (and renames are no-controversial at all and doesn't require tests, so we could merge them immediately).

@@ -132,10 +137,11 @@ const (
retrySleepTime = 20 * time.Millisecond
)

type nodeStatusData struct {
type nodeHealthData struct {
probeTimestamp metav1.Time
readyTransitionTimestamp metav1.Time
status v1.NodeStatus
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're touching this code anyway - can we change this to pointer? [potentially in a separate PR if that's easier - you can do all the renames there too]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading this PR more, if you could move just renames to a separate PR, that would make this PR so much easier.
Thanks!

nodeStatusMap map[string]nodeStatusData
// Per Node map storing last observed healthy data, including Status and Lease,
// together with a local time when it was observed.
nodeHealthMap map[string]nodeHealthData
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're touching it anyway - this should probably be map[string]*nodeHealthData.

Copy link
Member Author

@wangzhen127 wangzhen127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #69305 for the PR doing the renames. I will rebase and add tests after that PR gets merged.

@jennybuckley
Copy link

/cc @cheftako

@wangzhen127
Copy link
Member Author

/sig node
/cc @yujuhong

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Oct 1, 2018
@wojtek-t
Copy link
Member

wojtek-t commented Oct 4, 2018

@wangzhen127 - this is ready for rebase I think.

@wangzhen127
Copy link
Member Author

Yes. I am adding tests. Will push another version soon.

@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 4, 2018
@wangzhen127
Copy link
Member Author

Ping @gmarek :)

@wojtek-t
Copy link
Member

wojtek-t commented Oct 9, 2018

Ping @gmarek :)

I talked offline with @gmarek. He won't have time to look into that in the next 2 weeks or so. But he is fine with merging without his review and he will take a look some time towards end of October and if needed we will apply his comments then.

@yujuhong - will you be able to take a look in that case (in addition to my review)?

Copy link
Contributor

@yujuhong yujuhong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I have some questions about the initialization of the nodeHealth map and various conditions, but most probably because I'm not that familiar with the code

if utilfeature.DefaultFeatureGate.Enabled(features.NodeLease) {
// Always update the probe time if node lease is renewed.
observedLease, _ = nc.leaseLister.Leases(v1.NamespaceNodeLease).Get(node.Name)
if observedLease != nil && (!found || savedLease == nil || savedLease.Spec.RenewTime.Before(observedLease.Spec.RenewTime)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Is it necessary to check found? It was set 50 lines ago and savedLease == nil seems to be sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We do not need to check found. Updated.

probeTimestamp: node.CreationTimestamp,
readyTransitionTimestamp: node.CreationTimestamp,
}
if _, found := nc.nodeHealthMap[node.Name]; found {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we expect the node to be in the nodeHealthMap but does not have the ready condition set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question is whether it's possible that nc.nodeHealthMap[node.Name] is set to nil?

It's not obvious to me that when setting the value in line 929 below nc.nodeHealthMap[node.Name] = savedNodeHealth, the savedNodeHealth will always be non-nil...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we expect the node to be in the nodeHealthMap but does not have the ready condition set?

Look at this example:

  • time t0: Node is created.
  • time t1: Node renew lease, but NodeStatus does not have ready condition. And found is false. We create and save a fake ready condition (unknown) here. And we also save the lease below.
  • time t2: Still, NodeStatus does not have ready condition. So currentReadyCondition is still nil. We saved the node at t1 with unknown ready condition. So found is true.

Another question is whether it's possible that nc.nodeHealthMap[node.Name] is set to nil?

savedNodeHealth is always not nil at line 929. So once this function runs at least once, nc.nodeHealthMap[node.Name] is always non-nil.

Consider 2 cases:

  • If currentReadyCondition is nil, we either use the existing one or create a fake one. Then savedNodeHealth on line 855 is not nil.
  • If currentReadyCondition is not nil, and if at this point we do not have any saved copy, i.e., nc.nodeHealthMap[node.Name] is nil for now. Then at line 855, we check it again, we will have found to be false, so we create a savedNodeHealth at line 879, and then assign it to nc.nodeHealthMap[node.Name] at line 929.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining. I think the code here is not very easy to follow (not this PR's fault).

One case that seems a bit strange to me is that even if the node doesn't have status, as long as the it continues renewing the lease, the controller will take no action. I think this is by design, but still feels strange :\

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of heartbeat, I think renewing the lease satisfy the need of making sure node is alive.

@wojtek-t, WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the situation where a node does not have status, but keeps renewing lease depends on the value of node-status-update-period.

So what we have been thinking is to switch:

  • set lease refresh period (however we call it) to how frequently we update node status now
  • decrease frequency of node status updates to something like 1 minute
  • but still compute node statuses with the same frequency (equal of lease refresh period)

So if we make leases and node-statuses independent, then yes - there may be small period when there is no node status (this can pretty much happen only after node registration though, because after that it should always be there).
But if we would make them somehow correlated (I'm not saying we should do that - I'm saying it's theoretically possible), then it would only happen on races.

But yeah - I agree the latter probably doesn't make much sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yujuhong, does this small period that there is no node status sound ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more interested in the pathological case where kubelet could not post the status ever, while it could still sending heartbeats. I think this theoretically could happen if we decouple heartbeat and status updates in kubelet (which is probably the right thing to do to not add more complexity and dependency). I don't think we need to deal with the pathological case now, but in case we want to put a safeguard in the future, let's settle on leaving a comment saying something like If kubelet never posted the node status, but continues renewing the heartbeat leases, the node controller will assume the node is healthy and take no action.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added the comments at line 923 now. PTAL

Copy link
Member Author

@wangzhen127 wangzhen127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. PTAL

probeTimestamp: node.CreationTimestamp,
readyTransitionTimestamp: node.CreationTimestamp,
}
if _, found := nc.nodeHealthMap[node.Name]; found {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we expect the node to be in the nodeHealthMap but does not have the ready condition set?

Look at this example:

  • time t0: Node is created.
  • time t1: Node renew lease, but NodeStatus does not have ready condition. And found is false. We create and save a fake ready condition (unknown) here. And we also save the lease below.
  • time t2: Still, NodeStatus does not have ready condition. So currentReadyCondition is still nil. We saved the node at t1 with unknown ready condition. So found is true.

Another question is whether it's possible that nc.nodeHealthMap[node.Name] is set to nil?

savedNodeHealth is always not nil at line 929. So once this function runs at least once, nc.nodeHealthMap[node.Name] is always non-nil.

Consider 2 cases:

  • If currentReadyCondition is nil, we either use the existing one or create a fake one. Then savedNodeHealth on line 855 is not nil.
  • If currentReadyCondition is not nil, and if at this point we do not have any saved copy, i.e., nc.nodeHealthMap[node.Name] is nil for now. Then at line 855, we check it again, we will have found to be false, so we create a savedNodeHealth at line 879, and then assign it to nc.nodeHealthMap[node.Name] at line 929.

if utilfeature.DefaultFeatureGate.Enabled(features.NodeLease) {
// Always update the probe time if node lease is renewed.
observedLease, _ = nc.leaseLister.Leases(v1.NamespaceNodeLease).Get(node.Name)
if observedLease != nil && (!found || savedLease == nil || savedLease.Spec.RenewTime.Before(observedLease.Spec.RenewTime)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We do not need to check found. Updated.

@yujuhong
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2018
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 11, 2018
@wangzhen127
Copy link
Member Author

Thanks @yujuhong!

I just noticed and fixed the formatting issue. @wojtek-t, can you approve?

@wojtek-t
Copy link
Member

I took one more final look and it also looks good to me now.

/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2018
@wojtek-t
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wangzhen127, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 12, 2018
@k8s-ci-robot k8s-ci-robot merged commit 23a8477 into kubernetes:master Oct 12, 2018
@wangzhen127 wangzhen127 deleted the heartbeat branch October 12, 2018 16:48
@gmarek
Copy link
Contributor

gmarek commented Oct 24, 2018

@wangzhen127 @yujuhong @wojtek-t I took a look at this PR and IMO the change is technically correct. What I'm wondering about is the decision to not set the NodeReady condition to true when NC observes a lease. Can someone explain me the reasoning behind this decision? I feel that users might depend on NodeReady=True as a signal to detect when Nodes are up (e.g. kubeUp depended on that at some point).

@wojtek-t
Copy link
Member

What I'm wondering about is the decision to not set the NodeReady condition to true when NC observes a lease. Can someone explain me the reasoning behind this decision? I feel that users might depend on NodeReady=True as a signal to detect when Nodes are up (e.g. kubeUp depended on that at some point).

I think that's a missing thing - if NodeReady != True and NC observed a Lease, I think we should set NodeReady = True. Would that solve all your concerns?

@gmarek
Copy link
Contributor

gmarek commented Oct 24, 2018

Yup.

@wojtek-t
Copy link
Member

@wangzhen127 - will you be able to add this? ^^

@wangzhen127
Copy link
Member Author

Yes, will create another PR for it.

@gmarek
Copy link
Contributor

gmarek commented Oct 25, 2018

@wangzhen127 - thanks!

@wangzhen127
Copy link
Member Author

I thought more on this and also talked to @yujuhong offline. I think we should not let NC change NodeReady status to True if it observed a lease renewal. Because node lease is just a heartbeat signal. It says nothing about whether the node is ready. Let's look at all 4 cases below.

Case 1: When node just starts, there was no status:
To NC, it means: "I see you. I know you are reachable. So you are not unknown. But I don't know if you are ready. I will let you decide later".

Case 2, When node already has status as NodeReady == False
NC just know the node is reachable. NC does not know when node will become ready because kubelet could just keep deciding NodeReady == False if there is some error on the node.

Case 3, When node already has status as NodeReady == Unknown
In this case, if NC starts to get node lease, it may want to change the status. The problem is that NC does not know whether it should change to NodeReady == True or NodeReady == False. On the other hand, every time kubelet checks node status, ReadyCondition setter will try to update the status to either NodeReady == True or NodeReady == False, but not unknown. That will trigger a node status change, and kubelet then should report new NodeStatus to NC immediately.

Case 4, When node already has status as NodeReady == True
There is nothing NC needs to do.

mars1024 pushed a commit to mars1024/kubernetes that referenced this pull request Nov 26, 2019
Make periodic NodeStatus updates cheaper

cherry-pick 
master, create lease api, kubernetes#64246
master, node lifecycle controller, kubernetes#69241
kubelet, kubernetes#66257

See merge request !184866
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants