Some antrea-agent pods stuck in termination state for around 15 minutes #625
Comments
It seems the issue was caused by the OVS conntrack state becoming mismatched unexpectedly. I'm able to reproduce the connection issue between metrics-server and kube-apiserver. The steps are:
There should be 1 conntrack record in zone 65520 (in host network):
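A quick way to inspect that record is to list conntrack entries on the host filtered to zone 65520 (a sketch; requires the conntrack-tools package on the node):

```shell
# List conntrack entries in zone 65520, the zone the host-network
# connections are committed to by the OVS pipeline.
conntrack -L -w 65520
```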
It is committed by OpenFlow.
Check the OpenFlow stats; you should find that the packets matched:
Check conntrack; the record is still there but not matched:
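The two checks above can be run as follows (a sketch; the bridge name br-int is an assumption based on Antrea's default configuration):

```shell
# Dump OpenFlow entries on Antrea's bridge and check the n_packets
# counters to confirm which ct-related flows were hit.
ovs-ofctl dump-flows br-int

# Dump the datapath conntrack table, restricted to zone 65520, to see the
# record that is still present but no longer being matched.
ovs-appctl dpctl/dump-conntrack zone=65520
```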
The issue may not happen on the first attempt; try several times if it doesn't.

Steps to reproduce without antrea

After some experiments, it seems the issue was caused by some packets being forwarded by the default flow after OVS came up and before antrea-agent installed its flows. Subsequent packets of those connections will then always match "+inv+trk", even though the conntrack record is still there. It can be reproduced by the steps below:
We expect connections to be committed to zone 65520 and not dropped.
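For illustration, a commit flow of the kind described above looks roughly like this (the table number and priority are assumptions for the sketch, not values taken from this issue):

```shell
# Commit new, tracked IP connections to conntrack zone 65520 so that reply
# and follow-up packets match "+est" rather than "+inv".
ovs-ofctl add-flow br-int \
  "table=105,priority=200,ip,ct_state=+new+trk,actions=ct(commit,zone=65520),NORMAL"
```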
How to fix

While the issue could be due to some unexpected behavior in OVS, which we should ask the OVS community about, we could improve Antrea to avoid it. BTW, I can reproduce it with OVS userspace 2.13.0 and 2.12.0.
Great finding @tnqn
Do you know why? Is this an issue with OVS? Why does it depend on whether some packets are forwarded using the default Normal flow after OVS comes back up?
@antoninbas I don't know why yet, I'm asking OVS experts whether this is an issue or misconfiguration. Do you think we could use |
@tnqn Do you think we can clear the flag after adding the "connectivity" flows (that should be very fast) and without waiting for NP flows? If yes, then let's make the change.
@antoninbas Yes, I think so, currently we don't wait for NP flows to be installed before enabling forwarding anyway.
But what flows are needed to avoid the conntrack state mismatch? Any flow, except the default flow?
This patch starts ovs-vswitchd with flow-restore-wait set to true and removes the config after restoring necessary flows for the following reasons: 1. It prevents packets from being mishandled by ovs-vswitchd in its default fashion, which could affect existing connections' conntrack state and cause issues like antrea-io#625. 2. It prevents ovs-vswitchd from flushing or expiring previously set datapath flows, so existing connections can achieve 0 downtime during OVS restart. As a result, we remove the config here after restoring necessary flows.
Thanks to Yi-Hung Wei from the OVS community for finding the root cause; quoting his explanation:
@jianjuns according to the explanation above, it's not only the default flow: we must send all packets to conntrack to keep the connection's state tracked correctly. I think using "flow-restore-wait" can fix it, and we should start forwarding at least after restoring the pipeline, but it's also good to wait until Pod flows are restored, since that's very fast. Besides, I found another good reason for using "flow-restore-wait": previously, datapath flows would be flushed once ovs-vswitchd was started, so existing connections, especially cross-node ones, could still have some downtime before antrea-agent restored the flows. With "flow-restore-wait", ovs-vswitchd won't flush or expire previously set datapath flows, so existing connections can achieve real 0 downtime in theory.
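The flow-restore-wait handling described above boils down to two ovs-vsctl calls (a sketch; in Antrea these steps are performed by the agent and its start scripts, not run by hand):

```shell
# Before (re)starting ovs-vswitchd: tell it to neither flush datapath flows
# nor process packets until flows have been restored.
ovs-vsctl --no-wait set Open_vSwitch . other_config:flow-restore-wait="true"

# ... restart ovs-vswitchd and let antrea-agent reinstall its OpenFlow pipeline ...

# Once the necessary flows are restored: clear the flag so forwarding resumes.
ovs-vsctl --no-wait remove Open_vSwitch . other_config flow-restore-wait
```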
Sounds like a good approach (though ideally we should apply all flows in a single bundle after restart).
Actually, I was thinking about this more over the weekend. Should it be considered a security issue that we restore connectivity before we re-install NP flows? Maybe we should switch to a single bundle ASAP, as Jianjun suggested.
@jianjuns @antoninbas I agree that ideally we should enable ovs-vswitchd forwarding after installing all flows, including Pod, Route, and NP flows, especially since, with flow-restore-wait set, established connections can continue to work and new connections won't be mishandled in the default fashion.
BTW, I am now dealing with an issue in #658 where a kind cluster doesn't work with
@tnqn the easiest thing to do for now may be to avoid using |
Agreed, we should fix the traffic drop issue first.
Thanks @alex-vmw for reporting and helping troubleshoot this issue. Quoting @alex-vmw's report, with a few details revised.
Describe the bug
When rolling-updating the antrea-agent DaemonSet, it happened several times (almost every time on one cluster) that some antrea-agent Pods got stuck in the terminating state for around 15 minutes, then recovered automatically. Some observations and analysis below:
Another issue was also hit, where the apiservers became extremely slow to respond for about 15-16 minutes because they were NOT able to connect to the metrics-server Service, with 503 and 504 errors. Here is what we discovered:
master001 - 18:33:35-18:50:12 - 16 min 37 sec
master002 - 18:34:54-18:50:14 - 15 min 20 sec
master003 - 18:33:45-18:50:14 - 16 min 29 sec
Note that:
To Reproduce
Perform a rolling update of the antrea-agent DaemonSet.
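One way to trigger the rolling update (a sketch; it assumes Antrea is deployed in the kube-system namespace and kubectl >= 1.15):

```shell
# Restart all antrea-agent Pods via a rolling update of the DaemonSet.
kubectl -n kube-system rollout restart daemonset/antrea-agent

# Watch the rollout; in the failure mode described here, some Pods stay
# in Terminating for ~15 minutes.
kubectl -n kube-system rollout status daemonset/antrea-agent
```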
Expected
A rolling update of antrea-agent shouldn't take 15 minutes on some nodes and shouldn't affect the connection between the apiserver and metrics-server.
Actual behavior
As described above.
Versions:
Please provide the following information:
- Antrea version: 0.5.1
- Kubernetes version (use `kubectl version`; if your Kubernetes components have different versions, please provide the version for all of them): 1.15.4
- Container runtime: docker
- Linux kernel version on the Kubernetes Nodes (`uname -r`): unknown yet
- Kernel module info (`modinfo openvswitch`) for the Kubernetes Nodes:

Additional context