The gcsfuse sidecar container does not exit automatically when all the containers have exited and the Pod restartPolicy is Never #23
Comments
The issue is fixed by the commit 8a1a860. The fix will be included in the next release. |
This is fixed in the CSI driver v0.1.3 release. |
@songjiaxun -- I'm a bit confused by this. In Airflow, for example, it still seems to add a full termination grace period to the end of every KubernetesPodOperator run. My understanding from Kubernetes (based on this):
The termination grace period is the time that processes are allowed in order to exit gracefully. If they don't exit in that time, they're terminated abruptly. If they finish gracefully before then, that's great, too.
My understanding is that the gcsfuse sidecar uses the same parameter but with very different behaviour:
The termination grace period is the time to wait (needed or not) just in case gcsfuse hasn't synced everything.
It looks like GCS Fuse doesn't handle SIGTERM, but it does handle SIGINT and then does the right thing (unmounts and exits). (See this and this) Based on the above, I think ideal behaviour would be:
Given the code here, I think all that is required is delivering an interrupt signal to all of the processes before doing the sleep. The sidecar should then exit normally once all child processes are done, I believe. |
Thanks, @mescanne, for the question. So there are two scenarios actually.
First scenario
If the Pod is a Job Pod or the
As you can see, in this scenario, it does not respect the
Second scenario
For other workloads, the Pods are supposed to run forever. In this case, if the Pod crashes, it follows the doc Kubernetes best practices: terminating with grace. Specifically,
I hope the explanation is helpful. |
Thanks @songjiaxun! Waiting 30 seconds at the end of a Job or Pod (with restartPolicy Never) is quite a big pain point, as it imposes a severe delay on every Airflow job/task run.
GKE 1.28 availability
One note -- the gcsfuse process itself doesn't seem to honour SIGTERM correctly, only SIGINT, so it will need to be modified as well. If it's a few months or less then it's better to wait, but otherwise I have a proposal for shortening the 30 second wait in the first scenario using the exit file.
Proposal
Why?
Code
|
Hi @mescanne , I created this commit to make the sidecar container respect the Pod terminationGracePeriod in the first scenario. In your use case, you will need to specify a small terminationGracePeriodSeconds value on your Pod, e.g. 5, or even 0. Otherwise, the default value is 30 sec. I hope this improvement will be helpful. This will be included in the next release. Also, the sidecar container feature is going to be promoted to beta in Kubernetes 1.29, which will be the k8s version with the sidecar container feature enabled on GKE. |
Thanks, that is very helpful. In this case we can decrease the graceful termination. Thanks! |
In some use cases where the Pod restartPolicy is Never, users expect the sidecar container to exit automatically once all the other containers have exited.
For example, in Airflow, the Task will never complete and the DAG will be blocked if the sidecar container does not exit automatically.
Note that we do support sidecar container auto-exit in Jobs -- the sidecar container will exit automatically when all the containers have exited in a Job Pod.