Improve readiness probe for Stowaway #221

Schille · 2022-10-25T11:42:12Z

What is the new feature about?

Currently, Stowaway enters the ready state almost immediately, as there is no readiness probe implemented at all. Yet, the Operator depends on this state in order to perform activities, such as retrieving the peer secrets and storing them in Kubernetes secrets. There is a race condition: Kubernetes declares the Stowaway Pod ready, but the peer secrets are not yet generated. This leads to errors in the Operator log.
Why is this not a big deal at the moment? Since the Operator retries extracting the peer's secret, it eventually succeeds. However, this can easily be improved by introducing a meaningful readiness probe for the Stowaway Pod.

Why would such a feature be important to you?

Prevent error logs in the Gefyra Operator from showing up. They are misleading and unnecessary.

Anything else we need to know?

The Stowaway Kubernetes Deployment is created in gefyra/operator/gefyra/resources/deployments.py (lines 16 to 92 at commit 81d5583):

    
    def create_stowaway_deployment() -> k8s.client.V1Deployment:
        container = k8s.client.V1Container(
            name="stowaway",
            image=f"{configuration.STOWAWAY_IMAGE}:{configuration.STOWAWAY_TAG}",
            image_pull_policy=configuration.STOWAWAY_IMAGE_PULLPOLICY,
            # Wireguard default port 51820 will be mapped by the nodeport service
            ports=[k8s.client.V1ContainerPort(container_port=51820, protocol="UDP")],
            resources=k8s.client.V1ResourceRequirements(
                requests={"cpu": "0.1", "memory": "100Mi"},
                limits={"cpu": "0.75", "memory": "500Mi"},
            ),
            env=[
                k8s.client.V1EnvVar(name="PEERS", value="1"),
                k8s.client.V1EnvVar(
                    name="SERVERPORT", value=str(configuration.WIREGUARD_EXT_PORT)
                ),
                k8s.client.V1EnvVar(name="PUID", value=configuration.STOWAWAY_PUID),
                k8s.client.V1EnvVar(name="PGID", value=configuration.STOWAWAY_PGID),
                k8s.client.V1EnvVar(name="PEERDNS", value=configuration.STOWAWAY_PEER_DNS),
                k8s.client.V1EnvVar(
                    name="INTERNAL_SUBNET", value=configuration.STOWAWAY_INTERNAL_SUBNET
                ),
                k8s.client.V1EnvVar(
                    name="SERVER_ALLOWEDIPS_PEER_1", value=configuration.GEFYRA_PEER_SUBNET
                ),
            ],
            security_context=k8s.client.V1SecurityContext(
                privileged=True,
                capabilities=k8s.client.V1Capabilities(add=["NET_ADMIN", "SYS_MODULE"]),
            ),
            volume_mounts=[
                k8s.client.V1VolumeMount(
                    name="proxyroutes", mount_path="/stowaway/proxyroutes"
                ),
                k8s.client.V1VolumeMount(name="host-libs", mount_path="/lib/modules"),
            ],
        )
        template = k8s.client.V1PodTemplateSpec(
            metadata=k8s.client.V1ObjectMeta(labels={"app": "stowaway"}),
            spec=k8s.client.V1PodSpec(
                service_account_name="gefyra-stowaway",
                containers=[container],
                volumes=[
                    k8s.client.V1Volume(
                        name="proxyroutes",
                        config_map=k8s.client.V1ConfigMapVolumeSource(
                            name=configuration.STOWAWAY_PROXYROUTE_CONFIGMAPNAME
                        ),
                    ),
                    k8s.client.V1Volume(
                        name="host-libs",
                        host_path=k8s.client.V1HostPathVolumeSource(
                            path="/lib/modules", type="Directory"
                        ),
                    ),
                ],
            ),
        )
        spec = k8s.client.V1DeploymentSpec(
            replicas=1,
            template=template,
            selector={"matchLabels": {"app": "stowaway"}},
        )
        deployment = k8s.client.V1Deployment(
            api_version="apps/v1",
            kind="Deployment",
            metadata=k8s.client.V1ObjectMeta(
                name="gefyra-stowaway", namespace=configuration.NAMESPACE
            ),
            spec=spec,
        )
        return deployment

We could implement a readiness probe like so:

    [...]
    exec:
      command:
        - cat
        - /config/peer1/peer1.conf
    [...]

If that file exists, Stowaway is ready and the Operator can extract it.
If Stowaway is still booting, the file does not exist yet, so cat exits with a non-zero status, the probe fails, and the Pod is not marked ready. In this case, the Operator simply waits a little longer.
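
A minimal sketch of the mechanism the proposal relies on: the kubelet treats an exec probe as successful only when the command exits with status 0, and cat exits non-zero when the file is missing. The snippet mimics that rule locally with subprocess, using a temporary directory in place of /config. exec_probe_passes is a hypothetical helper, and the timing values in READINESS_PROBE are illustrative assumptions, not the values merged in #235. With the Kubernetes Python client used in create_stowaway_deployment(), the equivalent would be passing readiness_probe=k8s.client.V1Probe(_exec=k8s.client.V1ExecAction(command=[...]), ...) to the V1Container (the client spells the field _exec).

```python
import os
import subprocess
import tempfile
from typing import Sequence

# Path from the proposal above: the first peer's WireGuard configuration.
PEER_CONFIG = "/config/peer1/peer1.conf"

# The proposed probe in Kubernetes API (camelCase) form. The timing values
# are illustrative assumptions, not the values merged in #235.
READINESS_PROBE = {
    "exec": {"command": ["cat", PEER_CONFIG]},
    "initialDelaySeconds": 1,
    "periodSeconds": 1,
}


def exec_probe_passes(command: Sequence[str]) -> bool:
    """Hypothetical helper mimicking the exec-probe rule: exit status 0 means success."""
    result = subprocess.run(list(command), capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Local simulation: a temporary directory stands in for /config.
    with tempfile.TemporaryDirectory() as root:
        conf = os.path.join(root, "peer1", "peer1.conf")
        probe = ["cat", conf]
        print("booting, ready =", exec_probe_passes(probe))  # False: file missing
        os.makedirs(os.path.dirname(conf))
        with open(conf, "w") as fh:
            fh.write("[Interface]\n")
        print("booted, ready =", exec_probe_passes(probe))  # True: file present
```

Because readiness probes never restart the container, a failing probe during boot is harmless: the Pod simply stays out of the ready state until the file appears.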

The Operator waits for Stowaway to become ready in gefyra/operator/gefyra/stowaway.py (lines 17 to 57 at commit 81d5583):

    
    async def check_stowaway_ready(stowaway_deployment: k8s.client.V1Deployment):
        global STOWAWAY_POD
        app = k8s.client.AppsV1Api()
        core_v1_api = k8s.client.CoreV1Api()
        i = 0
        dep = app.read_namespaced_deployment(
            name=stowaway_deployment.metadata.name, namespace=configuration.NAMESPACE
        )
        # a primitive timeout of configuration.STOWAWAY_STARTUP_TIMEOUT in seconds
        while i <= configuration.STOWAWAY_STARTUP_TIMEOUT:
            s = dep.status
            if (
                s.updated_replicas == dep.spec.replicas
                and s.replicas == dep.spec.replicas  # noqa
                and s.available_replicas == dep.spec.replicas  # noqa
                and s.observed_generation >= dep.metadata.generation  # noqa
            ):
                stowaway_pod = core_v1_api.list_namespaced_pod(
                    configuration.NAMESPACE, label_selector="app=stowaway"
                )
                if len(stowaway_pod.items) != 1:
                    logger.warning(
                        f"Stowaway not yet ready, Pods: {len(stowaway_pod.items)} which is != 1"
                    )
                    await sleep(1)
                    continue
                STOWAWAY_POD = stowaway_pod.items[0].metadata.name
                logger.info(f"Stowaway ready: {STOWAWAY_POD}")
                return True
            else:
                logger.info("Waiting for Stowaway to become ready")
                await sleep(1)
            i += 1
            dep = app.read_namespaced_deployment(
                name=stowaway_deployment.metadata.name, namespace=configuration.NAMESPACE
            )
        # reached this in an error case a) timeout (build took too long) or b) build could not be successfully executed
        logger.error("Stowaway error: Stowaway did not become ready")
        return False
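
Once a readiness probe exists, this loop needs no new condition: a Deployment's available_replicas only reaches spec.replicas after its Pod reports ready, so the existing check already waits for peer1.conf. One thing the loop does not bound is the Pod-count branch: its continue skips i += 1, so the timeout never advances while the count differs from 1. The sketch below is a hypothetical restructuring, not Gefyra code, that checks a monotonic deadline on every iteration. DeploymentStatus, rolled_out, wait_until_ready and fake_status are illustrative names, and fetch stands in for the read_namespaced_deployment and list_namespaced_pod calls (those are blocking, so a real async handler might wrap them, e.g. with asyncio.to_thread).

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class DeploymentStatus:
    """Hypothetical stand-in for the fields check_stowaway_ready compares."""

    replicas: int             # status.replicas
    updated_replicas: int     # status.updated_replicas
    available_replicas: int   # status.available_replicas (Pods that are ready)
    observed_generation: int  # status.observed_generation
    desired_replicas: int     # spec.replicas
    generation: int           # metadata.generation
    pod_count: int            # Pods matching app=stowaway


def rolled_out(s: DeploymentStatus) -> bool:
    """The same checks as the quoted loop: rollout complete and exactly one Pod."""
    return (
        s.updated_replicas == s.desired_replicas
        and s.replicas == s.desired_replicas
        and s.available_replicas == s.desired_replicas
        and s.observed_generation >= s.generation
        and s.pod_count == 1
    )


async def wait_until_ready(
    fetch: Callable[[], Awaitable[DeploymentStatus]],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Poll fetch() until rolled_out() holds or timeout seconds pass.

    The deadline is checked on every iteration, so no branch can loop forever.
    """
    deadline = time.monotonic() + timeout
    while True:
        if rolled_out(await fetch()):
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


def fake_status(available: int, pods: int) -> DeploymentStatus:
    # One desired replica whose generation has already been observed.
    return DeploymentStatus(
        replicas=1,
        updated_replicas=1,
        available_replicas=available,
        observed_generation=1,
        desired_replicas=1,
        generation=1,
        pod_count=pods,
    )


if __name__ == "__main__":
    # Simulated rollout: old and new Pod coexist, then one ready Pod remains.
    states = iter([fake_status(0, 2), fake_status(1, 2), fake_status(1, 1)])

    async def fetch() -> DeploymentStatus:
        return next(states)

    print(asyncio.run(wait_until_ready(fetch, timeout=5, interval=0.01)))  # True
```

Injecting fetch keeps the polling logic testable without a cluster; swapping the iteration counter for a deadline is what guarantees every path is bounded by the timeout.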


SteinRobert · 2022-10-25T21:58:13Z

Great ideas! Let's build it!


Schille added the enhancement 🎉 New feature or request label Oct 25, 2022

SteinRobert added a commit that referenced this issue Nov 1, 2022

chore(#221): add readiness probe for stowaway

c6185bb

SteinRobert mentioned this issue Nov 1, 2022

chore(#221): add readiness probe for stowaway #235

Merged

SteinRobert added a commit that referenced this issue Nov 2, 2022

refactor(#221): lower initial_delay_seconds for stowaway readiness

21dabd1

SteinRobert added a commit that referenced this issue Nov 2, 2022

Merge pull request #235 from gefyrahq/#221

f0c9839

chore(#221): add readiness probe for stowaway

SteinRobert closed this as completed in #235 Nov 2, 2022
