Improve readiness probe for Stowaway #221

Closed
Schille opened this issue Oct 25, 2022 · 1 comment · Fixed by #235
Labels
enhancement 🎉 New feature or request

Comments

@Schille
Collaborator

Schille commented Oct 25, 2022

What is the new feature about?

Currently, Stowaway enters the ready state almost immediately because no readiness probe is implemented at all. The Operator, however, depends on this state to perform activities such as retrieving the peer secrets and storing them in Kubernetes secrets. This creates a race condition: Kubernetes declares the Stowaway Pod ready before the peer secrets have been generated, which leads to errors in the Operator log.
Why is this not a big deal at the moment? Because the Operator retries extracting the peer's secret, it eventually succeeds. Still, this can easily be improved by introducing a meaningful readiness probe for the Stowaway Pod.

Why would such a feature be important to you?

Prevent error logs in the Gefyra Operator from showing up. They are misleading and unnecessary.

Anything else we need to know?

The Stowaway Kubernetes Deployment is created here:

def create_stowaway_deployment() -> k8s.client.V1Deployment:
    container = k8s.client.V1Container(
        name="stowaway",
        image=f"{configuration.STOWAWAY_IMAGE}:{configuration.STOWAWAY_TAG}",
        image_pull_policy=configuration.STOWAWAY_IMAGE_PULLPOLICY,
        # Wireguard default port 51820 will be mapped by the nodeport service
        ports=[k8s.client.V1ContainerPort(container_port=51820, protocol="UDP")],
        resources=k8s.client.V1ResourceRequirements(
            requests={"cpu": "0.1", "memory": "100Mi"},
            limits={"cpu": "0.75", "memory": "500Mi"},
        ),
        env=[
            k8s.client.V1EnvVar(name="PEERS", value="1"),
            k8s.client.V1EnvVar(
                name="SERVERPORT", value=str(configuration.WIREGUARD_EXT_PORT)
            ),
            k8s.client.V1EnvVar(name="PUID", value=configuration.STOWAWAY_PUID),
            k8s.client.V1EnvVar(name="PGID", value=configuration.STOWAWAY_PGID),
            k8s.client.V1EnvVar(name="PEERDNS", value=configuration.STOWAWAY_PEER_DNS),
            k8s.client.V1EnvVar(
                name="INTERNAL_SUBNET", value=configuration.STOWAWAY_INTERNAL_SUBNET
            ),
            k8s.client.V1EnvVar(
                name="SERVER_ALLOWEDIPS_PEER_1", value=configuration.GEFYRA_PEER_SUBNET
            ),
        ],
        security_context=k8s.client.V1SecurityContext(
            privileged=True,
            capabilities=k8s.client.V1Capabilities(add=["NET_ADMIN", "SYS_MODULE"]),
        ),
        volume_mounts=[
            k8s.client.V1VolumeMount(
                name="proxyroutes", mount_path="/stowaway/proxyroutes"
            ),
            k8s.client.V1VolumeMount(name="host-libs", mount_path="/lib/modules"),
        ],
    )
    template = k8s.client.V1PodTemplateSpec(
        metadata=k8s.client.V1ObjectMeta(labels={"app": "stowaway"}),
        spec=k8s.client.V1PodSpec(
            service_account_name="gefyra-stowaway",
            containers=[container],
            volumes=[
                k8s.client.V1Volume(
                    name="proxyroutes",
                    config_map=k8s.client.V1ConfigMapVolumeSource(
                        name=configuration.STOWAWAY_PROXYROUTE_CONFIGMAPNAME
                    ),
                ),
                k8s.client.V1Volume(
                    name="host-libs",
                    host_path=k8s.client.V1HostPathVolumeSource(
                        path="/lib/modules", type="Directory"
                    ),
                ),
            ],
        ),
    )
    spec = k8s.client.V1DeploymentSpec(
        replicas=1,
        template=template,
        selector={"matchLabels": {"app": "stowaway"}},
    )
    deployment = k8s.client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=k8s.client.V1ObjectMeta(
            name="gefyra-stowaway", namespace=configuration.NAMESPACE
        ),
        spec=spec,
    )
    return deployment

We could implement a readiness probe like so:

[...]
exec:
  command:
    - cat
    - /config/peer1/peer1.conf
[...]

If that file exists, Stowaway is ready and the Operator can extract it.
While Stowaway is still booting, the file does not exist yet, so the Pod is not reported ready. In that case, the Operator simply waits a little longer.
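
As a rough sketch of what the probe could look like as a manifest fragment, the helper below builds the `exec` readiness probe as a plain dictionary using the Kubernetes API field names. The function name `build_readiness_probe` and the timing values are assumptions made for illustration, not something taken from the Gefyra code base.

```python
from typing import Any, Dict


def build_readiness_probe(
    peer_config_path: str = "/config/peer1/peer1.conf",
    initial_delay_seconds: int = 5,  # assumed value, tune as needed
    period_seconds: int = 2,  # assumed value, tune as needed
) -> Dict[str, Any]:
    """Return a readinessProbe fragment that succeeds once the peer config exists.

    `cat` exits non-zero while the file is missing, so the Pod is only
    reported ready after Stowaway has generated the peer configuration.
    """
    return {
        "exec": {"command": ["cat", peer_config_path]},
        "initialDelaySeconds": initial_delay_seconds,
        "periodSeconds": period_seconds,
    }


probe = build_readiness_probe()
print(probe["exec"]["command"])
```

With the Python client used in `create_stowaway_deployment`, the same settings would presumably be passed to the container through its `readiness_probe` argument as a `V1Probe` object rather than a dictionary.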

async def check_stowaway_ready(stowaway_deployment: k8s.client.V1Deployment):
    global STOWAWAY_POD
    app = k8s.client.AppsV1Api()
    core_v1_api = k8s.client.CoreV1Api()
    i = 0
    dep = app.read_namespaced_deployment(
        name=stowaway_deployment.metadata.name, namespace=configuration.NAMESPACE
    )
    # a primitive timeout of configuration.STOWAWAY_STARTUP_TIMEOUT in seconds
    while i <= configuration.STOWAWAY_STARTUP_TIMEOUT:
        s = dep.status
        if (
            s.updated_replicas == dep.spec.replicas
            and s.replicas == dep.spec.replicas  # noqa
            and s.available_replicas == dep.spec.replicas  # noqa
            and s.observed_generation >= dep.metadata.generation  # noqa
        ):
            stowaway_pod = core_v1_api.list_namespaced_pod(
                configuration.NAMESPACE, label_selector="app=stowaway"
            )
            if len(stowaway_pod.items) != 1:
                logger.warning(
                    f"Stowaway not yet ready, Pods: {len(stowaway_pod.items)} which is != 1"
                )
                await sleep(1)
                continue
            STOWAWAY_POD = stowaway_pod.items[0].metadata.name
            logger.info(f"Stowaway ready: {STOWAWAY_POD}")
            return True
        else:
            logger.info("Waiting for Stowaway to become ready")
            await sleep(1)
        i += 1
        dep = app.read_namespaced_deployment(
            name=stowaway_deployment.metadata.name, namespace=configuration.NAMESPACE
        )
    # reached this in an error case a) timeout (build took too long) or b) build could not be successfully executed
    logger.error("Stowaway error: Stowaway did not become ready")
    return False
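
One detail worth noting in the wait loop: the `continue` in the pod-count branch is reached before `i += 1`, so that branch neither counts toward the timeout nor refreshes the deployment status. A minimal sketch of a polling helper in which every unsuccessful attempt counts is shown below. The name `wait_until_ready` and its parameters are hypothetical and only illustrate the pattern; the actual readiness check would be supplied by the caller.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_ready(
    is_ready: Callable[[], Awaitable[bool]],
    timeout_seconds: int,
    interval_seconds: float = 1.0,
) -> bool:
    """Poll is_ready() until it returns True or the attempt budget is spent."""
    attempts = 0
    # Mirrors the `i <= timeout` condition: up to timeout_seconds + 1 attempts.
    while attempts <= timeout_seconds:
        if await is_ready():
            return True
        # Every unsuccessful attempt counts toward the timeout.
        attempts += 1
        await asyncio.sleep(interval_seconds)
    return False
```

The caller would wrap the deployment and pod checks in a single coroutine, so that a "not ready" result from any of them follows the same counted path.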

@Schille Schille added the enhancement 🎉 New feature or request label Oct 25, 2022
@SteinRobert
Contributor

Great ideas! Let's build it!
