
KEDA Kubernetes integration #56

Closed
belemaire opened this issue Jul 22, 2020 · 2 comments

@belemaire
Member

To get maximum scale at minimum cost, the idea is to use the KEDA Kubernetes autoscaler for LiveBundle.

The LiveBundle GitHub application (composed of three services: producer/consumer/queue) needs a node with a good number of vCPUs and RAM, as it is in charge of pulling the PR from the repository and running react-native bundle multiple times on it. Here I am specifically talking about the consumer service; the producer and queue services are very lightweight and we don't need to scale them in any way.

The "problem" here regarding the consumer, is that if we allocate a single node for it, the node will just stand idle most of the time, waiting for PRs to crunch. On the other hand, because we don't want to process multiple PRs in // on the same node (to keep processing time as deterministic as possible), if for some reason one or more PRs are opened while a PR is being processed, then they will be queued for processing by the consumer node, and the client will potentially wait for a while to get a QRCode back (also breaking the guarantee on deterministic processing time). What can be done to mitigate the latter, is to have a certain number of consumer replicas nodes (5 for example), running. But this makes the former problem even worse as we now have to pay the cost of 5 running nodes running idle most of the time. For reference a 4VCPU/8GB ram node in a popular public cloud provider, cost around 40$/month to run 24/24 7/7. This might just be too expensive for a lot of potential public users of LiveBundle (and this is only the cost for one, multiply by 5 if you want 5 of them in case you have high PR throughput).

Unfortunately, Kubernetes out of the box only offers a horizontal pod autoscaler, which can scale the number of pods (or nodes, taking a quick shortcut) based on metrics such as CPU utilization. This is not really useful in our case.

KEDA would solve both of these problems, as it is an event-driven autoscaler that can scale the number of pods based on event-driven metrics, such as the number of messages in a queue. It can also scale the number of pods all the way back down to zero. What this would allow, from a high-level perspective, is to keep the number of consumer nodes at zero (no cost); whenever a message is sent to the queue (a PR to process), KEDA would spin up a node to do the processing, and scale back down when done. If 5 PRs are opened simultaneously, KEDA would spin up 5 different nodes to process the messages (based on configuration). So in theory, costly consumer nodes would only run when there is a PR to process, and would disappear afterward. Given that public cloud providers bill nodes by the minute, this would drastically scale costs down (addressing the first problem), while guaranteeing consistent PR processing time (addressing the second problem).
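To make the flow concrete, here is a minimal sketch of what a scale-to-zero friendly consumer could look like: it pulls a single PR message from the queue, runs the bundling step, and exits, so that the pod KEDA started can complete and the replica count can drop back to zero. This is only an illustration, assuming a RabbitMQ-style queue accessed through amqplib and hypothetical `QUEUE_URL`/`QUEUE_NAME` settings and message shape; the actual LiveBundle queue service and bundling commands may differ.

```ts
// Minimal consumer sketch (assumes a RabbitMQ-style queue via amqplib).
// One message == one PR to bundle; the process exits when done so the
// KEDA-managed pod can complete and the replica count can return to zero.
import * as amqp from "amqplib";
import { execSync } from "child_process";

const QUEUE_URL = process.env.QUEUE_URL ?? "amqp://localhost";
const QUEUE_NAME = process.env.QUEUE_NAME ?? "livebundle-prs"; // hypothetical queue name

async function main(): Promise<void> {
  const conn = await amqp.connect(QUEUE_URL);
  const ch = await conn.createChannel();
  await ch.assertQueue(QUEUE_NAME, { durable: true });

  // Pull a single message instead of subscribing, so each pod handles exactly one PR.
  const msg = await ch.get(QUEUE_NAME, { noAck: false });
  if (msg) {
    const { repoUrl, prBranch } = JSON.parse(msg.content.toString());

    // Simplified bundling step; the real consumer would check out the PR and run
    // `react-native bundle` once per platform before uploading the result.
    execSync(`git clone --depth 1 --branch ${prBranch} ${repoUrl} workdir`, { stdio: "inherit" });
    execSync("yarn install", { cwd: "workdir", stdio: "inherit" });
    execSync(
      "npx react-native bundle --platform android --dev false " +
        "--entry-file index.js --bundle-output /tmp/index.android.bundle",
      { cwd: "workdir", stdio: "inherit" }
    );

    ch.ack(msg);
  }

  await ch.close();
  await conn.close();
  // Exiting here lets the pod complete, which is what allows KEDA to scale back to zero.
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```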

@belemaire
Member Author

Unfortunately, after much experimentation with KEDA ScaledDeployments and ScaledJobs, KEDA is not in a state that can fulfill our needs at this point.

ScaledDeployments will not really help us here, since they scale a long-running Deployment rather than spawning one job per queued message; what we need is ScaledJobs.
But this feature currently suffers from big issues that make it unusable for now. It boils down to the two following issues, which I ran into myself with v1.5 during my exploration:
kedacore/keda#801
kedacore/keda#829

It seems like the KEDA team is now focusing solely on the v2 release and won't really address issues in v1.5.
Hopefully v2 will address all these issues and make it usable for our needs; we just need to be a little patient ;)

@belemaire
Member Author

Closing this one as we moved away from Kubernetes, using a new architecture based solely on cloud storage.
