Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jobs don't scale properly based on target value/replica count #801

Closed
matt1484 opened this issue Apr 30, 2020 · 6 comments
Closed

Jobs don't scale properly based on target value/replica count #801

matt1484 opened this issue Apr 30, 2020 · 6 comments
Labels
bug Something isn't working
Milestone

Comments

@matt1484
Copy link

matt1484 commented Apr 30, 2020

So Jobs don't appear to scale at all according to maxReplicaCount and TargetAverageValue

I have this configuration:

minReplicaCount: 0
maxReplicaCount: 3
triggers:
- type: redis
  metadata:
    listLength: '1'

Which produced these logs (with irrelevant fields removed):

{"ts":1588266503.6063826,"logger":"scalehandler","msg":"Scaler max value","MaxValue":1}
{"ts":1588266503.620013,"logger":"scalehandler","msg":"QueueLength Metric value","queueLength":10}
{"ts":1588266503.6202614,"logger":"scalehandler","msg":"Scaler is active"}
{"ts":1588266503.6204321,"logger":"scalehandler","msg":"Scaling Jobs","Number of running Jobs ":0}
{"ts":1588266503.6204605,"logger":"scalehandler","msg":"Scaling Jobs"}
{"ts":1588266503.646084,"logger":"scalehandler","msg":"Creating jobs","Effective number of max jobs":1}
{"ts":1588266503.6463103,"logger":"scalehandler","msg":"Creating jobs","Number of jobs":1}
{"ts":1588266503.6463375,"logger":"scalehandler","msg":"Created jobs","Number of jobs":1}

Expected Behavior

Since the QueueLength is 10 and maxReplicaCount is 3, one would expect that 3 jobs would be created.

Actual Behavior

Only 1 job is created (since it thinks that MaxValue is 1)

Since there isn't a lot of docs on jobs, I did some digging in code which led me to these lines of code

var metricValue int64
for _, metric := range metricSpecs {
metricValue, _ = metric.External.TargetAverageValue.AsInt64()
maxValue += metricValue
}
scalerLogger.Info("Scaler max value", "MaxValue", maxValue)

which are used to compute maxValue which is in turn used here

h.scaleJobs(scaledObject, isScaledObjectActive, queueLength, maxValue)

which leads to

func (h *ScaleHandler) scaleJobs(scaledObject *kedav1alpha1.ScaledObject, isActive bool, scaleTo int64, maxScale int64) {
runningJobCount := h.getRunningJobCount(scaledObject, maxScale)
h.logger.Info("Scaling Jobs", "Number of running Jobs ", runningJobCount)
var effectiveMaxScale int64
effectiveMaxScale = maxScale - runningJobCount
if effectiveMaxScale < 0 {
effectiveMaxScale = 0
}

and effectiveMaxScale is used to figure out how many jobs to create.

Now lets take a look at how this actually get computed

effectiveMaxScale = maxScale - runningJobCount

In this scenario I don't have any jobs running (as per the logs), so effectiveMaxScale is essentially:

effectiveMaxScale = maxScale

since runningJobCount is 0

given that the above call is formatted as

scaleJobs(scaledObject, isScaledObjectActive, queueLength, maxValue) 

effectiveMaxScale is just maxValue which we know to be the sum of all the metrics' TargetAverageValue. Since I only have one scaler, the number of jobs created in this scenario will ALWAYS be set to the provided listLength which in this case is just 1.

Now that would be no issue except the docs say this about TargetAverageValue:

TargetAverageValue: the value of the metric for which we require one pod to handle. e.g. 
if we are have a scaler based on the length of a message queue, and we specificy 10 for TargetAverageValue, we are saying that each pod will handle 10 messages. 
So if the length of the queue becomes 30, we expect that we have 3 pods in our cluster.

which means that if I have a queueLength of 10 (like the logs indicate), and a TargetAverageValue of 1, I would expect 3 jobs to be run (since my maxReplicaCount is 3) yet only 1 does.

In order to immediately combat this I attempted to bump up my TargetAverageValue to 3 and tested with a queue length of 1, but I ended up with this:

{"ts":1588275347.789624,"msg":"Active trigger","isTriggerActive":true}
{"ts":1588275347.7896824,"msg":"Scaler max value","MaxValue":3}
{"ts":1588275347.7989686,"msg":"QueueLength Metric value","queueLength":1}
{"ts":1588275347.7993653,"msg":"Scaler is active",}
{"ts":1588275347.8002813,"msg":"Scaling Jobs","Number of running Jobs ":0}
{"ts":1588275347.800373,"msg":"Scaling Jobs",}
{"ts":1588275347.8305228,"msg":"Creating jobs","Effective number of max jobs":3}
{"ts":1588275347.8305883,"msg":"Creating jobs","Number of jobs":1}
{"ts":1588275347.848522,"msg":"Created jobs","Number of jobs":1}
{"ts":1588275352.8490992,"msg":"Scalers count","Count":1}
{"ts":1588275352.8663177,"msg":"Active trigger","isTriggerActive":true}
{"ts":1588275352.8665454,"msg":"Scaler max value","MaxValue":3}
{"ts":1588275352.8865478,"msg":"QueueLength Metric value","queueLength":1}
{"ts":1588275352.8866546,"msg":"Scaler is active",}
{"ts":1588275352.8888965,"msg":"Scaling Jobs","Number of running Jobs ":1}
{"ts":1588275352.8889666,"msg":"Scaling Jobs",}
{"ts":1588275352.9546463,"msg":"Creating jobs","Effective number of max jobs":2}
{"ts":1588275352.9547422,"msg":"Creating jobs","Number of jobs":1}
{"ts":1588275352.9817986,"msg":"Created jobs","Number of jobs":1}
{"ts":1588275357.9840724,"msg":"Scalers count","Count":1}
{"ts":1588275357.9937315,"msg":"Active trigger","isTriggerActive":false}
{"ts":1588275357.9942245,"msg":"Scaler max value","MaxValue":3}
{"ts":1588275357.9999142,"msg":"QueueLength Metric value","queueLength":0}
{"ts":1588275358.002064,"msg":"Scaling Jobs","Number of running Jobs ":2}

Which instead incorrectly creates a job every time the trigger passes even though it shouldn.t need to. Essentially Jobs never scale appropriately, they end up being set to the TargetAverageValue and ignore maxReplicaCount

Realistically, you just need to calculate the maxValue as a quotient of maxReplicaScale and TargetAverageValue somehow.

Steps to Reproduce the Problem

Follow above configuration and push some things into the redis list

Specifications

  • KEDA Version: v1.4.1
  • Platform & Version: MacOS 10.14.6
  • Kubernetes Version: v1.15.5
  • Scaler(s): redis (but realistically all of them)
@matt1484 matt1484 added the bug Something isn't working label Apr 30, 2020
@dangelm
Copy link

dangelm commented May 3, 2020

if i am not mistaken this is same issue that i have experienced:
#749

@dazigna
Copy link

dazigna commented May 14, 2020

Any progress here ? I am experiencing the same behavior.

@tomkerkhove
Copy link
Member

AFAIK we haven't made progress on this as we're working hard on 2.0. @zroubalik I presume this will be different in 2.0 now that they will be split, right?

@zroubalik
Copy link
Member

@tomkerkhove yes, but we should keep this issue open to track this particular problem, to be sure that it gets fixed in the v2 implementation.

@heitorrbarros
Copy link

UP 👍

@zroubalik
Copy link
Member

Fixed in v2 in #951

SpiritZhou pushed a commit to SpiritZhou/keda that referenced this issue Jul 18, 2023
…e#801)

Co-authored-by: Tom Kerkhove <kerkhove.tom@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants