Fallback wrapper scaler #1872

Closed
mishamo opened this issue Jun 9, 2021 · 32 comments · Fixed by #1910 or kedacore/keda-docs#474

Comments

@mishamo
Contributor

mishamo commented Jun 9, 2021

Proposal

A metrics source can become unavailable, which causes it to be ignored if there are multiple scalers present. The proposal is to add a new scaler that would delegate to a wrapped instance of another scaler. If that delegate fails, it would provide a fallback metric of fallbackReplicas/currentReplicas (a little like the cron scaler).

Scaler Source

A delegate scaler, falling back to a configured replica value

Scaling Mechanics

A proposed configuration could look something like the following:

apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: feed-ingress
  namespace: feed-main-elb
spec:
  scaleTargetRef:
    deploymentName: feed-ingress
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    - type: fallback
      metadata:
        fallbackReplicas: 8
        delegate:
          type: prometheus
          metadata:
            serverAddress: http://prometheus.hpa-test.svc:9090/
            metricName: feed_ingress_ingress_requests
            threshold: '100'
            query: sum(rate(feed_ingress_ingress_requests{namespace="feed-main-elb"}[2m]))

The catch with this is that the current type of TriggerMetadata is map[string]string. Because of this, either that type has to be changed to handle arbitrary object values, or the config has to be flattened. I think the former is preferable, to avoid awkward config for users.
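
For illustration, the two options could look roughly like this. This is a sketch only; neither shape is KEDA's actual API and all names are hypothetical:

// Sketch only: neither of these is KEDA's actual API.

// Option 1: change the trigger metadata type so values can be arbitrary
// nested objects instead of plain strings (a real CRD field would need a
// JSON-capable type such as apiextensions/v1 JSON rather than interface{}).
type TriggerMetadata map[string]interface{} // today: map[string]string

// Option 2: keep map[string]string and flatten the delegate config into
// prefixed keys on the fallback trigger, e.g.
//   metadata:
//     fallbackReplicas: "8"
//     delegateType: "prometheus"
//     delegateServerAddress: "http://prometheus.hpa-test.svc:9090/"
//     delegateQuery: "sum(rate(...))"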

Authentication Source

Delegate

Anything else?

This is a continuation of some thoughts from #965

We are happy to contribute this, but would like some discussion about the approach

@tomkerkhove
Member

So to summarize: if it can't get metrics from Prometheus, you want it to fall back to 8 instances, is that correct?

@mishamo
Contributor Author

mishamo commented Jun 9, 2021

Just to refine that: we would like this trigger/scaler to present as "8 instances" to the HPA. Other scalers would continue to work as normal, and the HPA would just take the maximum of all of them.

@tomkerkhove
Member

tomkerkhove commented Jun 9, 2021

In that case, have you considered changing the configuration the other way around?

Something like this:

triggers:
- type: prometheus
  metadata:
    # Required
    serverAddress: http://<prometheus-host>:9090
    metricName: http_requests_total
    query: sum(rate(http_requests_total{deployment="my-deployment"}[2m])) # Note: query must return a vector/scalar single element response
    threshold: '100'
  fallback:
    # Optional
    replicas: 8  

(forget about the implementation for now)

@mishamo
Contributor Author

mishamo commented Jun 9, 2021

I think that looks good, but maybe we should even flatten it to fallbackReplicas as a single property.

Without forgetting about implementation, I think this also simplifies the approach. A new scaler that acts with a delegate probably still makes sense.

@tomkerkhove
Member

tomkerkhove commented Jun 9, 2021

The reason why I'm not keeping it flat is so that we can add more things later on if we want to. For example, send a notification or something. (thinking out loud)

@mishamo
Contributor Author

mishamo commented Jun 9, 2021

That makes sense. Implementation-wise, we've realised it might be a bit difficult to define the target (GetMetricSpecForScaling) when the delegate's GetMetrics call fails. The cron scaler works off the premise that the underlying metric will always be the average, but we can't assume that for an arbitrary delegate. 🤔

@coderanger
Contributor

This feels like a very overcomplicated solution. Why would we not just let the current scale stay when things are broken? That seems like a safer behavior. Having a pinned fallback like this could get very flappy during partial failure scenarios.

@mishamo
Contributor Author

mishamo commented Jun 9, 2021

I think you would want more safety when your metrics sources are not available. If your application is scaled to its minimum replicas, and is suddenly hit with load while your metrics sources are down, it could easily be overloaded. If you have a safe fallback (it could be your maximum replicas to be extra safe), then you will always be able to handle that load.

@coderanger
Contributor

My worry is that if Prom (or anything else) is overloaded and timing out on ~50% of requests, you could suddenly have a situation where your replica count is rapidly shifting between whatever it's supposed to be at and the fallback number. This would likely put a lot of stress on things. I guess that's fine as long as we recommend people set their HPA cooldown parameters when using this feature? That way it would stay at the higher one until things stabilize (or until the cooldown expires).

@mishamo
Contributor Author

mishamo commented Jun 9, 2021

Yeah, that situation makes sense. But I think the fact that the HPA by default scales up immediately when it needs to, and has a default scale-down stabilisation window of 5 minutes, would alleviate this. Yes, it might be worth documenting that aggressive scale-down configurations should be avoided when using this feature.
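
For reference, a conservative scale-down can already be pinned on the ScaledObject through the HPA behavior passthrough. A sketch, assuming KEDA v2's advanced.horizontalPodAutoscalerConfig field and using illustrative values:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # keep (or extend) the default 5-minute window
          policies:
            - type: Percent
              value: 25                     # remove at most 25% of replicas...
              periodSeconds: 60             # ...per minute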

@tomkerkhove
Member

This feels like a very overcomplicated solution. Why would we not just let the current scale stay when things are broken? That seems like a safer behavior. Having a pinned fallback like this could get very flappy during partial failure scenarios.

I do get your concern, but like @mishamo is saying, I can see value in specifying a fallback. However, it should only kick in after x cycles of checking, not for every failure imo, and it should come with the required bells and whistles around it, such as putting the ScaledObject in a degraded state and emitting Kubernetes events.

@mishamo
Contributor Author

mishamo commented Jun 10, 2021

Yes. I was thinking about this myself and came to the same conclusion; we would only want this to happen after a few (configurable) consecutive failed attempts. This likely complicates the implementation somewhat but is probably the right move.

@zroubalik
Member

zroubalik commented Jun 10, 2021

Why not both? :)

In the fallback section, users could specify either the exact number of replicas or an option to stay on the current replicas number.

I agree with Tom's suggestion on keeping the configuration simple (proposed in this comment: #1872 (comment))

@mishamo
Contributor Author

mishamo commented Jun 10, 2021

In the fallback section, users could specify either the exact number of replicas or an option to stay on the current replicas number.

Isn't this the existing behaviour? If so, that would be the same as not including the fallback section at all.

@tomkerkhove
Member

It will stay on the current replicas, but not one you can define

@zroubalik
Member

In the fallback section, users could specify either the exact number of replicas or an option to stay on the current replicas number.

Isn't this the existing behaviour? If so, that would be the same as not including the fallback section at all.

If there's a problem connecting, the scaler's isActive() returns false and the replicas will be scaled to minReplicaCount.
And this is the problem we are trying to solve here, right? So I thought that, as a fallback, there could be a specific number of replicas (e.g. my Prometheus is down; let's not scale to min, but scale to, for example, 3 replicas to handle average load), or the second option could be to stay on the current replicas.

@zroubalik
Member

It will stay on the current replicas, but not one you can define

why?

@mishamo
Contributor Author

mishamo commented Jun 10, 2021

There are two separate issues:

  • If there's an issue with isActive, then things stay the same unless your minReplicas is set to 0, as per the logic here. This only really applies if all scalers are inactive/broken due to this loop: for i, scaler := range scalers { ... }. I have code to potentially address this issue here: mishamo@8aca785
  • If you have multiple triggers/sources of metrics (e.g. two separate Prometheus instances) and only one is failing, then that failing one will be completely ignored. This issue is about that source: if it is down, then the HPA metric for that source should instruct k8s to scale to a fallback number of replicas. More generally, I think this concerns the call to GetMetrics rather than IsActive, and wrapping that call in some fallback logic (see the sketch below).
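
For illustration, a minimal sketch of what wrapping GetMetrics could look like. The metricsGetter interface here just mirrors the GetMetrics signature; the function and parameter names are hypothetical, not the final implementation:

// Sketch only: substitute a fallback metric value when GetMetrics fails.
package fallback

import (
	"context"

	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/metrics/pkg/apis/external_metrics"
)

// metricsGetter mirrors the GetMetrics part of a KEDA scaler.
type metricsGetter interface {
	GetMetrics(ctx context.Context, metricName string, metricSelector labels.Selector) ([]external_metrics.ExternalMetricValue, error)
}

// getMetricsWithFallback returns the scaler's real metrics, or a substituted
// fallbackValue when the underlying metrics source is unavailable.
func getMetricsWithFallback(ctx context.Context, scaler metricsGetter, metricName string,
	selector labels.Selector, fallbackValue resource.Quantity) ([]external_metrics.ExternalMetricValue, error) {

	metrics, err := scaler.GetMetrics(ctx, metricName, selector)
	if err == nil {
		return metrics, nil // healthy path: report the real metric
	}

	// The metrics source is down: report a value chosen so that the HPA
	// lands on the desired fallback replica count (how that value is
	// derived from the metric target is discussed further down the thread).
	return []external_metrics.ExternalMetricValue{{
		MetricName: metricName,
		Value:      fallbackValue,
		Timestamp:  metav1.Now(),
	}}, nil
}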

@tomkerkhove
Member

It will stay on the current replicas, but not one you can define

why?

That is how it is today, no? You cannot define what to scale to on failure

@mishamo
Contributor Author

mishamo commented Jun 16, 2021

Ok. I think we have an approach that should work and feels acceptable to us. It is as follows:

  • We add an optional fallback config at the same level as triggers:
...
fallback:
  failureThreshold: 3
  fallbackReplicas: 10
triggers:
...
  • We also add an array of nameOfMetric -> FallbackStatus to the ScaledObject Status. Something like:
 status:
   ...
   fallback:
     - some_metric_name:
         numberOfFailures: 2
         status: Pending
  • We house the fallback logic in provider.go, which, for each scaler, would call GetMetrics, update the fallback status for the relevant metric name in the status map, and return a normalised (fallbackReplicas * targetMetricFromSpec) value as the metric if fallback has been triggered (see the sketch after this list).
  • If fallback is defined, it would apply to all triggers within a ScaledObject.
  • This feature is only compatible with metrics whose spec type is AverageValue (all current scalers except cpu/memory, and possibly the external ones).
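
For illustration, a rough sketch of the bookkeeping behind the provider.go bullet, with the normalisation spelled out. All type, field and function names here are placeholders, not the final API:

// Sketch only: per-metric failure tracking and the normalised fallback value.
package fallback

import "k8s.io/apimachinery/pkg/api/resource"

type healthState string

const (
	healthHappy   healthState = "Happy"
	healthPending healthState = "Pending"
	healthFailing healthState = "Failing"
)

type fallbackStatus struct {
	NumberOfFailures int32
	Status           healthState
}

type fallbackConfig struct {
	FailureThreshold int32
	Replicas         int64
}

// recordResult updates the status entry for one metric after a GetMetrics
// attempt and reports whether the fallback value should be used for it.
func recordResult(statuses map[string]*fallbackStatus, metricName string, cfg fallbackConfig, err error) bool {
	s, ok := statuses[metricName]
	if !ok {
		s = &fallbackStatus{}
		statuses[metricName] = s
	}
	if err == nil {
		s.NumberOfFailures, s.Status = 0, healthHappy
		return false
	}
	s.NumberOfFailures++
	if s.NumberOfFailures >= cfg.FailureThreshold {
		s.Status = healthFailing
		return true
	}
	s.Status = healthPending
	return false
}

// normalisedFallbackValue is the metric value the provider would report while
// failing over: with an AverageValue target the HPA computes
// replicas = reportedValue / target, so replicas: 10 with target 100 => report 1000.
func normalisedFallbackValue(cfg fallbackConfig, targetAverageValue int64) resource.Quantity {
	return *resource.NewQuantity(cfg.Replicas*targetAverageValue, resource.DecimalSI)
}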

@tomkerkhove
Member

Ok. I think we have an approach that should work and feels acceptable to us. It is as follows:

  • We add an optional fallback config at the same level as triggers:
...
fallback:
  failureThreshold: 3
  fallbackReplicas: 10
triggers:
...

I would propose this instead: (nit)

...
fallback:
  failureThreshold: 3
-  fallbackReplicas: 10
+  replicas: 10
triggers:
...

In terms of the status, does it have to fall under the fallback category? Imagine I'm not using it; I would still be interested to see the # of failures in the status field.

@mishamo
Contributor Author

mishamo commented Jun 17, 2021

No issue with the change of fallbackReplicas -> replicas.

   fallback:
     - some_metric_name:
         numberOfFailures: 2
         status: Pending

The only limitation with moving the map out of fallback is the status field we added. The current idea is that there would be three states: Happy, Pending and Failing, where Pending would be in place until the failureThreshold is reached. If the user isn't using the fallback feature, then no failureThreshold would be defined. I suppose it would be possible to either remove the Pending state completely or simply go straight to Failing when no fallback config is defined; I'm reasonably happy with either approach. Also, if it doesn't sit under fallback, it would still have to sit under some name, as the data structure is a map; possibly scalers?

@tomkerkhove
Member

What if we call it health, because it basically comes down to that?

What do you think @zroubalik ?

@zroubalik
Member

zroubalik commented Jun 21, 2021

Sorry for the long response time, I was overwhelmed with other stuff.

Pardon my ignorance, I am still trying to understand the specific ask: what is the scenario you'd like to cover? Do you want to define the fallback per ScaledObject or per scaler (trigger)?

Let's say we have these triggers in a ScaledObject with some metrics:

  scaler1: 10
  scaler2: 5 
  scaler3: failure 

Possible use cases:

  1. You can have one trigger or multiple triggers, and if one of the external sources is failing or unavailable, then you'd like to set some fallback metric number for that particular scaler?
    In the case mentioned above, the HPA will calculate the desired replica count based on the numbers from each scaler (as it would in a normal situation) and it will use a fallback number for the failing scaler3, e.g. 6. So in this case the greatest number, scaler1 (10 vs 5 vs 6), will win.

  2. Or do you want to convince the HPA to use the fallback as the desired number for the whole ScaledObject if there is a problem in any scaler, ignoring the other functional scalers? So if the fallback is 6, then this will be used by the HPA and not the greatest number, scaler1?

  3. Or something else?

@mishamo
Contributor Author

mishamo commented Jun 21, 2021

Hi @zroubalik, the original idea was to define the fallback per scaler (trigger), but as this was more complicated to implement, we decided to shift the fallback definition to the ScaledObject (it is still a fallback per scaler; it's just that the same fallback would be used for all scalers on that object).

The use case described in 1. looks correct to me.

@zroubalik
Member

@mishamo okay. Sounds good

@zroubalik
Member

How do you plan to count the number of failures? For example, if you have 3 scalers and the first one starts to fail, and then another scaler starts to fail as well?

@mishamo
Contributor Author

mishamo commented Jun 21, 2021

The plan is to maintain a map of external_metric_name -> fallback_status in the Status field in k8s. Then the scalers can be looked up by their respective metric names, and the number of failures can be read, bumped up, reset, etc.

@zroubalik
Member

Okay, so you don't plan to sum the failures from all scalers, ack.

Anything else you'd like to discuss? The proposal looks good to me 👍

@mishamo
Contributor Author

mishamo commented Jun 21, 2021

Nope, I think we're all good for now. We have some code that should work on https://github.com/mishamo/keda/tree/fallback-scaler, but there's quite a lot of testing to be done before we create a PR. Just to clarify, there are two pieces of work:

When we come to it, should we PR these as separate PRs or is it OK to PR as one?

EDIT: As I wrote that, it made me think that maybe the defaultReplicas work is redundant and we can apply the same fallback logic to IsActive. We'll investigate this.

@zroubalik