Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2] Implement the scaledjob controler for v2 #945

Merged
merged 5 commits into from
Jul 27, 2020

Conversation

TsuyoshiUshio
Copy link
Contributor

@TsuyoshiUshio TsuyoshiUshio commented Jul 22, 2020

I open this PR for start conversation. This is the initial implementation for the v2 scaled job controller.
This version is for implementing the v1 logic with v2 structure/design(a.k.a. refactoring). Once this is accepted, I'd like to send
other PRs for bug fixes and new features. I tested with Rabbit MQ scaler. Any feedback is welcome.

I don't write test now since the design review might comes first, then I'll write the test. I also want to check if this change doesn't broke anything.

Change introduced

  • Implement the scale job controller and finalizer with V1 based logic
  • scaleHandler L224 calls Close() method and it cause a problem of missing connection
  • ReconcileScaledJob update
  • Add Local debugging feature for the rabbit-mq scaler

I'd like to discuss when should we call the Close() method. I need to understand life-cycle and behavior more.
If you guys are ok. I'd like to add a sample for getting started for the scaled job for rabbit-mq with documentation.

Checklist

Fixes #
#789

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good start! The scaling logic needs to be improved though, as mentioned for example in these issues (and maybe in others) #636 #801 #525 kedacore/keda-docs#187 #529

The issue described in those is caused by the similar problem. So don't be afraid to modify the logic a lot, modify the ScaleJob type if needed etc. We need to change a way how KEDA resolves number of created jobs during each pollingInterval.

pkg/controller/scaledjob/scaledjob_controller.go Outdated Show resolved Hide resolved
pkg/controller/scaledjob/scaledjob_controller.go Outdated Show resolved Hide resolved
pkg/controller/scaledjob/scaledjob_controller.go Outdated Show resolved Hide resolved
deploy/crds/keda.sh_scaledjobs_crd.yaml Outdated Show resolved Hide resolved
@TsuyoshiUshio
Copy link
Contributor Author

TsuyoshiUshio commented Jul 23, 2020

Hi @zroubalik , @ahmelsayed
I'd like to ask you two things. I'll introduce a change of the scale logic at least to cover #636 and related. (However, on this PR, includes the issue that you mentioned only) then I'll create subsequent PRs.

The PR that is mentioned the issue already has a good solution, however, the naming looks not good for avoiding breaking change for v1. So if there is a case, the metric is different between ScaledObject and ScaledJobs, I'd like to separate the metric interface as the the PR suggested. Since the V2 can introduce breaking change from V1, I'd like to introduce the change for the scaler interface. Do you think it is a good idea? or Any suggestion?

I'm not sure if we need the different metrics between ScaledObject and ScaledJob though.

I'm asking the original author why he introduce the interface change. #639 (comment)

Until I've got the answer from the author, I'll keep the interface as-is and try solving the issue.

type Scaler interface {

	GetMetrics(ctx context.Context, metricName string, metricSelector labels.Selector) ([]external_metrics.ExternalMetricValue, error)

	GetMetricSpecForScaleObject() []v2beta2.MetricSpec

	GetMetricSpecForScaleJob() []v2beta2.MetricSpec

	IsActive(ctx context.Context) (bool, error)

	// Close any resources that need disposing when scaler is no longer used or destroyed
	Close() error
}

@TsuyoshiUshio
Copy link
Contributor Author

I'm learning to keep the current interface. Since I can't imagine any case that we need to use different scale metrics between deployment and jobs. If there is specific case is there, please let me know.

@zroubalik
Copy link
Member

I'm learning to keep the current interface. Since I can't imagine any case that we need to use different scale metrics between deployment and jobs. If there is specific case is there, please let me know.

@TsuyoshiUshio I can't imagine any usecase for adding a separate metrics for Jobs either . So go ahead as you said with the current interface and if there's gonna be a reason to add it, we can do that in a subsequent PRs.

@TsuyoshiUshio
Copy link
Contributor Author

Hi @zroubalik
Thanks! I'll do that.

TsuyoshiUshio and others added 3 commits July 25, 2020 15:35
… queue

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>
@TsuyoshiUshio
Copy link
Contributor Author

TsuyoshiUshio commented Jul 25, 2020

Hi @ahmelsayed @zroubalik
#636 #525
Might be solved since I implement the logic that only count visible queue if it is queue length is less than 32.
Also, I fix the all requests except for the #801 kedacore/keda-docs#187 #529 . I'm start investigate it one by one, however, merge this PR might be good idea since we can start debugging the new scalejob logic and can work parallel.

As you adoviced, I'll create a separate PR for changing rabbit mq for enabling local testing.

I also create a sample project that we can send/receive with Dockerfile and deploy ScaledJob for Azure Queue Stroage that is written by go.
https://github.com/TsuyoshiUshio/sample-go-storage-queue

@TsuyoshiUshio
Copy link
Contributor Author

@zroubalik
If you are ok, could you merge this? I looked at the issues that you showed, the fix will be different, and sometimes, we need to discuss about specific topic on each PRs. I keep on fix these issues for the V2 release.

@ahmelsayed ahmelsayed merged commit b203f38 into kedacore:v2 Jul 27, 2020
zroubalik added a commit that referenced this pull request Aug 6, 2020
* [v2] Implement the scaledjob controler for v2

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Scale Logic of the Azure Storage Queue for not scaling with invisible queue

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Update pkg/controller/scaledjob/scaledjob_controller.go

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* rollback the change for rabbit mq and fix the protocol on crd

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

* Add protocol for the scalejobs_crd

Signed-off-by: Tsuyoshi Ushio <ushio@simplearchitect.com>

Co-authored-by: Zbynek Roubalik <726523+zroubalik@users.noreply.github.com>
SpiritZhou pushed a commit to SpiritZhou/keda that referenced this pull request Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants