Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WX-1675] Record cloud quota delay to GroupMetrics table #7501

Merged
merged 23 commits into from
Aug 23, 2024

Conversation

salonishah11
Copy link
Contributor

@salonishah11 salonishah11 commented Aug 20, 2024

Jira: https://broadworkbench.atlassian.net/browse/WX-1675

Description

  • Records quota exhaustion for a hog group to the new GroupMetrics table when a job runs into AwaitingCloudQuota state
  • This PR adds a new singleton Actor called GroupMetricsActor which takes in currently RecordGroupQuotaExhaustion message to record the group that ran into quota exhaustion into the new table

Example screenshot:
Screenshot 2024-08-20 at 4 27 49 PM

Release Notes Confirmation

CHANGELOG.md

  • I updated CHANGELOG.md in this PR
  • I assert that this change shouldn't be included in CHANGELOG.md because it doesn't impact community users

Terra Release Notes

  • I added a suggested release notes entry in this Jira ticket
  • I assert that this change doesn't need Jira release notes because it doesn't impact Terra users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New actor whose purpose is to receive messages:

  • record which group has run into quota exhaustion
  • get message asking about if this group is quota exhausted or not (this will be implemented in follow up PR)

* Checks if the job has run into any cloud quota exhaustion and records it to GroupMetrics table
* @param runStatus The run status
*/
def checkAndRecordQuotaExhaustion(runStatus: StandardAsyncRunState): Unit = ()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend agnostic method and each backend can implement how to detect quota exhaustion. Currently only implemented in PipelinesApiAsyncBackendJobExecutionActor.scala

Comment on lines +228 to +230
lazy val groupMetricsActor: ActorRef =
context.actorOf(GroupMetricsActor.props(EngineServicesStore.engineDatabaseInterface))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the singleton GroupMetricsActor is created and wired through other actors to make its way through to backend actor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this was created, will delete it.

Copy link
Collaborator

@aednichols aednichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this branch would be a good mob-time demo. I'm curious to see it in action, but I also don't want to deal with migrating/resetting my local Cromwell DB.


Have you thought about where you'd like to place the conversion from "quota timestamp is recent" to Boolean "quota is exhausted"?

I was thinking of quotaExhaustionForGroupId in the component, but maybe that's too low level and it should wait for the next PR.

Comment on lines +21 to +22
case 0 => dataAccess.groupMetricsEntryIdsAutoInc += groupMetricsEntry
case _ => assertUpdateCount("recordGroupMetricsEntry", updateCount, 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't seen this pattern before but I see that it's common in our DB. TIL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you had a different pattern in mind to achieve this upsert let me know and I am happy to look into it!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was not a leading question, it seems reasonable and I don't know of anything better.

@salonishah11
Copy link
Contributor Author

Have you thought about where you'd like to place the conversion from "quota timestamp is recent" to Boolean "quota is exhausted"?

@aednichols I was thinking probably in the /database/slick/GroupMetricsSlickDatabase.scala file or maybe 1 level higher. But yeah that will be included in the next PR which will actually use the values from the table to decide where to allocate new tokens to this group or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this is only a simple test file that checks the actor receives messages when sent and calls appropriate database method. But in the follow up PR where GroupMetricsActor will receive another message related to figuring out whether group is quota exhausted or not, I am expecting that this test file will get more useful tests then.

@salonishah11 salonishah11 marked this pull request as ready for review August 22, 2024 13:07
@salonishah11 salonishah11 requested a review from a team as a code owner August 22, 2024 13:07

override def receive: Receive = {
case RecordGroupQuotaExhaustion(group) =>
log.info(s"Recording quota exhaustion for group '$group'.")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From discussion - remove this log line

Copy link
Collaborator

@aednichols aednichols left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Contributor

@jgainerdewar jgainerdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once that log line is removed. Thanks for the walkthrough!

Base automatically changed from sps_new_table to develop August 23, 2024 17:01
@salonishah11 salonishah11 merged commit 459a6ef into develop Aug 23, 2024
39 checks passed
@salonishah11 salonishah11 deleted the sps_record_quota_exhaustion branch August 23, 2024 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants