
[ASTS] feat(bucket-assigner): Add underlying physical table #7299

Open
wants to merge 1 commit into
base: mdaudali/09-16-refactor_shuffling_around_of_persistable_types

Conversation

mdaudali (Contributor)

General

Before this PR:
There is no physical table in the schema for the bucket assigner and related tasks.

After this PR:
There is now!

==COMMIT_MSG==
==COMMIT_MSG==

Priority: P2

Concerns / possible downsides (what feedback would you like?):
I can remove it from the schema so the table does not actually get created yet, if we think we'll modify the schema before shipping.

Is documentation needed?:

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:
No
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:
A new table is added, so new data will be persisted, but no existing persisted format changes, so there is nothing to be compatible with yet.
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):
N/A
Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:
N/A
Does this PR need a schema migration?
N/A

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:
None
What was existing testing like? What have you done to improve it?:
N/A
If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:
N/A
If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:
N/A

Execution

How would I tell this PR works in production? (Metrics, logs, etc.):
Table is created
Has the safety of all log arguments been decided correctly?:
N/A
Will this change significantly affect our spending on metrics or logs?:
N/A
How would I tell that this PR does not work in production? (monitors, etc.):
N/A
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:
N/A
If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):
N/A

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:
No
Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:
No
Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:
We may eventually change the schema to handle more bucket assigner options, but there is enough reserved space in the current design to do that without a schema change.

Development Process

Where should we start reviewing?:
TSS
If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:
N/A
Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju

@@ -137,6 +137,25 @@ private static void addSweepStateTables(Schema schema) {
conflictHandler(ConflictHandler.IGNORE_ALL);
}
});

schema.addTableDefinition("sweepAssignedBuckets", new TableDefinition() {
Contributor Author:

Not stuck on the name - it changed multiple times, but something with bucket and assignment seemed fitting, since we refer to the process of assigning timestamp ranges to buckets as bucket assignment.

rowName();
hashFirstNRowComponents(3);
rowComponent("shard", ValueType.VAR_LONG);
rowComponent("major_bucket_identifier", ValueType.VAR_LONG);
Contributor Author:

It might be more consistent to call these coarse and fine? But I worry that, since the major vs coarse and minor vs fine numbers are so far apart in size, they'll cause confusion.

Contributor:

It is more consistent, but I would argue here that we don't want consistency as they actually refer to different concepts!
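
As an aside on the row definition shown above: the following is a rough, hand-rolled sketch of the idea behind hashFirstNRowComponents(3), namely prefixing the row key with a hash of the leading components so that adjacent shard/bucket values do not cluster on one range of the keyspace. The minor_bucket_identifier component and the key layout are assumptions for illustration; this is not AtlasDB's actual row-key encoding.

import java.nio.ByteBuffer;
import java.util.Arrays;

// Rough illustration only: prefixing a row key with a hash of its leading
// components. The "minorBucketIdentifier" parameter and the byte layout are
// assumptions, not the real encoding used by the schema in this PR.
public final class HashedRowKeySketch {
    static byte[] rowKey(long shard, long majorBucketIdentifier, long minorBucketIdentifier) {
        byte[] components = ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(shard)
                .putLong(majorBucketIdentifier)
                .putLong(minorBucketIdentifier)
                .array();
        long hashPrefix = Arrays.hashCode(components); // stand-in for a real hash function
        return ByteBuffer.allocate(4 * Long.BYTES)
                .putLong(hashPrefix)
                .putLong(shard)
                .putLong(majorBucketIdentifier)
                .putLong(minorBucketIdentifier)
                .array();
    }

    public static void main(String[] args) {
        System.out.println(rowKey(3, 17, 2).length); // 32 bytes: hash prefix plus three components
    }
}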

allSafeForLoggingByDefault();
rowName();
hashFirstNRowComponents(3);
rowComponent("shard", ValueType.VAR_LONG);
Contributor Author:

Like #7200, I didn't sign the long (although in that PR there's no intention to have special cells, whereas there is here).

We could reserve e.g. Long.MAX_VALUE instead of signing the long (or we could just sign the long and be different from the other ASTS tables, but the same as previous sweep tables).

Contributor:

I don't know if it really matters here, but I think I marginally prefer signed var longs since in the encoding a Long.MAX_VALUE is like a full 9 or 10 bytes

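For context on the encoding-size point above, here is a minimal, self-contained sketch assuming a protobuf-style base-128 varint (7 payload bits per byte); it is not necessarily the exact VAR_LONG encoding used here, but it shows why Long.MAX_VALUE needs roughly 9 bytes while small identifiers fit in one.

// Rough illustration only: assumes a base-128 varint with 7 payload bits per
// byte, which may differ from the actual VAR_LONG encoding.
public final class VarintSizeSketch {
    /** Bytes needed to encode {@code value} as an unsigned base-128 varint. */
    static int unsignedVarintLength(long value) {
        int bytes = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(unsignedVarintLength(5));              // 1 byte
        System.out.println(unsignedVarintLength(1_000_000));      // 3 bytes
        System.out.println(unsignedVarintLength(Long.MAX_VALUE)); // 9 bytes
    }
}

Under such a scheme, a reserved sentinel of Long.MAX_VALUE costs the full 9 bytes per key, whereas a zig-zag style signed encoding of a small negative sentinel such as -1 would fit in a single byte, which is presumably the point about preferring signed var longs.
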
jeremyk-91 (Contributor) left a comment:

👍
