Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add GCP BigTable support #364

Merged
merged 79 commits into from
Aug 2, 2023
Merged

feat: Add GCP BigTable support #364

merged 79 commits into from
Aug 2, 2023

Conversation

jrconlin
Copy link
Member

Adds BigTable support under autopush-common.

BigTable is currently under a feature flag due to CI failing to build it correctly (currently the jobs are being killed on CI)

Closes SYNC-3449

@jrconlin jrconlin marked this pull request as ready for review April 21, 2023 19:28
@jrconlin jrconlin requested a review from pjenvey April 21, 2023 19:28
@jrconlin jrconlin requested a review from pjenvey July 25, 2023 22:03
Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits and things here, I'm going to look at the bigtable impl in more detail and also the new current_timestamp next.

Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
autopush-common/src/db/mock.rs Outdated Show resolved Hide resolved
autopush-common/src/db/mod.rs Outdated Show resolved Hide resolved
autopush-common/src/db/mod.rs Outdated Show resolved Hide resolved
tests/test_integration_all_rust.py Outdated Show resolved Hide resolved
@jrconlin jrconlin requested a review from pjenvey July 27, 2023 20:08
autopush-common/src/db/mod.rs Show resolved Hide resolved
autopush-common/src/db/mod.rs Show resolved Hide resolved
autopush-common/src/db/bigtable/bigtable_client/row.rs Outdated Show resolved Hide resolved
.clone()
.read_rows(&req)
.map_err(|e| error::BigTableError::Read(e.to_string()))?;
merge::RowMerger::process_chunks(resp, timestamp_filter, limit).await
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we utilize ReadRowsRequest::row_limit limit instead of doing this limiting on our side?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but remember that the row_limit function limits rows regardless of what filtering we are doing. If we fetch the first 10 of 20 candidate rows, and 5 are excluded due to our limits, we will only return the remaining 5 and no additional records. This can be a problem since, while the sortkey_timestamp is part of the primary key and should be used as the internal sorting key used by Bigtable, we've previously noted that both range_key and regex keys are potentially problematic.
(in the case of range_key, it returns all rows contained between the start and stop ranges, requiring very explicit key construction, and regex_key does not provide syntax for "starting with keys matching the following pattern".)

All that said, I've enabled row limits for now and removed the limit argument pass. Tests are currently passing, but I'm still feeling cautious about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify -- this is only an issue for the filtering we do on our side, right? (just the timestamp_filter against sortkey_timestamp)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. This is just for filtering in autopush code, not in bigtable. Bigtable filtering may still be possible, but would require additional investigation (and I'm not sure it would do what we want).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think we'll need to leave it (at least for now) then -- we don't have a test that produces enough notifications to trigger the case of it not collecting enough notifications. I was able to create one by duplicating the integration test's test_ttl_batch_partly_expired_and_good_one and changing its range(6) calls to range(12). Fails on bigtable but passes on dynamo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're able to switch to read_rows w/ a start_key instead of the regex matching (later) then I think we can begin taking advantage of the server side limit here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fair. That was pretty much the reason I had done the initial version that way. I just thought you might remember some exclusion that would allow us to use that fix.

autopush-common/src/db/bigtable/bigtable_client/mod.rs Outdated Show resolved Hide resolved
autopush-common/src/db/bigtable/bigtable_client/mod.rs Outdated Show resolved Hide resolved
.circleci/config.yml Outdated Show resolved Hide resolved
.circleci/config.yml Show resolved Hide resolved
autopush-common/src/db/bigtable/bigtable_client/merge.rs Outdated Show resolved Hide resolved
autopush-common/src/db/bigtable/bigtable_client/merge.rs Outdated Show resolved Hide resolved
@jrconlin jrconlin requested a review from pjenvey July 31, 2023 20:53
Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need that limiting put back for now (see this comment)

@jrconlin jrconlin requested a review from pjenvey August 2, 2023 15:17
Copy link
Member

@pjenvey pjenvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks JR. This now passes that larger notification test -- I'll probably add it to the suite in a future commit.

I'll tinker with some schema changes -- especially investigating if we can use that plain read_rows instead of the regex filter -- but this is ready to merge.

Looks like we can close out #378 since it's included in here.

@jrconlin
Copy link
Member Author

jrconlin commented Aug 2, 2023

Agreed. I'll manually close that ticket after this merges.

@jrconlin jrconlin merged commit 608c52f into master Aug 2, 2023
1 check passed
@jrconlin jrconlin deleted the feat/big_table branch August 2, 2023 23:02
@jrconlin jrconlin mentioned this pull request Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants