feat: Add GCP BigTable support #364
Conversation
WIP: code in place, needs testing and validation
After sleeping on it, I decided to not overload the messages being stored when possible.
* add proper health check routine for __status__
…-rs into feat/big_table
A few nits and things here. I'm going to look at the bigtable impl in more detail, and also the new `current_timestamp`, next.
```rust
.clone()
.read_rows(&req)
.map_err(|e| error::BigTableError::Read(e.to_string()))?;
merge::RowMerger::process_chunks(resp, timestamp_filter, limit).await
```
Can we utilize `ReadRowsRequest::row_limit` instead of doing this limiting on our side?
Yes, but remember that `row_limit` limits rows regardless of whatever filtering we are doing. If we fetch the first 10 of 20 candidate rows, and 5 of those are then excluded by our own filtering, we will only return the remaining 5 and no additional records. This can be a problem since, while the `sortkey_timestamp` is part of the primary key and should be used as Bigtable's internal sorting key, we've previously noted that both `range_key` and `regex` keys are potentially problematic. (In the case of `range_key`, it returns all rows contained between the start and stop ranges, requiring very explicit key construction, and `regex_key` does not provide syntax for "starting with keys matching the following pattern".)
All that said, I've enabled row limits for now and removed the `limit` argument pass. Tests are currently passing, but I'm still feeling cautious about this.
Just to clarify -- this is only an issue for the filtering we do on our side, right? (Just the `timestamp_filter` against `sortkey_timestamp`.)
Correct. This is just for filtering in autopush code, not in Bigtable. Bigtable-side filtering may still be possible, but it would require additional investigation (and I'm not sure it would do what we want).
Sorry, I think we'll need to leave it (at least for now), then -- we don't have a test that produces enough notifications to trigger the case of it not collecting enough notifications. I was able to create one by duplicating the integration test's `test_ttl_batch_partly_expired_and_good_one` and changing its `range(6)` calls to `range(12)`. It fails on bigtable but passes on dynamo.
If we're able to switch to read_rows with a start_key instead of the regex matching (later), then I think we can begin taking advantage of the server-side limit here.
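If the row keys are prefix-structured, that switch could look roughly like this (a hedged sketch: the `prefix_to_range` helper and the `uaid#chan#` key shape are assumptions, not the actual schema). The idea is to compute the half-open key range that covers exactly one prefix and hand that to read_rows as start/end keys, instead of a regex row filter.

```rust
// Convert a row-key prefix into a (start, end) scan range.
// The end key is the prefix with its rightmost incrementable byte bumped;
// None means "scan to the end of the table" (prefix was all 0xff bytes).
fn prefix_to_range(prefix: &[u8]) -> (Vec<u8>, Option<Vec<u8>>) {
    let start = prefix.to_vec();
    let mut end = prefix.to_vec();
    while let Some(&last) = end.last() {
        if last < 0xff {
            *end.last_mut().unwrap() = last + 1;
            return (start, Some(end));
        }
        end.pop(); // 0xff can't be incremented; drop it and try the next byte
    }
    (start, None)
}

fn main() {
    // '#' is 0x23, so the exclusive end key ends in '$' (0x24).
    let (start, end) = prefix_to_range(b"uaid#chan#");
    assert_eq!(start, b"uaid#chan#".to_vec());
    assert_eq!(end, Some(b"uaid#chan$".to_vec()));
    // Degenerate case: an all-0xff prefix has no exclusive upper bound.
    assert_eq!(prefix_to_range(&[0xff, 0xff]).1, None);
}
```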
Totally fair. That was pretty much the reason I had done the initial version that way. I just thought you might remember some exclusion that would allow us to use that fix.
We'll need that limiting put back for now (see this comment)
Thanks JR. This now passes that larger notification test -- I'll probably add it to the suite in a future commit.
I'll tinker with some schema changes -- especially investigating if we can use that plain read_rows instead of the regex filter -- but this is ready to merge.
Looks like we can close out #378 since it's included in here.
Agreed. I'll manually close that ticket after this merges.
Adds BigTable support under autopush-common.
BigTable is currently behind a feature flag because CI fails to build it correctly (the jobs are currently being killed on CI).
Closes SYNC-3449