Parquet: Verify 32-bit CRC checksum when decoding pages #6290
Can we get some tests, please?
I agree with @tustvold that this code needs some tests to ensure we don't break the feature in the future.
Also, I think the feature flag should be documented here: https://crates.io/crates/parquet
FYI: https://github.com/apache/parquet-testing/tree/master/data
Thanks for the pointers. I added the tests and documented the feature flag. Please take a look.
I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more.
Thanks for this! Please run `cargo +stable fmt --all` and check in the result. Have you run any benchmarks to see if there is a measurable impact from the CRC calculation?
@@ -215,3 +218,4 @@ harness = false

[lib]
bench = false
@@ -82,4 +83,4 @@ The `parquet` crate provides the following features which may be enabled in your

 ## License

-Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0.
+Licensed under the Apache License, Version 2.0: <http://www.apache.org/licenses/LICENSE-2.0>.
Are the angle brackets necessary?
return Err(ParquetError::General(
    "Page CRC checksum mismatch".to_string(),
));
-return Err(ParquetError::General(
-    "Page CRC checksum mismatch".to_string(),
-));
+return Err(general_err!("Page CRC checksum mismatch"));
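For readers following the thread, here is a minimal sketch of what such a check could look like. It is not the PR's actual code: `ParquetError` and `general_err!` below are simplified stand-ins for the crate's own definitions, `verify_page_crc` is a hypothetical helper name, and the assumption that the checksum covers the page bytes passed in is mine. The CRC itself is the standard CRC-32 used by gzip, written bitwise here so the example needs no external crate.

```rust
// Simplified stand-in for the crate's error type (assumption: only the
// variant mentioned in this thread is modelled).
#[derive(Debug, PartialEq)]
pub enum ParquetError {
    General(String),
}

// Hypothetical, simplified stand-in for a `general_err!`-style macro:
// it builds a `ParquetError::General` from a format string.
macro_rules! general_err {
    ($($arg:tt)*) => {
        ParquetError::General(format!($($arg)*))
    };
}

/// Bitwise CRC-32 with the reflected polynomial 0xEDB88320
/// (the variant used by gzip and zlib).
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, all zeros otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: compares the computed checksum with the value from
/// the page header. Pages without a stored checksum are accepted as-is.
pub fn verify_page_crc(page_bytes: &[u8], expected: Option<u32>) -> Result<(), ParquetError> {
    match expected {
        Some(crc) if crc32(page_bytes) != crc => {
            Err(general_err!("Page CRC checksum mismatch"))
        }
        _ => Ok(()),
    }
}
```

Calling `verify_page_crc(bytes, header_crc)` before decoding would return `Err` only when a checksum is present and does not match. A table-driven or hardware-accelerated CRC would be the usual choice where throughput matters; the bitwise loop is only for illustration.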
@@ -0,0 +1,55 @@
+use std::path::PathBuf;
Please add the Apache license header.
Closes #6289
Please let me know if we should expose this in the reader APIs instead of a crate feature.
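To make the alternative concrete, here is a rough sketch of what a runtime switch in the reader options might look like, as opposed to a compile-time crate feature. Every name here (`ReadOptions`, `with_page_crc_verification`, `should_verify`) is hypothetical and chosen for illustration; none of it is the parquet crate's actual API.

```rust
/// Hypothetical reader options carrying a runtime CRC switch.
/// Verification is off by default, so existing callers see no change.
#[derive(Debug, Clone, Copy, Default)]
pub struct ReadOptions {
    verify_page_crc: bool,
}

impl ReadOptions {
    /// Builder-style setter for the (hypothetical) verification flag.
    pub fn with_page_crc_verification(mut self, enabled: bool) -> Self {
        self.verify_page_crc = enabled;
        self
    }

    /// Whether the caller asked for page checksums to be verified.
    pub fn verify_page_crc(&self) -> bool {
        self.verify_page_crc
    }
}

/// A page is checked only when the caller opted in and the page header
/// actually carries a checksum.
pub fn should_verify(options: &ReadOptions, header_crc: Option<u32>) -> bool {
    options.verify_page_crc() && header_crc.is_some()
}
```

With a runtime option, one binary can enable verification per reader, at the cost of a branch per page. A crate feature keeps the check out of the compiled code entirely when disabled, but every user of that build gets the same behaviour.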