Parquet partition schema evolution on non-primitive columns #7305
A general question: should this functionality have to be enabled with a config setting, or should backward-compatible schema evolution just be supported natively? I would think we'd want the latter, which is how I thought Parquet schema evolution patches like #4714 were being handled.
Right, as I commented here. I am not sure whether other formats have similar functionality to handle name-based mapping for non-primitive fields. We may be able to combine this with other information to decide either
In the case of the Parquet format, we should use
I think Presto already has some support for primitive type evolution. If you want to support evolution for non-primitive types, it would be better to do it in
@geraint0923 Correct, that is an alternative approach and looks more structured.
ffed8b9 to 1482ec6
#4714 rebased without the dependency on presto-main. Will update once it passes CI.
024d38c to a1f0e02
… on non-primitive type for Parquet so that the Parquet cursor can get the table schema
a1f0e02 to 7657954
Restructured commits.
@geraint0923 Ready for review.
Closing this PR and implementing the coercion in HiveCoercionRecordCursor and HivePageSource #9131 |
Combined with the flexible Parquet struct converter (#4714), this PR adds a lazy equality check on HiveType to allow partition schema evolution over non-primitive fields (especially structs).
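To make the idea of a lenient type comparison concrete, here is a minimal sketch of what a name-based compatibility check over struct fields could look like. It is not the code from this PR: `TypeSketch` is a hypothetical, simplified stand-in for HiveType, and the rule it applies is an assumption for illustration. Under that rule, struct fields are matched by name, fields present on only one side are tolerated, and fields present on both sides must be recursively compatible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a Hive type. This is NOT Presto's HiveType.
final class TypeSketch {
    final String primitiveName;            // set for primitive types, null for structs
    final Map<String, TypeSketch> fields;  // set for struct types, null for primitives

    private TypeSketch(String primitiveName, Map<String, TypeSketch> fields) {
        this.primitiveName = primitiveName;
        this.fields = fields;
    }

    static TypeSketch primitive(String name) {
        return new TypeSketch(name, null);
    }

    static TypeSketch struct() {
        return new TypeSketch(null, new LinkedHashMap<String, TypeSketch>());
    }

    // Adds a named field to a struct type and returns this struct for chaining.
    TypeSketch field(String name, TypeSketch type) {
        if (fields == null) {
            throw new IllegalStateException("cannot add a field to a primitive type");
        }
        fields.put(name, type);
        return this;
    }

    boolean isStruct() {
        return fields != null;
    }

    // Assumed rule: a partition type is readable under the table type when
    // both are the same primitive, or both are structs whose shared field
    // names have recursively compatible types. Fields that exist on only one
    // side are tolerated (a reader could return null for them or skip them).
    static boolean isCompatible(TypeSketch tableType, TypeSketch partitionType) {
        if (tableType.isStruct() != partitionType.isStruct()) {
            return false;
        }
        if (!tableType.isStruct()) {
            return tableType.primitiveName.equals(partitionType.primitiveName);
        }
        for (Map.Entry<String, TypeSketch> entry : partitionType.fields.entrySet()) {
            TypeSketch tableField = tableType.fields.get(entry.getKey());
            if (tableField != null && !isCompatible(tableField, entry.getValue())) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, a table column of type `struct<a:int,b:string>` is considered compatible with an older partition written as `struct<a:int>`, while a partition that stored `a` as a string is rejected. A strict equality check would reject both.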