Parquet partition schema evolution on non-primitive columns #7305

Closed


Yaliang
Contributor

@Yaliang Yaliang commented Feb 4, 2017

Combined with the flexible Parquet struct converter (#4714), this PR adds a lazy equals on HiveType to allow partition schema evolution over non-primitive fields (especially structs).
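
To illustrate the idea, here is a minimal sketch of how a strict comparison and a "lazy" comparison of struct types might differ. It assumes that lazy equality means fields are matched by name and that a field present on only one side is ignored; the PR's actual semantics may differ. `SketchType`, `strictEquals`, and `lazyEquals` are hypothetical names for this example and are not Presto's `HiveType` API.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a Hive type: either a primitive
// (fields == null) or a struct with named, ordered fields.
final class SketchType {
    final String name;                    // e.g. "int", "string", "struct"
    final Map<String, SketchType> fields; // null for primitives

    private SketchType(String name, Map<String, SketchType> fields) {
        this.name = name;
        this.fields = fields;
    }

    static SketchType primitive(String name) {
        return new SketchType(name, null);
    }

    // Builds a struct from alternating field-name / field-type arguments.
    static SketchType struct(Object... nameTypePairs) {
        Map<String, SketchType> fields = new LinkedHashMap<>();
        for (int i = 0; i < nameTypePairs.length; i += 2) {
            fields.put((String) nameTypePairs[i], (SketchType) nameTypePairs[i + 1]);
        }
        return new SketchType("struct", fields);
    }

    boolean isStruct() {
        return fields != null;
    }

    // Strict comparison: same fields, same order, same nested types.
    static boolean strictEquals(SketchType a, SketchType b) {
        if (a.isStruct() != b.isStruct()) return false;
        if (!a.isStruct()) return a.name.equals(b.name);
        if (a.fields.size() != b.fields.size()) return false;
        Iterator<Map.Entry<String, SketchType>> ia = a.fields.entrySet().iterator();
        Iterator<Map.Entry<String, SketchType>> ib = b.fields.entrySet().iterator();
        while (ia.hasNext()) {
            Map.Entry<String, SketchType> ea = ia.next();
            Map.Entry<String, SketchType> eb = ib.next();
            if (!ea.getKey().equals(eb.getKey())
                    || !strictEquals(ea.getValue(), eb.getValue())) {
                return false;
            }
        }
        return true;
    }

    // Lazy comparison (assumed semantics): only fields present on both
    // sides, matched by name, must agree; added or removed fields are ignored.
    static boolean lazyEquals(SketchType a, SketchType b) {
        if (a.isStruct() != b.isStruct()) return false;
        if (!a.isStruct()) return a.name.equals(b.name);
        for (Map.Entry<String, SketchType> e : a.fields.entrySet()) {
            SketchType other = b.fields.get(e.getKey());
            if (other != null && !lazyEquals(e.getValue(), other)) return false;
        }
        return true;
    }
}
```

Under these assumptions, a table column of type `struct<id:int,name:string>` and an older partition column of type `struct<id:int>` fail the strict check but pass the lazy one, which is the case that partition schema evolution needs to accept.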

@billonahill

A general question I have is whether this functionality should be required to be enabled with a config setting, or if backward-compatible schema evolution should just be natively supported? I would think we'd want the latter, which is how I thought parquet schema evolution patches like #4714 were being handled.

@Yaliang
Contributor Author

Yaliang commented Feb 6, 2017

Right, as I commented here. I am not sure whether other formats have similar functionality to handle name-based mapping for non-primitive fields. We may be able to combine this with other information to decide whether equals or lazyEquals should be used. But for prototyping, I would just pass this setting from the configuration.

@Yaliang
Contributor Author

Yaliang commented Feb 6, 2017

For the Parquet format, we should use lazyEquals whenever isUseParquetColumnNames() returns true.

@geraint0923
Contributor

I think Presto already has some support for primitive type evolution. If you want to support evolution for non-primitive types, it would be better to do it in HiveCoercionPolicy and then handle the coercions in HiveCoercionRecordCursor.
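
As a rough illustration of this suggestion, the sketch below shows what a coercion policy extended from primitives to one-level structs could look like. Everything here is an assumption made for the example: the class `CoercionPolicySketch`, its method names, and the integer widening order are hypothetical and do not reproduce the actual rules in Presto's Hive connector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class CoercionPolicySketch {
    // Hypothetical widening order, used only for this example.
    private static final List<String> INTEGER_WIDENING =
            Arrays.asList("tinyint", "smallint", "int", "bigint");

    // A primitive can be coerced to itself or to a wider integer type.
    static boolean canCoercePrimitive(String from, String to) {
        if (from.equals(to)) {
            return true;
        }
        int fromIndex = INTEGER_WIDENING.indexOf(from);
        int toIndex = INTEGER_WIDENING.indexOf(to);
        return fromIndex >= 0 && toIndex >= 0 && fromIndex < toIndex;
    }

    // Struct fields are matched by name. A partition field that the table
    // no longer has is ignored, and a table field missing from the partition
    // imposes no constraint; every shared field must itself be coercible.
    static boolean canCoerceStruct(Map<String, String> partitionFields,
                                   Map<String, String> tableFields) {
        for (Map.Entry<String, String> field : partitionFields.entrySet()) {
            String target = tableFields.get(field.getKey());
            if (target != null && !canCoercePrimitive(field.getValue(), target)) {
                return false;
            }
        }
        return true;
    }
}
```

The policy only answers whether a coercion is allowed; performing it on the values would be the job of a separate component such as the record cursor mentioned above.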

@Yaliang
Contributor Author

Yaliang commented Feb 10, 2017

@geraint0923 Correct, that is an alternative approach and looks more structured.

@Yaliang Yaliang force-pushed the yaliangw/oss-partition-schema-evolution branch from ffed8b9 to 1482ec6 Compare February 14, 2017 00:14
@Yaliang
Contributor Author

Yaliang commented Feb 14, 2017

It looks like #4714 has a dependency on presto-main beyond the test scope, which is bad. I will work on an alternative approach without #4714. I guess I may have to repack the object in HiveCoercionRecordCursor.

@Yaliang
Contributor Author

Yaliang commented Feb 14, 2017

#4714 has been rebased without the dependency on presto-main. Will update when it passes CI.

@Yaliang Yaliang force-pushed the yaliangw/oss-partition-schema-evolution branch 3 times, most recently from 024d38c to a1f0e02 Compare February 23, 2017 21:32
@Yaliang Yaliang force-pushed the yaliangw/oss-partition-schema-evolution branch from a1f0e02 to 7657954 Compare February 25, 2017 02:10
@Yaliang Yaliang changed the title partition schema evolution Add Parquet partition schema evolution on non-primitive columns Feb 25, 2017
@Yaliang Yaliang changed the title Add Parquet partition schema evolution on non-primitive columns Parquet partition schema evolution on non-primitive columns Feb 25, 2017
@Yaliang
Contributor Author

Yaliang commented Feb 25, 2017

Restructured commits.
The Parquet cursor will get the column handler in the form of the table schema and do the flexible conversion itself. Other formats will run into a Not-Supported exception.
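
The behavior described above could be sketched roughly as follows: a struct value read with the partition's field layout is repacked into the table schema's field order by name, and any non-Parquet format is rejected. The class `StructRepackSketch`, both method names, and the use of plain lists for struct values are assumptions for illustration, not the code in this PR.

```java
import java.util.ArrayList;
import java.util.List;

final class StructRepackSketch {
    // Reorders struct values from the file (partition) layout into the
    // table layout. Fields missing from the file become null; fields the
    // table does not declare are dropped.
    static List<Object> repackByName(List<String> fileFields,
                                     List<Object> fileValues,
                                     List<String> tableFields) {
        if (fileFields.size() != fileValues.size()) {
            throw new IllegalArgumentException("field names and values differ in length");
        }
        List<Object> result = new ArrayList<>(tableFields.size());
        for (String field : tableFields) {
            int index = fileFields.indexOf(field);
            result.add(index >= 0 ? fileValues.get(index) : null);
        }
        return result;
    }

    // Only the Parquet path performs the name-based conversion in this
    // sketch; every other format is rejected.
    static List<Object> readStruct(String format,
                                   List<String> fileFields,
                                   List<Object> fileValues,
                                   List<String> tableFields) {
        if (!"parquet".equalsIgnoreCase(format)) {
            throw new UnsupportedOperationException(
                    "struct schema evolution is not supported for format: " + format);
        }
        return repackByName(fileFields, fileValues, tableFields);
    }
}
```

For example, a file struct with fields `id` and `legacy` read against a table struct declaring `name` and `id` would yield a null for `name` followed by the stored `id` value, with `legacy` dropped.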

@Yaliang
Contributor Author

Yaliang commented Feb 25, 2017

@geraint0923 Ready for review

@Yaliang
Contributor Author

Yaliang commented Oct 10, 2017

Closing this PR and implementing the coercion in HiveCoercionRecordCursor and HivePageSource in #9131.

@Yaliang Yaliang closed this Oct 10, 2017
6 participants