Parquet partition schema evolution on non-primitive columns #7305

Yaliang · 2017-02-04T02:24:13Z

Combined with the flexible Parquet struct converter (#4714), this PR adds a lazy equality check on HiveType to allow partition schema evolution over non-primitive fields (especially Struct).

billonahill · 2017-02-06T18:06:54Z

A general question I have is whether this functionality should be required to be enabled with a config setting, or if backward-compatible schema evolution should just be natively supported? I would think we'd want the latter, which is how I thought parquet schema evolution patches like #4714 were being handled.

Yaliang · 2017-02-06T18:15:27Z

Right, as I commented here. I am not sure whether other formats have similar functionality for handling name-based mapping of non-primitive fields. We may be able to combine this with other information to decide whether equals or lazyEquals should be used. But for prototyping, I would just pass this setting from the configuration.

Yaliang · 2017-02-06T18:23:06Z

For the Parquet format, we should use lazyEquals when isUseParquetColumnNames() returns true.
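As a rough illustration of the equals versus lazyEquals distinction, here is a minimal sketch. It assumes that "lazy" equality means two struct types are compatible when the fields they share by name have compatible types, so added or removed fields are ignored. `SketchType`, `strictEquals`, `with`, and `compatible` are hypothetical names for this example only; they are not Presto's HiveType API, and the `useParquetColumnNames` flag merely stands in for the result of isUseParquetColumnNames().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Hive type; NOT Presto's HiveType class.
final class SketchType {
    final String name;                     // e.g. "int", "string", "struct"
    final Map<String, SketchType> fields;  // non-empty only for structs

    private SketchType(String name, Map<String, SketchType> fields) {
        this.name = name;
        this.fields = fields;
    }

    static SketchType primitive(String name) {
        return new SketchType(name, new LinkedHashMap<>());
    }

    static SketchType struct() {
        return new SketchType("struct", new LinkedHashMap<>());
    }

    // Returns a copy of this type with one more named field.
    SketchType with(String field, SketchType type) {
        Map<String, SketchType> copy = new LinkedHashMap<>(fields);
        copy.put(field, type);
        return new SketchType(name, copy);
    }

    // Strict check: same type name and exactly the same set of fields (field order is ignored here).
    boolean strictEquals(SketchType other) {
        if (!name.equals(other.name) || fields.size() != other.fields.size()) {
            return false;
        }
        for (Map.Entry<String, SketchType> entry : fields.entrySet()) {
            SketchType match = other.fields.get(entry.getKey());
            if (match == null || !entry.getValue().strictEquals(match)) {
                return false;
            }
        }
        return true;
    }

    // Assumed "lazy" check: only fields present on both sides are compared.
    boolean lazyEquals(SketchType other) {
        if (!name.equals(other.name)) {
            return false;
        }
        for (Map.Entry<String, SketchType> entry : fields.entrySet()) {
            SketchType match = other.fields.get(entry.getKey());
            if (match != null && !entry.getValue().lazyEquals(match)) {
                return false;
            }
        }
        return true;
    }

    // Picks the check based on a flag standing in for isUseParquetColumnNames().
    static boolean compatible(SketchType table, SketchType partition, boolean useParquetColumnNames) {
        return useParquetColumnNames ? table.lazyEquals(partition) : table.strictEquals(partition);
    }
}
```

With a table type `struct<a:int, b:string>` and a partition type `struct<a:int>`, the strict check rejects the pair while the lazy check accepts it. That is the case name-based mapping can serve by returning null for the missing field.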

geraint0923 · 2017-02-08T23:19:32Z

I think Presto already has some support for primitive type evolution. If you want to support evolution for non-primitive types, it would be better to do it in HiveCoercionPolicy and then handle the coercions in HiveCoercionRecordCursor.
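The split suggested here (a policy that decides whether a coercion is allowed, and a cursor that applies it) could look roughly like the sketch below. It does not reproduce the real HiveCoercionPolicy or HiveCoercionRecordCursor: `CoercionPolicySketch`, `CoercingCursorSketch`, and the single int-to-bigint rule are assumptions chosen only to show the shape of the design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical policy: decides whether a partition column type may be read as the table column type.
final class CoercionPolicySketch {
    // Illustrative rule only: identical types, or int widened to bigint.
    static boolean canCoerce(String partitionType, String tableType) {
        return partitionType.equals(tableType)
                || (partitionType.equals("int") && tableType.equals("bigint"));
    }

    // Returns the value conversion for a type pair the policy accepts.
    static Function<Object, Object> coercerFor(String partitionType, String tableType) {
        if (!canCoerce(partitionType, tableType)) {
            throw new IllegalArgumentException("Cannot coerce " + partitionType + " to " + tableType);
        }
        if (partitionType.equals(tableType)) {
            return value -> value;
        }
        // int -> bigint: widen the boxed Integer to a Long, passing nulls through.
        return value -> value == null ? null : Long.valueOf(((Integer) value).longValue());
    }
}

// Hypothetical cursor wrapper: applies one coercer per column to each row it is given.
final class CoercingCursorSketch {
    private final List<Function<Object, Object>> coercers;

    CoercingCursorSketch(List<Function<Object, Object>> coercers) {
        this.coercers = coercers;
    }

    List<Object> coerceRow(List<Object> row) {
        List<Object> out = new ArrayList<>();
        for (int i = 0; i < row.size(); i++) {
            out.add(coercers.get(i).apply(row.get(i)));
        }
        return out;
    }
}
```

Under this design, supporting non-primitive evolution would mean adding struct rules to the policy and a matching struct coercer, rather than relaxing type equality elsewhere.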

Yaliang · 2017-02-10T06:30:10Z

@geraint0923 Correct, that is an alternative approach and looks more structured.

Yaliang · 2017-02-14T07:02:17Z

It looks like #4714 has a dependency on presto-main beyond the test scope, which is bad. Will work on an alternative approach without #4714. I guess I may have to repack the object in HiveCoercionRecordCursor.

Yaliang · 2017-02-14T19:56:26Z

#4714 is rebased without the dependency on presto-main. Will update when it passes CI.

Yaliang · 2017-02-25T02:18:05Z

Restructured commits.
The Parquet cursor will get the column handles in the form of the table schema and do the flexible conversion itself. Other formats will run into a Not-Supported exception.
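To make that behavior concrete, here is a minimal sketch of a name-based struct conversion, assuming that fields missing from the file are returned as null and that field names match exactly. `StructRemapSketch`, `remapByName`, `read`, and the format strings are hypothetical and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Presto.
final class StructRemapSketch {
    // Rebuilds one struct value in table field order, looking each field up by name in the file's struct.
    static List<Object> remapByName(List<String> tableFields, List<String> fileFields, List<Object> fileValues) {
        List<Object> result = new ArrayList<>();
        for (String field : tableFields) {
            int index = fileFields.indexOf(field);
            // A field the file does not have is read as null (an assumption of this sketch).
            result.add(index < 0 ? null : fileValues.get(index));
        }
        return result;
    }

    // Only the Parquet path does the conversion; every other format is rejected.
    static List<Object> read(String format, List<String> tableFields, List<String> fileFields, List<Object> fileValues) {
        if (!format.equals("PARQUET")) {
            throw new UnsupportedOperationException(
                    "Non-primitive schema evolution is not supported for format " + format);
        }
        return remapByName(tableFields, fileFields, fileValues);
    }
}
```

For a table struct with fields `a, b, c` and a file struct with fields `c, a`, the result holds the file's `a` value, then null for `b`, then the file's `c` value.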

Yaliang · 2017-02-25T02:43:13Z

@geraint0923 Ready for review

Yaliang · 2017-10-10T16:14:50Z

Closing this PR and implementing the coercion in HiveCoercionRecordCursor and HivePageSource instead (#9131).

facebook-github-bot added the CLA Signed label Feb 4, 2017

Yaliang mentioned this pull request Feb 4, 2017

Parquet schema evolution on non-primitive type twitter-forks/presto#71

Closed

Yaliang force-pushed the yaliangw/oss-partition-schema-evolution branch from ffed8b9 to 1482ec6 Compare February 14, 2017 00:14

Yaliang force-pushed the yaliangw/oss-partition-schema-evolution branch 3 times, most recently from 024d38c to a1f0e02 Compare February 23, 2017 21:32

jxiang and others added 3 commits February 24, 2017 18:09

Support Schema Evolution in Parquet

7d17d90

Added Parquet Schema Evolution Test

9248cb3

Add non-primitive coercion and prevent extract regular column handles…

7657954

… on non-primitive type for Parquet so that the Parquet cursor can get the table schema

Yaliang force-pushed the yaliangw/oss-partition-schema-evolution branch from a1f0e02 to 7657954 Compare February 25, 2017 02:10

Yaliang changed the title ~~partition schema evolution~~ Add Parquet partition schema evolution on non-primitive columns Feb 25, 2017

Yaliang changed the title ~~Add Parquet partition schema evolution on non-primitive columns~~ Parquet partition schema evolution on non-primitive columns Feb 25, 2017

dain assigned nezihyigitbasi Apr 17, 2017

Yaliang closed this Oct 10, 2017
