feat(influx_tools): Add export to parquet files #25297

Open: wants to merge 14 commits into master-1.x

@srebhan (Member) commented Sep 9, 2024

Closes #
Supersedes #25253


  • I've read the contributing section of the project README.
  • Signed CLA (if not already signed).

This PR adds an export-parquet command to influx_tools that exports data into per-shard Parquet files. The command iterates over the shards, builds a cumulative schema across all series of each measurement (i.e. the union of their tag keys and fields), and writes the data to one Parquet file per measurement and shard.
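
The cumulative-schema step described above can be sketched in Go. This is a minimal, hypothetical illustration, not the code from this PR: the `Series`, `Schema` and `FieldType` types, the `buildSchemas` and `tagColumns` helpers, and the `cpu-<shard>.parquet` file name are assumptions made for the example, and the sketch prints its result instead of writing Parquet. It merges the tag keys and fields of every series of a measurement within one shard into a single super-set schema, and treats a field that appears with two different types as an error (also an assumption about how such conflicts would be handled).

```go
package main

import (
	"fmt"
	"sort"
)

// FieldType is a hypothetical stand-in for InfluxDB field types.
type FieldType string

// Series is a hypothetical, simplified view of one series in a shard.
type Series struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]FieldType
}

// Schema is the cumulative (super-set) schema of one measurement.
type Schema struct {
	Tags   []string             // sorted union of all tag keys
	Fields map[string]FieldType // union of all fields with their type
}

// buildSchemas groups the series of one shard by measurement and merges
// their tag keys and fields into one schema per measurement.
func buildSchemas(series []Series) (map[string]*Schema, error) {
	tagSets := make(map[string]map[string]struct{})
	schemas := make(map[string]*Schema)
	for _, s := range series {
		sc, ok := schemas[s.Measurement]
		if !ok {
			sc = &Schema{Fields: make(map[string]FieldType)}
			schemas[s.Measurement] = sc
			tagSets[s.Measurement] = make(map[string]struct{})
		}
		for k := range s.Tags {
			tagSets[s.Measurement][k] = struct{}{}
		}
		for name, typ := range s.Fields {
			if prev, exists := sc.Fields[name]; exists && prev != typ {
				return nil, fmt.Errorf("measurement %q: field %q has conflicting types %s and %s",
					s.Measurement, name, prev, typ)
			}
			sc.Fields[name] = typ
		}
	}
	// Sort the merged tag keys so the column order is deterministic.
	for m, set := range tagSets {
		keys := make([]string, 0, len(set))
		for k := range set {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		schemas[m].Tags = keys
	}
	return schemas, nil
}

// tagColumns projects a series' tags onto the schema's tag columns; tags the
// series does not carry stay nil, which an exporter would write as null.
func tagColumns(sc *Schema, s Series) []*string {
	cols := make([]*string, len(sc.Tags))
	for i, k := range sc.Tags {
		if v, ok := s.Tags[k]; ok {
			cols[i] = &v
		}
	}
	return cols
}

func main() {
	shardID := 42 // hypothetical shard ID
	series := []Series{
		{"cpu", map[string]string{"host": "a"}, map[string]FieldType{"usage": "float"}},
		{"cpu", map[string]string{"host": "b", "region": "eu"}, map[string]FieldType{"idle": "float"}},
	}
	schemas, err := buildSchemas(series)
	if err != nil {
		panic(err)
	}
	sc := schemas["cpu"]
	// Hypothetical naming scheme: one file per measurement and shard.
	fmt.Printf("cpu-%d.parquet tags=%v fields=%d\n", shardID, sc.Tags, len(sc.Fields))
	cols := tagColumns(sc, series[0])
	fmt.Println(cols[1] == nil) // the first series has no "region" tag
}
```

Running it prints `cpu-42.parquet tags=[host region] fields=2` followed by `true`: the first series has no `region` tag, so its entry in that column is nil and would become a null in the output file. Sorting the merged tag keys keeps the column order stable across runs.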

To test the tool, run:

go run -ldflags "-X google.golang.org/protobuf/reflect/protoregistry.conflictPolicy=ignore" ./cmd/influx_tools/ export-parquet -config influxdb.conf -database telegraf

@srebhan force-pushed the v1-bulk-exporter-parquet branch 2 times, most recently from 6869ba3 to bd44db9 on September 9, 2024 14:12
Resolved review threads (outdated):
  • .circleci/config.yml
  • cmd/influx_tools/main.go
  • cmd/influx_tools/parquet/batcher.go (3 threads)
  • cmd/influx_tools/parquet/command.go
  • cmd/influx_tools/parquet/exporter.go (4 threads)
@davidby-influx (Contributor) left a comment

I did a quick review, but I'm not familiar with Arrow and have certainly missed some things. I can do a more thorough review if we pair up once to walk through the algorithm.

Resolved review threads (outdated):
  • cmd/influx_tools/parquet/schema.go (2 threads)
@srebhan (Member, Author) commented Sep 18, 2024

@davidby-influx thanks for the thorough review! I tried to address all issues and commented on the three unresolved ones. I'll schedule a meeting to walk through the code. Thanks again!

@davidby-influx (Contributor) left a comment

LGTM

Resolved review thread:
  • cmd/influx_tools/parquet/cursors.go
Labels: None yet
Projects: None yet
2 participants