Crossovers #66

Closed
v1gnesh opened this issue Jun 20, 2023 · 4 comments

Comments

@v1gnesh

v1gnesh commented Jun 20, 2023

Hi,

Firstly, thank you for building this in the open & sharing!

I see that this can be used to convert serde-derivable structures to the arrow layout.

There are a few ways to parse binary content into Rust data types (for example deku and binrw).
Additionally, there is https://github.com/simd-lite/simd-json-derive for deriving JSON from Rust data types.

Would I be able to convert a bunch of structs "created" by them, and then use serde_arrow's derive on top of that, to convert it finally to the arrow layout?

@chmp
Owner

chmp commented Jun 23, 2023

Hey,

Thanks for the kind words!

The crates you mention are really cool indeed. At first glance they do not seem to offer this split between data and format as serde does. So I do not see an obvious way to convert from deku / binrw data to arrow directly.

If you're talking about deku / binrw -> Rust -> arrow, then sure: you can use serde_arrow as is, you just need to specify the schema of your objects, either by tracing a couple of examples using serialize_into_fields or by building the schema yourself. Then you use deku / binrw to construct the Rust objects and use serde_arrow to build the corresponding arrow arrays from these objects.
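To make the two stages of that pipeline concrete, here is a minimal std-only sketch. It is not deku, binrw, or serde_arrow code: the hand-written parser stands in for what deku / binrw would derive, and the hand-written row-to-column step stands in for what serde_arrow would produce as arrow arrays. The record layout and the names LogRecord, LogColumns, parse_records and to_columns are made up for illustration.

```rust
/// Hypothetical fixed-size log record: 1-byte level, 4-byte little-endian code.
#[derive(Debug, Clone, PartialEq)]
pub struct LogRecord {
    pub level: u8,
    pub code: u32,
}

/// Columnar ("struct of arrays") form of many records: one vector per field,
/// which is the general shape of the arrow layout.
#[derive(Debug, Default, PartialEq)]
pub struct LogColumns {
    pub level: Vec<u8>,
    pub code: Vec<u32>,
}

/// Stage 1 (stand-in for deku / binrw): parse raw bytes into Rust structs.
/// Returns None if the input is not a whole number of 5-byte records.
pub fn parse_records(bytes: &[u8]) -> Option<Vec<LogRecord>> {
    const RECORD_LEN: usize = 5;
    if bytes.len() % RECORD_LEN != 0 {
        return None;
    }
    let records = bytes
        .chunks_exact(RECORD_LEN)
        .map(|chunk| LogRecord {
            level: chunk[0],
            code: u32::from_le_bytes([chunk[1], chunk[2], chunk[3], chunk[4]]),
        })
        .collect();
    Some(records)
}

/// Stage 2 (stand-in for serde_arrow): turn rows into one column per field.
pub fn to_columns(records: &[LogRecord]) -> LogColumns {
    let mut columns = LogColumns::default();
    for record in records {
        columns.level.push(record.level);
        columns.code.push(record.code);
    }
    columns
}
```

Calling parse_records on a byte buffer and passing the result to to_columns gives one vector per field. With the real crates, the Rust structs in the middle would additionally need serde's Serialize so that serde_arrow can walk them.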

@v1gnesh
Author

v1gnesh commented Jun 24, 2023

Thank you, yeah I mean this option -- deku / binrw -> Rust -> arrow.
If you have time, could you share an example of how I'd go about doing this? I'm pretty noob-ish with programming in general. My use case has a whole bunch of nested struct types for binary log data.

Will post about the first method in those 2 projects and see what they think.

@chmp
Owner

chmp commented Jun 24, 2023

With serde_arrow, you have to ensure all your types implement Serialize / Deserialize, e.g., by using serde's derive macros. Then you can simply follow the example in the readme:

  1. Trace the fields (i.e., determine the schema of your arrays): let fields = serialize_into_fields(&items, TracingOptions::default())?;
  2. Construct the arrays: let arrays = serialize_into_arrays(&fields, &items)?;

Important for step 1: if you have enums and lists, you must make sure all lists have at least a single entry and all relevant enum variants are encountered.
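The reason for that caveat is easier to see with a toy tracer. The sketch below is not serde_arrow's implementation; it is a std-only illustration, with made-up names (Value, Traced, merge, trace_value, trace_samples), of inferring a type from example values. It only models lists, but the same reasoning applies to enum variants that never show up in the samples.

```rust
/// Toy value model, standing in for what a serializer sees.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Text(String),
    List(Vec<Value>),
}

/// Toy inferred type; `Unknown` means the samples carried no information.
#[derive(Debug, Clone, PartialEq)]
pub enum Traced {
    Unknown,
    Int,
    Text,
    List(Box<Traced>),
}

/// Merge two observations of the same field. Returns None on a conflict.
pub fn merge(a: Traced, b: Traced) -> Option<Traced> {
    match (a, b) {
        // An unknown side adds nothing, so the other side wins.
        (Traced::Unknown, other) | (other, Traced::Unknown) => Some(other),
        // Lists are merged element type by element type.
        (Traced::List(x), Traced::List(y)) => Some(Traced::List(Box::new(merge(*x, *y)?))),
        (x, y) if x == y => Some(x),
        _ => None,
    }
}

/// Infer the type of a single sample value.
pub fn trace_value(value: &Value) -> Option<Traced> {
    match value {
        Value::Int(_) => Some(Traced::Int),
        Value::Text(_) => Some(Traced::Text),
        Value::List(items) => {
            // An empty list never enters this loop, so its element type stays Unknown.
            let mut element = Traced::Unknown;
            for item in items {
                element = merge(element, trace_value(item)?)?;
            }
            Some(Traced::List(Box::new(element)))
        }
    }
}

/// Infer one type from several samples of the same field.
pub fn trace_samples(samples: &[Value]) -> Option<Traced> {
    let mut traced = Traced::Unknown;
    for sample in samples {
        traced = merge(traced, trace_value(sample)?)?;
    }
    Some(traced)
}
```

Tracing only an empty list leaves the element type Unknown, while adding a single sample with one entry is enough to resolve it. That is why the samples used for tracing should cover every list and every relevant enum variant.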

If you control the whole code base, arrow2-convert might also be an option. You can easily convert from arrow2 to arrow.

@chmp
Owner

chmp commented Jul 1, 2023

Closing this issue, as there is no change necessary in serde_arrow as far as I can tell.

@chmp chmp closed this as completed Jul 1, 2023