`read_model_row` does not work for schema extensions of `ModelV1` #24

kleinschmidt · 2023-02-21T01:51:02Z

read_model_row always returns a ModelV1 which makes it pretty useless for anything that requires fields from a schema extensions. In Legolas <0.5, this was fine because extra fields were preserved but now they are discarded.

Instead, could LegolasFlux get the schema from the table metadata, use Legolas.record_type to get the appropriate record type, and try to construct that instead?

The text was updated successfully, but these errors were encountered:

ericphanson · 2023-02-21T14:26:16Z

hmm, I guess? I am not sure, that feels kind of like trying to work around the legolas change rather than embracing it. To me it seems like either that function should be dropped and folks should add it themselves for their own schemas if it is useful*, or it should have a required schema argument to say what schema you want to deserialize.

*we could have a macro to generate this function, but that seems somewhat indirect/confusing. Maybe better is to just write that function out as an example somewhere to suggest writing your own if it is useful.

hannahilea · 2023-02-21T15:00:32Z

a required schema argument to say what schema you want to deserialize

What about an optional schema argument for this, so that the existing behavior (return ModelV1) is still the default and doesn't break in the absence of a specified schema?

kleinschmidt · 2023-02-21T15:24:28Z

Isn't this what Legolas.read does already (auto-detect the schema)? Maybe there should be a Legolas.read_row?

hannahilea · 2023-02-21T15:28:19Z

Good point. Or update read to take an optional schema specification

read(io_or_path; validate::Bool=true, schema=missing)

kleinschmidt · 2023-02-21T15:33:00Z

I think that read by design does not take a schema argument

hannahilea · 2023-02-21T15:33:58Z

ah, that makes sense.

kleinschmidt · 2023-02-21T15:34:21Z

oh yikes I missed that write_model_row always imposes the ModelV1 schema though...

kleinschmidt · 2023-02-21T15:36:14Z

So as things currently stand, if we follow the recommendations of the readme (use write_model_row/read_model_row, use schema extensions for your own model), things will not work: the written model row may contain the extra fields, but the deserialized one definitely won't.

kleinschmidt · 2023-02-21T15:37:10Z

I think we shoudl follow the legolas template of:

write takes a schema argument (can default to ModelV1SchemaVersion)
read extracts teh schema from the Arrow table metadata.

ericphanson · 2023-02-21T15:50:23Z

Yeah, I don't think folks should actually be using write_model_row or read_model_row as-is (even with Legolas v0.4), but rather defining their own version with their schema downstream. They were included to kind of have a minimal interface so the README could have an example showing you can :just: read and write, but that was probably a bad idea, because in practice you want to be defining your own schema extension and your own functions to write/read that schema if needed. This is def a docs/explanation failure but also an interface one it sounds like.

kleinschmidt · 2023-02-21T16:00:30Z

in practice you want to be defining your own schema extension and your own functions to write/read that schema if needed

why though? what would those extensions be doing other than what these functions are doing, modulo the schema difference? I think it's pretty reasonable to provide these functions in a way that's generally useful. Right now they're just a footgun, since a user will see that they are used in teh examples adn exported and think they are meant for users to use, and then wonder why their code is breaking in mysterious ways when their extension columns go missing. If they're really meant just as examples/templates, we should remove them from the package and define them in-line in the README

ericphanson · 2023-02-21T16:07:28Z

If they're really meant just as examples/templates, we should remove them from the package and define them in-line in the README

Yeah, that's what I was saying I should've done/should do.

Right now they're just a footgun, since a user will see that they are used in teh examples adn exported and think they are meant for users to use, and then wonder why their code is breaking in mysterious ways when their extension columns go missing

Agreed, that's what I meant by interface failure.

why though? what would those extensions be doing other than what these functions are doing, modulo the schema difference?

True... OK maybe they can be useful then. In my usage we have modified them for backwards compat reasons (e.g. if we infer an old schema, we can upgrade to the new one on-the-fly), but OK I agree they can be useful except for the specific schema being hardcoded.

kleinschmidt · 2023-02-21T16:22:40Z

we have modified them for backwards compat reasons (e.g. if we infer an old schema, we can upgrade to the new one on-the-fly)

Yeah, I feel like this is the class of situations where in vanilla legolas land you are encouraged to use Arrow.read directly; since that fallback is always available, I think there isn't much downside to making the convenience functions we do provide useful

kleinschmidt · 2023-02-21T16:22:55Z

Okay I'll make a PR then with my proposed changes and some tests!

kleinschmidt · 2023-02-21T20:50:24Z

fixed by #26

kleinschmidt mentioned this issue Feb 21, 2023

feature request: write_record / read_record to read/write a single row from a record instance beacon-biosignals/Legolas.jl#80

Open

kleinschmidt closed this as completed Feb 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`read_model_row` does not work for schema extensions of `ModelV1` #24

`read_model_row` does not work for schema extensions of `ModelV1` #24

kleinschmidt commented Feb 21, 2023

ericphanson commented Feb 21, 2023 •

edited

Loading

hannahilea commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

hannahilea commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

hannahilea commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

ericphanson commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

ericphanson commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

read_model_row does not work for schema extensions of ModelV1 #24

read_model_row does not work for schema extensions of ModelV1 #24

Comments

kleinschmidt commented Feb 21, 2023

ericphanson commented Feb 21, 2023 • edited Loading

hannahilea commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

hannahilea commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

hannahilea commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

ericphanson commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

ericphanson commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

kleinschmidt commented Feb 21, 2023

`read_model_row` does not work for schema extensions of `ModelV1` #24

`read_model_row` does not work for schema extensions of `ModelV1` #24

ericphanson commented Feb 21, 2023 •

edited

Loading