Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot deserialize nested schema after adding optional field #95

Open
ericphanson opened this issue Jun 11, 2023 · 2 comments
Open

Cannot deserialize nested schema after adding optional field #95

ericphanson opened this issue Jun 11, 2023 · 2 comments

Comments

@ericphanson
Copy link
Member

ericphanson commented Jun 11, 2023

This is very similar to #94 but less scary because it throws an error (and this is actually how I realized that bug existed).

using Legolas
using Legolas: @schema, @version

@schema "test.issue" Issue

@version IssueV1 begin
    a::Int
end


@schema "test.parent" Parent
@version ParentV1 begin
    x::IssueV1
end


table = [ParentV1(; x=IssueV1(; a=1))]
Legolas.write("issue.arrow",table,  ParentV1SchemaVersion())

# New Julia session

using Legolas
using Legolas: @schema, @version

@schema "test.issue" Issue

@version IssueV1 begin
    a::Int
    b::Union{Missing, Int}
end


@schema "test.parent" Parent
@version ParentV1 begin
    x::IssueV1
end

Legolas.read("issue.arrow").x[1]

throws

ERROR: MethodError: no method matching get(::Int64, ::Symbol, ::Missing)

Closest candidates are:
  get(::Union{Tables.AbstractColumns, Tables.AbstractRow}, ::Union{Integer, Symbol}, ::Any)
   @ Tables ~/.julia/packages/Tables/AcRIE/src/Tables.jl:193
  get(::IdDict{K, V}, ::Any, ::Any) where {K, V}
   @ Base iddict.jl:101
  get(::Base.ImmutableDict, ::Any, ::Any)
   @ Base dict.jl:808
  ...

Stacktrace:
 [1] IssueV1(row::Int64)
   @ Main ~/Legolas.jl/src/schemas.jl:601
 [2] fromarrow(#unused#::Type{IssueV1}, x::Int64)
   @ ArrowTypes ~/.julia/packages/ArrowTypes/aTFES/src/ArrowTypes.jl:168
 [3] getindex(s::Arrow.Struct{IssueV1, Tuple{Arrow.Primitive{Int64, Vector{Int64}}}}, i::Int64)
   @ Arrow ~/.julia/packages/Arrow/ZPy8W/src/arraytypes/struct.jl:49
 [4] top-level scope
   @ REPL[9]:1
 [5] top-level scope
   @ ~/.julia/juliaup/julia-1.9.1+0.aarch64.apple.darwin14/share/julia/stdlib/v1.9/REPL/src/REPL.jl:1416

Again the issue is the fromarrow definition in

$Arrow.ArrowTypes.fromarrow(::Type{<:$R}, $(keys(record_fields)...)) = $R(; $(keys(record_fields)...))
which expects exactly the current fields to be present to be deserialized (even optional ones).

We can change that line to

        $Arrow.ArrowTypes.fromarrow(::Type{<:$R}, $(field_kwargs...)) = $R(; $(keys(record_fields)...))

which will work as long as new optional schemas are always added to the end of the list of columns (and the order doesn't change). This works by changing the signature to make those positional arguments optional, of the form a=missing, b=missing, etc. I'm not sure if that is acceptable or not, because it relies on the ordering. Xref #94

@ericphanson
Copy link
Member Author

One workaround here is to add new optional fields by declaring a whole new version of the nested schema, i.e.

@version IssueV2 begin
    a::Int
    b::Union{Missing, Int}
end

and add

Legolas.accepted_field_type(::ParentV1SchemaVersion, ::Type{IssueV1}) = Union{<:IssueV1, <:IssueV2}

to allow old parent tables to continue to validate, and update the field in the parent constructor:

@version ParentV1 begin
    x::IssueV2 = IssueV2(x)
end

@ericphanson ericphanson changed the title Cannot deserializing nested schema after adding optional field Cannot deserialize nested schema after adding optional field Jun 12, 2023
@jrevels
Copy link
Member

jrevels commented Dec 5, 2023

just calling out the example in the OP is "not supported" since adding the b::Union{Missing,Int} field would necessitate a version bump to Issue, and then you enter the regime described in your second comment where things work fine.

Adding b::Any without bumping the Issue version should indeed work, though. But probably a low priority bug in practice given that I doubt this use case is super common

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants