Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read! on byte array ignores custom read methods defined on the eltype #9327

Closed
twadleigh opened this issue Dec 12, 2014 · 10 comments
Closed
Labels
domain:io Involving the I/O subsystem: libuv, read, write, etc.

Comments

@twadleigh
Copy link
Contributor

I've written up an example script demonstrating a failure as a gist.

I also ran into another, seemingly related, issue where my read fn was doing big-endian-to-little-endian byte swaps, but the array version of read was ignoring that and reading the values as little endian. I'll see if I can put together a failing test case for that, as well.

@twadleigh
Copy link
Contributor Author

I updated the gist to include both the failure to byte-reverse and to read too many bytes. The ouput is as follows:

sizeof A: 8
A(0x00010203,0x0405)
Bytes read for single value: 6
[A(0x03020100,0x0504)]
Bytes read per value for array: 8

I tested this on master. I haven't tried it on stable yet.

@twadleigh
Copy link
Contributor Author

I can confirm that 0.3.3 has the bug as well. The definitions in base/io.jl (121--133) look correct, so this seems to be due to some under-the-hood chicanery.

@twadleigh
Copy link
Contributor Author

OK, it turns out the issue is that, because my instance of IO was, in fact, an IOStream, the read!{T}(::IOStream, ::Array{T}) defined in base/iostream.jl at line 191 was invoked, instead of read!{T}(::IO, ::Array{T}) defined in base/io.jl at line 128. The latter internally calls read(::IO, ::Type{T}), as I was expecting, whereas the former uses sizeof to read uninterpreted bytes from the stream.

My sense is that it is unlikely that both should coexist in base as two methods of the same function---especially having such distinct semantics for so similar a signature. If I had to pick, I'd keep the more generic version in io.h.

I'd like to hear feedback about what the path forward should be (if there needs to be a fix).

@timholy
Copy link
Sponsor Member

timholy commented Dec 13, 2014

Nice analysis, @twadleigh. And ouch, this is a nasty one. It almost seems there need to be two entirely different notions of read/write: one that includes the padding, and one without. This is actually pretty scary (how do you know which version was used to write the file?), and I'm surprised it hasn't come up before. If we have just one I agree with your pick, but OTOH mmapping big files requires the version that includes the padding. How do other languages handle this issue? Possibly relevant is #2555.

I'll also add permalinks for those line numbers you referenced:

julia/base/iostream.jl

Lines 191 to 202 in 62685fc

function read!{T}(s::IOStream, a::Array{T})
if isbits(T)
nb = length(a)*sizeof(T)
if ccall(:ios_readall, UInt,
(Ptr{Void}, Ptr{Void}, UInt), s.ios, a, nb) < nb
throw(EOFError())
end
else
invoke(read!, (IO, Array), s, a)
end
a
end

julia/base/iostream.jl

Lines 191 to 202 in 62685fc

function read!{T}(s::IOStream, a::Array{T})
if isbits(T)
nb = length(a)*sizeof(T)
if ccall(:ios_readall, UInt,
(Ptr{Void}, Ptr{Void}, UInt), s.ios, a, nb) < nb
throw(EOFError())
end
else
invoke(read!, (IO, Array), s, a)
end
a
end

(If you haven't discovered it yet, a cool trick is to hit "y" when you're on a source-code page; GitHub then "specializes" the URL to a specific commit, ensuring that if you select particular lines your link won't become out of date as the code changes.)

Finally, with regards to endian conversion, I almost wonder if that should be done via a wrapper around the io object.

@twadleigh twadleigh changed the title read for array of immutable fails if the file footprint is not the same as the memory footprint. read! in iostream.h shadows read! in io.h Dec 13, 2014
@twadleigh twadleigh changed the title read! in iostream.h shadows read! in io.h read! in iostream.jl shadows read! in io.jl Dec 13, 2014
@twadleigh
Copy link
Contributor Author

@timholy: thanks for the tip on permalinks.

This was a nasty gotcha, a little bit faith-shaking, even. (And it immediately followed my getting burned by the "fill! copies a reference to a single value for compound types" gotcha for the first time---which is what led me to convert my types to immutables, only to fall into this trap!) I'm glad it turned out to be relatively innocuous. I guess I can say I'm happy to have used code_lowered and code_typed "in anger" for the first time.

I'll work out a PR that tries to deconflict these. I think, while I'm at it, I'll propose a new array constructor or two that is friendly to compound types.

@JeffBezanson
Copy link
Sponsor Member

Very related: #6466

I think we should considerably narrow the isbits test in read!{T}(s::IOStream, a::Array{T}). It probably dates from before we had immutables. Since it's already not using dispatch, we could put in a pretty complicated condition if we wanted, or just restrict it to known types like Float64 and Complex.

@JeffBezanson
Copy link
Sponsor Member

We could also add a FlatBinaryFormat TimHolyTrait, or the like.

@twadleigh
Copy link
Contributor Author

Yeah, I was thinking of something like a TimHolyTrait so that isbits types could explicitly opt in. I'll go that route.

I think we should also make sure all the various specializations of read!{T}(::IO, ::Array{T}) have intuitively compatible semantics.

@twadleigh
Copy link
Contributor Author

I just started to look into making a proposal to fix this. I noticed that the docs and help strings for read and write make mention of "canonical binary representation" (CBR, henceforth). The trap I fell into was when this assumption was violated.

The byte swapping issue I ran into may be fixed by associating with a stream an endianness that can differ from that of the hardware, in which case byte swapping happens implicitly. (Was this more or less what you were suggesting in your comment, @timholy?)

As for the other issue (stream footprint differs from memory footprint), it would seem there are two possible ways forward:

  1. Make sure it is safe for people to provide read and write methods for their own types that work naturally in base. This seems like the most natural path forward to me, but would mean relaxing the CBR assumption for the semantics of read/write in every case. Of course, there would still be a Holy Trait (a.k.a., Virtue?) that would declare when the CBR assumption is safe.
  2. Continue to enforce CBR for read/write as part of that function's interface, but provide another pair of analogous functions for types where CBR doesn't hold.

I'll start down the path for 1. but I also want to make sure that removing CBR from the stated semantics for read/write is OK.

@JeffBezanson JeffBezanson added the domain:io Involving the I/O subsystem: libuv, read, write, etc. label Jan 28, 2016
@vtjnash vtjnash changed the title read! in iostream.jl shadows read! in io.jl read! on byte array ignores custom read methods defined on the eltype Mar 14, 2016
@twadleigh
Copy link
Contributor Author

Super stale. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

No branches or pull requests

3 participants