`read!` on byte array ignores custom `read` methods defined on the eltype #9327

twadleigh · 2014-12-12T13:00:38Z

I've written up an example script demonstrating a failure as a gist.

I also ran into another, seemingly related, issue where my read fn was doing big-endian-to-little-endian byte swaps, but the array version of read was ignoring that and reading the values as little endian. I'll see if I can put together a failing test case for that, as well.

twadleigh · 2014-12-12T13:12:49Z

I updated the gist to include both the failure to byte-reverse and to read too many bytes. The ouput is as follows:

sizeof A: 8
A(0x00010203,0x0405)
Bytes read for single value: 6
[A(0x03020100,0x0504)]
Bytes read per value for array: 8

I tested this on master. I haven't tried it on stable yet.

twadleigh · 2014-12-12T15:16:37Z

I can confirm that 0.3.3 has the bug as well. The definitions in base/io.jl (121--133) look correct, so this seems to be due to some under-the-hood chicanery.

twadleigh · 2014-12-13T04:51:16Z

OK, it turns out the issue is that, because my instance of IO was, in fact, an IOStream, the read!{T}(::IOStream, ::Array{T}) defined in base/iostream.jl at line 191 was invoked, instead of read!{T}(::IO, ::Array{T}) defined in base/io.jl at line 128. The latter internally calls read(::IO, ::Type{T}), as I was expecting, whereas the former uses sizeof to read uninterpreted bytes from the stream.

My sense is that it is unlikely that both should coexist in base as two methods of the same function---especially having such distinct semantics for so similar a signature. If I had to pick, I'd keep the more generic version in io.h.

I'd like to hear feedback about what the path forward should be (if there needs to be a fix).

timholy · 2014-12-13T10:21:24Z

Nice analysis, @twadleigh. And ouch, this is a nasty one. It almost seems there need to be two entirely different notions of read/write: one that includes the padding, and one without. This is actually pretty scary (how do you know which version was used to write the file?), and I'm surprised it hasn't come up before. If we have just one I agree with your pick, but OTOH mmapping big files requires the version that includes the padding. How do other languages handle this issue? Possibly relevant is #2555.

I'll also add permalinks for those line numbers you referenced:

julia/base/iostream.jl

Lines 191 to 202 in 62685fc

    
           function read!{T}(s::IOStream, a::Array{T}) 
        
               if isbits(T) 
        
                   nb = length(a)*sizeof(T) 
        
                   if ccall(:ios_readall, UInt, 
        
                            (Ptr{Void}, Ptr{Void}, UInt), s.ios, a, nb) < nb 
        
                       throw(EOFError()) 
        
                   end 
        
               else 
        
                   invoke(read!, (IO, Array), s, a) 
        
               end 
        
               a 
        
           end

julia/base/iostream.jl

Lines 191 to 202 in 62685fc

    
           function read!{T}(s::IOStream, a::Array{T}) 
        
               if isbits(T) 
        
                   nb = length(a)*sizeof(T) 
        
                   if ccall(:ios_readall, UInt, 
        
                            (Ptr{Void}, Ptr{Void}, UInt), s.ios, a, nb) < nb 
        
                       throw(EOFError()) 
        
                   end 
        
               else 
        
                   invoke(read!, (IO, Array), s, a) 
        
               end 
        
               a 
        
           end

(If you haven't discovered it yet, a cool trick is to hit "y" when you're on a source-code page; GitHub then "specializes" the URL to a specific commit, ensuring that if you select particular lines your link won't become out of date as the code changes.)

Finally, with regards to endian conversion, I almost wonder if that should be done via a wrapper around the io object.

twadleigh · 2014-12-13T14:29:47Z

@timholy: thanks for the tip on permalinks.

This was a nasty gotcha, a little bit faith-shaking, even. (And it immediately followed my getting burned by the "fill! copies a reference to a single value for compound types" gotcha for the first time---which is what led me to convert my types to immutables, only to fall into this trap!) I'm glad it turned out to be relatively innocuous. I guess I can say I'm happy to have used code_lowered and code_typed "in anger" for the first time.

I'll work out a PR that tries to deconflict these. I think, while I'm at it, I'll propose a new array constructor or two that is friendly to compound types.

JeffBezanson · 2014-12-13T19:23:17Z

Very related: #6466

I think we should considerably narrow the isbits test in read!{T}(s::IOStream, a::Array{T}). It probably dates from before we had immutables. Since it's already not using dispatch, we could put in a pretty complicated condition if we wanted, or just restrict it to known types like Float64 and Complex.

JeffBezanson · 2014-12-13T19:26:23Z

We could also add a FlatBinaryFormat TimHolyTrait, or the like.

twadleigh · 2014-12-13T20:58:31Z

Yeah, I was thinking of something like a TimHolyTrait so that isbits types could explicitly opt in. I'll go that route.

I think we should also make sure all the various specializations of read!{T}(::IO, ::Array{T}) have intuitively compatible semantics.

twadleigh · 2014-12-20T18:28:02Z

I just started to look into making a proposal to fix this. I noticed that the docs and help strings for read and write make mention of "canonical binary representation" (CBR, henceforth). The trap I fell into was when this assumption was violated.

The byte swapping issue I ran into may be fixed by associating with a stream an endianness that can differ from that of the hardware, in which case byte swapping happens implicitly. (Was this more or less what you were suggesting in your comment, @timholy?)

As for the other issue (stream footprint differs from memory footprint), it would seem there are two possible ways forward:

Make sure it is safe for people to provide read and write methods for their own types that work naturally in base. This seems like the most natural path forward to me, but would mean relaxing the CBR assumption for the semantics of read/write in every case. Of course, there would still be a Holy Trait (a.k.a., Virtue?) that would declare when the CBR assumption is safe.
Continue to enforce CBR for read/write as part of that function's interface, but provide another pair of analogous functions for types where CBR doesn't hold.

I'll start down the path for 1. but I also want to make sure that removing CBR from the stated semantics for read/write is OK.

twadleigh · 2021-11-04T23:06:17Z

Super stale. Closing.

twadleigh changed the title ~~read for array of immutable fails if the file footprint is not the same as the memory footprint.~~ read! in iostream.h shadows read! in io.h Dec 13, 2014

twadleigh changed the title ~~read! in iostream.h shadows read! in io.h~~ read! in iostream.jl shadows read! in io.jl Dec 13, 2014

JeffBezanson added the domain:io Involving the I/O subsystem: libuv, read, write, etc. label Jan 28, 2016

vtjnash changed the title ~~read! in iostream.jl shadows read! in io.jl~~ read! on byte array ignores custom read methods defined on the eltype Mar 14, 2016

twadleigh closed this as completed Nov 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`read!` on byte array ignores custom `read` methods defined on the eltype #9327

`read!` on byte array ignores custom `read` methods defined on the eltype #9327

twadleigh commented Dec 12, 2014

twadleigh commented Dec 12, 2014

twadleigh commented Dec 12, 2014

twadleigh commented Dec 13, 2014

timholy commented Dec 13, 2014

twadleigh commented Dec 13, 2014

JeffBezanson commented Dec 13, 2014

JeffBezanson commented Dec 13, 2014

twadleigh commented Dec 13, 2014

twadleigh commented Dec 20, 2014

twadleigh commented Nov 4, 2021

read! on byte array ignores custom read methods defined on the eltype #9327

read! on byte array ignores custom read methods defined on the eltype #9327

Comments

twadleigh commented Dec 12, 2014

twadleigh commented Dec 12, 2014

twadleigh commented Dec 12, 2014

twadleigh commented Dec 13, 2014

timholy commented Dec 13, 2014

twadleigh commented Dec 13, 2014

JeffBezanson commented Dec 13, 2014

JeffBezanson commented Dec 13, 2014

twadleigh commented Dec 13, 2014

twadleigh commented Dec 20, 2014

twadleigh commented Nov 4, 2021

`read!` on byte array ignores custom `read` methods defined on the eltype #9327

`read!` on byte array ignores custom `read` methods defined on the eltype #9327