Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disentangle map and collect generic functions #48510

Open
goretkin opened this issue Feb 3, 2023 · 9 comments
Open

disentangle map and collect generic functions #48510

goretkin opened this issue Feb 3, 2023 · 9 comments
Labels
domain:collections Data structures holding multiple items, e.g. sets

Comments

@goretkin
Copy link
Contributor

goretkin commented Feb 3, 2023

When I originally opened #39628 , I realized I should have made two issues, and didn't This is the non-doc portion of that issue.

Where possible, it is desirable to have properties such as

map(identity, c) == c

See #38150 (comment) . As that thread indicates, that property doesn't currently hold, because map is conflated with collect. To relate the behavior of map and collect, I expect the following property

map(f, collect(c)) == collect(map(f, c))

One example where the property doesn't hold is with Generators:

julia> c = (x for x in 1:4)
Base.Generator{UnitRange{Int64},typeof(identity)}(identity, 1:4)

julia> map(identity, c)
4-element Array{Int64,1}:
 1
 2
 3
 4

That property could hold with the following definitions:

Base.:(::(typeof(identity)), f) = f
Base.map(f, g::Base.Generator) = Base.Generator(f  g.f, g.iter)

This will break code that was using map for its collect behavior. Some examples (with some false positives from splatting arguments from a generator into map) can be found with the regex map\([^,]*,\s*\(.*for.*in .

@jakobnissen recently made similar comments on Slack

(For v2.0)

@N5N3
Copy link
Member

N5N3 commented Feb 4, 2023

I think we can do this for Iterators.map before 2.0?

@brenhinkeller brenhinkeller added the domain:collections Data structures holding multiple items, e.g. sets label Feb 7, 2023
@goretkin
Copy link
Contributor Author

That search link no longer works: here is an updated link:
https://juliahub.com/ui/Search?q=map%5C%28%5B%5E,%5D%2a,%5Cs%2a%5C%28.%2afor.%2ain&type=code&r=true

An example true positive of the search: https://github.com/zgornel/EmbeddingsAnalysis.jl/blob/e045d5c9169a29a68d66aad15297348de1863e8e/test/vocab_reduction.jl#L5

The current behavior (illustrated below) shows why changing the behavior of map on Generator to not also collect can be problematic:

julia> gen1 = (fill(i, (3, )) for i = 1:4)
Base.Generator{UnitRange{Int64}, var"#21#22"}(var"#21#22"(), 1:4)

julia> gen2 = (sum(vec) for vec in gen1)
Base.Generator{Base.Generator{UnitRange{Int64}, var"#21#22"}, var"#23#24"}(var"#23#24"(), Base.Generator{UnitRange{Int64}, var"#21#22"}(var"#21#22"(), 1:4))

julia> arr2 = map(sum, gen1)
4-element Vector{Int64}:
  3
  6
  9
 12

julia> @show typeof(arr2) typeof(gen2)
typeof(arr2) = Vector{Int64}
typeof(gen2) = Base.Generator{Base.Generator{UnitRange{Int64}, var"#21#22"}, var"#23#24"}
Base.Generator{Base.Generator{UnitRange{Int64}, var"#21#22"}, var"#23#24"}

julia> eltype(arr2)
Int64

julia> eltype(gen2)
Any

@uniment
Copy link

uniment commented Feb 18, 2023

An example true positive of the search:

It seems like that specific example could be fixed by introducing a method of permutedims that takes Generators? After that, the ./= operator works with generators.

I wonder if there's an efficient way to ask the compiler what the return type of a function will be, given certain argument types? If so, then eltype could be made to work on generators.

If those two issues could be addressed, then is there any code that this fix would break?

@jariji
Copy link
Contributor

jariji commented Feb 18, 2023

@uniment said

I wonder if there's an efficient way to ask the compiler what the return type of a function will be, given certain argument types?

Core.Compiler.return_type is undocumented and not totally reliable, but that's more or less what it does.

@uniment
Copy link

uniment commented Feb 19, 2023

Seems to work:

julia> Base.eltype(::Type{Base.Generator{I,F}}) where {I,F} =
           Core.Compiler.return_type(Tuple{F,eltype(I)})

julia> @btime eltype(i/j for i=1:4, j=1:4)
  1.000 ns (0 allocations: 0 bytes)
Float64

julia> @btime eltype(ch+1 for ch='a':'z')
  1.000 ns (0 allocations: 0 bytes)
Char

There's a discrepancy though, which can be fixed by special-casing Tuple:

julia> eltype(x+1 for x=(2,2.0))
Any

julia> Base.eltype(::Type{Base.Generator{T,F}}) where {T<:Tuple,F} =
           typejoin(map(P->Core.Compiler.return_type(Tuple{F,P}),Tuple(T.parameters))...)

julia> @btime eltype(x+1 for x=(2,2.0))
  1.000 ns (0 allocations: 0 bytes)
Real

(I had to convert T.parameters in a Tuple because calling map on a Core.svec currently collects into a Vector.)

Dunno if a specialization would be helpful or not:

julia> Base.eltype(::Type{Base.Generator{T,F}}) where {T<:NTuple{N,E},F} where {N,E} =
           Core.Compiler.return_type(Tuple{F,E})

@uniment
Copy link

uniment commented Feb 20, 2023

Making adjoint work with generators:

julia> Base.reverse(itr::Iterators.ProductIterator) = Iterators.ProductIterator(reverse(itr.iterators))

julia> Base.adjoint(g::Base.Generator{I,F}) where {I<:Iterators.ProductIterator{Tuple{Any,Any}},F} =
           Base.Generator(adjoint  g.f  reverse, reverse(g.iter))

julia> Base.adjoint(g::Base.Generator) = # for vectors
           Base.Generator(adjoint  g.f  (((_,x),)->x), Iterators.ProductIterator((1:1,g.iter)))

Test:

julia> collect((exp(i*im) for i=1:3)') == collect((exp(i*im) for i=1:3))'
true

julia> collect((exp(i+j*im) for i=1:3, j=1:4)') == collect((exp(i+j*im) for i=1:3, j=1:4))'
true

@uniment
Copy link

uniment commented Feb 20, 2023

Making permutedims work with generators:

julia> Base.permutedims(g::Base.Generator{<:Iterators.ProductIterator}, perm) =
           Base.Generator(g.f  (args -> map(p -> args[p], perm)), Iterators.ProductIterator(map(p -> g.iter.iterators[p], perm)))

julia> Base.permutedims(g::Base.Generator{<:Iterators.ProductIterator}) = permutedims(g, (2, 1))

julia> Base.permutedims(g::Base.Generator) = permutedims(Base.Generator(g.f  ((x,_),)->x, Iterators.ProductIterator((g.iter, 1:1))))

Test:

julia> collect(permutedims(x for x=1:3)) == permutedims(collect(x for x=1:3))
true

julia> collect(permutedims(i+j for i=1:3, j=1:4)) == permutedims(collect(i+j for i=1:3, j=1:4))
true

julia> collect(permutedims((i+j+k for i=1:3, j=1:4, k=1:5), (3,1,2))) == permutedims(collect(i+j+k for i=1:3, j=1:4, k=1:5), (3,1,2))
true

@goretkin
Copy link
Contributor Author

goretkin commented Feb 23, 2023

An example true positive of the search:

It seems like that specific example could be fixed by introducing a method of permutedims that takes Generators? After that, the ./= operator works with generators.

I wonder if there's an efficient way to ask the compiler what the return type of a function will be, given certain argument types? If so, then eltype could be made to work on generators.

If those two issues could be addressed, then is there any code that this fix would break?

This has been discussed before, but I'll try to summarize here. There is a considerable extent to which the functional/behavioral semantics of Julia code should not depend on type inference, which exists almost exclusively in service of generating efficient code.

There has been a deliberate choice not to rely on Core.Compiler.return_type in many situations that are tempting to use it. I believe map does use it, however.

julia> Core.Compiler.return_type(sin, Tuple{Int64})
Float64

julia> Core.Compiler.return_type(sin, Tuple{Int64})
Float64

julia> sum(Vector{Float64}())
0.0

julia> map(sin, 1:0)
Float64[]

julia> sum(Base.Generator(sin, 1:0))
ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
[...]

julia> sum(Base.Generator(identity, 1:0))
0

collect is actually fairly complicated, and I think going down the approach of aligning the interface to Base.Generator to look like the interface to the return value of collect (e.g. Vector{Any}) is more involved than it may appear.

@uniment
Copy link

uniment commented Feb 24, 2023

There has been a deliberate choice not to rely on Core.Compiler.return_type in many situations that are tempting to use it. I believe map does use it, however.

An argument can be made that Generators are exceptional enough to be granted special consideration. In #34674 it is said,

generators are said to be the lazy variant of map

this is consistent with that view.

I think going down the approach of aligning the interface to Base.Generator to look like the interface to the return value of collect (e.g. Vector{Any}) is more involved than it may appear.

"The journey of a thousand miles begins with one step." - Lao Tzu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests

5 participants