Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow map to work on Iterators.Pairs #38150

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mcabbott
Copy link
Contributor

@mcabbott mcabbott commented Oct 23, 2020

map works on many iterators, but not on pairs, because Iterators.Pairs <: AbstractDict. Is there a good reason to prevent this?

julia> map(identity, Iterators.product(rand(2), 1:3))
2×3 Matrix{Tuple{Float64, Int64}}:
 (0.722276, 1)  (0.722276, 2)  (0.722276, 3)
 (0.122676, 1)  (0.122676, 2)  (0.122676, 3)

julia> map(identity, enumerate(rand(2)))
2-element Vector{Tuple{Int64, Float64}}:
 (1, 0.22322250333280813)
 (2, 0.6522875545740439)

julia> map(identity, pairs(rand(2)))
ERROR: map is not defined on dictionaries

Stumbled across this here: https://discourse.julialang.org/t/enumerate-multi-dimensional-array/48877/7

Edit: earlier discussion at #5794, about Dict as well as pair.

@nalimilan
Copy link
Member

Counter-argument: if we are serious about Pairs being AbstractDict, we have to keep their implementations of map consistent. So we should decide first what map should do on AbstractDict before implementing it for Pairs.

@mcabbott
Copy link
Contributor Author

mcabbott commented Oct 31, 2020

So I guess the concern (in brief) is that if map(f, dict) were later to do something like iterate over values & keep the keys to return another Dict, then Pairs would become an odd man out by iterating pairs.

julia> using Dictionaries

julia> Dictionary('a':'c', 2:4)
3-element Dictionary{Char,Int64}
 'a' │ 2
 'b' │ 3
 'c' │ 4

julia> map(sqrt, ans)
3-element Dictionary{Char,Float64}
 'a' │ 1.4142135623730951
 'b' │ 1.7320508075688772
 'c' │ 2.0

julia> map(x->x^2, x for x in ans)
3-element Array{Float64,1}:
 2.0000000000000004
 2.9999999999999996
 4.0

Another option would be to make collect do the simple iteration. With a type, collect(Float64, 1:5) works, and this could be generalised to

julia> Base.collect(f, itr) = collect(Base.Generator(f, itr));

julia> Base.collect(f, itrs...) = collect(Base.Generator(Base.splat(f), Iterators.product(itrs...)));

julia> map(enumerate(rand(Bool, 2,10))) do (i,v)
         i * v
       end
2×10 Array{Int64,2}:
 0  0  0  7   0  11  13  15  17  19
 2  4  6  0  10   0  14   0   0  20

julia> collect(pairs(rand(Bool, 2,10))) do (i,v)
           prod(i.I) * v
       end
2×10 Array{Int64,2}:
 1  0  3  4  5  6  0  0  0  10
 2  0  0  8  0  0  0  0  0   0

Which might be nice in any case, as a convenient way to write outer products with do notation:

julia> collect(1:2, 4:10) do x,y
         string(x, '/', y)
       end
2×7 Matrix{String}:
 "1/4"  "1/5"  "1/6"  "1/7"  "1/8"  "1/9"  "1/10"
 "2/4"  "2/5"  "2/6"  "2/7"  "2/8"  "2/9"  "2/10"

julia> ans == [string(x, '/', y) for x in 1:2, y in 4:10]
true

@nalimilan nalimilan added the status:triage This should be discussed on a triage call label Oct 31, 2020
@JeffBezanson
Copy link
Sponsor Member

I've always been skeptical of whether Iterators.Pairs is really a dictionary. The only thing it's supposed to be is an iterator over index-value pairs of some collection.

@JeffBezanson
Copy link
Sponsor Member

Oh right, we actually have a map(f, ::Any) method, so we have already conflated "map" with "collect the elements of an iterator into an array", which I don't think are the same thing. That's a bit unfortunate. For example something like map(identity, x) == x should hold.

@mcabbott
Copy link
Contributor Author

For example, map(identity, x for x in pairs(rand(2))) does now work, and gives a vector of Pairs.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 12, 2020

Right, I think that example shows why map(identity, x) == x is not able to hold.

@@ -272,6 +272,8 @@ setindex!(v::Pairs, value, key) = (v.data[key] = value; v)
get(v::Pairs, key, default) = get(v.data, key, default)
get(f::Base.Callable, v::Pairs, key) = get(f, v.data, key)

Base.map(f, A::Pairs) = collect(Base.Generator(f,A))
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps should be general (and in abstractdict.jl)?

Suggested change
Base.map(f, A::Pairs) = collect(Base.Generator(f,A))
Base.map(f, A::AbstractDict) = collect(Base.Generator(f, A))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

triage says yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this means that map(f, d::Dict) would return a vector? Then it's the same as map(f, d) and we have no way to apply a function and get a dictionary back like in Dictionaries.jl?

@oscardssmith oscardssmith removed the status:triage This should be discussed on a triage call label Nov 4, 2021
@aplavin
Copy link
Contributor

aplavin commented Sep 8, 2022

Really looking forward for map(pairs(A))!
Currently, map(pairs(A)) do (i, x) ... doesn't work in Julia, while map(enumerate(A)) do (i, x) ... does. So people are likely to write the latter even when they know about pairs, and this doesn't help in gaining a wider support for non-1-based arrays.

I believe it makes sense to define map(Pairs) (as this PR does) instead of map(AbstractDict). The former is basically unambiguous, while there are at least two different potential meaning for the latter, both totally reasonable.

However, would be ideal to have map(pairs(arr)) do (i, x) as close to map(arr) do x as possible. In particular, it should preserve the original array type: think OffsetArray, StructArray, KeyedArray, etc. This should be possible with similar, I think...

@@ -322,6 +322,8 @@ setindex!(v::Pairs, value, key) = (getfield(v, :data)[key] = value; v)
get(v::Pairs, key, default) = get(getfield(v, :data), key, default)
get(f::Base.Callable, v::Pairs, key) = get(f, getfield(v, :data), key)

Base.map(f, v::Pairs) = Base.collect_similar(values(v), Base.generator(f, v))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This version handles unusual arrays better, probably needs more tests:

julia> using StaticArrays

julia> z = SA[10, 20, 30];

julia> map(pairs(z)) do (ind, val)
         val + ind
       end
3-element MVector{3, Int64} with indices SOneTo(3):
 11
 22
 33

julia> using StructArrays

julia> x = StructArray([(sqr=i^2, str=string(i)) for i in 1:3]);

julia> map(pairs(x)) do (ind, nt)
         (; ind, nt)
       end
3-element StructArray(::Vector{Int64}, ::Vector{NamedTuple{(:sqr, :str), Tuple{Int64, String}}}) with eltype NamedTuple{(:ind, :nt), Tuple{Int64, NamedTuple{(:sqr, :str), Tuple{Int64, String}}}}:
 (ind = 1, nt = (sqr = 1, str = "1"))
 (ind = 2, nt = (sqr = 4, str = "2"))
 (ind = 3, nt = (sqr = 9, str = "3"))

With just collect not collect_similar (as in first commit), a StaticArray lead to a SizedVector, and a StructArray leads to a Vector. I believe that offsets should work with either version:

julia> using OffsetArrays;

julia> y = OffsetArray([10, 20, 30], 4);

julia> map(pairs(y)) do (ind, val)
         val + ind
       end
3-element OffsetArray(::Vector{Int64}, 5:7) with eltype Int64 with indices 5:7:
 15
 26
 37

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 8, 2022

it should preserve the original array type:

There are many counter examples to this, since map is expected to accept any iterable(s), and produce an array capable of holding the output. There is no particular reason to expect that output type must be similar to the input type(s).

@aplavin
Copy link
Contributor

aplavin commented Sep 9, 2022

Regular map(array) does its best to preserve the array type by calling similar somewhere inside. And this is extremely useful in generic programming, for both performance and convenience. Most other array functions try to behave the same when keeping the array type makes sense.

So, I don't really understand why

There is no particular reason to expect that output type must be similar to the input type(s).

Regular map does this, and making map(pairs(array)) preserve the array type turns out pretty simple, as evidenced by @mcabbott's latest commit.

@mcabbott
Copy link
Contributor Author

Yes, it seems nice to try to call similar in the same way as map does, even if not required. But this is a small detail, and the big question is whether to do this at all.

I'm not sure why Pairs is an AbstractDict, maybe an AbstractArray would have been better. Probably that can't be changed now. We could introduce a non-dictionary eachpair or something. I guess this PR instead just makes it less dict-like.

Note BTW that it does already work in map(f, x, y), and this knows about the shape of the array:

julia> map([10 20; 30 40], pairs(rand(2,2))) do x, (y, z)
         x + y[1]
       end
2×2 Matrix{Int64}:
 11  21
 32  42

julia> axes(pairs(rand(3,4)))
(Base.OneTo(3), Base.OneTo(4))

Pairs also doesn't support broadcasting. It could be fixed with something like this:

Base.broadcastable(p::Base.Pairs) = p
Base.ndims(::Type{Base.Pairs{A,B,C,D}}) where {A,B,C,D}  = ndims(D)

@aplavin
Copy link
Contributor

aplavin commented Sep 22, 2022

From the comments above, it looks like there is a total consensus that map(pairs) should work. Any issues remain with the implementation in this PR? Broadcasting mentioned by @mcabbott can be added (or not) separately, but it would be great to get map(pairs) into the next julia version.

@mcabbott
Copy link
Contributor Author

mcabbott commented Oct 6, 2022

Bump, for 1.9 perhaps?

Tests probably need elaborating for the collect_similar idea, and maybe broadcastable should be added too... is this worth doing?

@aplavin
Copy link
Contributor

aplavin commented Aug 31, 2024

A suggestion that may make the PR more focused, less controversial, and potentially lead to even more generic code in the future: to make map overloads only for specific pairs types at first.
Ie, the current PR implementation could be constrained to Pairs{<:AbstractArray}, then for Pairs{<:NamedTuple} it would be map(f, keys(nt), values(nt)), etc. This would ensure code using map(pairs) remains performant and doesn't eg turn tuples into vectors.

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants