Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Implement Map.deep_merge/2 and Map.deep_merge/3 #5339

Closed
wants to merge 1 commit into from

Conversation

PragTob
Copy link
Contributor

@PragTob PragTob commented Oct 19, 2016

This was first proposed in https://groups.google.com/forum/#!topic/elixir-lang-core/ak3kVqJ4-8g

People seemed generally in favor of it but the need for some adjustable behavior was made clear quickly as people had differing requirements (what about lists etc.?), hence there was also a need for Map.deep_merge/3.

To my knowledge (not knowing all the core members by heart) there was no green light by elixir-core members for this feature but some rather popular devs agreed. The discussions got a lot into the implementation/API details so I figured drafting up an implementation was the best course of action to see if this was truly a good idea (due to some delays it took me until now to finish up a half done implementation).

Please don't feel compelled to just accept it cause someone made the work. I'm perfectly aware this might be rejected and in that case will create a deep_merge hex package. I'm also aware, that for this to be fully fledged there should be a Keyword list version but only wanted to implement that once the Map version/general idea is green lit :)

Implementation Details + Performance

I decided to implement deep_merge/2 in terms of deep_merge/3 - had a specific implementation for it (called deep_merge_specific in the benchmark) that was only slightly faster (you can look it up here) so I decided for the easier to maintain version.

Benchmark code (from my elixir_playground repo - basically a map with 50 keys + some deeply nested structures:

base_map = (0..50)
           |> Enum.zip(300..350)
           |> Enum.into(%{})

# In the end those are 2 maps with 51 flat keys plus these
orig = Map.merge base_map, %{150 => %{1 => 1, 2 => 2}, 155 => %{y: :x}, 170 => %{"foo" => "bar"}, z: %{ x: 4, y: %{a: 1, b: %{c: 2}, d: %{"hey" => "ho"}}}, a: %{b: %{c: %{a: 99, d: "foo", e: 2}}, m: [33], i: 99, y: "bar"}, b: %{y: [23, 87]}, z: %{xy: %{y: :x}}}
new = Map.merge base_map, %{150 => %{3 => 3}, 160 => %{a: "b"}, z: %{ xui: [44], y: %{b: %{c: 77, d: 55}, d: %{"ho" => "hey", "du" => "nu", "hey" => "ha"}}}, a: %{b: %{c: %{a: 1, b: 2, d: "bar"}}, m: 12, i: 102}, b: %{ x: 65, y: [23]}, z: %{ xy: %{x: :y}}}

simple = fn(_key, _base, override) -> override end

Benchee.run %{
  formatters: [&Benchee.Formatters.Console.output/1,
               &Benchee.Formatters.PlotlyJS.output/1],
  plotly_js: %{file: "deep_merge_bench.html"}
},
%{
  "Map.merge/2"           => fn -> Map.merge orig, new end,
  "Map.merge/3"           => fn -> Map.merge orig, new, simple end,
  "deep_merge/2"          => fn -> DeepMerge.deep_merge orig, new end,
  "deep_merge_specific/2" => fn -> DeepMerge.deep_merge_specific orig, new end
}

Results:

tobi@happy ~/github/elixir_playground $ mix run bench/deep_merge.exs 
Erlang/OTP 19 [erts-8.1] [source-4cc2ce3] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Elixir 1.3.4
Benchmark suite executing with the following configuration:
warmup: 2.0s
time: 5.0s
parallel: 1
Estimated total run time: 28.0s

Benchmarking Map.merge/2...
Warning: The function you are trying to benchmark is super fast, making measures more unreliable! See: https://github.com/PragTob/benchee/wiki/Benchee-Warnings#fast-execution-warning

Benchmarking Map.merge/3...
Benchmarking deep_merge/2...
Benchmarking deep_merge_specific/2...

Name                            ips        average  deviation         median
Map.merge/2               2098.47 K        0.48 μs    ±13.25%        0.47 μs
Map.merge/3                 81.08 K       12.33 μs    ±31.98%       12.00 μs
deep_merge_specific/2       68.29 K       14.64 μs    ±37.77%       13.00 μs
deep_merge/2                68.14 K       14.68 μs    ±31.47%       14.00 μs

Comparison: 
Map.merge/2               2098.47 K
Map.merge/3                 81.08 K - 25.88x slower
deep_merge_specific/2       68.29 K - 30.73x slower
deep_merge/2                68.14 K - 30.80x slower

ips benchmark comparison

As one can see, deep_merge is a lot slower than merge/2 but that is largely due to the fact that merge/3 is a lot slower than merge/2 (that's why it's included in the benchmark but with the same behaviour as merge/2) - merge/2 just delegates to Erlang while merge/3 is a custom implementation that we can probably improve upon (in some other PR).

Ideas + feedback welcome :)

@@ -24,7 +24,7 @@ defmodule KeywordTest do

test "implements (almost) all functions in Map" do
assert Map.__info__(:functions) -- Keyword.__info__(:functions) ==
[from_struct: 1]
[deep_merge: 2, deep_merge: 3, from_struct: 1]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned before, I know that this is less than ideal but I wanna get the general deep_merge through before tackling it for Keyword

@@ -474,6 +474,77 @@ defmodule Map do
end

@doc """
Merges two maps similar to `Map.merge/2`, but with the important difference
Copy link
Member

@josevalim josevalim Oct 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first line of the documentation should always be a short summary.

@josevalim
Copy link
Member

I am still on the fence. It feels the most common use case would be something that knows how to merge across different data-types. Map.deep_merge/2 would do only half of the job, Keyword.deep_merge/2 the other half, but nothing that would know how to merge across types with reasonable semantics (and reasonable semantics vary per project).

I am particular biased because I have only needed deep_merge twice and it was once for Phoenix and once for Elixir.

@olafura
Copy link
Contributor

olafura commented Oct 19, 2016

There are definitely a lot of use cases for deep merge, we use it heavily
and we made a library for us to handle those cases.
https://github.com/planswell/kitchen-sink

We have both Map and Keyword deep_merge/2. We don't do list merges
for the values.

Plus I just commited a patch to Absinthe to fix a deep merge problem:
absinthe-graphql/absinthe#179

Maybe Enum.deep_merge is apt, though it would be hard to do stuff like
list merges well, though it might be possible.

@lexmag
Copy link
Member

lexmag commented Oct 19, 2016

I didn't comment in the mailing list discussion as I got feeling that people agreed that it could be useful but looks limited and doesn't seem like it belongs to the stdlib.
I feel more of 👎 on it, mostly, as @josevalim said, for semantics reason. For example, proposed implementation silently merges structs as well, it can be desirable behaviour in some cases but not always (e.g. Task, Stream structs).
As for me, I have needed "deep merge" once, and for that case proposed implementation would not work.

@olafura
Copy link
Contributor

olafura commented Oct 19, 2016

@lexmag We handle merging structs in our library ;)

@PragTob
Copy link
Contributor Author

PragTob commented Oct 19, 2016

This is just a draft, I wasn't aware of a problem with structs (which is my bad) but that would most certainly be fixable.

I'm of course also open to write another implementation that lives in Kernel and as a general deep_merge handles both Maps and Keywords (that was sort of the long term plan anyhow :) ).

Personally I used deep_merge a lot more often than some of the common Enum methods and noticed that Mix.Config also needed to implement it itself. Might be my own fault for reaching for nested structures to easily, though :)

Of course, I also understand that it is not the most common method overall and as I said perfectly fine with it if you say it's not stdlib worthy and people should write it themselves/rely on some other source :)

Thanks for the discussion!

@josevalim
Copy link
Member

@lexmag We handle merging structs in our library ;)

Sorry for being lazy and not checking the source but how was it solved? In my mind the correct way would be to use a protocol?

@olafura
Copy link
Contributor

olafura commented Oct 19, 2016

@josevalim So it's pretty basic I detect a struct and create an empty struct of the same kind then remove the default values from the map that needs to be merged. So then we have a clean map to merge with the struct. We also remove the __struct__ key value. We are using a Enum.filter.

If you want to set a variable to default the you just do that outside of the merge.

@olafura
Copy link
Contributor

olafura commented Oct 19, 2016

@josevalim I think protocols are a good solution, but the map one would just have to be aware of structs.

@josevalim
Copy link
Member

josevalim commented Oct 19, 2016

@olafura yup. Every implementation would need to have a logic that checks if two things are of the same type before invoking the protocol. The protocol could also be implemented for Any with a default of taking the second value as is.

I believe though there are still too many questions to be answered before moving forward. Using protocols seems to be the most correct but it feels a very niche feature to justify adding a new protocol to the language.

@josevalim josevalim closed this Oct 19, 2016
@olafura
Copy link
Contributor

olafura commented Oct 19, 2016

@josevalim I'm wondering if it would be good to do a staging like library for stdlib then we can just see how many people are using those staging functions roughly (we wouldn't see some private code bases like ours). I know that there is Experimental, maybe it would be a good candidate?

@josevalim
Copy link
Member

I don't believe it needs to be anything special for stdlib, just a regular library is fine. There is no reason to Experimental as well, it is just a random namespace after all. The only reason we used Experimental for GenStage was to avoid naming conflicts when it is merged into Elixir but as long as you pick a sane namespace, like DeepMerge, it should be fine. It is unlikely we would add it to Elixir as DeepMerge.

@PragTob PragTob deleted the map-deep_merge branch November 10, 2016 19:41
@PragTob
Copy link
Contributor Author

PragTob commented Nov 10, 2016

For anyone who may be stumbling upon this, I took the feedback from here and implemented it in the deep_merge library.

  • It handles both maps and keyword lists
  • It does not merge structs or maps with structs…
  • …but you can implement the simple DeepMerge.Resolver protocol for types/structs of your choice to also make them deep mergable
  • a deep_merge/3 variant that gets a function similar to Map.merge/3 to modify the merging behavior, for instance in case you don’t want keyword lists to be merged or you want all lists to be appended

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants