Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversions to and from hashmap #230

Closed
clauswilke opened this issue Jul 18, 2021 · 14 comments · Fixed by #254
Closed

Conversions to and from hashmap #230

clauswilke opened this issue Jul 18, 2021 · 14 comments · Fixed by #254

Comments

@clauswilke
Copy link
Member

See #229 (comment)

Should hashmaps be converted to lists or environments by default? I think both conversions should be available, and then one can be made the default if the user doesn't specify what result they want.

There's also the question what happens if an unnamed list is converted to a hashmap. It should probably fail with an error.

@yutannihilation
Copy link
Contributor

There's also the question what happens if an unnamed list is converted to a hashmap. It should probably fail with an error.

One more failing case is that when the list has duplicated name of entries (e.g. list(x = 1, x = 2)).

We might be able to have a one-way conversion by implementing only Into (c.f. [https://stackoverflow.com/a/29815270]), but, reading Into's document I think we should probably avoid implementing both From and Into if the conversion is not bidirectional.

So, maybe it should be TryFrom instead of From?

@yutannihilation
Copy link
Contributor

yutannihilation commented Jul 19, 2021

On the other hand, conversion between HashMap and Envrionment won't fail usually, but it's lossy in that HashMap cannot contain the information about the parent environment. In my understanding, From can be used for lossy conversions, but I'm not fully confident. I want to look around other crates' implementations.

@andy-thomason
Copy link
Contributor

The conversion is not ideal for lists as it doesn't conserve the order either.

But as Hiroaki says, environments do not preserve order either and take only the last duplicate assignment.

I primarily added hashmaps for the case where you just wanted to take an anonymous list or env and map
it to a R object. I've done this in RCpp a few times.

I hope to have a Serde running for structs at some point which will provide more complex options
to convert R to Rust and vice versa as well as to serialise Rust object in RDA files.

@brendanrbrown
Copy link
Contributor

for the issue of ordering, in the case of named list conversion: is BTreeMap worth considering for this conversion, rather than hashmap? at least iter for BTreeMap orders in a predictable way, sorting on keys. perhaps there are downsides i am not aware of, and it doesn't entirely resolve the problem for named list conversion.

https://doc.rust-lang.org/std/collections/struct.BTreeMap.html#method.iter

relevantiter call in the hashmap conversion presently:

let res: Robj = List::from_values(val.iter().map(|(_, v)| v)).into();

@clauswilke
Copy link
Member Author

@brendanrbrown I feel you're raising a slightly different question. The purpose of extendr is to provide bindings, and the user should decide what data structure to use. So if somebody wants to work with hashmaps they should be able to do so, and if somebody wants to work with BTreeMaps they should also be able to do so. So by all means, yes, we should support BTreeMaps. But that's a different question than what the default choices should be for hashmaps.

If you'd like to provide support for BTreeMaps you're welcome to add it.

@brendanrbrown
Copy link
Contributor

yes, good point

@yutannihilation
Copy link
Contributor

yutannihilation commented Jul 25, 2021

Sorry, I think I was a bit confused at the time of writing the above comment. Let me summarize my thoughts here.

In general, conversions can be categorized by the two characteristics

  • safe / fallible
  • lossless / lossy

and it seems the sense is that we should use From or TryFrom only when the conversion is lossless, while probably no official document explicitly prohibits using them for lossy ones (and actually there are such implementations). For example, the FromLossy / TryFromLossy RFC describes as following (this RFC isn't accepted yet, but I believe this part should be what everyone can agree):

  • safe, exact conversions (e.g. u32u64) are handled by the From trait (RFC 529)
  • fallible conversions (e.g. u32i8) are handled by the TryFrom trait (RFC 1542)
  • lossy conversions (e.g. i64f32)
  • lossy fallible conversions (e.g. f64u64)

On the other hand, regarding HashMap, the R-to-Rust conversions are lossy. There's no problem to implement Rust-to-R direction, by the way.

from to fallible? lossy? note
HashMap list no no
HashMap environment no no it' not obvious what env should be the parent
list HashMap yes (fails when there are unnamed or duplicated elements) yes (the order will be lost)
environment HashMap no yes (the order and the parent env will be lost)

I think the problem is that these two are conflicting.

  • our strategy to provide all R-to-Rust conversions via TryFrom
  • the common assumption that TryFrom is for lossless conversions

so, our choice can be:

  1. tweak extendr macro to accept some other way than TryFrom, probably a dedicated constructor method (e.g. to_hashmap(), from_list())
  2. violate the common assumption; just implement TryFrom and don't care the lossiness

and I'm inclined to accept the option 2. Option 1 should be technically possible, but I don't feel it's worth the effort.

(I do want to document the general rule of our conversions. I'd appreciate your comments and suggestions on #244!)

@clauswilke
Copy link
Member Author

After reading the Rust RFC discussion, I'm thinking that option 1 is better. It makes the conversion explicit without being much more work. In this scenario, if I wanted to write a function that accepts a hashmap, I'd have to instead accept a list and then call an explicit constructor to convert the list to a hashmap. That's one additional line of code (so not a big burden) and it guarantees that I've at least thought about the fact that I'm converting a list to a hashmap.

I would prefer the same approach when returning a hashmap back to R, as it would require me to think about whether I want to return a list or an environment. And in the case of environment, I'd probably also have to figure out and explicitly state what the parent env is.

So all around a cleaner solution, for a minor cost in additional code.

@clauswilke
Copy link
Member Author

In other words, instead of writing this:

#[extendr]
fn list_str_hash(x: HashMap<&str, Robj>) {
    // do some computation on x
}

we'd have to write something like this:

#[extendr]
fn list_str_hash(y: List) {
    let x = HashMap<&str, Robj>::from_list(y);
    // do some computation on x
}

And similarly when returning to R. I like the second option better.

@yutannihilation
Copy link
Contributor

yutannihilation commented Jul 26, 2021

Thanks. So, what you are suggesting is that it's users who write the conversion explicitly, not that #[extendr] macro does automatically, right? If so, I think I agree with you.

Sorry, I enumerated the options under the assumption that we are to provide auto-conversion. Obviously, we have an option not to provide. So, the complete list of options would be:

  1. provide auto-conversion in #[extendr] macro
    1. tweak extendr macro to accept some other way than TryFrom, probably a dedicated constructor method (e.g. to_hashmap(), from_list())
    2. violate the common assumption; just implement TryFrom and don't care the lossiness
  2. leave the conversion as users' job
    1. provide a dedicated constructor method (e.g. to_hashmap(), from_list())
    2. provide nothing (I think this option is improbable, but for completeness)

@clauswilke
Copy link
Member Author

Exactly. I think 2.i is the right choice for types for which TryFrom creates issues of data loss.

@yutannihilation
Copy link
Contributor

I see. Then I agree with you!

@clauswilke
Copy link
Member Author

@andy-thomason Are you Ok with this plan of not providing TryFrom for hashmaps but instead providing a constructor that needs to be called explicitly?

@andy-thomason
Copy link
Contributor

Yes. That sounds like a good plan. It is much more explicit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants