Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traits for lossy conversions #3415

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

Aloso
Copy link

@Aloso Aloso commented Apr 14, 2023

Add lossy numeric conversions as an alternative to the as operator, and deprecate as for lossy numeric casts in a future edition, so

let n = f64::PI as usize;

becomes

let n: usize = f64::PI.approx();
// or
let n = f64::PI.approx::<usize>();

RENDERED

This solves the problem that when you see as, you don't know what it does. Does it truncate? saturate? round towards zero? lose numeric precision? Or is it lossless? When as for lossy conversions is deprecated, as is guaranteed to be lossless.

@Lokathor
Copy link
Contributor

Bikeshed which sounds fake but that I'm really serious about: that name is too long.

@Aloso Aloso changed the title Traits for lossy conversions Lossy conversions Apr 14, 2023
@Aloso
Copy link
Author

Aloso commented Apr 14, 2023

@Lokathor I changed the name.

@Lokathor
Copy link
Contributor

Clarification: the method name is too long, not the PR name :3

@ehuss ehuss added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Apr 14, 2023
@Aloso
Copy link
Author

Aloso commented Apr 14, 2023

@Lokathor the section Future possibilities: Inherent methods has a possible solution for that.

EDIT: The inherent methods are now the main part of the RFC.

@Aloso Aloso changed the title Lossy conversions Traits for lossy conversions Apr 14, 2023
@thomcc
Copy link
Member

thomcc commented Apr 14, 2023

Deprecating as seems like a step too far to me, as it will cause an enormous amount of churn in essentially every Rust codebase for what is arguably a stylistic preference.

Not to mention, for some time in the future, as will still be required, as trait usage in const context is still some ways off.

@Aloso
Copy link
Author

Aloso commented Apr 14, 2023

@thomcc I hope that the need for lossy conversions will decrease by making the compiler smarter. For example with this proposal, called "pattern types", the compiler knows the range in which a number is. If this range fits in the type you want to cast to, you can use as instead of .truncating_into(), and the compiler will check that is in fact lossless:

fn foo(x: u32 is 0..1000) {
    let _ = x as u16;
}

This means that you don't have to spell long method names in most cases, and you can be more confident that the code is correct, because the compiler checks that x fits into a u16.

@Lokathor
Copy link
Contributor

if it's less terse than as u8 or similar then it's annoyingly long when you have to stick it in a math expression

@Aloso
Copy link
Author

Aloso commented Apr 14, 2023

You do not want truncation to happen in a math expression, do you? Then using as is likely a bug, and what you actually need is .try_into()?.

@Lokathor
Copy link
Contributor

I most often use as for going one way or the other between f32 and an int type of some size, and it does exactly what i expect, no bug.

@Aloso
Copy link
Author

Aloso commented Apr 14, 2023

I think I'll update the RFC to include the inherent methods discussed in "future possibilities". Then instead of as f32 you can write .approx() (assuming that type inference succeeds), which is only 2 characters longer.

EDIT: Done.

@programmerjake
Copy link
Member

I think wrapping should be used as the name instead of truncating, since technically converting -0x1234i16 as i8 == -0x34i8 or 0x56789ABCu32 as i16 == -0x6544i16 is wrapping (it wraps around 0x12 and 0x5679 times respectively assuming I did my math right).

for method naming, I like:

  • wrap_to (int -> int for all ints)
  • sat_to (int -> int for all ints)
  • trunc_sat_to (for floats -> ints conversion, since, if you ignore NaN, they are equivalent to saturate(trunc(v)) -- trunc in the sense of rounding toward zero; implemented for all floats/ints -> all ints; int -> int is identical to sat_to)
  • lossy_to (for f64 -> f32 and large ints -> float, but implemented for all ints & floats)
// all of these traits are designed to be extensible for things like
// BigInt or BigFloat or a 3rd party u256 or u5 or f16 implementation
pub trait WrapFrom<T> {
    fn wrap_from(v: T) -> Self;
}
pub trait WrapTo<T> {
    fn wrap_to(self) -> T;
}
impl<F, T> WrapTo<T> for F {..}
// maybe also have impl<T> WrapFrom<T> for T or impl<T: Into<U>, U> WrapFrom<T> for U

// SatFrom/SatTo/TruncSatFrom/TruncSatTo/LossyFrom/LossyTo just like above
impl all-primitive-ints-and-floats {
    pub fn wrap_to<T>(self) where Self: WrapTo<T> {
        WrapTo::wrap_to(self)
    }
    pub fn sat_to<T>(self) where Self: SatTo<T> {
        SatTo::sat_to(self)
    }
    pub fn trunc_sat_to<T>(self) where Self: TruncSatTo<T> {
        TruncSatTo::trunc_sat_to(self)
    }
    pub fn lossy_to<T>(self) where Self: LossyTo<T> {
        LossyTo::lossy_to(self)
    }
}
macro_rules! impl_from {
    ([$($from:ident),*] => $to:ident $Tr:ident::$f:ident($v:ident) $body:block) => {
        $(impl $Tr<$from> for $to {
            #[inline]
            fn $f($v: $from) -> Self $body
        })*
    };
    ($from:tt => [$($to:ident),*] $Tr:ident::$f:ident($v:ident) $body:block) => {
        $(impl_from!($from => $to $Tr::$f($v) $body);)*
    };
}
impl_from! {
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize] =>
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize]
    WrapFrom::wrap_from(v) {
        v as _ // can be compiler magic instead of `as` in the future
    }
}
impl_from! {
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize] =>
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize]
    SatFrom::sat_from(v) {
        if v < 0 {
            if Self::MIN < 0 {
                if v.wrap_to::<i128>() < Self::MIN.wrap_to::<i128>() {
                    Self::MIN
                } else {
                    v.wrap_to()
                }
            } else {
                0
            }
        } else if v.wrap_to::<u128>() > Self::MAX.wrap_to::<u128>() {
            Self::MAX
        } else {
            v.wrap_to()
        }
    }
}
impl_from! {
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize] =>
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize]
    TruncSatFrom::trunc_sat_from(v) {
        v.sat_to()
    }
}
impl_from! {
    [f32, f64] =>
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize]
    TruncSatFrom::trunc_sat_from(v) {
        v as _
    }
}
impl_from! {
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize, f32, f64] =>
    [u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize, f32, f64]
    LossyFrom::lossy_from(v) {
        v as _
    }
}

@Lokathor
Copy link
Contributor

@Aloso I share your concern about inference.

let rf = (ru as f32) / (u8::MAX as f32);

This is an entirely normal line of code you might see anywhere, but if these were turned into calls to approx() we might get some inference problems suddenly.

@pitaj
Copy link
Contributor

pitaj commented Apr 14, 2023

@lebensterben

this is too specific and dependent on the architecture. I think it's better to say it will drop least significant bits to fit the memory.

Don't you mean the opposite? Generally the most significant bits are dropped.

@lebensterben
Copy link

@pitaj

I meant to say least significant digit. (e.g. "4" in 0.1234)

@pitaj
Copy link
Contributor

pitaj commented Apr 15, 2023

Oh, you were talking about floats. Understood

@lebensterben
Copy link

@pitaj

I've deleted my previous comments because I was mistaken. TruncatingFrom is about integer types not floats.

@VitWW
Copy link

VitWW commented Apr 15, 2023

The Loosy is sound unclear and wide meaning.
Precise1 / SinglePrecise (as Single precision for f32) is a better one.
And Precise2 / DoublePrecise for f64 (from f128)
Squeezed/Crammed also looks much better than Saturating.

@Aloso
Copy link
Author

Aloso commented Apr 15, 2023

I have tried my best to incorporate the feedback into the RFC. Any further feedback is of course welcome!

@VitWW
Copy link

VitWW commented Apr 16, 2023

I like word truncate, but truncate is by definition a Real to Integer operation.
It is modular or remainder operation:

 42_u8.remainder::<i8>()  == 42
-24_i8.remainder::<u8>()  == 232  // u8::MAX + 1 - 24
232_u8.remainder::<i8>()  == -24  // u8::MAX + 1 - 232
280_i16.remainder::<u8>() == 24   // 280 % u8::MAX = 24  <<-- here

@VitWW
Copy link

VitWW commented Apr 16, 2023

And nearest is more mathematical, than saturate

 42_i8.nearest::<u8>()  == 42
-14_i8.nearest::<u8>()  == 0    // u8::MIN
169_u8.nearest::<i8>()  == 127  // i8::MAX
280_i16.nearest::<u8>() == 255  // u8::MAX

@Aloso
Copy link
Author

Aloso commented Apr 16, 2023

@VitWW the remainder has not the same result for negative integers. modulo also doesn't make sense when the target type is a signed integer.

EDIT: modular works if we define it as the only result for value (mod 2n) that lies in the output type's range, where n is the output's number of bits. But that doesn't seem more intuitive than truncate to me.

nearest is a good idea, but I went with saturate because that name is already used in the standard library.

@programmerjake
Copy link
Member

if the traits are implementable by 3rd party crates (so not sealed or perma-unstable), I strongly think there should be both From and Into versions and the inherent functions should use the Into versions, this allows more flexibility when 3rd party crates try to implement conversions between 2 3rd party types or a 3rd party type and a standard type from 2 independent crates, since the From traits can only be implemented in some crates, and the Into traits can only be implemented in other crates due to which type is the Self type and which is the trait argument and rust's rules that prevent overlapping implementations by restricting who can implement what.

@sunfishcode
Copy link
Member

The word "truncating" has an existing common meaning in the context of numeric conversions. This sense of the word truncate is also already present in Rust. It's confusing that this RFC is using the word "truncate" with a different meaning.

The proposed ApproxFrom here sometimes rounds-to-nearest and sometimes rounds-toward-zero. It sometimes preserves NaN and sometimes maps NaN to 0. What's the advantage of having an ApproxFrom trait that groups different kinds of conversions into a single interface, compared to perhaps adding the float conversions as inherent functions that can have interfaces specialized to their individual needs?

@VitWW
Copy link

VitWW commented Apr 16, 2023

@Aloso
Hm.., maybe congruent suits better then remainder.
Anyway, a reminder could be negative as well (Negative remainder r1 = -r is congruent ~ to positive reminder r2 = base - r) and vice versa (Positive remainder r1 = +r is congruent ~ to negative reminder r2 = r - base)
Base base = 2⁸ is always congruent ~ to zero: r0 = 0

-24_i8.congruent::<u8>()  == 232  // 2⁸ - 24
232_u8.congruent::<i8>()  == -24  // 232 - 2⁸
536_i16.congruent::<u8>() == 24   // 536 mod 2⁸

@Noratrieb
Copy link
Member

this has some overlap with rust-lang/libs-team#204

@Lokathor
Copy link
Contributor

wrap_to might provide a terse but still quite literal explanation of what's happening, eg: -24_i8.wrap_to::<u8>()

@programmerjake
Copy link
Member

wrap_to might provide a terse but still quite literal explanation of what's happening, eg: -24_i8.wrap_to::<u8>()

agreed, as mentioned previously: #3415 (comment)

@Aloso
Copy link
Author

Aloso commented Apr 17, 2023

I'm open to using the word "wrap" instead of "truncate" here, but I think the name wrap_to would be weird when the type is inferred, for example

fn foo(x: u8, y: i8) -> u8 {
    x + y.wrap_to()  // what?
}

I'm not a fan of the name trunc_sat_to, which is why I used a single method+trait for both truncation and wrapping.

@Lokathor
Copy link
Contributor

Honestly, would that even work? I think inference would just fail entirely since the int types have more than one Add impl, and so the compiler would get confused.

@programmerjake
Copy link
Member

Honestly, would that even work? I think inference would just fail entirely since the int types have more than one Add impl, and so the compiler would get confused.

yeah, it fails: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=25f46a7fabb838c24006cb166d2104fa

@Aloso
Copy link
Author

Aloso commented Apr 17, 2023

That is unfortunate. I just checked that x + y as _ also fails. So there might be fewer cases where the type can be omitted than I thought.

@AaronKutch
Copy link

As someone who has developed const capable bigint crates with no inference issues, I'm going to make the crazy suggestion that we might not want traits for these types of conversions. Stuff like From and TryFrom have a lot of generic multiplexing uses, but in the context of lossy conversions, there are a lot of edge cases (e.x. widening and shortening arithmetic around u8 and u128) and necessary specialization that tends to mean we always use macros or custom traits instead of std traits and generic functions when implementing over the different integers (I would still be using macro_rules everywhere because the lossy casts aren't the only thing). The many different types of lossy conversion is making the interaction between terseness, inference, constness and other things pretty bad. as casts have the type attached to them, and if you don't have the type in at least one consistent direction of the inference chain, it breaks down and we have turbofish everywhere.

What I would do is use macro_rules to separately define i64::wrap_to_u64(), f64::..._to_i32 and so on. So, we would have

const fn foo(x: u8, y: i8) -> u8 {
    x + y.wrap_to_u8()
}

We can really go to town on bikeshedding since the additional _i/u/fX suffix prevents a lot of collisions, in my crates we simply use x.to_i8(), x.to_u32(), x.to_f64() to denote the most straight forward kind of truncation and reinterpretation of the bits for integers, and round-to-even when floats are involved (ok, actually I use truncation for floats, but out of the 5 rounding modes that IEEE-754 has, the one that is optimal for 99% of real use cases is round-to-even or banker's rounding). If we add more rounding modes it will result in severe inflation of documentation, so we probably want a function that takes a rounding mode enum as an argument.

I would additionally suggest not adding direct conversions between integers that are both of different sizes and signedness. The thing that personally weirded me about as casts is the precise definition of what happens when you use something like u32 as i64. Does it do the sign conversion first and then potentially sign extend to 64 bits? Or, does it expand first to 64 bits and then make it signed? I can't remember at the moment what as does for that case, and instead I always separate it into two as casts. I have shot myself in the foot with this once with usize conversion because it can mean different things on different platforms.

@programmerjake
Copy link
Member

programmerjake commented Apr 20, 2023

I would additionally suggest not adding direct conversions between integers that are both of different sizes and signedness.

I strongly disagree.

The thing that personally weirded me about as casts is the precise definition of what happens when you use something like u32 as i64. Does it do the sign conversion first and then potentially sign extend to 64 bits? Or, does it expand first to 64 bits and then make it signed?

as for integers always takes the input value as a mathematical unbounded integer, then wraps the value to the output type.

I have shot myself in the foot with this once with usize conversion because it can mean different things on different platforms.

on every platform it takes the input value and wraps to the output type.

@AaronKutch
Copy link

In the case of uX -> iX, we just use to_iX to reinterpret the bits as two's complement. If the sign bit gets clobbered, there's no other kind of sensical casting that will help you. We have TryInto for handling that kind of fallibility.

In the case of iX -> uX, we use the existing unsigned_abs if we want to preserve absolute value losslesly, otherwise we use to_uX for reinterpretation just like as casts.

In the case of uX -> uY, the clear winner is plain truncation and zero extension, use to_uY.

In the case of iX -> iY, sign extension if X < Y and truncation if X > Y (we have TryInto for fallibility of numerical representation), so simple to_iY. If we don't want sign extension, cast to unsigned first.

In the case of u/iX -> fY, define to_fY to mean whatever the as cast currently means, which uses roundTiesToEven and infinity on overflow.

fX -> fY is to_fY, just use lossless or round-to-even as currently defined for as.

The current as case of fX -> u/iX is the only suspicious case to me, they use truncation and NaN to 0. This direction is the hardest, there was a lot of discussion before the NaN case was decided on. We may not want to define to_... for float-to-integer conversion, and instead have the specifiable rounding mode function for this.

@programmerjake
Copy link
Member

I really prefer v.wrap_to::<u64>() or even v as u64 over v.wrap_to_u64() because:

  1. sometimes the type can be omitted.
  2. when you need to use macros to be generic over integer types, having wrap_to_u64 means you have to pass in that function name as a separate macro ident argument, since concat_idents isn't stable, whereas having the type as a standard generic argument means that isn't necessary.

@AaronKutch
Copy link

means you have to pass in that function name as a separate macro ident argument

This part I can't overcome unfortunately. I feel like as casts can't be properly phased out.

@programmerjake
Copy link
Member

The thing that personally weirded me about as casts is the precise definition of what happens when you use something like u32 as i64. Does it do the sign conversion first and then potentially sign extend to 64 bits? Or, does it expand first to 64 bits and then make it signed?

as for integers always takes the input value as a mathematical unbounded integer, then wraps the value to the output type.

an easy way to think about what happens is to think of all of uX/iX as being representative of the mathematical integer you get by zero/sign-extending the bits infinitely far, and then always truncate that infinitely wide integer to the result type.

e.g.:
0b10110010 as i8 as u16 is really 0b...111110110010_i∞ as u16 == 0b1111111110110010u16

@VitWW
Copy link

VitWW commented Apr 25, 2023

Previous alternative RFC discussion: Conversions: FromLossy and TryFromLossy traits #2484

0000-lossy-conversions.md Show resolved Hide resolved

However, I am convinced that removing (or at least reducing) this papercut will make Rust safer and prevent more bugs. This is similar in spirit to the `unsafe` keyword, which makes Rust more verbose, but also more explicit about potential problems.

# Prior art
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there existing libraries doing something similar? I tried writing one myself, but I am the only user there. Would be great if there are some idiomatic libraries to sandbox this before it gets into the language.

0000-lossy-conversions.md Show resolved Hide resolved
0000-lossy-conversions.md Show resolved Hide resolved
0000-lossy-conversions.md Show resolved Hide resolved
This list of conversions should be implemented:

- `TruncatingFrom` and `SaturatingFrom`:
- all **signed** to **unsigned** integers
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure it is a good idea to make u32 -> u16 and i16 -> u16 have the same trait? The implication feels quite different.

Copy link
Author

@Aloso Aloso May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure at all. I'm open to introducing more granular traits and methods, if most people prefer it. But I find it difficult to judge the community's attitude towards this matter. Of course, people who are happy with the proposal in its current form are underrepresented in this discussion.

0000-lossy-conversions.md Show resolved Hide resolved
Comment on lines +200 to +201
- `u32`, `u64`, `u128`, `i8`, `i16`, `i32`, `i64`, `i128` into `usize`
- `u16`, `u32`, `u64`, `u128`, `i32`, `i64`, `i128` into `isize`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that u8 is not part of isize because isize is always a superset, but would it be more consistent to make them symmetric?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting to implement saturating and truncating methods for lossless conversions? I don't think that makes sense. Or what kind of symmetry are you looking for?


3. Instead of deprecating `as` only for lossy numeric casts, it could be deprecated for all numeric casts, so `From`/`Into` is required in these situations.

This feels like overkill. If people really want to forbid `as` for lossless conversions, they can use clippy's `cast_lossless` lint.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any justification that people do not already use cast_lossless? Or might it simply be because it is too troublesome to do .into() and specify the proper type (because we don't have .into::<T>())?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, cast_lossless isn't used because of the more verbose syntax. The rationale for this lint is: as can sometimes perform lossy conversions, so using it even for lossless conversions is bad, because it may silently become lossy when changing types while refactoring.

Once lossy as casts produce a warning, there will be no benefit to forbidding lossless as casts.


4. The `approx()` method could have a more descriptive name. Likewise, `truncate()` isn't ideal since it sometimes wraps around. I am open to better suggestions (though bear in mind that having multiple methods, like `truncate()`, `wrap()` and `truncate_and_wrap()` will make the feature more complicated and harder to learn).

5. `truncate` and `saturate` could be abbreviated as `trunc` and `sat`. This would make it more concise, which is appealing to people who convert between numbers a lot.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a trunc method in f64, not sure if that would cause confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.