Simplify UUIDv8 Hash-based Example #147

Closed
danielmarschall opened this issue Sep 13, 2023 · 103 comments · Fixed by #158

Comments

@danielmarschall
Contributor

danielmarschall commented Sep 13, 2023

(Continuation of a discussion in #144)

I want to propose that part of UUIDv8 gets split off into a new format, UUIDv9. I hope there is a slight chance that it is still possible at this stage.

Current situation: UUIDv8 has three functions:

  1. Fully custom UUID (defined by vendor)
  2. Custom time-based
  3. New name-based

Proposal:

  1. Fully custom UUID stays UUIDv8
  2. Custom time-based stays UUIDv8
  3. New name-based UUID becomes UUIDv9

Why do I handle custom time-based and custom name-based differently?

With Section C.8 (name-based example) and Appendix B (hash spaces, #143), we have a pretty clear definition of how such a name-based UUID is calculated. We know the OIDs of a lot of algorithms and have a mechanism for how to convert the OID into a hash space ID (#143). Even if we don't know the hash space ID, then IANA will probably have it in their registry (#144). The calculation of the UUID is defined very well, and therefore, if the hash space ID is unambiguous, then the UUID is unambiguous.

In contrast to custom time-based UUIDs (which are very custom, because the vendor can define the length of the time part, clock sequence, random part, etc.), the new name-based version does not allow changing any contents/fields.

So, I do not think that these new name-based UUIDs should be considered "custom". They are not "custom" in my opinion. Therefore I think they should have their own format, UUIDv9.

Since custom time-based and fully custom UUID are very custom, I don't think it would be an issue if they both share the same version (UUIDv8). After all, people who create custom UUIDs (either by defining a custom time-format, or a fully custom UUID) know that their custom definition is not standardized and might cause collisions.


By the way (personal opinion): I really dislike the time-based example (in section C.7), because the nanoseconds-resolution is rather extreme, causing a wrap-around of the time very quickly. I would have lowered the resolution to make the wrap-around take ~100 years. But on the other hand, the text "It should be noted that this example is just to illustrate one scenario for UUIDv8." is clear that this is not a fixed definition.
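
The wrap-around concern can be checked with a little arithmetic. The following is only a sketch: the field widths and tick sizes passed in are hypothetical inputs chosen by the caller, not values taken from section C.7, and `wrapAroundYears` is an illustrative name.

```go
package main

import "math"

// wrapAroundYears returns roughly how many years a timestamp field of the
// given bit width lasts before it overflows, when one tick of the field
// represents tickNanos nanoseconds. (Hypothetical helper for illustration.)
func wrapAroundYears(bits int, tickNanos float64) float64 {
	const nanosPerYear = 365.25 * 24 * 3600 * 1e9
	// math.Ldexp(x, n) computes x * 2^n, i.e. the full span of the field.
	return math.Ldexp(tickNanos, bits) / nanosPerYear
}
```

Calling `wrapAroundYears(48, 1)` versus `wrapAroundYears(48, 1e6)` shows how strongly the tick size (nanoseconds versus milliseconds) drives the wrap-around period for the same number of bits.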

@fabiolimace

fabiolimace commented Sep 13, 2023

Maybe off-topic, but...

Is it too late to replace the compound adjective “name-based” with “hash-based” in v3, v5 (and v9); or at least treat them as interchangeable terms in the document? I see both when I inspect some implementations.

@danielmarschall
Contributor Author

Hm… I think I used both terms, but without intention. I think it is okay if UUIDv9 gets named like UUIDv3 and UUIDv5 (i.e. name-based).

@danielmarschall danielmarschall changed the title from "UUIDv9 for new hash-based UUIDs" to "UUIDv9 for new name-based UUIDs" Sep 13, 2023
@fabiolimace

fabiolimace commented Sep 13, 2023

I still have no opinion on the creation of UUIDv9. But we should be careful with expressions like "fully custom UUIDv8" and "time-based UUIDv8" as if they were side by side in the hierarchy.

There should be only one UUIDv8, which is "Fully Custom UUID" (if I'm correct). The time-based example is just one instance of a possible implementation of UUIDv8.

I think the confusion remains because UUIDv8 was originally a time-based UUID with custom timestamp precision. There are a lot of outdated webpages that still say that UUIDv8 is, in essence, a time-based version.

However, all UUIDv8 requirements have been eliminated except the variant and version bits, making it fully custom/free form/proprietary format.

Please, see the original discussion about UUIDv8: uuid6/uuid6-ietf-draft#31.
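
To make the "only the variant and version bits are fixed" point concrete, here is a minimal sketch. The `UUID` type and `NewV8` helper are hypothetical names for illustration; the bit positions follow the usual layout shared by the other UUID versions (version in the high nibble of octet 6, variant in the top two bits of octet 8).

```go
package main

// UUID is a 128-bit value in network byte order.
type UUID [16]byte

// NewV8 stamps arbitrary application-defined bytes as a UUIDv8.
// Only the version nibble and the two variant bits are overwritten;
// the remaining 122 bits stay exactly as the caller supplied them.
func NewV8(custom [16]byte) UUID {
	u := UUID(custom)
	u[6] = (u[6] & 0x0f) | 0x80 // version 8 in the high nibble of octet 6
	u[8] = (u[8] & 0x3f) | 0x80 // variant 0b10 in the top two bits of octet 8
	return u
}

// Version returns the version nibble of the UUID.
func (u UUID) Version() int { return int(u[6] >> 4) }
```

Whatever meaning the other 122 bits carry (time, hash, counter, anything else) is entirely up to the implementation, which is why a v8 value on its own says nothing about how it was produced.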

@mcr
Contributor

mcr commented Sep 13, 2023

Uhm. It's really about six months too late for this suggestion.

@mcr
Contributor

mcr commented Sep 13, 2023

Is it too late to replace the compound adjective “name-based” with “hash-based” in v3, v5 (and v9); or at least treat them as interchangeable terms in the document? I see both when I inspect some implementations.

That is probably a tolerable change at this point.

@danielmarschall
Contributor Author

danielmarschall commented Sep 13, 2023

Uhm. It's really about six months too late for this suggestion.

Actually, I do feel terrible for making a proposal at this point. But maybe there is a slight chance that it is possible. I really wish I had known about the UUID-Rev group 6+ months ago, then I would have contributed much more :-)

In theory (if the proposal is accepted in substance), the editing steps are rather small:

  • Rename section 5.9 (Nil UUID) and 5.10 (Max UUID) to 5.10 and 5.11.
  • Move section C.8 (Example of UUIDv8) and rename it to 5.9 "UUID Version 9".
  • Section 3.2 (Abbreviations): Add UUIDv9
  • Section 5.5 (UUIDv5): In the recommendation for other hashes, change UUIDv8 to UUIDv9.
  • Section 6.5 (Best practices): Change name-based UUIDs using UUIDv8 to UUIDv9
  • Appendix B: Change UUIDv8 to UUIDv9.
  • Check references still valid after renaming sections? (But I guess the Editor will do it in the finalization?)

If my proposal could be formally and substantially accepted, then I would like to help as best I can by reading the whole draft from top to bottom to verify all references.

@ben221199

Custom time-based stays UUIDv8

I don't agree on this one yet.

@mcr
Contributor

mcr commented Sep 14, 2023

In theory (if the proposal is accepted in substance), the editing steps are rather small:

Then, get all the formal reviews that have occurred to redo their work. That's what we've been doing for the past six months.
Your list is too detailed and half of it would be mechanical.
As I understand it, the substantive part would seem to be that we would be making Appendix C.8 into UUID Version 9.
That would make the text normative rather than informative, and would chew up another vX number, of which we have only 8 left.

I am still not in any way convinced I understand why this matters.
Please take this to the list.

@ben221199

If UUIDv8 is FULLY custom, we shouldn't give implementation examples in my opinion. So, I think that the time-based version has to go.
Also, adding a UUID Hashspace IDs registry only makes sense when there is a version using them, but I also don't think UUIDv8 should be that version.

@danielmarschall
Contributor Author

danielmarschall commented Sep 14, 2023

If UUIDv8 is FULLY custom, we shouldn't give implementation examples in my opinion.

No, I think examples are always good.

Also, adding a UUID Hashspace IDs registry only makes sense when there is a version using them, but I also don't think UUIDv8 should be that version.

I agree with that.

I have expressed my concerns earlier (#127) that I think it is weird that the only use of the non-example-section "hash spaces" is in an example-section - and now, even an IANA registry is planned (#144). Why should IANA register hash spaces, when they are only used in an "example" of our draft? So I agree with @ben221199 that there should be UUIDv9 in case we want the hashspace thing to be recognized by IANA. An exception would be if the idea of hash spaces (i.e. UUID identifying an algorithm that is not already defined by an OID) might be useful in future standards.

I am still not in any way convinced I understand why this matters.

I think one of the main reasons is the question "Why should IANA (or anyone else) care about hash spaces when they are only used in an "example" of our draft?"

Also, don't you think that modern hash algorithms are important enough to get their own version? It is a strong signal for people to stop using UUIDv3 and UUIDv5.

That would make the text normative rather than informative

I have no idea what you are talking about. UUIDv9 behaves the same as UUIDv3 and UUIDv5, just with different algorithm and a hash space.

Please take this to the list.

I don't understand. Which list are you referring to?


But in any case, I want to say that it is very important for me that the custom name-based (i.e. non-MD5/SHA1) UUIDs stay, especially since both MD5 and SHA1 are insecure. Please do not remove the name-based example or the hash-space appendix. They are an amazing idea, even if they use the wrong UUID version in my opinion :-)

@ben221199

No, I think examples are always good.

Examples are, but for me it now seems more like a version definition than an example. I think we should be more clear about that. I don't understand the test vectors of UUIDv8 either.

For all other parts, I think that @danielmarschall and I are mostly on the same page.

@LiosK
Contributor

LiosK commented Sep 17, 2023

I'd point out one thing that seems to be missed in the recent discussion around the new name-based scheme: the uniqueness of hashspace-based UUIDs is absolutely dependent on the hash function used.

The current draft allows arbitrary hash functions chosen by implementers, and thus the uniqueness is dependent on the application-specific choice of hash functions. This application-specific nature makes the hashspace approach a very good example of v8.

In my opinion, it isn't really useful or meaningful to spare a separate version number without eliminating this application-specific nature, and to achieve that, we have to do at least the following work:

  • Select the appropriate hash functions to be allowed
  • Substantiate a reasonable level of uniqueness guarantee between UUIDs created from the selected hash functions

Perhaps, it's considerably late to complete these.

In addition, the above bullet points pose a subtle question relating to #143 and #144: what hash functions should we officially list in the specification? The current hashspace approach accepts whatever hash function (because it's v8), but it doesn't make sense to list and promote unsafe hash functions in the specification. I don't know how IANA works, but if it doesn't have a mechanism to reject nonsensical hash functions from being listed, IANA will not be a good choice. The OID-based hashspace IDs would make noise because it allows any hash function to get an official-ish hashspace ID without selection.

@danielmarschall
Contributor Author

danielmarschall commented Sep 18, 2023

I'd point out one thing that seems to be missed in the recent discussion around the new name-based scheme: the uniqueness of hashspace-based UUIDs is absolutely dependent on the hash function used.

The current draft allows arbitrary hash functions chosen by implementers, and thus the uniqueness is dependent on the application-specific choice of hash functions. This application-specific nature makes the hashspace approach a very good example of v8.

I am not sure if I understood what you mean. Yes, if different hash methods are used, the resulting UUIDv9 is different. However, you need to think of the hash algorithm as an input of the hash function.

UUIDv3(Namespace, Value) := MD5(Namespace || Value)
UUIDv5(Namespace, Value) := SHA1(Namespace || Value)
UUIDv9(Hash, Namespace, Value) := Hash(HashId || Namespace || Value)

As I mentioned above, if we stay at UUIDv8, then we do not know "which" UUIDv8 was chosen. Was it a fully custom, a custom time format, or a custom hash format? We don't know.

For UUIDv9, if the hash is unambiguous, then the UUIDv9 is unambiguous.
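
The `Hash(HashId || Namespace || Value)` construction above could look roughly like the following sketch, using SHA-256 as one possible hash. This is an assumption-laden illustration, not the draft's definition: `NameBased` is a hypothetical helper, the hash space ID is whatever 16-byte value the caller passes in (no registry values are hard-coded here), and keeping the leftmost 128 bits of the digest is an assumed truncation rule.

```go
package main

import "crypto/sha256"

// UUID is a 128-bit value in network byte order.
type UUID [16]byte

// NameBased computes SHA-256(hashID || namespace || name), keeps the
// leftmost 128 bits of the digest, and stamps the requested version
// nibble plus the standard variant bits. Pass version 8 for the current
// draft's example, or 9 for the version proposed in this issue.
func NameBased(version byte, hashID, namespace UUID, name []byte) UUID {
	h := sha256.New()
	h.Write(hashID[:])
	h.Write(namespace[:])
	h.Write(name)

	var u UUID
	copy(u[:], h.Sum(nil)) // copy stops after 16 bytes, truncating the digest
	u[6] = (u[6] & 0x0f) | ((version & 0x0f) << 4)
	u[8] = (u[8] & 0x3f) | 0x80
	return u
}
```

Because the hash ID is part of the hashed input, two different algorithm identifiers give unrelated outputs for the same namespace and name, which is the sense in which an unambiguous hash ID makes the result unambiguous.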

The current hashspace approach accepts whatever hash function (because it's v8), but it doesn't make sense to list and promote unsafe hash functions in the specification. I don't know how IANA works, but if it doesn't have a mechanism to reject nonsensical hash functions from being listed, IANA will not be a good choice. The OID-based hashspace IDs would make noise because it allows any hash function to get an official-ish hashspace ID without selection.

Whether a hash algorithm is "unsafe" depends on the point in time. We know a few hash functions which are unsafe today. But we don't know if SHA2 or SHA3 might become unsafe tomorrow, if someone finds a flaw. So safe/unsafe should be out of scope for the RFC. However, the requirement of the hash function should be that it has at least 122 bits output (or it needs to be zero-padded).

I think the selection of hash algorithms in the current draft is good. It contains the NIST algorithms which are VERY well-known and are currently (2023) safe to use. If other hash algorithms emerge in the future and/or SHA2 and SHA3 become very insecure, then there can be a revision of the RFC with a different Appendix B. But this is not mandatory because the mechanism of hash spaces and/or the IANA registry of hashes allows the developer to simply choose a different algorithm.
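
The "at least 122 bits, otherwise zero-padded" requirement mentioned above might be handled like this. It is only a sketch: `fitDigest` is a hypothetical helper, and both choices made here (keep the leftmost bytes of a long digest, append zeros to the right of a short one) are assumptions rather than anything the draft prescribes.

```go
package main

// fitDigest returns exactly 16 bytes: the leftmost 16 bytes of a longer
// digest, or a shorter digest followed by zero padding on the right.
func fitDigest(digest []byte) [16]byte {
	var out [16]byte     // arrays are zero-filled by default
	copy(out[:], digest) // copies min(len(digest), 16) bytes
	return out
}
```

The version and variant bits would still be written over the result afterwards, so only 122 of the 128 bits actually come from the digest.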

@sergeyprokhorenko

sergeyprokhorenko commented Sep 18, 2023

As I mentioned above, if we stay at UUIDv8, then we do not know "which" UUIDv8 was chosen. Was it a fully custom, a custom time format, or a custom hash format? We don't know.

Yes, and we don't care at all. This hash UUID was not the intent of this RFC. And now this hash UUID is delaying the approval of everyone's expected UUIDv7. The uniqueness of the hash UUID is questionable not only because of problems with satisfactory hash functions, but also because the hash functions argument is not unique. Worse, the variability of the hash functions argument leads to variability in the hash UUID, but a volatile key is unsuitable for databases and other information systems. And hash UUIDs are also unordered, so they are no better than UUIDv4. We should throw hash UUID into the UUIDv8 category as soon as possible and stop delaying final RFC approval.

@danielmarschall
Contributor Author

danielmarschall commented Sep 18, 2023

Yes, and we don't care at all.

Who is "we"?

This hash UUID was not the intent of this RFC.

And why did it end up in the latest draft then?

a volatile key is unsuitable for databases and other information systems

Then you also need to strike UUIDv4.

@chorman0773

Worse, the variability of the hash functions argument leads to variability in the hash UUID, but a volatile key is unsuitable for databases and other information systems

Then you don't use them for databases.

I would like a standardized mechanism for non-SHA-1 hash UUIDs for sure. I use UUIDs as device identifiers in the operating system I'm working on, and standard device names get fixed IDs assigned by using namespace UUIDs. Using something other than SHA-1 or MD5 would be nice.

@LiosK
Contributor

LiosK commented Sep 18, 2023

@danielmarschall

The current hashspace method accepts an arbitrary hash function, so I can define one as follows, give it a hashspace ID, and generate UUIDv8 name-based UUIDs.

import "time"

// MiracleHash ignores its input and always returns the same all-zero digest.
func MiracleHash(message []byte) []byte {
    digest := make([]byte, 16)   // prepare zero-filled byte sequence
    time.Sleep(42 * time.Second) // wait for a miracle to compute digest
    return digest
}

This MiracleHash function is obviously nonsense, but it's absolutely fine within the v8 space if it meets the application-specific needs.

However, out of the v8 space, in my opinion, the spec must reject unsafe (in terms of uniqueness) hash functions. UUID isn't an almighty identifier framework but is just a universally unique identifier standard. All the versions provided in the document (except for v8, which is clearly marked as implementation-specific) must provide a reasonable guarantee of universal uniqueness, because that is exactly what general readers of the standard are looking for. This is my personal opinion, but I believe this way of thinking maximizes the utility of the UUID standard.

The hashspace approach as in the current draft should be good enough as an informative guidance, but there are several questions we have to answer to make it a normative definition that gives a reasonable uniqueness guarantee. For example:

  • What hash functions should we allow? Do we really need all the SHA-2 and SHA-3 items? We have to pick one not just because it has some users but because it provides uniqueness.
  • Is it safe to truncate a digest to 128 bits? FIPS 180-4 explicitly allows truncation while other standards do not.
  • Is it safe to mix multiple hash functions in one version space? Do similar SHA functions not correlate with each other? SHA functions are each designed not to produce the same value, but the collision resistance across multiple algorithms shouldn't have been a primary design goal.
  • What if a hash function is proven unsafe in the future? How do we remove from a list of name-based UUIDs an unsafe algorithm-based UUID that may collide with other values generated in the future?

These questions were put on hold when the current hashspace approach was first introduced because it was expected to be just an informative example. Thinking twice, we might conclude this approach is not the best option to deal with the above questions. Perhaps, we need to reconsider the other ideas of name-based schemes we have explored before. Anyway, developing a normative name-based scheme isn't that easy, and perhaps we have run out of time.

Btw, I don't really mind the delay this causes. I am thankful to @danielmarschall for raising this discussion.

@ben221199

If you say UUIDv8 is fully custom, it is not good to give an example that is the only place where hashspaces are used and then register those hashspaces at IANA. Then the only justification for hashspaces is based on an example; and then you are actually not talking about an example anymore, but about a new format.

UUIDv8 is fully custom. You are able to give implementation examples, but know that giving examples can suggest that they are real formats that need to be implemented, which should not be the case. So caution is advised.

My suggested options for the time-based UUIDv8 example:

  • Remove it to avoid confusion
  • Revise the text to be FULLY clear about it being an example

My suggested options for the name-based UUIDv8 example:

  • Remove it to avoid confusion
  • Move it to UUIDv9

@sergeyprokhorenko

Yes, and we don't care at all.

Who is "we"?

This hash UUID was not the intent of this RFC.

And why did it end up in the latest draft then?

a volatile key is unsuitable for databases and other information systems

Then you also need to strike UUIDv4.

@danielmarschall

The goals of this RFC are well stated in 2.1. Update Motivation
Essentially the main goal is to replace UUIDv1 and UUIDv4 with UUIDv7 as a key in databases. All other goals are secondary

@ben221199

The goals of this RFC are well stated in 2.1. Update Motivation
Essentially the main goal is to replace UUIDv1 and UUIDv4 with UUIDv7 as a key in databases. All other goals are secondary

I'm fine with UUIDv6 and UUIDv7 as far as I know. So if we fix or drop UUIDv8, the RFC could be published in my opinion. However, I think many also want UUIDv8 in the same RFC, but not with the chaotic definition it has now.

@danielmarschall
Contributor Author

@LiosK In re your MiracleHash: I didn't read the definition of "Hash", but I think one very strong requirement is that it is deterministic?

@sergeyprokhorenko

@ben221199

I agree that the UUIDv8 examples should be removed completely as they can be taken as a guide to action.
I don't mind UUIDv6 and the fixed UUIDv8 without examples

@danielmarschall
Contributor Author

Removing UUIDv8 examples (without introducing UUIDv9) would mean that we strike the complete SHA2/SHA3 functionality and we are left with MD5 and SHA1. That would be horrible.

@sergeyprokhorenko

sergeyprokhorenko commented Sep 18, 2023

@danielmarschall

Removing UUIDv8 examples (without introducing UUIDv9) would mean that we strike the complete SHA2/SHA3 functionality and we are left with MD5 and SHA1. That would be horrible.

You can simply list deprecated technologies and accepted technologies for UUIDv8 without examples. You can also specify a list of UUIDv8 categories (time-based, hash-based and so on).

@mcr
Contributor

mcr commented Sep 18, 2023

@sergeyprokhorenko

@danielmarschall, Please clarify your proposal taking into account the discussion that took place.

It would be terrible if we had to re-do the entire approval process of this long-awaited RFC because of frankly completely useless details regarding almost unused hash UUIDs.

@mcr, I completely agree with @LiosK's point of view and give him my vote

@danielmarschall
Contributor Author

@danielmarschall, Please clarify your proposal taking into account the discussion that took place.

Okay. I will carefully read through the discussion and adjust the initial post of this GitHub issue. Maybe also re-phrasing some parts for better understanding.

It would be terrible if we had to re-do the entire approval process of this long-awaited RFC ......

Is it true that the approval process needs to be done again, or is this change something that just needs a Re-review of the ADs (i.e. they only re-review the changes, not everything?)

....... because of frankly completely useless details regarding almost unused hash UUIDs.

It depends on the use-cases. I understand that UUIDv7 is very important for databases because of their order. But that doesn't mean that the other UUID versions are useless for everyone. I have worked on a lot of projects where hash based UUIDv3 and UUIDv5 were required. Having SHA2/3 or Any-Hash would be an important improvement for the UUIDv3 and UUIDv5 use-cases.

@sergeyprokhorenko

I suggest freezing the draft RFC and not making any more changes to it, because it's too late and improvements could go on forever. There was a lot of time to make the discussed amendments in the previous stages of RFC development. Stakeholders will be able to propose changes to an already approved RFC.

@LiosK
Contributor

LiosK commented Sep 21, 2023

That's an extreme. The same logic holds for the hash-based v8 too and will ultimately remove all the v8 examples from the document. Examples are helpful for readers as they concisely convey our intention to introduce v8, which can't be expressed in the succinct normative description. Examples won't confuse readers as long as they are presented clearly labeled as implementation examples.

@sergeyprokhorenko

sergeyprokhorenko commented Sep 21, 2023

That's an extreme. The same logic holds for the hash-based v8 too and will ultimately remove all the v8 examples from the document. Examples are helpful for readers as they concisely convey our intention to introduce v8, which can't be expressed in the succinct normative description. Examples won't confuse readers as long as they are presented clearly labeled as implementation examples.

In this case the examples of UUIDv8 (only!) must be accompanied by a disclaimer that the implementer can use these examples at his/her own risk, but the examples themselves are not recommended by the standard, have not been properly tested or examined, and their use may lead to errors in the information system.

@kyzer-davis
Collaborator

FYI, in line with #150 thinking, I was planning to move the v8 "test vectors" to a new appendix titled "illustrative examples".
Then put some leading text around how they are simply showing how one can use v8 and are not meant to be implemented (unless somebody really likes that logic.) and prefix with lots of "use at your own risk" text.

@bradleypeabody, @bradleypeabody

  • My only suggestion for SHA-256 example over some other hashing library is that it is in the same NIST/FIPS specs as SHA-1 and easily transposable. That is, just like our v1 to v6 modification. There is little that needs to be done to illustrate the points and get the topic across.
  • Why not SHA-512 or some other? Luck of the draw I guess. That is what I already have a test vector on in my notes. But I will be clear that this isn't the only possible option, just an illustrative example of how somebody could do this with a next gen hashing function in lieu of MD5/SHA-1. Plus SHA-256 is PQC safe so it should be okay for a while.
  • XOR/Filling zeros: Out of Scope, do what you must since it's v8. I am not sure we need to provide guidance explicit to this (there are also many other places in the doc where we truncate, modify, fill/pad, etc.) So folks who are inspired can find some helpful examples even if it is not explicitly for hash-based functions.

Timeline: Let me finish out some of these other early draft-12 tracker items for easier things. Then I will branch this off of the new draft 12 base. ETA Monday/next week as I let those other items in PR #152 bake.

@danielmarschall
Contributor Author

Then put some leading text around how they are simply showing how one can use v8 and are not meant to be implemented (unless somebody really likes that logic.) and prefix with lots of "use at your own risk" text.

If it is not meant to be implemented, can it then be used as a reference for interoperability? (Well, technically, it can, if it gets its own appendix/section number that someone can refer to)

  • My only suggestion for SHA-256 example over some other hashing library is that it is in the same NIST/FIPS specs as SHA-1 and easily transposable. That is, just like our v1 to v6 modification. There is little that needs to be done to illustrate the points and get the topic across.

(Personal opinion) I am fine with either SHA-X or xxHash; I don't have a preference. SHA-X is a bit more well-known and the truncation of extra bits could be illustrated in the example. xxHash on the other hand is very fast and might be good for UUIDs.

  • Why not SHA-512 or some other? Luck of the draw I guess. That is what I already have a test vector on in my notes. But I will be clear that this isn't the only possible option, just an illustrative example of how somebody could do this with a next gen hashing function in lieu of MD5/SHA-1. Plus SHA-256 is PQC safe so it should be okay for a while.
  • XOR/Filling zeros: Out of Scope, do what you must since it's v8. I am not sure we need to provide guidance explicit to this (there are also many other places in the doc where we truncate, modify, fill/pad, etc.) So folks who are inspired can find some helpful examples even if it is not explicitly for hash-based functions.

Since it would be great to have it as a reference for interoperability, it would be good if it could be defined for hash algorithms < 128 bits. The developer needs to decide if they want to use that algorithm, though.

Timeline: Let me finish out some of these other early draft-12 tracker items for easier things. Then I will branch this off of the new draft 12 base. ETA Monday/next week as I let those other items in PR #152 bake.

Sounds good!

@mcr
Contributor

mcr commented Sep 21, 2023

@LiosK
Contributor

LiosK commented Sep 21, 2023

  • XOR/Filling zeros: Out of Scope, do what you must since it's v8. I am not sure we need to provide guidance explicit to this (there are also many other places in the doc where we truncate, modify, fill/pad, etc.) So folks who are inspired can find some helpful examples even if it is not explicitly for hash-based functions.

Agreed. Examples don't need to contain information for interoperability.

@mcr, I cannot make the interim, unfortunately.

@chorman0773

chorman0773 commented Sep 22, 2023

FTR, I do not consider a completely custom v8 useful for my identified use case, regardless of the examples provided for it.
My issue is with the fact it cannot be considered unique in the face of non-malicious uncontrolled code.

@sergeyprokhorenko

FTR, I do not consider a completely custom v8 useful for my identified use case, regardless of the examples provided for it.
My issue is with the fact it cannot be considered unique in the face of non-malicious uncontrolled code.

No wonder. Huge efforts have been put into making UUIDv7 perfect. Therefore, there are no more useful ideas left for UUIDv8. The only purpose of UUIDv8 is not to limit the imagination of implementers. And it would be strange to advise them on this using examples.

@bradleypeabody
Collaborator

My issue is with the fact it cannot be considered unique in the face of non-malicious uncontrolled code.

@chorman0773 I don't follow. If you direct implementations to use UUIDv8 following the hash example provided, it should have approximately the same uniqueness probabilities as any other UUID (while also being hash-based). What aspect of this makes the resulting UUID not/less unique?

@ben221199

ben221199 commented Sep 22, 2023

https://mailarchive.ietf.org/arch/msg/uuidrev/dhxgO66xkpNBrOtSy0AY8nV9bAE/ @sergeyprokhorenko @danielmarschall @ben221199 @LiosK @chorman0773 If this matters, then please participate.

https://notes.ietf.org/notes-ietf-interim-2023-uuidrev-04-uuidrev

I have to dive into how these meetings work. Don't know if I have time to participate, because I also have other work to do.

@kyzer-davis kyzer-davis changed the title from "UUIDv9 for new name-based UUIDs" to "Simplify UUIDv8 Hash-based Example" Sep 22, 2023
@kyzer-davis
Collaborator

FYI, I have changed the title of the issue so it reflects the current state of this tracker item.

@chorman0773

chorman0773 commented Sep 22, 2023

@chorman0773 I don't follow. If you direct implementations to use UUIDv8 following the hash example provided, it should have approximately the same uniqueness probabilities as any other UUID (while also being hash-based). What aspect of this makes the resulting UUID not/less unique?

The point of using UUIDs is that I don't need to provide any guidance to driver implementors beyond the UUID RFC except that they should not use the namespace db4ec4af-a7b7-315b-8c02-bb928e3d4281 (which is the UUID for the OS itself), or UUIDs derived from it, to derive (via any method) device IDs, as they may collide with future "well-known" devices. My main worry is that they'd use v8 time-based, or some other custom method for generating v8 IDs that ends up with the same result.
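
For readers unfamiliar with the namespace mechanism being described, a standard UUIDv5 derivation looks roughly like the sketch below (SHA-1 over the namespace bytes followed by the name, truncated to 128 bits, then version and variant bits set). The helper names `Parse` and `NewV5` are hypothetical, and any device name passed to `NewV5` is the caller's own choice, not something taken from the OS discussed here.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"strings"
)

// UUID is a 128-bit value in network byte order.
type UUID [16]byte

// Parse reads the 8-4-4-4-12 hex text form. It is deliberately lenient:
// hyphens are simply stripped before decoding.
func Parse(s string) (UUID, error) {
	var u UUID
	b, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil {
		return u, err
	}
	if len(b) != len(u) {
		return u, errors.New("uuid: expected 16 bytes")
	}
	copy(u[:], b)
	return u, nil
}

// String formats the UUID in the canonical 8-4-4-4-12 form.
func (u UUID) String() string {
	h := hex.EncodeToString(u[:])
	return h[0:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:32]
}

// NewV5 derives a name-based UUIDv5 from a namespace UUID and a name.
func NewV5(namespace UUID, name string) UUID {
	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(name))

	var u UUID
	copy(u[:], h.Sum(nil))      // keep the leftmost 16 of SHA-1's 20 bytes
	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // variant 0b10
	return u
}
```

A driver author would parse the namespace they were assigned and call `NewV5(ns, someName)`; the same namespace and name always give the same ID, which is what makes fixed "well-known" device IDs possible.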

@LiosK
Contributor

LiosK commented Sep 23, 2023

@chorman0773

The draft is clear that the UUIDv8's uniqueness MUST NOT be assumed, so in your case, if you do not provide guidance to driver implementers, then you must reject a v8 value from registering. You are also free to provide driver implementers with detailed v8 guidance specifying the structure and hash functions, just like you would provide namespace guidance for v3/v5. That's exactly what "implementation-specific" means.
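
A registration check along those lines might be sketched as follows. Everything here is hypothetical: `CheckRegistrable`, the `v8LayoutAgreed` flag, and the error value are illustrative names, and the policy itself is just one way to act on "v8 uniqueness must not be assumed".

```go
package main

import "errors"

// UUID is a 128-bit value in network byte order.
type UUID [16]byte

// ErrUnvettedV8 is returned for v8 values when no v8 layout has been agreed.
var ErrUnvettedV8 = errors.New("UUIDv8 without an agreed layout: uniqueness cannot be assumed")

// CheckRegistrable rejects UUIDv8 values unless the platform has published
// its own v8 guidance (signalled here by v8LayoutAgreed). All other
// versions pass through unchanged.
func CheckRegistrable(u UUID, v8LayoutAgreed bool) error {
	isStandardVariant := u[8]&0xc0 == 0x80 // variant 0b10
	isV8 := u[6]>>4 == 8                   // version nibble
	if isStandardVariant && isV8 && !v8LayoutAgreed {
		return ErrUnvettedV8
	}
	return nil
}
```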

I see your case might want (though not require) a standardized hash-based ID definition, but it seems considerably difficult to achieve consensus for such a scheme, and in my opinion it needs a separate RFC project.

@chorman0773

The draft is clear that the UUIDv8's uniqueness MUST NOT be assumed, so in your case, if you do not provide guidance to driver implementers, then you must reject a v8 value from registering

Well, strictly speaking, unless the device id is already in use or it's one of the two sentinel values (full is explicitly reserved, nil is "Don't care, kernel assign device id"), the ID isn't going to be rejected, either for kernel mode drivers or user mode drivers. It's up to the individual drivers if they. The question is whether the kernel itself would use the IDs, not whether they'd be available for anything else to use.

@bradleypeabody
Collaborator

bradleypeabody commented Sep 23, 2023

@chorman0773 Since, from what I gather, you control at least the recommendation of what you're suggesting drivers do when generating a UUID, it seems like just providing a specific suggestion of what kind of UUID you recommend would work.

The point of using UUIDs is that I don't need to provide any guidance to driver implementors beyond the UUID RFC...

Why? What prevents you from simply telling people who make drivers that the recommendation is to use UUID v3, v5, or v8 following the hash example, and that if they do something else there is a very small chance that the probability of collision increases?

My main worry is that they'd use v8 time-based, or some other custom method for generating v8 IDs that ends up with the same result.

Keep in mind that there's only so much uniqueness you're going to get in a 128-bit value. There are physically no guarantees with any of the UUID versions that absolutely prevent collisions; all you can do is reduce the probability. And if we assume that whatever other weird stuff people do with UUIDv8 is going to be roughly evenly distributed, it's easy to make the argument that your collision probability doesn't increase.
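To put rough numbers on "all you can do is reduce the probability", here is a minimal sketch of the standard birthday approximation. It assumes the IDs behave like uniformly random values over a given number of bits (122 is used below as an assumption, i.e. 128 bits minus the version and variant bits); the function name `collision_probability` is made up for illustration and is not taken from the draft.

```python
import math


def collision_probability(n_ids: int, random_bits: int = 122) -> float:
    """Approximate chance that at least two of n_ids uniformly random
    values of random_bits bits collide (birthday approximation)."""
    if n_ids < 2:
        return 0.0
    # Expected number of colliding pairs: n(n-1)/2 divided by the space size.
    expected_pairs = n_ids * (n_ids - 1) / 2 ** (random_bits + 1)
    # p = 1 - exp(-expected_pairs); expm1 keeps precision for tiny values.
    return -math.expm1(-expected_pairs)


for n in (10**6, 10**9, 10**12):
    print(f"{n:>14} IDs -> p = {collision_probability(n):.3e}")
```

The approximation only holds under the uniformity assumption stated above; a scheme whose outputs are not evenly distributed would need its own analysis.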

Anyway, this UUID v8 example looks like what we'll realistically be able to get into the RFC. So if you want to help shape that, great, if not, that's fine too.

ALSO: Keep in mind that it takes various implementations and experience in order to end up with a new standard. If enough people get behind it, then this v8 example could end up being v9 in a later update to the document, another reason to put effort into this v8 example - it's a starting point.

fabiolimace added a commit to f4b6a3/uuid-creator that referenced this issue Sep 23, 2023
The `GUID.v8()` method is no longer supported due to recent sudden
changes in the UUIDv8 discussions. It will be removed when the new RFC
is finally published.

See the latest discussions about UUIDv8:

* ietf-wg-uuidrev/rfc4122bis#143
* ietf-wg-uuidrev/rfc4122bis#144
* ietf-wg-uuidrev/rfc4122bis#147
kyzer-davis added a commit that referenced this issue Sep 28, 2023
@kyzer-davis
Collaborator

I have greatly reduced the complexity of the UUIDv8 Name/Hash based example in a4f5693

To summarize this commit:

  • Removes Hashspace IDs and scrubs the document of their verbiage.
  • Changes the illustrative example in the final appendix to use JUST SHA-256 (i.e., UUIDv5 with the only swaps being SHA-256 and the version bits set to v8)
  • Removes all text in name-based examples except that of UUIDv5 pointing to the illustrative example as guidance.

All in all, this checks the boxes of:

  • Providing guidance to an implementor who may not be able to use SHA-1/MD5 but still wants a name-based UUID.
  • Providing some examples of how to do this (without as much heavy text in the official name-based UUID generation sections).
  • Providing a much simpler solution that somebody with knowledge of UUIDv5 can pick up easily. (I modified Python3 UUID.py in two lines and had this example running.)
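The simplified approach summarized above (UUIDv5 with SHA-256 swapped in and the version set to 8) can be sketched in Python as follows. This is an illustration under assumptions, not the text of the draft or the modified UUID.py: the function name `uuid8_sha256` is made up here, and the sketch assumes the same input as UUIDv5 (namespace bytes followed by the name) with the leftmost 128 bits of the digest kept.

```python
import hashlib
import uuid


def uuid8_sha256(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Hypothetical helper: name-based UUIDv8 built like UUIDv5, but with SHA-256."""
    # Hash namespace bytes followed by the UTF-8 name, as UUIDv5 does with SHA-1.
    digest = hashlib.sha256(namespace.bytes + name.encode("utf-8")).digest()
    # Keep the leftmost 128 bits of the 256-bit digest (assumption of this sketch).
    value = int.from_bytes(digest[:16], "big")
    # Overwrite the 4 version bits (bits 76..79) with 0b1000 (version 8).
    value &= ~(0xF << 76)
    value |= 0x8 << 76
    # Overwrite the 2 variant bits (bits 62..63) with 0b10.
    value &= ~(0x3 << 62)
    value |= 0x2 << 62
    return uuid.UUID(int=value)


print(uuid8_sha256(uuid.NAMESPACE_DNS, "www.example.com"))
```

The version and variant bits are set by hand rather than through the `version=` argument of `uuid.UUID`, because older Python releases reject version numbers above 5 there.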

@danielmarschall
Contributor Author

danielmarschall commented Sep 28, 2023

@kyzer-davis Thank you very much for your work! I think it is very good to have this appendix with examples, and that the UUIDv8 examples are simplified by removing the hashspace IDs.

I have a few thoughts about https://github.com/ietf-wg-uuidrev/rfc4122bis/blob/hash-based-uuids/draft-ietf-uuidrev-rfc4122bis.md :

The examples below are not meant to be implemented

I am not sure... doesn't this conflict with the idea of a reference for interoperability (which was discussed earlier)?

In other words: If one decides to implement SHA-256 UUIDv8 according to RFC xxx Appendix xxx, will they do a "bad job" if they implement something that is "not meant to be implemented"?

These MAY leverage newer hashing protocols such as SHA-256 or SHA-512 defined by {{FIPS180-4}}, SHA-3 or SHAKE defined by {{FIPS202}}, or even protocols that have not been defined yet.

I think it should say hash algorithms, not hash protocols.

This UUIDv8 illustrative example utilizes a well-known 64 bit Unix epoch timestamp with nanosecond precision, truncated to the least-significant, right-most, bits to fill the first 48 bits through version.

The horrible example with "one nanosecond in 48 bits" is still there... Can't we assign more bits, or do anything else so that it wraps no sooner than every 35 years instead of after just a few hours?

@danielmarschall
Contributor Author

danielmarschall commented Sep 28, 2023

For the time-based example, here are alternatives (I am keeping 1 nanosecond precision):

A. tttttttt-tttt-Xttt-rrrr-Xrrrrrrrrrrr : 1 nanosecond in 60 bits = wraps after 36 years (2^60/(1000*1000*1000*60*60*24*365)), with 62 bits left for random (i.e. chance to collide is 1/2^62)

B. tttttttt-tttt-Xttt-trrr-Xrrrrrrrrrrr : 1 nanosecond in 64 bits = wraps after 584 years (2^64/(1000*1000*1000*60*60*24*365)), with 58 bits left for random (i.e. chance to collide is 1/2^58)

C. tttttttt-tttt-Xttt-ttrr-Xrrrrrrrrrrr : 1 nanosecond in 68 bits = wraps after 9359 years (2^68/(1000*1000*1000*60*60*24*365)), with 54 bits left for random (i.e. chance to collide is 1/2^54)

D. tttttttt-tttt-Xttt-tttt-Xrrrrrrrrrrr : 1 nanosecond in 76 bits = wraps after 2395924 years (2^76/(1000*1000*1000*60*60*24*365)), with 46 bits left for random (i.e. chance to collide is 1/2^46)

My preference would be >=100 years, so 64 bits.
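The wrap-around figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the same 365-day year as the formulas in the list; the helper names `wrap_years` and `wrap_hours` are made up for illustration. For comparison it also includes the 48-bit case from the draft example, which works out to about 78 hours.

```python
NS_PER_SECOND = 10**9
NS_PER_HOUR = NS_PER_SECOND * 60 * 60
NS_PER_YEAR = NS_PER_HOUR * 24 * 365  # 365-day year, as in the formulas above


def wrap_years(timestamp_bits: int) -> int:
    """Whole years until a counter with 1 ns ticks and this many bits wraps."""
    return 2**timestamp_bits // NS_PER_YEAR


def wrap_hours(timestamp_bits: int) -> int:
    """Whole hours until a counter with 1 ns ticks and this many bits wraps."""
    return 2**timestamp_bits // NS_PER_HOUR


for bits in (48, 60, 64, 68, 76):
    print(f"{bits} bits: {wrap_hours(bits)} hours, {wrap_years(bits)} years")
```

Integer division is used so the results are exact whole units rather than rounded floating-point values.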

@kyzer-davis
Collaborator

kyzer-davis commented Sep 28, 2023

In terms of hash-based items on #147 (comment)

  • I can remove the "The examples below are not meant to be implemented" and change it to simply "The examples below have not been through..." Edit: changed in 2a29aa0
  • Yeah, I should use algorithms, let me change that in a commit. Edit: changed in 815fa11

Time comment: #147 (comment)

@danielmarschall
Contributor Author

@kyzer-davis Thank you! Looks good to me.
