Skip to content
This repository has been archived by the owner on Jul 9, 2024. It is now read-only.

Discussion: Monotonicity and Counters #60

Open
kyzer-davis opened this issue Feb 24, 2022 · 105 comments
Open

Discussion: Monotonicity and Counters #60

kyzer-davis opened this issue Feb 24, 2022 · 105 comments
Labels
Discussion Further information is requested

Comments

@kyzer-davis
Copy link
Contributor

kyzer-davis commented Feb 24, 2022

All things Monotonicity and Counters to assist with sort ordering, db index locality, and same Timestamp-tick collision avoidance during batch UUID creation.

Sub-Topics: Counter position, length, rollover handling and seeding.

Extension of the thread: #58 (comment)

Required Reading: #41 (comment) and Research efforts

@kyzer-davis kyzer-davis added the Discussion Further information is requested label Feb 24, 2022
@kyzer-davis kyzer-davis mentioned this issue Feb 25, 2022
@sergeyprokhorenko
Copy link

Please include this in version 7:

A counter MAY be placed instead of the left side of the random section, immediately after the timestamp. The counter is designed to keep UUIDs monotonic within a timestamp step, by default within a millisecond. The default counter length is 16 bits for the millisecond timestamp. The counter MUST be reset every timestamp step. The counter MUST be frozen just before overflowing up to the next timestep step. For each UUID instance, the random section MUST be updated despite the use of the counter.

@broofa
Copy link
Contributor

broofa commented Feb 26, 2022

A counter MAY be placed instead of...

Is there any benefit to this over simply treating the entire rand field as a counter, the way ULID does? In which case, the counter is effectively at the very end ("far right") side of the random field.

The advantage of ULID approach is that counter rollover so statistically unlikely as to not warrant mention. Whereas placement at the left side of the field requires reasoning about creation rates, counter sizes, and rollover error cases.

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Feb 26, 2022

Is there any benefit to this over simply treating the entire rand field as a counter, the way ULID does?

  1. The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.
  2. If random part is close to 111...111, then you have too short counter

@broofa
Copy link
Contributor

broofa commented Feb 27, 2022

  1. The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.

Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.

  1. If random part is close to 111...111, then you have too short counter

Again, true of any counter scheme.

Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 1015 chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Feb 27, 2022

  1. The random part of ULID is almost frozen within every millisecond while it's used as a counter. Therefore it's very easy to guess the next or previous ULID. Just add 1 or substract 1. It's not secure.

Isn't this true of any monotonic counter scheme? Unless you randomize all bits with every uuid, there will be some degree of predictability.

l beleive that all random bits of every uuid MUST be randomized despite using counter. Such monotonic counter scheme is secure.

  1. If random part is close to 111...111, then you have too short counter

Again, true of any counter scheme.

No, if we have 16 empty bits of separate counter for every millisecond.

Treating the whole 70-bit random value as a counter means the odds of actually encountering a rollover case are very small. Even if the lowest 20 bits turn over (1M uuids / msec), the top-most 50 bits would have to all be ones. There's only a ~1 in 1015 chance of that occurring. For the truly cautious, simply checking for that case and generating a different random value, or just forcing any of those high-order bits to be zero, eliminates that possibility.

You are correct.

@LiosK
Copy link

LiosK commented Feb 27, 2022

I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole rand field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.

@peterbourgon
Copy link

peterbourgon commented Feb 27, 2022

oklog/ulid.Monotonic provides monotonicity parameterized by a user-provided inc value. Higher inc values provide less predictable output at the cost of fewer overall possible outputs (i.e. higher probability of collision).

IMO — specs should not take any stance on any property of random portions of IDs. Specs should define them as random, and if implementations want to provide monotonicity within some bounds, that's above-and-beyond spec guarantees.

@sergeyprokhorenko
Copy link

@peterbourgon If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either. Therefore, the specification should contain at least a more or less satisfactory solution as an option.

@sergeyprokhorenko
Copy link

I agree that it should generally be encouraged to entirely randomize some (say, 16 or 32 bits) trailing bits at every generation to ensure some levels of unpredictability, rather than use the whole rand field as a monotonic random counter. I think the draft RFC should state this explicitly to guide implementers, but the thing is, it is very difficult to determine the appropriate lengths of the monotonic random and per-ID random. So, perhaps it might be best for the draft to just state this point and have implementers decide appropriate sizes.

@LiosK I like your solution very much. I would add freezing just before overflowing the counter, although this is unlikely.

@peterbourgon
Copy link

If something is not in the specification, then this is a 99.99% guarantee that this will not be in the implementation either.

I don't think I agree, but I also don't think this is a bad thing. In fact, I think it's correct! The spec defines the random portion of the ID to be random. If those bytes happen to be monotonic, that's a detail of the implementation.

@kyzer-davis
Copy link
Contributor Author

@peterbourgon @sergeyprokhorenko
I just merged Draft 03 into master.
Could you take a look at the latest Monotonicity and Counters section? I believe it already hits on both of your points.

@sergeyprokhorenko
Copy link

@kyzer-davis

  1. Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.

  2. I suggest to add the left part of to the text: "With this method the left part of random data is extended to also double as a counter" to avoid of guessability of UUIDs. The right part of random data will not be frozen till the next timestamp tick.

@peterbourgon
Copy link

UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.

@sergeyprokhorenko
Copy link

UUID generation is necessarily a possible source of runtime errors, because entropy is exhaustible.

@peterbourgon This is not true. It is entirely possible to prevent errors during the generation of UUIDs. It is better to prevent errors if possible.

If you mean collisions of UUIDs, then such errors do not occur during generation, but when writing to the table.

@peterbourgon
Copy link

If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.

@kyzer-davis
Copy link
Contributor Author

@sergeyprokhorenko,

I suggest to add the left part of to the text: "With this method the left part of random data is extended to also double as a counter" to avoid of guessability of UUIDs. The right part of random data will not be frozen till the next timestamp tick.

I assume you are referencing rand_a, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

I simply say you SHOULD use a monotonic random and for fixed-length counters you SHOULD use UUIDv8 because of the heated debates around counter length, rollover handling, and initialization of the counter have so many variations based on user input that it is better fit for UUIDv8.

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

Waiting for the timestamp to advance can slow down the generation of UUIDs and thus break the application. The UUID generation should not be a source of possible run-time errors for the application. It is better to allow a slight non-monotonicity, which the DBMS can handle quite well. It will only slow down the search for some records a little. In any case, with several data sources, a slight non-monotonicity is inevitable.

The text let's the application implementers decide and provides some guidance. Both usages in this section are SHOULD not MUST. Implementations are fit to do what they want. If they desire to let some items go out of order then they may.

@sergeyprokhorenko
Copy link

If you run out of entropy, as far as I'm aware your options are (a) fail, or (b) block until entropy is replenished.

@peterbourgon It sounds like "run out of numbers" :))

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Mar 3, 2022

I assume you are referencing rand_a, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

@kyzer-davis No, I mean #60 (comment) I like it very much

@sergeyprokhorenko
Copy link

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

@kyzer-davis Yes please. That would be great!

@peterbourgon
Copy link

@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)

System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.

https://en.m.wikipedia.org/wiki/Entropy_(computing)

@sergeyprokhorenko
Copy link

The text let's the application implementers decide and provides some guidance. Both usages in this section are SHOULD not MUST. Implementations are fit to do what they want. If they desire to let some items go out of order then they may.

@kyzer-davis It seems to me that the specification should suggest to developers a safer option, and not an unexpected pause in the generation of UUIDs when the counter overflows. Overflows always remind me of the Ariane flight V88 disaster

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Mar 3, 2022

@sergeyprokhorenko Running out of entropy is much more common than running out of numbers :)

System administrators, especially those supervising Internet servers, have to ensure that the server processes will not halt because of entropy depletion.

https://en.m.wikipedia.org/wiki/Entropy_(computing)

This is about the low quality of pseudo-random numbers, but not about their deficit. Poor quality generators generate a repeating sequence of pseudo-random numbers, which leads to collisions.
The problem of low quality of pseudo-random numbers is already solved in the spec: https://uuid6.github.io/uuid6-ietf-draft/#name-unguessability

@kyzer-davis
Copy link
Contributor Author

kyzer-davis commented Mar 3, 2022

@sergeyprokhorenko

If you would like I can put some text in the "Fixed-Length Dedicated Counter Bits (Method 1):" bullet that stated UUIDv7 rand_a MAY double as a counter when utilizing this method.

@kyzer-davis Yes please. That would be great!

Let me put together a Change Proposal template for this and get it in Draft 03 Phase 3 PR.

I assume you are referencing rand_a, which totally can be a counter based on my current text in the the paragraph "Fixed-Length Dedicated Counter Bits (Method 1):"

@kyzer-davis No, I mean #60 (comment) I like it very much

I apologize, somehow I missed this comment entirely. It does seem like a valid approach but seems pretty similar to Fixed-Length Dedicated Counter Bits (Method 1) at the moment. The difference is in semantics. My text states to dedicated a portion of bits after the timestamp; this text states to dedicate a portion of the most significant, left-most, part of the random. 9/10 times they will likely be the same thing.

I could update Fixed-Length Dedicated Counter Seeding: to describe this practice. I took a quick swing below:


Fixed-Length Dedicated Counter Seeding:

  • Implementations utilizing either fixed-length counter method MAY randomly initialize the counter with each new timestamp tick. However, when the timestamp has not incremented; the counter SHOULD be frozen and incremented by one.
  • When utilizing a randomly seeded counter alongside Method 1; the random MAY be regenerated with each counter increment without impacting storability. The downside is that Method 1 is prone to overflows if a counter of adequate length is not selected or the random data generated leaves little room for the required number of increments.
  • A randomly seeded counter alongside Method 2 is more-or-less the same as Monotonic Random (Method 3).
  • Implementations utilizing either fixed-length counter method MAY also choose to initialize the counter at zero to ensure the full bit-space is utilized and help avoid counter rollovers. This approach has less entropy and more guessibility but ensures the most of the counter bit space.

@sergeyprokhorenko
Copy link

@sergeyprokhorenko ...did you even check #76? With this update both Method 1 and Method 3 are recommended counter solutions for v7E.

I deleted my comment

@kyzer-davis
Copy link
Contributor Author

I deleted my comment

As did I :)

Let me know what you think about that proposed "Fixed-Length Dedicated Counter Seeding:" text.
If it looks good I can write up a change proposal template and possibly sneak it in the open PR.

@peterbourgon
Copy link

@sergeyprokhorenko

This is about the low quality of pseudo-random numbers, but not about their deficit.

I believe you're mistaken. Both are issues, but I'm pointing out the fact that entropy — crypto-grade and pseudorandom both — is a resource which can be depleted. If something consumes that resource, then it necessarily must deal with the possibility that there is no entropy available in a given period of time.

@kyzer-davis
Copy link
Contributor Author

kyzer-davis commented Mar 18, 2022

@LiosK Great writeup.

I would like the new RFC to put guardrails around unsafe choices and extreme approaches to prevent implementers from falling in common pitfalls.

This statement hits the nail on the head for the best practices section. I am working on the counter changes. Another update to come.

@kyzer-davis
Copy link
Contributor Author

kyzer-davis commented Mar 18, 2022

@LiosK, I merged the changes which are live now via https://uuid6.github.io/uuid6-ietf-draft/#name-monotonicity-and-counters

The text should now describe:

  • Counter Method 1, Increment Type A
  • Counter Method 1, Increment Type B
  • Counter Method 2, Increment Type A
  • Counter Method 2, Increment Type B

I removed the harsh verbiage against fixed-length counters. Both methods solve the problems. The bits being tallied are just in two different positions within the UUID layout. We are describing the solutions and their possible upsides/downfalls/challenges. Implementations pick the one that seems correct for them.

I also made the changes to the fixed-length seeding, length, rollover parts as per your comment #60 (comment)

Lastly, I abstracted the "when to increment" sub-section to point to Counter Methods and Increment Types.


@sergeyprokhorenko, the text reads:

applications that care about absolute monotonicity and sortability SHOULD freeze the counter

This should be sufficient to cover the topic.


Edit: Personal note, I have always been a proponent of the fixed-length counter method. Drafts 01 and 02 never had monotonic random because of this personal bias. Any harsh verbiage against fixed-length counters in the previous stages of Draft 03 was not intended.

As we have iterated via this thread, both methods have proven sufficient at solving this problem. As a result in this latest revision both should be presented as equal solutions. If this is not the case please suggest an edit.

@sergeyprokhorenko
Copy link

As we have iterated via this thread, both methods have proven sufficient at solving this problem. As a result in this latest revision both should be presented as equal solutions. If this is not the case please suggest an edit.

@kyzer-davis, Now the text of section Monotonicity and Counters is confusing. It mixes the new and the outdated, the great and the bad. There is no need to list all possible options if some of them are obviously bad. It is better to use Occam's razor.

I'm convinced that both methods 1 and 2 should be dropped in favor of paragraph Fixed-Length Dedicated Counter Seeding. Both methods 1 and 2 are now obsolete thanks to LiosK's great idea.

We also need to drop Random Increment (Type B) in favor of Plus One Increment (Type A). See rationale from LiosK: #60 (comment) and from me: #60 (comment)

@LiosK
Copy link

LiosK commented Mar 19, 2022

The new draft looks good! Thank you, @kyzer-davis. If I might add something, Counter Rollover Handling is also relevant to the monotonic random approach particularly when it's combined with the random increment strategy, and thus it should be placed under "The following sub-topics cover methods behind incrementing either type of counter method:".

@kyzer-davis
Copy link
Contributor Author

@LiosK, good callout. I split into two sections for rollover handling and rollover guarding in the latest PR.

@edo1
Copy link

edo1 commented Apr 7, 2022

There is no need to list all possible options if some of them are obviously bad. It is better to use Occam's razor.

Completely agree.

@kyzer-davis, IMO current standard version is too loose, it is unclear and confusing.
You seem to be assuming that each application uses its own customized UUID generator. In the real world, almost all applications use a generic implementation from some library, this implementation should "fit all".

  1. Any reason to keep methods 1 and 2, types A and B?
  2. Why is the recommended counter size not specified?
  3. Why not use SHOULD here?
    Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized.
    SHOULD allows other implementations to be considered RFC-compliant, but makes it clear which approach is recommended.
  4. Why not use SHOULD here?
    Implementations MAY use the following logic to ensure UUIDs featuring embedded counters are monotonic in nature:
    Compare the current timestamp against the previously stored timestamp.
    If the current timestamp is equal to the previous timestamp; increment the counter according to the desired method and type.
    If the current timestamp is greater than the previous timestamp; re-initialize the desired counter method to the new timestamp and generate new random bytes (if the bytes were frozen or being used as the seed for a monotonic counter).
  5. The recommended approach to handle clock drift is not specified.
    Implementations SHOULD check if the the currently generated UUID is greater than the previously generated UUID. If this is not the case then any number of things could have occurred. Such as, but not limited to, clock rollbacks, leap second handling or counter rollovers
    My proposal was to reuse the last timestamp+counter if the jump is small enough (less than ≈2.5s), it will handle leap second and/or small clock drift due to NTP, VM migration, and so on.
  6. The recommended approach to handle counter overflow is not specified.
    My proposal was to increment the timestamp (it shouldn't hurt because the suggested timestamp resolution was 100ns).
    ULID approach is to fail.

@LiosK
Copy link

LiosK commented Apr 7, 2022

  1. Why not use SHOULD here?
    Implementations utilizing fixed-length counter method MAY also choose to randomly initialize a portion counter rather than the entire counter. For example, a 24 bit counter could have the 23 bits in least-significant, right-most, position randomly initialized.
    SHOULD allows other implementations to be considered RFC-compliant, but makes it clear which approach is recommended.

IMO MAY is fine here. With a 24-bit counter, on average 8 million IDs can be generated per millisecond and with a 99.9% chance at least 10,000 IDs can be generated. BTW, I am an advocate of 42-bit counters and in this case 2.2 trillion on average and with a 99.999999% chance at least 10,000 can be generated. These numbers are more than enough for most applications and only few extremely high-load applications and those that prefer short counters are adversely affected by the initialize-full-counter approach. On the other hand, if the spec recommends the initialize-portion-counter approach, the generic libraries might force every application to pay one bit entropy for the overflow guard not really relevant to most applications.

The initialize-portion-counter approach definitely solves some problems but is not a clearly superior approach that should be recommended to everyone.

  1. The recommended approach to handle clock drift is not specified.
  2. The recommended approach to handle counter overflow is not specified.

Your approaches are actually very interesting and creative if combined together. The timestamp state is simply incremented at a counter overflow and such an advanced timestamp state is accepted so long as the difference from the real clock is small enough. The generator will stall, reset state, or raise error only when the real clock is rewound so much or when the generator is "overheated" (i.e. when it experiences too many counter overflows). Very interesting! I would try this in my prototype. I'm not sure if many people think this approach is "right", but I'm kind of sure this works well in common scenarios.

@edo1
Copy link

edo1 commented Apr 8, 2022

On the other hand, if the spec recommends the initialize-portion-counter approach, the generic libraries might force every application to pay one bit entropy for the overflow guard not really relevant to most applications.

So you're advocating not using "N-1 rollover protection" as the default approach?

@LiosK
Copy link

LiosK commented Apr 8, 2022

N-1 approach shouldn't be default. Overflow protection isn't worth one bit entropy for most applications. Plus, counter overflow isn't a serious disaster if handled properly e.g. by your smart handling technique.

@edo1
Copy link

edo1 commented Apr 8, 2022

Plus, counter overflow isn't a serious disaster if handled properly e.g. by your smart handling technique.

I agree.

One note: in my proposal, the timestamp resolution was 100ns, which is overkill in most cases. So timestamp incrementing shouldn't hurt.
I have some doubts about incrementing the timestamp with 1ms resolution

@LiosK
Copy link

LiosK commented Apr 8, 2022

1 ms is likely overkill in a scenario where distributed nodes use their own clocks, while even 100 ns increment can be problematic if a clock is shared by distributed nodes. I'm now trying to figure out in what conditions the timestamp increment can hurt and in what conditions not.

Anyway, using the timestamp field as a reserve counter is an eye opening but counterintuitive idea that not many implementers will come up with. I think it's worth discussing this here and in the RFC.

@kyzer-davis
Copy link
Contributor Author

kyzer-davis commented Apr 8, 2022

@edo1, great feedback on the section.

On the topics:

  1. This section of document is written for any implementers (custom or those writing a library). But you are right, many will likely use a library and somebody has already made these decisions for them. However, the goal from Brad and I is to detail how to solve the problems so that the implementer of said library can make their choices based on our research here.
    1. Counter Methods 1 and 2 both solve the batch UUID creation problem when timestamp has not incremented. Each with some pro/cons that I want to clearly define.
    2. Increment Types A and Type B both provide some other pro/cons.
    3. There are many advocates for each; removing one will likely cause negative feedback from one group I am sure.
  2. We have a counter range specified. Different languages have different speeds and may produce more/less than others. They can test and choose a length that fits them.
  3. SHOULD vs MAY is always tricky. I believe somebody asked me to change this to SHOULD or vice versa. Maybe it was @fabiolimace?
  4. Same as previous bullet.
  5. I like this approach. I can add it to Counter Rollover Handling. Could you open a new Change Proposal issue for me to track?
  6. Again, N-1 works, increasing timestamp as rollover guard works. They both solve the problem. Neither are wrong. Similar to Method 1/2 and Increment Type A/B. I can describe this rollover guard along with N-1 in Counter Rollover Handling section and let an implementation choose. To differentiate them from Counter Method and Increment Type I can define these as Rollover Guard Option i and ii. Same as previous. If you could open a Change Proposal I can get this into Draft 04.

Plus, remember, everything in this section also pertains to UUIDv8 where somebody may want to do it their own way. Rather than spend a ton of cycles on R&D; we have already vetted most problems/solutions and documented the pros/cons. They can implement what they need and be on their way.


Edit: Saw your other comment on timestamp length thread:
I agree, implementers will likely not expose every single knob from the best practices section. They will likely have UUIDv7 libraries will likely have the minimum required parts:

  • Timestamp
  • Entropy
  • This is is the fastest as you said (uuid_generate_v7_fast())

And MAY layer in a counter of parts:

  • Counter Method
  • Increment Type
  • Rollover Guard Option
  • of which will make things slower but improve collision resistance i.e uuid_generate_v7()

The folks using those libraries will take what they get and not need to think about all the other "best practices" defined in that section.

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Apr 8, 2022

N-1 approach shouldn't be default. Overflow protection isn't worth one bit entropy for most applications. Plus, counter overflow isn't a serious disaster if handled properly e.g. by your smart handling technique.

I agree to trade the N-1 approach for a timestamp increment. Complex problems require complex but ultimate tools. It's not a choice between green and purple UUIDs with an A/B test among Instagrammers

@sergeyprokhorenko
Copy link

sergeyprokhorenko commented Apr 8, 2022

6. Again, N-1 works, increasing timestamp as rollover guard works. They both solve the problem. Neither are wrong. Similar to Method 1/2 and Increment Type A/B. I can describe this rollover guard along with N-1 in Counter Rollover Handling section and let an implementation choose. To differentiate them from Counter Method and Increment Type I can define these as Rollover Guard Option i and ii. Same as previous. If you could open a Change Proposal I can get this into Draft 04.

It would be clearer if all options had short names instead of numbers, letters and counting sticks.

It seems to me that the merits and demerits of the options are less described in the RFC than in the GitHub issues. Therefore, the implementers will turn into Buridan donkeys. The RFC should indicate the preferred option, of course with justification. It seems to me that in the course of the discussion, the positions of the participants here quickly converge. Therefore, it should not be a problem to choose the preferred option. If something causes controversy, then it has not yet been sufficiently thought out.

It would be desirable to show an exact and unambiguous reference layout of UUID for example, for a marketplace or an international bank.

@fabiolimace
Copy link

fabiolimace commented Apr 8, 2022

SHOULD vs MAY is always tricky. I believe somebody asked me to change this to SHOULD or vice versa. Maybe it was @fabiolimace?

@kyzer-davis , I just wanted to suggest this change below as the Plus One increment type is a special case of Plus N (random increment).

With this increment the actual increment of the counter MAY SHOULD be a
random integer of any desired length larger than zero. ...

I gave up on the idea of creating a Change Proposal issue because the deadline for submitting the draft was running out. We all want to see our collective effort specification published as a standard as soon as possible.

However, it is important that we correct small problems as they are discovered. So I want to take the opportunity to suggest further changes. Forgive me for saying this just now. I was looking forward to the submission of the new draft.


I really liked the two counter methods and the two increment types. The combinations produced by them seemed to resolve all the differences of opinion that there were. I mean if an implementer wanted to use method 1 with type 2 they would be free to do so with full approval of the specification.

However, while implementing a GitHub Gist to test the four combinations of counter methods and increment types, I realized that maybe it's not necessary to have 4 combinations. I'll explain.

  1. The Method 1 and Type A combination seems to be accepted by many of us, although there are small details that are not unanimous, such as the rollover guard bit.

  2. The Method 1 and Type B combination does not seem to be of much practical use, as it greatly reduces the increment space and increases the risk of rollover. Furthermore, it is identical to the Method 2 and Type B combination if we put the two side by side.

  3. The Method 2 and Type A combination seems to be preferred by those who like the monotonic ULID. This combination has the problem of being easy to guess the next value, but it has the advantage of generating UUIDs at a very high speed, for applications that need it.

  4. The Method 2 and Type B combination seems to be preferred by people who don't like the Method 1 and Type A combination. In practice both combinations have the same effect. So the preference of one or the other will depend on the ease of implementing in a given programming language.

So I would like to propose that there are only 2 combinations: Method 1 with Type A and Method 2 with Type B. With that I hope to remove the cons and just keep the pros. Of course you are all free to disagree.

The changes I want to propose in the text are the ones below. I just copy-pasted and changed some modal verbs.

Fixed-Length Dedicated Counter Bits (Method 1):

This references the practice of allocating a specific number of
bits in the UUID layout to the sole purpose of tallying the total
number of UUIDs created during a given UUID timestamp tick.
Positioning of a fixed bit-length counter SHOULD be immediatly
after the embedded timestamp. This promotes sortability and
allows random data generation for each counter increment. With
this method rand_a section of UUIDv7 MAY SHOULD be utilized
as fixed-length dedicated counter bits. In the event more counter
bits are required the most significant, left-most, bits of rand_b MAY
be leveraged as additional counter bits.

With this increment logic the counter method is incremented by one
for every UUID generation. When this increment method is utilized
with Fixed-Length Dedicated Counter the trailing random generated
for each new UUID can help produce unguessable UUIDs.
(Copy-pasted from Type A)

Monotonic Random (Method 2):

With this method the random data is extended to also double as a
counter. This monotonic random can be thought of as a "randomly
seeded counter" which MUST be incremented in the least significant
position for each UUID created on a given timestamp tick.
UUIDv7's rand_b section SHOULD be utilized with this method to
handle batch UUID generation during a single timestamp tick.

With this increment the actual increment of the counter MAY SHOULD
be a random integer of any desired length larger than zero. When
this increment method is utilized with Monotonic Random Counters the
counter ensures the UUIDs retain the required level of unguessability
characters provided by the underlying entropy.
(Copy-pasted from Type B)

The increment of the countar MAY be 1. However when this increment
method is utilized with Monotonic Random the resulting values are
easily guessable. Implementations that favor unguessiblity SHOULD NOT
utilize this method with the monotonic random method.
(Copy-pasted and adapted from Type A)

@LiosK
Copy link

LiosK commented Apr 9, 2022

I'd rather suggest to subsume the monotonic random under the fixed-length counter because the monotonic random is just a 74-bit variation of fixed-length counters, but I don't have a strong opinion here as long as unsafe choices are properly quarantined.

it has the advantage of generating UUIDs at a very high speed

It's a very good point. I've been observing that UUIDv7 with counter is much faster than that without counter because even the blazing fast arc4random implementations are still much slower than incrementing a counter.

@edo1
Copy link

edo1 commented Apr 9, 2022

I've been observing that UUIDv7 with counter is much faster than that without counter because even the blazing fast arc4random implementations are still much slower than incrementing a counter.

I guess it will exactly the opposite in real life. Version with counter needs mutex or another lock.

@fabiolimace
Copy link

fabiolimace commented Apr 9, 2022

I'd rather suggest to subsume the monotonic random under the fixed-length counter because the monotonic random is just a 74-bit variation of fixed-length counters

Yes, monotonic random can be thought of as a 74-bit variation of fixed-length counters. I just thought "hey, maybe we can reduce the cognitive load if we omit the fixed-length counter that increments by a random number" (Method 1 with Type B). I don't have a strong opinion either.

I guess it will exactly the opposite in real life. Version with counter needs mutex or another lock.

I think it might be true if the pseudo-random generator is not cryptographically secure. I just tested this hypothesis in a project of mine using Random() (fast generator) instead of SecureRandom() (secure generator). But the benchmarks did not confirm this. Monotonic ULIDs were always faster than non-monotonic ULIDs, even using a fast non-cryptographically secure generator. Maybe it's due to my implementation, but I'm not sure.

@LiosK
Copy link

LiosK commented Apr 9, 2022

I guess it will exactly the opposite in real life.

I think it depends. Nowadays, ChaCha20-based fast cryptographic PRNGs are widely available, and thus CSRNG can be regarded non-blocking, but taking secure random bits used to be considered (or in some environments it actually is) a slow I/O operation. Also, many RNGs use some sync mechanisms to protect their own states.

@LiosK
Copy link

LiosK commented Apr 9, 2022

"hey, maybe we can reduce the cognitive load if we omit the fixed-length counter that increments by a random number"

@fabiolimace, perhaps you're right! The random increment approach makes sense with 74-but counters only and the plus-one approach makes sense with 42-bit or less counters only. In this sense, the two types of counters might have different natures, so your approach will make the discussion easier to understand!

@sergeyprokhorenko
Copy link

It would be desirable to show an exact and unambiguous reference layout of UUID for example, for a marketplace or an international bank.

To facilitate the discussion, I propose Layout grid for UUIDv7 with Crockford's Base32 encoding (for a new RFC Draft)

@kyzer-davis
Copy link
Contributor Author

kyzer-davis commented Apr 11, 2022

@fabiolimace, The old deadline for Draft 03 submission was just so the existing draft didn't expire!
Draft 04 will need to happen as a result of some mistakes from me... so we might as well get some fixes in for MAY/SHOULD and any other clarifications we need to make via change proposals so I can easily iterate on them.

My new deadline for Draft 04 is a month before IETF 114. So June 23 since I plan to attend in person and have a live working session/discussion for this draft. (I will update the README shortly.) [Edit: Draft 04 live and ready for updates. Deadline set on README.]

As for your summary of Counter Method, Increment Type combos:
This is spot on with my own analysis and ultimately what I was trying to convey for pros/cons. This is also the type of feedback I like to see from the prototypes. So with that, I agree, removing Fixed-Length Method A + Random Increment Type B will help. Let's open a change proposal on this too.


@sergeyprokhorenko

It would be desirable to show an exact and unambiguous reference layout of UUID for example, for a marketplace or an international bank.

I agree: I was creating the test vectors in the appendix for this reason. I wanted to show these with real data and ultimately a real UUIDv6/7/8. I think it would be beneficial to have a version of UUIDv7 with counters as a reference point. I will open a change proposal for that too.


Also just tracking for my own sanity:
@edo1 change proposal for "re-use last timestamp handling if UUID < current" and change proposal for "alternative to N-1 by incrementing least sig bits of time" (if this makes sense for MS timestamp resolution.)

@LiosK
Copy link

LiosK commented May 22, 2022

Hi! I've just created a change proposal #107 to insert a single line discussing the "timestamp as reserve counter" approach discussed above. It's a very powerful approach while being counter-intuitive at first sight; it's worth noting in the standard to help implementers come up with it.

I've tested this approach on my prototypes in JS and Rust for a while and found that:

  • This approach can guarantee the non-blocking generation of monotonic UUIDs and, as a result, simplify the library API by getting rid of sync/async headaches.
  • The actual timestamp usually catches up with the incremented timestamp immediately because counter overflows rarely occur multiple times in a row.
  • This approach is consistent with the current draft RFC that allows altering, fuzzing or smearing of timestamps.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Discussion Further information is requested
Projects
None yet
Development

No branches or pull requests

8 participants