Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TrieDBRawIterator::prefix_then_seek #190

Merged
merged 6 commits into from
Mar 17, 2023

Conversation

koute
Copy link
Contributor

@koute koute commented Mar 17, 2023

After calling the TrieDBRawIterator::prefix_then_seek the iterator is not supposed to return any keys outside of the requested prefix; this unfortunately is not the case, and this PR fixes the issue.

Implementation notes

  • Considering the complexity of the code I didn't completely trust myself to get this right, so I also added a fuzz test for this method. So far it's been running for quite a while and stopped barfing any new errors.
  • For the fuzz test and for the tests I've copy-pasted the trie layout code from Substrate. The reference trie here already contains a Substrate-like trie layout, but it's not exactly the same. If I'm going to fuzz on something I'd rather do it on exactly the same thing we use in production.
  • I only test and fuzz on the trie layout from Substrate. For historical reasons (e.g. we had to be Ethereum compatible in the past) the trie crate here is generic over various different layouts, but those 1) do behave differently (AFAIK some of the tests fail on other layouts), and 2) we really don't need to support those (honestly, I think we should just remove most of this flexibility as it just makes the code more complex for absolutely zero benefit to us).
  • For the tests I've added a reproduction of the real-world issue that was affected by the bug this PR fixes, and also a few extra tests for anything that the fuzzer barfed while I was developing the fix.
  • The case of what happens when both a start key and a prefix key is specified was not very well tested and it was mostly underspecified; now it is tested and behaves in a way that (I think) makes intuitive sense (take look at the filter in the fuzz test I added).
  • Bumped the crate versions for immediate release.

Potential future work

  • Refactor the rest of the tests to use the paste crate and run each trie layout as their own test instead of running every layout inside of a single test.
  • Refactor the rest of the fuzz tests to use the arbitrary crate instead of manually generating the fuzz input data.
  • Cut down on the genericity and simplify everything.

Copy link
Contributor

@cheme cheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for fixing this one (I did not go through the fuzz code yet but prefer to approve asap).

Refactor the rest of the fuzz tests to use the arbitrary crate instead of manually generating the fuzz input data.

sure is desirable.

Cut down on the genericity and simplify everything.

we should fork to have substrate only crate, this is something we want for a while.
my opinion was to wait for removing rocksdb support (which will also allow simplifying a lot), but probably a better idea to
do sooner.

@@ -0,0 +1,725 @@
// Copyright (C) Parity Technologies (UK) Ltd.
Copy link
Contributor

@cheme cheme Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove test-support/reference-trie/src/substrate-like.rs (can be done later, might be some slight difference and would be better to get fix in)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we can remove that one if we're going to have the real thing. I just didn't want to needlessly bloat up this PR with more changes.

Copy link
Contributor

@cheme cheme Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some slight modif in substrate_like (iirc only having the threshold size as a parameter), but generally would make sense to use the same names as substrate as in your file.
Like I said could be done later.

return Ok(())
}

if !self.seek(db, prefix)? {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the fix is to seek in two steps (and avoid doing it twice if prefix == seek actually just using prefix) with second seek that cannot go outside the prefix due to prechecks.

Indeed my code was broken on empty prefix, seek going after the prefix and trail being clean as if it is into the prefix (original code did not test the return value).

@cheme
Copy link
Contributor

cheme commented Mar 17, 2023

From what I read this bug is not in a host function (only clear_prefix v3 can go in the code path but it is "register_only" cc @bkchr is it possible to still use the function? (in this case I am not sure if it can be an issue since runtime will only call it with a cursor if it is not empty, and having cursor at last position do not make sense as we know it is the last position)).

Copy link
Member

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also checked Substrate and the good thing is that we don't have the clear_prefix host functions with maybe_cursor enabled yet. So, we should not be able to have build a block with this bug? @cheme and @koute please check again that I'm right!

Comment on lines +328 to +333
NodePlan::Leaf { partial, .. } => {
len += partial.len();
},
NodePlan::Extension { partial, .. } => {
len += partial.len();
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cheme why are we not adding here len += 1? What is the difference to the others?

Copy link
Contributor

@cheme cheme Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Branches contains a nibble that is not part of its partial key, the index to the child node.
a trie with key (nibbles not bytes) '1,0' and '1,2' will be a branch and two empty partial key children leafs. The partial key of the branch will be 1 and the branch contains 0 and 2 nibble as index to the leafs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh okay!

Comment on lines +303 to +308
if !seek.starts_with(prefix) {
// We're supposed to seek *after* the prefix,
// so just return an empty iterator.
self.trail.clear();
return Ok(())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like something that shouldn't be supported at all and the upperlying logic should forbid this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely sure what exactly you mean here, but without this check this method can give nonsensical results (testcase_4 will fail).

Basically I wanted this whole thing to work in the following way with no other special/corner cases:

  • All of the keys returned by this are guaranteed to start with the prefix.
  • All of the keys returned by this are guaranteed to be equal or bigger lexicographically to seek.

And even if you give the method arguments which might seem silly (e.g. like here put a seek that's after the prefix and doesn't start with it) it's supposed to handle it in a way in which it makes sense without returning garbage results.

Copy link
Contributor

@cheme cheme Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it from the sp_io perspective and api should be I think:

  • prefix some prefix
  • seek some seek key without the prefix (I mean after or bellow the prefix).

So the first value you access will not be seek when into the prefix, but prefix ++ seek

Copy link
Contributor Author

@koute koute Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it from the sp_io perspective and api should be I think:

  • prefix some prefix

  • seek some seek key without the prefix

So the first value you access will not be seek when into the prefix, but prefix ++ seek

So if I understand this right, are you suggesting that the subsequent seek after seeking to the prefix should be allowed to "break out" of the prefix and iterate over values which do not start with the prefix?

Copy link
Contributor

@cheme cheme Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more an api suggestion that would avoid some corner cases (current code looks plain good to me):
Instead of calling prefix_then_seek(b"prefix", b"prefix_seek") or prefix_then_seek(b"prefix", b"not_in_prefix").
We would call prefix_then_seek(b"prefix", b"_seek") and could not do the second call at all.

Copy link
Contributor

@cheme cheme Mar 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doing this change is breaking api. I think it would be more interesting at sp_io level also.
(was just a thought I had during lunch and wanted to share, not something that should really be part of this PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... I guess we could do that, but that might be more awkward to use depending on the use case I guess? But then it could be easier in some other cases? And also, in this case I think it could make sense for it to return keys which are also have the prefix stripped, e.g. assume the following code:

loop {
    let accounts = fetch_values(ACCOUNTS_PREFIX, cursor, 100);
    // ...
    cursor = match accounts.last() {
        Some(account) => account,
        None => break
    }
}

For this code to work and be convenient to use it'll have to return the same "type" of key (stripped of the prefix or not stripped) as what the function accepts in cursor, otherwise you'll have to do awkward concats or have to slice off the prefix yourself.


Also, only tangentially related, but maybe naming wise we could rename prefix to only_starting_with and seek to skip_until_or_after or something like that? It's a little wordy but it would explain what it does more clearly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah not evident it is better, if it does not sounds evidently better, then it is probably not worth the change.

Also, only tangentially related, but maybe naming wise we could rename prefix to only_starting_with and seek to skip_until_or_after or something like that? It's a little wordy but it would explain what it does more clearly.

Got no opinion for the trie crate, but for frame or host function, clearly would make sense but I think the api is somehow rather different already.
One idea that is hard to convey is that when we do prefix operation, only nodes starting the prefix or bellow the prefix are accessed (we don t do the seek then iterate until a key that is not under the prefix is reached (avoiding adding the first key out of the prefix in the proof)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking twice could be good for trie to have better naming, but I got pretty use to the current one so my opinion is rather biased.

Comment on lines +310 to +314
if !self.seek(db, prefix)? {
// The database doesn't have a key with such a prefix.
self.trail.clear();
return Ok(())
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual fix or?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because before we still returned some data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual fix or?

Because before we still returned some data?

I think @cheme already explained it pretty well here, but basically yes, previously we seeked to a key which might not start with the prefix anymore, and the code just checked that the length is good but didn't check that it's still within the prefix.

@bkchr
Copy link
Member

bkchr commented Mar 17, 2023

From what I read this bug is not in a host function (only clear_prefix v3 can go in the code path but it is "register_only" cc @bkchr is it possible to still use the function? (in this case I am not sure if it can be an issue since runtime will only call it with a cursor if it is not empty, and having cursor at last position do not make sense as we know it is the last position)).

Ahh you also came to the same conclusion as me! Good :D

@koute
Copy link
Contributor Author

koute commented Mar 17, 2023

I also checked Substrate and the good thing is that we don't have the clear_prefix host functions with maybe_cursor enabled yet. So, we should not be able to have build a block with this bug? @cheme and @koute please check again that I'm right!

From a cursory look this sounds about right.

@koute
Copy link
Contributor Author

koute commented Mar 17, 2023

Okay, since I want to put up a substrate PR with this fix I'll merge this now. If you'd still like for me to make any further changes feel free to leave them and I'll do them in a followup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: done
Development

Successfully merging this pull request may close these issues.

3 participants