Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests for merkle patricia trie proofs #240

Open
MaksymZavershynskyi opened this issue Aug 3, 2020 · 12 comments
Open

Tests for merkle patricia trie proofs #240

MaksymZavershynskyi opened this issue Aug 3, 2020 · 12 comments
Assignees

Comments

@MaksymZavershynskyi
Copy link
Contributor

We need exhaustive tests for various corner cases of merkle patricia trie proofs to exercise this code here: https://github.com/near/rainbow-bridge/blob/master/libs-rs/eth-prover/src/lib.rs#L177-L315

@ailisp
Copy link
Contributor

ailisp commented Aug 3, 2020

As discussed in meeting, @Kouprin is going to help on this issue since having similar test in nearcore. @Kouprin please to drop my assignment if you start work on it, i'll also drop you if i have time to start

@Kouprin
Copy link
Contributor

Kouprin commented Aug 4, 2020

I see the following ways to prepare lots of tests:

  1. Generate locally lots of fake trees locally and check proofs for them. The main concern here is our inner way of tree generating and proving may differ from Etherium one. This way of testing seems to be completely useless because it doesn't include ETH feedback.
  2. Take lots of real ETH records and check proofs for them. This is tricky because:
    a) we can check only valid records easily and need to check invalid as well - we cannot just naively replace proof with random bytes;
    b) we can't guarantee we can test all corner cases because we can't build trees arbitrary. We may even don't know how those trees look like.
    It seems we already have some tests built in such way.
  3. Run ETH node from scratch then build arbitrary tree and then prove it. This seems to be extremely complex as it implies lots of knowledge how ETH stuff works and then develop a technique to build arbitrary trees in ETH.

Any ideas?

@ailisp
Copy link
Contributor

ailisp commented Aug 4, 2020

Good analysis! I suggest we do 1 since 2 is hard to test corner case and 3 is really hard. Fabricate corner cases to cover all code paths and verify result with other reference implementations. If result same it's very likely correct. If not same, there could be our or reference impl incorrect, we can do further verification there.
At the meantime, i can run 2 for a wide range of blocks, if it can correctly verify 1 year of block it's probably correct enough. But these are test for "valid blocks", we don't know if it reject "invalid blocks", maybe do 2a)? Or use a fuzzy tester?

@nearmax WDYT?

@Kouprin
Copy link
Contributor

Kouprin commented Aug 5, 2020

@ailisp I'm against 1 and here's why. It sounds like solid way to run thousand of tests full of corner cases and check validity on them. We can do that. However, it will test only what we're building on our inner infrastructure and completely not related of what is actually happening on ETH side. Solution 1 will create feeling safe and well-tested wrongly.

I might be wrong if there is a Standard which describes how the trees built and Etherium follows it clearly. For now I see lots of weird conditions like if dec.iter().count() == 17 { // branch node and else if prefix == 3 { // odd leaf node. It seems those conditions are found by reverse engineering, not because the Standard says: "branch node is the node when its data size is 17 elements".

I'm not convinced that we should rely on our ability to reproduce ETH way for trees building. We must use trees built on Etherium instead.

@ailisp
Copy link
Contributor

ailisp commented Aug 5, 2020

I see, I agree it's not much useful to manually construct info. How does other implementation test, would it possible to borrow their test suites or test tools to "build trees on Ethereium"?

@MaksymZavershynskyi
Copy link
Contributor Author

Let's finish this discussion and set estimate.

@Kouprin
Copy link
Contributor

Kouprin commented Aug 21, 2020

@nearmax I suggest to divide it into the following tasks:

  1. Imitate Eth trees - moderate benefits, high risk of being useless, moderate to implement.
  2. Download all possible Eth trees - high benefits, will be useful for forever, moderate to implement (I expect we already have some tools to download, need to sync with @ailisp)
  3. Run real Eth node and build arbitrary trees on it - high benefits (covering corner cases), moderate usefulness, hard to implement. Needs lots of investigation.

I'd suggest to do it in order 2-1-3. I expect doing 2 will cover most of the cases we want to cover.

@ailisp
Copy link
Contributor

ailisp commented Aug 22, 2020

Download all possible Eth trees - high benefits, will be useful for forever, moderate to implement (I expect we already have some tools to download, need to sync with @ailisp)

Yes we have the tool, but download all receipt logs of a block is a time consuming op (proof is extract from receipt logs), as there's usually hundreds of them in one block, each of them request as separate HTTP request and soon hit infura rate limit. Downloading at slower rate cost several minutes to download all of logs of one block. I'll run our own eth full node, and parallelize hundred of eth-dumper to make this faster.
Test all proofs is also slow, verify-near-proof take 20 minute to verify 10 blocks of proofs. We can also parallelize this and wait days till it finish

@Kouprin
Copy link
Contributor

Kouprin commented Aug 23, 2020

@ailisp Yep, that's why it's important to start downloading earlier. Can we analyze downloaded trees for their structure? If yes, we can get lots of trees and then build a subset of tests which covers most of corner cases.

@ailisp
Copy link
Contributor

ailisp commented Aug 25, 2020

@ailisp Yep, that's why it's important to start downloading earlier. Can we analyze downloaded trees for their structure? If yes, we can get lots of trees and then build a subset of tests which covers most of corner cases.

There's a sample in https://s3-us-west-1.amazonaws.com/rainbow-bridge.nearprotocol.com/test-data/eth-proofs.tar.gz you can take a look first. Sorry i haven't start downloading, let me see if i can start instances download today

@MaksymZavershynskyi
Copy link
Contributor Author

This is quite challenging. Setting estimate to 5.

@alexauroradev alexauroradev added P2 and removed P1 labels Feb 19, 2021
@alexauroradev
Copy link

Connected with #488

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants