[Question] Snap/Fast sync vs. Full sync for Archive nodes? #24413

Closed
dzou opened this issue Feb 16, 2022 · 12 comments

Comments

@dzou

dzou commented Feb 16, 2022

The documentation describes that there are 3 sync modes {fast, snap, full} and different types of nodes including archive and full.

What is the difference between using snap/fast sync for archive nodes versus if full sync is used for archive nodes? Are archive nodes using snap sync not able to serve the same requests that archive nodes using full sync can serve?

Running:

geth --syncmode snap --gcmode archive

I tried to find some documentation but not clear what the verdict is:

Would love to receive some clarification on this! Many thanks.

@dzou dzou added the type:docs label Feb 16, 2022
@karalabe
Member

There are two different notions: gcmode and syncmode. The latter refers to how the initial sync is done, and the former to how the node behaves with respect to garbage collection.

In fast/snap sync, the "current" state of the network is downloaded directly. As such, any state before it (not block, just the state) will not be available for serving. If you are running with gcmode=archive after snap sync, then your node will hold onto all generated state after initial sync, but anything before it will still be missing.

In full sync, blocks are processed one by one from genesis. By default geth will garbage collect the generated state, but if gcmode=archive is specified it will hold onto it. Thus you will have all the state available from genesis.
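As a rough illustration of the two notions above, the sketch below models which block range a node could serve state for, given its sync mode and GC mode. The function name, the `pivot` parameter, and the 128-state retention window for non-archive nodes are assumptions made for illustration; this is a mental model, not geth's actual logic.

```python
from typing import Optional, Tuple

# Assumption: a non-archive node keeps state for roughly the latest 128 blocks.
RECENT_STATES = 128


def available_state_range(
    syncmode: str, gcmode: str, head: int, pivot: Optional[int] = None
) -> Tuple[int, int]:
    """Return the inclusive (first, last) block range whose state can be served.

    `pivot` is the block at which a snap/fast sync downloaded the state directly.
    """
    if syncmode not in ("full", "snap", "fast"):
        raise ValueError(f"unknown syncmode: {syncmode}")
    if gcmode not in ("full", "archive"):
        raise ValueError(f"unknown gcmode: {gcmode}")
    if syncmode != "full" and pivot is None:
        raise ValueError("snap/fast sync needs the pivot block")

    if gcmode == "archive":
        # Archive keeps every state generated by block processing.
        # Full sync processes from genesis; snap/fast only from the pivot.
        first = 0 if syncmode == "full" else pivot
    else:
        # Garbage collection drops everything but the recent window.
        first = max(0, head - RECENT_STATES + 1)
        if syncmode != "full":
            first = max(first, pivot)
    return (first, head)
```

For example, `available_state_range("snap", "archive", head=14_000_000, pivot=13_900_000)` yields a range that starts at the pivot, which matches the point that state before the initial sync stays missing.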

This allows a fast synced node to still retain its status as an archive node

That statement is wrong (I guess I messed it up at the time). I meant to write "retains its status as a full node".

Author speculates that the node is not able to serve requests prior to a "pivot block".

Archive mode retains all the state that gets generated during block processing. If you reprocess all the blocks from genesis - with archive flag set - you will have all that state. If you do the initial sync without block processing, the state for that segment will not be available.

Does --gcmode=archive require --syncmode=full? -- Answer is not known.

Depends on the behavior you want. If you want everything since genesis, then full is required. There are also use cases where you don't want everything, only from "today onward". In that case you can fast/snap sync and have archive keeping only the states after.

@dzou
Author

dzou commented Feb 22, 2022

@karalabe -- Thank you so much for the response. I just have one more followup to help me understand --

What would happen if we ran an archive node with --syncmode full --gcmode archive and then shut it down for a day then switched to --syncmode snap --gcmode archive? Would it be able to sync faster to current time and still retain all information in history to serve?

We manage some archive nodes with syncmode=full and are wondering if there is some way to speed things up.

@holiman
Contributor

holiman commented Feb 23, 2022

What would happen if we ran an archive node with --syncmode full --gcmode archive

Then it would store every state from genesis until you shut it off.

day then switched to --syncmode snap --gcmode archive?

It would ignore the new syncmode and continue.

I guess what would be desirable for you would be this:

  1. Node A has (all) state for blocks 0-2M,
  2. Node B has all state for blocks 2M-3M,
  3. etc.
  N. Node N has state from 13M to head

This is possible, but would require a bit of coding, and some special setup. For example, you would run nodes 1 through N-1 with --nodiscover --maxpeers=0 to prevent them from importing more data.
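Part of the "bit of coding" such a setup would need is a router that sends each historical query to the node holding that block's state. A minimal sketch follows; the endpoint URLs and the exact range boundaries are hypothetical, and intermediate nodes are deliberately left out to mirror the "etc." in the list above.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (first block, last block or None for "up to head", RPC URL).
Shard = Tuple[int, Optional[int], str]

SHARDS: List[Shard] = [
    (0, 1_999_999, "http://node-a.example:8545"),
    (2_000_000, 2_999_999, "http://node-b.example:8545"),
    # ... intermediate nodes omitted ...
    (13_000_000, None, "http://node-n.example:8545"),
]


def endpoint_for_block(block: int, shards: List[Shard] = SHARDS) -> str:
    """Pick the archive node whose retained state covers `block`."""
    for first, last, url in shards:
        if block >= first and (last is None or block <= last):
            return url
    raise LookupError(f"no archive node configured for block {block}")
```

A proxy in front of the nodes could call `endpoint_for_block` with the block parameter of each state query and forward the request to the returned URL.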

The way to create an "archive node from 1M to 2M" could be to:

  1. Use syncmode=full until 1M,
  2. Do a state-pruning
    • After pruning, you can also copy the datadir for use with the 2M-3M node, which needs to continue without gcmode=archive
  3. Use syncmode=full gcmode=archive between 1M and 2M
  4. Stop the node
  5. Run the node with --nodiscover --maxpeers=0.
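The steps above can be sketched as a small decision helper that tells an operator which invocation applies at a given chain height. The function and its parameters are hypothetical. It only assembles the commands named in this thread; keeping --gcmode archive in the final frozen phase is an assumption. Since geth has no flag to stop at a given block, the operator would still have to watch the height and stop the node by hand.

```python
def next_step(height: int, start: int, end: int, pruned: bool) -> str:
    """Return the geth invocation for the current phase of the recipe.

    height: current head block of the node
    start/end: block range that should end up with archived state
    pruned: whether the state pruning after reaching `start` has been done
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    if height < start:
        # Step 1: plain full sync (default garbage collection) up to `start`.
        return "geth --syncmode full"
    if not pruned:
        # Step 2: prune state; the node must be stopped while this runs.
        return "geth snapshot prune-state"
    if height < end:
        # Step 3: keep every state generated between `start` and `end`.
        return "geth --syncmode full --gcmode archive"
    # Steps 4-5: freeze the node so it imports no more blocks.
    # Assumption: --gcmode archive is kept here as well.
    return "geth --gcmode archive --nodiscover --maxpeers 0"
```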

I guess the one thing lacking to script up such a scenario right now is that we don't have a way to stop at a certain block, e.g. geth ...args.. --exit-at=2000000.

Another useful option would be to extend gcmode, so that one could say e.g. gcmode=0:full,1000000:archive,2000000:full, meaning it would be given a set of N:<mode>, in increasing order, and automatically switch at the given numbers.
I'll file this as a potential feature.
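To make the proposed N:<mode> syntax concrete, here is a sketch of how such a schedule could be parsed and queried. This feature does not exist in geth; the function names, the accepted mode names, and the rule that the schedule must start at block 0 are all assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple

VALID_MODES = {"full", "archive"}


def parse_gc_schedule(spec: str) -> List[Tuple[int, str]]:
    """Parse e.g. "0:full,1000000:archive,2000000:full" into (block, mode) pairs."""
    schedule: List[Tuple[int, str]] = []
    for entry in spec.split(","):
        number, sep, mode = entry.strip().partition(":")
        if not sep or not number.isdecimal() or mode not in VALID_MODES:
            raise ValueError(f"bad schedule entry: {entry!r}")
        block = int(number)
        if schedule and block <= schedule[-1][0]:
            raise ValueError("block numbers must be strictly increasing")
        schedule.append((block, mode))
    if schedule[0][0] != 0:
        # Assumption: the schedule must say what to do from genesis.
        raise ValueError("schedule must start at block 0")
    return schedule


def gc_mode_at(schedule: List[Tuple[int, str]], block: int) -> str:
    """Return the GC mode in effect at `block`."""
    if block < 0:
        raise ValueError("block must be non-negative")
    starts = [start for start, _ in schedule]
    # The last entry whose start is <= block is the active one.
    return schedule[bisect_right(starts, block) - 1][1]
```

With the example schedule, `gc_mode_at` returns "archive" for any block from 1000000 up to 1999999 and "full" elsewhere.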

@dzou
Author

dzou commented Feb 23, 2022

Thank you! 🙏

@shiziwen

2. Do a state-pruning

@holiman Thanks for your reply.
But what do you mean by Do a state-pruning? What should I do or which command should I use to complete the state-pruning?

@holiman
Contributor

holiman commented May 31, 2022

I mean geth snapshot prune-state.

@shiziwen

shiziwen commented Jun 1, 2022

I mean geth snapshot prune-state.

Thank you very much, I will figure out what this command does.

@shiziwen

shiziwen commented Jun 1, 2022

@holiman Hi, I have another question about the state and the snapshot.

As far as I know, every block has its own state (more precisely, the world state MPT), which contains the account (and contract) info. With full sync + full node, the node uses the downloaded transactions to generate the state. With fast (now snap) sync + full node, it will not download any state until the pivot block, and after that it will work like full sync, right?

So the full node actually saves the state for every block after the full sync, right?

But according to some documents and my own test, a full node (with either snap sync or full sync) only preserves state for the latest 128 blocks. For older blocks, it returns the error missing trie node XXX (path ) when eth_getBalance is used to query an account at a specific block number.
So I don't understand why. In terms of state, what is the difference between a full node and an archive node?
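The behavior described above can be probed with a small script. This sketch builds an eth_getBalance JSON-RPC request for a specific block number and checks whether a response carries the "missing trie node" error quoted above. The helper names and the timeout are assumptions, and matching on the error message text is a heuristic rather than a documented contract.

```python
import json
import urllib.request
from typing import Any, Dict


def get_balance_request(address: str, block_number: int, request_id: int = 1) -> Dict[str, Any]:
    """Build an eth_getBalance request for `address` at `block_number`."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getBalance",
        # Block numbers are passed as hex quantities, e.g. 0xd59f80.
        "params": [address, hex(block_number)],
    }


def is_pruned_state_error(response: Dict[str, Any]) -> bool:
    """Heuristic: does the response look like a query for state that was pruned?"""
    error = response.get("error")
    return bool(error) and "missing trie node" in str(error.get("message", ""))


def post(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send the payload to a node's HTTP-RPC endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return json.load(reply)
```

Calling `post` with a local node's HTTP-RPC URL and then `is_pruned_state_error` on the reply would show whether that node still holds state for the requested block.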

And what is the snapshot? What is the difference between state and snapshot?

Thank you very much.

@MikeC-BC

MikeC-BC commented Jul 27, 2022

A follow-up question: Is it possible to reprocess blocks from a certain block height to retain that state?

E.g. Say I ran my node with --syncmode fast and --gcmode archive until block 15000000 when I switched to --syncmode full. I now want state starting from block 14000000.

Can this be done with some reprocessing of blocks without having to do a full sync from scratch?

@karalabe
Member

Only if you reprocess everything from genesis (i.e. a full sync).

@ubuntutest

Some old documentation told me to use Geth with pruning to get a copy of only the newest blocks.

To do this I should have used the "--fast" flag, but I understand this is deprecated.

Currently there seem to be "full", "snap" and "light" modes.

Can you tell me the difference between "snap" and "light"? Full is obvious, I think.

@MariusVanDerWijden
Member

@ubuntutest https://geth.ethereum.org/docs/fundamentals/sync-modes
