
Re-Genesis #7458

Open
sorpaas opened this issue Oct 29, 2020 · 23 comments

Comments

@sorpaas
Member

sorpaas commented Oct 29, 2020

This documents some notes and designs for a Re-Genesis process. Re-Genesis is basically the process of exporting the current chain state and creating a new chain that builds on it.

Rationale

The discussion started as an alternative to Swappable Consensus (#1304). Many of the consensus engines we have right now (like BABE) make assumptions about the chain state, block numbers, and other things, so swapping consensus directly would require heavy modification of the consensus engines themselves. In addition, custom migration code would have to be written individually for each possible swap.

Re-Genesis, on the contrary, is much simpler. If implemented with care, it can accomplish the same thing as Swappable Consensus. We do not need to modify existing consensus engines to remove their assumptions; we just need to make switching and restarting a runtime plus consensus engine combination fast.

Re-Genesis can also be used for other purposes that Swappable Consensus is not able to cover:

  • Replace faulty runtime upgrades.
  • As a hard fork process.
  • As a way to "squash" the chain and reduce syncing time.
  • Carry out stop-the-world migrations more smoothly and reliably.

Design

Choosing the Re-Genesis block

A Re-Genesis process divides a blockchain into eras. If a blockchain is in era N prior to Re-Genesis, it is in era N + 1 post Re-Genesis. In each era, block numbers start from 0, so we can refer to blocks as "era N block M".
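
For illustration, a minimal Rust sketch of one way to write down such a reference; the type and field names are hypothetical and not part of Substrate:

```rust
use std::fmt;

/// Hypothetical reference to a block across eras: "era N block M".
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct EraBlockRef {
    era: u32,
    number: u32,
}

impl fmt::Display for EraBlockRef {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "era {} block {}", self.era, self.number)
    }
}

fn main() {
    // For example, Kulupu's switch point would read "era 0 block 320000".
    let switch_point = EraBlockRef { era: 0, number: 320_000 };
    println!("{}", switch_point);
}
```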

The first question is how we choose the Re-Genesis block.

We could always choose the head block at a particular height, but that would not be reliable: there can be multiple such blocks at the same time, and if the state rebuilding process is heavy, allowing the chosen block to be switched around is an attack vector.

Instead, we define the Re-Genesis block as a finalized block at a particular height (for chains with finalization), or a block at a particular height with siblings of depth at least D (for chains with probabilistic finalization). This means that when switching from era N to era N + 1, beyond the Re-Genesis block the old era N chain will continue to build blocks and states, but those blocks and states will not be accounted for in the new era; they are only there to keep the probability of having multiple Re-Genesis blocks low.
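
A hedged sketch of this selection rule follows; the types and names are hypothetical (not Substrate APIs), and the probabilistic case is read as a simple confirmation-depth check on the candidate block:

```rust
/// How finality is established on the chain undergoing Re-Genesis (hypothetical).
enum Finality {
    /// Deterministic finality (e.g. GRANDPA): the latest finalized block number.
    Finalized { latest_finalized: u64 },
    /// Probabilistic finality: the best block number and the required depth D.
    Probabilistic { best_number: u64, required_depth: u64 },
}

/// Returns true once the block at `regenesis_height` can be treated as the
/// Re-Genesis block for the next era.
fn regenesis_block_fixed(regenesis_height: u64, finality: &Finality) -> bool {
    match finality {
        // Chains with finalization: the candidate height must be finalized.
        Finality::Finalized { latest_finalized } => *latest_finalized >= regenesis_height,
        // Chains with probabilistic finalization: the candidate must be buried
        // under at least `required_depth` blocks.
        Finality::Probabilistic { best_number, required_depth } => {
            best_number.saturating_sub(regenesis_height) >= *required_depth
        }
    }
}

fn main() {
    let grandpa = Finality::Finalized { latest_finalized: 320_123 };
    assert!(regenesis_block_fixed(320_000, &grandpa));

    let pow = Finality::Probabilistic { best_number: 320_010, required_depth: 100 };
    assert!(!regenesis_block_fixed(320_000, &pow));
}
```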

Stopping the old era chain

Having the old era N chain continue to build blocks and states is definitely not ideal, so we can add runtime support for stopping the old era chain. The chain stopping process consists of two steps (sketched in code below):

  • First, the chain state is frozen. No balances can be transferred, no new proposals can be submitted, the validator set is frozen, and no rewards will be issued. The existing validator set continues to build blocks.
  • Then, once a block at the Re-Genesis height is finalized, the runtime issues a setCode command with empty code, permanently shutting down the old chain.
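
Below is a minimal standalone sketch of this two-step shutdown as a state machine. It is not FRAME pallet code; the freeze height, phase names, and finality tracking are illustrative stand-ins for the actual runtime mechanics:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ShutdownPhase {
    /// Normal operation in the old era.
    Running,
    /// State frozen: no transfers, proposals, validator changes, or rewards;
    /// the existing validator set keeps authoring blocks.
    Frozen,
    /// `setCode` has been issued with empty code; the old chain is stopped for good.
    Stopped,
}

/// Hypothetical view of the old era chain during shutdown.
struct OldEraChain {
    phase: ShutdownPhase,
    /// Height at which the state freeze takes effect.
    freeze_height: u64,
    /// Height of the Re-Genesis block.
    regenesis_height: u64,
}

impl OldEraChain {
    /// Called for every imported block of the old era chain.
    fn on_block(&mut self, number: u64, latest_finalized: u64) {
        match self.phase {
            // Step 1: freeze the chain state.
            ShutdownPhase::Running if number >= self.freeze_height => {
                self.phase = ShutdownPhase::Frozen;
            }
            // Step 2: once a block at the Re-Genesis height is finalized,
            // issue `setCode` with empty code to shut the old chain down.
            ShutdownPhase::Frozen if latest_finalized >= self.regenesis_height => {
                self.phase = ShutdownPhase::Stopped;
            }
            _ => {}
        }
    }
}

fn main() {
    let mut chain = OldEraChain {
        phase: ShutdownPhase::Running,
        freeze_height: 319_000,
        regenesis_height: 320_000,
    };
    chain.on_block(319_000, 318_950);
    assert_eq!(chain.phase, ShutdownPhase::Frozen);
    chain.on_block(320_050, 320_001);
    assert_eq!(chain.phase, ShutdownPhase::Stopped);
}
```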

Starting the new era chain

Substrate users define their own migration script. The migration will obviously define the initial parameters of the new consensus engine. For the rest of the state, Substrate users can cherry-pick what they want and discard the rest -- either taking the full state over, or taking just the balances and other essential items.
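
As a rough illustration of the cherry-picking step, the sketch below filters an exported key/value state down to a set of kept prefixes and splices in fresh consensus parameters. The prefixes and entries shown are made up, not real Substrate storage keys:

```rust
use std::collections::BTreeMap;

type StorageKey = Vec<u8>;
type StorageValue = Vec<u8>;
type State = BTreeMap<StorageKey, StorageValue>;

/// Build the era N + 1 genesis state from the era N state at the Re-Genesis block.
fn build_new_genesis(
    old_state: &State,
    keep_prefixes: &[&[u8]],
    new_consensus_entries: State,
) -> State {
    // Keep only the cherry-picked storage prefixes from the old state.
    let mut genesis: State = old_state
        .iter()
        .filter(|(key, _)| keep_prefixes.iter().any(|p| key.starts_with(p)))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    // The migration always defines the initial parameters of the new
    // consensus engine, overriding anything carried over.
    genesis.extend(new_consensus_entries);
    genesis
}

fn main() {
    let mut old_state = State::new();
    old_state.insert(b"balances:alice".to_vec(), b"100".to_vec());
    old_state.insert(b"staking:era".to_vec(), b"42".to_vec());

    let mut consensus = State::new();
    consensus.insert(b"consensus:authorities".to_vec(), b"[...]".to_vec());

    let genesis = build_new_genesis(&old_state, &[b"balances:".as_slice()], consensus);
    assert!(genesis.contains_key(b"balances:alice".as_slice()));
    assert!(!genesis.contains_key(b"staking:era".as_slice()));
}
```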

After migration, this new state is then set as the genesis block state for era N + 1, and a new chain continues to function beyond this point.

We note that the difference between a Re-Genesis process and a completely new blockchain is that the genesis state for a Re-Genesis process is not known until the Re-Genesis block is identified.

Discussions

Light client

Light client implementations differ by consensus engine. As a result, whether using Swappable Consensus or Re-Genesis, they may not work across the era boundary. Substrate users may have to ask node users to manually switch light clients upon Re-Genesis.

Missed time

We note that the Re-Genesis process involves a stop-the-world migration. Even if that is fast, time has to be spent on the old era chain finalizing the Re-Genesis block in order to identify it. This results in a period during which no actual blocks with state are being built for the blockchain.

UX issues

Re-Genesis introduces a new concept called "era", and compared with Swappable Consensus, the new era's blocks start their numbers from 0 again. This is a UX issue that we should take care of.

Prior usages

The only real-world usage so far (relying on an ad-hoc Re-Genesis process) was Kulupu's era switch at era 0 block 320,000. The process was almost as described above, but everything was done manually (with a new node released after the Re-Genesis block).

Edgeware also considered Re-Genesis for its first runtime upgrade, but decided against it due to UX concerns.

@sorpaas
Member Author

sorpaas commented Oct 29, 2020

cc @andresilva

@Swader
Contributor

Swader commented Oct 29, 2020

What happens to past era extrinsics / events for the purpose of auditing (tax etc)? Can people still rebuild the previous era with archive nodes?

@sorpaas
Member Author

sorpaas commented Oct 29, 2020

@Swader They should always be able to do that.

Right now I'm thinking about each era using a different networking identifier and storage location for simplicity (that is, if we indeed decide to go in the Re-Genesis direction), but the UX can definitely be improved.

@Swader
Contributor

Swader commented Oct 29, 2020

The thing I'm wondering is, right now a full node can become an archive node without needing any communication from other nodes, just based on its extrinsics which it keeps no matter the pruning mode. A full node of era 1 will not be able to do that, presumably. Would this potentially cause an availability rift if no one were to be running a full node of era 0 any more?

@sorpaas
Member Author

sorpaas commented Oct 29, 2020

@Swader Yeah indeed. But the chance that not a single person runs an era 0 full node is quite slim, IMO.

@Swader
Contributor

Swader commented Oct 29, 2020

Agreed, just putting it out there since there is a chance.

I think this functionality is interesting, and I'd like to see it in Substrate. I don't think Polkadot would use this (because of the slight chance of missing past era availability), but I could definitely see Kusama undergo a new era launch every 5 million blocks or so 👍

@andresilva
Contributor

@Swader That is the same problem we will have when we implement warp syncing since nodes will stop downloading the history from before the snapshot point (or at least that was the case with our implementation in parity-ethereum). Normal node operation would still be to sync through all eras and import everything (potentially to different database locations on-disk but that's an implementation detail), so all the data would have the same availability guarantees it has today. The main driving point of this feature is as a potential implementation for swappable consensus, which we'd want to use in the future in Polkadot (e.g. for migrating from BABE to SASSAFRAS).

Light client

I think the light client would just have to start syncing from the latest era. I think this is OK since on PoS chains the light clients already cannot be trusted from genesis due to weak subjectivity.

Right now I'm thinking about each era using a different networking identifier

This might make it harder to allow serving clients on all eras, but I didn't check what changes would be needed on the networking side.

UX issues

I think ideally we'd want to avoid resetting the block numbers and just keep incrementing them across eras. From the client side this might be doable just by maintaining an offset. For the runtime, though, I'm not sure that is enough, since we might have state entries referencing block numbers from previous eras. I think we might need to remove the assumption that the genesis block is #0 and instead pick up the block number from the last era.
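
For the client-side offset idea, a tiny sketch (purely illustrative, not an existing Substrate API) could look like this:

```rust
/// Cumulative block counts of finished eras, kept by the client (hypothetical).
struct EraOffsets {
    /// `lengths[i]` is the number of blocks produced in era `i`.
    lengths: Vec<u64>,
}

impl EraOffsets {
    /// Translate "era `era`, local block `number`" into a single,
    /// monotonically increasing number for display purposes.
    fn global_number(&self, era: usize, number: u64) -> u64 {
        let offset: u64 = self.lengths[..era].iter().sum();
        offset + number
    }
}

fn main() {
    // Era 0 ended after 320000 blocks; era 1 is ongoing.
    let offsets = EraOffsets { lengths: vec![320_000] };
    assert_eq!(offsets.global_number(0, 1_234), 1_234);
    assert_eq!(offsets.global_number(1, 5), 320_005);
}
```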

@tomaka
Contributor

tomaka commented Oct 29, 2020

Ultimately the networking should be capable of "connecting" to multiple different chains (#3310), in other words of supporting multiple different chains/eras at the same time, provided each chain/era has a different protocolId.

If, however, we don't reset the block number to 0, no change is required on the networking side.

@apopiak
Contributor

apopiak commented Oct 30, 2020

Is the name "era" intentionally similar to staking eras? If not I would suggest different naming to avoid confusion.

@Swader
Contributor

Swader commented Oct 30, 2020

An eon is a unit that's bigger than an era and is composed of eras, so that sounds appropriate.

@jak-pan
Contributor

jak-pan commented Mar 29, 2021

We're in the same place now with HydraDX.

We've selected the default epoch length of 10 minutes from the Substrate repo, not realizing that it can have implications for network stability (i.e. no blocks for 10 minutes means a stalled network) and also for validator UX - getting kicked out of the set and losing nominations if offline for 10 minutes.

As the epoch length cannot be changed after the chain has started, we're now either forced to restart from #0 with the old state, or to risk the stalling for now, prepare for this migration, and restart after the fact.

The UX, however, is not ideal: this looks like a simple property change in the first place, but we're forced to upgrade all 200+ waiting validators (and even more nodes), make sure they purge their state, and either wait for them to re-indicate validation/nomination by purging the validator set state from storage, or risk starting the chain and trusting that they have done everything right on time.

Also, going back to 0 doesn't look good from a UX standpoint, since we're indeed continuing the chain.

@jordy25519
Contributor

FWIW, CENNZnet is in the same boat. We set up a system to move session keys to hot standby nodes in case a validator is detected restarting or stalled, etc.
This has kept the network running smoothly for the most part. The occasional slow block does tend to cause chaos for the network - emergency elections due to offline offences, etc.

With changes like this it seems possible to increase the epoch duration; maybe some one-off hack like setting a specific epoch will be required: #8072

@jak-pan
Contributor

jak-pan commented Mar 29, 2021

FWIW, CENNZnet is in the same boat. We set up a system to move session keys to hot standby nodes in case a validator is detected restarting or stalled, etc.
This has kept the network running smoothly for the most part. The occasional slow block does tend to cause chaos for the network - emergency elections due to offline offences, etc.

With changes like this it seems possible to increase the epoch duration; maybe some one-off hack like setting a specific epoch will be required: #8072

It's actually very good to hear that it's working for you and that there is a light at the end of the tunnel. I guess we could try to live with it, at least during the first part of the incentivized testnet. We've already postponed slashing during this phase to 27 days and plan to revert slashes automatically, so I guess we'll have larger validator turnover since they'll need to get re-elected often, but that's actually not bad for a testing phase.

@jak-pan
Contributor

jak-pan commented Apr 1, 2021

So we've come quite far with our re-genesis (galacticcouncil/hydration-node#191), but are now stuck at a chicken-and-egg problem here: polkadot-js/extension#687 (comment)

TL;DR: We either need to stop the chain until the extension is updated and then restart (which could still have problems), or deal with two separate instances of one chain, which is kind of a PITA since we already have quite a lot of users.

Does anybody have a better idea of how to tackle this problem?

@stale

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 7, 2021
@tomaka
Contributor

tomaka commented Jul 8, 2021

Issue still relevant and important.

@stale stale bot removed the A5-stale Pull request did not receive any updates in a long time. No review needed at this stage. Close it. label Jul 8, 2021
@AurevoirXavier
Contributor

AurevoirXavier commented Apr 19, 2022

Looking forward to this.
And I think this is also pretty useful for the long-term testnet.

@rithythul

rithythul commented Jun 8, 2022

Has anyone tried this concept out yet?

We're facing an issue where the Substrate era does not end.

@AurevoirXavier
Contributor

Has anyone tried this concept out yet?

We're facing an issue where the Substrate era does not end.

You could take a look at https://github.com/darwinia-network/fork-off-substrate

@ggwpez
Member

ggwpez commented Jun 8, 2022

I tried out fork-off-substrate, but it's hitting JS limits: maxsam4/fork-off-substrate#87
Not sure if the Darwinia fork fixed that?

@rithythul

rithythul commented Jun 9, 2022

Thanks @AurevoirXavier, we tried it once but it didn't work.

Thanks so much for the help anyway.

@AurevoirXavier
Contributor

Thanks @AurevoirXavier, we tried it once but it didn't work.

Thanks so much for the help anyway.

Weird, our state is more than 1 GB.

@rithythul

So it works for you?
Did the era-not-ending issue happen to Darwinia before too?
If so, do you know the root causes?

In our case, we suspect it could be that the hardware specs are a bit low, and we only have around 20 nodes + 13 validators. But we're still figuring out the root causes to prevent the next issue.
