Stop broadcasting transactions when the block cannot be solidified #5562

Closed
lvs007 opened this issue Oct 31, 2023 · 4 comments · Fixed by #5643

lvs007 commented Oct 31, 2023

Rationale

The block production mechanism uses a minimum participation strategy: when the SR participation rate falls below the minimum participation rate, SRs stop producing blocks. The default configuration is minParticipationRate = 15, so SRs keep producing blocks as long as at least 15% of SRs are working normally.

The block production thread judgment logic is as follows.

int participation = consensusDelegate.calculateFilledSlotsCount();
int minParticipationRate = dposService.getMinParticipationRate();
if (participation < minParticipationRate) {
  return State.LOW_PARTICIPATION;
}
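
To make the check concrete, here is a minimal sketch that assumes participation is the percentage of recently scheduled slots in which a block was actually produced. The class name, the 128-slot window, and the round-robin helper are hypothetical and only illustrate the comparison against minParticipationRate; they are not taken from the node's code.

```java
class ParticipationSketch {
  // Default from the text above.
  static final int MIN_PARTICIPATION_RATE = 15;

  // Percentage of slots in the window that were filled (1) rather than missed (0).
  static int filledSlotsPercent(int[] slots) {
    int filled = 0;
    for (int s : slots) {
      filled += s;
    }
    return 100 * filled / slots.length;
  }

  // Same comparison as the block production check: strictly below the minimum is "low".
  static boolean lowParticipation(int[] slots) {
    return filledSlotsPercent(slots) < MIN_PARTICIPATION_RATE;
  }

  // Hypothetical round-robin window in which only the first `active` of `total` SRs produce.
  static int[] window(int size, int total, int active) {
    int[] slots = new int[size];
    for (int i = 0; i < size; i++) {
      slots[i] = (i % total) < active ? 1 : 0;
    }
    return slots;
  }

  public static void main(String[] args) {
    // 17 of 27 SRs producing: well above the minimum, so production continues.
    System.out.println(lowParticipation(window(128, 27, 17)));
    // 3 of 27 SRs producing: below the minimum, so production would stop.
    System.out.println(lowParticipation(window(128, 27, 3)));
  }
}
```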

Block solidification mechanism: a block can be solidified only after it has been confirmed by 70% of SRs. The code is shown below.

private void updateSolidBlock() {
  List<Long> numbers = consensusDelegate.getActiveWitnesses().stream()
      .map(address -> consensusDelegate.getWitness(address.toByteArray()).getLatestBlockNum())
      .sorted()
      .collect(Collectors.toList());
  long size = consensusDelegate.getActiveWitnesses().size();
  int position = (int) (size * (1 - SOLIDIFIED_THRESHOLD * 1.0 / 100));
  long newSolidNum = numbers.get(position);
  long oldSolidNum = consensusDelegate.getLatestSolidifiedBlockNum();
  if (newSolidNum < oldSolidNum) {
    logger.warn("Update solid block number failed, new: {} < old: {}", newSolidNum, oldSolidNum);
    return;
  }
  CommonParameter.getInstance()
      .setOldSolidityBlockNum(consensusDelegate.getLatestSolidifiedBlockNum());
  consensusDelegate.saveLatestSolidifiedBlockNum(newSolidNum);
  logger.info("Update solid block number to {}", newSolidNum);
}
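
As a worked example of the index arithmetic in updateSolidBlock(), the sketch below applies the same formula to a plain list. SOLIDIFIED_THRESHOLD = 70 is assumed from the "70% of SRs" statement, and the class, the sample() helper, and the block numbers are hypothetical. With 27 SRs the position is 8, so the solidified number is the 9th-smallest latest block number; once 9 or more SRs are stopped, that entry is a stale value and the solidified number stops advancing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SolidBlockSketch {
  // Assumed value, taken from the "70% of SRs" statement in the text.
  static final int SOLIDIFIED_THRESHOLD = 70;

  // Same position formula as updateSolidBlock(), applied to a plain list.
  static long solidNum(List<Long> latestBlockNums) {
    List<Long> sorted = new ArrayList<>(latestBlockNums);
    Collections.sort(sorted);
    int position = (int) (sorted.size() * (1 - SOLIDIFIED_THRESHOLD * 1.0 / 100));
    return sorted.get(position);
  }

  // Hypothetical data: `stopped` SRs are stuck at staleNum, the others are near headNum.
  static List<Long> sample(int total, int stopped, long staleNum, long headNum) {
    List<Long> nums = new ArrayList<>();
    for (int i = 0; i < total; i++) {
      nums.add(i < stopped ? staleNum : headNum - (i - stopped));
    }
    return nums;
  }

  public static void main(String[] args) {
    // 10 of 27 SRs stopped (the experiment's setup): index 8 is a stale entry.
    System.out.println(solidNum(sample(27, 10, 1000L, 5000L)));
    // 8 of 27 SRs stopped: index 8 is the slowest live SR, so solidification advances.
    System.out.println(solidNum(sample(27, 8, 1000L, 5000L)));
  }
}
```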

Background

There is a gap between the block production mechanism and the solidification mechanism. When the share of participating SRs is above 15% but below 70% of the total, SRs keep producing blocks, but those blocks cannot be solidified. Unsolidified blocks are kept in memory, so memory may eventually be exhausted.

Experiment Process

The experiment uses 27 mainnet SRs to construct an unfinalized scenario with maximum SR block production.

  • Hardware environment: 16 cores, 32G server, 24G heap memory for each SR.
  • Network environment: Keep 17 SRs, stop 10 SRs.
  • Experiment process:
    Starting at block 52873440, the network was stress-tested with a large volume of transactions. By block 52874137, SR block production had dropped to around 300 txs per block. As the stress test continued, packaging performance kept declining. After 5 days it had fallen to 2-3 txs per block, and an OOM eventually occurred (SR014, on the morning of July 25).

Experiment Results

Conclusion 1: in the unfinalized scenario, after 697 blocks (840,000 transactions), the SR packaging performance dropped below 300 txs per block.
Conclusion 2: in the unfinalized scenario, the SR packaging performance declines linearly. When the single block packaging performance is 2 transactions, the OOM phenomenon occurs. A total of about 1.5 million transactions were packaged.

If too many blocks cannot be solidified, chain recovery will be slow, because a node must synchronize a large number of unsolidified blocks after it restarts.

The following log comes from an online node.

00:00:00.006 INFO  [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50355146, cost/txs: 132/125 false.
00:30:00.146 INFO  [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50364955, cost/txs: 201/280 false.
01:00:00.313 INFO  [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50370343, cost/txs: 406/448 false.
05:59:59.828 INFO  [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50435815, cost/txs: 283/312 false.
12:18:48.894 INFO  [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50531191, cost/txs: 123/154 false.
23:59:59.883 INFO  [sync-handle-block] [DB](Manager.java:1341) PushBlock block number: 50692835, cost/txs: 214/278 false.

The synchronization block time statistics are as follows.

Elapsed time (hours)    Synchronized blocks (cumulative)
0.5                     9809
1                       15197
6                       80669
12                      176045
24                      337689
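
One rough way to read these statistics is to ask how many blocks a node could synchronize within a given recovery time. The sketch below interpolates linearly between the measured points; the class and method names are hypothetical, and the assumption that the curve passes through zero before the first sample is mine, not a measurement.

```java
class SyncEstimateSketch {
  // Cumulative statistics from the table above: elapsed hours -> synchronized blocks.
  static final double[] HOURS = {0.5, 1, 6, 12, 24};
  static final long[] BLOCKS = {9809, 15197, 80669, 176045, 337689};

  // Linear interpolation between measured points; assumes 0 blocks at 0 hours.
  static long blocksSyncedWithin(double hours) {
    double prevHours = 0;
    long prevBlocks = 0;
    for (int i = 0; i < HOURS.length; i++) {
      if (hours <= HOURS[i]) {
        double fraction = (hours - prevHours) / (HOURS[i] - prevHours);
        return Math.round(prevBlocks + fraction * (BLOCKS[i] - prevBlocks));
      }
      prevHours = HOURS[i];
      prevBlocks = BLOCKS[i];
    }
    // Beyond the last measurement, clamp rather than extrapolate.
    return BLOCKS[BLOCKS.length - 1];
  }

  public static void main(String[] args) {
    System.out.println(blocksSyncedWithin(0.5));        // half an hour of recovery
    System.out.println(blocksSyncedWithin(10.0 / 60));  // ten minutes of recovery
  }
}
```

Under these assumptions, half an hour corresponds to roughly 10,000 blocks and ten minutes to a little over 3,000, which gives an order of magnitude for a threshold tied to an acceptable recovery time.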

Implementation

When the number of blocks that cannot be solidified reaches a threshold, transaction broadcasting is stopped so that SRs do not keep packaging transactions that cannot be solidified. This has the following benefits:

  • Avoid caching too many transactions that cannot be solidified which may cause memory exhaustion.
  • With fewer block transactions, block execution speed will be faster, block synchronization speed will be boosted, and chain recovery will be faster.
  • Avoid introducing too much dirty data, making data rollback easier.

The implementation is as follows.

Add a check for unsolidified blocks.

  public boolean unsolidifiedBlockCheck() {
    if (!unsolidifiedBlockCheck) {
      return false;
    }
    long headNum = chainBaseManager.getHeadBlockNum();
    long solidNum = chainBaseManager.getSolidBlockId().getNum();
    return headNum - solidNum >= maxUnsolidifiedBlocks;
  }
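
The boundary behavior of this check can be tried in isolation with the sketch below. It keeps the two settings as plain fields and takes the head and solidified block numbers as arguments; the class name, the constructor, and the threshold of 1000 are hypothetical and chosen only for illustration.

```java
class UnsolidifiedGateSketch {
  private final boolean checkEnabled;
  private final long maxUnsolidifiedBlocks;

  UnsolidifiedGateSketch(boolean checkEnabled, long maxUnsolidifiedBlocks) {
    this.checkEnabled = checkEnabled;
    this.maxUnsolidifiedBlocks = maxUnsolidifiedBlocks;
  }

  // Same comparison as unsolidifiedBlockCheck(): true means "stop accepting transactions".
  boolean shouldRejectTransactions(long headNum, long solidNum) {
    if (!checkEnabled) {
      return false;
    }
    return headNum - solidNum >= maxUnsolidifiedBlocks;
  }

  public static void main(String[] args) {
    UnsolidifiedGateSketch gate = new UnsolidifiedGateSketch(true, 1000);
    // 999 unsolidified blocks: still below the threshold.
    System.out.println(gate.shouldRejectTransactions(5_999, 5_000));
    // 1000 unsolidified blocks: the threshold is reached, since the comparison is >=.
    System.out.println(gate.shouldRejectTransactions(6_000, 5_000));
  }
}
```

Because the gate depends only on the gap between the two numbers, it reopens on its own once solidification catches up with the head block.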

When broadcasting transactions, if the blocks that cannot be solidified reach the threshold, failure information will be returned directly.

  if (tronNetDelegate.unsolidifiedBlockCheck()) {
    logger.warn("Broadcast transaction {} has failed, block unsolidified.", txID);
    return builder.setResult(false).setCode(response_code.BLOCK_UNSOLIDIFIED)
          .setMessage(ByteString.copyFromUtf8("Block unsolidified."))
          .build();
  }

When processing the inventory message, if the blocks that cannot be solidified reach the threshold, the message will no longer be processed.

  if (type.equals(InventoryType.TRX) && tronNetDelegate.unsolidifiedBlockCheck()) {
      logger.warn("Drop inv: {} size: {} from Peer {}, block unsolidified",
          type, size, peer.getInetAddress());
      return false;
  }

xxo1shine changed the title from "When the network cannot be finalized for a long time, optimize the recovery speed" to "Stop broadcasting transactions when the block cannot be solidified" on Oct 31, 2023

jwrct commented Nov 2, 2023

At what point does the number of blocks that cannot be solidified cause transaction broadcasting to stop? After all, the number of transactions in a block is uncertain, and I am concerned that setting the threshold too low will easily trigger this state.


lxcmyf commented Nov 3, 2023

What is the maximum memory allowed for block participation in solidification? The current threshold represents the height difference between the block head and the solidified block, and it should be linearly related to the memory. Otherwise, setting such a threshold has little significance.

xxo1shine commented

@jwrct If DDoS attacks are not a concern, you can refer to the block synchronization time statistics above and choose the threshold based on the acceptable chain recovery time. If that is half an hour, you can set the threshold to 10,000; if it is 10 minutes, you can set it to 3,000.

xxo1shine commented

@lxcmyf In the experiment, the OOM occurred after about 1.5 million transactions in total had been packaged. If a block contains no transactions, its memory usage can be considered negligible.
