Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix datarace in blocksync SyncStatus() and the new blocksync design #3890

Closed
wants to merge 26 commits into from

Conversation

millken
Copy link
Contributor

@millken millken commented Jun 19, 2023

Description

fix SyncStatus() datarace, the blocksync(v2) speed-up 4x

Fixes #3889

Type of change

Please delete options that are not relevant.

  • [] Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • [] Code refactor or improvement
  • [] Breaking change (fix or feature that would cause a new or changed behavior of existing functionality)
  • [] This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • [] make test
  • [] fullsync
  • [] Other test (please specify)

Test Configuration:

  • Firmware version:
  • Hardware:
  • Toolchain:
  • SDK:

Checklist:

  • [] My code follows the style guidelines of this project
  • [] I have performed a self-review of my code
  • [] I have commented my code, particularly in hard-to-understand areas
  • [] I have made corresponding changes to the documentation
  • [] My changes generate no new warnings
  • [] I have added tests that prove my fix is effective or that my feature works
  • [] New and existing unit tests pass locally with my changes
  • [] Any dependent changes have been merged and published in downstream modules

@millken millken requested a review from a team as a code owner June 19, 2023 07:34
@codecov
Copy link

codecov bot commented Jun 19, 2023

Codecov Report

Merging #3890 (2932389) into master (e1f0636) will increase coverage by 0.86%.
The diff coverage is 75.23%.

❗ Current head 2932389 differs from pull request most recent head 7d3799f. Consider uploading reports for the commit 7d3799f to get more accurate results

@@            Coverage Diff             @@
##           master    #3890      +/-   ##
==========================================
+ Coverage   75.38%   76.24%   +0.86%     
==========================================
  Files         303      326      +23     
  Lines       25923    27818    +1895     
==========================================
+ Hits        19541    21210    +1669     
- Misses       5360     5516     +156     
- Partials     1022     1092      +70     
Impacted Files Coverage Δ
action/protocol/execution/evm/evm.go 43.52% <0.00%> (-2.95%) ⬇️
action/protocol/execution/evm/evmstatedbadapter.go 66.77% <ø> (ø)
action/protocol/poll/consortium.go 0.00% <0.00%> (ø)
action/protocol/poll/staking_committee.go 43.85% <0.00%> (ø)
...tion/protocol/staking/contractstake_bucket_type.go 0.00% <0.00%> (ø)
api/grpcserver.go 86.40% <0.00%> (ø)
api/loglistener.go 25.00% <0.00%> (ø)
api/web3server_marshal.go 93.21% <ø> (ø)
api/websocket.go 5.17% <0.00%> (ø)
blockchain/config.go 74.54% <ø> (ø)
... and 51 more

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -330,13 +330,14 @@ func (bs *blockSyncer) syncStageChecker() {
func (bs *blockSyncer) SyncStatus() (uint64, uint64, uint64, string) {
var syncSpeedDesc string
syncBlockIncrease := atomic.LoadUint64(&bs.syncBlockIncrease)
increase := syncBlockIncrease
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't comprehend the cause of this issue, nor the reason of this modification to resolve it

Copy link
Member

@dustinxie dustinxie Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, syncBlockIncrease is a local var, same as the new increase you added, can you dig to find out the root-cause?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, it's my fault. I will test and dig it

go func() {
defer wait.Done()
startingHeight, tipHeight, targetHeight, syncSpeedDesc := SyncStatus()
t.Logf("BlockSync startingHeight: %d, tipHeight: %d, targetHeight: %d, %s", startingHeight, tipHeight, targetHeight, syncSpeedDesc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to verify sync status?

@millken millken marked this pull request as draft June 20, 2023 10:10
@@ -11,7 +11,7 @@ import (

const (
// PeriodicReportInterval is the interval for generating periodic reports
PeriodicReportInterval = 5 * time.Minute
PeriodicReportInterval = 20 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you think 5 min report cycle is a bit long, 2 minutes is a proper choice, 20 sec is too often.

@@ -500,7 +500,7 @@ func (builder *Builder) buildBlockSyncer() error {
log.L().Debug("Failed to commit block.", zap.Error(err), zap.Uint64("height", blk.Height()))
return err
}
log.L().Info("Successfully committed block.", zap.Uint64("height", blk.Height()))
//log.L().Info("Successfully committed block.", zap.Uint64("height", blk.Height()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

@@ -220,7 +220,7 @@ func (d *IotxDispatcher) actionHandler() {
case a := <-d.actionChan:
d.handleActionMsg(a)
case <-d.quit:
log.L().Info("action handler is terminated.")
//log.L().Info("action handler is terminated.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert

@millken millken changed the title fix datarace in blocksync SyncStatus() fix datarace in blocksync SyncStatus() and the new blocksync design Jul 10, 2023
@millken millken marked this pull request as ready for review July 10, 2023 06:34
"go.uber.org/zap"
)

type blockSyncerV2 struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the struct and funcs are mostly same as current code, we don't need a V2? can just change on current code

Copy link
Contributor Author

@millken millken Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, added V2 is to compare it with the current code.

func (bs *blockSyncerV2) syncWork() {
bs.sync()
for range bs.trigger {
time.Sleep(1 * time.Second) //limit the frequency of sync
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the current interval in the config, so it can be easily adjusted

}
if bs.cfg.Interval != 0 {
bs.syncTask = routine.NewRecurringTask(bs.sync, bs.cfg.Interval)
bs.syncTask = routine.NewDelayTask(bs.syncWork, bs.cfg.Interval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bs.syncTask = NewTriggerTask(bs.sync(), bs.trigger, bs.cfg.Interval)

@@ -236,6 +247,9 @@ func (bs *blockSyncer) TargetHeight() uint64 {
// Start starts a block syncer
func (bs *blockSyncer) Start(ctx context.Context) error {
log.L().Debug("Starting block syncer.")
if err := bs.TurnOn(); err != nil {
return err
}
Copy link
Member

@dustinxie dustinxie Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after L260

go time.AfterFunc(bs.cfg.Interval, func() {
		bs.syncTask.Trigger()
	})

blocksync/blocksync.go Outdated Show resolved Hide resolved
@@ -22,6 +23,7 @@ type Config struct {
// DefaultConfig is the default config
var DefaultConfig = Config{
Interval: 30 * time.Second,
RateLimitInterval: 1 * time.Second,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do some testing to see the sync speed? for 1 sec and 2 sec

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1sec: 29.6 blks/sec
2sec: 14.4 blks/sec

@@ -291,9 +295,15 @@ func (bs *blockSyncer) ProcessBlock(ctx context.Context, peer string, blk *block
bs.lastTip = syncedHeight
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we should also use atomic.SwapUint64 to handle bs.lastTip

@@ -76,9 +76,8 @@ type (

startingHeight uint64 // block number this node started to synchronise from
lastTip uint64
lastTipUpdateTime time.Time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep this for now

  1. so we have a record of the current code/algorithm
  2. maybe the new algorithm can utilize this info in the future to further improve

bs.mu.RLock()
defer bs.mu.RUnlock()

return bs.lastTipUpdateTime, bs.targetHeight
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep it for now

}

func (bs *blockSyncer) checkSync(syncedHeight uint64) {
requestMaxHeight := atomic.LoadUint64(&bs.lastRequestHeight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastReqHeight :=

Copy link
Member

@dustinxie dustinxie Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we can merge above 3 lines of code into here

if syncedHeight > bs.lastTip {
		bs.lastTip = syncedHeight
		bs.lastTipUpdateTime = time.Now()
}

lastTip (and updateTime) is related to decide if we should send sync request

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also an earlier comment

think we should also use atomic.SwapUint64 to handle bs.lastTip

// we need to trigger a sync task
if tipHeight == bs.syncStageHeight {
bs.syncTask.Trigger()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just to report the status, let's not do the check and trigger sync here (which is the work of the new TriggerTask)

Comment on lines +81 to +82
t.mu.Lock()
defer t.mu.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need a mutex? channel should be goroutine-safe

for {
select {
case <-ctx.Done():
goto done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is generally recommended to use break instead of goto to break out of a two-level loop in go

Comment on lines 83 to 86
select {
case t.ch <- struct{}{}:
default:
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a common package, it would be better to return whether the trigger was successful or not

Comment on lines 231 to 233
go time.AfterFunc(bs.cfg.Interval, func() {
bs.syncTask.Trigger()
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not very reliable that ensuring peer ready by waiting a specific period. maybe could trigger it periodically

@iotexproject iotexproject locked and limited conversation to collaborators Jul 18, 2023
@iotexproject iotexproject unlocked this conversation Jul 19, 2023
@sonarcloud
Copy link

sonarcloud bot commented Jul 20, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

@raullenchai
Copy link
Member

Close b/c it has been inactive for 1+yr.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LogSummary shows BlockSync speed 0.0 blocks/sec
4 participants