fix datarace in blocksync SyncStatus() and the new blocksync design #3890
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3890 +/- ##
==========================================
+ Coverage 75.38% 76.24% +0.86%
==========================================
Files 303 326 +23
Lines 25923 27818 +1895
==========================================
+ Hits 19541 21210 +1669
- Misses 5360 5516 +156
- Partials 1022 1092 +70
... and 4 files with indirect coverage changes
blocksync/blocksync.go
Outdated
@@ -330,13 +330,14 @@ func (bs *blockSyncer) syncStageChecker() {
func (bs *blockSyncer) SyncStatus() (uint64, uint64, uint64, string) {
	var syncSpeedDesc string
	syncBlockIncrease := atomic.LoadUint64(&bs.syncBlockIncrease)
	increase := syncBlockIncrease
I don't understand the cause of this issue, nor how this modification resolves it.

Interesting: syncBlockIncrease is a local variable, just like the new increase you added. Can you dig in to find the root cause?

Sorry, it's my fault. I will test and dig into it.
blocksync/blocksync_test.go
Outdated
go func() {
	defer wait.Done()
	startingHeight, tipHeight, targetHeight, syncSpeedDesc := SyncStatus()
	t.Logf("BlockSync startingHeight: %d, tipHeight: %d, targetHeight: %d, %s", startingHeight, tipHeight, targetHeight, syncSpeedDesc)
Do we also need to verify the sync status here?
This reverts commit 575fcaa.
server/itx/nodestats/nodestats.go
Outdated
@@ -11,7 +11,7 @@ import (

const (
	// PeriodicReportInterval is the interval for generating periodic reports
	PeriodicReportInterval = 5 * time.Minute
	PeriodicReportInterval = 20 * time.Second
If you think a 5-minute report cycle is a bit long, 2 minutes is a proper choice; 20 seconds is too often.
chainservice/builder.go
Outdated
@@ -500,7 +500,7 @@ func (builder *Builder) buildBlockSyncer() error {
		log.L().Debug("Failed to commit block.", zap.Error(err), zap.Uint64("height", blk.Height()))
		return err
	}
	log.L().Info("Successfully committed block.", zap.Uint64("height", blk.Height()))
	//log.L().Info("Successfully committed block.", zap.Uint64("height", blk.Height()))
revert
@@ -220,7 +220,7 @@ func (d *IotxDispatcher) actionHandler() {
	case a := <-d.actionChan:
		d.handleActionMsg(a)
	case <-d.quit:
		log.L().Info("action handler is terminated.")
		//log.L().Info("action handler is terminated.")
revert
blocksync/blocksync_v2.go
Outdated
	"go.uber.org/zap"
)

type blockSyncerV2 struct {
Looks like the struct and funcs are mostly the same as the current code. We don't need a V2; can we just change the current code?

Yes, V2 was added to compare it with the current code.
blocksync/blocksync_v2.go
Outdated
func (bs *blockSyncerV2) syncWork() {
	bs.sync()
	for range bs.trigger {
		time.Sleep(1 * time.Second) // limit the frequency of sync
Use the current interval in the config, so it can be easily adjusted.
blocksync/blocksync.go
Outdated
	}
	if bs.cfg.Interval != 0 {
		bs.syncTask = routine.NewRecurringTask(bs.sync, bs.cfg.Interval)
		bs.syncTask = routine.NewDelayTask(bs.syncWork, bs.cfg.Interval)
bs.syncTask = NewTriggerTask(bs.sync(), bs.trigger, bs.cfg.Interval)
blocksync/blocksync.go
Outdated
@@ -236,6 +247,9 @@ func (bs *blockSyncer) TargetHeight() uint64 {
// Start starts a block syncer
func (bs *blockSyncer) Start(ctx context.Context) error {
	log.L().Debug("Starting block syncer.")
	if err := bs.TurnOn(); err != nil {
		return err
	}
after L260

go time.AfterFunc(bs.cfg.Interval, func() {
	bs.syncTask.Trigger()
})
@@ -22,6 +23,7 @@ type Config struct {
// DefaultConfig is the default config
var DefaultConfig = Config{
	Interval:          30 * time.Second,
	RateLimitInterval: 1 * time.Second,
Can you do some testing to see the sync speed for 1 sec and 2 sec?

1 sec: 29.6 blks/sec
2 sec: 14.4 blks/sec
blocksync/blocksync.go
Outdated
@@ -291,9 +295,15 @@ func (bs *blockSyncer) ProcessBlock(ctx context.Context, peer string, blk *block
	bs.lastTip = syncedHeight
I think we should also use atomic.SwapUint64 to handle bs.lastTip.
@@ -76,9 +76,8 @@ type (

	startingHeight    uint64 // block number this node started to synchronise from
	lastTip           uint64
	lastTipUpdateTime time.Time
Let's keep this for now:
- so we have a record of the current code/algorithm
- maybe the new algorithm can utilize this info in the future to further improve
	bs.mu.RLock()
	defer bs.mu.RUnlock()

	return bs.lastTipUpdateTime, bs.targetHeight
Keep it for now.
blocksync/blocksync.go
Outdated
}

func (bs *blockSyncer) checkSync(syncedHeight uint64) {
	requestMaxHeight := atomic.LoadUint64(&bs.lastRequestHeight)
lastReqHeight :=

I think we can merge the above 3 lines of code into here:

if syncedHeight > bs.lastTip {
	bs.lastTip = syncedHeight
	bs.lastTipUpdateTime = time.Now()
}

lastTip (and updateTime) is used to decide whether we should send a sync request.

Also, per an earlier comment: I think we should also use atomic.SwapUint64 to handle bs.lastTip.
	// we need to trigger a sync task
	if tipHeight == bs.syncStageHeight {
		bs.syncTask.Trigger()
	}
This is just to report the status; let's not do the check and trigger a sync here (that is the work of the new TriggerTask).
	t.mu.Lock()
	defer t.mu.Unlock()
Why does this need a mutex? Channels are goroutine-safe.
pkg/routine/triggertask_test.go
Outdated
	for {
		select {
		case <-ctx.Done():
			goto done
It is generally recommended to use a labeled break instead of goto to break out of a two-level loop in Go.
pkg/routine/triggertask.go
Outdated
	select {
	case t.ch <- struct{}{}:
	default:
	}
As a common package, it would be better to return whether the trigger succeeded or not.
blocksync/blocksync.go
Outdated
	go time.AfterFunc(bs.cfg.Interval, func() {
		bs.syncTask.Trigger()
	})
Ensuring peers are ready by waiting a specific period is not very reliable; maybe we could trigger it periodically instead.
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No Coverage information.
Closed because it has been inactive for over a year.
Description
Fix the SyncStatus() data race; the new blocksync (v2) speeds up syncing 4x.
Fixes #3889
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes, and provide instructions so we can reproduce them. Please also list any relevant details of your test configuration.
Test Configuration:
Checklist: