Skip to content

Commit

Permalink
Merge #33573 #33582
Browse files Browse the repository at this point in the history
33573: roachtest: add MinVersion to scrub tests r=lucy-zhang a=lucy-zhang

The `scrub` tests have been passing on master, but failing on 2.1 because there
was a bug (fixed in #32908).

Release note: None

33582: storage: bump RaftDelaySplitToSuppressSnapshotTicks r=knz a=knz

The admin split path must accommodate a scenario where a range with
not-yet-replicated followers is being manually split multiple
times (eg. IMPORT during TPCC test fixtures). This scenario results in
a bunch of replicas that all need to be populated with
snapshots. To avoid backing up the raft snapshot queue, a heuristic
was put in place (#32594) to delay the admin split if there is another
snapshot being applied already.

As shown by investigation in a failing test, there is a mismatch
between the configured max delay for this heuristic (20s) and the
actual duration of the snapshot application - the latter is limited by
the max bandwidth for snapshots, 2 MB/s resulting in 32s applications
in the worst case. We (Tobias and I) suspect that the heuristic thus
fails to wait long enough to have the protective effect it was
designed to provide.

The current patch increases this delay to exceed this snapshot
application duration estimate to about 50s.

Note that this scenario is not likely to occur now that #32782 has
been merged (this reduces the need for raft snapshots during splits);
however in release-2.1 where that patch was not applied, the scenario
is more likely.

Release note (bug fix): resolve a cluster degradation scenario that
could occur during IMPORT/RESTORE operations, manifested through a
high number of pending Raft snapshots.

Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
  • Loading branch information
3 people committed Jan 9, 2019
3 parents 4988177 + 6a295c8 + 3ca51df commit 3075109
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 3 deletions.
9 changes: 6 additions & 3 deletions pkg/base/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -538,9 +538,12 @@ func (cfg *RaftConfig) SetDefaults() {
// ticks. Add a generous amount of ticks to make sure even a backed up
// Raft snapshot queue is going to make progress when a (not overly
// concurrent) amount of splits happens.
// The resulting delay configured here is north of 20s by default, which
// experimentally has shown to be enough.
cfg.RaftDelaySplitToSuppressSnapshotTicks = 3*cfg.RaftElectionTimeoutTicks + 60
// The generous amount should result in a delay sufficient to
// transmit at least one snapshot with the slow delay, which
// with default settings is max 64MB at 2MB/s, ie 32 seconds.
//
// The resulting delay configured here is about 50s.
cfg.RaftDelaySplitToSuppressSnapshotTicks = 3*cfg.RaftElectionTimeoutTicks + 200
}
}

Expand Down
1 change: 1 addition & 0 deletions pkg/cmd/roachtest/scrub.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,5 +88,6 @@ func makeScrubTPCCTest(
Duration: length,
})
},
MinVersion: "v2.2.0",
}
}

0 comments on commit 3075109

Please sign in to comment.