[3.4] fix revision loss issue caused by compaction - 17780 #17864

Merged
merged 5 commits into etcd-io:release-3.4 on Apr 25, 2024

Conversation

fuweid (Member) commented Apr 24, 2024

Backport:

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

NOTE:

I didn't see the log on the main branch. It seems that etcd replays the WAL after restart.

Run the command EXPECT_DEBUG=true CPU=4 FAILPOINTS='true' make test-e2e GO_TEST_FLAGS="-run TestReproduce17780 -count=1 -v" and you will see the following log.

// restored compaction 
../../bin/etcd-413597: 2024-04-24 11:27:46.450662 I | mvcc: resume scheduled compaction at 11

// replay WAL
../../bin/etcd-413597: 2024-04-24 11:27:46.500414 W | etcdserver: failed to apply request "compaction:<revision:11 physical:true > header:<ID:5939560774486836239 > " with response "" took (11.026µs) to execute, err is mvcc: required revision has been compacted

Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 7173391)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 9ea2349)
Signed-off-by: Wei Fu <fuweid89@gmail.com>
Signed-off-by: Wei Fu <fuweid89@gmail.com>
// NOTE: The proc panics and the exit code is 2. It's impossible to restart
// that etcd proc because the last exit code is 2 and Restart() refuses to
// start a new one. Calling IsRunning() here is just to clean up its status.
require.False(t, clus.procs[targetIdx].IsRunning())
fuweid (Member, Author) commented on the snippet above:

Side note:

// The proc panics and the exit code is 2. It's impossible to restart
// that etcd proc because the last exit code is 2 and Restart() refuses to
// start a new one. Calling IsRunning() here is just to clean up its status.
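For illustration only, here is a self-contained sketch of the status-cleanup pattern this comment describes; the expectProcess type and its methods are hypothetical stand-ins, not the etcd e2e framework API:

package e2eutil

import (
	"fmt"
	"os/exec"
)

// expectProcess is a hypothetical stand-in for the e2e process helper.
type expectProcess struct {
	cmd          *exec.Cmd
	lastExitCode int // recorded as 2 after etcd panics on the failpoint
}

// Restart refuses to start a new process while a non-zero exit code is still
// recorded from the previous run.
func (p *expectProcess) Restart(path string, args ...string) error {
	if p.lastExitCode != 0 {
		return fmt.Errorf("last exit code was %d, refusing to restart", p.lastExitCode)
	}
	p.cmd = exec.Command(path, args...)
	return p.cmd.Start()
}

// IsRunning reports whether the process is still alive. Observing a dead
// process also resets the recorded exit code, which is the "cleanup" the
// test relies on so the cluster can be torn down or restarted afterwards.
func (p *expectProcess) IsRunning() bool {
	if p.cmd == nil || p.cmd.ProcessState != nil {
		p.lastExitCode = 0
		return false
	}
	return true
}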

fuweid (Member, Author) commented Apr 24, 2024

Does the test pass without the fix?

No, it doesn't. Still checking why the main/3.5 branches don't show the error.

ahrtr (Member) left a comment
LGTM

Thanks

ahrtr (Member) commented Apr 24, 2024

cc @serathius @spzala

fuweid (Member, Author) commented Apr 24, 2024

// replay WAL
../../bin/etcd-413597: 2024-04-24 11:27:46.500414 W | etcdserver: failed to apply request "compaction:<revision:11 physical:true > header:<ID:5939560774486836239 > " with response "" took (11.026µs) to execute, err is mvcc: required revision has been compacted

Hi @ahrtr @serathius

For the log: the 3.4 release only saves consistentIndex into the db when a transaction modifies the mvcc state or when a snapshot is taken.

etcd/mvcc/kvstore_txn.go

Lines 103 to 108 in 48b0c49

// only update index if the txn modifies the mvcc state.
if len(tw.changes) != 0 {
tw.s.saveIndex(tw.tx)
// hold revMu lock to prevent new read txns from opening until writeback.
tw.s.revMu.Lock()
tw.s.currentRev++

etcd/etcdserver/server.go

Lines 2469 to 2477 in 48b0c49

func (s *EtcdServer) snapshot(snapi uint64, confState raftpb.ConfState) {
clone := s.v2store.Clone()
// commit kv to write metadata (for example: consistent index) to disk.
// KV().commit() updates the consistent index in backend.
// All operations that update consistent index must be called sequentially
// from applyAll function.
// So KV().Commit() cannot run in parallel with apply. It has to be called outside
// the go routine created below.
s.KV().Commit()
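For context, a minimal sketch of what saving the consistent index amounts to in 3.4 (assumed from the code paths quoted above, not a verbatim copy): the in-memory index is serialized and written into the meta bucket of the backend, and that only happens on the two paths above.

// Sketch: assumes metaBucketName ("meta") and consistentIndexKeyName
// ("consistent_index") match the 3.4 backend layout.
func saveIndex(tx backend.BatchTx, index uint64) {
	bs := make([]byte, 8)
	binary.BigEndian.PutUint64(bs, index)
	tx.UnsafePut(metaBucketName, consistentIndexKeyName, bs)
}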

That test doesn't trigger a snapshot, so the compaction change doesn't save the consistent index.
After restart, the index of the entry from the WAL is higher than the one in the backend, so the etcd server applies the entry again.
That's why we see this log in the v3.4 release.
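Roughly, the replay decision looks like the following simplified sketch (field and method names approximate the 3.4 apply path and are assumptions, not a quote):

// Sketch: on replay, an entry is applied to the v3 store only when its raft
// index is newer than the consistentIndex recovered from the backend.
shouldApplyV3 := false
if e.Index > s.consistIndex.ConsistentIndex() {
	// The compaction entry lands here after the crash: its index was never
	// persisted, so the compaction request is applied a second time.
	s.consistIndex.setConsistentIndex(e.Index)
	shouldApplyV3 = true
}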

However, on the main branch and the 3.5 release, we save the index in the OnPreCommitUnsafe hook, so ForceCommit also saves the index.
Before the panic, the server had already force-committed that index, so we don't see that log there.

func (bh *BackendHooks) OnPreCommitUnsafe(tx backend.UnsafeReadWriter) {
bh.indexer.UnsafeSave(tx)
bh.confStateLock.Lock()
defer bh.confStateLock.Unlock()
if bh.confStateDirty {
schema.MustUnsafeSaveConfStateToBackend(bh.lg, tx, &bh.confState)
// save bh.confState
bh.confStateDirty = false
}
}

func (bh *backendHooks) OnPreCommitUnsafe(tx backend.BatchTx) {
bh.indexer.UnsafeSave(tx)
bh.confStateLock.Lock()
defer bh.confStateLock.Unlock()
if bh.confStateDirty {
membership.MustUnsafeSaveConfStateToBackend(bh.lg, tx, &bh.confState)
// save bh.confState
bh.confStateDirty = false
}
}
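In other words, every batch-transaction commit on main/3.5 runs the pre-commit hook first, which is why the ForceCommit issued before the panic had already persisted the consistent index. A simplified sketch of that commit path (assumed, not a verbatim quote):

// Sketch: simplified backend commit path with hooks.
func (t *batchTxBuffered) commit(stop bool) {
	if t.backend.hooks != nil {
		// Persists consistentIndex (and a dirty confState) into the db
		// before the underlying bbolt transaction is committed.
		t.backend.hooks.OnPreCommitUnsafe(t)
	}
	// ... commit the bbolt transaction, then start a new one unless stop ...
}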

It looks safe to merge this.

ahrtr (Member) commented Apr 24, 2024

/hold

ahrtr (Member) commented Apr 24, 2024

Run the command EXPECT_DEBUG=true CPU=4 FAILPOINTS='true' make test-e2e GO_TEST_FLAGS="-run TestReproduce17780 -count=1 -v" and you will see the following log.

// restored compaction 
../../bin/etcd-413597: 2024-04-24 11:27:46.450662 I | mvcc: resume scheduled compaction at 11

// replay WAL
../../bin/etcd-413597: 2024-04-24 11:27:46.500414 W | etcdserver: failed to apply request "compaction:<revision:11 physical:true > header:<ID:5939560774486836239 > " with response "" took (11.026µs) to execute, err is mvcc: required revision has been compacted

Analysis

Somehow I missed this info. Confirmed that your analysis above is basically valid.

To be clearer, the root cause of the error message mvcc: required revision has been compacted is that the compaction against the same revision (11 in this case) was executed twice.

The first execution was performed by etcdserver itself on bootstrap. It also updated the s.compactMainRev.

s.compactMainRev = rev

The second execution was driven by replaying the WAL records (because the latest consistentIndex wasn't persisted before the crash). Since s.compactMainRev had already been updated by the first compaction, this time the condition rev <= s.compactMainRev is true (both values are 11), so it returned ErrCompacted.

etcd/mvcc/kvstore.go

Lines 250 to 256 in 48b0c49

if rev <= s.compactMainRev {
ch := make(chan struct{})
f := func(ctx context.Context) { s.compactBarrier(ctx, ch) }
s.fifoSched.Schedule(f)
s.revMu.Unlock()
return ch, ErrCompacted
}
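Putting the two executions together, a simplified sketch of the Compact entry check (assumed from the snippet above, not a verbatim quote):

// Sketch of the Compact entry check, combining the two executions above:
//   1. on bootstrap, restore() resumes the scheduled compaction and sets
//      s.compactMainRev = 11 (first execution);
//   2. the replayed WAL entry calls Compact(11) again (second execution)
//      and falls into the rev <= s.compactMainRev branch below.
func (s *store) Compact(rev int64) (<-chan struct{}, error) {
	s.revMu.Lock()
	if rev <= s.compactMainRev {
		s.revMu.Unlock()
		return nil, ErrCompacted // this is what produces the warning in the log
	}
	s.compactMainRev = rev
	s.revMu.Unlock()
	// ... the actual key compaction is then scheduled asynchronously ...
	return nil, nil
}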

Solution

Ideally, we shouldn't perform compaction on bootstrap. The compaction always comes from raft/applying, so it makes more sense to trigger it only via the WAL replaying logic.

But since it's harmless to trigger the compaction twice, let's keep it as it is for now. I think it's safe to let this PR in.

We don't want any surprises before we get rid of OnPreCommitUnsafe and LockInsideApply/LockOutsideApply on the main branch, and before they are well understood by more contributors.

ahrtr (Member) commented Apr 24, 2024

/hold cancel

spzala (Member) left a comment

Thanks @fuweid @ahrtr

ahrtr merged commit 1d02c16 into etcd-io:release-3.4 on Apr 25, 2024
17 checks passed