Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: fix WAL replay in read-only mode #524

Merged
merged 1 commit into from
Feb 10, 2020
Merged

Conversation

petermattis
Copy link
Collaborator

@petermattis petermattis commented Feb 6, 2020

The addition of memTable.logSeqNum broke WAL replay in read-only mode
because we were initializing memTable.logSeqNum too early and setting
it to the visible sequence number rather than the first sequence number
in the WAL being replayed. This would result in an error during Open
that looked like:

   pebble: batch seqnum 2 is less than memtable creation seqnum 3

@petermattis
Copy link
Collaborator Author

This change is Reviewable

@petermattis
Copy link
Collaborator Author

The new TestOpenWALReplayReadOnly currently fails with:

    open_test.go:484: pebble: batch seqnum 2 is less than memtable creation seqnum 3

The problem is that we're creating the memtable in Open with the value of versionSet.logSeqNum which is larger than the seqnum in the log we're replaying. I initially started to fix this by reworking startup so that we wouldn't create the mutable memtable in read-only mode until we know the first seqnum in the log.

As I looked more, I think there is a more serious problem that could affect compatibility with RocksDB. versionSet.logAndApply sets VersionEdit.LastSeqNum = atomic.LoadUint64(&vs.logSeqNum). When we're rotating the memtable/WAL because the memtable is full, we've already incremented vs.logSeqNum. So we're setting VersionEdit.LastSeqNum to a value that is larger than the first batch that will be written to the new log. We're not strict about filtering out these batches during replay, but I think RocksDB might be.

I suspect what we have to do is to pass in the last unflushed seqnum to logAndApply. Conveniently we now have this available in memTable.logSeqNum.

@itsbilal and @sumeerbhola can you scrutinize this logic and see if I'm missing anything.

@petermattis
Copy link
Collaborator Author

I suspect what we have to do is to pass in the last unflushed seqnum to logAndApply. Conveniently we now have this available in memTable.logSeqNum.

This was remarkably easy to do.

@petermattis petermattis closed this Feb 6, 2020
@petermattis petermattis reopened this Feb 7, 2020
@petermattis
Copy link
Collaborator Author

Oops, accidentally clicked the close button some how.

This was remarkably easy to do.

I spoke too soon. This actually has a lot of negative effects. Surprisingly many. Which makes me think RocksDB is not doing this. I'm going to dive into the RocksDB code and figure out exactly how and when it sets LastSeqNum.

@petermattis petermattis changed the title [WIP] demonstrate problem with VersionEdit.LastSeqNum value db: fix WAL replay in read-only mode Feb 7, 2020
@petermattis
Copy link
Collaborator Author

Ok, this is all fixed up and ready for a look. I left a comment on where we assign VersionEdit.LastSeqNum explaining my learnings from the rabbit hole I went down and also that the current assignment is exactly what RocksDB does.

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 1 unresolved discussion (waiting on @itsbilal and @sumeerbhola)


db.go, line 1366 at r1 (raw file):

		if b != nil {
			logSeqNum = b.SeqNum()
			if b.flushable != nil {

This appears to have been a bug. I'm still trying to write a test which triggers it.

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @itsbilal and @sumeerbhola)


db.go, line 1366 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This appears to have been a bug. I'm still trying to write a test which triggers it.

Ok, test added. I believe these was innocuous. The effect was that memTable.logSeqNum would be smaller than the minimum sequence number added to the memTable. This would cause the memTable to prevent L0->L0 compactions in very rare circumstances, and cause the memTable to be accessed on reads unnecessarily in very rare circumstances.

The addition of `memTable.logSeqNum` broke WAL replay in read-only mode
because we were initializing `memTable.logSeqNum` too early and setting
it to the visible sequence number rather than the first sequence number
in the WAL being replayed. This would result in an error during `Open`
that looked like:

   pebble: batch seqnum 2 is less than memtable creation seqnum 3
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 5 files at r1, 4 of 4 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @sumeerbhola)

@petermattis
Copy link
Collaborator Author

TFTR!

@petermattis petermattis merged commit 7ad59b7 into master Feb 10, 2020
@petermattis petermattis deleted the pmattis/open-read-only branch February 10, 2020 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants