Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(internal/database): badger v3 and memory implementations #3005

Closed
wants to merge 25 commits into from

Conversation

qdm12
Copy link
Contributor

@qdm12 qdm12 commented Dec 13, 2022

Changes

ℹ️ This is only the standalone database implementation. I need the in-memory implementation for #2831 which in turn unblocks #2981 which will unblock the full replacement of chaindb. This doesn't affect the existing codebase for now.

  • internal/database shared (return) interfaces and sentinel errors
  • internal/database/memory
    • implementation
    • integration test
    • full unit test coverage
    • thread safety test
  • internal/database/badger - based on badger/v3
    • implementation
    • integration test
    • full unit test coverage
    • thread safety test
  • Add race tests to CI

Tests

go test github.com/ChainSafe/gossamer/internal/database/...

Issues

Related #1973 / will create an issue for replacing the database implementation

Primary Reviewer

@timwu20

internal/database/badger/writebatch.go Outdated Show resolved Hide resolved
Comment on lines +11 to +18
// Database is an in-memory database implementation.
type Database struct {
Copy link
Contributor Author

@qdm12 qdm12 Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the in-memory database implementation package is to force the internal/database code to be flexible for additional implementations.

On top of that, I would suggest us to use this one instead of the in-memory badger database, so the badger database is only on disk. Never mind stupid idea, we should test using the badger code (if badger is the default prod db) to make sure it works with the badger underlying code. Added an InMemory setting field for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the in-memory database implementation package is to force the internal/database code to be flexible for additional implementations.

Does that mean it is not getting used anywhere?

Copy link
Contributor Author

@qdm12 qdm12 Jan 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could if we add a config field for the database implementation later. Anyway for now both implementations are not used.

The memory implementation serves as a good example for another database implementation (as in it's a 'dumb' base just using Go std types).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tests can use the badger in memory version, do we really need this in memory implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep it as a base for another future implementation. For example badger uses the badger/v3 batch implementation, but some other implementations might need to implement their own batching, and it's already there in the (zero-dependency) memory implementation. I can create an issue to remove it later on once we have at least another implementation?

internal/database/memory/writebatch.go Show resolved Hide resolved
internal/database/badger/database.go Outdated Show resolved Hide resolved
internal/database/badger/database.go Outdated Show resolved Hide resolved
internal/database/badger/settings.go Show resolved Hide resolved
@qdm12 qdm12 force-pushed the qdm12/internal/database branch 2 times, most recently from d685c56 to f886b62 Compare December 15, 2022 17:29
Copy link
Contributor

@kishansagathiya kishansagathiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

Yet to look at test files.

// It defaults to the current directory if left unset.
Path string
// InMemory is whether to use an in-memory database.
InMemory *bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt look like it has to be a pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the point of this is to detect when it's left unset, since false is not 'unset'. That way we can set defaults as we want with a method.

More details on https://github.com/qdm12/gosettings#settings-struct I have been using this for about a year, and it has been quite scalable and nice to use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to check if it is set or unset?

Copy link
Contributor Author

@qdm12 qdm12 Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be able to set defaults if it's left unset (and it extends beyond eventually, such as merging/overriding settings structs).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this is necessary. IMO InMemory should default to false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be able to set defaults if it's left unset (and it extends beyond eventually, such as merging/overriding settings structs).

I don't think I understand this completely.

it extends beyond eventually, such as merging/overriding settings structs

May be elaborate this part. Do we need to merge or override this setting at some point?

What I am really not able to understand is that why can't we treat false as unset.

Say someone assigned a false value to inMemory, why do we care if it was set or it is there because it is default value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am really not able to understand is that why can't we treat false as unset.

Because false or true are set values. The user can specify false.
Say we want to read from two configuration sources such as flags and toml, and we want flags to take precedence.
The flag specifies --inmemory=false and the toml config specifies inmemory = true. If we consider false as unset, then we would end up with a final value of true whereas it should be false due to precedence.
Same applies with merging/overriding settings.

It's perhaps out of scope and not used yet, but it's good code writing imo and there is no point neglecting this.

Copy link
Contributor

@kishansagathiya kishansagathiya Jan 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we consider false as unset, then we would end up with a final value of true whereas it should be false due to precedence

If I want flag to take precedence, I would just take flag value and assign it to inMemory. It doesn't matter if it is set or not. So, if the flag says false, the value of inMemory will become false.

The way we deal with such setting is,

  • initialise with default settings with, say Default().
  • Override with say toml config (we don't need to know existing values in settings here)
  • Override with say flag (we don't need existing values here as well)

At no point we need to know existing settings values.

I don't think we need inMemory to be pointer here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we would want to support detecting an unset value for this attribute. I see no reason why the default value for InMemory should not be false.

Copy link
Contributor Author

@qdm12 qdm12 Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read #3005 (comment) again. Empty value should be invalid (and for a boolean, false and true are both 'set values').

If I want flag to take precedence, I would just take flag value and assign it to inMemory. It doesn't matter if it is set or not. So, if the flag says false, the value of inMemory will become false.

Say the flag is --inmemory=false, then your field is set to false. Then your next source (i.e. toml) sets inmemory = true. How does it know if it should override this false with its true?? You can have ugly logic for every field to do this, but it's much more modular to just use a boolean pointer and if it's nil, then just set it to the value read from the settings source, otherwise leave it to its non-nil set value

EDIT: if it's such a blocker for this PR, I can switch it to a boolean but it's kinda the wrong approach for new code imo

internal/database/interfaces.go Outdated Show resolved Hide resolved
Comment on lines +11 to +18
// Database is an in-memory database implementation.
type Database struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the in-memory database implementation package is to force the internal/database code to be flexible for additional implementations.

Does that mean it is not getting used anywhere?

}

// Close closes the database.
func (db *Database) Close() (err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this getting used anywhere other than tests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it might make sense to use mutex here to make sure that we don't close it while someone is using the db.

Copy link
Contributor Author

@qdm12 qdm12 Jan 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this getting used anywhere other than tests?

Both implementations are not used anywhere in this PR. The badger implementation will be used in production code and in tests in the next PR. The in memory implementation could be used in tests, but really we should test using the in-memory badger implementation. So at the end of the day, that memory implementation is really just a base to serve for another implementation (zero dependency, just Go code with what should be expected in terms of implementation)

Also, it might make sense to use mutex here to make sure that we don't close it while someone is using the db.

Totally, added it 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fully replaced in #3088 now 😉 (based on this PR)

internal/database/memory/table.go Show resolved Hide resolved
internal/database/memory/writebatch.go Show resolved Hide resolved
Copy link
Contributor

@kishansagathiya kishansagathiya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the code looks good.

  • Considering we have two database implementations, we should have the same test testing for both the implementations.
  • It not nice that none of these functions are not getting used in this PR. Ideally, it will be nice if we have a db interface that is getting used. Then, In the same PR we can implement a db and plug it will rest of the codebase to use it to test if things are working fine. Let me know if that can't be done at the moment.

internal/database/badger/database_test.go Outdated Show resolved Hide resolved
internal/database/badger/database_test.go Outdated Show resolved Hide resolved
internal/database/badger/database_test.go Outdated Show resolved Hide resolved
internal/database/badger/database_test.go Outdated Show resolved Hide resolved
internal/database/badger/database_test.go Outdated Show resolved Hide resolved
internal/database/badger/race_test.go Show resolved Hide resolved
// It defaults to the current directory if left unset.
Path string
// InMemory is whether to use an in-memory database.
InMemory *bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to check if it is set or unset?

@qdm12
Copy link
Contributor Author

qdm12 commented Jan 4, 2023

It not nice that none of these functions are not getting used in this PR. Ideally, it will be nice if we have a db interface that is getting used. Then, In the same PR we can implement a db and plug it will rest of the codebase to use it to test if things are working fine. Let me know if that can't be done at the moment.

Yeah I get your point; using this newer interface/packages is blocked by #2831 which I'm working on now. Once this is merged, I'll rebase this branch on it and make a separate PR based on this one, which we can merge into this PR. Trying to keep this PR and the future interface-replacement PR separate to the amount of deltas in each (for reviewers), but we should merge it as a single commit in development indeed 👍

@qdm12 qdm12 force-pushed the qdm12/internal/database branch 2 times, most recently from 50fad42 to 3ea9abf Compare January 9, 2023 15:57
internal/database/badger/database.go Show resolved Hide resolved
internal/database/badger/database.go Outdated Show resolved Hide resolved
internal/database/badger/database.go Show resolved Hide resolved
internal/database/badger/helpers.go Outdated Show resolved Hide resolved
// It defaults to the current directory if left unset.
Path string
// InMemory is whether to use an in-memory database.
InMemory *bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why this is necessary. IMO InMemory should default to false.

internal/database/badger/writebatch.go Show resolved Hide resolved
Comment on lines +11 to +18
// Database is an in-memory database implementation.
type Database struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tests can use the badger in memory version, do we really need this in memory implementation?

@timwu20
Copy link
Contributor

timwu20 commented Jan 17, 2023

I think we should wait to merge this into development. We should probably do some testing to ensure things are working as expected before merging in. Can you create a release branch that will contain this PR and the PR that utilises this code?

@qdm12 qdm12 force-pushed the qdm12/internal/database branch 2 times, most recently from 0f73b2d to be520e3 Compare January 27, 2023 14:13
@qdm12 qdm12 changed the base branch from development to qdm12/dep-inject-db January 27, 2023 14:13
@qdm12 qdm12 force-pushed the qdm12/internal/database branch 2 times, most recently from 1b91d82 to 5875dc3 Compare March 11, 2023 11:34
qdm12 and others added 25 commits March 30, 2023 17:34
- Path cannot be the empty string, defaults to `.`
- Deleting a non-existing key returns nil from badger
- Remove `newTable` unneeded constructor
- Add `makePrefixedKey` function, `append` is WRONG to use
Co-authored-by: Kishan Sagathiya <kishansagathiya@gmail.com>
- Add settings helper method `WithPath`
- Add settings helper method `WithInMemory`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants