
p2p/discover: persistent node database #793

Merged
merged 14 commits into ethereum:develop
Apr 28, 2015

Conversation

karalabe
Member

This PR introduces a seed cache database containing all the nodes that passed the discovery ping-pong procedure. Whenever ethereum starts up and there are no known nodes, the first 10 seeds are retrieved (and deleted) from the cache; and are used beside the bootstrap servers for connecting to the network.

The reason for the immediate deletion of the seed nodes is self cleanup: all seeds are evacuated when probing, but live ones get added back after the ping-pong, resulting in stale data gradually disappearing.

It might make sense to put an additional upper bound on the total number of peers we'd like to cache and always drop the oldest ones, but I'd vote to see how this mechanism behaves and polish it afterwards.

@fjl Please check if this is what you had in mind :)

@fjl
Contributor

fjl commented Apr 24, 2015

This is almost what I had in mind, but not quite. I didn't explain it well enough before you started.

The purpose of discover.nodeDB is tracking nodes which have been bonded with (we should put this
into the type's comment). findnode requests from unknown nodes are discarded.

Calling it Cache or SeedXXX would suggest that its sole purpose is seeding the table. Those names would be a good choice if that were so, but it isn't: using it for seeding is only a nice side effect.

My idea going forward is as follows:

  • The node DB needs to have an efficient way of updating a timestamp of the last ping received and ping sent for a node. The mechanism should be geared towards allowing other metadata to be stored later.
  • Inactive nodes should expire after a reasonable amount of time. I think 24h would be reasonable.
    This should happen in some kind of background loop.
    Note that there is no need to be very exact about when a node expires. It does not matter
    if nodes are still in the database even 30 minutes after they should've been expired.
  • When bootstrapping from a non-empty DB, we should attempt to insert the most recent nodes into the table before expiring items. The scenario to keep in mind is that a client might not be started for more than 24h. In this case, we should still try to insert some of the nodes because they might be up.
  • The node DB is not supposed to be part of the public API of p2p/discover. The long term plan is to also store reputation values and RLPx session resumption tokens in this database. When we start doing that, the DB will move to its own package. I'd like to keep it unexported for now.

I suggest that the implementation of the DB should store metadata items as separate keys prefixed with the node ID. Example: for a single node with ID AAAA, LevelDB would contain these items:

key value
version database version
n:AAAA:discover RLP encoded discover.Node
n:AAAA:discover:lastping UNIX timestamp of last ping sent
n:AAAA:discover:lastpong UNIX timestamp of last pong received
n:AAAA:reputation ...

With this scheme, deleting a node simply means deleting everything with the ID as prefix.
Updating is fast because only the key that is updated needs to change. Scanning for expired nodes
is slow but we don't care about that.
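The prefixed-key scheme above might look roughly like this in code. A sketch only: `makeKey` and `splitKey` are illustrative names, and it assumes node IDs render as hex without colons.

```go
package main

import (
	"fmt"
	"strings"
)

// makeKey builds a namespaced database key, e.g.
// makeKey("AAAA", "discover:lastping") -> "n:AAAA:discover:lastping".
// An empty field yields the node's root entry.
func makeKey(id, field string) string {
	if field == "" {
		return "n:" + id
	}
	return "n:" + id + ":" + field
}

// splitKey reverses makeKey, recovering the node ID and the field name.
func splitKey(key string) (id, field string, ok bool) {
	if !strings.HasPrefix(key, "n:") {
		return "", "", false // not a node entry (e.g. the version key)
	}
	rest := strings.TrimPrefix(key, "n:")
	parts := strings.SplitN(rest, ":", 2)
	id = parts[0]
	if len(parts) == 2 {
		field = parts[1]
	}
	return id, field, true
}

func main() {
	key := makeKey("AAAA", "discover:lastping")
	fmt.Println(key) // n:AAAA:discover:lastping
	id, field, _ := splitKey(key)
	fmt.Println(id, field) // AAAA discover:lastping
}
```

Deleting a node then amounts to deleting every key that `splitKey` resolves to its ID.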

API-wise, it could look like this:

func (*nodeDB) node(NodeID) *Node
func (*nodeDB) updateNode(*Node)
func (*nodeDB) lastPing(NodeID) time.Time
func (*nodeDB) updateLastPing(NodeID, time.Time) 
...
func (*nodeDB) expire(from time.Time)

@fjl fjl mentioned this pull request Apr 24, 2015
@fjl fjl changed the title Discovery node cache p2p/discover: persistent node database Apr 24, 2015
@fjl
Contributor

fjl commented Apr 24, 2015

I updated the comment above with stricter name-spacing of keys.

@karalabe
Member Author

The database structure and details are imho perfectly reasonable and fine.

One issue I'm seeing with the API proposal, however, is circular dependencies. The moment you move this nodedb out into its own package, it will depend on a lot of stuff from discover, but discover itself will depend on nodedb for the seeding.

Edit: One potential solution I can imagine is to have a base nodedb database for storing, querying and expiring items according to some schema, and then have client packages (e.g. discover) provide public adapters to it for fetching the data they themselves generate. However, I don't yet fully see the dependency implications of such a solution.

@fjl
Contributor

fjl commented Apr 24, 2015

We will address that when we get there. My guess is that Node and NodeID would move into the new package. I am not really planning to build a generic database for everyone's node metadata. Maybe the new package will be p2p/internal/nodedb.

@karalabe
Member Author

I've updated the design to use the fancier db layout/schema. However, entry expiration is not yet done, nor have I spent time to test it even marginally beyond passing the system tests. If you have time @fjl, take a glance to make sure it's going in the right direction.

PS: Since leveldb doesn't have any querying mechanism other than iterating over the entire database, the current seed query is very sub-optimal. Ideas?

Edit: I have to run, so I won't have time until Monday to finish up this new version.

@obscuren obscuren modified the milestone: Frontier Apr 24, 2015
return time.Time{}
}
var unix int64
if err := rlp.DecodeBytes(blob, &unix); err != nil {
Contributor

Integer values don't need to use RLP. Note also that package rlp will refuse to encode or decode int64. Let's use binary.BigEndian.Uint64 or go fancy and use binary.Varint (as for the version number).

Contributor

Maybe we should forgo having {fetch,store}Time and only have {fetch,store}Int64 instead. It's
easy to do time.Unix(db.fetchInt64(key(...)), 0) and db.storeInt64(key(...), t.Unix())
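The `{fetch,store}Int64` suggestion could look roughly like this. A sketch under stated assumptions: a plain map stands in for the LevelDB handle, and `db`, `storeInt64`, `fetchInt64` are illustrative names, not the merged code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

// db is a hypothetical stand-in for the LevelDB handle.
type db struct {
	store map[string][]byte
}

// storeInt64 encodes n with binary.Varint, as suggested for the version number.
func (d *db) storeInt64(key string, n int64) {
	blob := make([]byte, binary.MaxVarintLen64)
	blob = blob[:binary.PutVarint(blob, n)]
	d.store[key] = blob
}

// fetchInt64 decodes a varint value; missing or corrupt entries yield 0.
func (d *db) fetchInt64(key string) int64 {
	blob, ok := d.store[key]
	if !ok {
		return 0
	}
	n, read := binary.Varint(blob)
	if read <= 0 {
		return 0
	}
	return n
}

func main() {
	d := &db{store: make(map[string][]byte)}
	// A timestamp round-trips through Unix seconds, as in the comment above.
	t := time.Unix(1398336000, 0)
	d.storeInt64("n:AAAA:discover:lastping", t.Unix())
	got := time.Unix(d.fetchInt64("n:AAAA:discover:lastping"), 0)
	fmt.Println(got.Equal(t)) // true
}
```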

@fjl
Contributor

fjl commented Apr 24, 2015

This looks good. 🎉

Since leveldb doesn't have any querying mechanism other than iterating over the entire database, the current seed query is very sub-optimal. Ideas?

It doesn't matter how efficient the query is. If it becomes a problem, we can track the most recent nodes by maintaining an index (with a different key prefix). We could also roll the seed query into the initial expiration because it needs to scan the database anyway.

Discovery startup can take up to 2 seconds because it waits for package nat to figure out the external IP address. We can run the query concurrently at that time. I doubt it'll take more than a second to scan all nodes.

@obscuren
Contributor

leveldb supports prefixes and range sets if you need them

field = string(item[len(id):])

return id, field
}
Contributor

These two don't need to be methods. They can be plain functions.

Member Author

Dunno why GitHub doesn't close this diff, it's been updated.

@karalabe
Member Author

@obscuren @fjl Hey all, I think this PR's mostly done, so we could do another round of reviews on it.

@obscuren
Contributor

👍

@karalabe
Member Author

Ah, good catch with the lockup. Didn't know about the blocking behavior. Will update in 3 mins.

@karalabe
Member Author

PTAL

obscuren added a commit that referenced this pull request Apr 28, 2015
p2p/discover: persistent node database
@obscuren obscuren merged commit 91cb8cd into ethereum:develop Apr 28, 2015
maoueh pushed a commit to streamingfast/go-ethereum that referenced this pull request Sep 13, 2023