Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminate Queries Correctly #290

Closed
raulk opened this issue Mar 8, 2019 · 6 comments
Closed

Terminate Queries Correctly #290

raulk opened this issue Mar 8, 2019 · 6 comments
Assignees
Labels

Comments

@raulk
Copy link
Member

raulk commented Mar 8, 2019

Related to #287.

GetClosestPeers() will return in either of two conditions:

  1. We have no more peers to query.
  2. Our context times out.

IIUC:

We traverse the network by creating a heap of peers ordered by the XOR distance function to the key, and sending the query in ascending order of distance (i.e. closest to furthest).

In all cases we are looking for at most KValue peers, but the algorithm does not take this into account. We should stop after we've encountered a slice of KValue peers such that all remaining unqueried ones are worse than the set we already have (i.e. furthest away from the key).

What's happening now is that we zero in to the region of the DHT thanks to the heap, and then zoom back out because we have no stop condition, only to later select the peers we initially zeroed in on.

CC @Stebalien

Edit (@jacobheun):

Design notes

According to Kademlia, a query terminates when it finds the K (20) closest peers to the target key. Specifically:

  • Setup: Pick the K closest peers to the target key.
  • Recursive step:
    • Select the alpha closest known peers.
    • Query these peers for K nodes each.
    • Repeat until we get no closer peers, i.e. the XOR distances of new peers we find are identical or lower to the closest distance we’ve seen so far.
  • Final step:
    • Query the K closest peers.

However, there are a few tricks/options (output of many discussions with Adin):

  • The final step may reveal closer peers. Instead of splitting the recursive/final step, we need to combine them:
    • Select the alpha closest known peers.
    • Query these peers.
    • If we get no new information, query the K closest peers.
    • If we get new information, repeat.
  • The naive purely recursive approach relies on waiting for all alpha peers to either respond or timeout on each round. Ideally, we wouldn't:
    • If we get an answer to a single query, we should immediately use that information. The "most correct" approach would be to query one peer as one query "worker" is now available. However, it might make sense to immediately query alpha (we should test this).
    • If a query doesn't return within the expected time period (assuming ~200ms RTT), we should experiment with initiating a new query (without canceling that one). This means we wouldn't have to wait the entire timeout.
    • When moving on, we should cancel queries that are no longer useful. For example, if we have successfully queried alpha closer peers, we should consider aborting the query.

The "dial queue": For the moment, we will be removing the dial queue to speed up experimentation. Given the current WIP implementation and all the suggested improvements above, we'd expect fewer parallel dials so the dial queue may not be necessary. Once we've taken concrete measurements and compared the current DHT with the experimental patches, we can determine if we need to put it back in.

Testing mechanics

  • Test parameter: i (iterations). Call GetClosestPeers as many times as i, recording the following:
    • Number of dials generated.
    • Number of queries made.
    • Per query, record the XOR distance of the peer we queried.
    • Wall clock time to find the peer.
  • We expect:
    • The number of dials to decrease substantially.
    • The number of queries to decrease substantially.
    • The wall clock time to decrease substantially.
    • Average number of peers at each XOR distance step to be lower.
    • Query convergence to be faster.

Success Criteria

  • Queries terminate after log(n) steps
@Stebalien
Copy link
Member

Interesting... we should apply the same logic to timing out on IPNS queries but that's less of an issue. I'm a bit worried that we'll end up going down a bunch of dead-end paths due to how many undialable nodes we have in the DHT but maybe that's not an issue. It's something we can test pretty easily.

@raulk raulk changed the title Return from GetClosestPeers() determinastically Make GetClosestPeers() return more intelligently Mar 8, 2019
@Stebalien
Copy link
Member

Actually, we shouldn't be using the KValue, we should be using AlphaValue. That is, we should never query a peer further than the AlphaValue furthest peer we've successfully queried. Once we've run out of nodes to query, we need to widen our search to the K best nodes seen.

(I'm getting close to a working implementation, I'll post a patch when I'm done)

Stebalien added a commit that referenced this issue Mar 8, 2019
This is actually still incorrect. We _should_ be limiting our query to
AlphaValue peers and then expanding to KValue peers once we run out of peers to
query. However, this is still much better and we can do that in a followup
commit.

Considerations: We may not want to merge this until we get the multipath lookup
patch. It turns out, our current DHT effectively explores _all_ paths.

fixes #290
@ghost ghost assigned Stebalien Mar 8, 2019
@ghost ghost added the status/in-progress In progress label Mar 8, 2019
Stebalien added a commit that referenced this issue Mar 12, 2019
This is actually still incorrect. We _should_ be limiting our query to
AlphaValue peers and then expanding to KValue peers once we run out of peers to
query. However, this is still much better and we can do that in a followup
commit.

Considerations: We may not want to merge this until we get the multipath lookup
patch. It turns out, our current DHT effectively explores _all_ paths.

fixes #290
@dirkmc
Copy link
Contributor

dirkmc commented Apr 9, 2019

@Stebalien I'm looking into making this same change in the js-land DHT. Could you explain your last comment a little more - I'm not sure I understand the difference from what @raulk is proposing?

@Stebalien
Copy link
Member

Basically, I'm just proposing that we actually implement kademlia. Don't try understanding the go DHT, just read that paper.

@dirkmc
Copy link
Contributor

dirkmc commented Apr 9, 2019

For posterity I will quote the relevant section of the Kademlia paper, which is a little confusing but I think it's basically saying the same thing as you described above:

Kademlia employs a recursive algorithm for node lookups. The lookup initiator starts by picking α nodes from its closest non-empty k-bucket (or, if that bucket has fewer than α entries, it just takes the α closest nodes it knows of). The initiator then sends parallel, asynchronous FIND NODE RPCs to the α nodes it has chosen. α is a system-wide concurrency parameter, such as 3. In the recursive step, the initiator resends the FIND NODE to nodes it has learned about from previous RPCs. (This recursion can begin before all α of the previous RPCs have returned). Of the k nodes the initiator has heard of closest to the target, it picks α that it has not yet queried and resends the FIND NODE RPC to them. Nodes that fail to respond quickly are removed from consideration until and unless they do respond. If a round of FIND NODEs fails to return a node any closer than the closest already seen, the initiator resends the FIND NODE to all of the k closest nodes it has not already queried. The lookup terminates when the initiator has queried and gotten responses from the k closest nodes it has seen.

@raulk raulk changed the title Make GetClosestPeers() return more intelligently Make GetClosestPeers() return more intelligently (cutoff/termination criteria) Apr 11, 2019
Stebalien added a commit that referenced this issue May 2, 2019
This is actually still incorrect. We _should_ be limiting our query to
AlphaValue peers and then expanding to KValue peers once we run out of peers to
query. However, this is still much better and we can do that in a followup
commit.

Considerations: We may not want to merge this until we get the multipath lookup
patch. It turns out, our current DHT effectively explores _all_ paths.

fixes #290
Stebalien added a commit that referenced this issue Jun 1, 2019
This is actually still incorrect. We _should_ be limiting our query to
AlphaValue peers and then expanding to KValue peers once we run out of peers to
query. However, this is still much better and we can do that in a followup
commit.

Considerations: We may not want to merge this until we get the multipath lookup
patch. It turns out, our current DHT effectively explores _all_ paths.

fixes #290
@jacobheun jacobheun added the Epic label Jan 23, 2020
@jacobheun jacobheun added this to the Working Kademlia milestone Jan 23, 2020
@jacobheun jacobheun changed the title Make GetClosestPeers() return more intelligently (cutoff/termination criteria) Terminate Queries Correctly Jan 23, 2020
@jacobheun jacobheun assigned aschmahmann and unassigned Stebalien Jan 23, 2020
@aschmahmann
Copy link
Contributor

Tackling this in #436

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants