This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Improve Request Sharding #167

Closed
dirkmc opened this issue Aug 2, 2019 · 8 comments
@dirkmc
Contributor

dirkmc commented Aug 2, 2019

#165 outlines session request sharding in Bitswap. Each session maintains a list of peers that have responded to a wantlist request with a block, ordered by latency. Requests are split such that groups of CIDs (wants) are sent to groups of peers. As responses come in, the session adjusts the split factor up or down to maintain a balance between redundancy and too many duplicates.

Responses are processed on a block-by-block basis

The live want queue has 32 slots. Once the queue is full, each time a response block is processed:

  1. one slot in the live want queue opens up (the slot the response block CID occupied)
  2. the slot is filled with one CID (want) from the secondary queue
  3. the CID is sent out as a new want request
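A minimal sketch of that slot-recycling behaviour (type and field names here are illustrative, not the actual go-bitswap types):

```go
package main

import "fmt"

const liveWantLimit = 32 // slots in the live want queue

type session struct {
	liveWants []string // CIDs currently in flight (at most liveWantLimit)
	pending   []string // secondary queue of wants not yet sent
}

// onBlock frees the slot held by the received CID, promotes one pending
// want into the live queue, and returns it so the caller can send it
// out as a new want request.
func (s *session) onBlock(received string) (next string, ok bool) {
	for i, c := range s.liveWants {
		if c == received {
			s.liveWants = append(s.liveWants[:i], s.liveWants[i+1:]...)
			break
		}
	}
	if len(s.pending) == 0 || len(s.liveWants) >= liveWantLimit {
		return "", false
	}
	next, s.pending = s.pending[0], s.pending[1:]
	s.liveWants = append(s.liveWants, next)
	return next, true
}

func main() {
	s := &session{liveWants: []string{"cid1", "cid2"}, pending: []string{"cid3"}}
	next, ok := s.onBlock("cid1")
	fmt.Println(next, ok) // cid3 true
}
```

Note that exactly one want leaves the secondary queue per received block, which is why the splitter only ever sees single-CID groups.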

However, the splitter is designed to work over groups of blocks, not over a single block. So regardless of the split factor, blocks will always be sent to the same peers until the split factor changes.

For example if there are 8 peers (A - H) and the split factor is 3, the peers will be split into three groups. The CID will be sent to the peers in the first group and the other two groups will be disregarded:

cid        ->  A D G
<none>     ->  B E H
<none>     ->  C F

All blocks will be sent to peers A, D and G until the split factor changes. Regardless of the split factor, blocks will always be sent to peer A.
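The grouping above can be reproduced with a round-robin split; with a stream of single CIDs, only the first group ever receives a want (illustrative sketch, not the actual splitter code):

```go
package main

import "fmt"

// splitPeers divides peers into splitFactor groups round-robin, the way
// the splitter distributes groups of CIDs across groups of peers. When
// wants arrive one at a time, only groups[0] is ever used, which is the
// problem described above.
func splitPeers(peers []string, splitFactor int) [][]string {
	groups := make([][]string, splitFactor)
	for i, p := range peers {
		groups[i%splitFactor] = append(groups[i%splitFactor], p)
	}
	return groups
}

func main() {
	peers := []string{"A", "B", "C", "D", "E", "F", "G", "H"}
	groups := splitPeers(peers, 3)
	fmt.Println(groups[0]) // [A D G], the only group a single CID reaches
}
```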

Shard wants on a streaming basis

It would be better for a few reasons if all blocks in a response were processed as a group. However, responses often contain only a small number of blocks, so any solution should handle a single block as well as several blocks.

There is a WIP PR to adjust the peer list order probabilistically, which should mitigate the problem, as the order of peers will change somewhat with each request. However, it may be easier to reason about the problem if wants are instead sharded on a streaming basis.

Sharding should

  • Probabilistically select N peers to send the block to.
    N is analogous to the split factor: it should vary to find a balance between redundancy and too many duplicates.
  • Select peers with low latency and high probability of having the target block.
  • Balance load so peers don't get over-loaded.
  • Ask nearby peers for blocks even if they didn't have any so far.
    Asking nearby peers we are already connected to for a block is a relatively cheap operation. If several peers are downloading the same data, we want them to quickly start downloading from each other so as to
    • reduce load on the "seed" peer and the network as a whole
    • increase bandwidth (download from several peers at once)
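One way to satisfy the first requirement is weighted sampling without replacement; the weights and names below are assumptions for illustration, not part of the proposal:

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickN probabilistically selects up to n distinct peers, weighting
// each peer by a score (e.g. reflecting low latency and likelihood of
// having the target block). Plain weighted sampling without
// replacement: draw one peer at a time and remove it from the pool.
func pickN(peers []string, weights []float64, n int) []string {
	ps := append([]string(nil), peers...)
	ws := append([]float64(nil), weights...)
	var chosen []string
	for len(chosen) < n && len(ps) > 0 {
		total := 0.0
		for _, w := range ws {
			total += w
		}
		r := rand.Float64() * total
		idx := len(ps) - 1 // fallback guards against float rounding
		for i, w := range ws {
			if r -= w; r <= 0 {
				idx = i
				break
			}
		}
		chosen = append(chosen, ps[idx])
		ps = append(ps[:idx], ps[idx+1:]...)
		ws = append(ws[:idx], ws[idx+1:]...)
	}
	return chosen
}

func main() {
	peers := []string{"A", "B", "C", "D", "E", "F", "G", "H"}
	weights := []float64{4, 3, 2, 2, 1, 1, 1, 1}
	fmt.Println(len(pickN(peers, weights, 3))) // 3
}
```

Varying n plays the role of the split factor; the weights encode the latency / block-probability preferences.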
@dirkmc
Contributor Author

dirkmc commented Aug 2, 2019

Asking nearby peers we are already connected to for a block is a relatively cheap operation

@Stebalien do you know if this is true? It seems like it should be but I'm not sure

@Stebalien
Member

Stebalien commented Aug 2, 2019

The live want queue has 32 slots.

IMO, this is the primary issue. This limit should be per-peer. We may also want some kind of global wantlist rate limit, but that's a different issue.

When we run out of want-limit slots for a peer, that peer should be put on hold until a slot opens up.

Asking nearby peers we are already connected to for a block is a relatively cheap operation
@Stebalien do you know if this is true? It seems like it should be but I'm not sure

In general, it is.

@hannahhoward
Contributor

hannahhoward commented Aug 5, 2019

@Stebalien the live want queue should increase, but I think @dirkmc is right that the current approach tends to mean we aren't actually sharding because we only get one new want at a time. And that would be true even if the live wants increased. I like the streaming idea.

I also like the idea of removing the session live want list and tracking wants on a per-peer basis.

@dirkmc
Contributor Author

dirkmc commented Aug 8, 2019

The following proposal takes a step in the direction of a more sophisticated model, but assumes that for now there are no protocol extensions (in #166 we discuss extending the protocol).

The goals are:

  • Limit want requests per-peer
    (rather than per-session)
  • Treat wants as a stream
    So that they can be processed one-by-one (rather than in groups)
  • Move towards a throughput model
    (rather than tracking latency)

Session

Discovery

Follow the current model:

  • broadcast first CID to all connected peers
  • search for providers of first CID (using eg DHT)
  • periodically search for providers of random CID (using eg DHT)

Want Potential

The session orders wants by potential (greatest to lowest), then by FIFO.
The session want potential starts at 2 and varies as the local node receives blocks / duplicates.
The session changes an individual want's potential by

  • -1 when the want is sent to a peer
  • +1 when there's a timeout for the want

Per-peer rate limiting (for session)

Note: This section discusses rate-limiting peers on a per-session basis. Below we discuss rate-limiting requests from all sessions to a peer.

We want to trade-off

  • getting data from responsive nodes
  • spreading requests across many nodes (overall higher bandwidth)

The session increases the rate-limit for a peer exponentially as it receives blocks. When a peer is discovered:

  • the session sets the want limit to N wants for the peer
  • once the session receives a block from the peer, it increases the limit to 2N wants
  • once the session receives N blocks from the peer, it increases the limit to 4N wants, and so on
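The doubling schedule might be sketched as follows; the threshold sequence (1, N, N*N, ...) is one illustrative reading of the "and so on", not something the proposal pins down:

```go
package main

import "fmt"

// wantLimit returns the per-session want limit for a peer after it has
// delivered `received` blocks: N at discovery, 2N after the first
// block, 4N after N blocks, doubling at each successive threshold.
// Requires n > 1 to terminate.
func wantLimit(n, received int) int {
	limit := n
	threshold := 1
	for received >= threshold {
		limit *= 2
		threshold *= n
	}
	return limit
}

func main() {
	fmt.Println(wantLimit(16, 0), wantLimit(16, 1), wantLimit(16, 16)) // 16 32 64
}
```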

Want Stream

For each want, the session selects a peer probabilistically such that the probability

  • increases with blocks received from the peer (for this session)
  • decreases with timeouts from the peer (for this session)
  • decreases with the size of the peer's queue (across sessions)
  • zero if the want is already in the peer's queue (across sessions)
  • zero for peers the session sent the want to in the last T interval (then increases slowly)
  • increases for newly discovered peers (for this session)
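The factors above could be folded into a single selection weight. The constants and field names below are invented for illustration; the issue only fixes the direction each factor pushes the probability:

```go
package main

import "fmt"

// peerStats gathers the inputs for one (session, peer, want) decision.
type peerStats struct {
	blocksReceived int  // blocks this session got from the peer
	timeouts       int  // timeouts this session saw from the peer
	queueLen       int  // peer's want queue size across sessions
	hasWant        bool // want already in the peer's queue
	recentlySent   bool // want sent to this peer within the last T
	newlyFound     bool // peer was just discovered by this session
}

// peerScore returns a non-negative selection weight; zero means
// "never pick this peer for this want right now".
func peerScore(s peerStats) float64 {
	if s.hasWant || s.recentlySent {
		return 0 // never duplicate a live want; back off after a recent send
	}
	score := 1.0 + float64(s.blocksReceived) - 0.5*float64(s.timeouts)
	score -= 0.1 * float64(s.queueLen) // prefer less loaded peers
	if s.newlyFound {
		score += 1 // give newly discovered peers a head start
	}
	if score < 0 {
		return 0
	}
	return score
}

func main() {
	fmt.Println(peerScore(peerStats{hasWant: true}))                                          // 0
	fmt.Println(peerScore(peerStats{blocksReceived: 3}) > peerScore(peerStats{timeouts: 3})) // true
}
```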

Peer Broker

Peers have a limited-size want request queue (the limit applies to all requests from all sessions). The Peer Broker manages the want queue for each peer.

The Peer Broker performs matching when both these conditions are true:

  • there is a peer with free space in its queue
  • there is a session with wants to send

The Peer Broker repeatedly asks the sessions in round robin fashion if they are interested in the currently available peers.

A peer is allowed to send its queued message once either of these is true:

  • the peer's want queue is full
  • there are no sessions with wants to send

Note that this implies that sessions must decide quickly if they are interested in sending a want to one of the candidate peers, so as not to block peers that are waiting to send out a message.
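A sketch of the broker's matching loop; the interfaces here are hypothetical, not the actual go-bitswap API:

```go
package main

import "fmt"

// Session is polled round robin by the broker; it either claims a slot
// on one of the peers with free queue space or declines immediately,
// so peers waiting to send aren't blocked.
type Session interface {
	// Match returns an index into peersWithSpace plus the CID to queue,
	// or ok=false if none of the offered peers suit this session.
	Match(peersWithSpace []string) (peerIdx int, cid string, ok bool)
}

// broker matches wants to peer queue slots until no session is
// interested or no peer has space; it returns "cid->peer" assignments.
func broker(sessions []Session, peerSpace map[string]int) []string {
	var assigned []string
	for {
		var free []string
		for p, n := range peerSpace {
			if n > 0 {
				free = append(free, p)
			}
		}
		matched := false
		for _, s := range sessions {
			if len(free) == 0 {
				break
			}
			if i, cid, ok := s.Match(free); ok {
				p := free[i]
				peerSpace[p]--
				assigned = append(assigned, cid+"->"+p)
				if peerSpace[p] == 0 {
					free = append(free[:i], free[i+1:]...)
				}
				matched = true
			}
		}
		if !matched {
			return assigned // peers can now flush their queued messages
		}
	}
}

// onceSession is a toy session that hands out its CIDs in order.
type onceSession struct{ cids []string }

func (o *onceSession) Match(free []string) (int, string, bool) {
	if len(o.cids) == 0 {
		return 0, "", false
	}
	cid := o.cids[0]
	o.cids = o.cids[1:]
	return 0, cid, true
}

func main() {
	got := broker([]Session{&onceSession{cids: []string{"c1", "c2"}}},
		map[string]int{"A": 1, "B": 1})
	fmt.Println(len(got)) // 2
}
```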

@Stebalien
Member

Want Potential

"Expected wants"?

Per-peer rate limiting (for session)

We'd also like to avoid forcing the peer to maintain a massive wantlist for us. Although I guess that might be the job of the peer broker.

The session increases the rate-limit for a peer exponentially as it receives blocks. When a peer is discovered:

What's the rationale behind this?

decreases with the size of the peer's queue (across sessions)

Note: For the purposes of throughput, separate sessions might as well be separate peers. We can't know how much load other peers are putting on this node, so I wouldn't worry about this that much.

Instead, we should optimize for downloading the content as fast as we can. If we do that, we'll automatically pick the peers with the most available bandwidth.

(Ok, that kind of assumes that these peers are getting their bandwidth maxed out. Doing some local load balancing across peers is still useful, we just shouldn't focus on it too hard).

zero if the want is already in the peer's queue (across sessions)

This should be 1.

zero for peers the session sent the want to in the last T interval (then increases slowly)

?

increases for newly discovered peers (for this session)

You mean starts higher?

Peer Broker

I think this section mixes peer/session a few times.

The Peer Broker repeatedly asks the sessions in round robin fashion if they are interested in the currently available peers.

That sounds really wasteful.

@Stebalien
Member

There are a lot of knobs here that could behave in ways we're not expecting. When possible, let's try to map them directly onto desired effects. Ideally, each one would follow the following rule:

  1. Do X
  2. Throughput increases, do more of X.
  3. Throughput decreases, do less of X.
  4. Throughput stays the same, do Y (which may be X in some cases).

"Want Potential" does this really well: increase duplicate requests until we start getting duplicate responses, then decrease.

Let's reason through how all the other knobs correlate to throughput (or fairness).
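The four-step rule above reads like a hill-climbing controller; a minimal sketch, purely illustrative:

```go
package main

import "fmt"

// adjust pushes a knob in the direction that improved throughput and
// reverses direction when throughput drops, per the rule above. On no
// change it tries something different (here: a smaller step).
func adjust(knob, delta, prev, cur float64) (newKnob, newDelta float64) {
	switch {
	case cur > prev: // throughput increased: keep doing X
		return knob + delta, delta
	case cur < prev: // throughput decreased: do less of X
		return knob - delta, -delta
	default: // unchanged: do Y (here, a half-size probe step)
		return knob + delta/2, delta / 2
	}
}

func main() {
	knob, delta := 2.0, 1.0
	knob, delta = adjust(knob, delta, 10, 15) // improved: knob moves to 3
	knob, delta = adjust(knob, delta, 15, 12) // worse: step back and reverse
	fmt.Println(knob, delta) // 2 -1
}
```

"Want Potential" instantiates this with X = "send duplicate requests" and throughput measured by blocks vs. duplicates received.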

@dirkmc
Contributor Author

dirkmc commented Aug 11, 2019

The session increases the rate-limit for a peer exponentially as it receives blocks

What's the rationale behind this?

If we don't rate-limit per session, then the session may request all its blocks from the first peer that responds. For example in a scenario where the per-peer rate-limit is 128, and there haven't been any requests yet:

  1. The session broadcasts a want for the root block CID
  2. Peer A responds after 100ms
  3. The session sends a want for the 20 children of the root block to Peer A
  4. Peer B responds after 200ms

Perhaps in this scenario overall throughput would have been higher if the local node requested 10 blocks from Peer A and 10 blocks from Peer B. So I think we may need to have two rate-limits:

  • per peer for each session (increasing as blocks come in)
  • per peer across sessions (so we can max out the peers and send to the ones that free up space in their queues fastest)

I proposed an exponentially increasing session rate-limit per-peer as a way to try to spread the requests across peers in the discovery stage of a session's requests, then quickly move to optimizing for the fastest responding peers as we move out of discovery. I'm open to other ways of doing this.

For each want, the session selects a peer probabilistically such that the probability

  • zero if the want is already in the peer's queue (across sessions)
  • zero for peers the session sent the want to in the last T interval (then increases slowly)

Here I'm suggesting that

  • if the want is already in the queue for a peer, then the probability we choose to send the want to that peer (again) should be zero
  • if the want is not in the peer's queue, but we recently sent this want to the peer and got a timeout (eg in the last second) then the probability of choosing the peer for the want should be zero for some time interval T (eg 5 x latency). After that time interval, the probability should slowly increase.
increases for newly discovered peers (for this session)

You mean starts higher?

Yes. My thinking is that if nearby peers are downloading the same file from the seed, we should send wants to them so as to maximize throughput.
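The hold-off-then-ramp behaviour described a few lines up (probability zero for an interval T after a timeout, e.g. 5x latency, then a slow increase) could look like the following; the linear ramp and its duration are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// sendProbability returns the probability weight for re-sending a want
// to a peer after a timeout: zero during the hold-off interval T
// (5x the peer's latency, per the example in the comment), then ramping
// linearly back up to the base weight over another T.
func sendProbability(sinceTimeout, latency time.Duration, base float64) float64 {
	t := 5 * latency // the "T" interval from the comment
	if sinceTimeout < t {
		return 0
	}
	ramp := float64(sinceTimeout-t) / float64(t)
	if ramp > 1 {
		ramp = 1
	}
	return base * ramp
}

func main() {
	lat := 100 * time.Millisecond
	fmt.Println(sendProbability(200*time.Millisecond, lat, 1.0)) // 0 (still in hold-off)
	fmt.Println(sendProbability(1*time.Second, lat, 1.0))        // 1 (fully recovered)
}
```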

The Peer Broker repeatedly asks the sessions in round robin fashion if they are interested in the currently available peers.

That sounds really wasteful.

The idea is that we want to process blocks as a stream (one-by-one). The Peer Broker knows how much space each peer has, and repeatedly asks each Session if it would like to send to any of the peers with free space. As the want lists for the peers fill up, the list of candidate peers will change. The peer can send its message only once no more sessions want blocks, or once a peer's want list is full (so that we're not sending lots of messages with a single want).

Let's reason through how all the other knobs correlate to throughput (or fairness).

Sounds like a good approach 👍

@dirkmc
Contributor Author

dirkmc commented Mar 4, 2020

ipfs/kubo#6782

dirkmc closed this as completed Mar 4, 2020