This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Fail to get block from connected peer #99

Closed
bruinxs opened this issue Mar 14, 2019 · 9 comments · Fixed by #106
bruinxs commented Mar 14, 2019

I have a node connected to another node <peer.ID Qm*ZtVMeB>. <peer.ID Qm*ZtVMeB> has the data block QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE, but the request never succeeds; the log below keeps printing in a loop.

12:30:29.307  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:29.308 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:29.308 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:29.309 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:29.313 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:29.313 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:29.391 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:30.990  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:30.990 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:30.990 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:30.991 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:30.993 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:30.993 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:31.831 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:32.677  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:32.677 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:32.678 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:32.678 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:32.683 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:32.683 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:33.097 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
......

Executing the following command on the <peer.ID Qm*ZtVMeB> node produces the expected output:

 > ipfs bitswap wantlist -p QmWrvYBRZm2jeffxyVtomoa3e8amVMXVyzzkdKXkzHRUV1
QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE

I am using a private network. The ipfs version:

ipfs version --all
go-ipfs version: 0.4.19-
Repo version: 7
System version: amd64/darwin
Golang version: go1.12

bruinxs commented Mar 14, 2019

Duplicate of #65.

@Stebalien (Member)

So, the following could be happening:

  1. Peer A connects to peer B, ends up with two connections.
  2. Peer A sends want to peer B.
  3. Peer B sends block to peer A and removes the block from the wantlist.
  4. The connection used in (3) gets killed and the block gets lost in transit.

However, the peers are never disconnected so peer A never resends the wantlist.


Alternative theory:

  1. Peer A connects to peer B.
  2. Peer A sends its wantlist to peer B.
  3. Peer A disconnects from peer B.
  4. Peer B notices the disconnect, forgets peer A's wantlist.
  5. Peer A immediately reconnects to peer B.
  6. Peer A notices the disconnect, sees that it still has a connection, doesn't resend its wantlist.

There are probably more...
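
A minimal Go sketch of how the second race could arise, assuming connection events are reference-counted and the wantlist is only resent when the count goes from 0 to 1 (all names here are hypothetical, not the actual go-bitswap types):

```go
package bitswapsketch

import "sync"

// connTracker counts open connections per peer and resends the full
// wantlist only on the 0 -> 1 transition.
type connTracker struct {
	mu    sync.Mutex
	conns map[string]int // peer ID -> open connection count
}

func newConnTracker() *connTracker {
	return &connTracker{conns: make(map[string]int)}
}

// Connected handles a new connection to peer p.
func (t *connTracker) Connected(p string, resendWantlist func()) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.conns[p]++
	if t.conns[p] == 1 {
		resendWantlist() // only fires on the first connection
	}
}

// Disconnected handles a closed connection to peer p.
func (t *connTracker) Disconnected(p string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.conns[p]--
}
```

If peer A's reconnect is processed before the old connection's disconnect event, Connected sees the count go 1 -> 2 and never resends, while peer B has already handled its own disconnect and forgotten A's wantlist.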

@Stebalien (Member)

For issue 1:

  • When sending a block, we could move the want to some "suspended" wantlist. If we don't receive an unwant from that peer within a period of time, we could send the block again (see the sketch after this list).
  • We should better track block send errors. As far as I can tell, we never retry but we really should. Ideally, if we keep failing, we'd close the peer's inbound wantlist stream to signal "go away".
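
A rough sketch of the "suspended wantlist" idea from the first bullet (hypothetical names, not the go-bitswap API): park the want when we send the block, and resend unless a cancel arrives within a grace period.

```go
package bitswapsketch

import (
	"sync"
	"time"
)

// suspendedWantlist parks a want after we send the corresponding block.
// If no unwant/cancel arrives within the grace period, we assume the
// block was lost in transit and send it again.
type suspendedWantlist struct {
	mu      sync.Mutex
	pending map[string]*time.Timer // cid -> resend timer
}

func newSuspendedWantlist() *suspendedWantlist {
	return &suspendedWantlist{pending: make(map[string]*time.Timer)}
}

// Suspend is called right after sending a block; resend fires if the
// requester never confirms receipt.
func (s *suspendedWantlist) Suspend(cid string, grace time.Duration, resend func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[cid] = time.AfterFunc(grace, func() {
		s.mu.Lock()
		delete(s.pending, cid)
		s.mu.Unlock()
		resend()
	})
}

// Cancel is called when the requester drops the CID from its wantlist,
// confirming the block arrived.
func (s *suspendedWantlist) Cancel(cid string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if t, ok := s.pending[cid]; ok {
		t.Stop()
		delete(s.pending, cid)
	}
}
```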

For issue 2:

The only thing I can think of is the following: whenever the stream we're using to send wantlists closes, we (a) open a new one and (b) send the complete wantlist. Unfortunately, I really wanted to eventually move away from keeping a stream open entirely but, well, I can't think of a better solution.
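
Sketched in Go, assuming a libp2p-style stream interface (simplified, not the actual message queue code):

```go
package bitswapsketch

import (
	"context"
	"fmt"
)

// stream is a stand-in for a libp2p network stream.
type stream interface {
	SendMsg(msg []byte) error
	Close() error
}

// msgQueue owns the outgoing wantlist stream to one peer.
type msgQueue struct {
	s        stream
	open     func(ctx context.Context) (stream, error) // dials a fresh stream
	fullList func() [][]byte                           // encodes the complete wantlist
}

// send writes one message; on failure it (a) opens a new stream and
// (b) replays the complete wantlist so the peer's view is rebuilt.
func (q *msgQueue) send(ctx context.Context, msg []byte) error {
	if q.s != nil {
		if err := q.s.SendMsg(msg); err == nil {
			return nil
		}
		q.s.Close() // send failed: drop the broken stream
		q.s = nil
	}
	s, err := q.open(ctx)
	if err != nil {
		return fmt.Errorf("reopening stream: %w", err)
	}
	q.s = s
	for _, m := range q.fullList() {
		if err := q.s.SendMsg(m); err != nil {
			return err
		}
	}
	// msg may overlap with the full wantlist; resending a want is harmless.
	return q.s.SendMsg(msg)
}
```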

Really... bitswap should have sessions, sequence numbers, and acks (reliable but out of order message delivery).
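
Purely for illustration, the envelope for that could be as small as the following (not a proposed wire format):

```go
package bitswapsketch

// envelope sketches reliable, out-of-order delivery: every message
// carries a sequence number, receivers ack what they have seen, and the
// sender retransmits anything unacked after a timeout.
type envelope struct {
	Seq     uint64   // sender-assigned sequence number
	Acks    []uint64 // sequence numbers received from the peer
	Payload []byte   // the bitswap message itself
}
```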

@Stebalien (Member)

(cc @hannahhoward)


bruinxs commented Apr 3, 2019

  • we could send the block again
  • we'd close the peer's inbound wantlist stream to signal "go away"

Great, but I think the receiving node should not resend the block repeatedly; it should be the requesting node's responsibility to maintain the wantlist.

In fact, while the requesting node is waiting for the block response, it already knows the provider information. Maybe we could just resend the wantlist to the provider node periodically.

  1. Broadcast block request (wantlist)

  2. Find provider for a block

  3. If we keep failing and get a provider, resend the block wantlist to the provider periodically

@Stebalien (Member)

Great, but I think the receiving node should not resend the block repeatedly; it should be the requesting node's responsibility to maintain the wantlist.

The issue here is that the responding node is preemptively removing the block from the requesting node's wantlist because it has sent the block. I'm just suggesting that we shouldn't do that and, instead, wait for the requesting node to update their own wantlist.

Maybe we just resend the wantlist to the provider node periodically.

We should probably do this anyways just to be safe.

@hannahhoward (Contributor)

I think a good first step, and this has come up a few times, is to just periodically rebroadcast the wantlist. The wantlist itself (as opposed to the blocks) is not a lot of data, so sending it periodically can't be that expensive, and it will give the receiver an opportunity to keep its wantlist in sync with the requestor. I'll go ahead and implement this.
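
For illustration, the rebroadcast could be a simple ticker loop in the want manager; a minimal sketch with hypothetical names (the actual change landed in #106):

```go
package bitswapsketch

import (
	"context"
	"time"
)

// rebroadcastWantlist periodically resends the current wantlist so that
// peers which lost it (dropped connection, restart, one of the races
// above) re-learn our wants.
func rebroadcastWantlist(ctx context.Context, interval time.Duration,
	currentWants func() []string, broadcast func(cids []string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if wants := currentWants(); len(wants) > 0 {
				broadcast(wants)
			}
		}
	}
}
```

Called with a thirty-second interval, this matches the failsafe described in the commit below.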

hannahhoward added a commit that referenced this issue Apr 4, 2019
Provide a failsafe to losing wants on other end by rebroadcasting a wantlist every thirty seconds

fix #99, fix #65
@ghost ghost assigned hannahhoward Apr 4, 2019
@ghost ghost added the status/in-progress In progress label Apr 4, 2019
@ghost ghost removed the status/in-progress In progress label Apr 10, 2019
@Stebalien (Member)

Reopening as this is still a valid issue, just less of an issue.

@Stebalien (Member)

I've forked this into ipfs/boxo#97 and ipfs/boxo#96.
