This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Fail to get block from connected peer #99

Closed
bruinxs opened this issue Mar 14, 2019 · 9 comments · Fixed by #106
bruinxs commented Mar 14, 2019

I have a node connected to another node <peer.ID Qm*ZtVMeB>. <peer.ID Qm*ZtVMeB> has the data block QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE, but the request never succeeds; the log below keeps printing in a loop.

12:30:29.307  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:29.308 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:29.308 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:29.309 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:29.313 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:29.313 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:29.391 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:30.990  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:30.990 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:30.990 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:30.991 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:30.993 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:30.993 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:31.831 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:32.677  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:32.677 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:32.678 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:32.678 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:32.683 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:32.683 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:33.097 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
......

Executing the following command on the <peer.ID Qm*ZtVMeB> node produces the expected output:

 > ipfs bitswap wantlist -p QmWrvYBRZm2jeffxyVtomoa3e8amVMXVyzzkdKXkzHRUV1
QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE

I am using a private network. The ipfs version:

ipfs version --all
go-ipfs version: 0.4.19-
Repo version: 7
System version: amd64/darwin
Golang version: go1.12

bruinxs commented Mar 14, 2019

Duplicate of #65.

@Stebalien (Member)

So, the following could be happening:

  1. Peer A connects to peer B, ends up with two connections.
  2. Peer A sends want to peer B.
  3. Peer B sends block to peer A and removes the block from the wantlist.
  4. The connection used in (3) gets killed and the block gets lost in transit.

However, the peers are never disconnected so peer A never resends the wantlist.


Alternative theory:

  1. Peer A connects to peer B.
  2. Peer A sends its wantlist to peer B.
  3. Peer A disconnects from peer B.
  4. Peer B notices the disconnect, forgets peer A's wantlist.
  5. Peer A immediately reconnects to peer B.
  6. Peer A notices the disconnect, sees that it still has a connection, doesn't resend its wantlist.

There are probably more...
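
A minimal Go sketch of how the second race could arise, assuming connection events are reference-counted and the wantlist is only resent when the count goes from 0 to 1 (all names here are hypothetical, not the actual go-bitswap types):

```go
package bitswapsketch

import "sync"

// connTracker counts open connections per peer and resends the full
// wantlist only on the 0 -> 1 transition.
type connTracker struct {
	mu    sync.Mutex
	conns map[string]int // peer ID -> open connection count
}

func newConnTracker() *connTracker {
	return &connTracker{conns: make(map[string]int)}
}

// Connected handles a new connection to peer p.
func (t *connTracker) Connected(p string, resendWantlist func()) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.conns[p]++
	if t.conns[p] == 1 {
		resendWantlist() // only fires on the first connection
	}
}

// Disconnected handles a closed connection to peer p.
func (t *connTracker) Disconnected(p string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.conns[p]--
}
```

If peer A's reconnect is processed before the old connection's disconnect event, Connected sees the count go 1 -> 2 and never resends, while peer B has already handled its own disconnect and forgotten A's wantlist.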

@Stebalien (Member)

For issue 1:

  • When sending a block, we could move the want to some "suspended" wantlist. If we don't receive an unwant from that peer within a period of time, we could send the block again (see the sketch after this list).
  • We should better track block send errors. As far as I can tell, we never retry but we really should. Ideally, if we keep failing, we'd close the peer's inbound wantlist stream to signal "go away".
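
A rough sketch of the "suspended wantlist" idea from the first bullet (hypothetical names, not the go-bitswap API): park the want when we send the block, and resend unless a cancel arrives within a grace period.

```go
package bitswapsketch

import (
	"sync"
	"time"
)

// suspendedWantlist parks a want after we send the corresponding block.
// If no unwant/cancel arrives within the grace period, we assume the
// block was lost in transit and send it again.
type suspendedWantlist struct {
	mu      sync.Mutex
	pending map[string]*time.Timer // cid -> resend timer
}

func newSuspendedWantlist() *suspendedWantlist {
	return &suspendedWantlist{pending: make(map[string]*time.Timer)}
}

// Suspend is called right after sending a block; resend fires if the
// requester never confirms receipt.
func (s *suspendedWantlist) Suspend(cid string, grace time.Duration, resend func()) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[cid] = time.AfterFunc(grace, func() {
		s.mu.Lock()
		delete(s.pending, cid)
		s.mu.Unlock()
		resend()
	})
}

// Cancel is called when the requester drops the CID from its wantlist,
// confirming the block arrived.
func (s *suspendedWantlist) Cancel(cid string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if t, ok := s.pending[cid]; ok {
		t.Stop()
		delete(s.pending, cid)
	}
}
```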

For issue 2:

The only thing I can think of is the following: whenever the stream we're using to send wantlists closes, we (a) open a new one and (b) send the complete wantlist. Unfortunately, I really wanted to eventually move away from keeping a stream open entirely but, well, I can't think of a better solution.
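
Sketched in Go, assuming a libp2p-style stream interface (simplified, not the actual message queue code):

```go
package bitswapsketch

import (
	"context"
	"fmt"
)

// stream is a stand-in for a libp2p network stream.
type stream interface {
	SendMsg(msg []byte) error
	Close() error
}

// msgQueue owns the outgoing wantlist stream to one peer.
type msgQueue struct {
	s        stream
	open     func(ctx context.Context) (stream, error) // dials a fresh stream
	fullList func() [][]byte                           // encodes the complete wantlist
}

// send writes one message; on failure it (a) opens a new stream and
// (b) replays the complete wantlist so the peer's view is rebuilt.
func (q *msgQueue) send(ctx context.Context, msg []byte) error {
	if q.s != nil {
		if err := q.s.SendMsg(msg); err == nil {
			return nil
		}
		q.s.Close() // send failed: drop the broken stream
		q.s = nil
	}
	s, err := q.open(ctx)
	if err != nil {
		return fmt.Errorf("reopening stream: %w", err)
	}
	q.s = s
	for _, m := range q.fullList() {
		if err := q.s.SendMsg(m); err != nil {
			return err
		}
	}
	// msg may overlap with the full wantlist; resending a want is harmless.
	return q.s.SendMsg(msg)
}
```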

Really... bitswap should have sessions, sequence numbers, and acks (reliable but out of order message delivery).
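
Purely for illustration, the envelope for that could be as small as the following (not a proposed wire format):

```go
package bitswapsketch

// envelope sketches reliable, out-of-order delivery: every message
// carries a sequence number, receivers ack what they have seen, and the
// sender retransmits anything unacked after a timeout.
type envelope struct {
	Seq     uint64   // sender-assigned sequence number
	Acks    []uint64 // sequence numbers received from the peer
	Payload []byte   // the bitswap message itself
}
```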

@Stebalien (Member)

(cc @hannahhoward)


bruinxs commented Apr 3, 2019

  • we could send the block again
  • we'd close the peer's inbound wantlist stream to signal "go away"

Great, but I think the receiving node should not resend the block repeatedly; it should be the requesting node's responsibility to maintain the wantlist.

In fact, while the requesting node is waiting for the block response, it already knows the provider information. Maybe we could just resend the wantlist to the provider node periodically.

  1. Broadcast block request (wantlist)

  2. Find provider for a block

  3. If we keep failing and get a provider, resend the block wantlist to the provider periodically

@Stebalien (Member)

Great, but I think the receiving node should not resend the block repeatedly; it should be the requesting node's responsibility to maintain the wantlist.

The issue here is that the responding node is preemptively removing the block from the requesting node's wantlist because it has sent the block. I'm just suggesting that we shouldn't do that and, instead, wait for the requesting node to update their own wantlist.

Maybe we just resend the wantlist to the provider node periodically.

We should probably do this anyways just to be safe.

@hannahhoward (Contributor)

I think a good first step, and this has come up a few times, is to just periodically rebroadcast the wantlist. The wantlist itself (as opposed to the blocks) is not a lot of data, so sending it periodically can't be that expensive, and it will give the receiver an opportunity to keep its wantlist in sync with the requestor. I'll go ahead and implement this.
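
For illustration, the rebroadcast could be a simple ticker loop in the want manager; a minimal sketch with hypothetical names (the actual change landed in #106):

```go
package bitswapsketch

import (
	"context"
	"time"
)

// rebroadcastWantlist periodically resends the current wantlist so that
// peers which lost it (dropped connection, restart, one of the races
// above) re-learn our wants.
func rebroadcastWantlist(ctx context.Context, interval time.Duration,
	currentWants func() []string, broadcast func(cids []string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if wants := currentWants(); len(wants) > 0 {
				broadcast(wants)
			}
		}
	}
}
```

Called with a thirty-second interval, this matches the failsafe described in the commit below.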

hannahhoward added a commit that referenced this issue Apr 4, 2019
Provide a failsafe to losing wants on other end by rebroadcasting a wantlist every thirty seconds

fix #99, fix #65
@ghost ghost assigned hannahhoward Apr 4, 2019
@ghost ghost added the status/in-progress In progress label Apr 4, 2019
@ghost ghost removed the status/in-progress In progress label Apr 10, 2019
@Stebalien (Member)

Reopening as this is still a valid issue, just less of an issue.

@Stebalien (Member)

I've forked this into ipfs/boxo#97 and ipfs/boxo#96.
