
Strange Behavior With Provider Correlated With Long Running Contexts #7

Closed
bonedaddy opened this issue Jun 12, 2019 · 8 comments

@bonedaddy

Currently, when instantiating the provider a context is passed in. This context is then used internally when providing records to the network.

In my usage of this library, the context passed in during initialization is never cancelled until my custom node service stops. The issue appears to be that if the context used when providing a record to the network never cancels, the provide call never actually completes. This has led to strange behaviour when I start and stop my custom node multiple times with a different peerID each time: seemingly each time the node "starts", records that were previously being announced to the network start showing up.

Digging a little further, that isn't actually what's happening. What's really happening is that the context is only cancelled when the node service stops, which finally triggers the Provide call to finish.

Digging further still, I forked the repository to introduce some custom behaviour: the provide path initializes a temporary context with a timeout of 1 minute. With this change, records are provided to the network as expected and everything is okay.

So the root of the issue appears to be that records aren't actually provided to the network until the context passed into the Provide call is cancelled.
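
For reference, the workaround in my fork looks roughly like the sketch below. The helper name and the go-libp2p-core/routing import are illustrative rather than the exact code; the point is simply to derive a short-lived context for each provide call instead of reusing the node's long-lived one.

	package main

	import (
		"context"
		"time"

		cid "github.com/ipfs/go-cid"
		routing "github.com/libp2p/go-libp2p-core/routing"
	)

	// provideWithTimeout derives a temporary context with a one-minute
	// timeout for each provide call, so the announcement completes (or
	// errors) independently of the node's shutdown.
	func provideWithTimeout(ctx context.Context, router routing.ContentRouting, c cid.Cid) error {
		tctx, cancel := context.WithTimeout(ctx, time.Minute)
		defer cancel()
		return router.Provide(tctx, c, true)
	}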

@lanzafame

> This has led to strange behaviour when I start and stop my custom node multiple times with a different peerID each time: seemingly each time the node "starts", records that were previously being announced to the network start showing up.

I have noticed this behaviour when testing go-libp2p-dht-overlay; I ended up persisting the identity of the peers as a workaround.

@lanzafame

@postables which content router are you passing to NewProvider in your custom nodes?

@bonedaddy
Author

Just a regular *dht.IpfsDHT

These are the exact options I use when building my libp2p host and DHT. The datastore I'm using is a mutex-wrapped map datastore, along with an in-memory peerstore. Additionally, I'm using a forked version of the interface connection manager:

	opts = append(opts,
		libp2p.Identity(hostKey),
		libp2p.ListenAddrs(listenAddrs...),
		// disabled because it is racy
		// see https://github.com/libp2p/go-nat/issues/11
		//	libp2p.NATPortMap(),
		libp2p.EnableRelay(circuit.OptHop),
		libp2p.ConnectionManager(
			connmgr.NewConnManager(
				ctx,
				wg,
				logger,
				200,
				600,
				time.Minute,
			),
		),
		libp2p.DefaultMuxers,
		libp2p.DefaultTransports,
		libp2p.DefaultSecurity,
	)
	h, err := libp2p.New(ctx, opts...)
	if err != nil {
		return nil, nil, err
	}

	idht, err := dht.New(ctx, h,
		dopts.Validator(record.NamespacedValidator{
			"pk":   record.PublicKeyValidator{},
			"ipns": ipns.Validator{KeyBook: pstore},
		}),
		dopts.Datastore(dstore),
	)

@hsanjuan
Contributor

I understand that what is hanging is dht.Provide(). This calls GetClosestPeers, which returns a channel that might be hanging waiting for more peers until the context is closed (it says it gets K peers).

git blame shows @whyrusleeping's name all over this logic, written 5 years ago. I wonder if K=20 actually makes the Provide calls hang on smaller DHTs which do not reach 20 peers (I would hope not, but the logic is not super straightforward and I haven't had time to read through it).
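
To illustrate the suspicion, a simplified sketch (not the actual dht code; peerCh stands in for the channel from GetClosestPeers, and announce for sending the provider record): if the channel is only closed once K peers have been found, the only way out on a smaller DHT is the caller's context, which here is the node's long-lived one.

	package main

	import (
		"context"

		peer "github.com/libp2p/go-libp2p-core/peer"
	)

	// drainClosestPeers blocks until either the peer channel is closed
	// (K peers found) or the context is cancelled. With a long-lived
	// context and fewer than K peers, neither happens until shutdown.
	func drainClosestPeers(ctx context.Context, peerCh <-chan peer.ID, announce func(peer.ID)) {
		for {
			select {
			case p, ok := <-peerCh:
				if !ok {
					return // channel closed: got the K closest peers
				}
				announce(p)
			case <-ctx.Done():
				return // may only fire when the node shuts down
			}
		}
	}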

@michaelavila
Contributor

michaelavila commented Jun 12, 2019

So the root of the issue appears to be that records aren't actually being provided to the network, until the context that's passed into the Provide call is cancelled.

@postables cancelling the context is one way to complete the Provide() call, but it's a short-circuit intended to be used in the event that we need to kill ipfs before a Provide() has returned. I think the thing to focus on is the fact that Provide() itself does not return in a timely manner.

I suspect something along the lines of what @hsanjuan mentioned.

@michaelavila
Contributor

@postables I've added a timeout to Provide() to match what is done in go-bitswap. I'm going to do some root cause analysis today, but this should alleviate the never-returning bug you pointed out.

#8
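
If it helps with verifying the change, here is an illustrative caller-side helper (not part of this repo or the PR; the name and signature are made up for the example) that reports when a provide call fails to return within a bound, which is how the original hang shows up from the caller's side.

	package main

	import (
		"fmt"
		"time"
	)

	// waitForProvide runs the supplied provide function and returns an
	// error if it has not come back within the given bound.
	func waitForProvide(provide func() error, bound time.Duration) error {
		done := make(chan error, 1)
		go func() { done <- provide() }()
		select {
		case err := <-done:
			return err
		case <-time.After(bound):
			return fmt.Errorf("provide did not return within %s", bound)
		}
	}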

@bonedaddy
Author

bonedaddy commented Jun 13, 2019

Ok, I've done some brief testing and these are my findings:

  1. As expected, if no timeout is given, the issue still stands.
  2. With a timeout, provide submissions to the DHT work as expected.
  3. Providing here, vs providing via go-bitswap, seems to be a bit slower. I have no "hard proof" of this and am merely basing it on non-scientific observations, so take it with a grain of salt.

I'll also take a look into the dht code, since it may help to have some extra eyes on the issue.

edit 1: I'll post this on the PR as well

edit 2: to clarify what I mean by "slower", the propagation of the message that node X is providing content Y seems slower when using go-ipfs-provider than when using bitswap.

@michaelavila
Contributor

Merged.
