This repository has been archived by the owner on May 26, 2022. It is now read-only.

enable TCP keepalives #73

Merged
marten-seemann merged 2 commits into master from keep-alives on Feb 24, 2021

Conversation

marten-seemann
Contributor

Since this shouldn't be the responsibility of the stream muxer any more, see libp2p/go-yamux#44.
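For context, a minimal sketch of what enabling keepalives at the TCP transport level looks like with Go's standard library; the helper name and the use of the stock `log` package are illustrative, not this PR's actual diff:

```go
package main

import (
	"log"
	"net"
)

// enableKeepAlive turns on TCP keepalive probes for a dialed or accepted
// connection. It is a no-op for connections that are not *net.TCPConn.
func enableKeepAlive(c net.Conn) {
	tcpConn, ok := c.(*net.TCPConn)
	if !ok {
		return
	}
	if err := tcpConn.SetKeepAlive(true); err != nil {
		log.Printf("failed to enable TCP keepalive: %s", err)
	}
}

func main() {
	conn, err := net.Dial("tcp", "example.com:80")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	enableKeepAlive(conn)
}
```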

@willscott
Contributor

Is it worth exposing a way to tune the keep-alive interval?

@marten-seemann
Contributor Author

Ideally, this should just work. In practice, I expect this to be platform-dependent (and dependent on your NAT, of course). We don't really have a good way to measure the effects of different values at this point.

@vyzo left a comment

We definitely need a way to set the keepalive interval.
Many residential NATs reset the stream faster than the default 75s interval.
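For reference, a sketch of how the interval could be shortened with the standard library, assuming access to the underlying *net.TCPConn; the package name is made up for the example:

```go
package tcpkeepalive

import (
	"net"
	"time"
)

// setKeepAlivePeriod enables keepalives and shortens the probe timing.
// On Linux, SetKeepAlivePeriod adjusts both the idle time before the first
// probe and the interval between subsequent probes.
func setKeepAlivePeriod(c *net.TCPConn, period time.Duration) error {
	if err := c.SetKeepAlive(true); err != nil {
		return err
	}
	return c.SetKeepAlivePeriod(period)
}
```

Calling this with something like 15*time.Second would keep probes well inside typical residential NAT mapping timeouts.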

@marten-seemann
Contributor Author

marten-seemann commented Feb 16, 2021

We definitely need a way to set the keepalive interval.
Many residential NATs reset the stream faster than the default 75s interval.

@vyzo How do you suggest we do that? Should we use a shorter value here, or do we want to expose an API for that? Would it be a good idea to only enable keepalives if we're behind a NAT?

@vyzo

vyzo commented Feb 16, 2021

Not clear how best to do it.
I think the best approach is to always enable it, and when we detect we are behind a NAT, set the keepalive interval to something shorter (say 15s or so).
Not the easiest thing to implement, but I think it's the right solution.
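A sketch of that suggestion, with made-up constants and a behindNAT flag standing in for whatever NAT detection (e.g. autonat via the event bus) would feed it; none of these names come from this PR:

```go
package tcpkeepalive

import (
	"net"
	"time"
)

// Illustrative values only; neither constant is defined in this PR.
const (
	defaultKeepAlivePeriod = 75 * time.Second // roughly the common OS default interval
	natKeepAlivePeriod     = 15 * time.Second // short enough for aggressive residential NATs
)

// configureKeepAlive always enables keepalives and shortens the period
// once the node believes it is behind a NAT.
func configureKeepAlive(c *net.TCPConn, behindNAT bool) error {
	if err := c.SetKeepAlive(true); err != nil {
		return err
	}
	period := defaultKeepAlivePeriod
	if behindNAT {
		period = natKeepAlivePeriod
	}
	return c.SetKeepAlivePeriod(period)
}
```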

@marten-seemann
Contributor Author

I'm not really sure how to implement this without violating our layering here.
We return a transport.CapableConn when dialing and listening, and I don't think adding a SetKeepAlivePeriod method is the right way to go. Any ideas?
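One possible shape, purely as a sketch: keep transport.CapableConn unchanged and let interested callers probe for a narrow optional interface. The interface and helper below are hypothetical, not go-libp2p API:

```go
package tcpkeepalive

import "time"

// keepAlivePeriodSetter is a hypothetical optional interface. Connections
// backed by TCP could implement it; QUIC, WebSocket, etc. simply wouldn't.
type keepAlivePeriodSetter interface {
	SetKeepAlivePeriod(d time.Duration) error
}

// maybeSetKeepAlivePeriod tunes the period only when the underlying
// connection supports it, and is a no-op otherwise.
func maybeSetKeepAlivePeriod(conn interface{}, d time.Duration) error {
	if s, ok := conn.(keepAlivePeriodSetter); ok {
		return s.SetKeepAlivePeriod(d)
	}
	return nil
}
```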

@vyzo

vyzo commented Feb 16, 2021

Yeah, it's tricky business. Maybe we can utilize the event bus?

@Stebalien thoughts?

@Stebalien
Member

@Stebalien thoughts?

15s seems really short. Do we have any data on this? I'd rather just set it to every 30s and leave it at that if it works for us.

I mean, the right way to do this is to learn, but that's hard:

  1. Start keepalives at 15s.
  2. Keep increasing the timeout until idle connections start failing.

But it's much simpler to just pick 30s and be done with it. The overhead should be almost nothing.

@vyzo

vyzo commented Feb 16, 2021

But it's much simpler to just pick 30s and be done with it. The overhead should be almost nothing.

Fair enough, let's just do that. If we get reports of connections crapping out, we can consider reducing it (or provide some interface to do so).

Re: data

That's just empirical at this point, and I have also seen it mentioned in NAT traversal papers.

Review comment from a Member on tcp.go:

```go
	log.Errorf("Can't set TCP keepalives.")
}
if err := keepAliveConn.SetKeepAlive(keepAlive); err != nil {
	log.Errorf("Failed to enable TCP keepalive: %s", err)
```

Starting off by logging an error is fine, but it may produce false positives (e.g., when the connection has already been closed?). However, I assume that this method won't fail until we actually call close ourselves.

Also, return?
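A sketch of the early return being asked for, with the helper shape and the canKeepAlive interface inferred from the snippet above rather than the merged diff (the stdlib log package stands in for the repo's logger):

```go
package tcpkeepalive

import (
	"log"
	"net"
)

type canKeepAlive interface {
	SetKeepAlive(bool) error
}

func tryKeepAlive(conn net.Conn, keepAlive bool) {
	keepAliveConn, ok := conn.(canKeepAlive)
	if !ok {
		// Without this return, keepAliveConn would be nil and the
		// SetKeepAlive call below would panic.
		log.Printf("can't set TCP keepalives")
		return
	}
	if err := keepAliveConn.SetKeepAlive(keepAlive); err != nil {
		log.Printf("failed to enable TCP keepalive: %s", err)
	}
}
```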

@marten-seemann merged commit 2b0b384 into master on Feb 24, 2021
@marten-seemann deleted the keep-alives branch on March 27, 2021
@aschmahmann mentioned this pull request on May 14, 2021