Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNS resolution should be moved to the transports #1597

Closed
Tracked by #1630 ...
marten-seemann opened this issue Jun 11, 2022 · 7 comments · Fixed by #1719
Closed
Tracked by #1630 ...

DNS resolution should be moved to the transports #1597

marten-seemann opened this issue Jun 11, 2022 · 7 comments · Fixed by #1719
Assignees
Labels
effort/hours Estimated to take one or several hours exp/novice Someone with a little familiarity can pick up kind/maintenance Work required to avoid breaking changes or harm to project's status quo

Comments

@marten-seemann
Copy link
Contributor

We currently resolve multiaddrs in the basichost:

func (h *BasicHost) resolveAddrs(ctx context.Context, pi peer.AddrInfo) ([]ma.Multiaddr, error) {

This leads to problems with transports that, for example, validate TLS certificates, as they're kept ignorant about the domain name, and only learn the resolved address. This applies to WebSockets (thanks to @aschmahmann for bringing it up in #1592) and will apply to WebTransport as well (although we will mostly use certificate hashes there).

Architecturally, address resolution should probably live within the transport. This has a few advantages:

  • it gives full control what to do with a multiaddr to that transport
  • it makes address resolution part of the dial timeout
  • it speeds up things, since we don't need to resolve all the peer's addresses before we start dialing
  • it parallelises address resolution (as dials are run concurrently)
  • it moves logic out of the basic host

However, there are caching implications. Currently, we implement some rudimentary caching for a const 2 minutes, not based on the TTL of the DNS record:

h.Peerstore().AddAddrs(pi.ID, resolved, peerstore.TempAddrTTL)

Arguably, caching should live inside the DNS resolver, so one could argue that this is safe to remove.

@Stebalien @vyzo @aschmahmann @MarcoPolo Thoughts?

@marten-seemann marten-seemann added exp/novice Someone with a little familiarity can pick up P1 High: Likely tackled by core team if no one steps up kind/maintenance Work required to avoid breaking changes or harm to project's status quo effort/hours Estimated to take one or several hours and removed P1 High: Likely tackled by core team if no one steps up labels Jun 11, 2022
@vyzo
Copy link
Contributor

vyzo commented Jun 11, 2022

I am not convinced its the transport's responsibility to do dns resolution; and we can pass the domain in a contrext option for websockets.

For caching, we should probably rely on the resolver, might make sense to make a caching layer for the vanilla/OS resolver as the DoH resolver already does it.

@aschmahmann
Copy link
Collaborator

This leads to problems with transports that, for example, validate TLS certificates, as they're kept ignorant about the domain name, and only learn the resolved address

IIUC this isn't technically what happens instead we get both /dns4/foo.tld/tcp/443/wss/p2p/QmFoo and /ip4/1.2.3.4/tcp/443/wss/p2p/QmFoo in the peerstore and then are stuck with the decision of dialing both (whether sequentially or in parallel) or choosing one.

At the moment WSS chooses the worst option by only choosing the IP one due to the filtering here

var dialMatcher = mafmt.And(mafmt.IP, mafmt.Base(ma.P_TCP), mafmt.Or(mafmt.Base(ma.P_WS), mafmt.Base(ma.P_WSS)))

I am not convinced its the transport's responsibility to do dns resolution; and we can pass the domain in a contrext option for websockets.

IIUC the issue above is not so much that we're missing the DNS ID when dialing websockets its that we've effectively thrown away information by resolving DNS -> IP and keeping them as two equally valid addresses, when in reality one of them derives from the other. This means we're stuck trying to figure out which of /dns4/foo.tld/tcp/443/wss/p2p/QmFoo and /ip4/1.2.3.4/tcp/443/wss/p2p/QmFoo is the "real" address.

A couple ways I can see approaching this:

  1. Only keep around top-level addresses (i.e. only resolve /p2p/QmFoo but no other address pieces) in the set of addresses to dial coming from the peerstore. Then let transports do resolution or ask the host to do resolution if needed.
  2. Keep the top-level addresses in the peerstore as the set of peers to dial, but also pass around a list of "derived addresses" that the transport can also choose to make use of

I suspect 1 is easier to do, but either seem plausible.

@MarcoPolo
Copy link
Collaborator

It seems to me that a transport should encapsulate all the logic around (at least a part of) the multiaddr. It seems weird that the host knows how to resolve a dns name to an ip address, shouldn't this be fully the transport's responsibility?

I think I agree with @marten-seemann here. As for caching, isn't the OS doing this already? Have we seen a need to handle this caching ourselves?

@aschmahmann

Only keep around top-level addresses (i.e. only resolve /p2p/QmFoo but no other address pieces) in the set of addresses to dial coming from the peerstore. Then let transports do resolution or ask the host to do resolution if needed.

I'm not sure I follow, do you mean only keep the peer id in the peer store? Or only keep the unresolved /dns4/... address?

@MarcoPolo
Copy link
Collaborator

As for caching, isn't the OS doing this already? Have we seen a need to handle this caching ourselves?

Ah we support custom dns resolvers. Okay I agree the cache should live with these resolvers.

@aschmahmann
Copy link
Collaborator

I'm not sure I follow, do you mean only keep the peer id in the peer store? Or only keep the unresolved /dns4/... address?

I meant keeping the addresses that libp2p learns about either from the caller calling something like host.Connect() or learned via some other protocol like identify. So in this case the /dns4/....

The comment around the peer ID was just to say that if the caller does something like host.Connect on /p2p/PeerID which causes some resolver (e.g. the IPFS Public DHT) to do a lookup and learn about /dns4/... we should count that one too and not just say the /p2p/PeerID was the top level and /dns4/... was a resolution of that address.

@marten-seemann marten-seemann mentioned this issue Jun 18, 2022
41 tasks
@marten-seemann
Copy link
Contributor Author

To clarify, this proposal only applies to /dnsX addresses. The host still needs to resolve /dnsaddrs.
Ideally, we find a way to that concurrently with other dials.

@marten-seemann
Copy link
Contributor Author

Just started to play around with this, and ran into the following issue: madns.Resolver.Resolve returns a []ma.Multiaddr. This is correct, a single DNS name can resolve to multiple IP addresses.

This however prevents us from just doing a call to Resolve in transport.Dial. Dial is expected to dial a single connection, and shouldn't have to deal with concurrent dials (that logic belongs in the swarm).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
effort/hours Estimated to take one or several hours exp/novice Someone with a little familiarity can pick up kind/maintenance Work required to avoid breaking changes or harm to project's status quo
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants