Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle duplicate node names & support UUID based proxying. #3340

Closed
wants to merge 8 commits into from

Conversation

fspmarshall
Copy link
Contributor

@fspmarshall fspmarshall commented Feb 11, 2020

This PR consists of two parts. First, attempting to ssh into an ambiguous target results in an error. Second, nodes can now be unambiguously connected to via UUID. Ex:

$ tsh ssh alice@example.com
error: ambiguous host could match multiple nodes
  
Node  Name  Node ID                              Address        Labels
----------- ------------------------------------ -------------- ---------
example.com ae79f375-633e-4a81-941e-83faf4a3c966 127.0.0.1:3022 foo=bar
example.com 17a980a9-ce32-4c95-a5c4-c4b4717851e9 127.0.0.1:4022 spam=eggs            
  
Hint: try addressing the node by unique id (ex: tctl ssh -U user@node-id)

$ tsh ssh -U alice@17a980a9-ce32-4c95-a5c4-c4b4717851e9
...

In order to implement the above, a new format for the proxy subsystem was added:

ssh -o "ForwardAgent yes" \
    -o "ProxyCommand ssh -o 'ForwardAgent yes' -p 3023 %r@example.com -s proxy:conn\(uuid=%h\)" \
    alice@17a980a9-ce32-4c95-a5c4-c4b4717851e9

Fixes #2396

tool/tsh/tsh.go Outdated Show resolved Hide resolved
tool/tsh/tsh.go Show resolved Hide resolved
@fspmarshall fspmarshall force-pushed the fspmarshall/handle-duplicate-node-names branch from 5aa69eb to dcd84d9 Compare February 11, 2020 21:52
lib/client/api.go Outdated Show resolved Hide resolved
lib/web/terminal.go Outdated Show resolved Hide resolved
lib/web/terminal.go Outdated Show resolved Hide resolved
lib/web/terminal.go Outdated Show resolved Hide resolved
lib/client/client.go Outdated Show resolved Hide resolved
for i := range servers {
// If nodeId is supplied, we must unambiguously match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking this logic where we loop over servers with all the breaks and continues is getting too unwieldy due to all the special cases we now have. What do you think about breaking it up? So something like:

var servers services.Server
if t.nodeID != "" {
   servers = getServersByUUID(...)
} else {
   servers = getServers(...)
}

// If multiple servers matched, return an error that the client
can use to present a "server-is-ambiguous" error.
if len(servers) > 1 {
   return trace.NotFound("err-server-is-ambiguous")
}

lib/utils/utils.go Outdated Show resolved Hide resolved
return fmt.Sprintf("%s:%s", p.Host, port)
}

func (p *ProxyParams) Encode() string {
Copy link
Contributor

@russjones russjones Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about it, I don't think we should try to support connecting via UUID and host:port to our current proxy subsystem.

The current system is designed around connecting via host:port which is 99% of the use-cases and is going to really be the only one that works well with OpenSSH (where URL encoding probably won't be supported) and not something we can drop support for.

All the special casing to support UUID seems to indicate we're going down the wrong path.

What do you think about this: we leave the current proxy subsystem as-is and we add another subsystem proxy-uuid. When tsh connects via UUID it just requests the proxy-uuid subsystem. Parsing the old subsystem won't change at all and parsing the UUID subsystem should be much simpler.

What do you think? Would that help simplify the codebase?

Copy link
Contributor Author

@fspmarshall fspmarshall Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure... there are pros and cons to both.

In the short term, keeping the old subsystem separate is probably the easier choice. Definitely reduces the chances of accidentally introducing a regression.

Long-term, separate subsystems feels like more maintenance and more places for tests to accidentally miss an edge case. In terms of openssh, this new format seems just as compatible. Either way, the user is manually encoding the subsystem call (see original comment).

@klizhentas any thoughts on this?

@fspmarshall fspmarshall force-pushed the fspmarshall/handle-duplicate-node-names branch from 84777c8 to 6cc2300 Compare February 13, 2020 00:29
Copy link
Contributor

@russjones russjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some largely minor comments, overall looks good, let's discuss tomorrow.

uuid string
}

func (t targetNode) String() string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// String returns either the UUID or host:port for the target host. UUID
// is given priority if provided.

if err != nil {
return nil, trace.Wrap(err)
}
// TODO(fspmarshall): Transition to using UUID here once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a similar comment to what we have for the web proxy.

// DELETE IN: 5.0
//
// This fallback will be unnecessary once all proxies support lookup
// by UUID.

}
}
if len(nodes) == 0 {
if tc.HostUUID != "" {
Copy link
Contributor

@russjones russjones Feb 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Client is attempting to connect via UUID.

if err != nil {
return trace.Wrap(err)
}
// TODO(fspmarshall): Upgrade this to use node uuid instead once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a similar comment to what we have for the web proxy.

// DELETE IN: 5.0
//
// This fallback will be unnecessary once all proxies support lookup
// by UUID.

@@ -1411,13 +1457,24 @@ func (tc *TeleportClient) ListNodes(ctx context.Context) ([]services.Server, err
return proxyClient.FindServersByLabels(ctx, tc.Namespace, tc.Labels)
}

// ListAllNodes is the same as ListNodes except that it ignores labels.
func (tc *TeleportClient) ListAllNodes(ctx context.Context) ([]services.Server, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being used? Can't see it being called by anyone?

@@ -250,6 +250,197 @@ func SplitHostPort(hostname string) (string, string, error) {
return host, port, nil
}

type ProxyParams struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struct and all members need comments.

return net.JoinHostPort(host, port)
}

func (p *ProxyParams) Encode() string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Encode takes the parameters given and creates the proxy subsystem string.
// For connections using host:port that means it returns something like
// "server:3022@default@example.com". For connections using UUID, it creates
// something like "conn(uuid=00000000-0000-0000-0000-000000000000&...)".

return EncodeConnParams(cp)
}

func (p *ProxyParams) legacyEncode() string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// legacyEncode transforms the given parameters into the proxy format of
// "host:port@namespace@clusterName".

paramNodeID = "uuid"
)

func ParseProxyParams(request string) (ProxyParams, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// ParseProxyParams parses the proxy subsystem request string. Supports
// standard as well as URL encoded proxy subsystem request schemes.

return params, nil
}

func parseLegacyProxyParams(request string) (ProxyParams, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// parseLegacyProxyParams will parse the classic proxy subsystem request.

Copy link
Contributor

@benarent benarent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment about tctl vs tsh, and comment about how much work to expand this to SCP

}
fmt.Fprintf(os.Stderr, "error: ambiguous host could match multiple nodes\n\n")
showNodes(nodes, true)
fmt.Fprintf(os.Stderr, "Hint: try addressing the node by unique id (ex: tctl ssh -U user@node-id)\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to change this from (ex: tctl ssh -U user@node-id)\n") to (ex: tsh ssh -U user@node-id)\n")

@benarent
Copy link
Contributor

benarent commented Feb 19, 2020

( also note to self to add this to CLI docs after )

@fspmarshall
Copy link
Contributor Author

Feedback Discussion

I've gathered a few rounds of feedback via various channels. I think its time to summarize the various questions/concerns/comments I've come across, my current thinking on them, and a proposal for where to go from here.

In no particular order...

Why not just start rejecting duplicate hostnames?

Because users may (and do) have multiple nodes that share the same hostname, rejecting duplicate hostnames would break existing deployments. Furthermore, rejecting duplicate hostnames poses too much of an issue when dealing with things like replication, Blue/Green deploys, rotations, etc... There are perfectly legitimate reasons to have multiple nodes with the same hostname.

Does UUID based routing need to be explicit? Why can't we make <user>@<uuid> "just work"?

If we take the approach of simply checking if the hostname parameter matches a node's unique id, we don't actually succeed in eliminating ambiguity. Take this perfectly legal example:

forrest@fspm:~/teleport$ tctl nodes ls
Nodename UUID                                 Address        Labels    
-------- ------------------------------------ -------------- --------- 
fspm     ae79f375-633e-4a81-941e-83faf4a3c966 127.0.0.1:3022           
fspm     fspm                                 127.0.0.1:4022

The proposed subsystem syntax is kinda ugly and requires escaping when invoked in a shell.

Agreed. If we go this route, a syntax which doesn't require escapes (at least in the trivial case) would be preferable. One simple change is to move from conn(<params>) to conn[<params>], though this doesn't change the fact that multiple keys introduces an ampersand which still needs escaping (e.g. conn[uuid=hello\&port=123]). Another option is to support a "pretty" format that is distinguished by subsystem prefix instead of key-value pairs (I'll cover this later).

Subsystem based proxying should be deprecated now that we support -J/JumpHost.

I'm of the opinion that the -J/JumpHost support isn't sufficient to deprecate subsystem based proxying. While using -J is definitely cleaner in the simple single-cluster case, I think the workflow starts is too limited once trusted clusters are involved.

Because clusters need to be specified by certificate when using JumpHost, the user is forced into making a difficult choice. Either never operate against multiple clusters simultaneously (very limiting when using ssh as part of your automation), or manually manage multiple teleport identity files (cumbersome, and potentially risky).

Requiring certificate-level parameters to resolve ambiguities also increases the likelihood of human error. Say Alice is working on node example.com in cluster testing. Bob asks Alice to assist him with something on node example.com in cluster prod. Alice runs tsh login prod, gives Bob a hand, and goes to lunch. Alice comes back, forgets to run tsh login testing, and proceeds to have a very bad day. There is a real benefit in having cluster be an explicit parameter: it forces users to acknowledge ambiguities and makes it easy for users to permanently resolve ambiguities.

The root issue here stems from the fact that JumpHost doesn't have great support for side-band information.

Why not add a new subsystem for this instead of overhauling the existing proxy: subsystem?

Seems reasonable. There are pros and cons to both approaches, but adding a new subsystem definitely cuts down on churn.

This does open up the possibility of side-stepping the issue of encoding by simply giving up on the idea of supporting key-value pairs. A dedicated subsystem for uuid based routing could be as simple as uuid:<uuid>[@<cluster>]. Not sure I like this though; I'd prefer new subsystem formats to support key-value pairs (even if they also support a more concise version for use in ssh config files).

Why not pass parameters via ENV variable instead of in the subsystem request string?

This is technically possible, but cumbersome to use with openssh. Version 7.8 of openssh introduced the SetEnv directive which improves the experience, but this version still isn't in wide circulation.


Revised Proposal

Leave the existing proxy: subsystem unchanged, with no support for uuid based routing or key-value style parameters.

Create a new subsystem named uuid (or maybe proxy-uuid if that seems too ambiguous) which accepts two alternative encodings, A "pretty" uuid@cluster style encoding, and a "verbose" conn[<params>] style encoding:

# pretty
ProxyCommand ssh -p 3023 %r@some-proxy -s uuid:some-uuid@some-cluster

# verbose
ProxyCommand ssh -p 3023 %r@some-proxy -s uuid:conn[uuid=some-uuid\&cluster=some-cluster]

The verbose encoding will be used by tsh, while the pretty encoding will be the recommended encoding when invoking the uuid subsystem via openssh.

For all future subsystems moving forward, ensure that they accept a key-value pair encoding.

@klizhentas
Copy link
Contributor

@benarent @fspmarshall @russjones still seems like a solution that works, but at a very high cost of adding extra subsystem, deprecating the old subsystem and not supporting -J. I would like us to keep looking for a better solution.

@klizhentas
Copy link
Contributor

Requiring certificate-level parameters to resolve ambiguities also increases the likelihood of human error. Say Alice is working on node example.com in cluster testing. Bob asks Alice to assist him with something on node example.com in cluster prod. Alice runs tsh login prod, gives Bob a hand, and goes to lunch. Alice comes back, forgets to run tsh login testing, and proceeds to have a very bad day. There is a real benefit in having cluster be an explicit parameter: it forces users to acknowledge ambiguities and makes it easy for users to permanently resolve ambiguities.

@fspmarshall this argument is also valid for any tsh login <cluster> scenario, so it does not really change the UX.

Continue to brainstorming, so still I can't understand why the following can't work:

$ tsh ssh alice@example.com
error: ambiguous host could match multiple nodes
  
Node  Name  Node ID                              Address        Labels
----------- ------------------------------------ -------------- ---------
example.com ae79f375-633e-4a81-941e-83faf4a3c966 127.0.0.1:3022 foo=bar
example.com 17a980a9-ce32-4c95-a5c4-c4b4717851e9 127.0.0.1:4022 spam=eggs            
  
Hint: try addressing the node by unique id (ex: tctl ssh user@node-id)

$ tsh ssh  alice@17a980a9-ce32-4c95-a5c4-c4b4717851e9
...

@webvictim
Copy link
Contributor

@fspmarshall this argument is also valid for any tsh login scenario, so it does not really change the UX.

Yes, and this has been on the “we should fix this” list for months. Several people requested that we improve the tsh workflow when using multiple clusters: #3089

@russjones
Copy link
Contributor

Closing this in favor of #3377.

@russjones russjones closed this Mar 5, 2020
@zmb3 zmb3 deleted the fspmarshall/handle-duplicate-node-names branch August 4, 2022 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ssh to wrong node
5 participants