
K8s server and agent nodeattestor plugin #557

Merged
merged 5 commits into spiffe:master from nodeattestor
Oct 19, 2018

Conversation

enricoschiattarella
Contributor

Pull Request check list

  • [x] Commit conforms to CONTRIBUTING.md?
  • [x] Proper tests/regressions included?
  • [x] Documentation updated?

Affected functionality

Description of change

Provides a new node attestation plugin pair (server and agent) aimed at Kubernetes environments.
The agent plugin uses the Kubernetes Certificates API to submit a CSR and have it signed by the Kubernetes API Server. The API Server signs the CSR with its CA and returns the certificate to the agent.
The agent presents the certificate to the server as an identity document.
The server verifies that the agent actually possesses the corresponding private key and returns an SVID if validation succeeds.

NOTE: The change in glide.yaml (moving package github.com/stretchr/testify from the testImport to the import section) was done to work around a glide issue. Without that change, glide reports the error "Failed to generate lock file: Generating lock produced conflicting versions of github.com/stretchr/testify. import (~1.1.3), testImport (~1.1.0)".

There are a couple of open glide issues related to this:
Masterminds/glide#597
Masterminds/glide#564

Another way to work around it would be to not use glide for the k8s.io/client-go dependency, and instead fetch it with go get ... from the Makefile/build script.

to access the Kubernetes API Server and submit a Certificate Signing Request (CSR).
The credentials consist of a private key and a certificate stored on disks.
It also needs a root certificate to validate the TLS certificate presented
ny the Kubernetes API Server.
Member:

by the Kubernetes...

Contributor Author:

Fixed


In order to retrieve the identity document, the plugin needs user credentials
to access the Kubernetes API Server and submit a Certificate Signing Request (CSR).
The credentials consist of a private key and a certificate stored on disks.
Member:

stored on disk.

Contributor Author:

Fixed

func (p *K8sPlugin) setConfig(c *K8sConfig) {
p.m.Lock()
defer p.m.Unlock()
p.c = c
Member:

this should reset p.kubeClient so that it can be recreated with a potentially changed config

Contributor Author:

done

return nil, errors.New("k8s: not configured")
}

if p.kubeClient == nil {
Member:
@azdagron azdagron Aug 9, 2018

this is racy in the face of more than one attestation happening at a time. i would suggest moving the client initialization into the Configure call. that would do two things: 1) it fails the configure call if bad paths are provided, and 2) it allows you to provide the client as part of the configuration, which is retrieved under a lock (i.e. getConfig)
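A minimal sketch of the restructuring being suggested, with a stand-in `newKubeClient` in place of the real client-go constructor (names here are illustrative, not the PR's actual code):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// kubeClient and newKubeClient are stand-ins for the real client-go client;
// the point is the locking pattern, not the client itself.
type kubeClient struct{ configPath string }

func newKubeClient(configPath string) (*kubeClient, error) {
	if configPath == "" {
		return nil, errors.New("no config path")
	}
	return &kubeClient{configPath: configPath}, nil
}

type config struct {
	kubeConfigPath string
	client         *kubeClient
}

type plugin struct {
	mu sync.RWMutex
	c  *config
}

// Configure builds the client up front, so a bad path fails the call and the
// attestation path never races on lazy initialization.
func (p *plugin) Configure(kubeConfigPath string) error {
	client, err := newKubeClient(kubeConfigPath)
	if err != nil {
		return err
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	p.c = &config{kubeConfigPath: kubeConfigPath, client: client}
	return nil
}

// getConfig returns the config (and its client) under a read lock.
func (p *plugin) getConfig() (*config, error) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	if p.c == nil {
		return nil, errors.New("k8s: not configured")
	}
	return p.c, nil
}

func main() {
	p := &plugin{}
	if err := p.Configure("/etc/kubeconfig"); err != nil {
		panic(err)
	}
	c, _ := p.getConfig()
	fmt.Println(c.client != nil) // true
}
```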

Contributor Author:

done

Member:
@evan2645 evan2645 left a comment

Thanks for this work @enricoschiattarella!! Left a few comments, curious to hear your thoughts on them

and is in the form:

```
spiffe://<trust domain>/spire/agent/k8s/system:node:<host name>
```
Member:

The : character is defined as reserved in RFC 3986. SPIFFE IDs must conform with this RFC. It states that if reserved characters are used outside of their special meaning(s), then they must be percent-encoded.

I recommend simply avoiding the use of : altogether

Contributor Author:

Right. I liked the idea of carrying over the Kubernetes RBAC encoding opaquely, but I 100% agree that this is looking for trouble. If it looks OK, I will change it to "/spire/agent/k8s/system/node/hostname".
The reason to keep the "system/node" part is that if we ever use Kubernetes to attest anything else via the certs API, there will be no conflicts.

| `k8s_private_key_path` | The path to the private key on disk (PEM encoded PKCS1 or PKCS8) used to authenticate the agent to the Kubernetes API Server| |
| `k8s_certificate_path` | The path to the certificate bundle on disk. Used to authenticate the agent to the Kubernetes API Server | |
| `k8s_ca_certificate_path` | The root certificate used to validate the certificate presented by the Kubernetes API Server | |
| `kubeconfig_path` | Optional. The path to the kubeconfig file containing Kubernetes cluster access information | |
Member:

Why is this optional, and what do I get if I specify it?

Contributor Author:

There is a "standard" sequence for determining the location of kubeconfig if it is not provided explicitly:
first try the KUBECONFIG environment variable, then the home directory, then return an error.
It is the sequence followed by kubectl.
I will add an explanation.

func getAgentName() string {
name, err := os.Hostname()
if err != nil {
name = "unknown"
Member:

There is an assumption in SPIRE that all agents can be uniquely addressed... as such, I think it would be best to raise an error here instead of setting "unknown", since it seems that this name is ultimately used in the agent's SPIFFE ID? This also brings up questions about hosts that might have the same hostname. Curious to hear your thoughts on this in the context of a Kubernetes cluster.

More info here #558

Contributor Author:

In a Kubernetes environment we can rely on each node (and pod) having a unique IP address because essentially NAT is prohibited, so if the trust domain coincides with the Kubernetes cluster, we could use the IP address.

https://kubernetes.io/docs/concepts/cluster-administration/networking/

If the trust domain includes multiple Kubernetes clusters, then I think the best we can do is to include a fingerprint of the CA certificate associated with the Kubernetes cluster.
Something like "spire/agent/k8s/cluster/XXXX/system:node:".

As a last resort, we can completely forget about the hostname and use a fingerprint of the cert, same as the x509pop attestation does.

I'm not sure what the current thinking is on the question you referred to.
Agent naming uniqueness is a nice property...
Human-readable SVIDs are nice too...

I will be happy to implement whatever you guys believe is the best overall solution.

Member:

We discussed this topic on the SIG-Kubernetes call today. We concluded that it is probably OK for the resulting ID to be less human friendly than it could be. It is likely that a SPIRE server will manage multiple kubernetes clusters, so including the thumbprint of the CA seems like a good idea. The proposed path including thumbprint and node name is 👍.

Member:

Server side node attestors can optionally kick back selectors which can be used to group agents together. In this case, it would be awesome if we provide a selector for CA thumbprint k8s:ca_thumbprint:XXXXX. Doing so would allow users to create SPIFFE IDs which address entire clusters, allowing them to register workloads to run in one cluster or another.
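The thumbprint both comments refer to could be computed as a hash over the cluster CA certificate's raw DER bytes, usable in the agent SPIFFE ID path and as a `ca_thumbprint` selector value. SHA-256 is an assumption in this sketch; the actual plugin may use a different hash:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// caThumbprint returns a hex-encoded SHA-256 digest of the CA certificate's
// DER encoding (e.g. x509.Certificate.Raw in real code).
func caThumbprint(caDER []byte) string {
	sum := sha256.Sum256(caDER)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Placeholder bytes stand in for actual certificate DER.
	fmt.Println(caThumbprint([]byte("test")))
}
```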

// Still no luck, try default (home)
home := os.Getenv("HOME")
if home != "" {
kubeConfigFilePath = path.Join(home, ".kube", "config")
Member:

It would be good if we can document this search behavior

Contributor Author:

Will do

return nil, fmt.Errorf("Error creating private key: %v", err)
}

certsIntf := kubeClient.CertificatesV1beta1().CertificateSigningRequests()
Member:

I noticed the beta1 suffix here - how stable is this API? Do we expect it to be the same a year or more from now?

Contributor Author:

The feature that depends on this API is going GA in the next release, so in the long term the mechanics of node attestation should be stable.
The API itself will not graduate yet, though, so minor changes are possible.
This is the issue for tracking and discussion:

kubernetes/enhancements#43 (comment)

}

certsIntf := kubeClient.CertificatesV1beta1().CertificateSigningRequests()
cert, err := csr.RequestNodeCertificate(certsIntf, key, types.NodeName(getAgentName()))
Member:

Does the CSR signing process in kubernetes do anything to assert the correct value in NodeName? In other words, could I get kubernetes to sign a CSR with the name of a different node here?

Contributor Author:

Right now it does not. It just checks that CN conforms to the expected format and key usage is what is expected. There are plans to strengthen this and support proof-of-ownership, TPMs, etc.

Options for improving over the baseline are:

  • Disable automatic CSR approval, rely on Kubernetes admin to verify the content of the CSR manually before approving it
  • Create a custom CSR signer that will perform extra checks/challenges
  • K8s attestor in SPIRE server checks the IP address of the agent connection and compares it against what is in the Kubernetes certificate

I think all options are doable but IMO, unless this is a showstopper, the best course of action is to live with the limitation for the time being and leverage the improvements of the Kubernetes node attestation mechanism as they become available.

Member:

We discussed this topic on the SIG-Kubernetes call today. The biggest drawback here is that we are essentially asserting the ability of the caller to get a CSR approved, and not asserting anything about the actual identity of the caller. In certain situations, this can result in a non-agent workload being able to obtain the identities of all other workloads in the cluster.

To mitigate this, we should try to assert that the caller is in fact an agent. We discussed using a service account token to do this, configuring the server node attestor with the namespace and service account that the agent runs as. This token can either 1) be used as credentials to submit the CSR, and verified by the server, or 2) passed directly to the server.

Taking this approach gives stronger assurance that the caller we are attesting is an agent, however it still allows for a dishonest agent (or anything that can run as the named service account) to claim that it is a different node/agent. To mitigate this, we could use the kubelet identity to submit the CSR rather than the service account. The server attestor can then verify that the kubelet identity which submitted the CSR matches the SPIFFE ID we are about to issue. IMO, this is the ideal approach, and provides the strongest level of attestation of all the options. @enricoschiattarella has mentioned that obtaining the kubelet identity may be non-trivial, but perhaps it warrants some further research.

To avoid the non-agents-could-obtain-arbitrary-identities problem, we should either add the service account token logic, or use the kubelet identity directly for CSR submission, with a preference for the latter. If we choose the service account path, it probably warrants a direct call-out in the plugin documentation (perhaps in the form of a Security Considerations section?). SPIRE is known for its ability to strongly attest and issue identity, but some sacrifices are made in the SA token approach (namely, that we are not actually asserting the identity of a node; rather, we are asserting that the caller lives in a particular service account).

Some related notes:

  • Using SA token for CSR submission instead of sending it directly to the server avoids the need to pass a SA token to a system "outside" of the kubernetes control plane
  • @mattmoyer informed us that the identity used to submit the CSR is included in the signed certificate - this could be directly extracted by the server attestor for validation, as opposed to having to hit the api server to interrogate the identity of the submitter
  • Node identity may be embedded in SA tokens in the future. It is unclear whether the node identity will 1) be included in resulting signed certificates, or 2) be discoverable through interrogation of the API server (looking at the CSR approval flow). If neither 1 nor 2 holds, we would end up having to pass the token directly to SPIRE anyway.

func (p *K8sPlugin) loadConfigData() (*configData, error) {
config := p.getConfig()
if config == nil {
return nil, errors.New("k8s: not configured")
Member:

nit: might be a good idea to be more specific with these errors, like k8s node attestor: not configured or something

Contributor Author:

will do

return newError("not configured")
}

if dataType := req.AttestationData.Type; dataType != pluginName {
Member:

I think we need to ensure that req.AttestationData is not nil?

Contributor Author:

will add the check
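The nil guard being requested might look like this sketch, with stand-in request types in place of the real SPIRE plugin protos:

```go
package main

import (
	"errors"
	"fmt"
)

// attestationData and attestRequest are minimal stand-ins for the real
// SPIRE plugin proto messages; the point is guarding against a nil
// AttestationData before dereferencing its fields.
type attestationData struct {
	Type string
	Data []byte
}

type attestRequest struct {
	AttestationData *attestationData
}

func checkRequest(req *attestRequest, pluginName string) error {
	if req == nil || req.AttestationData == nil {
		return errors.New("k8s node attestor: missing attestation data")
	}
	if req.AttestationData.Type != pluginName {
		return fmt.Errorf("k8s node attestor: unexpected attestation data type %q", req.AttestationData.Type)
	}
	return nil
}

func main() {
	// A request with no attestation data is rejected instead of panicking.
	fmt.Println(checkRequest(&attestRequest{}, "k8s") != nil) // true
}
```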

}

attestationData := new(x509pop.AttestationData)
if err := json.Unmarshal(req.AttestationData.Data, attestationData); err != nil {
Member:

Same comment here - what if req.AttestationData is nil?

Contributor Author:

will fix

}

// receive and validate the challenge response
responseReq, err := stream.Recv()
Member:

Hmmm... I am curious if this can be abused by a client that never sends the challenge response... or triggered by clients which crash before being able to do so? Perhaps this Recv should be guarded with a select block on context cancellation?

@azdagron does the streaming shim have a default timeout on the context here? Is this a problem elsewhere?

Member:

The Recv() call should return with an error in the case of context cancellation. You bring up a good point though. We should probably define timeouts for all plugin related calls. I don't think there is anything specific to do in the plugin code.

@enricoschiattarella enricoschiattarella force-pushed the nodeattestor branch 2 times, most recently from ecc354e to 8b9b152 (August 13, 2018, 23:12)
@enricoschiattarella
Contributor Author:

Hi @azdagron and @evan2645, thanks for reviewing.

I have uploaded a new version of the code that incorporates your comments, except the one regarding uniqueness of agent names.

PTAL. We can continue here or discuss during the next K8s SIG call (Friday 8/17 10AM).

@azdagron
Member:

azdagron commented Oct 4, 2018

After discussion, we came to consensus that for now using the service account token was good enough and likely to get better if more identity components end up included in the service account token (like node name, for example). I've rebased this on top of current master and pushed a commit that transitions the code to use the service account token instead.

The `k8s_sat` plugin attests nodes running in inside of Kubernetes. The server
validates the signed service account token provided by the agent. It extracts
the service account name and namespace from the token claims. The server uses a
one-time UUID provided by the agent for generate a SPIFFE ID with the form:
Member:

"by the agent to generate"

Member:

👍

doc/plugin_agent_nodeattestor_k8s_sat.md (outdated)
The SPIFFE ID with the form:

```
spiffe://<trust domain>/spire/agent/k8s_sat/<UUID>
```
Member:

should cluster name be included here?

Member:

Will do.


In addition, this plugin generates the following selectors of type `k8s_sat` for each node:

| Value | Example | Description |
Member:

The notion that selectors are typed is something that is not necessarily exposed to users. For instance, the registration CLI takes -selector k8s_sat:cluster-name:MyCluster. In the interest of simplicity, I'd lean towards s/Value/Selector/ and include the k8s_sat type prefix

Member:

👍

doc/plugin_agent_nodeattestor_k8s_sat.md

token, err := loadTokenFromFile(config.tokenPath)
if err != nil {
return satError.New("unable to get token value: %v", err)
Member:

nit: perhaps "unable to read token from %v: %v", config.tokenPath, err or something similar will provide more information?

Member:

👍


func verifyTokenSignature(clusters []*clusterConfig, token *jwt.JSONWebToken) (cluster *clusterConfig, claims *k8s.SATClaims, err error) {
var lastErr error
for _, cluster := range clusters {
Member:

if we make users configure the cluster name on the agent attestor, maybe we could avoid this bit

Member:

👍 good call

p.config = config
}

func verifyTokenSignature(clusters []*clusterConfig, token *jwt.JSONWebToken) (cluster *clusterConfig, claims *k8s.SATClaims, err error) {
Member:

Do we need to check that alg is correctly set somewhere in here?

Member:

verifiers in square's go-jose library make sure that the algorithm in the token lines up with the verification key that is passed. we're good here.

Selectors: []*common.Selector{
makeSelector("cluster:name", cluster.name),
makeSelector("service-account:namespace", claims.Namespace),
makeSelector("service-account:name", claims.ServiceAccountName),
Member:

any chance we could simplify this by dropping the suffixes?

k8s_sat:cluster:foo
k8s_sat:service-account:bar
k8s_sat:namespace:baz

Member:

👍

BaseSPIFFEID: k8s.AgentID(config.trustDomain, attestationData.UUID),
Selectors: []*common.Selector{
makeSelector("cluster:name", cluster.name),
makeSelector("service-account:namespace", claims.Namespace),
Member:

the plugin name (and hence beginning of all selectors) is k8s_sat which makes me lean towards snake case for service account too

k8s_sat:service_account

Member:

👍

Signed-off-by: Enrico Schiattarella <enrico.schiattarella@gmail.com>
Signed-off-by: Andrew Harding <azdagron@gmail.com>
Signed-off-by: Andrew Harding <azdagron@gmail.com>
Signed-off-by: Andrew Harding <azdagron@gmail.com>
Member:
@evan2645 evan2645 left a comment

🚴 🌾

@azdagron azdagron merged commit b47a0ba into spiffe:master Oct 19, 2018