
Add support for cross account mount #470

Merged

wongma7 merged 1 commit into kubernetes-sigs:master on Jun 8, 2021

Conversation


@kbasv kbasv commented Jun 3, 2021

Is this a bug fix or adding new feature?

New Feature

What is this PR about? / Why do we need it?

Support for cross-account mount.

A new mount option, mounttargetip, was introduced in the efs-utils mount helper. This option allows users to pass the IP address of the desired mount target of the EFS file system. It also enables cross-account mounts, provided VPC peering has been established.

This PR adds cross-account support using the above mount option.

Pre-requisite steps to perform cross account mounts with EKS cluster in Account A & EFS in Account B:

  1. Set up VPC peering between the EKS cluster VPC and the EFS VPC.
  2. Create an IAM role in account B that has a trust relationship with account A, and attach an EFS policy with permission to call DescribeMountTargets. The CSI driver will use this role to determine the mount targets for the file system in account B.
  3. Create a kubernetes secret with awsRoleArn as the key and the role from step 2 as the value. For example: kubectl create secret generic x-account --namespace=default --from-literal=awsRoleArn='arn:aws:iam::1234567890:role/EFSCrossAccountAccessRole'
  4. Create a service account with an IAM role for the EKS cluster and attach it to the node daemonset. Attach an EFS client mount permission policy to this IAM role.
  5. Create a file system policy for the file system in account B that allows account A to perform a mount on it.
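As an illustration of step 5, a file system policy allowing another account to mount might look like the sketch below (the account ID and statement ID are placeholders; adjust the actions to your needs):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountMount",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": [
        "elasticfilesystem:ClientMount",
        "elasticfilesystem:ClientWrite"
      ]
    }
  ]
}
```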

Additionally, a new storage class parameter, az, allows users to specify a preferred mount target AZ. If the preferred AZ is unavailable, a random AZ will be chosen.

Sample Storage class:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
mountOptions:
  - iam
  - az=us-east-1f # optional, allows cross AZ mount
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-01234abcd
  directoryPerms: "700"
  gidRangeStart: "1000" # optional
  gidRangeEnd: "2000" # optional
  basePath: "/dynamic_provisioning" # optional
  az: us-east-1f # optional, used by controller to provide a preferred mount-target AZ
  csi.storage.k8s.io/provisioner-secret-name: x-account # optional, required for cross-account mount. Should be unique per storage class
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system # optional, required for cross-account mount

What testing is done?

Tested by spinning up EKS cluster in account A and performing cross account mount on file systems in different accounts.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 3, 2021
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 3, 2021
Comment on lines 323 to 344

func getAvailableMountTargets(mountTargets []*efs.MountTargetDescription) []*efs.MountTargetDescription {
	availableMountTargets := []*efs.MountTargetDescription{}
	for _, mt := range mountTargets {
		if *mt.LifeCycleState == "available" {
			availableMountTargets = append(availableMountTargets, mt)
		}
	}

	return availableMountTargets
}

func getMountTargetForAz(mountTargets []*efs.MountTargetDescription, azName string) *efs.MountTargetDescription {
	for _, mt := range mountTargets {
		if *mt.AvailabilityZoneName == azName {
			return mt
		}
	}
	klog.Infof("There is no mount target match %v. Will pick a random mount target.", azName)
	return nil
}
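For illustration, the two helpers quoted above compose roughly as follows. This is a simplified, hypothetical sketch using plain structs instead of the AWS SDK types; pickMountTarget is not a function in the PR, just a condensed view of the selection logic (filter to "available" targets, prefer the requested AZ, otherwise fall back to another available target):

```go
package main

import "fmt"

// mountTarget is a stand-in for efs.MountTargetDescription, with the
// pointer fields flattened to plain strings for brevity.
type mountTarget struct {
	LifeCycleState       string
	AvailabilityZoneName string
	IPAddress            string
}

// pickMountTarget keeps only targets in the "available" lifecycle state,
// returns the one matching azName if present, and otherwise falls back
// to the first available target. It returns nil when nothing is available.
func pickMountTarget(targets []mountTarget, azName string) *mountTarget {
	available := []mountTarget{}
	for _, mt := range targets {
		if mt.LifeCycleState == "available" {
			available = append(available, mt)
		}
	}
	if len(available) == 0 {
		return nil
	}
	for i := range available {
		if available[i].AvailabilityZoneName == azName {
			return &available[i]
		}
	}
	// No target in the preferred AZ; fall back to another available one.
	return &available[0]
}

func main() {
	targets := []mountTarget{
		{"creating", "us-east-1a", "10.0.0.1"},
		{"available", "us-east-1b", "10.0.0.2"},
		{"available", "us-east-1f", "10.0.0.3"},
	}
	fmt.Println(pickMountTarget(targets, "us-east-1f").IPAddress) // 10.0.0.3
	fmt.Println(pickMountTarget(targets, "us-east-1c").IPAddress) // 10.0.0.2 (fallback)
}
```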


Any idea how we can avoid to implement the same logic on both efs-utils side and csi-driver side?

Author

One possible way is for efs-utils to accept role as a mount option.


Compared with this approach, I guess adding a role option should be preferred: it would reduce the duplicated logic on the driver side, and on the efs-utils side we already provide output and log messages (e.g. the IP cannot be connected yet, no available mount target in the AZ) that we could reuse here.

So is this a temp workaround? Should we deprecate the duplicates once the role arn option can be used?

Author

While the role mount option is preferred to reduce the duplicated logic, I'm not sure the csi-driver can capture all the messages from efs-utils. The CSI driver only captures the message(s) shown on the console after a mount attempt, not the messages logged in mount.log. I feel it is an overhead to reach into mount.log, and would rather throw the messages in the driver logs.

What do you think?


The mount attempt will output the message to the console, e.g. https://github.com/aws/efs-utils/blob/master/src/mount_efs/__init__.py#L1789-L1790, where fatal_error is used to throw back and output the error message.

It will be hard if we have to change the common logic in two places in the future; whenever we add logic, we need to implement the same logic in the driver, which is not easy to maintain, and the csi-driver should mainly be responsible for managing mounts.

So IMO for this feature we can use the logic here; once we add the role feature, we should explore optimizing the code path and reducing the common logic as much as possible, letting efs-utils handle most of the mount work and logic.

pkg/driver/controller.go
azName = value
}

// Fetch aws role ARN for cross account mount from secrets
@Cappuccinuo Cappuccinuo Jun 4, 2021

I feel like this part needs more comments: how/where is the cross-account mount credential retrieved, and what are the secrets?

Author

I added information on how to set up the secrets in the description of the PR.

You'd need to pass the secret into the storage class parameters; the external-provisioner will pull the information and add it to the CreateVolume request.

Author

I plan to add documentation to the README on how it works.

Contributor

Yeah, we need user-facing docs, basically what you have already written in the PR description. For a dev-facing code comment, include a link to https://kubernetes-csi.github.io/docs/secrets-and-credentials.html#csi-operation-secrets

if roleArn != "" {
localCloud, err = cloud.NewCloudWithRole(roleArn)
if err != nil {
return nil, status.Errorf(codes.Unauthenticated, "Unable to initialize aws cloud: %v. Please verify role has the correct AWS permissions for cross account mount", err)


Where should the roleArn be configured? in .config file?

Author

The roleArn should be provided via a secret.

@Cappuccinuo Cappuccinuo Jun 7, 2021

Is the secret some specific configuration for the kubernetes storage class?

Author

A Secret is a kubernetes object that lets you store sensitive information; in this case, we rely on users to configure the roleArn through a secret.

K8s allows secrets to be accessed by a pod in multiple ways. In our context, if a user provides the secret name & namespace in the storage class, the external-provisioner will extract the secret and pass it on to the csi-driver's Controller service RPC, provided it has the right RBAC rule to access the secret.

More info on how secrets work for CSI operations: https://kubernetes-csi.github.io/docs/secrets-and-credentials.html#csi-driver-secrets
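As a rough sketch of that flow (roleArnFromSecrets is hypothetical, not the driver's actual code): the external-provisioner resolves the secret referenced by the provisioner-secret parameters and attaches its key/value pairs to the CreateVolume request, and the controller can then look up the awsRoleArn key matching the secret created in the PR description:

```go
package main

import "fmt"

// roleArnFromSecrets reads the cross-account role ARN out of the secrets
// map that external-provisioner attaches to the CreateVolume request.
// An empty result means no role was configured (same-account mount).
func roleArnFromSecrets(secrets map[string]string) string {
	if arn, ok := secrets["awsRoleArn"]; ok {
		return arn
	}
	return ""
}

func main() {
	secrets := map[string]string{
		"awsRoleArn": "arn:aws:iam::1234567890:role/EFSCrossAccountAccessRole",
	}
	fmt.Println(roleArnFromSecrets(secrets)) // arn:aws:iam::1234567890:role/EFSCrossAccountAccessRole
	fmt.Println(roleArnFromSecrets(nil))     // empty: fall back to default credentials
}
```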

if err != nil {
klog.Warningf("Failed to describe mount targets for file system %v. Skip using `mounttargetip` mount option: %v", accessPointsOptions.FileSystemId, err)
} else {
volContext[MountTargetIp] = mountTarget.IPAddress


What if we don't set this mounttarget ip address and just rely on efs-utils to find the mounttarget? Will that fail?

Author

@kbasv kbasv Jun 4, 2021

Yeah, that would fail, since the mount happens on the node daemonset, which is a separate process.

Since you cannot set up config/credential files on the driver pod, the driver pod will not know how to switch credentials for different AWS accounts, and efs-utils will use the credentials from the default instance role or service account.

Contributor

If roleArn is empty, azName is ignored. Should azName be respected for cross-AZ mounts? I'm trying to make sense of every scenario's behavior and make sure we don't break the current behavior. If neither roleArn nor az is provided, then efs-utils should find the mount target.

cross-AZ + cross-Account (both az and roleArn have been specified)
cross-AZ + same-Account (only az has been specified)
same-AZ + cross-Account (only roleArn has been specified)
same-AZ + same-Account (neither have been specified)

Author

If a user wants a cross-AZ or same-AZ mount within the same account, they can use the az mount option: for example, they can pass az=<desired az> under the storage class mount options, and efs-utils will figure out the AZ/mount-target IP to mount.

However, for cross-account mounts, due to the inability to pass credentials to the node service / efs-utils, we have to pre-determine the AZ/mount-target IP and pass it to efs-utils.

Contributor

Hm, I didn't even notice az in both mountOptions and parameters, tbh. Do you mind putting your comment in a code comment?

I double checked, and if both mounttargetip and az are set, then efs-utils will just ignore az instead of trying to determine it.

Regarding the inability to pass credentials, there might be a way (in the future) to use the token feature https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1855-csi-driver-service-account-token so that the node service pod can use the client pod's token to get AWS API creds. But that's another big issue.

@kbasv kbasv force-pushed the x-account branch 3 times, most recently from 0099bd1 to 9efd937 Compare June 7, 2021 14:44
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Cappuccinuo, kbasv

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -303,6 +303,19 @@ func TestNodePublishVolume(t *testing.T) {
mountArgs: []interface{}{volumeId + ":/", targetPath, "efs", []string{}},
mountSuccess: true,
},
{
name: "success: normal with volume context populated from dynamic provisioning",
Contributor

I would like to see an e2e test for cross-account mount, but it's going to be hard since we need 2 accounts and to store creds somewhere. Any ideas how to achieve this? We could store the creds in Secrets Manager in the existing CI account and construct the secret by retrieving from Secrets Manager as part of the test.

Author

Need to give this some thought. Apart from having to manage credentials for two accounts, cross-account mount requires VPC peering. I'm not sure how to achieve this on the fly for the integ tests.

@wongma7
Contributor

wongma7 commented Jun 7, 2021

lgtm pending doc/README/example update for the new parameter/secret

@wongma7
Contributor

wongma7 commented Jun 7, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 7, 2021
@wongma7 wongma7 merged commit b0955cb into kubernetes-sigs:master Jun 8, 2021
Comment on lines +50 to +53
RUN yum -y install wget
RUN wget https://bootstrap.pypa.io/get-pip.py -O /tmp/get-pip.py
RUN python3 /tmp/get-pip.py
RUN pip3 install botocore || /usr/local/bin/pip3 install botocore


Highly recommend doing these all in a single RUN command to improve docker layer caching. Also, you should remove the get-pip.py file after running it.

Contributor

good idea, yeah we should follow the RUN x && \ y pattern and clean stuff up like yum cache, the size went up a lot. issue here: #481
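For reference, a sketch of the single-RUN form suggested above (same packages and paths as the quoted Dockerfile lines, with cleanup added; not the change actually merged here):

```dockerfile
RUN yum -y install wget && \
    wget https://bootstrap.pypa.io/get-pip.py -O /tmp/get-pip.py && \
    python3 /tmp/get-pip.py && \
    (pip3 install botocore || /usr/local/bin/pip3 install botocore) && \
    rm -f /tmp/get-pip.py && \
    yum clean all
```

Combining the commands into one layer keeps the intermediate files (wget cache, get-pip.py, yum metadata) out of the final image.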
