Route53’s CrossAccountZoneDelegationRecord is broken in v2.87+ for opt-in region to non-opt-in region #26593

Closed
chuanconggao opened this issue Aug 1, 2023 · 4 comments · Fixed by #26666
Labels
  • aws-cdk-lib: Related to the aws-cdk-lib package
  • bug: This issue is a bug.
  • effort/medium: Medium work item – several days of effort
  • node18-upgrade: Any work (bug, feature) related to Node 18 upgrade
  • p1
  • sdk-v3-upgrade: Tag issues that are associated to SDK V3 upgrade. Not limited to CR usage of SDK only.

chuanconggao commented Aug 1, 2023

Describe the bug

Route53’s CrossAccountZoneDelegationRecord is broken in v2.87+ when the source (child) domain is in an opt-in region and the target (parent) domain is not. STS returns access denied when the CDK-bundled custom resource (in the source account) assumes the specified role (in the target account). The role's trust policy is based on account ID.

It seems the underlying switch to JS SDK V3 is causing the issue.

Downgrading to v2.86 fixes the issue.

This is blocking our production and likely would affect many others. Please fix ASAP.

Related to #26562 and #26325

Expected Behavior

Able to deploy when source domain is in opt-in region and target domain is not, as this is existing behavior and the role pattern is supported by IAM.

Current Behavior

Fails to deploy with access denied from STS.

Reproduction Steps

As described in issue.

Possible Solution

Revert to the previous behavior (as when using JS SDK V2) by setting the credentials region and/or the STS endpoint to the target region when assuming the role.
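
As a rough illustration of this suggestion, the sketch below builds an option bag shaped like the one accepted by `fromTemporaryCredentials` from `@aws-sdk/credential-providers` (the provider that appears in the stack trace later in this thread), with the region of the inner STS client pinned explicitly. The helper name `buildAssumeRoleOptions`, the session name, and the example account ID are assumptions made for illustration; this is not the code used by the CDK custom resource.

```typescript
// Hypothetical option shape, mirroring the `params` and `clientConfig`
// fields of fromTemporaryCredentials. Declared locally so the sketch
// compiles without the AWS SDK installed.
interface AssumeRoleCredentialOptions {
  params: { RoleArn: string; RoleSessionName: string };
  clientConfig: { region: string };
}

// Hypothetical helper: pins the STS client region instead of letting it
// default to the region the Lambda function runs in.
function buildAssumeRoleOptions(roleArn: string, stsRegion: string): AssumeRoleCredentialOptions {
  return {
    params: {
      RoleArn: roleArn,
      // Assumed session name; any valid STS session name would do.
      RoleSessionName: 'cross-account-zone-delegation',
    },
    // The assume-role call is sent to STS in this region.
    clientConfig: { region: stsRegion },
  };
}
```

In a real handler the returned object would be passed to `fromTemporaryCredentials(...)` and the result used as the `credentials` of the Route 53 client. Which region to pin depends on which regions the target account has enabled.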

Additional Information/Context

No response

CDK CLI Version

2.88

Framework Version

No response

Node.js Version

18

OS

macOS

Language

TypeScript

Language Version

No response

Other information

No response

@chuanconggao chuanconggao added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 1, 2023
@github-actions github-actions bot added the aws-cdk-lib Related to the aws-cdk-lib package label Aug 1, 2023
@chuanconggao chuanconggao changed the title Route53’s CrossAccountZoneDelegationRecord is broken in v2.87+ Route53’s CrossAccountZoneDelegationRecord is broken in v2.87+ for opt-in region to non-opt-in region Aug 2, 2023
@pahud pahud added p1 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Aug 3, 2023

chuanconggao commented Aug 5, 2023

Is there a timeline for a fix? It is affecting our production, and we cannot easily downgrade because we are also using other features from newer versions.

Why not just revert this one to the old code on JS SDK V2 while figuring out working code on JS SDK V3?

MrArnoldPalmer added a commit that referenced this issue Aug 8, 2023
Fixes the cross region zone delegation custom resource in route53 by
reverting to using the v2 aws-sdk. This requires reverting the runtime
to use Node16 until we figure out the correct usage of the v3 sdk.

At this point I suspect it is an issue with the SDK but I have been
unable to reliably reproduce the bug, regardless of the parent or child
stack being in an opt-in region and vice versa

Fixes: #26593
@MrArnoldPalmer
Contributor

@chuanconggao thanks for your patience. Are you able to provide a reproduction? I haven't been able to reliably repro this, which is annoying because I have at times gotten the access denied error, but when redeploying the same stacks I don't.

This is the stack I've been testing with. I've tested with both the child and the parent stack in an opt-in region while the other is not, and I've also tested with useRegionalStsEndpoint both enabled and disabled.

import * as iam from 'aws-cdk-lib/aws-iam';
import * as cdk from 'aws-cdk-lib';
import * as route53 from 'aws-cdk-lib/aws-route53';

const app = new cdk.App();

app.node.setContext('@aws-cdk/aws-route53:useRegionalStsEndpoint', false);
const stack = new cdk.Stack(app, 'aws-cdk-route53-cross-region-integ', {
  env: { region: 'us-east-1' },
  // env: { region: 'il-central-1' },
});

const parentZone = new route53.PublicHostedZone(stack, 'HostedZone', {
  zoneName: 'someexample.com',
});

const roleName = 'MyDelegationRole';
const crossAccountRole = new iam.Role(stack, 'CrossAccountRole', {
  roleName,
  assumedBy: new iam.AccountPrincipal(cdk.Aws.ACCOUNT_ID),
});

parentZone.grantDelegation(crossAccountRole);

const referringStack = new cdk.Stack(app, 'aws-cdk-route53-cross-region-integ-child-stack', {
  env: { region: 'il-central-1' },
  // env: { region: 'us-east-1' },
});

const subZone = new route53.PublicHostedZone(referringStack, 'SubZone', {
  zoneName: 'sub.someexample.com',
});

const delegationRoleArn = cdk.Stack.of(referringStack).formatArn({
  region: '',
  service: 'iam',
  account: cdk.Aws.ACCOUNT_ID,
  resource: 'role',
  resourceName: roleName,
});
const delegationRole = iam.Role.fromRoleArn(referringStack, 'DelegationRole', delegationRoleArn);

new route53.CrossAccountZoneDelegationRecord(referringStack, 'delegate', {
  delegatedZone: subZone,
  parentHostedZoneName: 'someexample.com',
  delegationRole,
});

// Use explicit dependency since we don't want to test cross region reference CR in this test
referringStack.addDependency(stack);
app.synth();

At this point I suspect it's something similar to the bug in aws/aws-sdk-js-v3#2958, where the problem is not the use of the regional endpoint but the region with which the client is initialized. Writing our own credential provider and calling STS AssumeRole ourselves may be an option, but we need a repro first so we can make sure that works.
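
For readers unfamiliar with the idea, a hand-written credential provider could look roughly like the sketch below. In SDK v3 a credential provider is an async function that resolves to an access key, secret key and session token. Here the STS call is injected as a function (`AssumeRoleFn`, a hypothetical type) so the sketch runs without the SDK; in practice it would wrap an `STSClient` constructed with whatever region is chosen. All names are illustrative assumptions, not CDK code.

```typescript
// Temporary credentials in the field naming SDK v3 clients expect.
interface TemporaryCredentials {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken: string;
  expiration?: Date | undefined;
}

// Hypothetical abstraction over the STS AssumeRole call. The response
// uses the field names of the AssumeRole API response.
type AssumeRoleFn = (input: { RoleArn: string; RoleSessionName: string }) => Promise<{
  Credentials?: {
    AccessKeyId?: string;
    SecretAccessKey?: string;
    SessionToken?: string;
    Expiration?: Date;
  };
}>;

// Returns a provider function that assumes the role each time it is called.
function assumeRoleProvider(
  assumeRole: AssumeRoleFn,
  roleArn: string,
  sessionName: string,
): () => Promise<TemporaryCredentials> {
  return async () => {
    const response = await assumeRole({ RoleArn: roleArn, RoleSessionName: sessionName });
    const c = response.Credentials;
    if (c === undefined || !c.AccessKeyId || !c.SecretAccessKey || !c.SessionToken) {
      throw new Error(`AssumeRole for ${roleArn} returned incomplete credentials`);
    }
    return {
      accessKeyId: c.AccessKeyId,
      secretAccessKey: c.SecretAccessKey,
      sessionToken: c.SessionToken,
      expiration: c.Expiration,
    };
  };
}
```

Because the STS call is injected, the region and endpoint it uses are fully under the caller's control. The sketch does no caching or refresh, which a production provider would likely want.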

Rollback in the meantime

@chuanconggao
Author

I have been able to reproduce multiple times in af-south-1 and il-central-1.

My setup is like below:

  • No deployment issue for parent account in us-east-1 with parent hosted zone and role for delegation
  • Deployment failure for child account in opt-in region (like af-south-1) with child hosted zone and delegation

The error message in child account's CFN is like below (with certain info redacted):

Received response status [FAILED] from custom resource. Message returned: AccessDenied: User: arn:aws:sts::{CHILD_ACCOUNT_ID}:assumed-role/***-CustomCrossAccountZoneDel-***/***-CustomCrossAccountZoneDel-*** is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::{PARENT_ACCOUNT_ID}:role/***
    at throwDefaultError (/var/runtime/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:8:22)
    at deserializeAws_queryAssumeRoleCommandError (/var/runtime/node_modules/@aws-sdk/client-sts/dist-cjs/protocols/Aws_query.js:148:51)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /var/runtime/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
    at async /var/runtime/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:13:20
    at async StandardRetryStrategy.retry (/var/runtime/node_modules/@aws-sdk/middleware-retry/dist-cjs/StandardRetryStrategy.js:51:46)
    at async /var/runtime/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:6:22
    at async /var/runtime/node_modules/@aws-sdk/credential-providers/dist-cjs/fromTemporaryCredentials.js:24:33
    at async coalesceProvider (/var/runtime/node_modules/@aws-sdk/property-provider/dist-cjs/memoize.js:14:24)
    at async SignatureV4.credentialProvider (/var/runtime/node_modules/@aws-sdk/property-provider/dist-cjs/memoize.js:33:24)
(RequestId: ***)

@mergify mergify bot closed this as completed in #26666 Aug 8, 2023
mergify bot pushed a commit that referenced this issue Aug 8, 2023

@udaypant udaypant added the node18-upgrade Any work (bug, feature) related to Node 18 upgrade label Aug 30, 2023
@udaypant udaypant added the sdk-v3-upgrade Tag issues that are associated to SDK V3 upgrade. Not limited to CR usage of SDK only. label Sep 1, 2023
mergify bot pushed a commit that referenced this issue Sep 5, 2023
…16 (#26980)

Updates the `CrossAccountZoneDelegationRecord` construct to use sdk v3 / node 18. 

This is identical to changes in #26212, except for hardcoding a region into the `assumeRole` sdk call. This may not be the ideal solution, but should not break specific configurations.

That specific configuration, as #26593 pointed out, was that the original update was a breaking change if the construct was deployed into an opt-in region, and the parent zone did not have that opt-in region enabled.

This PR removes the semi-hidden `@aws-cdk/aws-route53:useRegionalStsEndpoint` feature flag, as it was based on a confusion on why things used to work. We now pick the correct endpoint manually.

Closes #26976.
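
The "hardcoding a region" approach described in this commit message might look roughly like the sketch below: derive a fixed region from the region the function runs in, so the assume-role call does not depend on the target account having enabled an opt-in region. The function name `stsRegionFor` and the prefix table are assumptions made for illustration, and the sketch should not be read as the code merged in #26980.

```typescript
// Assumed mapping from region-name prefix to a region that is enabled
// by default in that partition. Illustrative only, not exhaustive.
const DEFAULT_REGION_BY_PREFIX: Record<string, string> = {
  'cn-': 'cn-northwest-1',
  'us-gov-': 'us-gov-west-1',
};

// Hypothetical helper: returns a fixed region for the partition that
// `functionRegion` belongs to, falling back to us-east-1 for the
// standard partition.
function stsRegionFor(functionRegion: string): string {
  for (const [prefix, region] of Object.entries(DEFAULT_REGION_BY_PREFIX)) {
    if (functionRegion.startsWith(prefix)) {
      return region;
    }
  }
  return 'us-east-1';
}
```

With this helper, a function running in an opt-in region such as af-south-1 or il-central-1 would assume the delegation role through us-east-1 rather than through its own region.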

mikewrighton pushed a commit that referenced this issue Sep 14, 2023