(aws-s3): bucket policy fails to create when bucket:arn is not yet available #28659

Open
biffgaut opened this issue Jan 10, 2024 · 19 comments
Labels
@aws-cdk/aws-s3 Related to Amazon S3 bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@biffgaut
Contributor

Describe the bug

A dependency issue between S3 Buckets and Bucket Policies in the L2 Bucket class allows the Policy to access the ARN of the bucket before it is available, causing the creation of the Bucket Policy to fail. Because it depends on timing, the failure is intermittent and the deployment works correctly the vast majority of the time. When it fails, simply relaunching the stack usually works.

Expected Behavior

The L2 Bucket construct should launch successfully every time.

Current Behavior

testPolicy9D625504

CREATE_FAILED

Unable to retrieve Arn attribute for AWS::S3::Bucket, with error message Bucket not found

Reproduction Steps

I created a simple CDK app with this code:

import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as s3 from 'aws-cdk-lib/aws-s3';

export class BucketPolicyDependencyStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, 'test', {
      removalPolicy: cdk.RemovalPolicy.DESTROY,
      autoDeleteObjects: true
    })
  }
}

I then set up a bash script that launched it 30 times, essentially simultaneously:

# Any 30 distinct values work here; I just used 30 integers
constructs="$(seq 1 30)"
for iteration in $constructs; do
  export STACK_NAME=stresstest$iteration
  cdk deploy -o stress$iteration --require-approval never &
done
wait  # block until every background deployment has finished

On 1 of the 30 I saw the error I reference above.

Possible Solution

If I am interpreting the behavior correctly, it seems that adding a Dependency on the Bucket to the BucketPolicy in the L2 Construct would prevent the Policy from trying to access the bucket before it is ready. Perhaps here?

this.policy = new BucketPolicy(this, 'Policy', { bucket: this });

Additional Information/Context

We've seen it in several of our constructs (and in newer versions of the CDK than the one I cite below for the test above). Someone also mentioned they have seen it in aws-codepipeline.

CDK CLI Version

2.108.0

Framework Version

2.108.0

Node.js Version

20.9.0

OS

MacOS Ventura 13.6.3

Language

TypeScript

Language Version

TypeScript 5.2.2

Other information

Versions cited are for the test I cited, but it's been seen in other versions as well.

@biffgaut biffgaut added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 10, 2024
@github-actions github-actions bot added the @aws-cdk/aws-s3 Related to Amazon S3 label Jan 10, 2024
@biffgaut biffgaut changed the title (module name): (short issue description) (aws-s3): (bucket policy fails to create when bucket:arn is not yet available) Jan 10, 2024
@biffgaut biffgaut changed the title (aws-s3): (bucket policy fails to create when bucket:arn is not yet available) (aws-s3): bucket policy fails to create when bucket:arn is not yet available Jan 11, 2024
@pahud
Contributor

pahud commented Jan 12, 2024

Unfortunately, I couldn't reproduce this after a few attempts.

export class Demo extends DemoStack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    new s3.Bucket(this, 'test', {
      removalPolicy: RemovalPolicy.DESTROY,
      autoDeleteObjects: true,
    });
  }
}

app.ts

for (let i=0; i<30; i++) {
    new Demo(app, `demo${i}stack`, { env });
}

And I deploy with

npx cdk deploy --all --require-approval never --concurrency 30

I didn't see any error after a few attempts.

Can you try it again?

@pahud pahud added p2 response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Jan 12, 2024

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jan 14, 2024
@biffgaut
Contributor Author

Working on replicating again. I'm at a loss for a method to recreate it deterministically - it appears to be triggered by the S3 Create Bucket being a bit slow. I'm going to try to set up a test that just keeps repeating the stress test indefinitely, hoping to catch the slower S3 behavior when it occurs.

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Jan 15, 2024
@knihit

knihit commented Jan 15, 2024

I am facing the exact same issue. It seems that CloudFormation tries to create the bucket policy before the bucket creation is complete. It's inconsistent, but I saw it a few times in the last 2-3 weeks.

@biffgaut
Contributor Author

I was able to recreate it. I set up an infinite loop that launched and destroyed 40 stacks on each pass. I started the loop at around 11:30 AM and finally saw the issue recur at 11:34 AM. As I said, it is intermittent...


@JamesButtress

I am also facing a similar issue. It seems to happen intermittently and started becoming a problem just before Christmas. Note that the buckets (and the stacks they are in) haven't been changed for a few months, so this seems like a fairly new problem.

@biffgaut
Contributor Author

Talking to some coworkers, our theory is that the issue is not CDK per se - that a change in CloudFormation led to CloudFormation ceasing to recognize the dependency of the policy on the bucket from the context of the template (I'm running my tests using the generated template rather than the CDK program to confirm this).

If this is the case, then the issue is not necessarily within the CDK - but an update to the S3 Bucket construct to explicitly set the dependency would smooth over the CFN issue.
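For anyone testing against the generated template directly, the explicit dependency can be sketched as a post-processing step on the template JSON. This is only a minimal sketch under assumptions: `addBucketPolicyDependsOn` and the interface names are hypothetical helpers, not part of the CDK, and it is unverified whether an explicit `DependsOn` actually avoids the CloudFormation failure described in this issue.

```typescript
// Minimal shape of a CloudFormation template; only the fields used here.
interface CfnResource {
  Type: string;
  Properties?: Record<string, unknown>;
  DependsOn?: string | string[];
}

interface CfnTemplate {
  Resources: Record<string, CfnResource>;
}

// Returns the logical ID when `value` is a `{ "Ref": "LogicalId" }` object.
function refTarget(value: unknown): string | undefined {
  if (typeof value === 'object' && value !== null && 'Ref' in value) {
    const ref = (value as { Ref: unknown }).Ref;
    return typeof ref === 'string' ? ref : undefined;
  }
  return undefined;
}

// Hypothetical helper: adds an explicit DependsOn from every bucket policy to
// the bucket in the same template that it references through `Ref`.
// Mutates the template in place and returns it.
function addBucketPolicyDependsOn(template: CfnTemplate): CfnTemplate {
  for (const id of Object.keys(template.Resources)) {
    const resource = template.Resources[id];
    if (resource.Type !== 'AWS::S3::BucketPolicy') continue;

    const bucketId = refTarget(resource.Properties?.Bucket);
    if (bucketId === undefined) continue;

    const target = template.Resources[bucketId];
    if (target === undefined || target.Type !== 'AWS::S3::Bucket') continue;

    // Normalize DependsOn (absent, string, or array) into an array.
    const dependsOn =
      resource.DependsOn === undefined ? [] : ([] as string[]).concat(resource.DependsOn);
    if (dependsOn.indexOf(bucketId) === -1) dependsOn.push(bucketId);
    resource.DependsOn = dependsOn;
  }
  return template;
}
```

The helper leaves policies that point at buckets outside the template (for example, a literal bucket name) untouched, since there is no logical ID to depend on.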

@kedbirhan

kedbirhan commented Jan 16, 2024

I am having the same issue when just creating a bucket with an access policy.

const logBucket = new Bucket(
  this,
  // Template literal: the backticks were lost in the original post
  `${config.kitName}-alb-logs-bucket`,
  {
    blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
    removalPolicy: RemovalPolicy.DESTROY,
    autoDeleteObjects: true,
  }
);

Unable to retrieve Arn attribute for AWS::S3::Bucket, with error message Bucket not found

@biffgaut
Contributor Author

This is confirmed to be a CloudFormation issue. The word from AWS is:

Due to a recent change in internal workflow of CloudFormation, our development teams have identified an issue that can cause this error intermittently. They are currently working on deploying a fix for the same.

So it seems that no change to the CDK is needed. For the moment we just retry after a failure, and hopefully the issue clears up entirely soon.
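The retry-after-failure approach can be sketched as a small wrapper. This is a rough sketch, not an official tool: `DeployResult`, `deployWithRetry`, and the `deploy` callback are hypothetical names (the callback might wrap whatever runs `cdk deploy` in your pipeline), and matching on the error text quoted in this issue is an assumption about how the failure surfaces.

```typescript
// Outcome of one deployment attempt; `message` carries the failure reason.
interface DeployResult {
  ok: boolean;
  message?: string;
}

// Error text quoted in this issue. Matching on it is an assumption.
const TRANSIENT_ERROR = 'Unable to retrieve Arn attribute for AWS::S3::Bucket';

// True only for failures whose message contains the intermittent error text.
function isTransientBucketError(result: DeployResult): boolean {
  return !result.ok && (result.message ?? '').indexOf(TRANSIENT_ERROR) !== -1;
}

// Calls `deploy` until it succeeds, fails for a different reason, or the
// attempt budget is used up. Returns the result of the last attempt.
function deployWithRetry(deploy: () => DeployResult, maxAttempts: number): DeployResult {
  let result: DeployResult = { ok: false, message: 'deploy was never attempted' };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    result = deploy();
    if (result.ok || !isTransientBucketError(result)) return result;
  }
  return result;
}
```

Only the specific intermittent error is retried, so a genuine misconfiguration still fails on the first attempt instead of being hidden behind repeated deployments.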

@whennemuth

I am seeing this issue myself quite frequently.
As with everyone else who has commented, this is a new behavior that was not occurring before.

I am using the CDK BucketDeployment, which automatically generates a parallel construct containing a lambda function, IAM role and policy. It is the policy that is trying to reference the arn of the bucket with Fn::GetAtt in the synthesized output.
This seems to be failing about 50% of the time.
I can cope with this by retrying the stack creation and cloudformation will simply start where it left off and complete the rest of the way.

biffgaut, can you reference where you found the AWS issue being reported?
This is something I would want to monitor (and possibly bug them about - it's a pain).

Thanks.
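To check which resources in a synthesized template read a bucket's ARN through Fn::GetAtt, as described above, something like the following could be run over the template JSON. It is only a sketch: `resourcesReadingBucketArn` and the interface names are hypothetical, and it assumes the array form `{ "Fn::GetAtt": [logicalId, "Arn"] }` that appears in JSON templates, not the dotted string form.

```typescript
// Minimal template shape; Properties is left untyped because it is searched
// generically as a JSON tree.
interface TemplateResource {
  Type: string;
  Properties?: unknown;
}

interface SynthTemplate {
  Resources: Record<string, TemplateResource>;
}

// True when `node` (any JSON value) contains { "Fn::GetAtt": [bucketId, "Arn"] }
// at any depth.
function readsBucketArn(node: unknown, bucketId: string): boolean {
  if (Array.isArray(node)) {
    return node.some((child) => readsBucketArn(child, bucketId));
  }
  if (typeof node !== 'object' || node === null) return false;

  const obj = node as Record<string, unknown>;
  const getAtt = obj['Fn::GetAtt'];
  if (Array.isArray(getAtt) && getAtt[0] === bucketId && getAtt[1] === 'Arn') {
    return true;
  }
  return Object.keys(obj).some((key) => readsBucketArn(obj[key], bucketId));
}

// Lists the logical IDs of resources whose properties read the Arn attribute
// of any AWS::S3::Bucket defined in the same template.
function resourcesReadingBucketArn(template: SynthTemplate): string[] {
  const ids = Object.keys(template.Resources);
  const bucketIds = ids.filter((id) => template.Resources[id].Type === 'AWS::S3::Bucket');
  return ids.filter((id) =>
    bucketIds.some((bucketId) => readsBucketArn(template.Resources[id].Properties, bucketId)),
  );
}
```

The resulting list shows which resources would be exposed to the "Unable to retrieve Arn attribute" error if the bucket is not yet visible when they are created.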

@biffgaut
Contributor Author

That message was from an internal ticket here at AWS - there isn't any further info available at the moment. I have not seen this issue referenced online anywhere but here, which is shocking to me as it has occurred on several workloads managed by our team so I would assume the impact is bigger than the few people monitoring this issue.

@dale-vendia

dale-vendia commented Jan 31, 2024

As an FYI this has happened ~60 times in the last 60 days so @biffgaut you're not alone here.

We are also running into this issue with Lambda function roles, so I suspect it's not isolated to bucket policies.

@whennemuth

I opened a support ticket with the AWS CloudFormation team.
They told me the same thing they told biffgaut.
They did say this was a high priority issue, so I'd like to think the resolution is imminent.
Support tickets for known bugs are not allowed to stay open for more than 10 days, but the AWS support rep told me that I could ask my organization's AWS account rep to ping me when the bug is fixed, or that the ticket might remain open until the fix is in because I asked for it to be.
In any event, it looks like I will get notified somehow.
When I do, I'll update this issue.

@shwetajoshi601

shwetajoshi601 commented Feb 21, 2024

I am also facing the same problem. It is really annoying as it is hampering deployments.
Has anyone figured out a workaround?

@davidpintotrusst

I am also experiencing the same issue.

@abdulkadirdere

abdulkadirdere commented Mar 4, 2024

Workarounds for now:
Option 1:

  • Create the S3 bucket manually.
  • Import the S3 bucket into your CDK stack using the cdk import command.
  • Use cdk diff to see any differences between the manually created S3 bucket and your CDK stack, then change your CDK stack to match the bucket's configuration.
    Source: cdk import

Option 2:

  • Create the bucket using CDK without any bucket policy options (such as removal_policy, enforce_ssl), so that the baseline bucket has a bucket name only.
  • Deploy the baseline bucket.
  • Add the bucket policy options to the CDK code and redeploy the stack. (You can add any S3 bucket parameters that don't require deleting and recreating the bucket.)
  • This should not remove the bucket but attach the policy to it, so the bucket ARN should already be available when the policy is deployed.

@jshaw-decides

Happening again yall...

@jshaw-decides

jshaw-decides commented Mar 21, 2024

Hi, so if you're running into this issue while serving a static site out of an S3 bucket via CloudFront, you can split the code into 2 stacks for a more reliable CI/CD process.

Bucket Stack:

    /**
     * Content bucket
     */
    new s3.Bucket(this, 'SiteBucket', {
      bucketName: `${buildDomain(props.domainSegments)}`,
      websiteIndexDocument: 'index.html',
      websiteErrorDocument: 'index.html',
      // publicReadAccess: true,
      // autoDeleteObjects: true,
      // accessControl: BucketAccessControl.PUBLIC_READ,
      /**
       * The default removal policy is RETAIN, which means that cdk destroy will not attempt to delete
       * the new bucket, and it will remain in your account until manually deleted. By setting the policy to
       * DESTROY, cdk destroy will attempt to delete the bucket, but will error if the bucket is not empty.
       */
      // removalPolicy: cdk.RemovalPolicy.DESTROY, // NOT recommended for production code
    });

Distro Stack (with domain stuff):

    /**
     * Hosted zone
     */
    const zone = route53.HostedZone.fromLookup(this, 'Zone', {
      domainName: props.domainSegments.domain,
    });
    new cdk.CfnOutput(this, 'URL', {
      value: `https://${util.buildDomain(props.domainSegments)}`,
    });

    /**
     * TLS certificate
     */
    const certificate = new acm.Certificate(this, 'Certificate', {
      domainName: `${util.buildDomain(props.domainSegments)}`,
      validation: acm.CertificateValidation.fromDns(zone),
    });

    new cdk.CfnOutput(this, 'CertificateOutput', {
      value: certificate.certificateArn,
    });

    const oai = new cloudfront.OriginAccessIdentity(this, 'OAI');
    const bucket = s3.Bucket.fromBucketName(
      this,
      'StaticSiteBucket',
      `${util.buildDomain(props.domainSegments)}`
    );

    bucket.grantPublicAccess();
    const bucketPolicy = new s3.BucketPolicy(this, 'BucketPolicy', {
      bucket,
    });

    // Grant public access through the bucket policy
    bucketPolicy.document.addStatements(
      new iam.PolicyStatement({
        actions: ['s3:GetObject'],
        resources: [bucket.arnForObjects('*')],
        principals: [
          new iam.CanonicalUserPrincipal(
            oai.cloudFrontOriginAccessIdentityS3CanonicalUserId
          ),
        ],
      })
    );
    new cdk.CfnOutput(this, 'SiteBucketOutput', { value: bucket.bucketName });

    /**
     * Cloudfront OAI
     */

    /**
     * CloudFront distribution that provides HTTPS
     */
    this.distribution = new cloudfront.Distribution(this, 'myDist', {
      defaultRootObject: 'index.html',
      minimumProtocolVersion: cloudfront.SecurityPolicyProtocol.TLS_V1_2_2021,
      defaultBehavior: {
        origin: new cloudfront_origins.S3Origin(bucket, {
          originAccessIdentity: oai,
        }),
        compress: true,
        allowedMethods: cloudfront.AllowedMethods.ALLOW_GET_HEAD_OPTIONS,
        viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
      },
      errorResponses: [
        {
          httpStatus: 403,
          responseHttpStatus: 403,
          responsePagePath: '/index.html',
          ttl: cdk.Duration.minutes(30),
        },
      ],
      domainNames: [`${util.buildDomain(props.domainSegments)}`],
      certificate: certificate,
    });
    new cdk.CfnOutput(this, 'DistributionIdOutput', {
      value: this.distribution.distributionId,
    });

    /**
     * Route53 alias record for the CloudFront distribution
     */
    new route53.ARecord(this, 'SiteAliasRecordOutput', {
      recordName: `${util.buildDomain(props.domainSegments)}`,
      target: route53.RecordTarget.fromAlias(
        new route53_targets.CloudFrontTarget(this.distribution)
      ),
      zone,
    });

    /**
     * Build sources depending on if there are more things that need to be added
     * Take the strings in extraSources and map them to extra sources
     */
    const sources = props.extraSources
      ? [
          ...props.extraSources.map((path) => s3_deployment.Source.asset(path)),
          s3_deployment.Source.asset(props.pathToAssets),
        ]
      : [s3_deployment.Source.asset(props.pathToAssets)];

    /**
     * Automated s3 deployment
     */
    new s3_deployment.BucketDeployment(this, 'DeployWithInvalidation', {
      sources: [...sources],
      destinationBucket: bucket,
      distribution: this.distribution,
      distributionPaths: ['/*'],
    });

Also, pay me.

@billyjbryant

Is there any update to this? I am attempting to deploy a bucket and a stackset and the stackset fails because the bucket policy does not finish deploying, despite the policy not being built until after the bucket.
