Provisioning cluster with a cloudformation custom resource FAILS on warnings. #6407

Open
snemir2 opened this issue Aug 21, 2024 · 2 comments

@snemir2

snemir2 commented Aug 21, 2024

Required Info:

  • AWS ParallelCluster version: 3.9.3
  • Full cluster configuration without any credentials or personal data.

Bug description and how to reproduce:
Configure a cluster without an SSH key, or with AD / Slurm DB secrets, etc.

We have a working cluster configuration: it can be created with pcluster, although it emits warnings about the custom AMI (tags) and about NOT having an SSH key.

However, when the same configuration is fed to the CloudFormation custom resource, the cluster creation fails.
This is what makes it to the Lambda logs:

[DEBUG]	2024-08-21T18:16:40.433Z	055d99f4-71af-426e-ae04-149ed2cd0664	{
    "Status": "FAILED",
    "PhysicalResourceId": "PCCustomeResorucev393POC_PclusterCluster_ROHRJ80V",
    "StackId": "arn:aws:cloudformation:us-east-2:654225707598:stack/PCCustomeResorucev393POC/e69c0250-5fe8-11ef-a3d1-062b7185899d",
    "RequestId": "cbdf6360-209a-4b1b-83f9-7c4aa7815555",
    "LogicalResourceId": "PclusterCluster",
    "Reason": "c-PCCustomeResorucev393POC: ResourceCreationFailure (LogGroup: /aws/lambda/pcluster-cfn-eb556bb0-5fe8-11ef-9563-063a67fe5cf5)",
    "Data": {
        "validationMessages": "[{\"level\": \"WARNING\", \"type\": \"CustomAmiTagValidator\", \"message\": \"The custom AMI may not have been created by pcluster. You can ignore this warning if the AMI is shared or copied from another pcluster AMI. If the AMI is indeed not created by pcluster, cluster creation will fail. If the cluster creation fails, please go to https://docs.aws.amazon.com/parallelcluster/latest/ug/troubleshooting.html#troubleshooting-stack-creation-failures for troubleshooting.\"}, {\"level\": \"WARNING\", \"type\": \"AmiOsCompatibleValidator\", \"message\": \"Could not check node AMI ami-0f1e16e791ca50af0 OS and cluster OS ubuntu2204 compatibility, please make sure they are compatible before cluster creation and update operations.\"}, {\"level\": \"WARNING\", \"type\": \"PasswordSecretArnValidator\", \"message\": \"Cannot validate secret arn:aws:secretsmanager:us-east-2:654225707598:secret:/a2ai/a2ai-cloud/SlurmAccountingDB-admin-password-7yiQip due to lack of permissions. Please refer to ParallelCluster official documentation for more information.\"}, {\"level\": \"WARNING\", \"type\": \"PasswordSecretArnValidator\", \"message\": \"Cannot validate secret arn:aws:secretsmanager:us-east-2:654225707598:secret:/a2ai/a2ai-cloud/ad_bind_user_password-0oxi9e due to lack of permissions. Please refer to ParallelCluster official documentation for more information.\"}, {\"level\": \"WARNING\", \"type\": \"KeyPairValidator\", \"message\": \"If you do not specify a key pair, you can't connect to the instance unless you choose an AMI that is configured to allow users another way to log in\"}]",
        "clusterName": "c-PCCustomeResorucev393POC",
        "cloudformationStackStatus": "CREATE_IN_PROGRESS",
        "cloudformationStackArn": "arn:aws:cloudformation:us-east-2:654225707598:stack/c-PCCustomeResorucev393POC/37ad55e0-5fe9-11ef-a822-06ddf3cbf353",
        "region": "us-east-2",
        "version": "3.9.3",
        "clusterStatus": "CREATE_IN_PROGRESS"
    }
}
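
For anyone triaging a similar response: the `validationMessages` field is itself a JSON-encoded string, so it has to be decoded a second time before the levels can be inspected. Below is a minimal Python sketch of that check. The helper names `count_levels` and `only_warnings` are made up for illustration, and `SAMPLE_RESPONSE` is an abridged stand-in for the log above (validator types and levels are copied from it, message bodies are shortened to "...").

```python
import json
from collections import Counter

# Abridged stand-in for the response above: validator types and levels are
# copied from the log; message bodies are shortened to "..." for brevity.
SAMPLE_RESPONSE = {
    "Status": "FAILED",
    "Data": {
        "validationMessages": json.dumps([
            {"level": "WARNING", "type": "CustomAmiTagValidator", "message": "..."},
            {"level": "WARNING", "type": "AmiOsCompatibleValidator", "message": "..."},
            {"level": "WARNING", "type": "PasswordSecretArnValidator", "message": "..."},
            {"level": "WARNING", "type": "PasswordSecretArnValidator", "message": "..."},
            {"level": "WARNING", "type": "KeyPairValidator", "message": "..."},
        ]),
        "clusterStatus": "CREATE_IN_PROGRESS",
    },
}


def count_levels(response: dict) -> Counter:
    """Count validation messages by level (hypothetical helper)."""
    # validationMessages is a JSON string, not a list, so decode it first.
    raw = response.get("Data", {}).get("validationMessages", "[]")
    return Counter(msg["level"] for msg in json.loads(raw))


def only_warnings(response: dict) -> bool:
    """True when no message has a level other than WARNING."""
    return set(count_levels(response)) <= {"WARNING"}


if __name__ == "__main__":
    print(count_levels(SAMPLE_RESPONSE))
    print(only_warnings(SAMPLE_RESPONSE))
```

If `only_warnings` returns true while `Status` is `FAILED`, as in the response above, the failure was reported even though every validator message was a warning.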

This should probably be treated as a warning, not a fatal provisioning error. pcluster proceeds with creating the cluster despite the warnings, but the CloudFormation custom resource fails.

snemir2 added the 3.x label Aug 21, 2024
@dreambeyondorange
Contributor

Can you also share the CloudFormation logs that show what the error was?

@himani2411
Contributor

We added support for suppressing validators in the custom resource in v3.10.0:
https://github.com/aws/aws-parallelcluster/releases/tag/v3.10.0
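
As a rough illustration of what suppression could look like for this report: the pcluster CLI's `--suppress-validators` option accepts `ALL` or values of the form `type:<ValidatorName>`. The Python sketch below builds such values from the validator types seen in the log in this issue. The helper name `suppress_values` is hypothetical, and whether the custom resource accepts the same strings, and under which property, is an assumption to verify against the v3.10.0 release notes.

```python
# Validator types copied from the validationMessages in this issue's log.
REPORTED_TYPES = [
    "CustomAmiTagValidator",
    "AmiOsCompatibleValidator",
    "PasswordSecretArnValidator",
    "PasswordSecretArnValidator",
    "KeyPairValidator",
]


def suppress_values(validator_types):
    """Build de-duplicated 'type:<name>' strings (hypothetical helper).

    The 'type:' prefix follows the pcluster CLI's --suppress-validators
    format; its use in the custom resource is an assumption.
    """
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return [f"type:{name}" for name in dict.fromkeys(validator_types)]


if __name__ == "__main__":
    for value in suppress_values(REPORTED_TYPES):
        print(value)
```

Suppressing individual types rather than `ALL` keeps any other validators active, which matters here because the report only concerns these specific warnings.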
