feat(batch): fargate support for jobs (#15848)
Added Fargate support for Batch jobs.

Note: this is not entirely my work - most of it was done by @kokachev. It is an updated version of Fargate support for batch jobs based on the feedback left in #13591.

- Doc fixes
- Integration test addition
- Network configuration for Fargate
- Support `ResourceRequirements` for Fargate jobs
- Other minor fixes revealed by integration test

closes: #13590, #13591
----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
DDynamic committed Sep 12, 2021
1 parent a732078 commit 066bcb1
Showing 10 changed files with 1,168 additions and 211 deletions.
17 changes: 16 additions & 1 deletion packages/@aws-cdk/aws-batch/README.md
@@ -37,7 +37,7 @@ For more information on **AWS Batch** visit the [AWS Docs for Batch](https://doc

## Compute Environment

At the core of AWS Batch is the compute environment. All batch jobs are processed within a compute environment, which uses resource like OnDemand or Spot EC2 instances.
At the core of AWS Batch is the compute environment. All batch jobs are processed within a compute environment, which uses resources such as On-Demand/Spot EC2 instances or Fargate.

In **MANAGED** mode, AWS will handle the provisioning of compute resources to accommodate the demand. Otherwise, in **UNMANAGED** mode, you will need to manage the provisioning of those resources.

@@ -74,6 +74,21 @@ const spotEnvironment = new batch.ComputeEnvironment(stack, 'MySpotEnvironment',
});
```

### Fargate Compute Environment

AWS Batch can also run jobs on Fargate compute resources. The example below creates a compute environment backed by Fargate Spot:

```ts
const vpc = new ec2.Vpc(stack, 'VPC');

const fargateSpotEnvironment = new batch.ComputeEnvironment(stack, 'MyFargateEnvironment', {
  computeResources: {
    type: batch.ComputeResourceType.FARGATE_SPOT,
    vpc,
  },
});
```

### Understanding Progressive Allocation Strategies

AWS Batch uses an [allocation strategy](https://docs.aws.amazon.com/batch/latest/userguide/allocation-strategies.html) to determine which compute resources will efficiently handle incoming job requests. By default, **BEST_FIT** will pick an available compute instance based on vCPU requirements. If none exist, the job will wait until resources become available. However, with this strategy, you may have jobs waiting in the queue unnecessarily despite having more powerful instances available. Below is an example of what that situation might look like:
170 changes: 124 additions & 46 deletions packages/@aws-cdk/aws-batch/lib/compute-environment.ts
@@ -6,7 +6,7 @@ import { CfnComputeEnvironment } from './batch.generated';

/**
* Property to specify if the compute environment
* uses On-Demand or SpotFleet compute resources.
* uses On-Demand, SpotFleet, Fargate, or Fargate Spot compute resources.
*/
export enum ComputeResourceType {
/**
@@ -18,6 +18,20 @@ export enum ComputeResourceType {
* Resources will be EC2 SpotFleet resources.
*/
SPOT = 'SPOT',

/**
* Resources will be Fargate resources.
*/
FARGATE = 'FARGATE',

/**
* Resources will be Fargate Spot resources.
*
* Fargate Spot uses spare capacity in the AWS cloud to run your fault-tolerant,
* time-flexible jobs at up to a 70% discount. If AWS needs the resources back,
* jobs running on Fargate Spot are interrupted with a two-minute warning.
*/
FARGATE_SPOT = 'FARGATE_SPOT',
}

/**
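With the two new members, a compute environment counts as Fargate-backed when its `type` is `FARGATE` or `FARGATE_SPOT`; the constructor computes this as `isFargate`. A minimal TypeScript sketch of that check written as a type guard, assuming a string-union stand-in for `ComputeResourceType` keyed by member name (the enum's string values are not needed here) and a hypothetical helper name `isFargateType`:

```typescript
// Stand-in for ComputeResourceType, keyed by member name (assumption: the
// real enum's string values are irrelevant to this check).
type ResourceType = 'ON_DEMAND' | 'SPOT' | 'FARGATE' | 'FARGATE_SPOT';
type FargateType = Extract<ResourceType, 'FARGATE' | 'FARGATE_SPOT'>;

// Hypothetical type guard mirroring the constructor's `isFargate` expression.
// An omitted type means ON_DEMAND, so it is never Fargate.
function isFargateType(type?: ResourceType): type is FargateType {
  return type === 'FARGATE' || type === 'FARGATE_SPOT';
}

const samples: Array<ResourceType | undefined> = ['ON_DEMAND', 'SPOT', 'FARGATE', 'FARGATE_SPOT', undefined];
for (const t of samples) {
  // Prints each type (or "(default)") next to whether it is Fargate-backed.
  console.log(t ?? '(default)', isFargateType(t));
}
```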
@@ -135,7 +149,7 @@ export interface ComputeResources {
readonly vpcSubnets?: ec2.SubnetSelection;

/**
* The type of compute environment: ON_DEMAND or SPOT.
* The type of compute environment: ON_DEMAND, SPOT, FARGATE, or FARGATE_SPOT.
*
* @default ON_DEMAND
*/
@@ -340,44 +354,49 @@ export class ComputeEnvironment extends Resource implements IComputeEnvironment
physicalName: props.computeEnvironmentName,
});

this.validateProps(props);
const isFargate = ComputeResourceType.FARGATE === props.computeResources?.type
|| ComputeResourceType.FARGATE_SPOT === props.computeResources?.type;

this.validateProps(props, isFargate);

const spotFleetRole = this.getSpotFleetRole(props);
let computeResources: CfnComputeEnvironment.ComputeResourcesProperty | undefined;

// Only allow compute resources to be set when using MANAGED type
if (props.computeResources && this.isManaged(props)) {
computeResources = {
allocationStrategy: props.computeResources.allocationStrategy
|| (
props.computeResources.type === ComputeResourceType.SPOT
? AllocationStrategy.SPOT_CAPACITY_OPTIMIZED
: AllocationStrategy.BEST_FIT
),
bidPercentage: props.computeResources.bidPercentage,
desiredvCpus: props.computeResources.desiredvCpus,
ec2KeyPair: props.computeResources.ec2KeyPair,
imageId: props.computeResources.image && props.computeResources.image.getImage(this).imageId,
instanceRole: props.computeResources.instanceRole
? props.computeResources.instanceRole
: new iam.CfnInstanceProfile(this, 'Instance-Profile', {
roles: [new iam.Role(this, 'Ecs-Instance-Role', {
assumedBy: new iam.ServicePrincipal('ec2.amazonaws.com'),
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role'),
],
}).roleName],
}).attrArn,
instanceTypes: this.buildInstanceTypes(props.computeResources.instanceTypes),
launchTemplate: props.computeResources.launchTemplate,
maxvCpus: props.computeResources.maxvCpus || 256,
minvCpus: props.computeResources.minvCpus || 0,
placementGroup: props.computeResources.placementGroup,
securityGroupIds: this.buildSecurityGroupIds(props.computeResources.vpc, props.computeResources.securityGroups),
spotIamFleetRole: spotFleetRole?.roleArn,
subnets: props.computeResources.vpc.selectSubnets(props.computeResources.vpcSubnets).subnetIds,
tags: props.computeResources.computeResourcesTags,
type: props.computeResources.type || ComputeResourceType.ON_DEMAND,
...(!isFargate ? {
allocationStrategy: props.computeResources.allocationStrategy
|| (
props.computeResources.type === ComputeResourceType.SPOT
? AllocationStrategy.SPOT_CAPACITY_OPTIMIZED
: AllocationStrategy.BEST_FIT
),
instanceRole: props.computeResources.instanceRole
? props.computeResources.instanceRole
: new iam.CfnInstanceProfile(this, 'Instance-Profile', {
roles: [new iam.Role(this, 'Ecs-Instance-Role', {
assumedBy: new iam.ServicePrincipal('ec2.amazonaws.com'),
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role'),
],
}).roleName],
}).attrArn,
instanceTypes: this.buildInstanceTypes(props.computeResources.instanceTypes),
minvCpus: props.computeResources.minvCpus || 0,
} : {}),
};
}
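The hunk above moves `allocationStrategy`, `instanceRole`, `instanceTypes`, and `minvCpus` into a conditional spread, `...(!isFargate ? { ... } : {})`, so Fargate environments never emit those EC2-only properties, while shared properties such as `maxvCpus` (default 256) and `subnets` are always set. A minimal sketch of the same pattern, assuming hypothetical, heavily simplified `ResourceInput`/`ResourceOutput` shapes in place of the real props and `CfnComputeEnvironment.ComputeResourcesProperty`, and a hypothetical `buildComputeResources` helper:

```typescript
// Stand-ins keyed by enum member name (assumption: string values don't matter here).
type ResourceType = 'ON_DEMAND' | 'SPOT' | 'FARGATE' | 'FARGATE_SPOT';
type Strategy = 'BEST_FIT' | 'BEST_FIT_PROGRESSIVE' | 'SPOT_CAPACITY_OPTIMIZED';

// Hypothetical, simplified subset of the ComputeResources props.
interface ResourceInput {
  type?: ResourceType;
  allocationStrategy?: Strategy;
  maxvCpus?: number;
  minvCpus?: number;
  subnetIds: string[];
}

// Hypothetical, simplified subset of the synthesized compute resources.
interface ResourceOutput {
  type: ResourceType;
  maxvCpus: number;
  subnets: string[];
  allocationStrategy?: Strategy;
  minvCpus?: number;
}

// The EC2-only part of the output; both keys are optional, so `{}` is valid.
type Ec2OnlyFields = Pick<ResourceOutput, 'allocationStrategy' | 'minvCpus'>;

function buildComputeResources(input: ResourceInput): ResourceOutput {
  const type: ResourceType = input.type || 'ON_DEMAND';
  const isFargate = type === 'FARGATE' || type === 'FARGATE_SPOT';

  // Same defaults as the diff: SPOT -> SPOT_CAPACITY_OPTIMIZED, otherwise BEST_FIT.
  const defaultStrategy: Strategy = type === 'SPOT' ? 'SPOT_CAPACITY_OPTIMIZED' : 'BEST_FIT';
  const ec2Only: Ec2OnlyFields = isFargate ? {} : {
    allocationStrategy: input.allocationStrategy || defaultStrategy,
    minvCpus: input.minvCpus || 0,
  };

  return {
    type,
    maxvCpus: input.maxvCpus || 256,
    subnets: input.subnetIds,
    // Spreading `{}` adds no keys at all, rather than keys set to undefined.
    ...ec2Only,
  };
}

const fargate = buildComputeResources({ type: 'FARGATE_SPOT', subnetIds: ['subnet-a'] });
console.log(JSON.stringify(fargate));
// {"type":"FARGATE_SPOT","maxvCpus":256,"subnets":["subnet-a"]}
```

The conditional spread keeps shared and EC2-only properties in a single object literal; building two separate literals in an `if`/`else` would behave the same.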

@@ -414,7 +433,7 @@ export class ComputeEnvironment extends Resource implements IComputeEnvironment
/**
* Validates the properties provided for a new batch compute environment.
*/
private validateProps(props: ComputeEnvironmentProps) {
private validateProps(props: ComputeEnvironmentProps, isFargate: boolean) {
if (props === undefined) {
return;
}
@@ -427,41 +446,100 @@ export class ComputeEnvironment extends Resource implements IComputeEnvironment
throw new Error('computeResources is missing but required on a managed compute environment');
}

// Setting a bid percentage is only allowed on SPOT resources +
// Cannot use SPOT_CAPACITY_OPTIMIZED when using ON_DEMAND
if (props.computeResources) {
if (props.computeResources.type === ComputeResourceType.ON_DEMAND) {
// VALIDATE FOR ON_DEMAND
if (isFargate) {
// VALIDATE FOR FARGATE

// Bid percentage is not allowed
// Bid percentage cannot be set for Fargate environments
if (props.computeResources.bidPercentage !== undefined) {
throw new Error('Setting the bid percentage is only allowed for SPOT type resources on a batch compute environment');
throw new Error('Bid percentage must not be set for Fargate compute environments');
}

// SPOT_CAPACITY_OPTIMIZED allocation is not allowed
if (props.computeResources.allocationStrategy && props.computeResources.allocationStrategy === AllocationStrategy.SPOT_CAPACITY_OPTIMIZED) {
throw new Error('The SPOT_CAPACITY_OPTIMIZED allocation strategy is only allowed if the environment is a SPOT type compute environment');
// Allocation strategy cannot be set for Fargate environments
if (props.computeResources.allocationStrategy !== undefined) {
throw new Error('Allocation strategy must not be set for Fargate compute environments');
}
} else {
// VALIDATE FOR SPOT

// Bid percentage must be from 0 - 100
if (props.computeResources.bidPercentage !== undefined &&
(props.computeResources.bidPercentage < 0 || props.computeResources.bidPercentage > 100)) {
throw new Error('Bid percentage can only be a value between 0 and 100');
// Desired vCPUs cannot be set for Fargate environments
if (props.computeResources.desiredvCpus !== undefined) {
throw new Error('Desired vCPUs must not be set for Fargate compute environments');
}
}

if (props.computeResources.minvCpus) {
// minvCpus cannot be less than 0
if (props.computeResources.minvCpus < 0) {
throw new Error('Minimum vCpus for a batch compute environment cannot be less than 0');
// Image ID cannot be set for Fargate environments
if (props.computeResources.image !== undefined) {
throw new Error('Image must not be set for Fargate compute environments');
}

// minvCpus cannot exceed max vCpus
if (props.computeResources.maxvCpus &&
props.computeResources.minvCpus > props.computeResources.maxvCpus) {
throw new Error('Minimum vCpus cannot be greater than the maximum vCpus');
// Instance types cannot be set for Fargate environments
if (props.computeResources.instanceTypes !== undefined) {
throw new Error('Instance types must not be set for Fargate compute environments');
}

// EC2 key pair cannot be set for Fargate environments
if (props.computeResources.ec2KeyPair !== undefined) {
throw new Error('EC2 key pair must not be set for Fargate compute environments');
}

// Instance role cannot be set for Fargate environments
if (props.computeResources.instanceRole !== undefined) {
throw new Error('Instance role must not be set for Fargate compute environments');
}

// Launch template cannot be set for Fargate environments
if (props.computeResources.launchTemplate !== undefined) {
throw new Error('Launch template must not be set for Fargate compute environments');
}

// Min vCPUs cannot be set for Fargate environments
if (props.computeResources.minvCpus !== undefined) {
throw new Error('Min vCPUs must not be set for Fargate compute environments');
}

// Placement group cannot be set for Fargate environments
if (props.computeResources.placementGroup !== undefined) {
throw new Error('Placement group must not be set for Fargate compute environments');
}

// Spot fleet role cannot be set for Fargate environments
if (props.computeResources.spotFleetRole !== undefined) {
throw new Error('Spot fleet role must not be set for Fargate compute environments');
}
} else {
// VALIDATE FOR ON_DEMAND AND SPOT
if (props.computeResources.minvCpus) {
// minvCpus cannot be less than 0
if (props.computeResources.minvCpus < 0) {
throw new Error('Minimum vCpus for a batch compute environment cannot be less than 0');
}

// minvCpus cannot exceed max vCpus
if (props.computeResources.maxvCpus &&
props.computeResources.minvCpus > props.computeResources.maxvCpus) {
throw new Error('Minimum vCpus cannot be greater than the maximum vCpus');
}
}
// Setting a bid percentage is only allowed on SPOT resources +
// Cannot use SPOT_CAPACITY_OPTIMIZED when using ON_DEMAND
if (props.computeResources.type === ComputeResourceType.ON_DEMAND) {
// VALIDATE FOR ON_DEMAND

// Bid percentage is not allowed
if (props.computeResources.bidPercentage !== undefined) {
throw new Error('Setting the bid percentage is only allowed for SPOT type resources on a batch compute environment');
}

// SPOT_CAPACITY_OPTIMIZED allocation is not allowed
if (props.computeResources.allocationStrategy && props.computeResources.allocationStrategy === AllocationStrategy.SPOT_CAPACITY_OPTIMIZED) {
throw new Error('The SPOT_CAPACITY_OPTIMIZED allocation strategy is only allowed if the environment is a SPOT type compute environment');
}
} else if (props.computeResources.type === ComputeResourceType.SPOT) {
// VALIDATE FOR SPOT

// Bid percentage must be from 0 - 100
if (props.computeResources.bidPercentage !== undefined &&
(props.computeResources.bidPercentage < 0 || props.computeResources.bidPercentage > 100)) {
throw new Error('Bid percentage can only be a value between 0 and 100');
}
}
}
}
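The rewritten `validateProps` splits into a Fargate branch, which rejects eleven EC2-only properties with `!== undefined` checks, and an EC2 branch holding the earlier `minvCpus`, bid-percentage, and `SPOT_CAPACITY_OPTIMIZED` checks. A minimal sketch of the same rules, assuming a hypothetical, simplified `ResourceInput` shape and a hypothetical `validateComputeResources` function; the Fargate checks are written as a lookup table here rather than the diff's sequence of `if` blocks:

```typescript
// Stand-in for ComputeResourceType, keyed by member name.
type ResourceType = 'ON_DEMAND' | 'SPOT' | 'FARGATE' | 'FARGATE_SPOT';

// Hypothetical, simplified subset of the ComputeResources props.
interface ResourceInput {
  type?: ResourceType;
  bidPercentage?: number;
  allocationStrategy?: string;
  desiredvCpus?: number;
  minvCpus?: number;
  maxvCpus?: number;
  image?: unknown;
  instanceTypes?: unknown[];
  ec2KeyPair?: string;
  instanceRole?: string;
  launchTemplate?: unknown;
  placementGroup?: string;
  spotFleetRole?: unknown;
}

// Properties rejected on Fargate, in the diff's order, with its error-message labels.
const FARGATE_FORBIDDEN: ReadonlyArray<readonly [keyof ResourceInput, string]> = [
  ['bidPercentage', 'Bid percentage'],
  ['allocationStrategy', 'Allocation strategy'],
  ['desiredvCpus', 'Desired vCPUs'],
  ['image', 'Image'],
  ['instanceTypes', 'Instance types'],
  ['ec2KeyPair', 'EC2 key pair'],
  ['instanceRole', 'Instance role'],
  ['launchTemplate', 'Launch template'],
  ['minvCpus', 'Min vCPUs'],
  ['placementGroup', 'Placement group'],
  ['spotFleetRole', 'Spot fleet role'],
];

function validateComputeResources(r: ResourceInput): void {
  if (r.type === 'FARGATE' || r.type === 'FARGATE_SPOT') {
    for (const [key, label] of FARGATE_FORBIDDEN) {
      // `!== undefined` means even falsy values such as minvCpus: 0 are rejected.
      if (r[key] !== undefined) {
        throw new Error(`${label} must not be set for Fargate compute environments`);
      }
    }
    return;
  }

  // ON_DEMAND and SPOT: the truthy check mirrors `if (props.computeResources.minvCpus)`.
  if (r.minvCpus) {
    if (r.minvCpus < 0) {
      throw new Error('Minimum vCpus for a batch compute environment cannot be less than 0');
    }
    if (r.maxvCpus && r.minvCpus > r.maxvCpus) {
      throw new Error('Minimum vCpus cannot be greater than the maximum vCpus');
    }
  }

  // As in the diff, an omitted `type` matches neither branch below.
  if (r.type === 'ON_DEMAND') {
    if (r.bidPercentage !== undefined) {
      throw new Error('Setting the bid percentage is only allowed for SPOT type resources on a batch compute environment');
    }
    if (r.allocationStrategy === 'SPOT_CAPACITY_OPTIMIZED') {
      throw new Error('The SPOT_CAPACITY_OPTIMIZED allocation strategy is only allowed if the environment is a SPOT type compute environment');
    }
  } else if (r.type === 'SPOT') {
    if (r.bidPercentage !== undefined && (r.bidPercentage < 0 || r.bidPercentage > 100)) {
      throw new Error('Bid percentage can only be a value between 0 and 100');
    }
  }
}
```

For example, `validateComputeResources({ type: 'FARGATE', minvCpus: 0 })` throws `Min vCPUs must not be set for Fargate compute environments`, because the Fargate branch tests `!== undefined`, whereas the EC2 branch only inspects a truthy `minvCpus`.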