Implement CloudFormation backend. #803

Merged
ejholmes merged 23 commits into master from cloudformation
May 3, 2016

@ejholmes
Contributor

ejholmes commented Apr 28, 2016

Fixes #556
Fixes #684
Probably fixes #665
Probably fixes #770
Can Fix #590 easily
Closes #560
Closes #578
Makes #630 trivial
Makes #706 trivial
Makes #723 trivial
Makes #796 trivial
Makes #797 trivial

WIP, so not everything is implemented yet (about 80-90% done).

The deeper I go into adding the extended Procfile, the more I feel like we're just reimplementing Terraform/CloudFormation within Empire, which is not what I want to spend my time doing. The only reason we didn't start with a CloudFormation backend is that CloudFormation didn't support ECS when we built Empire :).

This adds support for a CloudFormation backend, so that AWS resources for an app are managed entirely by CloudFormation. We just pass a scheduler.App to a template and get a CloudFormation stack that represents the application.
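To make the "pass an app to a template" idea concrete, here is a minimal sketch using Go's `text/template`. The `App` and `Process` types are trimmed-down, hypothetical stand-ins for `scheduler.App`, and the template is an illustrative fragment rather than the template in this PR (a deployable ECS service needs more properties, such as a task definition).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// App is a hypothetical, trimmed-down stand-in for scheduler.App.
type App struct {
	Name      string
	Processes []Process
}

// Process is a hypothetical process definition; each one becomes a resource.
type Process struct {
	Type      string
	Instances int
}

// stackTemplate is an illustrative fragment, not Empire's real template.
const stackTemplate = `{
  "Description": "Stack for {{.Name}}",
  "Resources": {
{{- range $i, $p := .Processes}}{{if $i}},{{end}}
    "{{$p.Type}}Service": {
      "Type": "AWS::ECS::Service",
      "Properties": {
        "DesiredCount": {{$p.Instances}}
      }
    }
{{- end}}
  }
}`

// Render executes the template against the app and checks that the
// result is at least syntactically valid JSON before returning it.
func Render(app App) (string, error) {
	t, err := template.New("stack").Parse(stackTemplate)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, app); err != nil {
		return "", err
	}
	if !json.Valid(buf.Bytes()) {
		return "", fmt.Errorf("rendered template for %q is not valid JSON", app.Name)
	}
	return buf.String(), nil
}

func main() {
	out, err := Render(App{
		Name: "acme",
		Processes: []Process{
			{Type: "web", Instances: 2},
			{Type: "worker", Instances: 1},
		},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Adding a new kind of resource then means editing the template text, which is the "update the stack template" workflow described in this PR.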

There are massive benefits to doing this:

  1. Way quicker to extend with new functionality. We just need to update the stack template, and boom.
  2. Resources will be named better. We should prefix the stack with the app name (and possibly an environment if provided) so that ECS services, load balancers, etc. are easier to find.
  3. Cleaning up after ourselves after destroying an application is super simple. Just delete the stack.
  4. We get an audit log of all the stack updates made.

The only disadvantage is we'll need to come up with a simple and safe migration plan (that's easy for other users of Empire as well), since this will require all new ECS/ELB resources for existing apps.

This will make almost everything we do inside Empire vastly simpler as we add features moving forward, so I think it's worth tackling now.

TODO

  • Name stacks based on the app name and environment.
  • Allocate instance ports.
  • Implement updating the existing stack.
  • Tests
  • Upload templates to S3
  • Update AWS SDK
  • Store stack name in a transaction.
  • Implement run.
  • Create service role.
  • When deleting, wait for stack to stabilize.
  • Prefix stack name.
  • Store template based on app name.
  • Parameterize scaling.
  • Set environment tag.
  • Update CHANGELOG.
  • Auto-subscribe SQS queue to SNS topic
  • Don't start custom resources server if SQS queue isn't set.
  • Error out of CloudFormation scheduler if SQS queue isn't set.
  • Error out of template if there are any duplicate process names.
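For the stack naming items in the list above, a rough sketch of what a naming helper might look like. The `StackName` function and the `environment-app` ordering are assumptions for illustration. The validation pattern reflects CloudFormation's documented stack name rules (starts with a letter, then letters, digits and hyphens, at most 128 characters).

```go
package main

import (
	"fmt"
	"regexp"
)

// validStackName mirrors CloudFormation's stack name constraints:
// a leading letter, then letters, digits or hyphens, 128 characters max.
var validStackName = regexp.MustCompile(`^[A-Za-z][A-Za-z0-9-]{0,127}$`)

// StackName is a hypothetical helper. It builds a stack name from an
// optional environment and the app name, and rejects names that
// CloudFormation would refuse.
func StackName(environment, app string) (string, error) {
	name := app
	if environment != "" {
		name = environment + "-" + app // ordering is an assumption
	}
	if !validStackName.MatchString(name) {
		return "", fmt.Errorf("invalid CloudFormation stack name %q", name)
	}
	return name, nil
}

func main() {
	for _, app := range []string{"acme", "9lives", "my_app"} {
		name, err := StackName("production", app)
		fmt.Println(name, err)
	}
	// Without an environment, the app name itself must be a valid stack name.
	fmt.Println(StackName("", "9lives"))
}
```

Validating up front surfaces a bad name as a clear error before any stack operation is attempted.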

@ejholmes
Contributor Author

Regarding migration plans, my initial thought is that we just run the old and new backends in parallel, but only use the new backend for newly created apps. Once we're comfortable with it, we should have a mostly (or entirely) automated way to schedule an existing release using the new scheduler, then destroy the old resources using the old scheduler. In pseudocode, something like:

func Migrate(release) {
  cloudformationScheduler.Submit(release)
  // Wait 30 minutes or something
  oldECSScheduler.Remove(release.App.ID)
}
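A slightly more concrete Go version of the pseudocode above might look like the following. The `Scheduler` interface, the `Release` and `App` types, and the `fake` scheduler are simplified assumptions for illustration, not Empire's actual types. The key property is that the old resources are only removed if the new scheduler accepted the release.

```go
package main

import (
	"fmt"
	"time"
)

// App and Release are hypothetical, minimal stand-ins for Empire's types.
type App struct{ ID string }
type Release struct{ App App }

// Scheduler is an assumed minimal interface shared by both backends.
type Scheduler interface {
	Submit(r Release) error
	Remove(appID string) error
}

// Migrate submits the release to the new scheduler, waits for a grace
// period, then removes the app from the old scheduler. If the submit
// fails, the old resources are left untouched.
func Migrate(r Release, next, old Scheduler, grace time.Duration) error {
	if err := next.Submit(r); err != nil {
		return fmt.Errorf("submit to new scheduler: %w", err)
	}
	time.Sleep(grace) // e.g. 30 minutes in a real migration
	if err := old.Remove(r.App.ID); err != nil {
		return fmt.Errorf("remove from old scheduler: %w", err)
	}
	return nil
}

// fake records calls so the ordering can be observed without AWS.
type fake struct {
	name  string
	calls *[]string
	err   error
}

func (f fake) Submit(r Release) error {
	*f.calls = append(*f.calls, f.name+".Submit")
	return f.err
}

func (f fake) Remove(appID string) error {
	*f.calls = append(*f.calls, f.name+".Remove")
	return f.err
}

func main() {
	var calls []string
	r := Release{App: App{ID: "1234"}}
	err := Migrate(r,
		fake{name: "cloudformation", calls: &calls},
		fake{name: "ecs", calls: &calls},
		0, // no grace period for the demo
	)
	fmt.Println(calls, err)
}
```

In practice the fixed sleep would probably be replaced by a check that the new stack has stabilized before the old resources are destroyed.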

@@ -301,6 +301,7 @@
{
"Effect": "Allow",
"Action": [
"cloudformation:*",
Contributor Author

When this is ready, need to determine exact permissions that Empire needs (and maybe we can lock this down so it can only access stacks that it created).

subnets = t.ExternalSubnetIDs
}

instancePort := int64(9000) // TODO: Allocate a port
Contributor Author

@ejholmes ejholmes Apr 29, 2016

One thing that is a little more challenging in a CloudFormation world is instance port allocation. Because we no longer have direct control over creating and deleting load balancers, it's hard to allocate and release ports for them.

I was thinking that we could just add a custom resource with a lambda function that allocates and releases ports, so it's just native CloudFormation (and can be re-used outside of Empire).

Contributor Author

Or maybe better yet, use SNS backed custom resources and have Empire listen on an SQS queue to provision port allocations. Keeps everything more self contained.
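As a sketch of the port allocation idea, the core of such a custom resource could be a handler that reacts to the `RequestType` CloudFormation sends (`Create`, `Update` or `Delete`) and uses the allocated port as the physical resource ID. Everything else here is an assumption: the `Allocator` type, the in-memory map (a real implementation would need durable storage) and the 9000-10000 range. Receiving the request from SQS and sending the result to the request's response URL are omitted.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"sync"
)

// Request holds the two custom resource request fields this sketch needs.
type Request struct {
	RequestType        string // "Create", "Update" or "Delete"
	PhysicalResourceID string // empty on Create
}

// Allocator is a hypothetical in-memory instance port allocator.
type Allocator struct {
	mu       sync.Mutex
	min, max int64
	used     map[int64]bool
}

func NewAllocator(min, max int64) *Allocator {
	return &Allocator{min: min, max: max, used: make(map[int64]bool)}
}

// Handle allocates a port on Create, keeps it on Update and releases it
// on Delete. The returned string is the physical resource ID (the port).
func (a *Allocator) Handle(req Request) (string, error) {
	a.mu.Lock()
	defer a.mu.Unlock()

	switch req.RequestType {
	case "Create":
		for p := a.min; p <= a.max; p++ {
			if !a.used[p] {
				a.used[p] = true
				return strconv.FormatInt(p, 10), nil
			}
		}
		return "", errors.New("no free instance ports")
	case "Update":
		return req.PhysicalResourceID, nil
	case "Delete":
		p, err := strconv.ParseInt(req.PhysicalResourceID, 10, 64)
		if err != nil {
			return "", fmt.Errorf("bad physical resource id %q: %w", req.PhysicalResourceID, err)
		}
		delete(a.used, p)
		return req.PhysicalResourceID, nil
	default:
		return "", fmt.Errorf("unknown request type %q", req.RequestType)
	}
}

func main() {
	a := NewAllocator(9000, 10000)
	first, _ := a.Handle(Request{RequestType: "Create"})
	second, _ := a.Handle(Request{RequestType: "Create"})
	a.Handle(Request{RequestType: "Delete", PhysicalResourceID: first})
	reused, _ := a.Handle(Request{RequestType: "Create"})
	fmt.Println(first, second, reused)
}
```

Because the port doubles as the physical resource ID, the delete request carries everything needed to release it when a stack is deleted.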

@ejholmes
Contributor Author

Ok, once #811 is merged, this is feature complete and has all the existing functionality of the old scheduler. There may be some minor bugs to fix, but this works beautifully in my local tests.

Once this is merged, I'll work on an automated migration scheduler that will migrate apps from the old scheduler to the new one.

@ejholmes
Contributor Author

Also, it's worth mentioning that this is currently only enabled when the --scheduler flag is explicitly set to cloudformation. The old scheduler will still be the default for the time being.
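To illustrate the opt-in behaviour, here is a rough sketch of flag-based backend selection using Go's standard `flag` package. This is not how Empire's CLI is actually wired up. The `SchedulerName` function and the `ecs` default value are assumptions. Only the `--scheduler` flag and the `cloudformation` value come from this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// SchedulerName parses a --scheduler flag from args. The "ecs" default
// is an assumed name for the old backend; "cloudformation" opts in to
// the new one.
func SchedulerName(args []string) (string, error) {
	fs := flag.NewFlagSet("empire", flag.ContinueOnError)
	name := fs.String("scheduler", "ecs", "scheduling backend: ecs or cloudformation")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *name {
	case "ecs", "cloudformation":
		return *name, nil
	default:
		return "", fmt.Errorf("unknown scheduler %q", *name)
	}
}

func main() {
	fmt.Println(SchedulerName(nil))
	fmt.Println(SchedulerName([]string{"--scheduler", "cloudformation"}))
	fmt.Println(SchedulerName([]string{"--scheduler", "nomad"}))
}
```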


_, err = s.s3.PutObject(&s3.PutObjectInput{
Bucket: aws.String(s.Bucket),
Key: aws.String(fmt.Sprintf("/%s", key)),
Contributor

since we have app id, i would include that in this key.

i could see it being useful to view different stack "versions" for a specific app.

Contributor Author

Good call. I'll update.

Contributor Author

Done in 84f55e5. The template is now stored in a folder prefixed with the app name and id.
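For illustration only, a key scheme along these lines could group every template version for an app under one prefix. The `TemplateKey` helper, the `name/id/hash.json` layout and the use of a SHA-256 content hash are assumptions, not necessarily what 84f55e5 does.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"path"
)

// TemplateKey is a hypothetical helper. It places each template under a
// per-app prefix and names the object after a hash of its content, so
// identical templates map to the same key and each change gets a new one.
func TemplateKey(appName, appID string, body []byte) string {
	sum := sha256.Sum256(body)
	return path.Join(appName, appID, fmt.Sprintf("%x.json", sum))
}

func main() {
	v1 := TemplateKey("acme", "1234", []byte(`{"Resources": {}}`))
	v2 := TemplateKey("acme", "1234", []byte(`{"Resources": {"web": {}}}`))
	fmt.Println(v1)
	fmt.Println(v2)
}
```

Listing the bucket by the `acme/1234/` prefix would then show every stack "version" uploaded for that app.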

@mwildehahn
Contributor

👍 all of this looks awesome

@ejholmes
Contributor Author

ejholmes commented May 3, 2016

Ok. Gonna go ahead and merge this in. This won't become the default scheduling backend until there's an easy way to migrate from the old backend to the new one.

My hope is that this becomes the default backend in the next release, with an automated migrator. Then we can remove the old backend, which frees us up to move faster on extended Procfile.

There will probably be some minor bugs to fix in this, but we can deal with those as we see them.

@ejholmes ejholmes merged commit c228d57 into master May 3, 2016
@ejholmes ejholmes deleted the cloudformation branch May 3, 2016 03:38
@dekz

dekz commented May 4, 2016

Eric this is so incredibly dope. I totally agree with your assessment with the extended Procfile and am glad Empire didn't go down that route.

@ejholmes
Contributor Author

ejholmes commented May 4, 2016

@dekz 🤘 super excited about this as well. Opens up a lot of possibilities for Empire.
