Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Support of managing subnets on AWS Wavelength Zones #4901

Merged
merged 5 commits into from
Apr 26, 2024

Conversation

mtulio
Copy link
Contributor

@mtulio mtulio commented Apr 3, 2024

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR implements support of managed subnets and carrier gateway for AWS Wavelength zones. Feature request #4874 .

There API is changed to introduce the following fields:

  • VPCSpec.CarrierGatewayID: representation of Carrier Gateway resource ("internet gateway" for AWS Wavelength Zones)
  • SubnetSpec.ZoneType: representation of subnet's zone type
  • SubnetSpec.ParentZoneName: representation of subnet's parent zone name (an availability zone in the Region which the edge zone is tied)

The subnets in AWS Local Zones and Wavelength Zones are not eligible to create core components for the cluster, like NAT Gateway, Control Plane nodes, and Network Load Balancers, keeping compatibility with existing flow when edge subnets are added.

To create subnets in edge zones, the subnet must be added for each zone you want to create the subnet in NetworkSpec.Subnets. For example to create subnets in Wavelength Zone us-east-1-wl1-nyc-wlz-1, set:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSCluster
metadata:
  name: aws-cluster-wavelength
spec:
  region: us-east-1
  networkSpec:
    vpc:
      cidrBlock: "10.0.0.0/20"
    subnets:
    # regular zones (availability zones. Eg us-east-1a, us-east-1b...)
    [...]
    # Subnets in Wavelength Zones of New York location (public and private)
    - availabilityZone: us-east-1-wl1-nyc-wlz-1
      cidrBlock: "10.0.128.0/25"
      id: "cluster-subnet-private-us-east-1-wl1-nyc-wlz-1"
      isPublic: false
    - availabilityZone: us-east-1-wl1-nyc-wlz-1
      cidrBlock: "10.0.128.128/25"
      id: "cluster-subnet-public-us-east-1-wl1-nyc-wlz-1"
      isPublic: true

This PR is a super set and includes it's dependencies, isolated on each PRs (and it is blocked by those):

Which issue(s) this PR fixes *

Ref #4874

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emojis
  • adds unit tests
  • adds or updates e2e tests

Release note:

action required
Support deploying network requirements, subnets and carrier gateway, in AWS Wavelength Zones. This introduces new required IAM permissions. If you have an existing stack you will need to update it with clusterawsadm bootstrap iam create-cloudformation-stack

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 3, 2024
@k8s-ci-robot k8s-ci-robot added needs-priority needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 3, 2024
@k8s-ci-robot
Copy link
Contributor

Hi @mtulio. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

api/v1beta2/network_types.go Outdated Show resolved Hide resolved
@mtulio mtulio force-pushed the CORS-3214-wavelength-zones branch 2 times, most recently from d545d8d to edeba21 Compare April 4, 2024 19:30
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 4, 2024
@mtulio mtulio force-pushed the CORS-3214-wavelength-zones branch 3 times, most recently from 939ef72 to 73f5db7 Compare April 8, 2024 14:53
@rvanderp3
Copy link

/test pull-cluster-api-provider-aws-test

@mtulio mtulio force-pushed the CORS-3214-wavelength-zones branch from 73f5db7 to 2b0fe5b Compare April 8, 2024 17:39
@rvanderp3
Copy link

/test pull-cluster-api-provider-aws-build

@rvanderp3
Copy link

/test pull-cluster-api-provider-aws-test

@mtulio mtulio force-pushed the CORS-3214-wavelength-zones branch from 1b9c633 to 0e6b4dc Compare April 23, 2024 20:11
@mtulio
Copy link
Contributor Author

mtulio commented Apr 23, 2024

@richardcase , feedback addressed. 👍🏽

@mtulio
Copy link
Contributor Author

mtulio commented Apr 23, 2024

e2e finished successfully in the first run after rebase. Triggering again after adding cloudformation permissions.

/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-eks

/assign @richardcase

@mtulio
Copy link
Contributor Author

mtulio commented Apr 24, 2024

e2e job 1782910662712758272 failed, looks like timeout.

/test pull-cluster-api-provider-aws-e2e

@richardcase
Copy link
Member

e2e job 1782910662712758272 failed, looks like timeout.

It passed on a re-run

@mtulio
Copy link
Contributor Author

mtulio commented Apr 24, 2024

e2e job 1782910662712758272 failed, looks like timeout.

It passed on a re-run

🎉 Perfect! Thanks!

@richardcase Let me know if you are ok with the current changes.

My plan to fully address the #4874, aligned w/ we discussed in last community meeting, is to create individual e2e tests for Local Zones and Wavelength Zones (or mixed to save infra/runtime) in follow up PRs. cc @damdo @nrb

Create a dedicated document, "topic", with instructions to deploy
network infrastructure (subnets, gateways and route tables) in "edge
zones" - Local Zone and Wavelength Zone infrastructure.
    This change introduce support of required network components to deploy
    subnets on AWS Wavelength Zones infrastructure.

    The NetworkSpec API handles the CarrierGatewayId on NetworkSpec with
    the unique identifier of Carrier Gateway resource attached to the VPC.
    Subnets in AWS Wavelength Zone is a classified as a type of
    edge subnets, not used to create regular control plane resources, like
    nodes, NAT Gateways or API Load Balancers.

    The ZoneType is used to group the zones from regular and the edge zones.
    Regular zones are with type 'availability-zone', and the edge zones are
    types 'local-zone' and 'wavelength-zone'.

    The following statements are valid for edge subnets:
    - private subnets supports egress traffic only using NAT Gateway in the
      region.
    - public subnets in Wavelength must be attached to a route table with
      valid Carrier Gateway as a default route.
    - public subnets in Wavelength zones does not support map public ip on
      launch flag, instead, the runInstance must set the network interface
      flag to assign public ip from carrier gateway
    - IPv6 subnets is not supported in edge zones
    - subnet tags for load balancer are not set in edge subnets. Edge
      subnets should not be elected by CCM to create service load balancers.
      Use ALB ingress instead
✨ edge subnets/cagw: carrier gateway for public subnets in Wavelength

Introduce Carrier Gateway resource reconciliator in the network service.

Carrier Gateway is the gateway responsible to route ingress and egress
traffic **in/out the Wavelength Zone**, located in the Carrier
Infrastructure - communications service providers’ (CSP) 5G networks.

Carrier Gateway is similar Internet Gatewat resource, responsible for
the network border groups in the Region and Local Zones for public
subnets.

✨ edge subnets/routes: supporting custom routes for Wavelength

For private and public subnets in edge zones, the following changes is
introduced according to each rule:

General:

- IPv6 subnets is not be supported in AWS Local Zones and Wavelength
  zone, consequently no ip6 routes will be created
- nat gateways is not supported, default gateway's route for private
  subnets will use nat gateways from the zones in the Region
(availability-zone's zone type)
- one route table by zone's role by zone (standard flow)

Private tables for Local Zones and Wavelength:
- default route's gateways is assigned using nat gateway created in
  the region (availability-zones).

Public tables for Wavelength zones:
- default route's gateways is assigned using Carrier Gateway, resource
  introduced in the edge zone's feature.

The changes in the standard flow (without edge subnets' support) was
isolated in the PR kubernetes-sigs#4900
Add IAM policy on cloudformation templates for clusterawsadm to
manipulate gateways in Wavelength zone: carrier gateway.
@mtulio mtulio force-pushed the CORS-3214-wavelength-zones branch from 0e6b4dc to 2270604 Compare April 24, 2024 18:42
@mtulio
Copy link
Contributor Author

mtulio commented Apr 24, 2024

Just rebased to get the most recent fixes from main - used in external projects to trigger e2e. cc @nrb @damdo

@mtulio
Copy link
Contributor Author

mtulio commented Apr 25, 2024

@richardcase Let me know if you are ok with the current changes.

My plan to fully address the #4874, aligned w/ we discussed in last community meeting, is to create individual e2e tests for Local Zones and Wavelength Zones (or mixed to save infra/runtime) in follow up PRs. cc @damdo @nrb

Hi @richardcase - let me share the e2e tests we are running on OpenShift using this version and a few details about how the e2e for edge zones is built to hear from you if it makes sense to use a similar approach on CAPA.

e2e results on OpenShift using this PR/CAPA version:

e2e details - the "edge subnets" CI workflow is built on openshift using the flow similar to this:

  • pre
    • one edge zone is randomly selected from DescribeAvailabilityZones(AllZones==true)
    • the zone attribute OptInStatus is checked, and if the status is not-opted-in the zone's zone group is modified to opted-in
    • the zone is ready to create subnets
  • if managed: the SubnetSpec will be created setting the public and private subnets for each regular zone, and the selected edge zone
  • if un-managed/BYO VPC: the required network components are created for unmanaged/BYO net (VPC, subnets, rtbs, gateways, etc), and the SubnetSpec will be created with subnet Ids, and respective zones
  • CAPA install
  • e2e run
  • Tear down:
    • cluster destroyed
    • if un-managed/BYO VPC, the cloudformation stacks will be deleted

Note 0: the edge zone is randomly selected to increase coverage and dynamically test new zones while AWS is adding it.
Note 1: the zone group isn't modified (opted out) to the old state as AWS requires to open a ticket to do it. There is no additional cost to keep those enabled or create subnets and carrier gateways.
Note 2: Considering "edge subnets" feature can include Local Zones (LZ) and Wavelength Zones (WL), we also created a single workflow randomly selecting one zone LZ and WL each to test creating a cluster and running e2e in a single flow, saving test time and infra costs.

cc @nrb @r4f4 @damdo

@nrb
Copy link
Contributor

nrb commented Apr 25, 2024

Thanks for the details on how OpenShift is testing this! It's good to know there's downstream testing of this feature, though of course we'd like to see similar tests added to this repo as well.

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 25, 2024
@richardcase
Copy link
Member

@mtulio - just to let you know that i updated the "release note" section to mark it as action required.

@richardcase
Copy link
Member

/test pull-cluster-api-provider-aws-e2e-eks

@richardcase
Copy link
Member

From my side:

/approve

When the e2e passes then we can unhold to merge.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: richardcase

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 26, 2024
@nrb
Copy link
Contributor

nrb commented Apr 26, 2024

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 26, 2024
@k8s-ci-robot k8s-ci-robot merged commit ac49580 into kubernetes-sigs:main Apr 26, 2024
24 checks passed
@mtulio mtulio deleted the CORS-3214-wavelength-zones branch April 26, 2024 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants