This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Support for Standard LB in k8s v1.11 #3515

Merged (1 commit) on Aug 15, 2018


sozercan
Member

@sozercan sozercan commented Jul 19, 2018

What this PR does / why we need it:
Adds support for Standard LB for Kubernetes v1.11

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #3468

Release note:

Support for Standard Load Balancer and Public IP in Kubernetes 1.11

@ghost ghost assigned sozercan Jul 19, 2018
@ghost ghost added the in progress label Jul 19, 2018
@sozercan sozercan force-pushed the loadBalancerSku branch 2 times, most recently from 63e9936 to ee14ae9 Compare July 20, 2018 03:46
@sozercan sozercan changed the title [WIP] Support for Standard LB in k8s v1.11 Support for Standard LB in k8s v1.11 Jul 20, 2018
@feiskyer
Member

feiskyer commented Jul 20, 2018

@sozercan Thanks for adding this. We may only apply the options for master nodes, as they're only used in kube-controller-manager or cloud-controller-manager.

@sozercan
Member Author

@feiskyer I can add it to provisionScriptParametersMaster only, but azure.json is the same on masters and agents: https://github.com/Azure/acs-engine/blob/master/parts/k8s/kubernetescustomscript.sh#L230

@feiskyer
Member

Yep, I see. It's also ok to set it in node's configs.
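
A minimal sketch of the point above, assuming the two settings reach azure.json under the same key names used for the template variables in this PR (`loadBalancerSku`, `excludeMasterFromStandardLB`). The struct and the `renderFragment` helper are hypothetical and model only these two fields; a real azure.json carries many more. Because the file is identical on masters and agents, the same fragment is rendered for both roles:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cloudConfigFragment models only the two settings discussed in this PR.
// Hypothetical type: a real azure.json has many more fields.
type cloudConfigFragment struct {
	LoadBalancerSku             string `json:"loadBalancerSku,omitempty"`
	ExcludeMasterFromStandardLB *bool  `json:"excludeMasterFromStandardLB,omitempty"`
}

// renderFragment returns the JSON for the fragment. The same output would be
// written on masters and agents, since both receive an identical azure.json.
func renderFragment(sku string, excludeMaster *bool) (string, error) {
	out, err := json.MarshalIndent(cloudConfigFragment{
		LoadBalancerSku:             sku,
		ExcludeMasterFromStandardLB: excludeMaster,
	}, "", "  ")
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	exclude := true
	for _, role := range []string{"master", "agent"} {
		s, err := renderFragment("standard", &exclude)
		if err != nil {
			panic(err)
		}
		fmt.Printf("# %s\n%s\n", role, s)
	}
}
```

Agents simply carry values they never read, which matches the conclusion that setting them in the node's config is harmless.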

@codecov

codecov bot commented Jul 20, 2018

Codecov Report

Merging #3515 into master will decrease coverage by 0.06%.
The diff coverage is 48.27%.

@@            Coverage Diff             @@
##           master    #3515      +/-   ##
==========================================
- Coverage   55.46%   55.39%   -0.07%     
==========================================
  Files         108      108              
  Lines       16069    16098      +29     
==========================================
+ Hits         8912     8918       +6     
- Misses       6389     6413      +24     
+ Partials      768      767       -1

| serviceCidr | no | IP range for Service IPs, Default is "10.0.0.0/16". This range is never routed outside of a node so does not need to lie within clusterSubnet or the VNET |
| useInstanceMetadata | no | Use the Azure cloudprovider instance metadata service for appropriate resource discovery operations. Default is `true` |
| useManagedIdentity | no | Includes and uses MSI identities for all interactions with the Azure Resource Manager (ARM) API. Instead of using a static service principal written to /etc/kubernetes/azure.json, Kubernetes will use a dynamic, time-limited token fetched from the MSI extension running on master and agent nodes. This support is currently alpha and requires Kubernetes v1.9.1 or newer. (boolean - default == false) |
| loadBalancerSku | no | Sku of Load Balancer and Public IP. Candidate values are: `basic` and `standard`. If not set, it will be default to basic. Requires Kubernetes 1.11 or newer. |
Contributor

Only standard requires 1.11 correct? "standard requires Kubernetes 1.11 or newer" might be more clear
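
One way to encode the rule suggested here ("`standard` requires Kubernetes 1.11 or newer", while `basic` has no version constraint) is a small validation step. This is a sketch only: `validateLoadBalancerSku` and `atLeast` are hypothetical names, not functions from acs-engine, and the version parsing is deliberately simplistic.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a "major.minor[.patch]" version string is at least
// the given major.minor. Hypothetical helper for this sketch only.
func atLeast(version string, major, minor int) (bool, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	mnr, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("unrecognized version %q", version)
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

// validateLoadBalancerSku accepts "basic" or "standard" (or empty, meaning
// the default) and rejects "standard" on clusters older than 1.11.
func validateLoadBalancerSku(sku, k8sVersion string) error {
	switch strings.ToLower(sku) {
	case "", "basic":
		return nil
	case "standard":
		ok, err := atLeast(k8sVersion, 1, 11)
		if err != nil {
			return err
		}
		if !ok {
			return fmt.Errorf("loadBalancerSku %q requires Kubernetes 1.11 or newer, got %s", sku, k8sVersion)
		}
		return nil
	default:
		return fmt.Errorf("invalid loadBalancerSku %q: candidate values are basic and standard", sku)
	}
}

func main() {
	for _, v := range []string{"1.10.6", "1.11.1"} {
		fmt.Printf("standard on %s: %v\n", v, validateLoadBalancerSku("standard", v))
	}
}
```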

| useInstanceMetadata | no | Use the Azure cloudprovider instance metadata service for appropriate resource discovery operations. Default is `true` |
| useManagedIdentity | no | Includes and uses MSI identities for all interactions with the Azure Resource Manager (ARM) API. Instead of using a static service principal written to /etc/kubernetes/azure.json, Kubernetes will use a dynamic, time-limited token fetched from the MSI extension running on master and agent nodes. This support is currently alpha and requires Kubernetes v1.9.1 or newer. (boolean - default == false) |
| loadBalancerSku | no | Sku of Load Balancer and Public IP. Candidate values are: `basic` and `standard`. If not set, it will be default to basic. Requires Kubernetes 1.11 or newer. |
| excludeMasterFromStandardLB | no | Excludes master nodes from standard load balancer. Default is `true`. Requires Kubernetes 1.11 or newer. |
Contributor

Same thing here, which part requires 1.11? If the default is always true then what happens if the user is on k8s 1.8?

Member Author

I believe it'll just be ignored by k8s v1.10 or earlier (since anything but v1.11 doesn't check for those values). @feiskyer can you confirm?

@@ -65,6 +65,8 @@ $global:KubeDnsSearchPath = "svc.cluster.local"

$global:UseManagedIdentityExtension = "{{WrapAsVariable "useManagedIdentityExtension"}}"
$global:UseInstanceMetadata = "{{WrapAsVariable "useInstanceMetadata"}}"
$global:LoadBalancerSku = "{{WrapAsVariable "loadBalancerSku"}}"
Contributor

@PatrickLang could you please review this file?

Contributor

Sure - the changes here look good to me. I don't see any Windows-specific things changing. The only questions from me are whether this needs an azure-cni update and if the cloud provider already supports these in K8s 1.9 - 1.11

Contributor

This code wouldn't need to change, just checking that those are updated if required when this is merged

Member Author

@PatrickLang k8s v1.11 is required for this. I tested it on a Windows pool and it successfully created a standard LB and public IP.

Contributor

Thanks @PatrickLang

| useInstanceMetadata | no | Use the Azure cloudprovider instance metadata service for appropriate resource discovery operations. Default is `true` |
| useManagedIdentity | no | Includes and uses MSI identities for all interactions with the Azure Resource Manager (ARM) API. Instead of using a static service principal written to /etc/kubernetes/azure.json, Kubernetes will use a dynamic, time-limited token fetched from the MSI extension running on master and agent nodes. This support is currently alpha and requires Kubernetes v1.9.1 or newer. (boolean - default == false) |
| loadBalancerSku | no | Sku of Load Balancer and Public IP. Candidate values are: `basic` and `standard`. If not set, defaults to `basic`. `standard` requires Kubernetes 1.11 or newer. |
| excludeMasterFromStandardLB | no | Excludes master nodes from standard load balancer. Requires `loadBalancerSku` to be set to `standard` and Kubernetes v1.11 or newer. Default is `true`. |
Contributor

I still think this default behavior is counterintuitive: we always default to true even though that is only supported for k8s 1.11 with standard LB, which means we are defaulting to an unsupported option.

Member Author

Should we flip the setting around to includeMasterInStandardLB? @feiskyer thoughts?

Member

Could we set excludeMasterFromStandardLB only when it is configured explicitly by the user? The default true is recommended.

includeMasterInStandardLB is actually the same as excludeMasterFromStandardLB, just with a different default value. Since it's a k8s configuration option, it's better to keep it as is.

Contributor

Could the default behavior be: if loadBalancerSku == standard => true, Else => false

Contributor

@PatrickLang PatrickLang left a comment

Windows change looks good to me. The tests in place do cover ingress to a Windows service so be sure to update the apimodel to test each mode before merge

@PatrickLang
Contributor

I looked at the section @CecileRobertMichon recommended, but didn't review the rest

@feiskyer
Member

cc @khenidak

@jackfrancis
Member

code lgtm, pending rebase and E2E

@sozercan
Member Author

@jackfrancis rebased

@khenidak
Contributor

The only comment I have: when zones are enabled, we will have to validate that this SKU is set by default.

cc @ritazh @kkmsft

@ritazh
Member

ritazh commented Jul 25, 2018

@khenidak yup

	}

	if a.OrchestratorProfile.KubernetesConfig.ExcludeMasterFromStandardLB == nil {
		a.OrchestratorProfile.KubernetesConfig.ExcludeMasterFromStandardLB = helpers.PointerToBool(api.DefaultExcludeMasterFromStandardLB)
Member

As @CecileRobertMichon suggested, let's convert the default value assignment of ExcludeMasterFromStandardLB to a function. The function should look like this:

  • is this a 1.11 or greater cluster that is using "Standard" LB?
    • if so, set to true (based on comments from @feiskyer)
    • if not, false
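
The function described in the list above could look roughly like the following. This is a sketch under stated assumptions, not the code that was merged: `defaultExcludeMasterFromStandardLB`, `applyDefault`, and `isAtLeast111` are hypothetical names, and the version check is a simplified major.minor comparison.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isAtLeast111 reports whether version ("1.11.1", "v1.12.0", ...) is 1.11 or
// newer. Unparseable versions are treated as too old. Hypothetical helper.
func isAtLeast111(version string) bool {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return false
	}
	return major > 1 || (major == 1 && minor >= 11)
}

// defaultExcludeMasterFromStandardLB returns true only for a 1.11+ cluster
// that uses the Standard LB, and false in every other case.
func defaultExcludeMasterFromStandardLB(loadBalancerSku, k8sVersion string) bool {
	return strings.EqualFold(loadBalancerSku, "standard") && isAtLeast111(k8sVersion)
}

// applyDefault keeps an explicit user setting and fills in the computed
// default only when the value was left unset (nil).
func applyDefault(userValue *bool, loadBalancerSku, k8sVersion string) *bool {
	if userValue != nil {
		return userValue
	}
	d := defaultExcludeMasterFromStandardLB(loadBalancerSku, k8sVersion)
	return &d
}

func main() {
	fmt.Println(*applyDefault(nil, "standard", "1.11.1")) // 1.11 with Standard LB
	fmt.Println(*applyDefault(nil, "basic", "1.11.1"))    // Basic LB
	fmt.Println(*applyDefault(nil, "standard", "1.10.6")) // older cluster
}
```

Computing the default from the SKU and version avoids the concern raised earlier in the review, where `true` would be the default even on configurations that do not support the option.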

@jackfrancis
Member

Our E2E tests tell us that ILB scenarios break w/ standard LB-enabled clusters. @feiskyer do you have an upstream bug that this may be related to?

@ritazh
Member

ritazh commented Aug 3, 2018

This is a prereq for #3453.

@feiskyer
Member

feiskyer commented Aug 5, 2018

> Our E2E tests tell us that ILB scenarios break w/ standard LB-enabled clusters. @feiskyer do you have an upstream bug that this may be related to?

Which region are you using? Could you verify whether it's a Standard LB issue (e.g. without k8s and with floatingIP enabled)? I have seen that before: Standard LB with floatingIP enabled may not work in some regions.

@sozercan
Member Author

sozercan commented Aug 6, 2018

@feiskyer acs-engine CI failed on southcentralus, westcentralus, southeastasia, westeurope (it failed on all 4 runs)

This is the failing test, which checks for a working ILB:
https://github.com/Azure/acs-engine/blob/master/test/e2e/kubernetes/kubernetes_test.go#L503
https://github.com/Azure/acs-engine/blob/master/test/e2e/kubernetes/workloads/ingress-nginx-ilb.yaml

@feiskyer
Member

feiskyer commented Aug 7, 2018

@sozercan Thanks. So it's a standard ILB issue, which disables external connectivity by default.

@jackfrancis
Member

@sozercan @feiskyer What are the next steps here?

@khenidak
Contributor

khenidak commented Aug 8, 2018

Hold, we are in talks with Networking team for clarity on what we can do. Should resolve this week

@ritazh
Member

ritazh commented Aug 13, 2018

@khenidak @jackfrancis I've added the elb service as a workaround. PTAL
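
For readers unfamiliar with the workaround, here is a rough sketch of what an "elb service" might look like. It assumes the term means a Kubernetes Service of type `LoadBalancer` without the internal-LB annotation, so the Standard LB gets a public frontend; that reading is an interpretation of this thread, not something the PR states. The Go types are hypothetical stand-ins for the real Kubernetes API types, the service name is made up, and the port, target port, and selector values are taken from the diff fragment under review.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal, hypothetical stand-ins for the Kubernetes Service schema, limited
// to the fields this sketch needs.
type servicePort struct {
	Port       int `json:"port"`
	TargetPort int `json:"targetPort"`
}

type serviceSpec struct {
	Type     string            `json:"type"`
	Ports    []servicePort     `json:"ports"`
	Selector map[string]string `json:"selector"`
}

type metadata struct {
	Name string `json:"name"`
}

type service struct {
	APIVersion string      `json:"apiVersion"`
	Kind       string      `json:"kind"`
	Metadata   metadata    `json:"metadata"`
	Spec       serviceSpec `json:"spec"`
}

// externalLBService builds a public LoadBalancer Service. No internal-LB
// annotation is set, so the cloud provider would create a public frontend.
func externalLBService(name string) service {
	return service{
		APIVersion: "v1",
		Kind:       "Service",
		Metadata:   metadata{Name: name},
		Spec: serviceSpec{
			Type:     "LoadBalancer",
			Ports:    []servicePort{{Port: 8765, TargetPort: 9376}},
			Selector: map[string]string{"app": "random"},
		},
	}
}

func main() {
	// "elb-workaround" is a made-up name for illustration.
	out, err := json.MarshalIndent(externalLBService("elb-workaround"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // JSON manifest; kubectl accepts JSON as well as YAML
}
```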

  - port: 8765
    targetPort: 9376
  selector:
    app: random
Member Author

Are you keeping this as random?

@@ -53,6 +53,7 @@ Here are the valid values for the orchestrator types:
| gcLowThreshold | no | Sets the --image-gc-low-threshold value on the kublet configuration. Default is 80. [See kubelet Garbage Collection](https://kubernetes.io/docs/concepts/cluster-administration/kubelet-garbage-collection/) |
| kubeletConfig | no | Configure various runtime configuration for kubelet. See `kubeletConfig` [below](#feat-kubelet-config) |
| kubernetesImageBase | no | Specifies the base URL (everything preceding the actual image filename) of the kubernetes hyperkube image to use for cluster deployment, e.g., `k8s.gcr.io/` |
| loadBalancerSku | no | Sku of Load Balancer and Public IP. Candidate values are: `basic` and `standard`. If not set, it will be default to basic. Requires Kubernetes 1.11 or newer. |
Contributor

Please add documentation for LB outbound rule workaround + what is expected in v1.12

Contributor

@khenidak khenidak left a comment

Missing documentation for the outbound rule workaround.

@ritazh ritazh force-pushed the loadBalancerSku branch 3 times, most recently from f2806f1 to ad88ed6 Compare August 14, 2018 22:53
@ritazh
Member

ritazh commented Aug 14, 2018

@khenidak @jackfrancis @sozercan doc added, rebased, ILB e2e test on jenkins passed, PTAL

@@ -67,6 +67,8 @@
{{end}}
"useManagedIdentityExtension": "{{ UseManagedIdentity }}",
"useInstanceMetadata": "{{ UseInstanceMetadata }}",
"loadBalancerSku": "{{ LoadBalancerSku }}",
"excludeMasterFromStandardLB": "{{ ExcludeMasterFromStandardLB }}",
Member

you can get rid of this variable assignment...

Member

These two are variables, not parameters.

@@ -132,7 +134,7 @@
"customSearchDomainsScript": "{{GetKubernetesB64CustomSearchDomainsScript}}",
"sshdConfig": "{{GetB64sshdConfig}}",
{{if not IsOpenShift}}
"provisionScriptParametersCommon": "[concat('ADMINUSER=',parameters('linuxAdminUsername'),' ETCD_DOWNLOAD_URL=',parameters('etcdDownloadURLBase'),' ETCD_VERSION=',parameters('etcdVersion'),' DOCKER_ENGINE_VERSION=',parameters('dockerEngineVersion'),' DOCKER_REPO=',parameters('dockerEngineDownloadRepo'),' TENANT_ID=',variables('tenantID'),' HYPERKUBE_URL=',parameters('kubernetesHyperkubeSpec'),' APISERVER_PUBLIC_KEY=',parameters('apiserverCertificate'),' SUBSCRIPTION_ID=',variables('subscriptionId'),' RESOURCE_GROUP=',variables('resourceGroup'),' LOCATION=',variables('location'),' VM_TYPE=',variables('vmType'),' SUBNET=',variables('subnetName'),' NETWORK_SECURITY_GROUP=',variables('nsgName'),' VIRTUAL_NETWORK=',variables('virtualNetworkName'),' VIRTUAL_NETWORK_RESOURCE_GROUP=',variables('virtualNetworkResourceGroupName'),' ROUTE_TABLE=',variables('routeTableName'),' PRIMARY_AVAILABILITY_SET=',variables('primaryAvailabilitySetName'),' PRIMARY_SCALE_SET=',variables('primaryScaleSetName'),' SERVICE_PRINCIPAL_CLIENT_ID=',variables('servicePrincipalClientId'),' SERVICE_PRINCIPAL_CLIENT_SECRET=',variables('singleQuote'),variables('servicePrincipalClientSecret'),variables('singleQuote'),' KUBELET_PRIVATE_KEY=',parameters('clientPrivateKey'),' TARGET_ENVIRONMENT=',parameters('targetEnvironment'),' NETWORK_PLUGIN=',parameters('networkPlugin'),' VNET_CNI_PLUGINS_URL=',parameters('vnetCniLinuxPluginsURL'),' CNI_PLUGINS_URL=',parameters('cniPluginsURL'),' CLOUDPROVIDER_BACKOFF=',parameters('cloudproviderConfig').cloudProviderBackoff,' CLOUDPROVIDER_BACKOFF_RETRIES=',parameters('cloudproviderConfig').cloudProviderBackoffRetries,' CLOUDPROVIDER_BACKOFF_EXPONENT=',parameters('cloudproviderConfig').cloudProviderBackoffExponent,' CLOUDPROVIDER_BACKOFF_DURATION=',parameters('cloudproviderConfig').cloudProviderBackoffDuration,' CLOUDPROVIDER_BACKOFF_JITTER=',parameters('cloudproviderConfig').cloudProviderBackoffJitter,' 
CLOUDPROVIDER_RATELIMIT=',parameters('cloudproviderConfig').cloudProviderRatelimit,' CLOUDPROVIDER_RATELIMIT_QPS=',parameters('cloudproviderConfig').cloudProviderRatelimitQPS,' CLOUDPROVIDER_RATELIMIT_BUCKET=',parameters('cloudproviderConfig').cloudProviderRatelimitBucket,' USE_MANAGED_IDENTITY_EXTENSION=',variables('useManagedIdentityExtension'),' USE_INSTANCE_METADATA=',variables('useInstanceMetadata'),' CONTAINER_RUNTIME=',parameters('containerRuntime'),' CONTAINERD_DOWNLOAD_URL_BASE=',parameters('containerdDownloadURLBase'),' POD_INFRA_CONTAINER_SPEC=',parameters('kubernetesPodInfraContainerSpec'),' KMS_PROVIDER_VAULT_NAME=',variables('clusterKeyVaultName'))]",
"provisionScriptParametersCommon": "[concat('ADMINUSER=',parameters('linuxAdminUsername'),' ETCD_DOWNLOAD_URL=',parameters('etcdDownloadURLBase'),' ETCD_VERSION=',parameters('etcdVersion'),' DOCKER_ENGINE_VERSION=',parameters('dockerEngineVersion'),' DOCKER_REPO=',parameters('dockerEngineDownloadRepo'),' TENANT_ID=',variables('tenantID'),' HYPERKUBE_URL=',parameters('kubernetesHyperkubeSpec'),' APISERVER_PUBLIC_KEY=',parameters('apiserverCertificate'),' SUBSCRIPTION_ID=',variables('subscriptionId'),' RESOURCE_GROUP=',variables('resourceGroup'),' LOCATION=',variables('location'),' VM_TYPE=',variables('vmType'),' SUBNET=',variables('subnetName'),' NETWORK_SECURITY_GROUP=',variables('nsgName'),' VIRTUAL_NETWORK=',variables('virtualNetworkName'),' VIRTUAL_NETWORK_RESOURCE_GROUP=',variables('virtualNetworkResourceGroupName'),' ROUTE_TABLE=',variables('routeTableName'),' PRIMARY_AVAILABILITY_SET=',variables('primaryAvailabilitySetName'),' PRIMARY_SCALE_SET=',variables('primaryScaleSetName'),' SERVICE_PRINCIPAL_CLIENT_ID=',variables('servicePrincipalClientId'),' SERVICE_PRINCIPAL_CLIENT_SECRET=',variables('singleQuote'),variables('servicePrincipalClientSecret'),variables('singleQuote'),' KUBELET_PRIVATE_KEY=',parameters('clientPrivateKey'),' TARGET_ENVIRONMENT=',parameters('targetEnvironment'),' NETWORK_PLUGIN=',parameters('networkPlugin'),' VNET_CNI_PLUGINS_URL=',parameters('vnetCniLinuxPluginsURL'),' CNI_PLUGINS_URL=',parameters('cniPluginsURL'),' CLOUDPROVIDER_BACKOFF=',parameters('cloudproviderConfig').cloudProviderBackoff,' CLOUDPROVIDER_BACKOFF_RETRIES=',parameters('cloudproviderConfig').cloudProviderBackoffRetries,' CLOUDPROVIDER_BACKOFF_EXPONENT=',parameters('cloudproviderConfig').cloudProviderBackoffExponent,' CLOUDPROVIDER_BACKOFF_DURATION=',parameters('cloudproviderConfig').cloudProviderBackoffDuration,' CLOUDPROVIDER_BACKOFF_JITTER=',parameters('cloudproviderConfig').cloudProviderBackoffJitter,' 
CLOUDPROVIDER_RATELIMIT=',parameters('cloudproviderConfig').cloudProviderRatelimit,' CLOUDPROVIDER_RATELIMIT_QPS=',parameters('cloudproviderConfig').cloudProviderRatelimitQPS,' CLOUDPROVIDER_RATELIMIT_BUCKET=',parameters('cloudproviderConfig').cloudProviderRatelimitBucket,' USE_MANAGED_IDENTITY_EXTENSION=',variables('useManagedIdentityExtension'),' USE_INSTANCE_METADATA=',variables('useInstanceMetadata'),' LOAD_BALANCER_SKU=',variables('loadBalancerSku'),' EXCLUDE_MASTER_FROM_STANDARD_LB=',variables('excludeMasterFromStandardLB'),' CONTAINER_RUNTIME=',parameters('containerRuntime'),' CONTAINERD_DOWNLOAD_URL_BASE=',parameters('containerdDownloadURLBase'),' POD_INFRA_CONTAINER_SPEC=',parameters('kubernetesPodInfraContainerSpec'),' KMS_PROVIDER_VAULT_NAME=',variables('clusterKeyVaultName'))]",
Member

... and change to parameters('excludeMasterFromStandardLB')

Member

Same here, these are variables, not parameters.

@@ -65,6 +65,8 @@ $global:KubeDnsSearchPath = "svc.cluster.local"

$global:UseManagedIdentityExtension = "{{WrapAsVariable "useManagedIdentityExtension"}}"
$global:UseInstanceMetadata = "{{WrapAsVariable "useInstanceMetadata"}}"
$global:LoadBalancerSku = "{{WrapAsVariable "loadBalancerSku"}}"
$global:ExcludeMasterFromStandardLB = "{{WrapAsVariable "excludeMasterFromStandardLB"}}"
Member

... and change to WrapAsParameter "excludeMasterFromStandardLB"

Member

Same here

@jackfrancis
Member

added a fun comment about superfluous variables and then lgtm!

@khenidak
Contributor

/lgtm

@acs-bot

acs-bot commented Aug 15, 2018

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: khenidak, sozercan
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jackfrancis

If they are not already assigned, you can assign the PR to them by writing /assign @jackfrancis in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@acs-bot

acs-bot commented Aug 15, 2018

@khenidak: changing LGTM is restricted to assignees, and only Azure/acs-engine repo collaborators may be assigned issues.

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jackfrancis
Member

@ritazh yeah, nevermind, this works fine for now, thanks!

@jackfrancis jackfrancis merged commit 1ade2f3 into Azure:master Aug 15, 2018
@ghost ghost removed the in progress label Aug 15, 2018
@sozercan sozercan deleted the loadBalancerSku branch August 15, 2018 00:27
Development

Successfully merging this pull request may close these issues.

Allow users to configure standard load balancer for k8s
8 participants