This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[incubator/jaeger] Improve the Jaeger chart #3109

Merged
merged 8 commits into helm:master from improve-jaeger
Jan 23, 2018

Conversation

pavelnikolov
Collaborator

This PR adds the following improvements to the Jaeger chart:

  • Use ConfigMap for configurable parameters
  • Add the Hotrod example app
  • Allow only some of the components to be installed
  • Add support for the spark dependencies job (as a k8s cronjob)
  • Use the provisionDataStore key in the values.yaml file instead of tags to configure data store provisioning (see the values sketch after this list)
  • Refactor chart to remove unnecessary quotes
  • Remove the command overrides of the docker images and use environment variables configuration instead
  • Fix hard-coded replica count
  • Collector service works both as NodePort and ClusterIP service types
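
A rough sketch of the resulting values.yaml toggles (provisionDataStore, storage.type and collector.enabled appear later in this thread; the service type and spark keys are illustrative assumptions, and exact defaults may differ in the final chart):

provisionDataStore:
  cassandra: true
  elasticsearch: false
storage:
  # allowed values (cassandra, elasticsearch)
  type: cassandra
collector:
  enabled: true
  service:
    type: ClusterIP  # NodePort is also supported
spark:
  enabled: false  # the spark dependencies cronjob is off by default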

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 19, 2017
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Dec 19, 2017
@pavelnikolov pavelnikolov force-pushed the improve-jaeger branch 2 times, most recently from 3fc92b6 to dac7a59 Compare December 19, 2017 23:33
@pavelnikolov
Collaborator Author

Thanks to @mikelorant for helping me with this PR.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 19, 2017
@pavelnikolov
Collaborator Author

/assign @foxish

@pavelnikolov
Collaborator Author

/assign @unguiculus

@pavelnikolov
Collaborator Author

pavelnikolov commented Dec 20, 2017

@foxish
Member

foxish commented Dec 20, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 20, 2017
@foxish
Member

foxish commented Dec 20, 2017

Unassigning myself - don't know enough about Jaeger to approve/review.

@foxish foxish removed their assignment Dec 20, 2017
@pavelnikolov pavelnikolov changed the title [jaeger] Improve the Jaeger chart [incubator/jaeger] Improve the Jaeger chart Dec 20, 2017
image: jaegertracing/spark-dependencies
tag: latest
pullPolicy: Always
schedule: "*/12 * * * *"


It should run just before the end of the day. Let's say 5-10 min for startup.

Collaborator Author

I configured it to run every day @ 23:49
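
For reference, a daily run at 23:49 corresponds to a cron schedule roughly like this in values.yaml (a sketch, under the same key shown in the snippet above):

schedule: "49 23 * * *"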

@pavolloffay

does it expose zipkin service by default (9411 port to collect spans)?

@pavelnikolov
Collaborator Author

pavelnikolov commented Dec 21, 2017

@pavolloffay
TL;DR yes, it does.

Take a look at the values.yaml file, line #L128 - this is where the default port is set. Then, if you look at the collector-deploy.yaml file at #L83, you will notice that the same container port is used. Also, on #L73, the COLLECTOR_ZIPKIN_HTTP_PORT environment variable is configured, which in turn gets its value from the collector.zipkin.http-port config map key. Finally, on #20 in the common-cm.yaml file you can see that the key is populated from the default in values.yaml (i.e. 9411).
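
A condensed sketch of that chain, using the file and key names described above (the ConfigMap reference name is an assumption; the actual chart templates may be laid out slightly differently):

# values.yaml
collector:
  service:
    zipkinPort: 9411

# common-cm.yaml
collector.zipkin.http-port: {{ .Values.collector.service.zipkinPort | quote }}

# collector-deploy.yaml
env:
- name: COLLECTOR_ZIPKIN_HTTP_PORT
  valueFrom:
    configMapKeyRef:
      name: {{ .Release.Name }}-jaeger   # assumed ConfigMap name
      key: collector.zipkin.http-port
ports:
- containerPort: {{ .Values.collector.service.zipkinPort }}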

@pavelnikolov pavelnikolov force-pushed the improve-jaeger branch 2 times, most recently from 6e125e9 to cec83a7 Compare December 21, 2017 20:30
@dvonthenen
Collaborator

Will take a look this afternoon as well. Been out of the office.

@dvonthenen
Collaborator

dvonthenen commented Jan 2, 2018

Giving this a test drive today...

Test Cases:

  • New Cassandra instance
  • New ES instance
  • Existing Cassandra instance
  • Existing ES instance

Will be adding a few more...

@dvonthenen
Collaborator

dvonthenen commented Jan 3, 2018

@pavolloffay is there any reason why we need both sets of parameters?

provisionDataStore:
  cassandra: true
  elasticsearch: false

storage:
  # allowed values (cassandra, elasticsearch)
  type: cassandra

Is it possible to replace .Values.storage.type with .Values.provisionDataStore.cassandra in the if statements instead? The reason I ask is that I ran into a split brain (two different parameters controlling the storage deployment and configuration) where I didn't know storage.type existed/needed to be changed, but I had enabled provisionDataStore.elasticsearch=true. Having only a single parameter would be a little more straightforward. The default is currently Cassandra, but that can be changed to ES. My understanding is (I could be wrong) that ES is the preferred storage type per the Jaeger docs/maintainers.

@pavelnikolov
Collaborator Author

@dvonthenen The only reason there are two sets of elasticsearch and cassandra params is for clarity in the requirements conditions. provisionDataStore could probably be turned into a boolean, and then we could get rid of the duplication.

@dvonthenen
Collaborator

@pavelnikolov I think it would go a long way in terms of usability to consolidate. Minimally, the README needs an update to the example command line in the Installing the Chart using a New ElasticSearch Cluster section, adding the --set storage.type=elasticsearch parameter. That is what led to this discussion. Will finish off the verification today.
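
For example, that section's install command would need something along these lines (a sketch; the release name is illustrative):

helm install incubator/jaeger --name myrel \
  --set provisionDataStore.cassandra=false \
  --set provisionDataStore.elasticsearch=true \
  --set storage.type=elasticsearch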

Collaborator
@dvonthenen dvonthenen left a comment

Found a couple of issues when trying to deploy the chart using previously deployed cassandra and elasticsearch stores.

cassandra.datacenter.name: {{ .Values.cassandra.config.dc_name | quote }}
cassandra.keyspace: {{ printf "%s_%s" "jaeger_v1" .Values.cassandra.config.dc_name | quote }}
cassandra.schema.mode: {{ .Values.schema.mode | quote }}
cassandra.servers: {{ template "cassandra.fullname" . }}
Collaborator

This forces the external or previously deployed cassandra host to be -cassandra. This should be set to .Values.storage.cassandra.host.

Collaborator

Seems like we might need another define in helpers like you did for elasticsearch to toggle between using the template "cassandra.fullname" and .Values.storage.cassandra.host.
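
Something along these lines, mirroring the elasticsearch helper quoted further down in this thread (the define name is illustrative, not the final implementation):

{{- define "cassandra.host" -}}
{{- if .Values.provisionDataStore.cassandra -}}
{{- template "cassandra.fullname" . -}}
{{- else -}}
{{- .Values.storage.cassandra.host -}}
{{- end -}}
{{- end -}}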

Collaborator Author
@pavelnikolov pavelnikolov Jan 5, 2018

Good catch! I've tested provisioned cassandra, external ES and provisioned ES. This was the only scenario that I haven't tested. I'll fix it right away. Thank you very much for catching this!

collector.http-port: {{ .Values.collector.service.httpPort | quote }}
collector.port: {{ .Values.collector.service.tchannelPort | quote }}
collector.zipkin.http-port: {{ .Values.collector.service.zipkinPort | quote }}
es.server-urls: {{ template "elasticsearch.client.url" . }}
Collaborator

This forces the external or previously deployed elasticsearch host to be -elasticsearch. This should be set to .Values.storage.elasticsearch.host.

Collaborator Author

What values are you using it with? I have tested this with both external and provisioned ES and it is working fine for me. Take a closer look at this helper:

{{- define "elasticsearch.client.url" -}}
{{- $port := .Values.storage.elasticsearch.port | toString -}}
{{- if .Values.provisionDataStore.elasticsearch -}}
{{- $host := printf "%s-%s-%s" .Release.Name "elasticsearch" "client" | trunc 63 | trimSuffix "-" -}}
{{- printf "%s://%s:%s" .Values.storage.elasticsearch.scheme $host $port }}
{{- else }}
{{- printf "%s://%s:%s" .Values.storage.elasticsearch.scheme .Values.storage.elasticsearch.host $port }}
{{- end -}}
{{- end -}}

It doesn't force anything, right?

Collaborator

Sorry.... small goof. I got hit by the storage.type thing again by not setting it. I corrected that now but it looks like the collector and query components are having problems connecting to elasticsearch.

These are the command line parameters I am running with:

--name myrel --set provisionDataStore.cassandra=false --set provisionDataStore.elasticsearch=false --set storage.type=elasticsearch --set storage.elasticsearch.host=elasticsearch --set storage.elasticsearch.port=9200 --set storage.elasticsearch.user=elastic --set storage.elasticsearch.password=changeme

Get pods output:

[dev@k8controller ~]$ kubectl get pods
NAME                                     READY     STATUS             RESTARTS   AGE
elasticsearch-0                          1/1       Running            0          24m
elasticsearch-1                          1/1       Running            0          24m
elasticsearch-2                          1/1       Running            0          24m
myrel-jaeger-agent-31jxv                 1/1       Running            0          8m
myrel-jaeger-agent-89kxf                 1/1       Running            0          8m
myrel-jaeger-agent-gldbw                 1/1       Running            0          8m
myrel-jaeger-agent-lh8tx                 1/1       Running            0          8m
myrel-jaeger-agent-t7c0h                 1/1       Running            0          8m
myrel-jaeger-agent-wgvrm                 1/1       Running            0          8m
myrel-jaeger-collector-623630575-b32fr   0/1       CrashLoopBackOff   6          8m
myrel-jaeger-query-3159756439-nk7c6      0/1       CrashLoopBackOff   6          8m

Error:

{"level":"fatal","ts":1515108228.394109,"caller":"collector/main.go:99","msg":"Unable to set up builder","error":"health check timeout: no Elasticsearch node available"

Verified the configmap has the correct value:

Name:		myrel-jaeger
Namespace:	default
Labels:		app=jaeger
		chart=jaeger-0.3.0
		heritage=Tiller
		jaeger-infra=common-configmap
		release=myrel
Annotations:	<none>

Data
====
...
es.server-urls:
----
http://elasticsearch:9200
...

Collaborator Author

Oh, good catch. I've tested with ES without username and password. I'll add those two if they are set. As you can see in this file on lines 60 and 62 they haven't been implemented.

Collaborator

@pavelnikolov I would also recommend setting the default username and password to elastic/changeme since those are the defaults.
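
A rough sketch of the conditional ConfigMap entries being discussed (the es.username/es.password key names are assumptions, modelled on the existing es.server-urls key):

{{- if .Values.storage.elasticsearch.user }}
es.username: {{ .Values.storage.elasticsearch.user | quote }}
{{- end }}
{{- if .Values.storage.elasticsearch.password }}
es.password: {{ .Values.storage.elasticsearch.password | quote }}
{{- end }}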

@ledor473
Contributor

ledor473 commented Jan 11, 2018

@pavelnikolov For me, it's good ✅

My PR #3100 was merged so you will probably have to rebase/update your branch

@pavelnikolov
Collaborator Author

Just rebased.

name: http
protocol: TCP
- containerPort: 9411
- containerPort: {{ .Values.collector.service.zipkinPort }}
Collaborator Author
@pavelnikolov pavelnikolov Jan 11, 2018

@ledor473 I used the named ports from your PR, but the values are no longer hard-coded. Now they come from the values.yaml and are mapped in the config map.

Contributor

@pavelnikolov yes that is even better considering the Collector Pod will listen on that port instead of doing port-forwarding at the Service level 👍

@pavelnikolov
Collaborator Author

/assign @unguiculus

@pavelnikolov
Collaborator Author

@unguiculus any changes required? Did you want me to assign this PR to someone else? Or break it into smaller pieces?

@pavelnikolov
Collaborator Author

/assign @mattfarina

Return the appropriate apiVersion for cronjob APIs.
*/}}
{{- define "cronjob.apiVersion" -}}
{{- if ge .Capabilities.KubeVersion.Minor "8" -}}
Member

This won't work once 1.10 is out. Also, it expects alpha features to be enabled for < 1.8. Better to use Capabilities.APIVersions.Has.
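
A sketch of what that check could look like (the exact API group string is an assumption; batch/v1beta1 is the CronJob API that graduated to beta in 1.8):

{{- define "cronjob.apiVersion" -}}
{{- if .Capabilities.APIVersions.Has "batch/v1beta1" -}}
batch/v1beta1
{{- else -}}
batch/v2alpha1
{{- end -}}
{{- end -}}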

Collaborator Author

Done. I didn't know this was possible! I just learned something new! 🎉

Collaborator Author

As for:

Also, this expects alpha features to be enabled for...

This has been documented in the Prerequisites section of the README.md file and the spark cronjob is disabled by default.

@pavelnikolov
Collaborator Author

/assign @unguiculus

@mattfarina
Contributor

@dvonthenen there have been some small changes to it since you said LGTM. Still LGTM?

@dvonthenen
Collaborator

Will check it out

@dvonthenen
Collaborator

LGTM

@@ -1,28 +1,29 @@
{{- if .Values.collector.enabled -}}
apiVersion: extensions/v1beta1
Contributor

A quick note: Deployments under extensions/v1beta1 will be removed from Kubernetes soon. Since then there have been apps/v1beta1, apps/v1beta2, and apps/v1 API versions. For broad compatibility we'd like to see charts use apps/v1beta2 for the moment. In k8s 1.10 the API versions prior to that can be removed.

This is just a note. Not going to have it hold up this pull request.
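
i.e. the Deployment manifests would switch to something like this (a sketch of the suggested change):

{{- if .Values.collector.enabled -}}
apiVersion: apps/v1beta2
kind: Deployment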

Collaborator Author

Thank you! I'll make sure I update it when 1.10 is out.

Collaborator Author

@mattfarina should I remove extensions/v1beta1 now that k8s 1.10 is out?

@mattfarina
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 23, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattfarina, pavelnikolov

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2018
@k8s-ci-robot k8s-ci-robot merged commit f8fd3fb into helm:master Jan 23, 2018
@pavelnikolov pavelnikolov deleted the improve-jaeger branch January 24, 2018 11:42