Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gRPC communication between agent and collector #1165

Merged
merged 14 commits into from
Nov 13, 2018

Conversation

pavolloffay
Copy link
Member

@pavolloffay pavolloffay commented Nov 8, 2018

Resolves some parts of #773

Rebased and modified #903 and #1042

  • grpc server on the collector and reporter on the agent side
  • sampling strategies over grpc
  • grpc flags are marked as experimental. By default is used tchannel
  • xdock uses grpc

@codecov
Copy link

codecov bot commented Nov 8, 2018

Codecov Report

Merging #1165 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #1165    +/-   ##
=======================================
  Coverage     100%    100%            
=======================================
  Files         144     154    +10     
  Lines        6839    7034   +195     
=======================================
+ Hits         6839    7034   +195
Impacted Files Coverage Δ
cmd/agent/app/reporter/grpc/reporter.go 100% <100%> (ø)
cmd/agent/app/reporter/grpc/flags.go 100% <100%> (ø)
...el/converter/thrift/jaeger/sampling_from_domain.go 100% <100%> (ø)
cmd/agent/app/httpserver/server.go 100% <100%> (ø) ⬆️
cmd/agent/app/reporter/grpc/sampling_manager.go 100% <100%> (ø)
cmd/collector/app/builder/span_handler_builder.go 100% <100%> (ø) ⬆️
cmd/collector/app/builder/builder_flags.go 100% <100%> (ø) ⬆️
cmd/collector/app/grpc_handler.go 100% <100%> (ø)
cmd/collector/app/grpcserver/grpc_server.go 100% <100%> (ø)
cmd/agent/app/flags.go 100% <100%> (ø) ⬆️
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2eb7d1...955ab84. Read the comment docs.

bool ok = 1;
}

service CollectorService {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I have excluded query service. Let's do it in a separate PR.

Makefile Show resolved Hide resolved

const (
gRPCPrefix = "reporter.grpc."
collectorHostPort = "collector.host-port"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different from the old reporter? I thought that one supported a list of addresses.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grpc dial supports only one address by default. An option is to use with balancer https://stackoverflow.com/a/50302295/4158442 not sure what that implies though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the load balancer implementation can be done later, I'm just asking whether the cmd line should support a list of addresses (it can fail for now on >1 address).

cmd/agent/app/reporter/grpc/sampling_manager.go Outdated Show resolved Hide resolved
cmd/all-in-one/main.go Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
if err != nil {
logger.Fatal("Unable to start listening on channel", zap.Error(err))

{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of duplication with the collector. It would be good to have all verbose initialization code in a shared package where it can be tested rather than in main.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to move it in some package in collector. The same applies for tchannel there is duplication even in the current master.

cmd/all-in-one/main.go Outdated Show resolved Hide resolved
}

// PostSpans implements gRPC CollectorService.
func (g *GRPCHandler) PostSpans(ctx context.Context, r *api_v2.PostSpansRequest) (*api_v2.PostSpansResponse, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need new metrics here?

cmd/collector/app/grpc_handler_test.go Outdated Show resolved Hide resolved
}

service CollectorService {
rpc PostSpans(PostSpansRequest) returns (PostSpansResponse) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should be using streaming in this API... Unless we expect the agent to resubmit to another collector on error (worth checking what OC agent is planning to do, those folks have a lot more experience with grpc).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS can be addressed in another PR, we should merge the infra first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it can I would prefer.

@pavolloffay
Copy link
Member Author

pavolloffay commented Nov 9, 2018

Outstanding tasks:

  • metrics
  • rpc/stream for posting spans
  • dedupe process - pass process to reporter.send() in agent - avoid converting every process
  • Revisit postSpans response
  • close grpc server and connections

I would prefer to do it in a separate PR if possible.

@yurishkuro
Copy link
Member

I would prefer to do it in a separate PR if possible.

That's fine with me (I'd prefer that too, this diff is already too large).

But the main question we need to answer is what to do w.r.t. releasing 1.8. If there are any user-facing API changes (like supporting multiple addresses may change the cli flag name), then once we merge this PR we'll need to finish the outstanding tasks before cutting 1.8. Thoughts?

@pavolloffay
Copy link
Member Author

Originally I wanted to cut 1.8 without gRPC to have more baking time in master before releasing it. I will have a look at the list of addresses.

@pavolloffay
Copy link
Member Author

@yurishkuro I have added load balancer support in the last commit. If you prefer I can revert it, supporting a list of collectors in the same flag is back compatible anyways.

resolvedAddrs = append(resolvedAddrs, resolver.Address{Addr: addr})
}
r.InitialAddrs(resolvedAddrs)
conn, _ = grpc.Dial(r.Scheme()+":///jaeger.collector", grpc.WithInsecure(), grpc.WithBalancerName(roundrobin.Name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the significance of ":///jaeger.collector" string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I don't know, It seems that it can be any string after :///

https://github.com/grpc/grpc-go/blob/master/balancer/roundrobin/roundrobin_test.go#L136

cmd/agent/main.go Show resolved Hide resolved
@pavolloffay
Copy link
Member Author

I will merge after 1.8 is out

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
@pavolloffay pavolloffay merged commit 9635a33 into jaegertracing:master Nov 13, 2018
@ghost ghost removed the review label Nov 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants