feat(new sink): Initial gcp_stackdriver_logging sink implementation #1555

Merged: 11 commits into master on Jan 28, 2020

Conversation

@bruceg (Member) commented on Jan 20, 2020:

Adds a sink to support GCP Stackdriver Logging

@bruceg force-pushed the gcp-logging-sink branch 2 times, most recently from 6f3640d to 6b8615c, on January 20, 2020 at 23:26
@binarylogic (Contributor) left a comment:

Looks like a good first pass, excited to get this merged. Just a few documentation comments I had.

.meta/sinks/gcp_stackdriver_logging.toml (resolved)
examples = ["vector-123456"]
description = """\
The billing account ID to which to publish logs.
Exactly one of `billing_account_id`, `folder_id`, `log_id`, or `organization_id` must be set.\
Contributor:

To clarify: a user cannot supply multiple of these attributes; they must supply exactly one? If that's the case, I wonder if it would make more sense to have a single polymorphic attribute:

[sinks.stackdriver]
  type = "gcp_stackdriver_logging"
  billing.type = "account"
  billing.id = "12345"

Just food for thought. I don't completely understand what these attributes are used for.

@bruceg (Member, Author):

The log name is one of projects/[PROJECT_ID]/logs/[LOG_ID], folders/[FOLDER_ID]/logs/[LOG_ID], etc. Exactly one of the four attributes must be supplied to construct the log name.

I see, however, that I made a mistake about which four they are; the list should include project_id instead of log_id.
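
For illustration, here is a hypothetical sketch (the enum and function names are mine, not the PR's code) of how the log name could be assembled from exactly one of the four parent IDs plus the log ID:

```rust
// Hypothetical sketch: the parent of a log is exactly one of four GCP
// resources, and the log name is "<parent path>/logs/<LOG_ID>".
enum LogParent {
    BillingAccount(String),
    Folder(String),
    Organization(String),
    Project(String),
}

fn log_name(parent: &LogParent, log_id: &str) -> String {
    let prefix = match parent {
        LogParent::BillingAccount(id) => format!("billingAccounts/{}", id),
        LogParent::Folder(id) => format!("folders/{}", id),
        LogParent::Organization(id) => format!("organizations/{}", id),
        LogParent::Project(id) => format!("projects/{}", id),
    };
    format!("{}/logs/{}", prefix, log_id)
}
```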

required = "true"
examples = ["global", "gce_instance"]
description = """\
The monitored resource type. \
Contributor:

I'm not sure I completely understand what this is. Could you explain a little further? What does global mean in this context? Are there docs we could link to that further describe what this is?

@bruceg (Member, Author):

Yeah, understanding this requires a bit of knowledge of how Stackdriver organizes logs. It was a bit confusing to me at first too. The docs are here: https://cloud.google.com/logging/docs/reference/v2/rest/v2/MonitoredResource
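
For readers unfamiliar with the concept: the linked docs describe a monitored resource as a type plus a set of labels identifying the concrete resource. A minimal sketch of that shape (illustrative only, not the PR's actual types):

```rust
use serde::Serialize;
use std::collections::HashMap;

// Illustrative model of the MonitoredResource shape from the linked docs.
#[derive(Serialize)]
struct MonitoredResource {
    // e.g. "global" for logs not tied to a specific resource,
    // or "gce_instance" for logs emitted by a Compute Engine VM.
    r#type: String,
    // Labels identifying the concrete resource instance,
    // e.g. {"project_id": "...", "instance_id": "...", "zone": "..."}.
    labels: HashMap<String, String>,
}
```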

common = false
examples = ["vector-123456"]
description = """\
The log ID to which to publish logs.
Contributor:

Can we link to docs or help the user understand where they can obtain this ID? Or should it be common sense for a Stackdriver user?

@bruceg (Member, Author):

It might be common sense, not sure. It is not an ID retrieved from GCP; rather, it is something the user would generate based on what they are logging from. That is, you can specify anything and Stackdriver will file it away.

common = false
examples = ["vector-123456"]
description = """\
The folder ID to which to publish logs.
Contributor:

Can we link to docs or help the user understand where they can obtain this ID? Or should it be common sense for a Stackdriver user?

.meta/sinks/gcp_stackdriver_logging.toml (resolved)
request_retry_attempts = 9223372036854775807
request_retry_initial_backoff_secs = 1
request_retry_max_duration_secs = 10
request_in_flight_limit = 5
Contributor:

Is this aligned with API limits or just an arbitrary limit?

@bruceg (Member, Author):

Most are arbitrary, but the rate limit and batch size are from the API docs: https://cloud.google.com/logging/quotas#log-limits

.meta/sinks/gcp_stackdriver_logging.toml (outdated, resolved)
@binarylogic changed the title from "feat(new sink): Google Cloud Platform Stackdriver Logging" to "feat(new sink): Initial gcp_stackdriver_logging sink implementation" on Jan 21, 2020
@LucioFranco (Contributor) left a comment:

This looks great! I think the way you did the splice was really smart!

I'd like to see the two config changes before we merge.

Another thing: maybe we want to create a src/sinks/gcp folder that contains the util file, the stackdriver sink, and the pubsub sink. What do you think?

src/sinks/gcp_stackdriver_logging.rs (outdated, resolved)
logs.pop(); // Strip the trailing comma

let mut body = wrapper.clone().into_bytes();
body.splice(wrapper_splice..wrapper_splice, logs);
Contributor:

This is a cool solution! I've been thinking all week about how I want to accomplish this, and I think this is as close to a good solution as we'll get.

I have some ideas of how to improve this but won't block this PR on it.

@bruceg (Member, Author):

Frankly, I thought this was a huge hack, but if you like it, I'm happy. Best would of course be some kind of JSON serializer that could do incremental work, but it's not obvious how that might work in practice.

I am of course curious what could be done to improve on it.

Contributor:

Yeah, I've been mulling over this problem for a while, since it is a major factor in making HTTP sinks easy to write. I plan on seeing if I can clean up this hack next week. I think you have basically the gist of it. We could possibly write a custom HTTP body type that has slots for three slices, so we don't have to do any double allocations like what is happening now. The other idea is to use `Bytes` for the "static" bytes and Arc-clone them rather than recopying the vec.

But for right now, this is a good hack and it's not messy either, so I'm happy to keep it.
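
For readers following the thread, here is a small self-contained sketch of the splice approach under discussion (an illustration with assumed names, not the PR's exact code): serialize each entry with a trailing comma, strip the last comma, and splice the joined entries into a pre-serialized wrapper at a known offset.

```rust
use serde_json::json;

// Sketch: build the request body by splicing serialized entries into a
// pre-serialized wrapper, instead of re-serializing the wrapper per batch.
fn build_body(entries: &[serde_json::Value]) -> Vec<u8> {
    // Wrapper with an empty `entries` array; the splice point sits just
    // after the opening bracket.
    let wrapper = json!({ "entries": [] }).to_string();
    let splice_at = wrapper.find("[]").expect("wrapper has an entries array") + 1;

    // Serialize each entry followed by a comma, then strip the trailing comma.
    let mut logs = Vec::new();
    for entry in entries {
        logs.extend_from_slice(entry.to_string().as_bytes());
        logs.push(b',');
    }
    logs.pop(); // Strip the trailing comma

    let mut body = wrapper.into_bytes();
    body.splice(splice_at..splice_at, logs);
    body
}
```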

src/sinks/gcp_stackdriver_logging.rs (two outdated threads, resolved)
@LucioFranco (Contributor):
I think this also could use a rebase against master.

@bruceg (Member, Author) commented on Jan 24, 2020:

> Another thing: maybe we want to create a src/sinks/gcp folder that contains the util file, the stackdriver sink, and the pubsub sink. What do you think?

It would make some sense. If we're going to do that, we should probably do the same with the AWS sinks too, which makes it a "code debt" kind of thing. In any case, it's a separate PR.

@LucioFranco (Contributor):
@bruceg good idea! I opened an issue for AWS (and possibly Datadog): #1603. I still think we should make the change here for GCP, since we are adding a new util file.

Bruce Guenter added 2 commits on January 24, 2020
@LucioFranco (Contributor) left a comment:

LGTM! Great work!

@bruceg assigned binarylogic and unassigned LucioFranco on Jan 24, 2020
@binarylogic (Contributor):
@bruceg I'm writing documentation on GCP authentication. Do you think it makes sense to support the GOOGLE_APPLICATION_CREDENTIALS environment variable? This seems to be a standard as noted in the GCP docs.

@binarylogic (Contributor):
Also, it appears the api_key option is not documented? I'm working on fixing this.

@bruceg (Member, Author) commented on Jan 28, 2020:

I had thought GOOGLE_APPLICATION_CREDENTIALS might be supported by the goauth crate we use, but there is no evidence of that in its source. It also affects validation of our config, so I'll add that.

The api_key setting only works for certain APIs. It just so happens that the first GCP sink we implemented is one of those APIs (and it's not even listed there).
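
As a rough sketch of the fallback being discussed (the function and parameter names here are hypothetical, not the goauth crate's or the PR's API), the credentials path could fall back to the environment variable when nothing is set in the config:

```rust
use std::env;

// Hypothetical helper: prefer an explicitly configured credentials path,
// otherwise fall back to the standard GOOGLE_APPLICATION_CREDENTIALS variable.
fn resolve_credentials_path(configured: Option<String>) -> Option<String> {
    configured.or_else(|| env::var("GOOGLE_APPLICATION_CREDENTIALS").ok())
}
```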

binarylogic and others added 3 commits on January 28, 2020
@binarylogic (Contributor) left a comment:

Nice work!

@bruceg merged commit 219b1a8 into master on Jan 28, 2020
@bruceg deleted the gcp-logging-sink branch on January 28, 2020 at 21:31