Request: Facility for telling Spring Scheduling instrumentation not to trace individual @Scheduled methods #5008

Closed
GrahamLea opened this issue Jan 4, 2022 · 15 comments
Labels
enhancement New feature or request

Comments

@GrahamLea

Is your feature request related to a problem? Please describe.
Spring Scheduling instrumentation can only be either on or off. A method @Scheduled to run every second gets traced every second, even if it does nothing interesting. There is no way (that I can tell) to ask the instrumentation not to trace individual methods.

Describe the solution you'd like
It'd be great to be able to annotate a @Scheduled method with something like @NoTracing and have it not be traced.

Describe alternatives you've considered
I have a hack where I use an aspect over TaskScheduler.schedule*(..) to unwrap the original Runnable from a ScheduledMethodRunnable if the wrapped runnable is a ScheduledMethodRunnable and I can see a specific annotation on the method.

@GrahamLea GrahamLea added the enhancement New feature or request label Jan 4, 2022
@trask
Member

trask commented Jan 4, 2022

hi @GrahamLea! I recently added code.* span attributes to the quartz spans for a similar use case, so that you could use sampling to filter out specific noisy jobs (#4332).

It looks like we should do the same for spring scheduling spans.

Though that still requires writing an agent extension to add and configure such a sampler, see #1060 (comment) if you want to go down that route.

@GrahamLea
Author

I think it's interesting that in the few cases where I've seen similar filtering requests mentioned, the advice offered is often that a Sampler will be needed.

From my quick look at the code around AgentInstaller, the way it handles otel.javaagent.exclude-classes is to never instrument those classes to begin with.

I'm sure there are a number of factors to consider in deciding which way to implement these things, but I would have thought that, generally, if it's possible to not instrument something rather than to instrument it and then throw the data away, the first would probably be preferable. 🤷🏻‍♂️

@iNikem
Contributor

iNikem commented Jan 5, 2022

As said in our documentation:

> This option should not be used lightly, as it can leave some instrumentation partially applied, which could have unknown side-effects.

@GrahamLea
Author

@iNikem Thanks. Would you mind helping me connect your highlighting of that para to what I've said above? Is it that the approach used by exclude-classes - of not instrumenting code - is generally not a good idea (bc "unknown side-effects") and so Samplers are the preferred way to solve these kinds of problems? Or something else?

@iNikem
Contributor

iNikem commented Jan 5, 2022

Yes, it is easier to break instrumentation by using exclude-classes than by using a sampler. E.g. by excluding some classes from instrumentation you may break intra-process context propagation, and then your traces are wrong. But if you can thoroughly test your application with exclude-classes and verify that the resulting telemetry makes sense to you, then I don't see any reason why you should not use it :)

@trask trask closed this as completed Jan 19, 2022
@GrahamLea
Author

Hi @trask.
Old thread, but I'm just trying to put all the pieces together here to get this working in our app (and hopefully leave a paper trail for others).

Am I right in thinking that, to employ selective filtering of Spring @Scheduled tasks, I need to...

  1. Use a version of the Agent that includes this change from Feb 2022:
    Add code attributes to spring-scheduling spans #5306
  2. And then should I...
    a. create my own extension with a custom sampler based off this example...:
    https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/examples/extension/src/main/java/com/example/javaagent/DemoSampler.java
    ... and use the code.namespace and code.function attributes to match the class and function names (based off):
        public void onStart(AttributesBuilder attributes, Context parentContext, REQUEST request) {
          Class<?> cls = getter.codeClass(request);
          if (cls != null) {
            internalSet(attributes, SemanticAttributes.CODE_NAMESPACE, cls.getName());
          }
          internalSet(attributes, SemanticAttributes.CODE_FUNCTION, getter.methodName(request));
        }

    b. or should I just be building for myself the RuleBasedRoutingSampler from here?:
    https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/samplers/RuleBasedRoutingSampler.java
  3. And then include that extension at runtime using -Dotel.javaagent.extensions as described here:
    https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/examples/extension
  4. At this point, I'm lost. How does the Sampler get instantiated and installed in the Agent? If it needs configuration (e.g. the rules for the RuleBasedRoutingSampler), how are they controlled by our app rather than needing to be baked into the extension JAR?

Any guidance you can give would be very appreciated! 🙏
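
The matching described in step 2a can be modelled without any OpenTelemetry dependency. The following is a minimal sketch, not the agent's API: `ScheduledTaskFilter`, its `Decision` enum, and the `namespace.function` block-list format are hypothetical names chosen for illustration. Only the attribute keys `code.namespace` and `code.function` come from the thread. In a real extension the same check would presumably sit inside a `shouldSample()` implementation that falls back to the wrapped sampler.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Dependency-free model of the filtering decision (hypothetical, not the OpenTelemetry Sampler API). */
final class ScheduledTaskFilter {

    /** DROP = filter the span out; DELEGATE = let the wrapped sampler decide. */
    enum Decision { DROP, DELEGATE }

    // Assumed entry format: "<code.namespace>.<code.function>", e.g. "com.example.Jobs.poll".
    private final Set<String> blockList;

    ScheduledTaskFilter(Set<String> blockList) {
        this.blockList = new HashSet<>(blockList);
    }

    /** Drops a span whose class and method name, joined by a dot, are block-listed. */
    Decision decide(Map<String, String> attributes) {
        String namespace = attributes.get("code.namespace");
        String function = attributes.get("code.function");
        if (namespace == null || function == null) {
            // The span carries no code.* attributes, so it cannot be matched.
            return Decision.DELEGATE;
        }
        return blockList.contains(namespace + "." + function)
            ? Decision.DROP
            : Decision.DELEGATE;
    }
}
```

Keeping the block list as constructor input, rather than hard-coding it, is what allows the rules to be supplied by the application at startup instead of being baked into the extension JAR.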

@GrahamLea
Author

Okay, found another comment here that seems to say the RuleBasedRoutingSampler needs to be packaged into an agent extension, with the configuration already baked in. Is that right?
(Seems less than ideal to have the sampler config in a separate project to my app where the @Scheduled tasks are.)
#1060 (comment)

@GrahamLea
Author

And on my configuration question, I see that the comments in DemoSampler have pointed me at DemoAutoConfigurationCustomizerProvider

@GrahamLea
Author

GrahamLea commented Jun 22, 2022

Okay, I've made good progress with this so far. I've built a SpringScheduledTaskSampler into an extension JAR, and when I run our app with the right extension JAR options, I can see that the Sampler is being created by addSamplerCustomizer in this code (new SpringScheduledTaskSampler() contains a println):

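import static java.util.Arrays.asList;

import com.google.auto.service.AutoService;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizer;
import io.opentelemetry.sdk.autoconfigure.spi.AutoConfigurationCustomizerProvider;
import java.util.HashSet;

@AutoService(AutoConfigurationCustomizerProvider.class)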
public class SpringScheduledTaskSamplerCustomizer implements AutoConfigurationCustomizerProvider {

    private static final String BLOCK_LIST_SYSTEM_PROPERTY = "otel.spring.scheduled.blocklist";

    @Override
    public void customize(AutoConfigurationCustomizer autoConfiguration) {
        String blockListString = System.getProperty(BLOCK_LIST_SYSTEM_PROPERTY);
        if (blockListString == null) {
            throw new IllegalStateException(
                "Required system property not found: " + BLOCK_LIST_SYSTEM_PROPERTY);
        }
        HashSet<String> blockList = new HashSet<>(asList(blockListString.split(",")));
        System.out.println("[otel-spring-scheduled-task-sampler]" +
            " Configuring SpringScheduledTaskSampler, blockList = " + blockList);
        autoConfiguration.addSamplerCustomizer(
            (sampler, configProperties) -> new SpringScheduledTaskSampler(sampler, blockList)
        );
    }
}

However, the custom Sampler never gets called. (A println immediately inside shouldSample() never prints anything.) My @Scheduled tasks with high frequencies are continuing to spit out telemetry like the Sampler doesn't exist.
Any idea why that would be?

@trask
Member

trask commented Jun 22, 2022

hi @GrahamLea! i'm not sure why the sampler is not getting called. I updated the example extension to use the (better) addSamplerCustomizer which you have used above, and it is working: #6204

@GrahamLea
Author

Thanks for the reply, Trask.
I'm using Honeycomb's customised OTel agent. I think the next step will have to be checking if it's overwriting my Sampler with one of its own.

@GrahamLea
Author

Yep. Honeycomb's HoneycombAutoConfigurationCustomizerProvider calls setSampler(), and it's being loaded after my AutoConfigurationCustomizerProvider, so it blats my decorator-style sampler.
I can't see any way to prioritise AutoConfigurationCustomizerProviders so that mine is loaded 2nd. Do you know of any way to do that?
Seems like this is a problem with the way Honeycomb does its auto configuration. I'll raise with them.
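
The ordering problem can be illustrated with a small model. This is a sketch under stated assumptions, not how the SDK or the Honeycomb agent is implemented: `SamplerPipeline`, `wrap`, and `replace` are hypothetical names, and samplers are represented as plain strings that describe the resulting chain. It only shows why a step that replaces the sampler discards any decorator applied before it, while a decorator applied afterwards survives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical model of sampler configuration steps applied in load order. */
final class SamplerPipeline {
    private final List<UnaryOperator<String>> steps = new ArrayList<>();

    /** Decorator-style step: receives the sampler built so far and wraps it. */
    SamplerPipeline wrap(String name) {
        steps.add(current -> name + "(" + current + ")");
        return this;
    }

    /** Replacement-style step: ignores whatever was configured before it. */
    SamplerPipeline replace(String name) {
        steps.add(current -> name);
        return this;
    }

    /** Applies every step in registration order, starting from the default sampler. */
    String build(String defaultSampler) {
        String sampler = defaultSampler;
        for (UnaryOperator<String> step : steps) {
            sampler = step.apply(sampler);
        }
        return sampler;
    }
}
```

With this model, `new SamplerPipeline().wrap("filter").replace("vendor").build("default")` yields `vendor` (the filter is lost), while registering the steps in the opposite order yields `filter(vendor)`.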

@GrahamLea
Author

Okay, I've now got my Filter Sampler working (using a fork of the Honeycomb agent - hopefully temporary).

But... with a lot of these @Scheduled tasks that I want to ignore, the logic inside is...

once every few seconds {
    check some condition quickly
    if condition is true {  // occurs once every few minutes to hours
        do some significant piece of work
    }
}

When I get to "do some significant piece of work", I DO want to trace what's going on there.
I've tried to do this by manually creating a Span in the code at that point, e.g. ...

val span: Span = tracer.spanBuilder(spanName).startSpan()
try {
    // do some significant piece of work
} finally {
    span.end()
}

But I'm not seeing those appear in my telemetry. (And my sampling is set to 100%)

Is that because it's within a Span that my filter is telling OTel to DROP?

Is there an easy way for me to start a new trace within a Span that's been generated by the agent instrumentation and then dropped by a Sampler?

@trask
Member

trask commented Jun 23, 2022

> Is that because it's within a Span that my filter is telling OTel to DROP?

yes, exactly. the default samplers are all "parent-based" which means they respect the decision of the parent (which includes the parent's decision to drop).

try changing your sampler to explicitly "record" the nested span instead of delegating to the underlying (parent-based) sampler
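
The parent-based behaviour described here can be modelled in a few lines. This is a simplified sketch, not the SDK's implementation: `ParentBasedModel` and its `Decision` enum are hypothetical, and a `null` parent stands in for a span with no parent.

```java
/** Simplified, hypothetical model of a parent-based sampling decision. */
final class ParentBasedModel {

    enum Decision { DROP, RECORD_AND_SAMPLE }

    /**
     * @param parentSampled null for a root span, otherwise whether the parent was sampled
     * @param rootDecision  what the root sampler would decide for a span without a parent
     */
    static Decision decide(Boolean parentSampled, Decision rootDecision) {
        if (parentSampled == null) {
            // No parent: the root sampler decides.
            return rootDecision;
        }
        // A parent exists: the child inherits its decision, including a drop.
        return parentSampled ? Decision.RECORD_AND_SAMPLE : Decision.DROP;
    }
}
```

In this model a manual span created under a dropped `@Scheduled` span is dropped even when the root sampler samples everything. If the goal is a brand-new trace rather than a child of the dropped span, `SpanBuilder.setNoParent()` in the OpenTelemetry Java API starts the span as a root, which corresponds to the `null` parent case above. Whether that fits depends on how the custom sampler treats root spans.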

@GrahamLea
Author

Thanks. I need to check I'm understanding, and what the solution is...
The purpose of my filter Sampler is to DROP Spans that match a filter, because they happen too often.
But now I realise I want the option to have a Span created manually within those dropped Spans that will get sampled.

I believe what you're saying is I should change my filter to detect those spans and give them a RECORD result instead? (Why not RECORD_AND_SAMPLE?)

So, is there an obvious way for me to detect when something is a manual span created under my DROPped spans? Or do I need to create one, e.g. add a custom Span attribute that tells my filter to record a Span instead of looking at the parent Span's sampling? Or should I just assume that if there's another INTERNAL under one of my DROPped spans that it's manual and I should sample it?

So many questions. Feel free to point me at some docs if this is explained well somewhere. (I tried searching, but all explanations I found of custom Samplers are pretty high level.)
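
One of the options raised above, marking manual spans with a custom attribute, could look roughly like the sketch below. Everything here is an assumption for illustration: `ForceSampleModel`, the attribute key `app.force_sample`, and the `blockListed` flag are hypothetical, and the code models the decision only, without the OpenTelemetry types. It relies on the attribute being set on the span builder before the span starts, since those are the attributes a sampler sees. On RECORD versus RECORD_AND_SAMPLE: in the OpenTelemetry SDK a record-only span is recorded but not marked as sampled, so it would typically not be exported, which suggests RECORD_AND_SAMPLE is the result wanted here.

```java
import java.util.Map;

/** Hypothetical model: a marker attribute overrides both the block list and the parent's decision. */
final class ForceSampleModel {

    enum Decision { DROP, RECORD_ONLY, RECORD_AND_SAMPLE }

    // Assumed attribute key; any application-specific name would do.
    static final String FORCE_KEY = "app.force_sample";

    /**
     * @param attributes       attributes set on the span before it was started
     * @param blockListed      whether the span matches the scheduled-task block list
     * @param delegateDecision what the wrapped (for example parent-based) sampler would decide
     */
    static Decision decide(Map<String, String> attributes, boolean blockListed, Decision delegateDecision) {
        if ("true".equals(attributes.get(FORCE_KEY))) {
            // Explicitly marked manual span: sample it regardless of the parent.
            return Decision.RECORD_AND_SAMPLE;
        }
        if (blockListed) {
            return Decision.DROP;
        }
        return delegateDecision;
    }
}
```

An explicit marker avoids guessing that every INTERNAL span under a dropped parent is a manual one, at the cost of having to set the attribute wherever a manual span should survive.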

3 participants