
Performance Audit attribution from third-party wrapping #7005

Closed

nicjansma opened this issue Jan 14, 2019 · 8 comments

@nicjansma

Feature request summary

We are a third-party analytics provider that provides Real User Monitoring (RUM) and other features such as JavaScript Error Tracking through a JavaScript library called Boomerang, which our customers include on their pages.

Lately, we've had several customers raise support issues over their concerns that our JavaScript library is causing excessive JavaScript CPU usage, according to Lighthouse/PSI reports.

Here's an example:

[screenshot]

Upon investigation, we found that our Error Tracking plugin was causing Lighthouse to be confused about work being done in our customer's app. The plugin wraps several high-level functions and the callbacks of setTimeout(), addEventListener(), and others. It does this so that the customer's application can get detailed error messages (instead of the generic "Script error." message) when errors are generated from scripts in other origins. More details are here, and other error tracking libraries do this wrapping as well.

Because Boomerang wraps the callbacks of setTimeout() and friends, any function in any file that calls setTimeout() gets attributed to Boomerang instead, because Boomerang's wrapped callback is at the "bottom" of the stack. As you can imagine, this accounts for a lot of the JavaScript CPU time on many sites. The actual work done in the wrapper itself is minimal (but non-zero), yet the time of the entire callback is attributed to Boomerang instead of to the "real" work being done. Our understanding is that the "URL" shown in the JavaScript boot-up time tables (etc.) is based on the function/URL at the bottom of the stack.
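For context, the wrapping pattern looks roughly like the following. This is a simplified illustration rather than Boomerang's actual code, and reportError() is a stand-in for whatever the library does with the caught error:

const origSetTimeout = window.setTimeout;
window.setTimeout = function(callback, delay, ...args) {
  const wrapped = function(...cbArgs) {
    try {
      // The customer's "real" work runs here, but this wrapper frame
      // (defined in the error-tracking script) sits at the bottom of the callback's stack.
      return callback.apply(this, cbArgs);
    } catch (err) {
      // The wrapper sees full error details, avoiding the cross-origin
      // "Script error." message, then reports and re-throws.
      reportError(err); // reportError() is a hypothetical reporting hook
      throw err;
    }
  };
  return origSetTimeout.call(window, wrapped, delay, ...args);
};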

If we disable Error Tracking and re-load the same page, we can see that the attribution shifts largely to the page and other scripts (as well as Boomerang), and the "cost" of Boomerang decreases from 770ms to 127ms:

[screenshot]

We were hoping there would be a way to change this so that these wrapping functions don't get blamed for the entire callback cost. One idea would be to have a list of files or functions that could be excluded from "blame" when they are at the bottom of the stack, similar to library blackboxing.

We're not sure if that's feasible though:

  • We'd still want the non-wrapping code in Boomerang to be judged based on the work it does, of course.
  • We'd like our customers' Lighthouse/PSI runs to have this exclusion applied without them having to make any changes or configure any lists (can this be built into Lighthouse?)
  • Each of our customers loads our third-party library from a distinct URL

The only other alternatives we've come up with are better education (explaining the issue to the customer, which we're doing now) and disabling Boomerang's wrapping when Lighthouse is detected (which we'd prefer not to do).

I have a repro page here, which merely does a setTimeout() that busy-loops for 2 seconds:
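In essence, the repro does something like the following (a minimal sketch, not the exact page source):

setTimeout(function busyLoop() {
  // Busy-loop for ~2 seconds; with callback wrapping in place, this cost is
  // attributed to the wrapping script rather than to this page's script.
  const start = Date.now();
  while (Date.now() - start < 2000) {
    // burn CPU
  }
}, 0);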

[screenshot]

[screenshot]

I'm willing to work on this change, if there's agreement on a correct approach.

What is the motivation or use case for changing this?

To accurately reflect JavaScript CPU cost when third-party libraries are in the callstack.

How is this beneficial to Lighthouse?

More accurate JavaScript CPU attribution.

@patrickhulce
Collaborator

Thanks so much for the detailed information and willingness to work with us @nicjansma !

I agree we should be trying to attribute these costs more accurately in Lighthouse core (and therefore PSI, etc.) without needing the user to opt in to any plugin/config/options.

Let's start with where the cost should be going and follow up with how we're not quite getting there yet.

There are a couple ways to look at this.

  1. We attribute time to each script according to the aggregate selfTime of all its executions. i.e. if we have jQuery on the page, then jQuery would receive the attribution for all the users' calls to $.whatever. This is typically what a developer wants to see when identifying bottlenecks during a performance deep dive. However, it's probably not what we want to surface in Lighthouse: it somewhat presumes that the calls to those functions are fixed and that we're trying to have the largest overall impact when improving things, whereas we're trying to surface which scripts instigated the most work. Let's take a look at number 2.
  2. We attribute time to each script according to the aggregate totalTime of all tasks it initiated. i.e. if we have jQuery on the page, then jQuery would receive attribution for its initial installation and any handlers, setTimeouts, or additional work it defers, but not for any calls to jQuery functions from other scripts. This is what we're trying to achieve, but you've identified a hole where we are attributing to the first URL that executed in the timeout instead of the URL that scheduled the timeout. (A toy sketch contrasting the two models follows below.)
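To make the distinction concrete, here's a toy sketch of the two attribution models. It is purely illustrative; the task shape and numbers are made up and do not reflect Lighthouse's actual trace-processing structures:

// Toy model: a task records which script's code was executing (url),
// its own time (selfTime), and any nested calls it made (children).
const tasks = [
  // app.js evaluates and calls into jQuery ($.whatever):
  { url: 'app.js', selfTime: 10, children: [
      { url: 'jquery.js', selfTime: 40, children: [] },
  ]},
  // a timer callback that jQuery itself scheduled:
  { url: 'jquery.js', selfTime: 20, children: [] },
];

// Model 1: aggregate selfTime per script, wherever its code ran.
function attributeBySelfTime(taskList) {
  const totals = new Map();
  const visit = task => {
    totals.set(task.url, (totals.get(task.url) || 0) + task.selfTime);
    task.children.forEach(visit);
  };
  taskList.forEach(visit);
  return totals; // Map { 'app.js' => 10, 'jquery.js' => 60 }
}

// Model 2: attribute each top-level task's totalTime to the script that initiated it.
function attributeByInitiator(taskList) {
  const totalTime = task =>
    task.selfTime + task.children.reduce((sum, c) => sum + totalTime(c), 0);
  const totals = new Map();
  for (const task of taskList) {
    totals.set(task.url, (totals.get(task.url) || 0) + totalTime(task));
  }
  return totals; // Map { 'app.js' => 50, 'jquery.js' => 20 }
}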

Does that sound right to everyone?

Assuming it does, fixing this is a bit tricky for a first-time contribution.

We will need to take the same logic we have for tracking timer installations over in page-dependency-graph.js and bring it into the way we compute attributableURLs in main-thread-tasks.js:

// In page-dependency-graph.js -- existing timer-tracking logic:
case 'TimerInstall':
  // @ts-ignore - 'TimerInstall' event means timerId exists.
  timers.set(evt.args.data.timerId, node);
  stackTraceUrls.forEach(url => addDependencyOnUrl(node, url));
  break;
case 'TimerFire': {
  // @ts-ignore - 'TimerFire' event means timerId exists.
  const installer = timers.get(evt.args.data.timerId);
  if (!installer) break;
  installer.addDependent(node);
  break;
}

// In main-thread-tasks.js -- how attributable URLs are currently computed:
switch (task.event.name) {
  /**
   * Some trace events reference a specific script URL that triggered them.
   * Use this URL as the higher precedence attributable URL.
   * @see https://cs.chromium.org/chromium/src/third_party/blink/renderer/devtools/front_end/timeline/TimelineUIUtils.js?type=cs&q=_initEventStyles+-f:out+f:devtools&sq=package:chromium&g=0&l=678-744
   */
  case 'v8.compile':
  case 'EvaluateScript':
  case 'FunctionCall':
    taskURLs = [argsData.url].concat(stackFrameURLs);
    break;
  case 'v8.compileModule':
    taskURLs = [task.event.args.fileName].concat(stackFrameURLs);
    break;
  default:
    taskURLs = stackFrameURLs;
    break;
}
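As a rough sketch (not a real patch; timerInstallerURLs and getStackFrameURLs are invented names for illustration), the idea is to record the stack URLs at TimerInstall time and prefer them when attributing the corresponding TimerFire task:

// Hypothetical sketch only.
// First pass: remember which URLs were on the stack when each timer was installed.
const timerInstallerURLs = new Map();
for (const task of tasks) {
  if (task.event.name === 'TimerInstall') {
    // @ts-ignore - 'TimerInstall' event means timerId exists.
    timerInstallerURLs.set(task.event.args.data.timerId, getStackFrameURLs(task));
  }
}

// Then, when computing attributable URLs, prefer the installer's URLs for TimerFire tasks.
switch (task.event.name) {
  case 'TimerFire': {
    // @ts-ignore - 'TimerFire' event means timerId exists.
    const installerURLs = timerInstallerURLs.get(task.event.args.data.timerId) || [];
    taskURLs = installerURLs.concat(stackFrameURLs);
    break;
  }
  // ... existing cases from above ...
  default:
    taskURLs = stackFrameURLs;
    break;
}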

@nicjansma
Author

Totally agree on the above thoughts @patrickhulce, and #7059 looks like a great start. It looks like that may solve our customer-facing issue (with Boomerang on the page), though possibly not the specific repro case above, correct?

Anything I can look into to help?

@patrickhulce
Collaborator

That is correct, #7059 should solve your customer-facing issue 👍

I think it just needs to be reviewed so no further action needed on your part!

@cheneytsai

@nicjansma Is this another example of misattribution to boomerang?
[screenshot: Screen Shot 2020-08-27 at 10 26 08 AM]

Seems abnormally high for a script eval. Tested URL is www.hilton.com/en/

@patrickhulce
Collaborator

@cheneytsai I can't reproduce that behavior on the latest Lighthouse. Boomerang is way at the bottom with 188ms:

[screenshot]

What version of LH is that?

@cheneytsai

Oops, forgot to list it. This was LH 6.0.0 as part of Chrome 84.0.4147.135.

@patrickhulce
Collaborator

Hm, well it's possible it's been fixed in the past couple of months; I can't observe that behavior on 6.3.0.

@nicjansma
Author

@cheneytsai Yeah I certainly wouldn't expect Boomerang to be anything more than ~200ms in most scenarios.

If you have the trace file, you could share it (or drop me an email nic at nicj dot net) and we can take a look.
