
Performance Audit attribution from third-party wrapping #7005

Closed

nicjansma opened this issue Jan 14, 2019 · 8 comments

@nicjansma

Feature request summary

We are a third-party analytics provider that provides Real User Monitoring (RUM) and other features such as JavaScript Error Tracking through a JavaScript library called Boomerang, which our customers include on their pages.

Lately, we've had several customers raise support issues over their concerns that our JavaScript library is causing excessive JavaScript CPU usage, according to Lighthouse/PSI reports.

Here's an example:

[screenshot]

Upon investigation, we found that our Error Tracking plugin was causing Lighthouse to be confused about work being done in our customer's app. The plugin wraps several high-level functions and the callbacks of setTimeout(), addEventListener(), and others. It does this so that the customer's application can get detailed error messages (instead of the generic "Script error." message) when errors are generated from scripts in other origins. More details are here, and other error tracking libraries do this wrapping as well.

Because Boomerang wraps the callbacks of setTimeout() and friends, any function in any file that calls setTimeout() gets attributed to Boomerang instead, because Boomerang's wrapped callback is at the "bottom" of the stack. As you can imagine, this accounts for a lot of the JavaScript CPU time on many sites. The actual work done in the wrapper itself is minimal (but non-zero), yet the time of the entire callback is attributed to Boomerang instead of to the "real" work being done. Our understanding is that the "URL" shown in the JavaScript boot-up time tables (etc.) is based on the function/URL at the bottom of the stack.
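For context, the wrapping pattern looks roughly like the following. This is a simplified illustration rather than Boomerang's actual code, and reportError() is a stand-in for whatever the library does with the caught error:

const origSetTimeout = window.setTimeout;
window.setTimeout = function(callback, delay, ...args) {
  const wrapped = function(...cbArgs) {
    try {
      // The customer's "real" work runs here, but this wrapper frame
      // (defined in the error-tracking script) sits at the bottom of the callback's stack.
      return callback.apply(this, cbArgs);
    } catch (err) {
      // The wrapper sees full error details, avoiding the cross-origin
      // "Script error." message, then reports and re-throws.
      reportError(err); // reportError() is a hypothetical reporting hook
      throw err;
    }
  };
  return origSetTimeout.call(window, wrapped, delay, ...args);
};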

If we disable Error Tracking and re-load the same page, we can see that the attribution shifts largely to the page and other scripts (as well as Boomerang), and the "cost" of Boomerang decreases from 770ms to 127ms:

[screenshot]

We were hoping there would be a way to change this so that these wrapping functions don't get blamed for the entire callback cost. One idea would be to have a list of files or functions that could be excluded from "blame" when they are at the bottom of the stack, similar to library blackboxing.

We're not sure if that's feasible though:

  • We'd still want the non-wrapping code in Boomerang to be judged based on the work it does, of course.
  • We'd like our customers' Lighthouse/PSI runs to have this exclusion applied without them having to make any changes or configure any lists (can this be built into Lighthouse?)
  • Each of our customers loads our third-party library from a distinct URL

The only other alternatives we've come up with are better education (explaining the issue to the customer, which we're doing now) and disabling Boomerang's wrapping when Lighthouse is detected (which we'd prefer not to do).

I have a repro page here, which merely does a setTimeout() that busy-loops for 2 seconds:
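In essence, the repro does something like the following (a minimal sketch, not the exact page source):

setTimeout(function busyLoop() {
  // Busy-loop for ~2 seconds; with callback wrapping in place, this cost is
  // attributed to the wrapping script rather than to this page's script.
  const start = Date.now();
  while (Date.now() - start < 2000) {
    // burn CPU
  }
}, 0);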

[screenshot]

[screenshot]

I'm willing to work on this change, if there's agreement on a correct approach.

What is the motivation or use case for changing this?

To accurately reflect JavaScript CPU cost when third-party libraries are in the callstack.

How is this beneficial to Lighthouse?

More accurate JavaScript CPU attribution.

@patrickhulce
Collaborator

Thanks so much for the detailed information and willingness to work with us @nicjansma !

I agree we should be trying to attribute these costs more accurately in Lighthouse core (and therefore PSI, etc.) without needing the user to opt in to any plugin/config/options.

Let's start with where the cost should be going and follow up with how we're not quite getting there yet.

There are a couple ways to look at this.

  1. We attribute time to each script according to the aggregate selfTime of all its executions. i.e. if we have jQuery on the page, then jQuery would receive the attribution for all the users' calls to $.whatever. This is typically what a developer wants to see when identifying bottlenecks during a performance deep dive. However, it's probably not what we want to surface in Lighthouse: it somewhat presumes that the calls to those functions are fixed and that we're trying to have the largest overall impact when improving things, whereas we're trying to surface which scripts instigated the most work. Let's take a look at number 2.
  2. We attribute time to each script according to the aggregate totalTime of all tasks it initiated. i.e. if we have jQuery on the page, then jQuery would receive attribution for its initial installation and any handlers, setTimeouts, or additional work it defers, but not for any calls to jQuery functions from other scripts. This is what we're trying to achieve, but you've identified a hole where we are attributing to the first URL that executed in the timeout instead of the URL that scheduled the timeout. (A toy sketch contrasting the two models follows below.)
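To make the distinction concrete, here's a toy sketch of the two attribution models. It is purely illustrative; the task shape and numbers are made up and do not reflect Lighthouse's actual trace-processing structures:

// Toy model: a task records which script's code was executing (url),
// its own time (selfTime), and any nested calls it made (children).
const tasks = [
  // app.js evaluates and calls into jQuery ($.whatever):
  { url: 'app.js', selfTime: 10, children: [
      { url: 'jquery.js', selfTime: 40, children: [] },
  ]},
  // a timer callback that jQuery itself scheduled:
  { url: 'jquery.js', selfTime: 20, children: [] },
];

// Model 1: aggregate selfTime per script, wherever its code ran.
function attributeBySelfTime(taskList) {
  const totals = new Map();
  const visit = task => {
    totals.set(task.url, (totals.get(task.url) || 0) + task.selfTime);
    task.children.forEach(visit);
  };
  taskList.forEach(visit);
  return totals; // Map { 'app.js' => 10, 'jquery.js' => 60 }
}

// Model 2: attribute each top-level task's totalTime to the script that initiated it.
function attributeByInitiator(taskList) {
  const totalTime = task =>
    task.selfTime + task.children.reduce((sum, c) => sum + totalTime(c), 0);
  const totals = new Map();
  for (const task of taskList) {
    totals.set(task.url, (totals.get(task.url) || 0) + totalTime(task));
  }
  return totals; // Map { 'app.js' => 50, 'jquery.js' => 20 }
}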

Does that sound right to everyone?

Assuming it does, fixing this is a bit tricky for a first-time contribution.

We will need to take the same logic we have for tracking timer installations over in page-dependency-graph.js and bring it into the way we compute attributableURLs in main-thread-tasks.js:

// In page-dependency-graph.js -- existing timer-tracking logic:
case 'TimerInstall':
  // @ts-ignore - 'TimerInstall' event means timerId exists.
  timers.set(evt.args.data.timerId, node);
  stackTraceUrls.forEach(url => addDependencyOnUrl(node, url));
  break;
case 'TimerFire': {
  // @ts-ignore - 'TimerFire' event means timerId exists.
  const installer = timers.get(evt.args.data.timerId);
  if (!installer) break;
  installer.addDependent(node);
  break;
}

// In main-thread-tasks.js -- how attributable URLs are currently computed:
switch (task.event.name) {
  /**
   * Some trace events reference a specific script URL that triggered them.
   * Use this URL as the higher precedence attributable URL.
   * @see https://cs.chromium.org/chromium/src/third_party/blink/renderer/devtools/front_end/timeline/TimelineUIUtils.js?type=cs&q=_initEventStyles+-f:out+f:devtools&sq=package:chromium&g=0&l=678-744
   */
  case 'v8.compile':
  case 'EvaluateScript':
  case 'FunctionCall':
    taskURLs = [argsData.url].concat(stackFrameURLs);
    break;
  case 'v8.compileModule':
    taskURLs = [task.event.args.fileName].concat(stackFrameURLs);
    break;
  default:
    taskURLs = stackFrameURLs;
    break;
}
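As a rough sketch (not a real patch; timerInstallerURLs and getStackFrameURLs are invented names for illustration), the idea is to record the stack URLs at TimerInstall time and prefer them when attributing the corresponding TimerFire task:

// Hypothetical sketch only.
// First pass: remember which URLs were on the stack when each timer was installed.
const timerInstallerURLs = new Map();
for (const task of tasks) {
  if (task.event.name === 'TimerInstall') {
    // @ts-ignore - 'TimerInstall' event means timerId exists.
    timerInstallerURLs.set(task.event.args.data.timerId, getStackFrameURLs(task));
  }
}

// Then, when computing attributable URLs, prefer the installer's URLs for TimerFire tasks.
switch (task.event.name) {
  case 'TimerFire': {
    // @ts-ignore - 'TimerFire' event means timerId exists.
    const installerURLs = timerInstallerURLs.get(task.event.args.data.timerId) || [];
    taskURLs = installerURLs.concat(stackFrameURLs);
    break;
  }
  // ... existing cases from above ...
  default:
    taskURLs = stackFrameURLs;
    break;
}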

@nicjansma
Author

Totally agree on the above thoughts @patrickhulce, and #7059 looks like a great start. It looks like that may solve our customer-facing issue (with Boomerang on the page), though possibly not the specific repro case above, correct?

Anything I can look into to help?

@patrickhulce
Collaborator

That is correct, #7059 should solve your customer-facing issue 👍

I think it just needs to be reviewed so no further action needed on your part!

@cheneytsai

@nicjansma Is this another example of misattribution to boomerang?
[screenshot: Screen Shot 2020-08-27 at 10 26 08 AM]

Seems abnormally high for a script eval. Tested URL is www.hilton.com/en/

@patrickhulce
Collaborator

@cheneytsai I can't reproduce that behavior on the latest Lighthouse. Boomerang is way at the bottom with 188ms:

[screenshot]

What version of LH is that?

@cheneytsai

Oops, forgot to list it. This was LH 6.0.0 as part of Chrome 84.0.4147.135.

@patrickhulce
Collaborator

Hm, well it's possible it's been fixed in the past couple of months; I can't observe that behavior on 6.3.0.

@nicjansma
Author

@cheneytsai Yeah I certainly wouldn't expect Boomerang to be anything more than ~200ms in most scenarios.

If you have the trace file, you could share it (or drop me an email nic at nicj dot net) and we can take a look.
