Lighthouse misattributing Main-Thread Blocking Time in Run Microtasks #14843
Comments
Thank you so much for the great bug report!! I think there are 3 questions here:
Heyo 👋 excellent bug report 👍 thank you! I can help provide some explanation as to how we arrived here. This deck is helpful for explaining how Lighthouse attribution is supposed to work.

To answer @yoavweiss's formulation of the question "Why is Lighthouse identifying any particular script as the guilty party here, where it's clearly a group effort? :)" the short answer is that our view is that long tasks are almost always a group effort, and we attempt to find a reasonable proximate cause. We've taken the approach that proximate cause, from Lighthouse's perspective, means who is responsible for scheduling this work rather than performing it.

To many of your questions @nicjansma: we don't really care what actually executed inside the RunMicrotasks at all, and we only use the first executor when we lack instrumentation from Chrome about who scheduled the task (as a best guess, since it's rather likely that whoever scheduled the work will call their own code first before delegating to other scripts/libraries). I would've expected to be able to tie RunMicrotasks back to its scheduling script without this fallback case being triggered, though 🤔, so perhaps that's where a bug has been introduced at some point in the last few years since this was written?
+1 to these questions. Combined with the above, it'd be really helpful to have a human expert's manual assessment against the proximate-cause description found in the LH deck, to pinpoint where exactly Chrome tracing/Lighthouse fell off the wagon.
@nicjansma I just want to say Thank You for a phenomenal bug report. I'm sorry we haven't responded yet... and this isn't a proper response either. We've got a few things going on. But the short of it is: your bug is legit and it'll take us a little bit to draft a response. Attribution is a hard problem with indeterminate solutions. But we want attribution/blame to make sense to developers, point the way to actionability, be fair, and treat things consistently across tools. I'm eager to figure out the right solution.
There could be several reasons why unrelated functions would run in the same task:
In this case, it seems like the long task is caused by several event handlers that happen at DCL time, some attached to DCL itself and some (like the first one) to readyStateChange. If you had deferred scripts they would probably make that task even longer. One of the actual causes of long tasks here, by the way, is DOZENS of style thrashes (coming from calls to getComputedStyle). Tracing this page with LoAF did show this.
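The style-thrash pattern mentioned above can be illustrated with a toy model (not a real DOM; the `el` object and `reflows` counter are invented here purely for illustration): interleaving style writes and reads forces a synchronous layout on every read, much as `getComputedStyle()` can in a real browser, while batching all writes before all reads pays that cost only once.

```javascript
// Toy model of layout thrashing: a read after a write "flushes layout".
let reflows = 0;
let dirty = false;

const el = {
  set width(v) { dirty = true; },   // a style write invalidates layout
  get computedWidth() {             // a style read forces a flush if dirty
    if (dirty) { reflows++; dirty = false; }
    return 100;
  },
};

// Thrashing: write then read on every iteration -> one reflow per iteration.
for (let i = 0; i < 10; i++) {
  el.width = i;
  const w = el.computedWidth;
}
const thrashedReflows = reflows;    // 10 reflows

// Batched: all writes first, then all reads -> a single reflow.
reflows = 0;
for (let i = 0; i < 10; i++) el.width = i;
for (let i = 0; i < 10; i++) { const w = el.computedWidth; }
const batchedReflows = reflows;     // 1 reflow
```

In a real page the same idea applies: read layout-dependent values in one pass, then apply style changes in another.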
Any updates on this issue? We are seeing it as well with a RUM product.
URL
https://nicj.net
What happened?
(submitted URL is a placeholder URL, not the site I've reproduced this issue on)
As a third party RUM provider, my team provides a JavaScript RUM library called boomerang.js that is included in our customer's websites. boomerang.js is loaded on every page load and beacons performance analytics information.
Our customers, who are paying for RUM or CDN services, are usually very motivated and proactive in monitoring their site's performance, as you would expect. They will utilize tools like Lighthouse and want to ensure that boomerang.js is not causing any performance side-effects on their sites. We sometimes get Lighthouse reports that highlight boomerang.js in some of the audits. Being a RUM provider, our development team is very interested in ensuring boomerang.js is as performant and non-invasive as possible.
We recently were sent a Lighthouse report by a customer that highlighted boomerang.js as one of the top hits under the Reduce the impact of third-party code audit. I was surprised to see boomerang.js highlighted there, as we haven't seen occurrences of blocking time in our own self-audits of our boomerang.js runtime. Their report showed 989ms of Main-Thread Blocking Time (though my screenshots below will show a smaller amount (286ms) with my local repro).
Here's an example screenshot from my development machine:
(the rest of the third-parties in this audit have been chopped off for brevity and the privacy of our customer)
I started to dig into what was causing the Main-Thread Blocking Time being 286ms in this trace. In other examples, when running Lighthouse locally, I would potentially see it even higher, 500-1000ms or more.
I had a hard time correlating those 286ms to anything within the trace itself. Once loaded in the Performance tab, I could see our script executing infrequently, but most tasks were 0.1ms or smaller, and added up (even at microsecond granularity) the total `FunctionCall` time was only around ~23ms. Even taking the mobile `cpuMultiplier` of 4x into account, that would be just 92ms. We have a few 100ms-interval timers and a rAF callback, most of which complete in under 0.1ms. The below list is filtered to just our URL:
I additionally took several JavaScript Profiler traces, and generally found the same results: 25-50ms maximum spent via sampled profiles under our JavaScript library.
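A rough sketch of that manual tally, using a simplified trace-event shape (the URLs and event objects here are made up for illustration; real Chrome traces record durations in microseconds in `dur`, and the exact field layout varies by event type):

```javascript
// Placeholder URL standing in for the real boomerang.js location.
const BOOMERANG_URL = 'https://example.com/boomerang.js';

// Simplified stand-ins for Chrome trace events (durations in microseconds).
const traceEvents = [
  { name: 'FunctionCall', dur: 100,   args: { data: { url: BOOMERANG_URL } } },
  { name: 'FunctionCall', dur: 22900, args: { data: { url: BOOMERANG_URL } } },
  { name: 'FunctionCall', dur: 80000, args: { data: { url: 'https://example.com/other.js' } } },
];

// Sum FunctionCall durations attributable to our script, converted to ms.
const totalMs = traceEvents
  .filter(e => e.name === 'FunctionCall' && e.args?.data?.url === BOOMERANG_URL)
  .reduce((sum, e) => sum + e.dur, 0) / 1000;       // ~23ms in this report

const withMobileMultiplier = totalMs * 4;           // mobile cpuMultiplier -> ~92ms
```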
I eventually dug into the Lighthouse source to understand how the Main-Thread Blocking Time was calculated, and came across `third-party-summary.js`:

lighthouse/core/audits/third-party-summary.js (lines 102 to 114 in 516d32c)

So for each task, find its source URL and sum `.blockingTime` if it's over 50ms. This function gets the source URL from `getAttributeableURLForTask`:

lighthouse/core/lib/tracehouse/task-summary.js (lines 45 to 58 in 516d32c)

The above function looks for any matching 3P script URLs, and falls back to the "first" URL of the task if it can't determine a cause.
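To make that fallback concrete, here is a paraphrased sketch (not the actual Lighthouse source; function and field names are simplified) of the attribution path just described: each task is assigned to a single URL, preferring a known third-party URL and otherwise falling back to the task's first attributable URL, and that one URL absorbs the task's entire blocking time.

```javascript
// Simplified model of per-task URL attribution (not Lighthouse's real code).
function getAttributableURL(task, thirdPartyURLs) {
  // Prefer a URL that is a known third party...
  const match = task.attributableURLs.find(url => thirdPartyURLs.has(url));
  // ...otherwise fall back to the *first* URL seen in the task, which is
  // how a tiny 0.1ms FunctionCall can be blamed for a whole task.
  return match ?? task.attributableURLs[0];
}

function summarizeBlockingTime(tasks, thirdPartyURLs) {
  const byURL = new Map();
  for (const task of tasks) {
    const url = getAttributableURL(task, thirdPartyURLs);
    byURL.set(url, (byURL.get(url) ?? 0) + task.blockingTime);
  }
  return byURL;
}

// With no recognized 3P URLs, the first URL in the task gets the full blame.
const tasks = [
  { attributableURLs: ['https://example.com/boomerang.js', 'https://example.com/app.js'],
    blockingTime: 286 },
];
const summary = summarizeBlockingTime(tasks, new Set());
```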
When I looked back at the trace, I think I found the actual cause of the "blocking" task: a Run Microtasks which has ~80 items. This looks possibly like the DCL event and all related tasks executed.

The only time our boomerang.js appears in that Run Microtasks of 80 sub-tasks is the very last `FunctionCall`, of 0.1ms.

In the trace JSON, we see the `RunMicrotasks` w/ 80 subtasks, and the previous `FunctionCall` happened to be our script appearing once. Debugging this in Node also confirmed that it's just a big bunch of tasks, and our boomerang.js just happens to be the first in the `attributableURLs` list.

If we take that 83.955 `selfTime` and multiply by 4 (cpuMultiplier) minus 50ms, we get the Main-Thread Blocking Time reported in the original audit of 286ms. So unfortunately, because our boomerang.js script has one task executing at DCL that took ~0.1ms, we're unluckily being blamed for an 83.955ms Microtask chunk that gets shown as 286ms of Main-Thread Blocking Time.
d'oh!
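The arithmetic works out as follows (a simple restatement of the numbers above, not Lighthouse code):

```javascript
// Numbers taken from the trace described in this report.
const selfTimeMs = 83.955;        // RunMicrotasks chunk self time
const cpuMultiplier = 4;          // simulated mobile CPU slowdown
const longTaskThresholdMs = 50;   // only time beyond 50ms counts as "blocking"

const blockingTimeMs = Math.max(0, selfTimeMs * cpuMultiplier - longTaskThresholdMs);
// ~285.82ms, which rounds to the 286ms shown in the audit
```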
I've run Lighthouse a bunch of times on the same URL, and sometimes the Main-Thread Blocking Time is 0ms, and other times it's that much higher value. It's the luck of the draw. I suspect that in the times it's 0ms, our script is not the `[0]`th entry in the `attributableURLs` or 3P URLs list.

What did you expect?
Ultimately I think the question is how we should deal with a microtask-execution chunk when there are many microtasks from various sources involved. Task attribution is hard, and I'm not sure how to "assign blame" to a Run Microtasks that has a lot of sub-calls to various other scripts/handlers/callbacks/timeouts.
In the above examples where boomerang.js was unlucky, its ~0.1ms task(s) happen to get blamed for the whole chunk of microtask time.
Since this audit's purpose is to highlight third-party scripts that are behaving badly, I'd like to suggest we make a change to the calculation to ensure it's more accurate and repeatable.
I'm not a Chrome or Lighthouse expert so I may be missing some context on how this all works, but with my understanding, here are some possible ideas for how we could deal with microtask chunks:

1. Exclude microtask chunks from the `.blockingTime` calculation for 3P URLs (this may hide real problems).
2. Treat `FunctionCall`s independently for `.blockingTime` consideration.
3. Treat `FunctionCall`s grouped by URL. Consider the sum() for each URL for the `.blockingTime` calculation.
4. Look through the `FunctionCall`s etc. that are related to that specific URL. Combine all related ones for that URL in that Run Microtasks, and if it's > 50(?)% of the total Run Microtasks execution time, declare that URL to be blamed for the whole Run Microtasks.

Personally, I think (3) sounds the most logical: essentially, group each execution of Microtasks by the 3P URLs and treat their weight of runtime independently, to see if they cross the `.blockingTime` threshold (50ms).

I'm happy to take a stab at the code changes for this, if there's agreement on the approach.
What have you tried?
No response
How were you running Lighthouse?
CLI, Chrome DevTools
Lighthouse Version
10.0.1
Chrome Version
113.0.0.0
Node Version
16.19.1
OS
Windows
Relevant log output
No response