Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] @osd/optimizer runs for very long time with no output #3181

Open
pjfitzgibbons opened this issue Jan 5, 2023 · 7 comments
Open

[BUG] @osd/optimizer runs for very long time with no output #3181

pjfitzgibbons opened this issue Jan 5, 2023 · 7 comments
Assignees
Labels
bug Something isn't working

Comments

@pjfitzgibbons
Copy link
Contributor

Describe the bug

@osd/optimizer package is set to run by default on yarn start. This feature runs async code-compilation work at runtime on the server. This package runs for a very long time (MINUTES), meanwhile, the application is "broken" on the browser.
Symptoms may include :

  • "OpenSearch Dashboards did not load properly"
    image

  • Application error on browser with stack trace and advised workaround "Clear your session".

To Reproduce

Steps to reproduce the behavior:
This issue has intermittent behaviour. Most commonly, this occurrs after switching branches on local dev environment, especially between maj/min versions (main => 2.x for instance).

  1. Go to './'
  2. Swich git branch, make significant code change, et.
  3. yarn start
  4. Log output shows http server running at http://localhost:5603/xyz
  5. Navigate to URL on browser - experience reported error for many minutes, until finally browser load "works" without any other intervention.

Expected behavior
Some log output of some kind indicating that Optimizer is actively working on processes. The async and file-watch behaviour may preclude Optimizer from declaring "finished" - observation of the log should indicate when Optimizer activity has relatively ended over time.

OpenSearch Version
3.0, 2.x

Dashboards Version
3.0, 2.x - "main" branch

Plugins

All default plugins.

Screenshots

If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: Linux, Macos
  • Chrome 108

Additional context

Add any other context about the problem here.

@pjfitzgibbons pjfitzgibbons added bug Something isn't working untriaged labels Jan 5, 2023
@joshuarrrr
Copy link
Member

See also #2331

@ashwin-pc
Copy link
Member

Having done a little deep dive here, here's my findings. The error seen here is the result of 3 async processes resulting in the beautiful error that you see.

  1. The error itself is because the failure function in the main handlebar template that renders the SPA. This failure is because the bundles havent finished building but the server itself has technically started up. so when the page requests the subsequent bundles, it gets a 404 from the server thats still compiling the bundles. The error message is misleading here because there are actually no server logs because the server has no actual error, its just bundling.
  2. The server console shows that the app has started up: Server running at http://localhost:5601. This is because the core server startup sequence and the optimizer bundling process happen in parallel and it is not aware of the optimizers status
  3. The Optimizer itself spins up workers for each bundle that isnt cached asyncronously and outputs its result to the console.

If you know what to look for its quite easy to tell when this is happening. e.g. for a fresh install when nothing is cached:

  1. The optimizer first outputs initialized, 0 bundles cached indicating that its begun the bundling process and then begins to spin up worker for the bundling
  2. The server then outputs Server running at http://localhost:5601 stating that the HTTP server has initialized and can start to accept requests. At this time the optimizer might still be bundling.
  3. After bundling the optimizer outputs 61 bundles compiled successfully after 73.1 sec, watching for changes stating that the bundling is done.

If we access the page after this, we dont see the above error. This will not happen if any of the above conditions arent true. e.g. If the bundles are already cached or if the user only attempts to access the page after about 2 minutes, etc.

@pjfitzgibbons, the expected behavior you are asking for does occur. If the optimiser is still working on bundling you should see the log output [info][@osd/optimizer] starting worker [4 bundles] followed by 61 bundles compiled successfully after 73.1 sec, watching for changes when its done bundling. And if ther is no need for bundling you should see all bundles cached, success after 0.3 sec. Unfortunately there are other log outputs that might make you miss these since they run in parallel.

I dont have a perfect solution at hand for this, but here are a few ideas I had:

  1. Update the HTTP server log output to callout that the optimizer might still be bundling.
  2. Add additional log outputs to the optimizer process to better indicate whats taking place and that the server isnt ready yet.
  3. Trigger the server startup only after the optimizer has finished bundling.

I'm also not a fan of how we surface the big red banner but if we implement any of the above solutions, it will atleast be accurate since the server will have more information.

@joshuali925
Copy link
Member

@ashwin-pc is it possible to expose optimizer status somewhere, maybe through API, or existence of some file, or a recommended ps command? So i can tell if server is optimizing without scrolling through logs

I find it easy to miss [info][@osd/optimizer] starting worker [1 bundle] and work on cached outdated results in browser. The optimizer only uses a single cpu core and is very slow, takes about 2 minutes to compile.

@ashwin-pc
Copy link
Member

@joshuali925 I've outlined some suggestions on how we can expose this status. The ones that im leaning towards the most is exposing it via the html when you load the Dashboards page. Another thing thats available to you today is actually just looking out for the [success][@osd/optimizer] log message. there should only be one of those and if your shell supports color codes, this should look distinct from the other log messages and recognizable quite easily.

Screenshot 2023-04-26 at 2 18 04 PM

The reason i suggest this over a more complex solution is because there are plans to get rid of the optimizer code as a whole, so im not sure how much time we want to invest in a problem that has at least a partial solution for now.

@joshuali925
Copy link
Member

@ashwin-pc Thanks for looking into this

there are plans to get rid of the optimizer code as a whole

That would be great and I agree it's not worth it to improve messaging on optimizer. For now I've been grepping logs to find the latest successful compile time. I'll know it's stuck at compiling if the time does not change after saving source code.

$ awk '/bundles compiled successfully after/{t=$4} END{print t}' dashboards.log  # [20:58:52.777]

@ashwin-pc
Copy link
Member

@joshuali925 is the optimizer taking 2 mins even after the first startup? the contents should be cached after the initial run and only rebuild the plugins that change. Are you working on a core plugin like the data plugin or something that needs you to actively monitor the optimizer?

@joshuali925
Copy link
Member

@ashwin-pc I don't know what triggers the starting worker. It usually happens if I add some debugging lines to a core plugin, but I've also seen it occasionally when working on observability plugin.

The starting worker takes at least 2 minutes (depends on how many bundles). Plugin code change is quick and usually under 1 second. Changing scss style files takes about 6 seconds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants