[🐛 Bug]: Race condition in ruby library for capybara system tests #14454

Open
krschacht opened this issue Aug 28, 2024 · 7 comments

Comments

@krschacht

What happened?

I've been successfully using Capybara in Rails for many months. But about a month ago, my system tests started sporadically failing in my GitHub Actions CI with "Net::ReadTimeout with #<TCPSocket:(closed)>". If I re-run the test suite a few times, I can eventually get it to pass. I've tried many different workarounds, but none of them resolve the issue. I've also tried rolling back all changes in my repo to months ago, when tests were consistently passing, and that doesn't fix it either.

We've spent many hours investigating the cause and we currently think there is a race condition somewhere between chromedriver and selenium. My project is an open source project so here is a direct link to one of the failed CI runs where you can see the full stack trace: https://github.com/AllYourBot/hostedgpt/actions/runs/10533347868/job/29189182499?pr=498

The Net::ReadTimeout is coming from Capybara (via selenium-webdriver) failing to reach chromedriver when attempting to set up the server. One of my engineers has outlined his read of that stack trace:

  • I think the tests run (and fail) before puma is started by capybara
  • The test hung because the server was still running and ruby wouldn't exit
  • It says the TCP socket was closed -- does this mean the socket was open when it started but closed during the exchange? Or that it was never open? I suspect the former because the stack trace is in the middle of a read loop.
  • The failure is in the area of code which causes chromedriver to build a new session (ie, start chrome up):

Also, another thing that suggests a race condition is that when we SSH into the job mid-run, it sometimes fails or hangs for a bit. But if I interrupt the process (^c) and then re-run it, it goes fine.
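As a stopgap while the root cause is unknown, the "re-run until it passes" behaviour described above can be narrowed to just the step that times out. The following is a minimal sketch, not a fix: `with_read_timeout_retry` is a hypothetical helper name, and where to wrap it (for example around the first page visit that creates the browser session) is an assumption. It only masks the symptom.

```ruby
require "net/protocol" # defines Net::ReadTimeout

# Hypothetical helper: re-runs the block when the connection to the driver
# times out, and re-raises once `attempts` tries have been used up.
def with_read_timeout_retry(attempts: 3, wait: 0)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue Net::ReadTimeout
    raise if tries >= attempts
    sleep(wait) # optional pause before trying again
    retry
  end
end

# Simulated flaky step: times out twice, then succeeds on the third try.
calls = 0
result = with_read_timeout_retry(attempts: 3) do
  calls += 1
  raise Net::ReadTimeout if calls < 3
  :session_started
end
```

If the block still times out on the last attempt, the original Net::ReadTimeout propagates, so a persistent failure is not hidden.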

Capybara Version: 3.39.2
Driver Information (and browser if relevant): selenium-webdriver (4.23.0) using headless chrome

How can we reproduce the issue?

1. On github you can [fork this repo](https://github.com/AllYourBot/hostedgpt)
2. I've configured the GitHub Actions CI to **not** run system tests on forks, so (a) [delete this line](https://github.com/AllYourBot/hostedgpt/blob/main/.github/workflows/rubyonrails.yml#L49) to remove the short circuit, and (b) change the very next "runs-on" line back to `ubuntu-latest`, which is the default GitHub Actions runner.
3. Push a change to the repo to trigger Github CI to run

Relevant log output

You can see the full stack trace: https://github.com/AllYourBot/hostedgpt/actions/runs/10533347868/job/29189182499?pr=498

Operating System

Alpine Linux

Selenium version

4.23.0 of selenium-webdriver gem

What are the browser(s) and version(s) where you see this issue?

Chrome

What are the browser driver(s) and version(s) where you see this issue?

ChromeDriver, but I'm not sure how to get the version. The latest, I think.
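One way to find the driver version is to run the binary with `--version` and pull the dotted version number out of the output. The sketch below assumes the binary is on the PATH under the name `chromedriver` and that the output contains a four-part version number; the helper names are hypothetical.

```ruby
require "open3"

# Pulls a four-part version such as "126.0.6478.61" out of a version banner.
# Returns nil when no such number is present.
def extract_version(output)
  output[/\d+\.\d+\.\d+\.\d+/]
end

# Runs `<binary> --version` and returns the parsed version,
# or nil if the binary is missing or exits with an error.
def binary_version(binary)
  stdout, status = Open3.capture2(binary, "--version")
  status.success? ? extract_version(stdout) : nil
rescue Errno::ENOENT
  nil
end

puts "chromedriver: #{binary_version('chromedriver').inspect}"
```

The same helper can be pointed at the browser binary (for example `google-chrome`, if that is its name on the system) to check that browser and driver versions match.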

Are you using Selenium Grid?

No


@krschacht, thank you for creating this issue. We will troubleshoot it as soon as we can.



@AnrichVS

AnrichVS commented Sep 3, 2024

Hi,

I recently started experiencing exactly what @krschacht describes.

I believe it might be related to the Chrome version. On my host OS (Arch) the issue occurs, and I'm running:

  • google-chrome 128.0.6613.84-1

Within a Docker container, with exactly the same code base (mounted from host OS), the issue doesn't occur. It is running:

  • chromium-112.0.5615.165-r0

I suspect this is related to the Chrome version since it only started happening on my host OS recently after having done a full system upgrade (which also upgraded Chrome).

@sickdyd

sickdyd commented Sep 5, 2024

We are facing the same problem. Our specs had been working fine for years, and about a month ago they started failing with Net::ReadTimeout errors. I tried everything I could think of and searched everywhere online, but nothing seems to fix the problem.

@sickdyd

sickdyd commented Sep 10, 2024

I can confirm that the most recent versions of Chrome seem to be the root cause.

I could solve the problem by using version 126.0.6478.61 for both Chrome and chromedriver.

Not a permanent solution, but for the time being it is better than having specs constantly failing.

Note that the Chrome installer download link requires appending -1 to the version.

# CHROME_DRIVER_VERSION=126.0.6478.61

- name: Install Chrome
  run: |
    # Download specific Chrome version
    wget https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable/google-chrome-stable_${CHROME_DRIVER_VERSION}-1_amd64.deb
    # Install Chrome
    sudo apt-get install -y --allow-downgrades ./google-chrome-stable_${CHROME_DRIVER_VERSION}-1_amd64.deb

- name: Install ChromeDriver
  run: |
    wget "https://storage.googleapis.com/chrome-for-testing-public/${CHROME_DRIVER_VERSION}/linux64/chromedriver-linux64.zip"
    unzip chromedriver-linux64.zip
    sudo mv chromedriver-linux64/chromedriver /usr/local/bin/
    rm chromedriver-linux64.zip
    rm -rf chromedriver-linux64
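For anyone scripting the same pin from Ruby (for example in a Rake task), the two download URLs above can be derived from a single version string. This is a small sketch based only on the URL patterns in the workflow snippet; the method and constant names are hypothetical, and it assumes the `amd64` / `linux64` builds used there.

```ruby
CHROME_DEB_BASE = "https://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-stable"
CHROMEDRIVER_BASE = "https://storage.googleapis.com/chrome-for-testing-public"

# The Chrome .deb file name carries a "-1" suffix after the version.
def chrome_deb_url(version)
  "#{CHROME_DEB_BASE}/google-chrome-stable_#{version}-1_amd64.deb"
end

# The chromedriver archive uses the bare version as a path segment.
def chromedriver_zip_url(version)
  "#{CHROMEDRIVER_BASE}/#{version}/linux64/chromedriver-linux64.zip"
end

version = "126.0.6478.61"
puts chrome_deb_url(version)
puts chromedriver_zip_url(version)
```

Keeping both URLs tied to one version value avoids accidentally pairing a browser with a mismatched driver.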

@tvdeyen

tvdeyen commented Sep 10, 2024

We are experiencing the same issues. We pinned the Chrome version to 127 with this setup:

 Capybara.register_driver :selenium_chrome_headless do |app|
   options = ::Selenium::WebDriver::Chrome::Options.new.tap do |opts|
     opts.add_argument("--headless")
     opts.add_argument("--disable-gpu") if Gem.win_platform?
     # Workaround https://bugs.chromium.org/p/chromedriver/issues/detail?id=2650&q=load&sort=-id&colspec=ID%20Status%20Pri%20Owner%20Summary
     opts.add_argument("--disable-site-isolation-trials")
     opts.add_argument("--window-size=1920,1080")
     opts.add_argument("--disable-search-engine-choice-screen")
+    opts.browser_version = "127"
   end
 
   Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
 end

and all tests run fine.

But it fails with Chrome 128:

 Capybara.register_driver :selenium_chrome_headless do |app|
   options = ::Selenium::WebDriver::Chrome::Options.new.tap do |opts|
     opts.add_argument("--headless")
     opts.add_argument("--disable-gpu") if Gem.win_platform?
     # Workaround https://bugs.chromium.org/p/chromedriver/issues/detail?id=2650&q=load&sort=-id&colspec=ID%20Status%20Pri%20Owner%20Summary
     opts.add_argument("--disable-site-isolation-trials")
     opts.add_argument("--window-size=1920,1080")
     opts.add_argument("--disable-search-engine-choice-screen")
-    opts.browser_version = "127"
+    opts.browser_version = "128"
   end
 
   Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
 end
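To compare versions like this without editing the driver registration each time, the pinned version could be read from the environment. A minimal sketch, assuming a hypothetical `CHROME_VERSION` variable and assuming only numeric versions are wanted (the helper name is also made up for illustration):

```ruby
# Returns the browser version to pin, falling back to "127" (the version
# reported as working above) when CHROME_VERSION is not set.
def pinned_browser_version(env, default = "127")
  version = env.fetch("CHROME_VERSION", default)
  # Accept "127" as well as full versions such as "126.0.6478.61".
  unless version.match?(/\A\d+(\.\d+){0,3}\z/)
    raise ArgumentError, "unexpected Chrome version: #{version.inspect}"
  end
  version
end

puts pinned_browser_version({ "CHROME_VERSION" => "128" })
puts pinned_browser_version({})
```

Inside the `register_driver` block, the result would be assigned with `opts.browser_version = pinned_browser_version(ENV)`, so a CI matrix could run the same suite against 127 and 128.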

@glaszig
Contributor

glaszig commented Sep 13, 2024

Experiencing the same for the past 1 or 2 months, but I'm using Firefox.

@ehutzelman

Been seeing issues in system tests getting locked up since Chrome 128. Just updated to Chrome 129 and unfortunately still see the same issues. Looks like turning off headless allows the tests to run as expected, but not a great fix.
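For anyone who wants to flip between headless and headed runs while investigating, the argument list can be built from an environment flag. This is only a sketch: the `HEADLESS` variable and the helper names are hypothetical, and the flags are the ones from the driver setup posted earlier in this thread.

```ruby
# Flags shared by headless and headed runs (taken from the setup above).
BASE_CHROME_ARGS = [
  "--disable-site-isolation-trials",
  "--window-size=1920,1080",
  "--disable-search-engine-choice-screen"
].freeze

# Returns the Chrome arguments, adding --headless only when requested.
def chrome_arguments(headless: true)
  headless ? ["--headless", *BASE_CHROME_ARGS] : BASE_CHROME_ARGS.dup
end

# Headless stays the default; only HEADLESS=false turns it off.
def headless_requested?(env)
  env.fetch("HEADLESS", "true") != "false"
end

puts chrome_arguments(headless: headless_requested?(ENV)).inspect
```

In a `register_driver` block, each returned string would be passed to `opts.add_argument`. Running with `HEADLESS=false` then reproduces the headed configuration described above without code changes.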
