Kernel panic induced by wireless network usage? #77

Closed
tslilc opened this issue Jan 19, 2019 · 27 comments

Comments

@tslilc

tslilc commented Jan 19, 2019

I compiled the master branch in a clean Debian VM and ran InstallToInternal.sh, and i believe that wireless usage is inducing kernel panics on a fairly clean install (no DEs, manual wpa association and dhclient). Sometimes it happens during association, other times during usage. I have attached some pictures of the kernel panics.

There don't seem to be any logs that i can find of these events. If there's any way i can provide more information please let me know!

img-20190118-202751
kp2
kp3

@SolidHal
Owner

That's no good. This must be due to a regression in the kernel between 4.17.2 and 4.17.19.

Looks like interrupt request handling is what ends up panicking.

For now, you can switch your checkout to commit 6333149 to use 4.17.2 instead of 4.17.19.

@SolidHal
Owner

Going through the changes for sanity's sake:
No device tree changes and no changes to the open wifi firmware; the dma and ath kernel drivers are mostly unchanged.

Some dma handling changes in dwc2, so I'll test reverting the dwc2 tree.
The cros_ec spi/i2c drivers are unchanged.

There are some changes in the i2c driver, and given the panics I'll test reverting that tree as well.

@SolidHal
Owner

This commit in the touchpad driver could also be at fault
f1f3d22d65f1e657826f5515b6b6b38728082d9a

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/input/mouse/?h=v4.17.19&id=f1f3d22d65f1e657826f5515b6b6b38728082d9a

@SolidHal
Owner

@tslilc

do you happen to see errors similar to these in the kernel logs before the panic, or even just randomly?

dwc2 f72c0000.usb: dwc2_hc_chhltd_intr_dma: Channel 0 - ChHltd set, but reason is unknown
dwc2 f72c0000.usb: hcint 0x00000002, intsts 0x06200029

dwc2 f72c0000.usb: dwc2_hc_chhltd_intr_dma: Channel 11 - ChHltd set, but reason is unknown
dwc2 f72c0000.usb: hcint 0x00000002, intsts 0x04200029
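
For anyone who wants to check a saved dmesg capture for these messages, here is a minimal Python sketch. It is not part of PrawnOS: the function name find_chhltd_events and both regular expressions are assumptions modeled only on the two excerpts above, so they may need adjusting for other kernel versions.

```python
import re

# Patterns modeled on the dwc2 messages quoted in this thread (assumption:
# other kernels may word these lines differently).
CHHLTD_RE = re.compile(
    r"dwc2 (\S+): dwc2_hc_chhltd_intr_dma: Channel (\d+) - ChHltd set"
)
REGS_RE = re.compile(
    r"dwc2 \S+: hcint (0x[0-9a-fA-F]+), intsts (0x[0-9a-fA-F]+)"
)


def find_chhltd_events(log_text):
    """Return one dict per 'ChHltd set' message found in log_text.

    The register dump on the following line, if present, is attached to
    the event it belongs to.
    """
    events = []
    pending = None  # last ChHltd event still waiting for its register line
    for line in log_text.splitlines():
        match = CHHLTD_RE.search(line)
        if match:
            pending = {
                "device": match.group(1),
                "channel": int(match.group(2)),
                "hcint": None,
                "intsts": None,
            }
            events.append(pending)
            continue
        regs = REGS_RE.search(line)
        if regs and pending is not None:
            pending["hcint"] = int(regs.group(1), 16)
            pending["intsts"] = int(regs.group(2), 16)
            pending = None
    return events
```

Feeding it the text of a dmesg capture returns a list that can be counted or grouped by channel, which makes it easier to compare runs on different USB ports.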

To make logging and debugging hangs easier, I use the debug USB UART serial console, as described here: https://github.com/SolidHal/PrawnOS/wiki/Using-the-debug-usb-uart-serial-on-the-Asus-C201

@tslilc
Author

tslilc commented Jan 23, 2019

@SolidHal

Thanks for your continued and amazing work on this project. Already i have a 90% functional libre laptop and couldn't be happier.

Yes, i can confirm that i see these types of errors both randomly and leading up to the panic. The numbers are a little different, but i don't think that's a relevant difference. It seems to happen more often when the USB wifi is plugged into the port closest to the screen.

Thanks for the advice, i'll try to test whether this happens with 4.17.2, thanks!

@SolidHal
Owner

@tslilc Thank you for the kind words! I hope to make it a 100% functional libre laptop!

Alright, then I am seeing the same issue in my 4.19.15 tests.

It seems to happen more often when the USB wifi is plugged into the port closest to the screen

I noticed the same, probably a dwc2 oddity.

@SolidHal
Owner

Here are my steps for reproducing the issue reliably:

  1. Connect to a wireless network.
  2. If that doesn't trigger the issue, download a large file, such as a 4GB Debian DVD image.

Using kernel 4.19.15 with the dwc2 and ath trees from 4.17.2, I can reliably download the Debian DVD image multiple times. The ath tree alone didn't cut it, and I'm not convinced it is needed, so I'll test without it.

One caveat: with the 4.17.2 trees and USB_DWC2_DEBUG set in the kernel config, tons of these messages are thrown

[ 59.285317] dwc2 ff540000.usb: --Host Channel 7 Interrupt: Transaction Error--
[  162.662343] dwc2 ff540000.usb: --Host Channel 8 Interrupt: Transaction Error--
[  163.035968] dwc2 ff540000.usb: --Host Channel 15 Interrupt: Transaction Error--
[  173.322006] dwc2 ff540000.usb: --Host Channel 5 Interrupt: Transaction Error--
[  174.491225] dwc2 ff540000.usb: --Host Channel 14 Interrupt: Transaction Error--
[  180.492403] dwc2 ff540000.usb: --Host Channel 4 Interrupt: Transaction Error--

and seem to replace the dwc2_hc_chhltd_intr_dma errors seen with later versions of the dwc2 tree.

This points to changes in dwc2 between 4.17.2 and 4.17.19 that turn these transaction errors into a more noticeable issue. I'm unsure whether the transaction errors result in data corruption, so I'm testing this.

One note on USB transaction errors: they are allowable by the spec and should not result in data corruption, so if ath and dwc2 are written correctly, transaction errors aren't a huge concern.
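
To compare how noisy different kernel builds are, the Transaction Error lines quoted above can be tallied per host channel. The following is a rough Python sketch under stated assumptions: the helper name tally_transaction_errors is made up for illustration, and the pattern is based only on the six lines shown in this comment.

```python
import re
from collections import Counter

# Pattern modeled on the USB_DWC2_DEBUG output quoted in this comment
# (assumption: the message format is the same on other builds).
TXERR_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+dwc2 \S+: "
    r"--Host Channel (\d+) Interrupt: Transaction Error--"
)


def tally_transaction_errors(log_text):
    """Count transaction errors per host channel.

    Returns (per_channel, span) where per_channel is a Counter keyed by
    channel number and span is the number of seconds between the first
    and last matching timestamp (0.0 if fewer than two lines match).
    """
    per_channel = Counter()
    timestamps = []
    for line in log_text.splitlines():
        match = TXERR_RE.search(line)
        if match:
            timestamps.append(float(match.group(1)))
            per_channel[int(match.group(2))] += 1
    span = timestamps[-1] - timestamps[0] if len(timestamps) > 1 else 0.0
    return per_channel, span
```

Dividing the total count by the span gives a crude errors-per-second figure that can be compared between, for example, the wext and nl80211 runs.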

@SolidHal
Owner

One important thing: The driver used with wpa_supplicant.
If I use wext it hangs even with the 4.17.2 dwc2 tree. With nl80211 I get far fewer transaction errors and it doesn't seem to panic.
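
For reference, the driver backend is chosen with wpa_supplicant's -D option. Below is a small Python sketch that builds the command line for either backend so both can be tested the same way. The helper name, the interface name wlan0, and the config path are assumptions for illustration; only the -D, -i, and -c options come from wpa_supplicant itself.

```python
# Driver backends discussed in this thread.
KNOWN_DRIVERS = ("nl80211", "wext")


def wpa_supplicant_argv(
    driver,
    interface="wlan0",  # assumption: adjust to the actual interface name
    config="/etc/wpa_supplicant/wpa_supplicant.conf",  # assumption
):
    """Build an argv list that starts wpa_supplicant with an explicit driver."""
    if driver not in KNOWN_DRIVERS:
        raise ValueError(f"unexpected driver backend: {driver!r}")
    return ["wpa_supplicant", "-D", driver, "-i", interface, "-c", config]
```

The resulting list can be passed to subprocess.run (as root) so the driver choice is always explicit when reproducing the hang.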

@Anthony-Sensors
Contributor

I haven't managed to reproduce this issue using 4.17.19.

@SolidHal
Owner

SolidHal commented Feb 19, 2019

@Anthony-Sensors Huh, are you using the repo as-is or do you have some modifications?
EDIT: Also, are you using the same ath9k download in /build/ that you were using previously to build 4.17.2?

@Anthony-Sensors
Contributor

Anthony-Sensors commented Feb 19, 2019

I'm using your alpha version 6 release, with the wireless adapter on the USB port closest to me. I haven't experienced this issue yet.

@SolidHal
Owner

Looks like the issue I was actually experiencing was #83. Now that I have that figured out, I can try to figure out why this issue happens.

Sucks when the debug tools have issues.

@SolidHal
Owner

SolidHal commented Mar 7, 2019

@tslilc Did you happen to specify a driver with -D when running wpa_supplicant?
I'm finding some correlation between the nl80211 driver and this crash.

@tslilc
Author

tslilc commented Mar 11, 2019 via email

@tslilc
Author

tslilc commented Mar 16, 2019

@SolidHal, i should say that some time ago i installed a (3.3V! no need for any extra wiring) AR9271 usb WiFi adapter to the webcam connector and i haven't had any issues at all -- even with 4.17.19 from your development branch -- on both wext (tested a little) and nl80211 (tested far more). Could this be something about the external USB ports?

@SolidHal
Owner

@tslilc Yeah, that's part of the reason I didn't notice that this issue has existed since the 4.17.2 releases. The bug is due to how the dwc2 drivers, which handle the USB ports, and the ath9k devices interact.
I've been debugging it when I have the free time, but it's slow going.

@tslilc
Author

tslilc commented Mar 18, 2019

@SolidHal i see. Well i'm certainly grateful for your continued efforts!

Unfortunately, for the time being, i think this sort of hardware hacking and debugging is somewhat above my head.

@SolidHal
Owner

I think I've completed this chase. Moving ipv6 back into the kernel instead of building it as a module seems to fix this. The other issues I was experiencing seem to come from a bug triggered by enabling the dwc2 periodic debug and SOF debugging options, which is annoying.
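
To confirm whether a given kernel build has ipv6 built in or as a module, the kernel config can be inspected for CONFIG_IPV6. Here is a minimal Python sketch; the function name config_state is hypothetical, and where the config text comes from (for example the .config in the kernel build tree, or /proc/config.gz if the kernel exposes it) is an assumption that depends on the build.

```python
def config_state(config_text, symbol):
    """Return the value of a kernel config symbol.

    'y' means built in, 'm' means built as a module, and 'n' means the
    symbol is explicitly unset or does not appear in the text at all.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == f"# {symbol} is not set":
            return "n"
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return "n"
```

With this fix applied, config_state(text, "CONFIG_IPV6") would be expected to return 'y' rather than 'm'. The same check works for CONFIG_USB_DWC2_DEBUG when making sure the dwc2 debug output is off in a release image.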

With the image I'm about to push as a release, I was able to download files continuously overnight using the chromium browser from Debian unstable (apt install -t unstable chromium).
I chose to use this over firefox-esr as all of the available firefox-esr builds are still buggy in weird places that I don't want to dig into right now.

I also set all of the sleep and display turn off sliders to never in the settings.

@tslilc
Author

tslilc commented May 27, 2019

@SolidHal thanks for your hard work on this. Unfortunately i’m travelling right now (with the c201) and so don’t have access to an external USB WiFi device to test. I’ll be sure to try it when i’m back though. Thanks again!

@ergofroggy

I believe I've hit the same problem as @tslilc. I was trying your Alpha 9 release with XFCE: clean install, resize, reboot, then try to associate to wifi on first login. The system completely freezes and needs a hard shutdown. On reboot everything seems to work; I can open apps and mount hard drives. I'll try building an image based on 6333149, as you suggested.

@SolidHal
Owner

@robinde, could you share what brand/model of ath9271 dongle you have?

@SolidHal
Owner

I haven't been able to recreate these crashes on version 9 unfortunately, probably just getting lucky. I did come across two ARM CPU errata fixes that the Chrome OS team tried to get mainlined, which fix hangs on the rk3288: https://patchwork.kernel.org/patch/10909833/ and https://patchwork.kernel.org/patch/10909835/.

Maybe the wireless device is causing the specific cpu states that they refer to?
I've pulled them in and moved up to 4.19.53 in the latest release. I also disabled most power management to see if that is causing it.

I tested with what will be alpha version 10 for 7 hours and haven't had a crash yet (although that's not any different from my experience with the previous version).

When version 10 finishes uploading, could you guys test it out when you have a chance? @tslilc @robinde @ifbizo

My test process for anyone that is interested is:

  1. Boot up the laptop
  2. log in to the xfce install
  3. Plug in the ar9271
  4. Use the xfce network manager to connect to a network
  5. open firefox, stream video on autoplay
  6. in another tab, download all three parts of a debian 9 amd64 release
  7. in a terminal, run ping <some ip>

As the Debian downloads finish, delete them and queue them up again.
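
The repeated-download part of this process can also be scripted for machines without a desktop session. This is only a sketch of the idea, not the procedure used above: the function names are hypothetical, the URL is whatever large file is convenient, and the data goes to a temporary file that is discarded after each round.

```python
import tempfile
import urllib.request


def download_once(url, chunk_size=1 << 20):
    """Download url into a throwaway temporary file; return bytes received."""
    total = 0
    with urllib.request.urlopen(url) as response, tempfile.TemporaryFile() as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total


def soak(url, rounds, fetch=download_once):
    """Fetch the same URL `rounds` times in a row and return each size.

    A crash or hang during any round is the signal of interest, so
    exceptions are deliberately not caught here.
    """
    sizes = []
    for _ in range(rounds):
        sizes.append(fetch(url))
    return sizes
```

Running soak with a large image URL while ping runs in another terminal approximates steps 6 and 7; if the returned sizes differ between rounds, that hints at truncated transfers worth a closer look.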

@tehbra1n

I'm not sure if this is helpful, but during the InstallPackages script, with the ar9271 plugged in, I got this panic:

image

My main issue though continues to be #95 even with Alpha 10, even without the ar9271 plugged in. I'm not convinced there isn't a local hardware issue with my machine.

@SolidHal
Owner

@tehbra1n Yes it is! Thank you. If you finished the install after that, was the wireless working?
That seems to be a different panic than what tslilc was experiencing, so definitely interesting...

If it happens again and the system is usable, can you capture the output of sudo dmesg and upload it here?

@tehbra1n

@tehbra1n Yes it is! Thank you. If you finished the install after that, was the wireless working?
That seems to be a different panic than what tslilc was experiencing, so definitely interesting...

If it happens again, and the system is usable can you capture the output of sudo dmesg and upload it here?

After fixing my trackpad I moved on to alpha 11 with no repeat of that kind of panic.

@ergofroggy

I tested out Alpha 11 this weekend, seems like things are working really well. I was using the TPE-N150USB, which has an AR9271 chipset.
No crashes, good throughput, seems stable.

@SolidHal
Owner

This issue and #102 refer to the same problem. Since this one is a bit older, and many of the logs predate quite a few fixes, I'm going to close this one and keep #102, which contains more recent logs.

Please post any updates to #102.

This issue was closed.