GVT-g crashes on Broadwell: *ERROR* [CRTC:45:pipe A] flip_done timed out #222

Open
darkbasic opened this issue Nov 30, 2022 · 2 comments

@darkbasic

I'm using the HD Graphics 5500 (BDW GT2) of my Core i7-5600U and I'm trying to use GVT-g with KVM/libvirt.
Distro is Arch Linux amd64 and the kernel is the latest 6.0.10.

I saw that DMAR has been disabled in recent kernels for Broadwell, but AFAIK DMAR shouldn't be required for GVT-g.
The Arch Linux wiki tells you to add intel_iommu=on to your kernel parameters. In my testing nothing really changes with intel_iommu=off, so I guess it isn't really necessary for GVT-g either.
I also couldn't get SPICE with egl-headless working due to this bug, so I've reverted to SPICE over a Unix socket.
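
For anyone comparing setups, here is a minimal shell sketch for checking which of these parameters are actually on the running kernel's command line. The `has_param` helper is a hypothetical name. `i915.enable_gvt=1` is not mentioned above: it is included on the assumption that GVT-g was enabled that way, and if it was set through modprobe options instead it will not show up here.

```shell
#!/bin/sh
# has_param PARAM [CMDLINE_FILE]
# Succeeds if PARAM appears as a whole token on the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline (hypothetical helper, not a standard tool).
has_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Report the parameters discussed in this issue (only where /proc/cmdline exists).
if [ -r /proc/cmdline ]; then
    for p in i915.enable_gvt=1 intel_iommu=on intel_iommu=off; do
        if has_param "$p"; then
            echo "$p: set"
        else
            echo "$p: not set"
        fi
    done
fi
```

Matching whole tokens avoids treating `intel_iommu=on` as present when the command line only contains something like `intel_iommu=on,igfx_off`.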

Here is the host dmesg:
dmesg_host.txt

I use the smallest possible virtual GPU type for the guest:

# cat /sys/devices/pci0000\:00/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V4_8/description
low_gm_size: 64MB
high_gm_size: 384MB
fence: 4
resolution: 1024x768
weight: 2

# echo "cdf78c0b-d494-452c-bdd3-cb85796c5539" > /sys/devices/pci0000\:00/0000\:00\:02.0/mdev_supported_types/i915-GVTg_V4_8/create
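
As a rough sketch, the available vGPU types and their remaining instances can be listed from the same sysfs tree. The `list_mdev_types` helper is a hypothetical name; `available_instances` and `/sys/bus/mdev/devices` are the standard mdev sysfs entries, and the PCI path assumes the iGPU sits at 0000:00:02.0 as above.

```shell
#!/bin/sh
# list_mdev_types GPU_SYSFS_DIR
# Prints every mdev type under the given device together with the number of
# instances that can still be created (hypothetical helper).
list_mdev_types() {
    for t in "$1"/mdev_supported_types/*/; do
        # Skip the unexpanded glob when the device has no mdev types.
        [ -r "${t}available_instances" ] || continue
        printf '%s available_instances=%s\n' \
            "$(basename "$t")" "$(cat "${t}available_instances")"
    done
}

GPU=/sys/devices/pci0000:00/0000:00:02.0
list_mdev_types "$GPU"

# UUIDs of mdev devices that already exist, if any.
ls /sys/bus/mdev/devices/ 2>/dev/null || true

# To remove one again (as root):
#   echo 1 > /sys/bus/mdev/devices/<uuid>/remove
```

If a type reports `available_instances=0` after the `create` step, the vGPU was created and no further instance of that size fits.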

These are the relevant parts of the guest xml:

    <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='on'>
      <source>
        <address uuid='cdf78c0b-d494-452c-bdd3-cb85796c5539'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <graphics type='spice'>
      <listen type='none'/>
      <gl enable='yes'/>
    </graphics>
    <video>
      <model type='none'/>
    </video>

NOTE: since GVT-g was failing most of the time, I often ended up setting display='off', disabling gl, and setting the video model type='qxl' to be able to see something.

  1. Windows 11 doesn't crash, but it cannot install the Intel GPU drivers either: the installation fails and, in the usual Windows way, there is no way to debug what's happening. Oddly, it doesn't fall back to the VGA driver, so I had to use QXL to see anything.

  2. Next I've tried Fedora 37 and it doesn't even manage to boot:
    error3_fedora

i915 0000:07:00.0: [drm] iGVT-g is active, disabling use of stolen memory
i915 0000:07:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
i915 0000:07:00.0: [drm] Failed to find VBIOS tables (VBT)
  3. With Ubuntu 22.04 LTS I've finally managed to display a picture on the screen. It apparently works well, but behind the scenes the guest kernel logs a warning with a stack trace, followed by flip_done timeouts:
[    3.649471] ------------[ cut here ]------------
[    3.649474] i915 0000:07:00.0: vblank wait timed out on crtc 0
[    3.649540] WARNING: CPU: 2 PID: 187 at drivers/gpu/drm/drm_vblank.c:1269 drm_wait_one_vblank+0x1e4/0x200 [drm]
[    3.649576] Modules linked in: hid_generic usbhid hid i915(+) i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_pclmul crc32_pclmul fb_sys_fops ghash_clmulni_intel cec virtio_net rc_core xhci_pci net_failover aesni_intel crypto_simd i2c_i801 drm ahci cryptd psmouse virtio_blk i2c_smbus libahci lpc_ich video xhci_pci_renesas failover
[    3.649603] CPU: 2 PID: 187 Comm: systemd-udevd Not tainted 5.15.0-43-generic #46-Ubuntu
[    3.649609] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[    3.649612] RIP: 0010:drm_wait_one_vblank+0x1e4/0x200 [drm]
[    3.649639] Code: ff ff 49 8b 7c 24 08 4c 8b 77 50 4d 85 f6 74 26 e8 f1 f3 31 d2 44 89 e9 4c 89 f2 48 c7 c7 10 5c 34 c0 48 89 c6 e8 e3 e7 78 d2 <0f> 0b e9 85 fe ff ff 4c 8b 27 eb 93 4c 8b 37 eb d5 e8 a6 43 83 d2
[    3.649640] RSP: 0018:ffffa357c01c75f8 EFLAGS: 00010282
[    3.649645] RAX: 0000000000000000 RBX: 0000000000000010 RCX: ffffffff93d7a468
[    3.649648] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000003
[    3.649649] RBP: ffffa357c01c7650 R08: 0000000000000003 R09: fffffffffffcdb58
[    3.649650] R10: 0000000000ffff0a R11: 0000000000000001 R12: ffff8f9a5b780000
[    3.649651] R13: 0000000000000000 R14: ffff8f9a40d3dc40 R15: ffff8f9a59b70030
[    3.649653] FS:  00007fba834f38c0(0000) GS:ffff8f9ba2100000(0000) knlGS:0000000000000000
[    3.649654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.649656] CR2: 00007ff5e7234a50 CR3: 00000001008bc006 CR4: 0000000000370ee0
[    3.649659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.649660] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    3.649661] Call Trace:
[    3.649666]  <TASK>
[    3.649671]  ? wait_woken+0x70/0x70
[    3.649679]  drm_crtc_wait_one_vblank+0x17/0x20 [drm]
[    3.649702]  hsw_disable_ips+0xa8/0x180 [i915]
[    3.649833]  intel_pre_plane_update+0x230/0x650 [i915]
[    3.649943]  ? _raw_spin_unlock_irqrestore+0xe/0x30
[    3.649957]  ? try_to_wake_up+0x1fc/0x5a0
[    3.649964]  ? ilk_validate_pipe_wm+0x7f/0xd0 [i915]
[    3.650050]  intel_update_crtc+0xa4/0x440 [i915]
[    3.650161]  ? intel_atomic_commit_fence_wait+0xbe/0xe0 [i915]
[    3.650271]  intel_commit_modeset_enables+0x74/0x90 [i915]
[    3.650382]  intel_atomic_commit_tail+0x405/0xb70 [i915]
[    3.650492]  ? intel_atomic_commit_ready+0x50/0x54 [i915]
[    3.650604]  ? __i915_sw_fence_complete+0x114/0x1c0 [i915]
[    3.650692]  intel_atomic_commit+0x390/0x410 [i915]
[    3.650813]  drm_atomic_commit+0x4a/0x50 [drm]
[    3.650838]  intel_initial_commit+0x17b/0x200 [i915]
[    3.650950]  intel_modeset_init+0x23/0x80 [i915]
[    3.651059]  i915_driver_probe+0x1dd/0x470 [i915]
[    3.651140]  ? mutex_lock+0x13/0x40
[    3.651146]  i915_pci_probe+0x58/0x140 [i915]
[    3.651215]  ? _raw_spin_unlock_irqrestore+0xe/0x30
[    3.651218]  local_pci_probe+0x4b/0x90
[    3.651227]  pci_device_probe+0x115/0x1f0
[    3.651229]  really_probe+0x21e/0x420
[    3.651234]  __driver_probe_device+0x115/0x190
[    3.651236]  driver_probe_device+0x23/0xc0
[    3.651238]  __driver_attach+0xbd/0x1d0
[    3.651241]  ? __device_attach_driver+0x110/0x110
[    3.651243]  bus_for_each_dev+0x7e/0xc0
[    3.651245]  driver_attach+0x1e/0x20
[    3.651247]  bus_add_driver+0x135/0x200
[    3.651250]  driver_register+0x95/0xf0
[    3.651257]  __pci_register_driver+0x68/0x70
[    3.651260]  i915_register_pci_driver+0x23/0x30 [i915]
[    3.651331]  i915_init+0x3e/0xfc [i915]
[    3.651397]  ? 0xffffffffc04f9000
[    3.651401]  do_one_initcall+0x48/0x1d0
[    3.651408]  ? kmem_cache_alloc_trace+0x19e/0x2e0
[    3.651414]  do_init_module+0x52/0x250
[    3.651418]  load_module+0xac9/0xbb0
[    3.651420]  __do_sys_finit_module+0xbf/0x120
[    3.651422]  __x64_sys_finit_module+0x18/0x20
[    3.651424]  do_syscall_64+0x5c/0xc0
[    3.651427]  ? syscall_exit_to_user_mode+0x27/0x50
[    3.651429]  ? __x64_sys_mmap+0x33/0x40
[    3.651433]  ? do_syscall_64+0x69/0xc0
[    3.651434]  ? exit_to_user_mode_prepare+0x37/0xb0
[    3.651438]  ? syscall_exit_to_user_mode+0x27/0x50
[    3.651441]  ? __x64_sys_newfstatat+0x1c/0x20
[    3.651445]  ? do_syscall_64+0x69/0xc0
[    3.651446]  ? do_syscall_64+0x69/0xc0
[    3.651448]  ? exit_to_user_mode_prepare+0x37/0xb0
[    3.651450]  ? syscall_exit_to_user_mode+0x27/0x50
[    3.651452]  ? __x64_sys_newfstatat+0x1c/0x20
[    3.651454]  ? do_syscall_64+0x69/0xc0
[    3.651456]  ? do_syscall_64+0x69/0xc0
[    3.651457]  ? do_syscall_64+0x69/0xc0
[    3.651458]  ? sysvec_call_function_single+0x4e/0x90
[    3.651461]  ? asm_sysvec_call_function_single+0xa/0x20
[    3.651463]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[    3.651466] RIP: 0033:0x7fba83beba3d
[    3.651479] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[    3.651480] RSP: 002b:00007ffcd888df78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    3.651482] RAX: ffffffffffffffda RBX: 0000559fee9af170 RCX: 00007fba83beba3d
[    3.651483] RDX: 0000000000000000 RSI: 00007fba83d82441 RDI: 0000000000000016
[    3.651486] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
[    3.651487] R10: 0000000000000016 R11: 0000000000000246 R12: 00007fba83d82441
[    3.651488] R13: 0000559fee9a1810 R14: 0000559fee98bee0 R15: 0000559fee9b1b60
[    3.651489]  </TASK>
[    3.651490] ---[ end trace c0aabf2716774df1 ]---
[   13.709511] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:45:pipe A] flip_done timed out
[   13.709560] fbcon: Taking over console
[   13.711482] [drm] Initialized i915 1.6.0 20201103 for 0000:07:00.0 on minor 0
[   13.829803] fbcon: i915drmfb (fb0) is primary device
[   23.949457] [drm:drm_crtc_commit_wait [drm]] *ERROR* flip_done timed out
[   23.949492] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:45:pipe A] commit wait timed out
[   23.967721] Console: switching to colour frame buffer device 128x48
[   23.984617] i915 0000:07:00.0: [drm] fb0: i915drmfb frame buffer device

Full guest dmesg:
dmesg_guest.txt
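
To pull just the display timeouts out of a long guest log, a small filter like the following may help. The function name is hypothetical, and the patterns are simply the three messages seen in the log above, so other i915 failures will not be caught.

```shell
#!/bin/sh
# gvt_display_errors
# Reads kernel log text on stdin and prints only the display timeout lines
# (hypothetical helper; patterns taken from the guest log in this issue).
gvt_display_errors() {
    grep -E 'flip_done timed out|vblank wait timed out|commit wait timed out'
}

# Usage inside the guest (reading the kernel log may require root):
dmesg 2>/dev/null | gvt_display_errors || echo "no display timeouts found"
```

It also works on a saved log, for example `gvt_display_errors < dmesg_guest.txt`.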

@darkbasic
Author

I've also tried intel/gvt-linux#gvt-staging from git and it still doesn't work.
I've also noticed the following errors on the host as soon as the guest boots:

dic 01 16:41:28 arch-laptop libvirtd[660]: g_hash_table_unref: assertion 'hash_table != NULL' failed
dic 01 16:41:59 arch-laptop kwin_wayland[1563]: kwin_core: Failed to open /dev/dri/card0 device (Device already taken)
dic 01 16:41:59 arch-laptop kwin_wayland[1563]: kwin_wayland_drm: failed to open drm device at "/dev/dri/card0"

@mrjutterson

mrjutterson commented Mar 20, 2023

I have a similar problem with an Intel HD 6000 GPU.

[drm] *ERROR* flip_done timed out
[drm] *ERROR* [CRTC:45:pipe A] commit wait timed out

The guest operating system boots, but it hangs for a minute or two. After that, everything seems to work. GPU driver development is above my skill level and I do not fully understand the debug logs and terminology. I would like to understand what the bug is about and find a workaround, but after a few days of troubleshooting I am at a dead end.
