Fixes issues #212 and #77 #215

smsilva98 · 2022-08-09T04:33:38Z

When request_firmware fails to load the golden_hw_state file directly, it prints the message
"Direct firmware load for i915/gvt/vid_0x8086_did_0x5917_rid_0x07.golden_hw_state failed with error -2"
into the logs. As mentioned in issue #77, this is not a real error and should not be logged. firmware_request_nowarn is the proper function to use here (it is a kernel firmware loader API, not a system call). It runs the exact same code as request_firmware, except that it passes one extra flag, FW_OPT_NO_WARN, to the internally called _request_firmware to disable the branch of code that prints the message. I posted and discussed how request_firmware works internally in issue #212.

As a lockmap takes a reference for every ww_mutex used together, this can be an arbitrarily large number and under control of userspace -- easily overflowing the arbitrary limit of 4096. However, the pin_count (used for detecting unexpected lock dropping) is a full 32b despite nesting being extremely rare (see lockdep_pin_lock). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190425092004.9995-33-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

We have recently turned on ftrace-dump-on-oops for i915's CI and an issue we have encountered is that the trace buffer size greatly exceeds the pstore capabilities; we get the tail of the oops but not the introduction. Currently the global buffer size is controllable on the cmdline, but at the request of our CI sysadmin, we would like to add a control to the Kconfig as well. The rationale being the cmdline carries the temporary hacks that we want to eradicate, and we want to track the permanent configuration in .config. I have kept the Kconfig option hidden from the user as the default should suffice for the majority of users; reserving the configuration for those that eschew the cmdline option. v2: Add an expert prompt to stop the default value overriding .config changes. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tomi Sarvela <tomi.p.sarvela@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Most systems keep the last messages from the panic, and we value the stacktrace most, so dump it last in order to preserve it for post-mortems. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Martin Peres <martin.peres@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180903131745.30593-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Under CI testing, it is common for the cpus to overheat with the continuous workloads and end up being throttled. As the cpus still function, it is less of a critical error meriting urgent action, but an expected yet significant condition (pr_note). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Petri Latvala <petri.latvala@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Petri Latvala <petri.latvala@intel.com> [danvet: Rebase] Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

In commit 1fd7e41 ("perf/core: Remove perf_cpu_context::unique_pmu"), the search for another user of the pmu_cpu_context was removed, and so we unconditionally free it during perf_pmu_unregister. This leads to random corruption later and a BUG at mm/percpu.c:689. v2: Check for shared pmu_contexts under the mutex. Fixes: 1fd7e41 ("perf/core: Remove perf_cpu_context::unique_pmu") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: <stable@vger.kernel.org> # v4.11+ Link: http://patchwork.freedesktop.org/patch/msgid/20170512114525.17575-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Work around the following boot time crash: [ 10.456056] CPU: 1 PID: 220 Comm: systemd-udevd Tainted: G W 4.17.0-rc7-CI-CI_DRM_4040+ intel#182 [ 10.465828] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS +ICLSFWR1.R00.2204.A00.1805172221 05/17/2018 [ 10.479168] RIP: 0010:acpi_ps_complete_this_op+0xa7/0x22a [ 10.484627] RSP: 0018:ffffc900003a7578 EFLAGS: 00010202 [ 10.489881] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8804abeda9c8 RCX: 0000000000000020 [ 10.497045] RDX: 0000000000000000 RSI: ffff88049e604a68 RDI: 0000000000000000 [ 10.504213] RBP: 0000000000000000 R08: ffff8804abeda9c8 R09: 0000000000000000 [ 10.511376] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000e [ 10.518542] R13: ffff88049e604a68 R14: ffff88049e604a68 R15: ffffffffa00263c2 [ 10.525713] FS: 00007ff6d85f18c0(0000) GS:ffff8804be880000(0000) knlGS:0000000000000000 [ 10.533839] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 10.539616] CR2: 00007ff6d73cff40 CR3: 000000049f794001 CR4: 0000000000760ee0 [ 10.546783] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 10.553949] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 10.561112] PKRU: 55555554 [ 10.563849] Call Trace: [ 10.566323] acpi_ps_complete_op+0x49/0x3f1 [ 10.570537] acpi_ps_parse_loop+0x94c/0x9bb [ 10.574754] ? acpi_ds_delete_walk_state+0x113/0x131 [ 10.579750] acpi_ps_parse_aml+0x1a2/0x4af [ 10.583875] acpi_ps_execute_method+0x1e9/0x2a5 [ 10.588435] acpi_ns_evaluate+0x2e4/0x42c [ 10.592473] acpi_evaluate_object+0x1fd/0x3a8 [ 10.596873] usb_acpi_find_companion+0xee/0x1f0 [usbcore] [ 10.602319] acpi_platform_notify+0x33/0xa0 [ 10.606532] device_add+0x197/0x600 [ 10.610048] ? __init_waitqueue_head+0x36/0x50 [ 10.614529] usb_hub_create_port_device+0x11d/0x340 [usbcore] [ 10.620314] hub_probe+0x9a5/0x1010 [usbcore] [ 10.624701] ? 
_raw_spin_unlock_irqrestore+0x51/0x60 [ 10.629730] usb_probe_interface+0x13f/0x300 [usbcore] [ 10.634900] driver_probe_device+0x302/0x470 [ 10.639198] ? __driver_attach+0xe0/0xe0 [ 10.643147] bus_for_each_drv+0x59/0x90 [ 10.647013] __device_attach+0xb7/0x130 [ 10.650878] bus_probe_device+0x9c/0xb0 [ 10.654745] device_add+0x3c5/0x600 [ 10.658270] usb_set_configuration+0x540/0x880 [usbcore] [ 10.663621] generic_probe+0x28/0x80 [usbcore] [ 10.668097] driver_probe_device+0x302/0x470 [ 10.672393] ? __driver_attach+0xe0/0xe0 [ 10.676346] bus_for_each_drv+0x59/0x90 [ 10.680211] __device_attach+0xb7/0x130 [ 10.684076] bus_probe_device+0x9c/0xb0 [ 10.687940] device_add+0x3c5/0x600 [ 10.691464] usb_new_device+0x269/0x490 [usbcore] [ 10.696206] usb_add_hcd+0x558/0x850 [usbcore] [ 10.700682] xhci_pci_probe+0x13d/0x240 [xhci_pci] [ 10.705534] pci_device_probe+0xa1/0x130 [ 10.709484] driver_probe_device+0x302/0x470 [ 10.713784] __driver_attach+0xb9/0xe0 [ 10.717562] ? driver_probe_device+0x470/0x470 [ 10.722033] ? driver_probe_device+0x470/0x470 [ 10.726505] bus_for_each_dev+0x64/0x90 [ 10.730370] ? preempt_count_sub+0x92/0xd0 [ 10.734495] bus_add_driver+0x164/0x260 [ 10.738362] ? 0xffffffffa004e000 [ 10.741704] driver_register+0x57/0xc0 [ 10.745482] ? 0xffffffffa004e000 [ 10.748824] do_one_initcall+0x4a/0x350 [ 10.752690] ? do_init_module+0x22/0x20a [ 10.756643] ? rcu_read_lock_sched_held+0x74/0x80 [ 10.761377] ? kmem_cache_alloc_trace+0x284/0x2e0 [ 10.766114] do_init_module+0x5b/0x20a [ 10.769895] load_module+0x250d/0x2b20 [ 10.773678] ? kernel_read+0x2c/0x40 [ 10.777285] ? 
__se_sys_finit_module+0xaa/0xc0 [ 10.781759] __se_sys_finit_module+0xaa/0xc0 [ 10.786061] do_syscall_64+0x54/0x190 [ 10.789752] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 10.794831] RIP: 0033:0x7ff6d74664d9 [ 10.798430] RSP: 002b:00007ffd91e7dd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 10.806033] RAX: ffffffffffffffda RBX: 0000560519bfae20 RCX: 00007ff6d74664d9 [ 10.813195] RDX: 0000000000000000 RSI: 00007ff6d795ce23 RDI: 000000000000000e [ 10.820360] RBP: 00007ff6d795ce23 R08: 0000000000000000 R09: 0000000000000000 [ 10.827523] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000 [ 10.834690] R13: 0000560519bf9a30 R14: 0000000000020000 R15: 000000000aba9500 [ 10.841862] Code: c2 10 5f ea 81 48 c7 c6 f0 5e ea 81 bf 7c 00 00 00 e8 0d 7c 00 00 31 ed e9 88 01 00 00 48 8b 03 31 ed 48 85 c0 +0f 84 e9 00 00 00 <4c> 8b 60 28 4d 85 e4 0f 84 dc 00 00 00 0f b7 78 0a e8 62 fe ff [ 10.860832] RIP: acpi_ps_complete_this_op+0xa7/0x22a RSP: ffffc900003a7578 [ 10.867907] ---[ end trace 3a0d2ee1129bc71e ]--- Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Tested-by: Tomi Sarvela <tomi.p.sarvela@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180702135756.12159-1-imre.deak@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

There's the hung_task_panic sysctl, but that's a bit of an extreme measure. As a fallback, at least taint the machine. Our CI uses this to decide when a reboot is necessary, plus to figure out whether the kernel is still happy. v2: Works much better when I put the else { add_taint() } at the right place. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@linux.ibm.com> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: "Liu, Chuansheng" <chuansheng.liu@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (for core-for-CI) Link: https://patchwork.freedesktop.org/patch/msgid/20190502204648.5537-1-daniel.vetter@ffwll.ch Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

There's the soft/hardlockup_panic sysctls, but that's a bit of an extreme measure. As a fallback, at least taint the machine. Our CI uses this to decide when a reboot is necessary, plus to figure out whether the kernel is still happy. Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Laurence Oberman <loberman@redhat.com> Cc: Vincent Whitchurch <vincent.whitchurch@axis.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Sinan Kaya <okaya@kernel.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> (for core-for-CI) Link: https://patchwork.freedesktop.org/patch/msgid/20190502194208.3535-2-daniel.vetter@ffwll.ch Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2781 [danvet: Rebase] Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

We can't allow spam in CI. Update 26th June 2018: This is still an issue: Update 23rd May 2019: You guessed it, still ocurring. [ 224.739686] ------------[ cut here ]------------ [ 224.739712] WARNING: CPU: 3 PID: 2982 at net/sched/sch_generic.c:461 dev_watchdog+0x1fd/0x210 [ 224.739714] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_pcm i915 asix usbnet mii mei_me mei prime_numbers i2c_hid pinctrl_sunrisepoint pinctrl_intel btusb btrtl btbcm btintel bluetooth ecdh_generic [ 224.739775] CPU: 3 PID: 2982 Comm: gem_exec_suspen Tainted: G U W 4.18.0-rc2-CI-Patchwork_9414+ intel#1 [ 224.739777] Hardware name: Dell Inc. XPS 13 9350/, BIOS 1.4.12 11/30/2016 [ 224.739780] RIP: 0010:dev_watchdog+0x1fd/0x210 [ 224.739781] Code: 49 63 4c 24 f0 eb 92 4c 89 ef c6 05 21 46 ad 00 01 e8 77 ee fc ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 88 4c 14 82 e8 a3 fe 84 ff <0f> 0b eb be 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 c7 47 [ 224.739866] RSP: 0018:ffff88027dd83e40 EFLAGS: 00010286 [ 224.739869] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000102 [ 224.739871] RDX: 0000000080000102 RSI: ffffffff820c8c6c RDI: 00000000ffffffff [ 224.739873] RBP: ffff8802644c1540 R08: 0000000071be9b33 R09: 0000000000000000 [ 224.739874] R10: ffff88027dd83dc0 R11: 0000000000000000 R12: ffff8802644c1588 [ 224.739876] R13: ffff8802644c1160 R14: 0000000000000001 R15: ffff88026a5dc728 [ 224.739878] FS: 00007f18f4887980(0000) GS:ffff88027dd80000(0000) knlGS:0000000000000000 [ 224.739880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 224.739881] CR2: 00007f4c627ae548 CR3: 000000022ca1a002 CR4: 00000000003606e0 [ 224.739883] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 224.739885] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 224.739886] Call Trace: [ 
224.739888] <IRQ> [ 224.739892] ? qdisc_reset+0xe0/0xe0 [ 224.739894] ? qdisc_reset+0xe0/0xe0 [ 224.739897] call_timer_fn+0x93/0x360 [ 224.739903] expire_timers+0xc1/0x1d0 [ 224.739908] run_timer_softirq+0xc7/0x170 [ 224.739916] __do_softirq+0xd9/0x505 [ 224.739923] irq_exit+0xa9/0xc0 [ 224.739926] smp_apic_timer_interrupt+0x9c/0x2d0 [ 224.739929] apic_timer_interrupt+0xf/0x20 [ 224.739931] </IRQ> [ 224.739934] RIP: 0010:delay_tsc+0x2e/0xb0 [ 224.739936] Code: 49 89 fc 55 53 bf 01 00 00 00 e8 6d 2c 78 ff e8 88 9d b6 ff 41 89 c5 0f ae e8 0f 31 48 c1 e2 20 48 09 c2 48 89 d5 eb 16 f3 90 <bf> 01 00 00 00 e8 48 2c 78 ff e8 63 9d b6 ff 44 39 e8 75 36 0f ae [ 224.740021] RSP: 0018:ffffc900002f7d48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 [ 224.740024] RAX: 0000000080000000 RBX: 0000000649565ca9 RCX: 0000000000000001 [ 224.740026] RDX: 0000000080000001 RSI: ffffffff820c8c6c RDI: 00000000ffffffff [ 224.740027] RBP: 00000006493ea9ce R08: 000000005e81e2ee R09: 0000000000000000 [ 224.740029] R10: 0000000000000120 R11: 0000000000000000 R12: 00000000002ad8d6 [ 224.740030] R13: 0000000000000003 R14: 0000000000000004 R15: ffff88025caf5408 [ 224.740040] ? delay_tsc+0x66/0xb0 [ 224.740045] hibernation_debug_sleep+0x1c/0x30 [ 224.740048] hibernation_snapshot+0x2c1/0x690 [ 224.740053] hibernate+0x142/0x2a4 [ 224.740057] state_store+0xd0/0xe0 [ 224.740063] kernfs_fop_write+0x104/0x190 [ 224.740068] __vfs_write+0x31/0x180 [ 224.740072] ? rcu_read_lock_sched_held+0x6f/0x80 [ 224.740075] ? rcu_sync_lockdep_assert+0x29/0x50 [ 224.740078] ? __sb_start_write+0x152/0x1f0 [ 224.740080] ? 
__sb_start_write+0x168/0x1f0 [ 224.740084] vfs_write+0xbd/0x1a0 [ 224.740088] ksys_write+0x50/0xc0 [ 224.740094] do_syscall_64+0x55/0x190 [ 224.740097] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 224.740099] RIP: 0033:0x7f18f400a281 [ 224.740100] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53 [ 224.740186] RSP: 002b:00007fffd1f4fec8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 224.740189] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f18f400a281 [ 224.740190] RDX: 0000000000000004 RSI: 00007f18f448069a RDI: 0000000000000006 [ 224.740192] RBP: 00007fffd1f4fef0 R08: 0000000000000000 R09: 0000000000000000 [ 224.740194] R10: 0000000000000000 R11: 0000000000000246 R12: 000055e795d03400 [ 224.740195] R13: 00007fffd1f50500 R14: 0000000000000000 R15: 0000000000000000 [ 224.740205] irq event stamp: 1582591 [ 224.740207] hardirqs last enabled at (1582590): [<ffffffff810f9f9c>] vprintk_emit+0x4bc/0x4d0 [ 224.740210] hardirqs last disabled at (1582591): [<ffffffff81a0111c>] error_entry+0x7c/0x100 [ 224.740212] softirqs last enabled at (1582568): [<ffffffff81c0034f>] __do_softirq+0x34f/0x505 [ 224.740215] softirqs last disabled at (1582571): [<ffffffff8108c959>] irq_exit+0xa9/0xc0 [ 224.740218] WARNING: CPU: 3 PID: 2982 at net/sched/sch_generic.c:461 dev_watchdog+0x1fd/0x210 [ 224.740219] ---[ end trace 6e41d690e611c338 ]--- References: https://bugzilla.kernel.org/show_bug.cgi?id=196399 Acked-by: Martin Peres <martin.peres@linux.intel.com> Cc: Martin Peres <martin.peres@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170718082110.12524-1-daniel.vetter@ffwll.ch Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Since the kernel now uses hashed pointers for raw addresses, it is very hard to gauge the relative placement within a section, and since the hash value will never match up with any contents, using it provides no information relevant for slab debugging. Show the relative offset into each section, so that some reference for the hexdump is provided. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Remove copious amounts of ./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0) as they are drowning out our warnings. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190727121750.20882-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190910142339.17072-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

CI currently applies intel_iommu=igfx_off on the commandline and we wish to ignore that until it is removed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190910134417.14085-3-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

This reverts commit cea35f5. References: https://lists.freedesktop.org/archives/intel-gfx/2019-November/218878.html Suggested-by: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Improve upon the <3> [310.437368] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10 by describing what delayed_work was queued instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200409114625.12251-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

If the MSI is already enabled, trying to enable it again results in an -EINVAL and on the first attempt a WARN. That WARN causes our CI to abort the run [on each first attempt to suspend]: <4> [463.142025] WARNING: CPU: 0 PID: 2225 at drivers/pci/msi.c:1074 __pci_enable_msi_range+0x3cb/0x420 <4> [463.142026] Modules linked in: snd_hda_intel i915 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec btusb btrtl btbcm btintel e1000e bluetooth snd_hwdep snd_hda_core ptp ecdh_generic snd_pcm ecc pps_core mei_me mei prime_numbers [last unloaded: i915] <4> [463.142045] CPU: 0 PID: 2225 Comm: kworker/u8:14 Tainted: G U 5.7.0-rc2-CI-CI_DRM_8350+ intel#1 <4> [463.142046] Hardware name: Intel Corporation NUC7i5BNH/NUC7i5BNB, BIOS BNKBL357.86A.0060.2017.1214.2013 12/14/2017 <4> [463.142049] Workqueue: events_unbound async_run_entry_fn <4> [463.142051] RIP: 0010:__pci_enable_msi_range+0x3cb/0x420 <4> [463.142053] Code: 76 58 49 8d 56 48 48 89 df e8 31 73 fd ff e9 20 fe ff ff 31 f6 48 89 df e8 c2 e9 fd ff e9 d6 fe ff ff 45 89 fc e9 1a ff ff ff <0f> 0b 41 bc ea ff ff ff e9 0d ff ff ff 41 bc ea ff ff ff e9 02 ff <4> [463.142054] RSP: 0018:ffffc90000593cd0 EFLAGS: 00010202 <4> [463.142056] RAX: 0000000000000010 RBX: ffff888274051000 RCX: 0000000000000000 <4> [463.142057] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff888274051000 <4> [463.142058] RBP: ffff888238aa1018 R08: 0000000000000001 R09: 0000000000000001 <4> [463.142060] R10: ffffc90000593d90 R11: 00000000c79cdfd5 R12: ffff8882740510b0 <4> [463.142061] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001 <4> [463.142062] FS: 0000000000000000(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000 <4> [463.142064] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [463.142065] CR2: 000055706f347d80 CR3: 0000000005610003 CR4: 00000000003606f0 <4> [463.142066] Call Trace: <4> 
[463.142073] pci_enable_msi+0x11/0x20 <4> [463.142077] azx_resume+0x1ab/0x200 [snd_hda_intel] <4> [463.142080] ? pci_pm_thaw+0x80/0x80 <4> [463.142084] dpm_run_callback+0x64/0x280 <4> [463.142089] device_resume+0xd4/0x1c0 <4> [463.142093] ? dpm_watchdog_set+0x60/0 While this would appear to be a bug in snd-hda, it does appear inconsequential, at least for gfx-ci. Downgrade the warning to an info, like the other already-enabled error for MSI-X. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1687 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200423082753.3899018-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

We inadvertently create a dependency on mmap_sem with a whole chain. This breaks any user who wants to take a lock and call rcu_barrier(), while also taking that lock inside mmap_sem: <4> [604.892532] ====================================================== <4> [604.892534] WARNING: possible circular locking dependency detected <4> [604.892536] 5.6.0-rc7-CI-Patchwork_17096+ intel#1 Tainted: G U <4> [604.892537] ------------------------------------------------------ <4> [604.892538] kms_frontbuffer/2595 is trying to acquire lock: <4> [604.892540] ffffffff8264a558 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190 <4> [604.892547] but task is already holding lock: <4> [604.892547] ffff888484716050 (reservation_ww_class_mutex){+.+.}, at: i915_gem_object_pin_to_display_plane+0x89/0x270 [i915] <4> [604.892592] which lock already depends on the new lock. <4> [604.892593] the existing dependency chain (in reverse order) is: <4> [604.892594] -> intel#6 (reservation_ww_class_mutex){+.+.}: <4> [604.892597] __ww_mutex_lock.constprop.15+0xc3/0x1090 <4> [604.892598] ww_mutex_lock+0x39/0x70 <4> [604.892600] dma_resv_lockdep+0x10e/0x1f5 <4> [604.892602] do_one_initcall+0x58/0x300 <4> [604.892604] kernel_init_freeable+0x17b/0x1dc <4> [604.892605] kernel_init+0x5/0x100 <4> [604.892606] ret_from_fork+0x24/0x50 <4> [604.892607] -> intel#5 (reservation_ww_class_acquire){+.+.}: <4> [604.892609] dma_resv_lockdep+0xec/0x1f5 <4> [604.892610] do_one_initcall+0x58/0x300 <4> [604.892610] kernel_init_freeable+0x17b/0x1dc <4> [604.892611] kernel_init+0x5/0x100 <4> [604.892612] ret_from_fork+0x24/0x50 <4> [604.892613] -> intel#4 (&mm->mmap_sem#2){++++}: <4> [604.892615] __might_fault+0x63/0x90 <4> [604.892617] _copy_to_user+0x1e/0x80 <4> [604.892619] perf_read+0x200/0x2b0 <4> [604.892621] vfs_read+0x96/0x160 <4> [604.892622] ksys_read+0x9f/0xe0 <4> [604.892623] do_syscall_64+0x4f/0x220 <4> [604.892624] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [604.892625] -> intel#3 
(&cpuctx_mutex){+.+.}: <4> [604.892626] __mutex_lock+0x9a/0x9c0 <4> [604.892627] perf_event_init_cpu+0xa4/0x140 <4> [604.892629] perf_event_init+0x19d/0x1cd <4> [604.892630] start_kernel+0x362/0x4e4 <4> [604.892631] secondary_startup_64+0xa4/0xb0 <4> [604.892631] -> intel#2 (pmus_lock){+.+.}: <4> [604.892633] __mutex_lock+0x9a/0x9c0 <4> [604.892633] perf_event_init_cpu+0x6b/0x140 <4> [604.892635] cpuhp_invoke_callback+0x9b/0x9d0 <4> [604.892636] _cpu_up+0xa2/0x140 <4> [604.892637] do_cpu_up+0x61/0xa0 <4> [604.892639] smp_init+0x57/0x96 <4> [604.892639] kernel_init_freeable+0x87/0x1dc <4> [604.892640] kernel_init+0x5/0x100 <4> [604.892642] ret_from_fork+0x24/0x50 <4> [604.892642] -> intel#1 (cpu_hotplug_lock.rw_sem){++++}: <4> [604.892643] cpus_read_lock+0x34/0xd0 <4> [604.892644] rcu_barrier+0xaa/0x190 <4> [604.892645] kernel_init+0x21/0x100 <4> [604.892647] ret_from_fork+0x24/0x50 <4> [604.892647] -> #0 (rcu_state.barrier_mutex){+.+.}: <4> [604.892649] __lock_acquire+0x1328/0x15d0 <4> [604.892650] lock_acquire+0xa7/0x1c0 <4> [604.892651] __mutex_lock+0x9a/0x9c0 <4> [604.892652] rcu_barrier+0x23/0x190 <4> [604.892680] i915_gem_object_unbind+0x29d/0x3f0 [i915] <4> [604.892707] i915_gem_object_pin_to_display_plane+0x141/0x270 [i915] <4> [604.892737] intel_pin_and_fence_fb_obj+0xec/0x1f0 [i915] <4> [604.892767] intel_plane_pin_fb+0x3f/0xd0 [i915] <4> [604.892797] intel_prepare_plane_fb+0x13b/0x5c0 [i915] <4> [604.892798] drm_atomic_helper_prepare_planes+0x85/0x110 <4> [604.892827] intel_atomic_commit+0xda/0x390 [i915] <4> [604.892828] drm_atomic_helper_set_config+0x57/0xa0 <4> [604.892830] drm_mode_setcrtc+0x1c4/0x720 <4> [604.892830] drm_ioctl_kernel+0xb0/0xf0 <4> [604.892831] drm_ioctl+0x2e1/0x390 <4> [604.892833] ksys_ioctl+0x7b/0x90 <4> [604.892835] __x64_sys_ioctl+0x11/0x20 <4> [604.892835] do_syscall_64+0x4f/0x220 <4> [604.892836] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [604.892837] Changes since v1: - Use (*values)[n++] in perf_read_one(). 
Changes since v2: - Centrally allocate values. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200502171413.9133-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Since Tigerlake seems to have inherited its cstates and other rapl power caps from Icelake, assume it also follows Icelake for its rapl events. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Audio component in i915 is not hooked up yet causing long timeouts and angry abortive CI. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2805 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2874 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

# Conflicts: # drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c # drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h # drivers/gpu/drm/drm_aperture.c # drivers/gpu/drm/i915/display/intel_dp.c # drivers/gpu/drm/vc4/vc4_drv.c

# Conflicts: # drivers/acpi/sleep.c # kernel/locking/lockdep.c # kernel/time/timer.c

Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220521111145.81697-49-Julia.Lawall@inria.fr Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>

Fix the following W=1 kernel warnings: drivers/gpu/drm/i915/gvt/handlers.c:3066: warning: expecting prototype for intel_t_default_mmio_write(). Prototype was for intel_vgpu_default_mmio_write() instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220524083733.67148-2-jiapeng.chong@linux.alibaba.com Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>

Fix the following W=1 kernel warnings: drivers/gpu/drm/i915/gvt/mmio_context.c:560: warning: expecting prototype for intel_gvt_switch_render_mmio(). Prototype was for intel_gvt_switch_mmio() instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220524083733.67148-1-jiapeng.chong@linux.alibaba.com Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>

Fix the following W=1 kernel warnings: drivers/gpu/drm/i915/gvt/aperture_gm.c:308: warning: expecting prototype for inte_gvt_free_vgpu_resource(). Prototype was for intel_vgpu_free_resource() instead. drivers/gpu/drm/i915/gvt/aperture_gm.c:344: warning: expecting prototype for intel_alloc_vgpu_resource(). Prototype was for intel_vgpu_alloc_resource() instead. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220602073519.22363-1-jiapeng.chong@linux.alibaba.com Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>

There is a spelling mistake in a gvt_vgpu_err error message. Fix it. Fixes: 695fbc0 ("drm/i915/gvt: replace the gvt_err with gvt_vgpu_err") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Zhi Wang <zhi.a.wang@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20220315202449.2952845-1-colin.i.king@gmail.com Reviewed-by: Zhi Wang <zhi.a.wang@intel.com>

When request_firmware fails to load the golden_hw_state file directly, it prints the message "Direct firmware load for i915/gvt/vid_0x8086_did_0x5917_rid_0x07.golden_hw_state failed with error -2" into the logs. As mentioned in issue intel#77, this is not a real error and should not be logged. firmware_request_nowarn is the proper function to use here (it is a kernel firmware loader API, not a system call). It runs the exact same code as request_firmware, except that it passes one extra flag, FW_OPT_NO_WARN, to the internally called _request_firmware to disable the branch of code that prints the message. I posted and discussed some of the internal code in issue intel#212.

smsilva98 · 2022-08-09T05:19:37Z

Information to help verify the commit

Error message raised
Direct firmware load for i915/gvt/vid_0x8086_did_0x5917_rid_0x07.golden_hw_state failed with error -2

Link to the documentation for firmware_request_nowarn
https://docs.kernel.org/driver-api/firmware/request_firmware.html#firmware-request-nowarn

Implementation for request_firmware

int request_firmware(const struct firmware **firmware_p, const char *name, struct device *device)
{
	int ret;

	/* Need to pin this module until return */
	__module_get(THIS_MODULE);
	ret = _request_firmware(firmware_p, name, device, NULL, 0, 0,
				FW_OPT_UEVENT);
	module_put(THIS_MODULE);
	return ret;
}

Implementation for firmware_request_nowarn

int firmware_request_nowarn(const struct firmware **firmware, const char *name, struct device *device)
{
	int ret;

	/* Need to pin this module until return */
	__module_get(THIS_MODULE);
	ret = _request_firmware(firmware, name, device, NULL, 0, 0,
				FW_OPT_UEVENT | FW_OPT_NO_WARN);
	module_put(THIS_MODULE);
	return ret;
}

The only difference between these two functions is that firmware_request_nowarn adds FW_OPT_NO_WARN to the opt_flags bitmask passed into _request_firmware. Within _request_firmware, this only disables the branch that prints the message; the rest of the load path is unchanged. That branch is towards the bottom of the function.
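To make the flag check concrete, here is a minimal userspace sketch of the same pattern. It is not kernel code: the SKETCH_* names, their bit values, and sketch_report_failure are hypothetical stand-ins made up for illustration, and only the message text is taken from the log line quoted above.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's option bits. The values are
 * illustrative only and are not copied from the kernel headers. */
#define SKETCH_FW_OPT_UEVENT  (1u << 0)
#define SKETCH_FW_OPT_NO_WARN (1u << 1)

/*
 * Mimics the warning branch at the bottom of _request_firmware:
 * on failure (ret != 0) the message is printed unless the caller
 * set the NO_WARN bit in opt_flags.
 *
 * Returns 1 if the message was printed, 0 otherwise.
 */
static int sketch_report_failure(const char *name, int ret,
				 unsigned int opt_flags)
{
	if (ret == 0)
		return 0;	/* load succeeded, nothing to report */

	if (opt_flags & SKETCH_FW_OPT_NO_WARN)
		return 0;	/* caller asked for a silent failure */

	fprintf(stderr,
		"Direct firmware load for %s failed with error %d\n",
		name, ret);
	return 1;
}
```

Calling sketch_report_failure with only SKETCH_FW_OPT_UEVENT set and ret equal to -2 prints the message, while OR-ing in SKETCH_FW_OPT_NO_WARN makes the same failure silent. The return value passed in is untouched either way, which mirrors why swapping the function at the call site should not change how the driver handles a missing file.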

Implementation for _request_firmware

static int _request_firmware(const struct firmware **firmware_p, const char *name,
		  struct device *device, void *buf, size_t size,
		  size_t offset, u32 opt_flags)
{
	struct firmware *fw = NULL;
	struct cred *kern_cred = NULL;
	const struct cred *old_cred;
	bool nondirect = false;
	int ret;

	if (!firmware_p)
		return -EINVAL;

	if (!name || name[0] == '\0') {
		ret = -EINVAL;
		goto out;
	}

	ret = _request_firmware_prepare(&fw, name, device, buf, size,
					offset, opt_flags);
	if (ret <= 0) /* error or already assigned */
		goto out;

	/*
	 * We are about to try to access the firmware file. Because we may have been
	 * called by a driver when serving an unrelated request from userland, we use
	 * the kernel credentials to read the file.
	 */
	kern_cred = prepare_kernel_cred(NULL);
	if (!kern_cred) {
		ret = -ENOMEM;
		goto out;
	}
	old_cred = override_creds(kern_cred);

	ret = fw_get_filesystem_firmware(device, fw->priv, "", NULL);

	/* Only full reads can support decompression, platform, and sysfs. */
	if (!(opt_flags & FW_OPT_PARTIAL))
		nondirect = true;

#ifdef CONFIG_FW_LOADER_COMPRESS_ZSTD
	if (ret == -ENOENT && nondirect)
		ret = fw_get_filesystem_firmware(device, fw->priv, ".zst",
						 fw_decompress_zstd);
#endif
#ifdef CONFIG_FW_LOADER_COMPRESS_XZ
	if (ret == -ENOENT && nondirect)
		ret = fw_get_filesystem_firmware(device, fw->priv, ".xz",
						 fw_decompress_xz);
#endif
	if (ret == -ENOENT && nondirect)
		ret = firmware_fallback_platform(fw->priv);

	if (ret) {
		if (!(opt_flags & FW_OPT_NO_WARN))
			dev_warn(device,
				 "Direct firmware load for %s failed with error %d\n",
				 name, ret);
		if (nondirect)
			ret = firmware_fallback_sysfs(fw, name, device,
						      opt_flags, ret);
	} else
		ret = assign_fw(fw, device);

	revert_creds(old_cred);
	put_cred(kern_cred);

 out:
	if (ret < 0) {
		fw_abort_batch_reqs(fw);
		release_firmware(fw);
		fw = NULL;
	}

	*firmware_p = fw;
	return ret;
}
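
From a caller's point of view, the code above returns a negative value and sets the output pointer to NULL on failure, whether or not the warning is printed. The following userspace sketch shows how a caller might treat a missing optional file as a normal condition. Every name here (`mock_firmware`, `mock_request_nowarn`, `mock_load_golden_state`) is hypothetical and not part of the kernel or the i915/gvt driver.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct firmware. */
struct mock_firmware {
	size_t size;
};

/* Hypothetical loader: hands back `available` on success, or
 * returns -ENOENT and leaves *fw_p NULL when no file exists,
 * matching the out: path of _request_firmware. */
static int mock_request_nowarn(const struct mock_firmware **fw_p,
			       const struct mock_firmware *available)
{
	if (!fw_p)
		return -EINVAL;

	*fw_p = available;
	return available ? 0 : -ENOENT;
}

/* Hypothetical caller: a missing file is expected and not an error,
 * so -ENOENT is mapped to success with *used_cache cleared. */
static int mock_load_golden_state(const struct mock_firmware *available,
				  int *used_cache)
{
	const struct mock_firmware *fw;
	int ret = mock_request_nowarn(&fw, available);

	if (ret == -ENOENT) {
		*used_cache = 0;
		return 0;
	}
	if (ret)
		return ret;

	*used_cache = 1;
	return 0;
}
```

The design point is that suppressing the log line does not change the return value, so a caller that already handles -ENOENT keeps working unchanged.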

smsilva98 · 2022-08-17T00:26:57Z

@TerrenceXu I'm not sure who I should reach out to, but can I get someone to take a look at this pull request? Thank you!

ickle and others added 30 commits May 2, 2022 14:27

libata: Downgrade unsupported feature warnings to notifications

d5eacbf

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Petri Latvala <petri.latvala@intel.com> [danvet: Rebase] Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

sched: Mark "RT throttling activated" as KERN_NOTICE

23c6f67

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2781 [danvet: Rebase] Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

drm/i915: Force compilation with intel-iommu for CI validation

447d5cb

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190910142339.17072-1-chris@chris-wilson.co.uk Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

HAX suspend: Disable S3/S4 for fi-bdw-samus

c21406f

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

HAX sound: Disable probing snd_hda with DG1

f96d0b4

Audio component in i915 is not hooked up yet causing long timeouts and angry abortive CI. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

HAX net/phy: Suppress WARN for calling stop while halted

6c1c156

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2805 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

HAX net/phy: Suppress WARN from phy_error

c10ab29

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2874 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge remote-tracking branch 'drm-misc/drm-misc-fixes' into drm-tip

6cf54f2

Merge remote-tracking branch 'drm-intel/drm-intel-fixes' into drm-tip

7494c1e

Merge remote-tracking branch 'drm/drm-next' into drm-tip

fca6518

# Conflicts: # drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c # drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h # drivers/gpu/drm/drm_aperture.c # drivers/gpu/drm/i915/display/intel_dp.c # drivers/gpu/drm/vc4/vc4_drv.c

Merge remote-tracking branch 'drm-misc/drm-misc-next' into drm-tip

12359a7

Merge remote-tracking branch 'drm-intel/drm-intel-next' into drm-tip

d39799b

Merge remote-tracking branch 'drm-intel/drm-intel-gt-next' into drm-tip

916a0ee

sravnborg and others added 13 commits July 9, 2022 16:01

Merge remote-tracking branch 'drm-intel/topic/core-for-CI' into drm-tip

696cadd

# Conflicts: # drivers/acpi/sleep.c # kernel/locking/lockdep.c # kernel/time/timer.c

drm-tip: 2022y-07m-09d-14h-01m-16s UTC integration manifest

ebea934

Merge remote-tracking branch 'vfio-upstream/for-linus' into gvt-staging

320cd26

Merge remote-tracking branch 'intel-iommu/iommu/fixes' into gvt-staging

1936863

Merge remote-tracking branch 'origin/gvt-fixes' into gvt-staging

9620296

Merge remote-tracking branch 'origin/gvt-next' into gvt-staging

aca05bf

gvt-staging: 2022y-07m-11d-13h-06m-27s CST integration manifest

aaa6894

smsilva98 changed the title ~~Fixes issue #212 and #77~~ Fixes issues #212 and #77 Aug 9, 2022

smsilva98 changed the base branch from gvt-staging to gvt-fixes October 20, 2022 01:47

smsilva98 changed the base branch from gvt-fixes to gvt-staging April 20, 2024 01:14

zhenyw force-pushed the gvt-staging branch from e0598d0 to 980de4c Compare May 7, 2024 02:14
