Please add "set the type of aliyun.qcow2 as virtio" #1

Vidos · 2019-06-13T08:58:29Z

When I boot the qcow2 image, the default disk type is IDE, but it should be virtio. Could you please add that to the main page?

casparant · 2019-06-13T09:26:18Z

> When I boot the qcow2 image, the default disk type is IDE, but it should be virtio. Could you please add that to the main page?

Hi! The virtio driver is already built into the image. You need to set the target driver yourself in your libvirt XML, in the virt-manager UI, or on the qemu-kvm command line. Which emulator and/or GUI tool are you using to boot the qcow2 image?

Vidos · 2019-06-13T09:38:40Z

Thanks for the quick reply, but I think you misunderstood.
I have already booted it.
When I use virt-manager to run this qcow2, the disk type is set to IDE by default.
So I just suggest adding a note on the main page that the disk type should be virtio; that would be more user-friendly.
Thanks so much.

casparant · 2019-06-13T09:44:27Z

Ah, I see. Will improve the documentation later. Thanks for the suggestion.

[ Upstream commit 6041186 ]

When a module option, or core kernel argument, toggles a static-key it requires jump labels to be initialized early. While x86, PowerPC, and ARM64 arrange for jump_label_init() to be called before parse_args(), ARM does not.

  Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303 page_alloc_shuffle+0x12c/0x1ac
  static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used before call to jump_label_init()
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
  Hardware name: ARM Integrator/CP (Device Tree)
  [<c0011c68>] (unwind_backtrace) from [<c000ec48>] (show_stack+0x10/0x18)
  [<c000ec48>] (show_stack) from [<c07e9710>] (dump_stack+0x18/0x24)
  [<c07e9710>] (dump_stack) from [<c001bb1c>] (__warn+0xe0/0x108)
  [<c001bb1c>] (__warn) from [<c001bb88>] (warn_slowpath_fmt+0x44/0x6c)
  [<c001bb88>] (warn_slowpath_fmt) from [<c0b0c4a8>] (page_alloc_shuffle+0x12c/0x1ac)
  [<c0b0c4a8>] (page_alloc_shuffle) from [<c0b0c550>] (shuffle_store+0x28/0x48)
  [<c0b0c550>] (shuffle_store) from [<c003e6a0>] (parse_args+0x1f4/0x350)
  [<c003e6a0>] (parse_args) from [<c0ac3c00>] (start_kernel+0x1c0/0x488)

Move the fallback call to jump_label_init() to occur before parse_args(). The redundant calls to jump_label_init() in other archs are left intact in case they have static key toggling use cases that are even earlier than option parsing.

Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Guenter Roeck <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
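
The ordering problem described in this commit can be illustrated in user space. The following is only a sketch, not kernel code: `jump_labels_ready`, `toggle_key()`, `fake_parse_args()`, and the two boot functions are hypothetical stand-ins for the state behind jump_label_init(), static_key_enable(), parse_args(), and start_kernel().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the jump label subsystem's "initialized" state. */
static bool jump_labels_ready;
static int warnings;

static void fake_jump_label_init(void) { jump_labels_ready = true; }

/* Mimics static_key_enable(): warns when used before initialization. */
static void toggle_key(const char *name)
{
    if (!jump_labels_ready) {
        warnings++;
        fprintf(stderr, "static key '%s' used before jump label init\n", name);
    }
}

/* Mimics option parsing that handles page_alloc.shuffle=1. */
static void fake_parse_args(void) { toggle_key("page_alloc_shuffle_key"); }

/* Old ARM ordering: option parsing runs first and triggers the warning. */
static int boot_old_order(void)
{
    jump_labels_ready = false;
    warnings = 0;
    fake_parse_args();
    fake_jump_label_init();
    return warnings;
}

/* Fixed ordering: initialize jump labels before parsing options. */
static int boot_fixed_order(void)
{
    jump_labels_ready = false;
    warnings = 0;
    fake_jump_label_init();
    fake_parse_args();
    return warnings;
}
```

Each boot function returns the number of warnings raised, so the two orderings can be compared directly.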

…onally"

[ Upstream commit 4a9c2e3 ]

This reverts the first part of commit 4e485d0 ("strparser: Call skb_unclone conditionally"). To build a message with multiple fragments we need our own root of frag_list. We can't simply use the frag_list of orig_skb, because it will lead to linking all orig_skbs together creating very long frag chains, and causing stack overflow on kfree_skb() (which is called recursively on the frag_lists).

  BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5)
  kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
  RIP: 0010:free_one_page+0x2b/0x490

  Call Trace:
   __free_pages_ok+0x143/0x2c0
   skb_release_data+0x8e/0x140
   ? skb_release_data+0xad/0x140
   kfree_skb+0x32/0xb0

   [...]

   skb_release_data+0xad/0x140
   ? skb_release_data+0xad/0x140
   kfree_skb+0x32/0xb0
   skb_release_data+0xad/0x140
   ? skb_release_data+0xad/0x140
   kfree_skb+0x32/0xb0
   skb_release_data+0xad/0x140
   ? skb_release_data+0xad/0x140
   kfree_skb+0x32/0xb0
   skb_release_data+0xad/0x140
   ? skb_release_data+0xad/0x140
   kfree_skb+0x32/0xb0
   skb_release_data+0xad/0x140
   ? skb_release_data+0xad/0x140
   kfree_skb+0x32/0xb0
   skb_release_data+0xad/0x140
   __kfree_skb+0xe/0x20
   tcp_disconnect+0xd6/0x4d0
   tcp_close+0xf4/0x430
   ? tcp_check_oom+0xf0/0xf0
   tls_sk_proto_close+0xe4/0x1e0 [tls]
   inet_release+0x36/0x60
   __sock_release+0x37/0xa0
   sock_close+0x11/0x20
   __fput+0xa2/0x1d0
   task_work_run+0x89/0xb0
   exit_to_usermode_loop+0x9a/0xa0
   do_syscall_64+0xc0/0xf0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

Let's leave the second unclone conditional, as I'm not entirely sure what is its purpose :)

Fixes: 4e485d0 ("strparser: Call skb_unclone conditionally")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Dirk van der Merwe <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>

[ Upstream commit 68be930 ]

  BUG: unable to handle kernel paging request at ffffffffa01c5430
  PGD 3270067 P4D 3270067 PUD 3271063 PMD 230bc5067 PTE 0
  Oops: 0000 [#1]
  CPU: 0 PID: 6159 Comm: modprobe Not tainted 5.1.0+ #33
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:raw_notifier_chain_register+0x16/0x40
  Code: 63 f8 66 90 e9 5d ff ff ff 90 90 90 90 90 90 90 90 90 90 90 55 48 8b 07 48 89 e5 48 85 c0 74 1c 8b 56 10 3b 50 10 7e 07 eb 12 <39> 50 10 7c 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 08
  RSP: 0018:ffffc90001c33c08 EFLAGS: 00010282
  RAX: ffffffffa01c5420 RBX: ffffffffa01db420 RCX: 4fcef45928070a8b
  RDX: 0000000000000000 RSI: ffffffffa01db420 RDI: ffffffffa01b0068
  RBP: ffffc90001c33c08 R08: 000000003e0a33d0 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000094443661 R12: ffff88822c320700
  R13: ffff88823109be80 R14: 0000000000000000 R15: ffffc90001c33e78
  FS: 00007fab8bd08540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffa01c5430 CR3: 00000002297ea000 CR4: 00000000000006f0
  Call Trace:
   register_netdevice_notifier+0x43/0x250
   ? 0xffffffffa01e0000
   dsa_slave_register_notifier+0x13/0x70 [dsa_core]
   ? 0xffffffffa01e0000
   dsa_init_module+0x2e/0x1000 [dsa_core]
   do_one_initcall+0x6c/0x3cc
   ? do_init_module+0x22/0x1f1
   ? rcu_read_lock_sched_held+0x97/0xb0
   ? kmem_cache_alloc_trace+0x325/0x3b0
   do_init_module+0x5b/0x1f1
   load_module+0x1db1/0x2690
   ? m_show+0x1d0/0x1d0
   __do_sys_finit_module+0xc5/0xd0
   __x64_sys_finit_module+0x15/0x20
   do_syscall_64+0x6b/0x1d0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Clean up allocated resources if there are errors, otherwise it will trigger a memleak.

Fixes: c9eb3e0 ("net: dsa: Add support for learning FDB through notification")
Signed-off-by: YueHaibing <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit ee0df19 ] When changing the number of buffers in the RX ring while the interface is running, the following Oops is encountered due to the new number of buffers being taken into account immediately while their allocation is done when opening the device only. [ 69.882706] Unable to handle kernel paging request for data at address 0xf0000100 [ 69.890172] Faulting instruction address: 0xc033e164 [ 69.895122] Oops: Kernel access of bad area, sig: 11 [#1] [ 69.900494] BE PREEMPT CMPCPRO [ 69.907120] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.115-00006-g179ade8ce3-dirty #269 [ 69.915956] task: c0684310 task.stack: c06da000 [ 69.920470] NIP: c033e164 LR: c02e44d0 CTR: c02e41fc [ 69.925504] REGS: dfff1e20 TRAP: 0300 Not tainted (4.14.115-00006-g179ade8ce3-dirty) [ 69.934161] MSR: 00009032 <EE,ME,IR,DR,RI> CR: 22004428 XER: 20000000 [ 69.940869] DAR: f0000100 DSISR: 20000000 [ 69.940869] GPR00: c0352d70 dfff1ed0 c0684310 f00000a4 00000040 dfff1f68 00000000 0000001f [ 69.940869] GPR08: df53f410 1cc00040 00000021 c0781640 42004424 100c82b6 f00000a4 df53f5b0 [ 69.940869] GPR16: df53f6c0 c05daf84 00000040 00000000 00000040 c0782be4 00000000 00000001 [ 69.940869] GPR24: 00000000 df53f400 000001b0 df53f410 df53f000 0000003f df708220 1cc00044 [ 69.978348] NIP [c033e164] skb_put+0x0/0x5c [ 69.982528] LR [c02e44d0] ucc_geth_poll+0x2d4/0x3f8 [ 69.987384] Call Trace: [ 69.989830] [dfff1ed0] [c02e4554] ucc_geth_poll+0x358/0x3f8 (unreliable) [ 69.996522] [dfff1f20] [c0352d70] net_rx_action+0x248/0x30c [ 70.002099] [dfff1f80] [c04e93e4] __do_softirq+0xfc/0x310 [ 70.007492] [dfff1fe0] [c0021124] irq_exit+0xd0/0xd4 [ 70.012458] [dfff1ff0] [c000e7e0] call_do_irq+0x24/0x3c [ 70.017683] [c06dbe80] [c0006bac] do_IRQ+0x64/0xc4 [ 70.022474] [c06dbea0] [c001097c] ret_from_except+0x0/0x14 [ 70.027964] --- interrupt: 501 at rcu_idle_exit+0x84/0x90 [ 70.027964] LR = rcu_idle_exit+0x74/0x90 [ 70.037585] [c06dbf60] [20000000] 0x20000000 (unreliable) [ 70.042984] [c06dbf80] 
[c004bb0c] do_idle+0xb4/0x11c [ 70.047945] [c06dbfa0] [c004bd14] cpu_startup_entry+0x18/0x1c [ 70.053682] [c06dbfb0] [c05fb034] start_kernel+0x370/0x384 [ 70.059153] [c06dbff0] [00003438] 0x3438 [ 70.063062] Instruction dump: [ 70.066023] 38a00000 38800000 90010014 4bfff015 80010014 7c0803a6 3123ffff 7c691910 [ 70.073767] 38210010 4e800020 38600000 4e800020 <80e3005c> 80c30098 3107ffff 7d083910 [ 70.081690] ---[ end trace be7ccd9c1e1a9f12 ]--- This patch forbids the modification of the number of buffers in the ring while the interface is running. Fixes: ac42185 ("ucc_geth: add ethtool support") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
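
The rule this patch introduces (no ring resizing while the interface is running) could look roughly like the user-space sketch below. `struct fake_netdev` and `fake_set_ringparam()` are hypothetical and heavily simplified, and the `-EBUSY` return value is an assumption; the commit message only says the modification is forbidden.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified device state; not the real struct net_device. */
struct fake_netdev {
    bool running;     /* true between open and close */
    int rx_ring_len;  /* number of RX buffers in the ring */
};

/*
 * Reject ring resizing while the interface is up, because the buffers
 * are only allocated when the device is opened.
 */
static int fake_set_ringparam(struct fake_netdev *dev, int new_rx_len)
{
    if (new_rx_len <= 0)
        return -EINVAL;
    if (dev->running)
        return -EBUSY; /* assumed error code */
    dev->rx_ring_len = new_rx_len;
    return 0;
}
```

With this check, the new length only takes effect while the device is down, so the next open allocates a matching number of buffers.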

[ Upstream commit 36096f2 ]

  kernel BUG at lib/list_debug.c:47!
  invalid opcode: 0000 [#1]
  CPU: 0 PID: 12914 Comm: rmmod Tainted: G W 5.1.0+ #47
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:__list_del_entry_valid+0x53/0x90
  Code: 48 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00 00 5d c3 48 89 fe 48 89 c2 48 c7 c7 18 75 fe 82 e8 cb 34 78 ff <0f> 0b 48 89 fe 48 c7 c7 50 75 fe 82 e8 ba 34 78 ff 0f 0b 48 89 f2
  RSP: 0018:ffffc90001c2fe40 EFLAGS: 00010286
  RAX: 000000000000004e RBX: ffffffffa0184000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: ffff888237a17788 RDI: 00000000ffffffff
  RBP: ffffc90001c2fe40 R08: 0000000000000000 R09: 0000000000000000
  R10: ffffc90001c2fe10 R11: 0000000000000000 R12: 0000000000000000
  R13: ffffc90001c2fe50 R14: ffffffffa0184000 R15: 0000000000000000
  FS: 00007f3d83634540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000555c350ea818 CR3: 0000000231677000 CR4: 00000000000006f0
  Call Trace:
   unregister_pernet_operations+0x34/0x120
   unregister_pernet_subsys+0x1c/0x30
   packet_exit+0x1c/0x369 [af_packet]
   __x64_sys_delete_module+0x156/0x260
   ? lockdep_hardirqs_on+0x133/0x1b0
   ? do_syscall_64+0x12/0x1f0
   do_syscall_64+0x6e/0x1f0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

When af_packet is loaded with modprobe and register_pernet_subsys() fails and cleans up, ops->list is set to LIST_POISON1, but the module init is still considered successful. A later rmmod then triggers the BUG() in __list_del_entry_valid(), which is called from unregister_pernet_subsys(). This patch fixes the error handling path in packet_init() to avoid possible issues if an error occurs.

Reported-by: Hulk Robot <[email protected]>
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
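
The general shape of such an error path, where every registration step is checked and earlier steps are undone in reverse order on failure, might be sketched as follows. This is a user-space illustration under stated assumptions: the `reg_*`/`unreg_*` helpers, the `fail_at` knob, and `fake_module_init()` are all hypothetical and do not mirror the real packet_init() steps.

```c
#include <stdbool.h>

/* Hypothetical registration steps; each may fail and records its state. */
static bool proto_registered, sock_registered, pernet_registered;
static int fail_at; /* 0 = no failure, 1..3 = step that fails (test knob) */

static int reg_proto(void)  { if (fail_at == 1) return -1; proto_registered = true; return 0; }
static int reg_sock(void)   { if (fail_at == 2) return -1; sock_registered = true; return 0; }
static int reg_pernet(void) { if (fail_at == 3) return -1; pernet_registered = true; return 0; }
static void unreg_proto(void) { proto_registered = false; }
static void unreg_sock(void)  { sock_registered = false; }

/* Init that checks every step and unwinds in reverse order on failure. */
static int fake_module_init(void)
{
    int rc;

    rc = reg_proto();
    if (rc)
        goto out;
    rc = reg_sock();
    if (rc)
        goto out_proto;
    rc = reg_pernet();
    if (rc)
        goto out_sock;
    return 0;

out_sock:
    unreg_sock();
out_proto:
    unreg_proto();
out:
    return rc;
}
```

Because the failure is propagated to the caller, the module is never left in a half-initialized state that a later exit routine would trip over.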

[ Upstream commit 4014dfa ] The switch to make bas_gigaset use usb_fill_int_urb() - instead of filling that urb "by hand" - missed the subtle ordering of the previous code. See, before the switch urb->dev was set to a member somewhere deep in a complicated structure and then supplied to usb_rcvisocpipe() and usb_sndisocpipe(). After that switch urb->dev wasn't set to anything specific before being supplied to those two macros. This triggers a nasty oops: BUG: unable to handle kernel NULL pointer dereference at 00000000 #PF error: [normal kernel read fault] *pde = 00000000 Oops: 0000 [#1] SMP CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.1.0-0.rc4.1.local0.fc28.i686 #1 Hardware name: IBM 2525FAG/2525FAG, BIOS 74ET64WW (2.09 ) 12/14/2006 EIP: gigaset_init_bchannel+0x89/0x320 [bas_gigaset] Code: 75 07 83 8b 84 00 00 00 40 8d 47 74 c7 07 01 00 00 00 89 45 f0 8b 44 b7 68 85 c0 0f 84 6a 02 00 00 8b 48 28 8b 93 88 00 00 00 <8b> 09 8d 54 12 03 c1 e2 0f c1 e1 08 09 ca 8b 8b 8c 00 00 00 80 ca EAX: f05ec200 EBX: ed404200 ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: f065a000 EBP: f30c9f40 ESP: f30c9f20 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010086 CR0: 80050033 CR2: 00000000 CR3: 0ddc7000 CR4: 000006d0 Call Trace: <SOFTIRQ> ? gigaset_isdn_connD+0xf6/0x140 [gigaset] gigaset_handle_event+0x173e/0x1b90 [gigaset] tasklet_action_common.isra.16+0x4e/0xf0 tasklet_action+0x1e/0x20 __do_softirq+0xb2/0x293 ? __irqentry_text_end+0x3/0x3 call_on_stack+0x45/0x50 </SOFTIRQ> ? irq_exit+0xb5/0xc0 ? do_IRQ+0x78/0xd0 ? acpi_idle_enter_s2idle+0x50/0x50 ? common_interrupt+0xd4/0xdc ? acpi_idle_enter_s2idle+0x50/0x50 ? sched_cpu_activate+0x1b/0xf0 ? acpi_fan_resume.cold.7+0x9/0x18 ? cpuidle_enter_state+0x152/0x4c0 ? cpuidle_enter+0x14/0x20 ? call_cpuidle+0x21/0x40 ? do_idle+0x1c8/0x200 ? cpu_startup_entry+0x25/0x30 ? rest_init+0x88/0x8a ? arch_call_rest_init+0xd/0x19 ? start_kernel+0x42f/0x448 ? i386_start_kernel+0xac/0xb0 ? 
startup_32_smp+0x164/0x168 Modules linked in: ppp_generic slhc capi bas_gigaset gigaset kernelcapi nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc ipw2200 iTCO_wdt gpio_ich snd_intel8x0 libipw iTCO_vendor_support snd_ac97_codec lib80211 ppdev ac97_bus snd_seq cfg80211 snd_seq_device pcspkr thinkpad_acpi lpc_ich snd_pcm i2c_i801 snd_timer ledtrig_audio snd soundcore rfkill parport_pc parport pcc_cpufreq acpi_cpufreq i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sdhci_pci sysimgblt cqhci fb_sys_fops drm sdhci mmc_core tg3 ata_generic serio_raw yenta_socket pata_acpi video CR2: 0000000000000000 ---[ end trace 1fe07487b9200c73 ]--- EIP: gigaset_init_bchannel+0x89/0x320 [bas_gigaset] Code: 75 07 83 8b 84 00 00 00 40 8d 47 74 c7 07 01 00 00 00 89 45 f0 8b 44 b7 68 85 c0 0f 84 6a 02 00 00 8b 48 28 8b 93 88 00 00 00 <8b> 09 8d 54 12 03 c1 e2 0f c1 e1 08 09 ca 8b 8b 8c 00 00 00 80 ca EAX: f05ec200 EBX: ed404200 ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: f065a000 EBP: f30c9f40 ESP: cddcb3bc DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010086 CR0: 80050033 CR2: 00000000 CR3: 0ddc7000 CR4: 000006d0 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0xcc00000 from 0xc0400000 (relocation range: 0xc0000000-0xf6ffdfff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- No-one noticed because this Oops is apparently only triggered by setting up an ISDN data connection on a live ISDN line on a gigaset base (ie, the PBX that the gigaset driver support). Very few people do that running present day kernels. 
Anyhow, a little code reorganization makes this problem go away, while avoiding the subtle ordering that was used in the past. So let's do that. Fixes: 78c696c ("isdn: gigaset: use usb_fill_int_urb()") Signed-off-by: Paul Bolle <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit f393562 upstream.

When the memset code was added to pgd_alloc(), it failed to consider that kmem_cache_alloc() can return NULL. It's uncommon, but not impossible under heavy memory contention. Example oops:

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc0000000000a4000
  Oops: Kernel access of bad area, sig: 11 [#1]
  LE SMP NR_CPUS=2048 NUMA pSeries
  CPU: 70 PID: 48471 Comm: entrypoint.sh Kdump: loaded Not tainted 4.14.0-115.6.1.el7a.ppc64le #1
  task: c000000334a00000 task.stack: c000000331c00000
  NIP: c0000000000a4000 LR: c00000000012f43c CTR: 0000000000000020
  REGS: c000000331c039c0 TRAP: 0300 Not tainted (4.14.0-115.6.1.el7a.ppc64le)
  MSR: 800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 44022840 XER: 20040000
  CFAR: c000000000008874 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
  ...
  NIP [c0000000000a4000] memset+0x68/0x104
  LR [c00000000012f43c] mm_init+0x27c/0x2f0
  Call Trace:
    mm_init+0x260/0x2f0 (unreliable)
    copy_mm+0x11c/0x638
    copy_process.isra.28.part.29+0x6fc/0x1080
    _do_fork+0xdc/0x4c0
    ppc_clone+0x8/0xc
  Instruction dump:
  409e000c b0860000 38c60002 409d000c 90860000 38c60004 78a0d183 78a506a0
  7c0903a6 41820034 60000000 60420000 <f8860000> f8860008 f8860010 f8860018

Fixes: fc5c2f4 ("powerpc/mm/hash64: Zero PGD pages on allocation")
Cc: [email protected] # v4.16+
Signed-off-by: Rick Lindsley <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
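
The pattern behind this fix is to check the allocation result before zeroing it. The sketch below shows the idea in user space under stated assumptions: `fake_pgd_alloc()`, `fake_cache_alloc`, `always_fail()`, and `FAKE_PGD_SIZE` are hypothetical, with malloc() standing in for kmem_cache_alloc().

```c
#include <stdlib.h>
#include <string.h>

#define FAKE_PGD_SIZE 4096 /* illustrative size, not the real PGD table size */

/* Hypothetical allocator hook so a failure can be simulated. */
static void *(*fake_cache_alloc)(size_t) = malloc;

/* Zero the table only when the allocation actually succeeded. */
static void *fake_pgd_alloc(void)
{
    void *pgd = fake_cache_alloc(FAKE_PGD_SIZE);

    if (!pgd)
        return NULL; /* caller handles the out-of-memory case */
    memset(pgd, 0, FAKE_PGD_SIZE);
    return pgd;
}

/* Allocator that always fails, standing in for heavy memory pressure. */
static void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}
```

Pointing `fake_cache_alloc` at `always_fail` exercises the NULL path without dereferencing anything.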

commit a3eec13 upstream. When using direct commands (DCMDs) on an RK3399, we get spurious CQE completion interrupts for the DCMD transaction slot (#31): [ 931.196520] ------------[ cut here ]------------ [ 931.201702] mmc1: cqhci: spurious TCN for tag 31 [ 931.206906] WARNING: CPU: 0 PID: 1433 at /usr/src/kernel/drivers/mmc/host/cqhci.c:725 cqhci_irq+0x2e4/0x490 [ 931.206909] Modules linked in: [ 931.206918] CPU: 0 PID: 1433 Comm: irq/29-mmc1 Not tainted 4.19.8-rt6-funkadelic #1 [ 931.206920] Hardware name: Theobroma Systems RK3399-Q7 SoM (DT) [ 931.206924] pstate: 40000005 (nZcv daif -PAN -UAO) [ 931.206927] pc : cqhci_irq+0x2e4/0x490 [ 931.206931] lr : cqhci_irq+0x2e4/0x490 [ 931.206933] sp : ffff00000e54bc80 [ 931.206934] x29: ffff00000e54bc80 x28: 0000000000000000 [ 931.206939] x27: 0000000000000001 x26: ffff000008f217e8 [ 931.206944] x25: ffff8000f02ef030 x24: ffff0000091417b0 [ 931.206948] x23: ffff0000090aa000 x22: ffff8000f008b000 [ 931.206953] x21: 0000000000000002 x20: 000000000000001f [ 931.206957] x19: ffff8000f02ef018 x18: ffffffffffffffff [ 931.206961] x17: 0000000000000000 x16: 0000000000000000 [ 931.206966] x15: ffff0000090aa6c8 x14: 0720072007200720 [ 931.206970] x13: 0720072007200720 x12: 0720072007200720 [ 931.206975] x11: 0720072007200720 x10: 0720072007200720 [ 931.206980] x9 : 0720072007200720 x8 : 0720072007200720 [ 931.206984] x7 : 0720073107330720 x6 : 00000000000005a0 [ 931.206988] x5 : ffff00000860d4b0 x4 : 0000000000000000 [ 931.206993] x3 : 0000000000000001 x2 : 0000000000000001 [ 931.206997] x1 : 1bde3a91b0d4d900 x0 : 0000000000000000 [ 931.207001] Call trace: [ 931.207005] cqhci_irq+0x2e4/0x490 [ 931.207009] sdhci_arasan_cqhci_irq+0x5c/0x90 [ 931.207013] sdhci_irq+0x98/0x930 [ 931.207019] irq_forced_thread_fn+0x2c/0xa0 [ 931.207023] irq_thread+0x114/0x1c0 [ 931.207027] kthread+0x128/0x130 [ 931.207032] ret_from_fork+0x10/0x20 [ 931.207035] ---[ end trace 0000000000000002 ]--- The driver shows this message only for the first spurious 
interrupt by using WARN_ONCE(). Changing this to WARN() shows that this is happening quite frequently (up to once a second). Since the eMMC 5.1 specification, where CQE and CQHCI are specified, does not mention that spurious TCN interrupts for DCMDs can be simply ignored, we must assume that using this feature is not working reliably. The current implementation uses DCMD for REQ_OP_FLUSH only, and I could not see any performance/power impact when disabling this optional feature for RK3399. Therefore this patch disables DCMDs for RK3399. Signed-off-by: Christoph Muellner <[email protected]> Signed-off-by: Philipp Tomsich <[email protected]> Fixes: 84362d7 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1") Cc: [email protected] [the corresponding code changes are queued for 5.2 so doing that as well] Signed-off-by: Heiko Stuebner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…ed addresses

commit fce86ff upstream.

Starting with c6f3c5e ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()") vmf_insert_pfn_pmd() internally calls pmdp_set_access_flags(). That helper enforces a pmd aligned @address argument via VM_BUG_ON() assertion.

Update the implementation to take a 'struct vm_fault' argument directly and apply the address alignment fixup internally to fix crash signatures like:

  kernel BUG at arch/x86/mm/pgtable.c:515!
  invalid opcode: 0000 [#1] SMP NOPTI
  CPU: 51 PID: 43713 Comm: java Tainted: G OE 4.19.35 #1
  [..]
  RIP: 0010:pmdp_set_access_flags+0x48/0x50
  [..]
  Call Trace:
   vmf_insert_pfn_pmd+0x198/0x350
   dax_iomap_fault+0xe82/0x1190
   ext4_dax_huge_fault+0x103/0x1f0
   ? __switch_to_asm+0x40/0x70
   __handle_mm_fault+0x3f6/0x1370
   ? __switch_to_asm+0x34/0x70
   ? __switch_to_asm+0x40/0x70
   handle_mm_fault+0xda/0x200
   __do_page_fault+0x249/0x4f0
   do_page_fault+0x32/0x110
   ? page_fault+0x8/0x30
   page_fault+0x1e/0x30

Link: http://lkml.kernel.org/r/155741946350.372037.11148198430068238140.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: c6f3c5e ("mm/huge_memory.c: fix modifying of page protection by insert_pfn_pmd()")
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Piotr Balcer <[email protected]>
Tested-by: Yan Ma <[email protected]>
Tested-by: Pankaj Gupta <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Cc: Chandan Rajendra <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
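
The alignment fixup mentioned here amounts to rounding the faulting address down to a PMD boundary. A minimal sketch, assuming x86_64 with 4 KiB pages (so one PMD entry maps 2 MiB): the `FAKE_PMD_*` constants and `pmd_align_down()` are hypothetical names chosen for illustration and are not taken from the patch.

```c
#include <stdint.h>

/* Assumed values for x86_64 with 4 KiB pages: a PMD entry maps 2 MiB. */
#define FAKE_PMD_SHIFT 21
#define FAKE_PMD_SIZE  (UINT64_C(1) << FAKE_PMD_SHIFT)
#define FAKE_PMD_MASK  (~(FAKE_PMD_SIZE - 1))

/* Round a faulting address down to the start of its PMD-sized region. */
static uint64_t pmd_align_down(uint64_t fault_address)
{
    return fault_address & FAKE_PMD_MASK;
}
```

An address that is already aligned passes through unchanged, so applying the fixup internally is harmless for callers that aligned the address themselves.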

commit 46ca3f7 upstream.

The bug manifests as an attempt to access deallocated memory:

  BUG: unable to handle kernel paging request at ffff9c8735448000
  #PF error: [PROT] [WRITE]
  PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 80000007f5448161
  Oops: 0003 [#1] PREEMPT SMP
  CPU: 6 PID: 388 Comm: loadkeys Tainted: G C 5.0.0-rc6-00153-g5ded5871030e #91
  Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013
  RIP: 0010:__memmove+0x81/0x1a0
  Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 <f3> 48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
  RSP: 0018:ffffa1b9002d7d08 EFLAGS: 00010203
  RAX: ffff9c873541af43 RBX: ffff9c873541af43 RCX: 00000c6f105cd6bf
  RDX: 0000637882e986b6 RSI: ffff9c8735447ffb RDI: ffff9c8735447ffb
  RBP: ffff9c8739cd3800 R08: ffff9c873b802f00 R09: 00000000fffff73b
  R10: ffffffffb82b35f1 R11: 00505b1b004d5b1b R12: 0000000000000000
  R13: ffff9c873541af3d R14: 000000000000000b R15: 000000000000000c
  FS: 00007f450c390580(0000) GS:ffff9c873f180000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffff9c8735448000 CR3: 00000007e213c002 CR4: 00000000000606e0
  Call Trace:
   vt_do_kdgkb_ioctl+0x34d/0x440
   vt_ioctl+0xba3/0x1190
   ? __bpf_prog_run32+0x39/0x60
   ? mem_cgroup_commit_charge+0x7b/0x4e0
   tty_ioctl+0x23f/0x920
   ? preempt_count_sub+0x98/0xe0
   ? __seccomp_filter+0x67/0x600
   do_vfs_ioctl+0xa2/0x6a0
   ? syscall_trace_enter+0x192/0x2d0
   ksys_ioctl+0x3a/0x70
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x54/0xe0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

The bug manifests on systemd systems with multiple vtcon devices:

  # cat /sys/devices/virtual/vtconsole/vtcon0/name
  (S) dummy device
  # cat /sys/devices/virtual/vtconsole/vtcon1/name
  (M) frame buffer device

There systemd runs the 'loadkeys' tool in parallel for each vtcon instance. This causes two parallel ioctl(KDSKBSENT) calls to race into adding the same entry into the 'func_table' array at:

  drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()

The function has no locking around writes to 'func_table'.

The simplest reproducer is to have an initramfs with the following init on an 8-CPU x86_64 machine:

  #!/bin/sh

  loadkeys -q windowkeys ru4 &
  loadkeys -q windowkeys ru4 &
  loadkeys -q windowkeys ru4 &
  loadkeys -q windowkeys ru4 &
  loadkeys -q windowkeys ru4 &
  loadkeys -q windowkeys ru4 &
  loadkeys -q windowkeys ru4 &
  loadkeys -q windowkeys ru4 &
  wait

The change adds a lock on the write path only. Reads are still racy.

CC: Greg Kroah-Hartman <[email protected]>
CC: Jiri Slaby <[email protected]>
Link: https://lkml.org/lkml/2019/2/17/256
Signed-off-by: Sergei Trofimovich <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
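
Locking only the write path, as this commit describes, can be sketched in user space like this. The table layout, `fake_func_lock`, and `fake_set_func()` are hypothetical, and a pthread mutex is used here purely for illustration; the kind of lock the kernel patch uses is not stated in the message above.

```c
#include <pthread.h>
#include <string.h>

#define FAKE_NR_FUNC 8
#define FAKE_FUNC_LEN 32

/* Hypothetical stand-in for the kernel's func_table; not the real layout. */
static char fake_func_table[FAKE_NR_FUNC][FAKE_FUNC_LEN];
static pthread_mutex_t fake_func_lock = PTHREAD_MUTEX_INITIALIZER;

/* Serialize writers so two parallel updates cannot interleave. */
static int fake_set_func(unsigned int idx, const char *str)
{
    /* Validate outside the lock: neither check touches shared state. */
    if (idx >= FAKE_NR_FUNC || strlen(str) >= FAKE_FUNC_LEN)
        return -1;

    pthread_mutex_lock(&fake_func_lock);
    strcpy(fake_func_table[idx], str);
    pthread_mutex_unlock(&fake_func_lock);
    return 0;
}
```

As in the commit, readers of the table are not covered by this lock, so a concurrent read can still observe a partially written entry.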

commit 448de47 upstream. [BUG] When reading a file from a fuzzed image, kernel can panic like: BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1 assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.h:3500! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs] Call Trace: btrfs_lookup_csum+0x52/0x150 [btrfs] __btrfs_lookup_bio_sums+0x209/0x640 [btrfs] btrfs_submit_bio_hook+0x103/0x170 [btrfs] submit_one_bio+0x59/0x80 [btrfs] extent_read_full_page+0x58/0x80 [btrfs] generic_file_read_iter+0x2f6/0x9d0 __vfs_read+0x14d/0x1a0 vfs_read+0x8d/0x140 ksys_read+0x52/0xc0 do_syscall_64+0x60/0x210 entry_SYSCALL_64_after_hwframe+0x49/0xbe [CAUSE] The fuzzed image has a corrupted leaf whose first key doesn't match its parent: checksum tree key (CSUM_TREE ROOT_ITEM 0) node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6 chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c ... 
key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19 leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE leaf 29761536 flags 0x1(WRITTEN) backref revision 1 fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6 chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244 range start 8798638964736 end 8798641262592 length 2297856

When reading the above tree block, we have extent_buffer->refs = 2 in the context:

- initial one from __alloc_extent_buffer()
  alloc_extent_buffer()
  |- __alloc_extent_buffer()
     |- atomic_set(&eb->refs, 1)

- one being added to fs_info->buffer_radix
  alloc_extent_buffer()
  |- check_buffer_tree_ref()
     |- atomic_inc(&eb->refs)

So even if we call free_extent_buffer() in read_tree_block or another similar situation, we only decrease the refs by 1; it doesn't reach 0 and won't be freed right now.

The stale eb and its corrupted content will still be kept cached.

Furthermore, we have several extra cases where we either don't do the first key check or the check is not proper for all callers:

- scrub
  We just don't have the first key in this context.

- shared tree block
  One tree block can be shared by several snapshot/subvolume trees. In that case, the first key check for one subvolume doesn't apply to another.

So for the above reasons, a corrupted extent buffer can sneak into the buffer cache.

[FIX]
Call verify_level_key in read_block_for_search to do another verification. For that purpose the function is exported.

Due to the above reasons, although we can free a corrupted extent buffer from the cache, we still need the check in read_block_for_search(), for scrub and shared tree blocks.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767 Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769 Reported-by: Yoon Jungyeon <[email protected]> CC: [email protected] # 4.19+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 03628cd upstream. During fiemap, for regular extents (non inline) we need to check if they are shared and if they are, set the shared bit. Checking if an extent is shared requires checking the delayed references of the currently running transaction, since some reference might have not yet hit the extent tree and be only in the in-memory delayed references. However we were using a transaction join for this, which creates a new transaction when there is no transaction currently running. That means that two more potential failures can happen: creating the transaction and committing it. Further, if no write activity is currently happening in the system, and fiemap calls keep being done, we end up creating and committing transactions that do nothing. In some extreme cases this can result in the commit of the transaction created by fiemap to fail with ENOSPC when updating the root item of a subvolume tree because a join does not reserve any space, leading to a trace like the following: heisenberg kernel: ------------[ cut here ]------------ heisenberg kernel: BTRFS: Transaction aborted (error -28) heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018 heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs] (...) 
heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286 heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006 heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0 heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007 heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078 heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068 heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000 heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0 heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 heisenberg kernel: Call Trace: heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs] heisenberg kernel: ? _cond_resched+0x15/0x30 heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs] heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs] heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs] heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs] heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs] heisenberg kernel: kthread+0x112/0x130 heisenberg kernel: ? kthread_bind+0x30/0x30 heisenberg kernel: ret_from_fork+0x35/0x40 heisenberg kernel: ---[ end trace 05de912e30e012d9 ]--- Since fiemap (and btrfs_check_shared()) is a read-only operation, do not do a transaction join to avoid the overhead of creating a new transaction (if there is currently no running transaction) and introducing a potential point of failure when the new transaction gets committed, instead use a transaction attach to grab a handle for the currently running transaction if any. 
Reported-by: Christoph Anton Mitterer <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Fixes: afce772 ("btrfs: fix check_shared for fiemap ioctl") CC: [email protected] # 4.14+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
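
The difference between joining and attaching, as described in this commit, can be modeled with a small user-space sketch. Everything here is hypothetical (`struct fake_fs`, `fake_join()`, `fake_attach()`, `fake_check_shared()`), and the use of `-ENOENT` to signal "no running transaction" is an assumption made for the illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the btrfs data structures. */
struct fake_trans { int id; };
struct fake_fs {
    bool has_running;          /* is a transaction currently running? */
    struct fake_trans running;
    int created;               /* how many transactions were created */
};

/* "join": reuse the running transaction or create a new one. */
static struct fake_trans *fake_join(struct fake_fs *fs)
{
    if (!fs->has_running) {
        fs->has_running = true;
        fs->running.id = ++fs->created;
    }
    return &fs->running;
}

/* "attach": only hand out the running transaction; never create one. */
static int fake_attach(struct fake_fs *fs, struct fake_trans **out)
{
    if (!fs->has_running) {
        *out = NULL;
        return -ENOENT; /* assumed error code for "nothing running" */
    }
    *out = &fs->running;
    return 0;
}

/* Read-only check: proceed without a handle when nothing is running. */
static int fake_check_shared(struct fake_fs *fs)
{
    struct fake_trans *trans;
    int rc = fake_attach(fs, &trans);

    if (rc && rc != -ENOENT)
        return rc;
    /* trans may be NULL here: there are no delayed references to inspect. */
    (void)trans;
    return 0;
}
```

In this model a read-only caller never increments `created`, which mirrors the goal of not creating (and later committing) transactions that do nothing.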

commit bfc61c3 upstream. When finding out which inodes have references on a particular extent, done by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both v1 and v2) ioctl and from scrub we use the transaction join API to grab a reference on the currently running transaction, since in order to give accurate results we need to inspect the delayed references of the currently running transaction. However, if there is currently no running transaction, the join operation will create a new transaction. This is inefficient as the transaction will eventually be committed, doing unnecessary IO and introducing a potential point of failure that will lead to a transaction abort due to -ENOSPC, as recently reported [1]. That's because the join creates the transaction but does not reserve any space, so when attempting to update the root item of the root passed to btrfs_join_transaction(), during the transaction commit, we can end up failing with -ENOSPC. Users of a join operation are supposed to actually do some filesystem changes and reserve space by some means, which is not the case of iterate_extent_inodes(); it is a read-only operation for all contexts from which it is called. The reported [1] -ENOSPC failure stack trace is the following: heisenberg kernel: ------------[ cut here ]------------ heisenberg kernel: BTRFS: Transaction aborted (error -28) heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs] (...) heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018 heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs] (...)
heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286 heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006 heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0 heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007 heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078 heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068 heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000 heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0 heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 heisenberg kernel: Call Trace: heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs] heisenberg kernel: ? _cond_resched+0x15/0x30 heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs] heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs] heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs] heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs] heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs] heisenberg kernel: kthread+0x112/0x130 heisenberg kernel: ? kthread_bind+0x30/0x30 heisenberg kernel: ret_from_fork+0x35/0x40 heisenberg kernel: ---[ end trace 05de912e30e012d9 ]--- So fix that by using the attach API, which does not create a transaction when there is currently no running transaction. [1] https://lore.kernel.org/linux-btrfs/[email protected]/ Reported-by: Zygo Blaxell <[email protected]> CC: [email protected] # 4.4+ Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
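
The difference between join and attach described in the two btrfs commits above can be pictured with a small userspace toy model. This is only a sketch: `struct toy_fs`, `toy_join()` and `toy_attach()` are hypothetical names invented for illustration and are not the btrfs transaction API. The model captures just the one property the commit messages rely on, namely that a join creates a transaction when none is running while an attach does not.

```c
#include <stdbool.h>

/* Toy model only: hypothetical types and helpers, not the btrfs API. */
struct toy_fs {
    bool running;   /* is a transaction currently open? */
    int created;    /* how many transactions were ever created */
};

/* "join": always yields a transaction, creating one if none is running.
 * A created transaction must later be committed, which is where the
 * read-only callers in the commit messages picked up extra work. */
static bool toy_join(struct toy_fs *fs)
{
    if (!fs->running) {
        fs->running = true;
        fs->created++;
    }
    return true;
}

/* "attach": succeeds only if a transaction is already running and
 * never creates one, so a read-only caller has no side effects. */
static bool toy_attach(const struct toy_fs *fs)
{
    return fs->running;
}
```

With no transaction running, `toy_attach()` reports failure and leaves `created` untouched, whereas `toy_join()` bumps it to 1. A read-only caller can simply treat a failed attach as "no delayed references to inspect".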

commit a4b732a upstream. There is a race between cache device register and cache set unregister. For an already registered cache device, register_bcache will call bch_is_open to iterate through all cachesets and check every cache there. The race occurs if cache_set_free executes at the same time and clears the caches right before ca is dereferenced in bch_is_open_cache. To close the race, let's make sure the clean up work is protected by the bch_register_lock as well. This issue can be reproduced as follows, while true; do echo /dev/XXX> /sys/fs/bcache/register ; done& while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done & and results in the following oops, [ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998 [ +0.000457] #PF error: [normal kernel read fault] [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0 [ +0.000388] Oops: 0000 [#1] SMP PTI [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6 [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache] [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202 [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000 [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001 [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0 [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620 [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000 [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000801] CR2: 0000000000000998 
CR3: 000000003cad6000 CR4: 00000000001406e0 [ +0.000837] Call Trace: [ +0.000682] ? _cond_resched+0x10/0x20 [ +0.000691] ? __kmalloc+0x131/0x1b0 [ +0.000710] kernfs_fop_write+0xfa/0x170 [ +0.000733] __vfs_write+0x2e/0x190 [ +0.000688] ? inode_security+0x10/0x30 [ +0.000698] ? selinux_file_permission+0xd2/0x120 [ +0.000752] ? security_file_permission+0x2b/0x100 [ +0.000753] vfs_write+0xa8/0x1a0 [ +0.000676] ksys_write+0x4d/0xb0 [ +0.000699] do_syscall_64+0x3a/0xf0 [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Liang Chen <[email protected]> Cc: [email protected] Signed-off-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 3ebe1bc ] BUG: unable to handle kernel paging request at ffffffffa018f000 PGD 3270067 P4D 3270067 PUD 3271063 PMD 2307eb067 PTE 0 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 4138 Comm: modprobe Not tainted 5.1.0-rc7+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:ppp_register_compressor+0x3e/0xd0 [ppp_generic] Code: 98 4a 3f e2 48 8b 15 c1 67 00 00 41 8b 0c 24 48 81 fa 40 f0 19 a0 75 0e eb 35 48 8b 12 48 81 fa 40 f0 19 a0 74 RSP: 0018:ffffc90000d93c68 EFLAGS: 00010287 RAX: ffffffffa018f000 RBX: ffffffffa01a3000 RCX: 000000000000001a RDX: ffff888230c750a0 RSI: 0000000000000000 RDI: ffffffffa019f000 RBP: ffffc90000d93c80 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0194080 R13: ffff88822ee1a700 R14: 0000000000000000 R15: ffffc90000d93e78 FS: 00007f2339557540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa018f000 CR3: 000000022bde4000 CR4: 00000000000006f0 Call Trace: ? 0xffffffffa01a3000 deflate_init+0x11/0x1000 [ppp_deflate] ? 0xffffffffa01a3000 do_one_initcall+0x6c/0x3cc ? kmem_cache_alloc_trace+0x248/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe If ppp_deflate fails to register in deflate_init, module initialization fails; however, ppp_deflate_draft may have been registered and not unregistered before returning. A second modprobe will then trigger a crash like the one above. Reported-by: Hulk Robot <[email protected]> Signed-off-by: YueHaibing <[email protected]> Acked-by: Guillaume Nault <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
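
The failure mode in the ppp_deflate commit above (a module init that fails while leaving something registered) is an instance of a general rule: an init function that returns an error must leave nothing behind. The following is a minimal userspace sketch of that unwind pattern, not the actual patch; `toy_register()`, `toy_unregister()`, `toy_module_init()` and the `toy_registered` bitmask are all hypothetical stand-ins for the real registration calls.

```c
#include <stdbool.h>

/* Hypothetical registry: bit i set means entry i is registered. */
static unsigned int toy_registered;

/* Stand-in for a registration call that may fail. */
static int toy_register(int id, bool simulate_failure)
{
    if (simulate_failure)
        return -1;
    toy_registered |= 1u << id;
    return 0;
}

static void toy_unregister(int id)
{
    toy_registered &= ~(1u << id);
}

/* Init with two registrations: stop at the first failure and undo
 * whatever already succeeded, so a failed init leaves no stale entry
 * pointing into a module that is about to be unloaded. */
static int toy_module_init(bool fail_first, bool fail_second)
{
    int err = toy_register(0, fail_first);
    if (err)
        return err;             /* nothing registered yet, nothing to undo */

    err = toy_register(1, fail_second);
    if (err) {
        toy_unregister(0);      /* roll back the first registration */
        return err;
    }
    return 0;
}
```

Whichever registration fails, `toy_registered` is 0 after a failed `toy_module_init()`, so a second attempt starts from a clean state.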

[ Upstream commit ba95e5d ] Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are accessed (while handling interrupts) before they are initialized. [ 4.201410] BUG: unable to handle kernel paging request at ffffffffffffffe8 [ 4.207829] IP: vsock_addr_equals_addr+0x3/0x20 [ 4.211379] PGD 28210067 P4D 28210067 PUD 28212067 PMD 0 [ 4.211379] Oops: 0000 [#1] PREEMPT SMP PTI [ 4.211379] Modules linked in: [ 4.211379] CPU: 1 PID: 30 Comm: kworker/1:1 Not tainted 4.14.106-419297-gd7e28cc1f241 #1 [ 4.211379] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 4.211379] Workqueue: virtio_vsock virtio_transport_rx_work [ 4.211379] task: ffffa3273d175280 task.stack: ffffaea1800e8000 [ 4.211379] RIP: 0010:vsock_addr_equals_addr+0x3/0x20 [ 4.211379] RSP: 0000:ffffaea1800ebd28 EFLAGS: 00010286 [ 4.211379] RAX: 0000000000000002 RBX: 0000000000000000 RCX: ffffffffb94e42f0 [ 4.211379] RDX: 0000000000000400 RSI: ffffffffffffffe0 RDI: ffffaea1800ebdd0 [ 4.211379] RBP: ffffaea1800ebd58 R08: 0000000000000001 R09: 0000000000000001 [ 4.211379] R10: 0000000000000000 R11: ffffffffb89d5d60 R12: ffffaea1800ebdd0 [ 4.211379] R13: 00000000828cbfbf R14: 0000000000000000 R15: ffffaea1800ebdc0 [ 4.211379] FS: 0000000000000000(0000) GS:ffffa3273fd00000(0000) knlGS:0000000000000000 [ 4.211379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.211379] CR2: ffffffffffffffe8 CR3: 000000002820e001 CR4: 00000000001606e0 [ 4.211379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4.211379] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4.211379] Call Trace: [ 4.211379] ? vsock_find_connected_socket+0x6c/0xe0 [ 4.211379] virtio_transport_recv_pkt+0x15f/0x740 [ 4.211379] ? detach_buf+0x1b5/0x210 [ 4.211379] virtio_transport_rx_work+0xb7/0x140 [ 4.211379] process_one_work+0x1ef/0x480 [ 4.211379] worker_thread+0x312/0x460 [ 4.211379] kthread+0x132/0x140 [ 4.211379] ? process_one_work+0x480/0x480 [ 4.211379] ? 
kthread_destroy_worker+0xd0/0xd0 [ 4.211379] ret_from_fork+0x35/0x40 [ 4.211379] Code: c7 47 08 00 00 00 00 66 c7 07 28 00 c7 47 08 ff ff ff ff c7 47 04 ff ff ff ff c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 8b 47 08 <3b> 46 08 75 0a 8b 47 04 3b 46 04 0f 94 c0 c3 31 c0 c3 90 66 2e [ 4.211379] RIP: vsock_addr_equals_addr+0x3/0x20 RSP: ffffaea1800ebd28 [ 4.211379] CR2: ffffffffffffffe8 [ 4.211379] ---[ end trace f31cc4a2e6df3689 ]--- [ 4.211379] Kernel panic - not syncing: Fatal exception in interrupt [ 4.211379] Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 4.211379] Rebooting in 5 seconds.. Fixes: 22b5c0b ("vsock/virtio: fix kernel panic after device hot-unplug") Cc: Stefan Hajnoczi <[email protected]> Cc: Stefano Garzarella <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] [4.9+] Signed-off-by: Jorge E. Moreira <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Acked-by: Stefan Hajnoczi <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit de1887c ] We don't check for the validity of the lengths in the packet received from the firmware. If the MPDU length received in the rx descriptor is too short to contain the header length and the crypt length together, we may end up trying to copy a negative number of bytes (headlen - hdrlen < 0) which will underflow and cause us to try to copy a huge amount of data. This causes oopses such as this one: BUG: unable to handle kernel paging request at ffff896be2970000 PGD 5e201067 P4D 5e201067 PUD 5e205067 PMD 16110d063 PTE 8000000162970161 Oops: 0003 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 1824 Comm: irq/134-iwlwifi Not tainted 4.19.33-04308-geea41cf4930f #1 Hardware name: [...] RIP: 0010:memcpy_erms+0x6/0x10 Code: 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe RSP: 0018:ffffa4630196fc60 EFLAGS: 00010287 RAX: ffff896be2924618 RBX: ffff896bc8ecc600 RCX: 00000000fffb4610 RDX: 00000000fffffff8 RSI: ffff896a835e2a38 RDI: ffff896be2970000 RBP: ffffa4630196fd30 R08: ffff896bc8ecc600 R09: ffff896a83597000 R10: ffff896bd6998400 R11: 000000000200407f R12: ffff896a83597050 R13: 00000000fffffff8 R14: 0000000000000010 R15: ffff896a83597038 FS: 0000000000000000(0000) GS:ffff896be8280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff896be2970000 CR3: 000000005dc12002 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: iwl_mvm_rx_mpdu_mq+0xb51/0x121b [iwlmvm] iwl_pcie_rx_handle+0x58c/0xa89 [iwlwifi] iwl_pcie_irq_rx_msix_handler+0xd9/0x12a [iwlwifi] irq_thread_fn+0x24/0x49 irq_thread+0xb0/0x122 kthread+0x138/0x140 ret_from_fork+0x1f/0x40 Fix that by checking the lengths for correctness and trigger a warning to show that we have received wrong data. 
Signed-off-by: Luca Coelho <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
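
The underflow described in the iwlwifi commit above is easy to reproduce in plain C. The sketch below is not the driver code: `toy_naive_len()` and `toy_checked_len()` are hypothetical helpers, and the parameter names merely echo the terms used in the commit message. It shows how an unsigned subtraction wraps to a huge value and how validating the lengths first avoids that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked subtraction: if hdr_len > head_len the result wraps around,
 * e.g. 8 - 16 becomes 0xfffffff8, which a later copy would treat as a
 * length of roughly 4 GiB. */
static uint32_t toy_naive_len(uint32_t head_len, uint32_t hdr_len)
{
    return head_len - hdr_len;
}

/* Checked variant: reject descriptors whose MPDU length cannot contain
 * the header and the crypto header together. */
static bool toy_checked_len(uint32_t mpdu_len, uint32_t hdr_len,
                            uint32_t crypt_len, size_t *payload_len)
{
    if (hdr_len > mpdu_len || crypt_len > mpdu_len - hdr_len)
        return false;           /* inconsistent lengths, drop the frame */

    *payload_len = (size_t)(mpdu_len - hdr_len - crypt_len);
    return true;
}
```

The wrapped value `0xfffffff8` produced by `toy_naive_len(8, 16)` looks consistent with the `RDX: 00000000fffffff8` register in the trace, although that reading is an interpretation and not something the commit message states.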

commit 0c713cb upstream. When we do a full fsync (the bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode) that happens to be ranged, which happens during a msync() or writes for files opened with O_SYNC for example, we can end up with a corrupt log, due to different file extent items representing ranges that overlap with each other, or hit some assertion failures. When doing a ranged fsync we only flush delalloc and wait for ordered extents within that range. If, while we are logging items from our inode, ordered extents for adjacent ranges complete, we end up in a race that can make us insert file extent items that overlap with others we logged previously, or hit the assertion failures. For example, if tree-log.c:copy_items() receives a leaf that has the following file extents items, all with a length of 4K and therefore there is an implicit hole in the range 68K to 72K - 1: (257 EXTENT_ITEM 64K), (257 EXTENT_ITEM 72K), (257 EXTENT_ITEM 76K), ... It copies them to the log tree. However due to the need to detect implicit holes, it may release the path, in order to look at the previous leaf to detect an implicit hole, and then later it will search again in the tree for the first file extent item key, with the goal of locking again the leaf (which might have changed due to concurrent changes to other inodes). However when it locks again the leaf containing the first key, the key corresponding to the extent at offset 72K may not be there anymore since there is an ordered extent for that range that is finishing (that is, somewhere in the middle of btrfs_finish_ordered_io()), and it just removed the file extent item but has not yet replaced it with a new file extent item, so the part of copy_items() that does hole detection will decide that there is a hole in the range starting from 68K to 76K - 1, and therefore insert a file extent item to represent that hole, having a key offset of 68K.
After that we now have a log tree with 2 different extent items that have overlapping ranges: 1) The file extent item copied before copy_items() released the path, which has a key offset of 72K and a length of 4K, representing the file range 72K to 76K - 1. 2) And a file extent item representing a hole that has a key offset of 68K and a length of 8K, representing the range 68K to 76K - 1. This item was inserted after releasing the path, and overlaps with the extent item inserted before. The overlapping extent items can cause all sorts of unpredictable and incorrect behaviour, either when replayed or if a fast (non full) fsync happens later, which can trigger a BUG_ON() when calling btrfs_set_item_key_safe() through __btrfs_drop_extents(), producing a trace like the following: [61666.783269] ------------[ cut here ]------------ [61666.783943] kernel BUG at fs/btrfs/ctree.c:3182! [61666.784644] invalid opcode: 0000 [#1] PREEMPT SMP (...) [61666.786253] task: ffff880117b88c40 task.stack: ffffc90008168000 [61666.786253] RIP: 0010:btrfs_set_item_key_safe+0x7c/0xd2 [btrfs] [61666.786253] RSP: 0018:ffffc9000816b958 EFLAGS: 00010246 [61666.786253] RAX: 0000000000000000 RBX: 000000000000000f RCX: 0000000000030000 [61666.786253] RDX: 0000000000000000 RSI: ffffc9000816ba4f RDI: ffffc9000816b937 [61666.786253] RBP: ffffc9000816b998 R08: ffff88011dae2428 R09: 0000000000001000 [61666.786253] R10: 0000160000000000 R11: 6db6db6db6db6db7 R12: ffff88011dae2418 [61666.786253] R13: ffffc9000816ba4f R14: ffff8801e10c4118 R15: ffff8801e715c000 [61666.786253] FS: 00007f6060a18700(0000) GS:ffff88023f5c0000(0000) knlGS:0000000000000000 [61666.786253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [61666.786253] CR2: 00007f6060a28000 CR3: 0000000213e69000 CR4: 00000000000006e0 [61666.786253] Call Trace: [61666.786253] __btrfs_drop_extents+0x5e3/0xaad [btrfs] [61666.786253] ? time_hardirqs_on+0x9/0x14 [61666.786253] btrfs_log_changed_extents+0x294/0x4e0 [btrfs] [61666.786253] ? 
release_extent_buffer+0x38/0xb4 [btrfs] [61666.786253] btrfs_log_inode+0xb6e/0xcdc [btrfs] [61666.786253] ? lock_acquire+0x131/0x1c5 [61666.786253] ? btrfs_log_inode_parent+0xee/0x659 [btrfs] [61666.786253] ? arch_local_irq_save+0x9/0xc [61666.786253] ? btrfs_log_inode_parent+0x1f5/0x659 [btrfs] [61666.786253] btrfs_log_inode_parent+0x223/0x659 [btrfs] [61666.786253] ? arch_local_irq_save+0x9/0xc [61666.786253] ? lockref_get_not_zero+0x2c/0x34 [61666.786253] ? rcu_read_unlock+0x3e/0x5d [61666.786253] btrfs_log_dentry_safe+0x60/0x7b [btrfs] [61666.786253] btrfs_sync_file+0x317/0x42c [btrfs] [61666.786253] vfs_fsync_range+0x8c/0x9e [61666.786253] SyS_msync+0x13c/0x1c9 [61666.786253] entry_SYSCALL_64_fastpath+0x18/0xad A sample of a corrupt log tree leaf with overlapping extents I got from running btrfs/072: item 14 key (295 108 200704) itemoff 2599 itemsize 53 extent data disk bytenr 0 nr 0 extent data offset 0 nr 458752 ram 458752 item 15 key (295 108 659456) itemoff 2546 itemsize 53 extent data disk bytenr 4343541760 nr 770048 extent data offset 606208 nr 163840 ram 770048 item 16 key (295 108 663552) itemoff 2493 itemsize 53 extent data disk bytenr 4343541760 nr 770048 extent data offset 610304 nr 155648 ram 770048 item 17 key (295 108 819200) itemoff 2440 itemsize 53 extent data disk bytenr 4334788608 nr 4096 extent data offset 0 nr 4096 ram 4096 The file extent item at offset 659456 (item 15) ends at offset 823296 (659456 + 163840) while the next file extent item (item 16) starts at offset 663552. 
Another different problem that the race can trigger is a failure in the assertions at tree-log.c:copy_items(), which expect that the first file extent item key we found before releasing the path exists after we have released path and that the last key we found before releasing the path also exists after releasing the path: $ cat -n fs/btrfs/tree-log.c 4080 if (need_find_last_extent) { 4081 /* btrfs_prev_leaf could return 1 without releasing the path */ 4082 btrfs_release_path(src_path); 4083 ret = btrfs_search_slot(NULL, inode->root, &first_key, 4084 src_path, 0, 0); 4085 if (ret < 0) 4086 return ret; 4087 ASSERT(ret == 0); (...) 4103 if (i >= btrfs_header_nritems(src_path->nodes[0])) { 4104 ret = btrfs_next_leaf(inode->root, src_path); 4105 if (ret < 0) 4106 return ret; 4107 ASSERT(ret == 0); 4108 src = src_path->nodes[0]; 4109 i = 0; 4110 need_find_last_extent = true; 4111 } (...) The second assertion implicitly expects that the last key before the path release still exists, because the surrounding while loop only stops after we have found that key. When this assertion fails it produces a stack like this: [139590.037075] assertion failed: ret == 0, file: fs/btrfs/tree-log.c, line: 4107 [139590.037406] ------------[ cut here ]------------ [139590.037707] kernel BUG at fs/btrfs/ctree.h:3546! [139590.038034] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [139590.038340] CPU: 1 PID: 31841 Comm: fsstress Tainted: G W 5.0.0-btrfs-next-46 #1 (...) [139590.039354] RIP: 0010:assfail.constprop.24+0x18/0x1a [btrfs] (...) 
[139590.040397] RSP: 0018:ffffa27f48f2b9b0 EFLAGS: 00010282 [139590.040730] RAX: 0000000000000041 RBX: ffff897c635d92c8 RCX: 0000000000000000 [139590.041105] RDX: 0000000000000000 RSI: ffff897d36a96868 RDI: ffff897d36a96868 [139590.041470] RBP: ffff897d1b9a0708 R08: 0000000000000000 R09: 0000000000000000 [139590.041815] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000013 [139590.042159] R13: 0000000000000227 R14: ffff897cffcbba88 R15: 0000000000000001 [139590.042501] FS: 00007f2efc8dee80(0000) GS:ffff897d36a80000(0000) knlGS:0000000000000000 [139590.042847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [139590.043199] CR2: 00007f8c064935e0 CR3: 0000000232252002 CR4: 00000000003606e0 [139590.043547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [139590.043899] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [139590.044250] Call Trace: [139590.044631] copy_items+0xa3f/0x1000 [btrfs] [139590.045009] ? generic_bin_search.constprop.32+0x61/0x200 [btrfs] [139590.045396] btrfs_log_inode+0x7b3/0xd70 [btrfs] [139590.045773] btrfs_log_inode_parent+0x2b3/0xce0 [btrfs] [139590.046143] ? do_raw_spin_unlock+0x49/0xc0 [139590.046510] btrfs_log_dentry_safe+0x4a/0x70 [btrfs] [139590.046872] btrfs_sync_file+0x3b6/0x440 [btrfs] [139590.047243] btrfs_file_write_iter+0x45b/0x5c0 [btrfs] [139590.047592] __vfs_write+0x129/0x1c0 [139590.047932] vfs_write+0xc2/0x1b0 [139590.048270] ksys_write+0x55/0xc0 [139590.048608] do_syscall_64+0x60/0x1b0 [139590.048946] entry_SYSCALL_64_after_hwframe+0x49/0xbe [139590.049287] RIP: 0033:0x7f2efc4be190 (...) 
[139590.050342] RSP: 002b:00007ffe743243a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [139590.050701] RAX: ffffffffffffffda RBX: 0000000000008d58 RCX: 00007f2efc4be190 [139590.051067] RDX: 0000000000008d58 RSI: 00005567eca0f370 RDI: 0000000000000003 [139590.051459] RBP: 0000000000000024 R08: 0000000000000003 R09: 0000000000008d60 [139590.051863] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000003 [139590.052252] R13: 00000000003d3507 R14: 00005567eca0f370 R15: 0000000000000000 (...) [139590.055128] ---[ end trace 193f35d0215cdeeb ]--- So fix this race between a full ranged fsync and writeback of adjacent ranges by flushing all delalloc and waiting for all ordered extents to complete before logging the inode. This is the simplest way to solve the problem because currently the full fsync path does not deal with ranges at all (it assumes a full range from 0 to LLONG_MAX) and it always needs to look at adjacent ranges for hole detection. For use cases of ranged fsyncs this can make a few fsyncs slower but on the other hand it can make some following fsyncs to other ranges do less work or no need to do anything at all. A full fsync is rare anyway and happens only once after loading/creating an inode and once after less common operations such as a shrinking truncate. This is an issue that exists for a long time, and was often triggered by generic/127, because it does mmap'ed writes and msync (which triggers a ranged fsync). Adding support for the tree checker to detect overlapping extents (next patch in the series) and trigger a WARN() when such cases are found, and then calling btrfs_check_leaf_full() at the end of btrfs_insert_file_extent() made the issue much easier to detect. Running btrfs/072 with that change to the tree checker and making fsstress open files always with O_SYNC made it much easier to trigger the issue (as triggering it with generic/127 is very rare). 
CC: [email protected] # 3.16+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
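
The overlap in the corrupt leaf sample quoted in the commit above (items 14 to 17) can be checked mechanically. The sketch below is only an illustration of that arithmetic, not the tree checker code mentioned in the commit message: `struct toy_extent`, `toy_sample_leaf` and `toy_first_overlap()` are hypothetical names, and the offsets and lengths are the key offsets and `nr` values copied from the sample.

```c
#include <stddef.h>
#include <stdint.h>

/* One file extent item: key offset and the number of bytes it covers. */
struct toy_extent {
    uint64_t offset;
    uint64_t len;
};

/* Items 14..17 from the sample leaf in the commit message. */
static const struct toy_extent toy_sample_leaf[] = {
    { 200704, 458752 },   /* item 14: ends at 659456 */
    { 659456, 163840 },   /* item 15: ends at 823296 */
    { 663552, 155648 },   /* item 16: ends at 819200 */
    { 819200,   4096 },   /* item 17 */
};

/* Return the index of the first item whose range runs past the start
 * of its successor, or -1 if consecutive items never overlap. Assumes
 * the items are sorted by offset, as keys in a leaf are. */
static int toy_first_overlap(const struct toy_extent *e, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (e[i].offset + e[i].len > e[i + 1].offset)
            return (int)i;
    }
    return -1;
}
```

For the sample, the function flags index 1 (item 15): it ends at 659456 + 163840 = 823296, past the start of item 16 at 663552, which is the overlap the commit message points out.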

commit cf84807 upstream. To fix following divide-by-zero error found by Syzkaller: divide error: 0000 [#1] SMP PTI CPU: 7 PID: 8447 Comm: test Kdump: loaded Not tainted 4.19.24-8.al7.x86_64 #1 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 RIP: 0010:fb_var_to_videomode+0xae/0xc0 Code: 04 44 03 46 78 03 4e 7c 44 03 46 68 03 4e 70 89 ce d1 ee 69 c0 e8 03 00 00 f6 c2 01 0f 45 ce 83 e2 02 8d 34 09 0f 45 ce 31 d2 <41> f7 f0 31 d2 f7 f1 89 47 08 f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 RSP: 0018:ffffb7e189347bf0 EFLAGS: 00010246 RAX: 00000000e1692410 RBX: ffffb7e189347d60 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb7e189347c10 RBP: ffff99972a091c00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000100 R13: 0000000000010000 R14: 00007ffd66baf6d0 R15: 0000000000000000 FS: 00007f2054d11740(0000) GS:ffff99972fbc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f205481fd20 CR3: 00000004288a0001 CR4: 00000000001606a0 Call Trace: fb_set_var+0x257/0x390 ? lookup_fast+0xbb/0x2b0 ? fb_open+0xc0/0x140 ? chrdev_open+0xa6/0x1a0 do_fb_ioctl+0x445/0x5a0 do_vfs_ioctl+0x92/0x5f0 ? 
__alloc_fd+0x3d/0x160 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f20548258d7 Code: 44 00 00 48 8b 05 b9 15 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 89 15 2d 00 f7 d8 64 89 01 48 It can be triggered easily with following test code: #include <linux/fb.h> #include <fcntl.h> #include <sys/ioctl.h> int main(void) { struct fb_var_screeninfo var = {.activate = 0x100, .pixclock = 60}; int fd = open("/dev/fb0", O_RDWR); if (fd < 0) return 1; if (ioctl(fd, FBIOPUT_VSCREENINFO, &var)) return 1; return 0; } Signed-off-by: Shile Zhang <[email protected]> Cc: Fredrik Noring <[email protected]> Cc: Daniel Vetter <[email protected]> Reviewed-by: Mukesh Ojha <[email protected]> Signed-off-by: Bartlomiej Zolnierkiewicz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
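
The divide error in the fb_var_to_videomode commit above comes from dividing by totals that are zero when userspace passes a mostly empty `fb_var_screeninfo`. A guard of the following shape avoids it. This is a hedged sketch, not the upstream patch: `toy_refresh_hz()` is a hypothetical helper, and the parameter names `htotal` and `vtotal` are assumed here to mean the summed horizontal and vertical timings.

```c
#include <stdint.h>

/* Derive a refresh rate from the pixel clock and the total horizontal
 * and vertical timings. If either total is zero the timing is not
 * meaningful, so report 0 instead of dividing. */
static uint32_t toy_refresh_hz(uint32_t pixclock_hz, uint32_t htotal,
                               uint32_t vtotal)
{
    if (htotal == 0 || vtotal == 0)
        return 0;               /* prevent division by zero */

    return pixclock_hz / htotal / vtotal;
}
```

With the classic 640x480 timing (25.175 MHz pixel clock, 800 by 525 total) the helper yields 59 after integer truncation, and with either total set to zero it returns 0 without faulting.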

commit b2c01aa upstream. Syzkaller report this: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 0 PID: 4492 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 RIP: 0010:sysfs_remove_file_ns+0x27/0x70 fs/sysfs/file.c:468 Code: 00 00 00 41 54 55 48 89 fd 53 49 89 d4 48 89 f3 e8 ee 76 9c ff 48 8d 7d 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 75 2d 48 89 da 48 b8 00 00 00 00 00 fc ff df 48 8b 6d RSP: 0018:ffff8881e9d9fc00 EFLAGS: 00010206 RAX: dffffc0000000000 RBX: ffffffff900367e0 RCX: ffffffff81a95952 RDX: 0000000000000006 RSI: ffffc90001405000 RDI: 0000000000000030 RBP: 0000000000000000 R08: fffffbfff1fa22ed R09: fffffbfff1fa22ed R10: 0000000000000001 R11: fffffbfff1fa22ec R12: 0000000000000000 R13: ffffffffc1abdac0 R14: 1ffff1103d3b3f8b R15: 0000000000000000 FS: 00007fe409dc1700(0000) GS:ffff8881f1200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2d721000 CR3: 00000001e98b6005 CR4: 00000000007606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: sysfs_remove_file include/linux/sysfs.h:519 [inline] driver_remove_file+0x40/0x50 drivers/base/driver.c:122 pcmcia_remove_newid_file drivers/pcmcia/ds.c:163 [inline] pcmcia_unregister_driver+0x7d/0x2b0 drivers/pcmcia/ds.c:209 ssb_modexit+0xa/0x1b [ssb] __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 
bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fe409dc0c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe409dc16bc R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff Modules linked in: ssb(-) 3c59x nvme_core macvlan tap pata_hpt3x3 rt2x00pci null_blk tsc40 pm_notifier_error_inject notifier_error_inject mdio cdc_wdm nf_reject_ipv4 ath9k_common ath9k_hw ath pppox ppp_generic slhc ehci_platform wl12xx wlcore tps6507x_ts ioc4 nf_synproxy_core ide_gd_mod ax25 can_dev iwlwifi can_raw atm tm2_touchkey can_gw can sundance adp5588_keys rt2800mmio rt2800lib rt2x00mmio rt2x00lib eeprom_93cx6 pn533 lru_cache elants_i2c ip_set nfnetlink gameport tipc hampshire nhc_ipv6 nhc_hop nhc_udp nhc_fragment nhc_routing nhc_mobility nhc_dest 6lowpan silead brcmutil nfc mt76_usb mt76 mac80211 iptable_security iptable_raw iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_gre sit hsr veth vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon vcan bridge stp llc ip6_gre ip6_tunnel tunnel6 tun joydev mousedev serio_raw ide_pci_generic piix floppy ide_core sch_fq_codel ip_tables x_tables ipv6 [last unloaded: 3c59x] Dumping ftrace buffer: (ftrace buffer empty) ---[ end trace 3913cbf8011e1c05 ]--- In ssb_modinit, SSB init does not fail when ssb_host_pcmcia_init fails; however, in ssb_modexit, ssb_host_pcmcia_exit calls pcmcia_unregister_driver unconditionally, which may trigger a NULL pointer dereference as shown above.
Reported-by: Hulk Robot <[email protected]> Fixes: 399500d ("ssb: pick PCMCIA host code support from b43 driver") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 09ac269 upstream. Syzkaller report this: [ 1213.468581] BUG: unable to handle kernel paging request at fffffbfff83bf338 [ 1213.469530] #PF error: [normal kernel read fault] [ 1213.469530] PGD 237fe4067 P4D 237fe4067 PUD 237e60067 PMD 1c868b067 PTE 0 [ 1213.473514] Oops: 0000 [#1] SMP KASAN PTI [ 1213.473514] CPU: 0 PID: 6321 Comm: syz-executor.0 Tainted: G C 5.1.0-rc3+ #8 [ 1213.473514] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 1213.473514] RIP: 0010:strcmp+0x31/0xa0 [ 1213.473514] Code: 00 00 00 00 fc ff df 55 53 48 83 ec 08 eb 0a 84 db 48 89 ef 74 5a 4c 89 e6 48 89 f8 48 89 fa 48 8d 6f 01 48 c1 e8 03 83 e2 07 <42> 0f b6 04 28 38 d0 7f 04 84 c0 75 50 48 89 f0 48 89 f2 0f b6 5d [ 1213.473514] RSP: 0018:ffff8881f2b7f950 EFLAGS: 00010246 [ 1213.473514] RAX: 1ffffffff83bf338 RBX: ffff8881ea6f7240 RCX: ffffffff825350c6 [ 1213.473514] RDX: 0000000000000000 RSI: ffffffffc1ee19c0 RDI: ffffffffc1df99c0 [ 1213.473514] RBP: ffffffffc1df99c1 R08: 0000000000000001 R09: 0000000000000004 [ 1213.473514] R10: 0000000000000000 R11: ffff8881de353f00 R12: ffff8881ee727900 [ 1213.473514] R13: dffffc0000000000 R14: 0000000000000001 R15: ffffffffc1eeaaf0 [ 1213.473514] FS: 00007fa66fa01700(0000) GS:ffff8881f7200000(0000) knlGS:0000000000000000 [ 1213.473514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1213.473514] CR2: fffffbfff83bf338 CR3: 00000001ebb9e005 CR4: 00000000007606f0 [ 1213.473514] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1213.473514] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1213.473514] PKRU: 55555554 [ 1213.473514] Call Trace: [ 1213.473514] led_trigger_register+0x112/0x3f0 [ 1213.473514] led_trigger_register_simple+0x7a/0x110 [ 1213.473514] ? 0xffffffffc1c10000 [ 1213.473514] at76_mod_init+0x77/0x1000 [at76c50x_usb] [ 1213.473514] do_one_initcall+0xbc/0x47d [ 1213.473514] ? perf_trace_initcall_level+0x3a0/0x3a0 [ 1213.473514] ? 
kasan_unpoison_shadow+0x30/0x40 [ 1213.473514] ? kasan_unpoison_shadow+0x30/0x40 [ 1213.473514] do_init_module+0x1b5/0x547 [ 1213.473514] load_module+0x6405/0x8c10 [ 1213.473514] ? module_frob_arch_sections+0x20/0x20 [ 1213.473514] ? kernel_read_file+0x1e6/0x5d0 [ 1213.473514] ? find_held_lock+0x32/0x1c0 [ 1213.473514] ? cap_capable+0x1ae/0x210 [ 1213.473514] ? __do_sys_finit_module+0x162/0x190 [ 1213.473514] __do_sys_finit_module+0x162/0x190 [ 1213.473514] ? __ia32_sys_init_module+0xa0/0xa0 [ 1213.473514] ? __mutex_unlock_slowpath+0xdc/0x690 [ 1213.473514] ? wait_for_completion+0x370/0x370 [ 1213.473514] ? vfs_write+0x204/0x4a0 [ 1213.473514] ? do_syscall_64+0x18/0x450 [ 1213.473514] do_syscall_64+0x9f/0x450 [ 1213.473514] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 1213.473514] RIP: 0033:0x462e99 [ 1213.473514] Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 [ 1213.473514] RSP: 002b:00007fa66fa00c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 1213.473514] RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 [ 1213.473514] RDX: 0000000000000000 RSI: 0000000020000300 RDI: 0000000000000003 [ 1213.473514] RBP: 00007fa66fa00c70 R08: 0000000000000000 R09: 0000000000000000 [ 1213.473514] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fa66fa016bc [ 1213.473514] R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004 If usb_register failed, no need to call led_trigger_register_simple. Reported-by: Hulk Robot <[email protected]> Fixes: 1264b95 ("at76c50x-usb: add driver") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit a314777 ] BUG: unable to handle kernel paging request at ffffffffa016a270 PGD 3270067 P4D 3270067 PUD 3271063 PMD 230bbd067 PTE 0 Oops: 0000 [#1 CPU: 0 PID: 6134 Comm: modprobe Not tainted 5.1.0+ #33 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:atomic_notifier_chain_register+0x24/0x60 Code: 1f 80 00 00 00 00 55 48 89 e5 41 54 49 89 f4 53 48 89 fb e8 ae b4 38 01 48 8b 53 38 48 8d 4b 38 48 85 d2 74 20 45 8b 44 24 10 <44> 3b 42 10 7e 08 eb 13 44 39 42 10 7c 0d 48 8d 4a 08 48 8b 52 08 RSP: 0018:ffffc90000e2bc60 EFLAGS: 00010086 RAX: 0000000000000292 RBX: ffffffff83467240 RCX: ffffffff83467278 RDX: ffffffffa016a260 RSI: ffffffff83752140 RDI: ffffffff83467240 RBP: ffffc90000e2bc70 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 00000000014fa61f R12: ffffffffa01c8260 R13: ffff888231091e00 R14: 0000000000000000 R15: ffffc90000e2be78 FS: 00007fbd8d7cd540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa016a270 CR3: 000000022c7e3000 CR4: 00000000000006f0 Call Trace: register_inet6addr_notifier+0x13/0x20 cxgb4_init_module+0x6c/0x1000 [cxgb4 ? 0xffffffffa01d7000 do_one_initcall+0x6c/0x3cc ? do_init_module+0x22/0x1f1 ? rcu_read_lock_sched_held+0x97/0xb0 ? kmem_cache_alloc_trace+0x325/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe If pci_register_driver fails, register inet6addr_notifier is pointless. This patch fix the error path in cxgb4_init_module. Fixes: b5a02f5 ("cxgb4 : Update ipv6 address handling api") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
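The error-path ordering described above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_pci_register_driver()`, `fake_register_notifier()` and `sketch_module_init()` are hypothetical stand-ins, not the real cxgb4 or kernel functions, and the actual patch may differ in detail.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for pci_register_driver() and
 * register_inet6addr_notifier(); they only record what happened. */
static int  fake_pci_register_result;   /* 0 = success, negative = errno-style failure */
static bool notifier_registered;

static int fake_pci_register_driver(void)
{
    return fake_pci_register_result;
}

static void fake_register_notifier(void)
{
    notifier_registered = true;
}

/* Sketch of the corrected init ordering: if driver registration fails,
 * return before anything is added to the notifier chain, so nothing
 * keeps pointing into a module that is about to be unloaded. */
static int sketch_module_init(void)
{
    int ret = fake_pci_register_driver();

    if (ret < 0)
        return ret;

    fake_register_notifier();
    return 0;
}
```

The point of the ordering is that a failed init leaves no registration behind; a notifier block that lives in the module's memory would otherwise remain on the chain after the module is freed.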

[ Upstream commit f030e41 ] Following kernel panic is seen during DMA driver unload->load sequence ========================================================================== Unable to handle kernel paging request at virtual address ffffff8001198880 Internal error: Oops: 86000007 [#1] PREEMPT SMP CPU: 0 PID: 5907 Comm: HwBinder:4123_1 Tainted: G C 4.9.128-tegra-g065839f Hardware name: galen (DT) task: ffffffc3590d1a80 task.stack: ffffffc3d0678000 PC is at 0xffffff8001198880 LR is at of_dma_request_slave_channel+0xd8/0x1f8 pc : [<ffffff8001198880>] lr : [<ffffff8008746f30>] pstate: 60400045 sp : ffffffc3d067b710 x29: ffffffc3d067b710 x28: 000000000000002f x27: ffffff800949e000 x26: ffffff800949e750 x25: ffffff800949e000 x24: ffffffbefe817d84 x23: ffffff8009f77cb0 x22: 0000000000000028 x21: ffffffc3ffda49c8 x20: 0000000000000029 x19: 0000000000000001 x18: ffffffffffffffff x17: 0000000000000000 x16: ffffff80082b66a0 x15: ffffff8009e78250 x14: 000000000000000a x13: 0000000000000038 x12: 0101010101010101 x11: 0000000000000030 x10: 0101010101010101 x9 : fffffffffffffffc x8 : 7f7f7f7f7f7f7f7f x7 : 62ff726b6b64622c x6 : 0000000000008064 x5 : 6400000000000000 x4 : ffffffbefe817c44 x3 : ffffffc3ffda3e08 x2 : ffffff8001198880 x1 : ffffffc3d48323c0 x0 : ffffffc3d067b788 Process HwBinder:4123_1 (pid: 5907, stack limit = 0xffffffc3d0678028) Call trace: [<ffffff8001198880>] 0xffffff8001198880 [<ffffff80087459f8>] dma_request_chan+0x50/0x1f0 [<ffffff8008745bc0>] dma_request_slave_channel+0x28/0x40 [<ffffff8001552c44>] tegra_alt_pcm_open+0x114/0x170 [<ffffff8008d65fa4>] soc_pcm_open+0x10c/0x878 [<ffffff8008d18618>] snd_pcm_open_substream+0xc0/0x170 [<ffffff8008d1878c>] snd_pcm_open+0xc4/0x240 [<ffffff8008d189e0>] snd_pcm_playback_open+0x58/0x80 [<ffffff8008cfc6d4>] snd_open+0xb4/0x178 [<ffffff8008250628>] chrdev_open+0xb8/0x1d0 [<ffffff8008246fdc>] do_dentry_open+0x214/0x318 [<ffffff80082485d0>] vfs_open+0x58/0x88 [<ffffff800825bce0>] do_last+0x450/0xde0 [<ffffff800825c718>] 
path_openat+0xa8/0x368 [<ffffff800825dd84>] do_filp_open+0x8c/0x110 [<ffffff8008248a74>] do_sys_open+0x164/0x220 [<ffffff80082b66dc>] compat_SyS_openat+0x3c/0x50 [<ffffff8008083040>] el0_svc_naked+0x34/0x38 ---[ end trace 67e6d544e65b5145 ]--- Kernel panic - not syncing: Fatal exception ========================================================================== In device probe(), of_dma_controller_register() registers DMA controller. But when driver is removed, this is not freed. During driver reload this results in data abort and kernel panic. Add of_dma_controller_free() in driver remove path to fix the issue. Fixes: f46b195 ("dmaengine: tegra-adma: Add support for Tegra210 ADMA") Signed-off-by: Sameer Pujar <[email protected]> Reviewed-by: Jon Hunter <[email protected]> Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
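The probe/remove symmetry that this fix restores can be modelled in userspace C. The sketch below is an assumption-laden illustration: the single-slot `sketch_registry` and all `sketch_*` functions are hypothetical and only mimic the roles of of_dma_controller_register() and of_dma_controller_free().

```c
#include <stddef.h>

typedef int (*sketch_xlate_fn)(int);

/* Hypothetical single-slot registry standing in for the OF DMA controller list. */
static sketch_xlate_fn sketch_registry;

/* Translation callback owned by the driver (lives in module memory in the real case). */
static int sketch_xlate(int request)
{
    return request + 1;
}

/* probe(): registers the callback, like of_dma_controller_register(). */
static void sketch_probe(void)
{
    sketch_registry = sketch_xlate;
}

/* remove(): must drop the registration, like of_dma_controller_free().
 * Without this step the registry would keep a pointer into unloaded code. */
static void sketch_remove(void)
{
    sketch_registry = NULL;
}

/* Client lookup: fails cleanly when no controller is registered. */
static int sketch_request_channel(int request)
{
    if (sketch_registry == NULL)
        return -1;
    return sketch_registry(request);
}
```

In the real driver the stale entry pointed at code that had been unloaded, which is why the program counter in the trace above is an address with no symbol.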

[ Upstream commit f80c5da ] This commit makes the kernel not send the next queued HCI command until a command complete arrives for the last HCI command sent to the controller. This change avoids a problem with some buggy controllers (seen on two SKUs of QCA9377) that send an extra command complete event for the previous command after the kernel had already sent a new HCI command to the controller. The problem was reproduced when starting an active scanning procedure, where an extra command complete event arrives for the LE_SET_RANDOM_ADDR command. When this happens the kernel ends up not processing the command complete for the following command, LE_SET_SCAN_PARAM, and ultimately behaves as if a passive scanning procedure were being performed, when in fact the controller is performing an active scanning procedure. This makes it impossible to discover BLE devices, as no device found events are sent to userspace. This problem is reproducible on 100% of the attempts on the affected controllers. The extra command complete event can be seen at timestamp 27.420131 in the btmon logs below.
Bluetooth monitor ver 5.50 = Note: Linux version 5.0.0+ (x86_64) 0.352340 = Note: Bluetooth subsystem version 2.22 0.352343 = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0) [hci0] 0.352344 = Open Index: 80:C5:F2:8F:87:84 [hci0] 0.352345 = Index Info: 80:C5:F2:8F:87:84 (Qualcomm) [hci0] 0.352346 @ MGMT Open: bluetoothd (privileged) version 1.14 {0x0001} 0.352347 @ MGMT Open: btmon (privileged) version 1.14 {0x0002} 0.352366 @ MGMT Open: btmgmt (privileged) version 1.14 {0x0003} 27.302164 @ MGMT Command: Start Discovery (0x0023) plen 1 {0x0003} [hci0] 27.302310 Address type: 0x06 LE Public LE Random < HCI Command: LE Set Random Address (0x08|0x0005) plen 6 #1 [hci0] 27.302496 Address: 15:60:F2:91:B2:24 (Non-Resolvable) > HCI Event: Command Complete (0x0e) plen 4 #2 [hci0] 27.419117 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7 #3 [hci0] 27.419244 Type: Active (0x01) Interval: 11.250 msec (0x0012) Window: 11.250 msec (0x0012) Own address type: Random (0x01) Filter policy: Accept all advertisement (0x00) > HCI Event: Command Complete (0x0e) plen 4 #4 [hci0] 27.420131 LE Set Random Address (0x08|0x0005) ncmd 1 Status: Success (0x00) < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2 #5 [hci0] 27.420259 Scanning: Enabled (0x01) Filter duplicates: Enabled (0x01) > HCI Event: Command Complete (0x0e) plen 4 #6 [hci0] 27.420969 LE Set Scan Parameters (0x08|0x000b) ncmd 1 Status: Success (0x00) > HCI Event: Command Complete (0x0e) plen 4 #7 [hci0] 27.421983 LE Set Scan Enable (0x08|0x000c) ncmd 1 Status: Success (0x00) @ MGMT Event: Command Complete (0x0001) plen 4 {0x0003} [hci0] 27.422059 Start Discovery (0x0023) plen 1 Status: Success (0x00) Address type: 0x06 LE Public LE Random @ MGMT Event: Discovering (0x0013) plen 2 {0x0003} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0002} [hci0] 27.422067 Address type: 
0x06 LE Public LE Random Discovery: Enabled (0x01) @ MGMT Event: Discovering (0x0013) plen 2 {0x0001} [hci0] 27.422067 Address type: 0x06 LE Public LE Random Discovery: Enabled (0x01) Signed-off-by: João Paulo Rechi Vita <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
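The gating rule described in this commit can be sketched as a small state machine. This is a simplified userspace model, not the kernel's implementation: `struct sketch_hci` and both functions are hypothetical names, and opcodes are written as `(OGF << 10) | OCF`, so 0x08|0x0005 from the log becomes 0x2005.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, minimal model of the controller-facing command state. */
struct sketch_hci {
    uint16_t last_sent_opcode;  /* opcode of the command in flight */
    bool     cmd_pending;       /* true until its Command Complete arrives */
    int      sent_count;        /* number of commands actually sent */
};

/* Send the next queued command only if nothing is in flight. */
static bool sketch_send_next(struct sketch_hci *h, uint16_t opcode)
{
    if (h->cmd_pending)
        return false;
    h->last_sent_opcode = opcode;
    h->cmd_pending = true;
    h->sent_count++;
    return true;
}

/* Handle a Command Complete event. A stray or duplicate event for an
 * older opcode does not release the queue. */
static void sketch_cmd_complete(struct sketch_hci *h, uint16_t opcode)
{
    if (!h->cmd_pending || opcode != h->last_sent_opcode)
        return;
    h->cmd_pending = false;
}
```

Replaying the sequence from the btmon log against this model, the duplicate Command Complete for LE Set Random Address (0x2005) no longer lets LE Set Scan Enable go out before LE Set Scan Parameters (0x200b) has completed.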

[ Upstream commit 41a91c6 ] dwc3_gadget_suspend() is called under dwc->lock spinlock. In such context calling synchronize_irq() is not allowed. Move the problematic call out of the protected block to fix the following kernel BUG during system suspend: BUG: sleeping function called from invalid context at kernel/irq/manage.c:112 in_atomic(): 1, irqs_disabled(): 128, pid: 1601, name: rtcwake 6 locks held by rtcwake/1601: #0: f70ac2a2 (sb_writers#7){.+.+}, at: vfs_write+0x130/0x16c #1: b5fe1270 (&of->mutex){+.+.}, at: kernfs_fop_write+0xc0/0x1e4 #2: 7e597705 (kn->count#60){.+.+}, at: kernfs_fop_write+0xc8/0x1e4 #3: 8b3527d0 (system_transition_mutex){+.+.}, at: pm_suspend+0xc4/0xc04 #4: fc7f1c42 (&dev->mutex){....}, at: __device_suspend+0xd8/0x74c #5: 4b36507e (&(&dwc->lock)->rlock){....}, at: dwc3_gadget_suspend+0x24/0x3c irq event stamp: 11252 hardirqs last enabled at (11251): [<c09c54a4>] _raw_spin_unlock_irqrestore+0x6c/0x74 hardirqs last disabled at (11252): [<c09c4d44>] _raw_spin_lock_irqsave+0x1c/0x5c softirqs last enabled at (9744): [<c0102564>] __do_softirq+0x3a4/0x66c softirqs last disabled at (9737): [<c0128528>] irq_exit+0x140/0x168 Preemption disabled at: [<00000000>] (null) CPU: 7 PID: 1601 Comm: rtcwake Not tainted 5.0.0-rc3-next-20190122-00039-ga3f4ee4f8a52 #5252 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [<c01110f0>] (unwind_backtrace) from [<c010d120>] (show_stack+0x10/0x14) [<c010d120>] (show_stack) from [<c09a4d04>] (dump_stack+0x90/0xc8) [<c09a4d04>] (dump_stack) from [<c014c700>] (___might_sleep+0x22c/0x2c8) [<c014c700>] (___might_sleep) from [<c0189d68>] (synchronize_irq+0x28/0x84) [<c0189d68>] (synchronize_irq) from [<c05cbbf8>] (dwc3_gadget_suspend+0x34/0x3c) [<c05cbbf8>] (dwc3_gadget_suspend) from [<c05bd020>] (dwc3_suspend_common+0x154/0x410) [<c05bd020>] (dwc3_suspend_common) from [<c05bd34c>] (dwc3_suspend+0x14/0x2c) [<c05bd34c>] (dwc3_suspend) from [<c051c730>] (platform_pm_suspend+0x2c/0x54) [<c051c730>] (platform_pm_suspend) 
from [<c05285d4>] (dpm_run_callback+0xa4/0x3dc) [<c05285d4>] (dpm_run_callback) from [<c0528a40>] (__device_suspend+0x134/0x74c) [<c0528a40>] (__device_suspend) from [<c052c508>] (dpm_suspend+0x174/0x588) [<c052c508>] (dpm_suspend) from [<c0182134>] (suspend_devices_and_enter+0xc0/0xe74) [<c0182134>] (suspend_devices_and_enter) from [<c0183658>] (pm_suspend+0x770/0xc04) [<c0183658>] (pm_suspend) from [<c0180ddc>] (state_store+0x6c/0xcc) [<c0180ddc>] (state_store) from [<c09a9a70>] (kobj_attr_store+0x14/0x20) [<c09a9a70>] (kobj_attr_store) from [<c02d6800>] (sysfs_kf_write+0x4c/0x50) [<c02d6800>] (sysfs_kf_write) from [<c02d594c>] (kernfs_fop_write+0xfc/0x1e4) [<c02d594c>] (kernfs_fop_write) from [<c02593d8>] (__vfs_write+0x2c/0x160) [<c02593d8>] (__vfs_write) from [<c0259694>] (vfs_write+0xa4/0x16c) [<c0259694>] (vfs_write) from [<c0259870>] (ksys_write+0x40/0x8c) [<c0259870>] (ksys_write) from [<c0101000>] (ret_fast_syscall+0x0/0x28) Exception stack(0xed55ffa8 to 0xed55fff0) ... Fixes: 01c1088 ("usb: dwc3: gadget: synchronize_irq dwc irq in suspend") Signed-off-by: Marek Szyprowski <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
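The reordering this commit describes can be illustrated with a toy model in userspace C. Everything here is hypothetical: the `sketch_*` names only imitate the roles of the dwc->lock spinlock and synchronize_irq(), and the counter plays the part of the "sleeping function called from invalid context" check.

```c
#include <stdbool.h>

static bool sketch_lock_held;      /* models dwc->lock being held */
static int  sketch_atomic_sleeps;  /* counts sleeping calls made under the lock */

static void sketch_spin_lock(void)   { sketch_lock_held = true; }
static void sketch_spin_unlock(void) { sketch_lock_held = false; }

/* Stand-in for synchronize_irq(): it may sleep, so calling it with the
 * lock held is recorded as a violation. */
static void sketch_synchronize_irq(void)
{
    if (sketch_lock_held)
        sketch_atomic_sleeps++;
}

/* Ordering before the fix: the sleeping call sits inside the locked region. */
static void sketch_suspend_buggy(void)
{
    sketch_spin_lock();
    sketch_synchronize_irq();
    sketch_spin_unlock();
}

/* Ordering after the fix: do the locked work first, then wait for the
 * interrupt handler once the lock has been dropped. */
static void sketch_suspend_fixed(void)
{
    sketch_spin_lock();
    /* ... quiesce the gadget while holding the lock ... */
    sketch_spin_unlock();
    sketch_synchronize_irq();
}
```

Only the buggy ordering increments the violation counter; the fixed ordering performs the same two steps without sleeping in atomic context.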

OpenAnolis Bug Tracker: 0000260 commit e5854b1 upstream. With DEBUG_PAGEALLOC on, the following triggers. BUG: unable to handle page fault for address: ffff88859367c000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 3001067 P4D 3001067 PUD 406d3a8067 PMD 406d30c067 PTE 800ffffa6c983060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 38 PID: 3110657 Comm: python2.7 RIP: 0010:fuse_readdir+0x88f/0xe7a [fuse] Code: 49 8b 4d 08 49 39 4e 60 0f 84 44 04 00 00 48 8b 43 08 43 8d 1c 3c 4d 01 7e 68 49 89 dc 48 03 5c 24 38 49 89 46 60 8b 44 24 30 <8b> 4b 10 44 29 e0 48 89 ca 48 83 c1 1f 48 83 e1 f8 83 f8 17 49 89 RSP: 0018:ffffc90035edbde0 EFLAGS: 00010286 RAX: 0000000000001000 RBX: ffff88859367bff0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88859367bfed RDI: 0000000000920907 RBP: ffffc90035edbe90 R08: 000000000000014b R09: 0000000000000004 R10: ffff88859367b000 R11: 0000000000000000 R12: 0000000000000ff0 R13: ffffc90035edbee0 R14: ffff889fb8546180 R15: 0000000000000020 FS: 00007f80b5f4a740(0000) GS:ffff889fffa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88859367c000 CR3: 0000001c170c2001 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: iterate_dir+0x122/0x180 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 It's in fuse_parse_cache(). %rbx (ffff88859367bff0) is fuse_dirent pointer - addr + offset. FUSE_DIRENT_SIZE() is trying to dereference namelen off of it but that derefs into the next page which is disabled by pagealloc debug causing a PF. This is caused by dirent->namelen being accessed before ensuring that there's enough bytes in the page for the dirent. Fix it by pushing down reclen calculation. 
Signed-off-by: Tejun Heo <[email protected]> Fixes: 5d7bc7e ("fuse: allow using readdir cache") Cc: [email protected] # v4.20+ Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Jeffle Xu <[email protected]> Reviewed-by: Joseph Qi <[email protected]>
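The idea of "pushing down the reclen calculation" can be sketched as follows: verify that the fixed-size header fits in the remaining bytes before reading namelen from it. This is a userspace illustration, not the fuse_parse_cache() code; `struct sketch_dirent`, the macros and `sketch_dirent_reclen()` are hypothetical names, with a field layout assumed to mirror struct fuse_dirent.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, modelled on struct fuse_dirent; name bytes follow the header. */
struct sketch_dirent {
    uint64_t ino;
    uint64_t off;
    uint32_t namelen;
    uint32_t type;
};

#define SKETCH_NAME_OFFSET sizeof(struct sketch_dirent)      /* 24 bytes */
#define SKETCH_ALIGN(x)    (((x) + 7u) & ~(size_t)7u)        /* round up to 8 */

/* Return the record length of the dirent at `offset`, or 0 if either the
 * header or the full record would run past `size`. namelen is read only
 * after the header is known to be inside the buffer. */
static size_t sketch_dirent_reclen(const unsigned char *buf, size_t size,
                                   size_t offset)
{
    struct sketch_dirent d;
    size_t reclen;

    if (offset > size || size - offset < SKETCH_NAME_OFFSET)
        return 0;   /* reading namelen here would touch the next page */

    memcpy(&d, buf + offset, sizeof(d));
    reclen = SKETCH_ALIGN(SKETCH_NAME_OFFSET + d.namelen);
    if (reclen > size - offset)
        return 0;

    return reclen;
}
```

With a 4096-byte page and an offset of 0xff0, as in the register dump above, only 16 bytes remain, so the header check fails before namelen (at byte 16 of the header) is ever dereferenced.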

fix #36837630 When building kernel with idle age not in page's flag, kernel will panic as below: [ 13.977004] BUG: unable to handle kernel paging request at ffffc90000eba2b9 [ 13.978021] PGD 13ad35067 P4D 13ad35067 PUD 13ad36067 PMD 139b88067 PTE 0 [ 13.979014] Oops: 0002 [#1] SMP PTI [ 13.979533] CPU: 12 PID: 112 Comm: kidled Not tainted 4.19.91+ #586 [ 13.980450] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 13.982136] RIP: 0010:free_pcp_prepare+0x49/0xc0 [ 13.982945] Code: 44 00 00 48 8b 15 1f 9d 13 01 48 8b 0d f8 9c 13 01 48 b8 00 00 00 00 00 16 00 00 48 01 d8 48 c1 f8 06 48 85 0 [ 13.985674] RSP: 0018:ffffc900003ffe20 EFLAGS: 00010202 [ 13.986429] RAX: 00000000001352b9 RBX: ffffea0004d4ae80 RCX: 0000000000000001 [ 13.987468] RDX: ffffc90000d85000 RSI: 0000000000000000 RDI: ffffea0004d4ae80 [ 13.988504] RBP: ffffea0004d4ae80 R08: ffffc90000ec6000 R09: 0000000000000000 [ 13.989534] R10: 0000000000008e1c R11: ffffffff828c1b6d R12: ffffc90000d85000 [ 13.990581] R13: ffffffff82306700 R14: 0000000000000001 R15: ffff88813adbab50 [ 13.991634] FS: 0000000000000000(0000) GS:ffff88813bb00000(0000) knlGS:0000000000000000 [ 13.992814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 13.993648] CR2: ffffc90000eba2b9 CR3: 000000000220a006 CR4: 00000000003706e0 [ 13.994681] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 13.995721] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 13.996763] Call Trace: [ 13.997137] free_unref_page+0x11/0x60 [ 13.997693] __vunmap+0x4e/0xb0 [ 13.998159] kidled.cold+0x1b/0x53 [ 13.998680] ? __schedule+0x31c/0x6d0 [ 13.999222] ? finish_wait+0x80/0x80 [ 13.999751] ? kidled_mem_cgroup_move_stats+0x270/0x270 [ 14.000514] kthread+0x117/0x130 [ 14.001006] ? 
kthread_create_worker_on_cpu+0x70/0x70 [ 14.001751] ret_from_fork+0x35/0x40 This patch uses the RCU lock to close this race window: callers may only access the idle age under the read lock, see kidled_get/set/inc_page_age(). Note that kidled and the memory hotplug process also use mem_hotplug_lock to avoid a race between alloc and free. Since kidle_free_page_age() may sleep, call it earlier to avoid sleeping with pgdat_resize_lock held. Signed-off-by: Gang Deng <[email protected]> Reviewed-by: zhongjiang-ali <[email protected]> Reviewed-by: Xu Yu <[email protected]>

OpenAnolis Bug Tracker: 0000366 commit 594cced upstream. khugepaged has to drop the mmap lock several times while collapsing a page. The situation can change while the lock is dropped, and we need to re-validate that the VMA is still in place and the PMD is still subject to collapse. But we miss one corner case: while collapsing anonymous pages, the VMA could be replaced with a file VMA. If the file VMA doesn't have any private pages we get a NULL pointer dereference: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] anon_vma_lock_write include/linux/rmap.h:120 [inline] collapse_huge_page mm/khugepaged.c:1110 [inline] khugepaged_scan_pmd mm/khugepaged.c:1349 [inline] khugepaged_scan_mm_slot mm/khugepaged.c:2110 [inline] khugepaged_do_scan mm/khugepaged.c:2193 [inline] khugepaged+0x3bba/0x5a10 mm/khugepaged.c:2238 The fix is to make sure that the VMA is anonymous in hugepage_vma_revalidate(). The helper is only used for collapsing anonymous pages. Fixes: 99cb0db ("mm,thp: add read-only THP support for (non-shmem) FS") Reported-by: [email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Yang Shi <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Xu Yu <[email protected]> Reviewed-by: Gang Deng <[email protected]>

OpenAnolis Bug Tracker: 0000366 Transparent huge page has supported read-only non-shmem files. The file- backed THP is collapsed by khugepaged and truncated when written (for shared libraries). However, there is race in two possible places. 1) multiple writers truncate the same page cache concurrently; 2) collapse_file rolls back when writer truncates the page cache; In both cases, subpage(s) of file THP can be revealed by find_get_entry in truncate_inode_pages_range, which will trigger PageTail BUG_ON in truncate_inode_page, as follows. [40326.247034] page:000000009e420ff2 refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff pfn:0x50c3ff [40326.247041] head:0000000075ff816d order:9 compound_mapcount:0 compound_pincount:0 [40326.247046] flags: 0x37fffe0000010815(locked|uptodate|lru|arch_1|head) [40326.247051] raw: 37fffe0000000000 fffffe0013108001 dead000000000122 dead000000000400 [40326.247053] raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000 [40326.247055] head: 37fffe0000010815 fffffe001066bd48 ffff000404183c20 0000000000000000 [40326.247057] head: 0000000000000600 0000000000000000 00000001ffffffff ffff000c0345a000 [40326.247058] page dumped because: VM_BUG_ON_PAGE(PageTail(page)) [40326.247077] ------------[ cut here ]------------ [40326.247080] kernel BUG at mm/truncate.c:213! [40326.280581] Internal error: Oops - BUG: 0 [#1] SMP ... 
[40326.285130] CPU: 14 PID: 11394 Comm: check_madvise_d Kdump: loaded Tainted: G W E [40326.286202] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015 [40326.286968] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) [40326.287584] pc : truncate_inode_page+0x64/0x70 [40326.288040] lr : truncate_inode_page+0x64/0x70 [40326.288498] sp : ffff80001b60b900 [40326.288837] x29: ffff80001b60b900 x28: 00000000000007ff [40326.289377] x27: ffff80001b60b9a0 x26: 0000000000000000 [40326.289943] x25: 000000000000000f x24: ffff80001b60b9a0 [40326.290485] x23: ffff80001b60ba18 x22: ffff0001e0999ea8 [40326.291027] x21: ffff0000c21db300 x20: ffffffffffffffff [40326.291566] x19: fffffe001310ffc0 x18: 0000000000000020 [40326.292106] x17: 0000000000000000 x16: 0000000000000000 [40326.292655] x15: ffff0000c21db960 x14: 3030306666666620 [40326.293197] x13: 6666666666666666 x12: 3130303030303030 [40326.293746] x11: ffff8000117b69b8 x10: 00000000ffff8000 [40326.294313] x9 : ffff80001012690c x8 : 0000000000000000 [40326.294851] x7 : ffff8000114f69b8 x6 : 0000000000017ffd [40326.295392] x5 : ffff0007fffbcbc8 x4 : ffff80001b60b5c0 [40326.295942] x3 : 0000000000000001 x2 : 0000000000000000 [40326.296497] x1 : 0000000000000000 x0 : 0000000000000000 [40326.297047] Call trace: [40326.297304] truncate_inode_page+0x64/0x70 [40326.297724] truncate_inode_pages_range+0x550/0x7e4 [40326.298251] truncate_pagecache+0x58/0x80 [40326.298662] do_dentry_open+0x1e4/0x3c0 [40326.299052] vfs_open+0x38/0x44 [40326.299377] do_open+0x1f0/0x310 [40326.299709] path_openat+0x114/0x1dc [40326.300077] do_filp_open+0x84/0x134 [40326.300444] do_sys_openat2+0xbc/0x164 [40326.300825] __arm64_sys_openat+0x74/0xc0 [40326.301236] el0_svc_common.constprop.0+0x88/0x220 [40326.301723] do_el0_svc+0x30/0xa0 [40326.302089] el0_svc+0x20/0x30 [40326.302404] el0_sync_handler+0x1a4/0x1b0 [40326.302814] el0_sync+0x180/0x1c0 [40326.303157] Code: aa0103e0 900061e1 910ec021 9400d300 (d4210000) [40326.303775] ---[ end 
trace f70cdb42cb7c2d42 ]--- [40326.304244] Kernel panic - not syncing: Oops - BUG: Fatal exception This checks the page mapping and retries when subpage of file THP is found, in truncate_inode_pages_range. Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs") Signed-off-by: Xu Yu <[email protected]> Signed-off-by: Rongwei Wang <[email protected]> Reviewed-by: Gang Deng <[email protected]>

OpenAnolis Bug Tracker: 0000366 Currently collapse_file does not explicitly check PG_writeback, instead, page_has_private and try_to_release_page are used to filter writeback pages. This does not work for xfs with blocksize equal to or larger than pagesize, because in such case xfs has no page->private. This makes collapse_file bail out early for writeback page. Otherwise, xfs end_page_writeback will panic as follows. [ 6411.448211] page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:ffff0003f88c86a8 index:0x0 pfn:0x84ef32 [ 6411.448304] aops:xfs_address_space_operations [xfs] ino:30000b7 dentry name:"libtest.so" [ 6411.448312] flags: 0x57fffe0000008027(locked|referenced|uptodate|active|writeback) [ 6411.448317] raw: 57fffe0000008027 ffff80001b48bc28 ffff80001b48bc28 ffff0003f88c86a8 [ 6411.448321] raw: 0000000000000000 0000000000000000 00000000ffffffff ffff0000c3e9a000 [ 6411.448324] page dumped because: VM_BUG_ON_PAGE(((unsigned int) page_ref_count(page) + 127u <= 127u)) [ 6411.448327] page->mem_cgroup:ffff0000c3e9a000 [ 6411.448340] ------------[ cut here ]------------ [ 6411.448343] kernel BUG at include/linux/mm.h:1212! [ 6411.449288] Internal error: Oops - BUG: 0 [#1] SMP [ 6411.449786] Modules linked in: [ 6411.449790] BUG: Bad page state in process khugepaged pfn:84ef32 [ 6411.450143] xfs(E) [ 6411.450459] page:fffffe00201bcc80 refcount:0 mapcount:0 mapping:0 index:0x0 pfn:0x84ef32 ... 
[ 6411.451387] CPU: 25 PID: 0 Comm: swapper/25 Kdump: loaded Tainted: G W E [ 6411.451389] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) [ 6411.451393] pc : end_page_writeback+0x1c0/0x214 [ 6411.451394] lr : end_page_writeback+0x1c0/0x214 [ 6411.451395] sp : ffff800011ce3cc0 [ 6411.451396] x29: ffff800011ce3cc0 x28: 0000000000000000 [ 6411.451398] x27: ffff000c04608040 x26: 0000000000000000 [ 6411.451399] x25: ffff000c04608040 x24: 0000000000001000 [ 6411.451401] x23: ffff0003f88c8530 x22: 0000000000001000 [ 6411.451403] x21: ffff0003f88c8530 x20: 0000000000000000 [ 6411.451404] x19: fffffe00201bcc80 x18: 0000000000000030 [ 6411.451406] x17: 0000000000000000 x16: 0000000000000000 [ 6411.451407] x15: ffff000c018f9760 x14: ffffffffffffffff [ 6411.451409] x13: ffff8000119d72b0 x12: ffff8000119d6ee3 [ 6411.451410] x11: ffff8000117b69b8 x10: 00000000ffff8000 [ 6411.451412] x9 : ffff800010617534 x8 : 0000000000000000 [ 6411.451413] x7 : ffff8000114f69b8 x6 : 000000000000000f [ 6411.451415] x5 : 0000000000000000 x4 : 0000000000000000 [ 6411.451416] x3 : 0000000000000400 x2 : 0000000000000000 [ 6411.451418] x1 : 0000000000000000 x0 : 0000000000000000 [ 6411.451420] Call trace: [ 6411.451421] end_page_writeback+0x1c0/0x214 [ 6411.451424] iomap_finish_page_writeback+0x13c/0x204 [ 6411.451425] iomap_finish_ioend+0xe8/0x19c [ 6411.451426] iomap_writepage_end_bio+0x38/0x50 [ 6411.451427] bio_endio+0x168/0x1ec [ 6411.451430] blk_update_request+0x278/0x3f0 [ 6411.451432] blk_mq_end_request+0x34/0x15c [ 6411.451435] virtblk_request_done+0x38/0x74 [virtio_blk] [ 6411.451437] blk_done_softirq+0xc4/0x110 [ 6411.451439] __do_softirq+0x128/0x38c [ 6411.451441] __irq_exit_rcu+0x118/0x150 [ 6411.451442] irq_exit+0x1c/0x30 [ 6411.451445] __handle_domain_irq+0x8c/0xf0 [ 6411.451446] gic_handle_irq+0x84/0x108 [ 6411.451447] el1_irq+0xcc/0x180 [ 6411.451448] arch_cpu_idle+0x18/0x40 [ 6411.451450] default_idle_call+0x4c/0x1a0 [ 6411.451453] cpuidle_idle_call+0x168/0x1e0 [ 6411.451454] 
do_idle+0xb4/0x104 [ 6411.451455] cpu_startup_entry+0x30/0x9c [ 6411.451458] secondary_start_kernel+0x104/0x180 [ 6411.451460] Code: d4210000 b0006161 910c8021 94013f4d (d4210000) [ 6411.451462] ---[ end trace 4a88c6a074082f8c ]--- [ 6411.451464] Kernel panic - not syncing: Oops - BUG: Fatal exception in interrupt Fixes: eb6ecbed0aa2 ("mm, thp: relax the VM_DENYWRITE constraint on file-backed THPs") Signed-off-by: Xu Yu <[email protected]> Signed-off-by: Rongwei Wang <[email protected]> Reviewed-by: Gang Deng <[email protected]>

OpenAnolis Bug Tracker: 0000346 commit 30e29a9a2bc6a4888335a6ede968b75cd329657a upstream. In prealloc_elems_and_freelist(), the multiplication to calculate the size passed to bpf_map_area_alloc() could lead to an integer overflow. As a result, out-of-bounds write could occur in pcpu_freelist_populate() as reported by KASAN: [...] [ 16.968613] BUG: KASAN: slab-out-of-bounds in pcpu_freelist_populate+0xd9/0x100 [ 16.969408] Write of size 8 at addr ffff888104fc6ea0 by task crash/78 [ 16.970038] [ 16.970195] CPU: 0 PID: 78 Comm: crash Not tainted 5.15.0-rc2+ #1 [ 16.970878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 16.972026] Call Trace: [ 16.972306] dump_stack_lvl+0x34/0x44 [ 16.972687] print_address_description.constprop.0+0x21/0x140 [ 16.973297] ? pcpu_freelist_populate+0xd9/0x100 [ 16.973777] ? pcpu_freelist_populate+0xd9/0x100 [ 16.974257] kasan_report.cold+0x7f/0x11b [ 16.974681] ? pcpu_freelist_populate+0xd9/0x100 [ 16.975190] pcpu_freelist_populate+0xd9/0x100 [ 16.975669] stack_map_alloc+0x209/0x2a0 [ 16.976106] __sys_bpf+0xd83/0x2ce0 [...] The possibility of this overflow was originally discussed in [0], but was overlooked. Fix the integer overflow by changing elem_size to u64 from u32. [0] https://lore.kernel.org/bpf/[email protected]/ Fixes: 557c0c6 ("bpf: convert stackmap to pre-allocation") Signed-off-by: Tatsuhiko Yasumatsu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Fixes: CVE-2021-41864 Signed-off-by: Shile Zhang <[email protected]> Reviewed-by: Tianjia Zhang <[email protected]>
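The overflow described above comes from doing the size multiplication in 32 bits. The sketch below shows the effect in isolation; the function names and the sample values are illustrative assumptions, not the code or the sizes used in prealloc_elems_and_freelist().

```c
#include <stdint.h>

/* Before the fix (conceptually): both operands are 32-bit, so the product
 * wraps modulo 2^32 before it is ever widened. */
static uint32_t sketch_alloc_size_u32(uint32_t elem_size, uint32_t max_entries)
{
    return elem_size * max_entries;
}

/* After the fix (conceptually): elem_size is 64-bit, so the multiplication
 * is carried out in 64 bits and the true size is preserved. */
static uint64_t sketch_alloc_size_u64(uint64_t elem_size, uint32_t max_entries)
{
    return elem_size * max_entries;
}
```

For example, 65536 elements of 65536 bytes each need 4 GiB, but the 32-bit product is 0. An allocation sized by the wrapped value is far too small, and populating the freelist with the real element count then writes past the end of it.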

OpenAnolis Bug Tracker: 0000546 commit f27ce0e upstream. zone_watermark_fast was introduced by commit 48ee5f3 ("mm, page_alloc: shortcut watermark checks for order-0 pages"). That commit simply checks whether the number of free pages is bigger than the watermark, without additional calculation such as reducing the watermark. It considered free cma pages but did not consider the highatomic reserve. This may lead to exhaustion of all free pages except high order atomic free pages. Assume that the reserved_highatomic pageblock is bigger than watermark min, and there are only a few free pages apart from the high order atomic free pages. Because zone_watermark_fast passes the allocation without considering high order atomic free pages, normal reclaimable allocations like GFP_HIGHUSER will consume all the free pages. Eventually an order-0 atomic allocation may fail. This means watermark min is not protected against non-atomic allocation. An order-0 atomic allocation with ALLOC_HARDER can fail unexpectedly. A __GFP_MEMALLOC allocation with ALLOC_NO_WATERMARKS can also fail. To avoid the problem, zone_watermark_fast should consider the highatomic reserve. If the actual size of high atomic free pages were counted accurately, like cma free, we could use it. This patch just uses nr_reserved_highatomic. Additionally, introduce __zone_watermark_unusable_free to factor out the common parts between zone_watermark_fast and __zone_watermark_ok. This is an example of an ALLOC_HARDER allocation failure using a v4.19 based kernel.
Binder:9343_3: page allocation failure: order:0, mode:0x480020(GFP_ATOMIC), nodemask=(null) Call trace: [<ffffff8008f40f8c>] dump_stack+0xb8/0xf0 [<ffffff8008223320>] warn_alloc+0xd8/0x12c [<ffffff80082245e4>] __alloc_pages_nodemask+0x120c/0x1250 [<ffffff800827f6e8>] new_slab+0x128/0x604 [<ffffff800827b0cc>] ___slab_alloc+0x508/0x670 [<ffffff800827ba00>] __kmalloc+0x2f8/0x310 [<ffffff80084ac3e0>] context_struct_to_string+0x104/0x1cc [<ffffff80084ad8fc>] security_sid_to_context_core+0x74/0x144 [<ffffff80084ad880>] security_sid_to_context+0x10/0x18 [<ffffff800849bd80>] selinux_secid_to_secctx+0x20/0x28 [<ffffff800849109c>] security_secid_to_secctx+0x3c/0x70 [<ffffff8008bfe118>] binder_transaction+0xe68/0x454c Mem-Info: active_anon:102061 inactive_anon:81551 isolated_anon:0 active_file:59102 inactive_file:68924 isolated_file:64 unevictable:611 dirty:63 writeback:0 unstable:0 slab_reclaimable:13324 slab_unreclaimable:44354 mapped:83015 shmem:4858 pagetables:26316 bounce:0 free:2727 free_pcp:1035 free_cma:178 Node 0 active_anon:408244kB inactive_anon:326204kB active_file:236408kB inactive_file:275696kB unevictable:2444kB isolated(anon):0kB isolated(file):256kB mapped:332060kB dirty:252kB writeback:0kB shmem:19432kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no Normal free:10908kB min:6192kB low:44388kB high:47060kB active_anon:409160kB inactive_anon:325924kB active_file:235820kB inactive_file:276628kB unevictable:2444kB writepending:252kB present:3076096kB managed:2673676kB mlocked:2444kB kernel_stack:62512kB pagetables:105264kB bounce:0kB free_pcp:4140kB local_pcp:40kB free_cma:712kB lowmem_reserve[]: 0 0 Normal: 505*4kB (H) 357*8kB (H) 201*16kB (H) 65*32kB (H) 1*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10236kB 138826 total pagecache pages 5460 pages in swap cache Swap cache stats: add 8273090, delete 8267506, find 1004381/4060142 This is an example of ALLOC_NO_WATERMARKS allocation failure using v4.14 based kernel. 
kswapd0: page allocation failure: order:0, mode:0x140000a(GFP_NOIO|__GFP_HIGHMEM|__GFP_MOVABLE), nodemask=(null) kswapd0 cpuset=/ mems_allowed=0 CPU: 4 PID: 1221 Comm: kswapd0 Not tainted 4.14.113-18770262-userdebug #1 Call trace: [<0000000000000000>] dump_backtrace+0x0/0x248 [<0000000000000000>] show_stack+0x18/0x20 [<0000000000000000>] __dump_stack+0x20/0x28 [<0000000000000000>] dump_stack+0x68/0x90 [<0000000000000000>] warn_alloc+0x104/0x198 [<0000000000000000>] __alloc_pages_nodemask+0xdc0/0xdf0 [<0000000000000000>] zs_malloc+0x148/0x3d0 [<0000000000000000>] zram_bvec_rw+0x410/0x798 [<0000000000000000>] zram_rw_page+0x88/0xdc [<0000000000000000>] bdev_write_page+0x70/0xbc [<0000000000000000>] __swap_writepage+0x58/0x37c [<0000000000000000>] swap_writepage+0x40/0x4c [<0000000000000000>] shrink_page_list+0xc30/0xf48 [<0000000000000000>] shrink_inactive_list+0x2b0/0x61c [<0000000000000000>] shrink_node_memcg+0x23c/0x618 [<0000000000000000>] shrink_node+0x1c8/0x304 [<0000000000000000>] kswapd+0x680/0x7c4 [<0000000000000000>] kthread+0x110/0x120 [<0000000000000000>] ret_from_fork+0x10/0x18 Mem-Info: active_anon:111826 inactive_anon:65557 isolated_anon:0\x0a active_file:44260 inactive_file:83422 isolated_file:0\x0a unevictable:4158 dirty:117 writeback:0 unstable:0\x0a slab_reclaimable:13943 slab_unreclaimable:43315\x0a mapped:102511 shmem:3299 pagetables:19566 bounce:0\x0a free:3510 free_pcp:553 free_cma:0 Node 0 active_anon:447304kB inactive_anon:262228kB active_file:177040kB inactive_file:333688kB unevictable:16632kB isolated(anon):0kB isolated(file):0kB mapped:410044kB d irty:468kB writeback:0kB shmem:13196kB writeback_tmp:0kB unstable:0kB all_unreclaimable? 
no Normal free:14040kB min:7440kB low:94500kB high:98136kB reserved_highatomic:32768KB active_anon:447336kB inactive_anon:261668kB active_file:177572kB inactive_file:333768k B unevictable:16632kB writepending:480kB present:4081664kB managed:3637088kB mlocked:16632kB kernel_stack:47072kB pagetables:78264kB bounce:0kB free_pcp:2280kB local_pcp:720kB free_cma:0kB [ 4738.329607] lowmem_reserve[]: 0 0 Normal: 860*4kB (H) 453*8kB (H) 180*16kB (H) 26*32kB (H) 34*64kB (H) 6*128kB (H) 2*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 14232kB This is trace log which shows GFP_HIGHUSER consumes free pages right before ALLOC_NO_WATERMARKS. <...>-22275 [006] .... 889.213383: mm_page_alloc: page=00000000d2be5665 pfn=970744 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213385: mm_page_alloc: page=000000004b2335c2 pfn=970745 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213387: mm_page_alloc: page=00000000017272e1 pfn=970278 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213389: mm_page_alloc: page=00000000c4be79fb pfn=970279 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213391: mm_page_alloc: page=00000000f8a51d4f pfn=970260 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213393: mm_page_alloc: page=000000006ba8f5ac pfn=970261 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 889.213395: mm_page_alloc: page=00000000819f1cd3 pfn=970196 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO <...>-22275 [006] .... 
889.213396: mm_page_alloc: page=00000000f6b72a64 pfn=970197 order=0 migratetype=0 nr_free=3650 gfp_flags=GFP_HIGHUSER|__GFP_ZERO kswapd0-1207 [005] ...1 889.213398: mm_page_alloc: page= (null) pfn=0 order=0 migratetype=1 nr_free=3650 gfp_flags=GFP_NOWAIT|__GFP_HIGHMEM|__GFP_NOWARN|__GFP_MOVABLE [[email protected]: remove redundant code for high-order] Link: http://lkml.kernel.org/r/[email protected] Reported-by: Yong-Taek Lee <[email protected]> Suggested-by: Minchan Kim <[email protected]> Signed-off-by: Jaewon Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Yong-Taek Lee <[email protected]> Cc: Michal Hocko <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Xu Yu <[email protected]> Reviewed-by: Xunlei Pang <[email protected]> Signed-off-by: Xunlei Pang <[email protected]>
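The effect of discounting the highatomic reserve can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct zone_model` and the three helper functions are hypothetical names, lowmem_reserve and order handling are left out, and the real kernel code differs. With the numbers from the v4.14 log above converted to 4 kB pages (free 3510, min 1860, reserved_highatomic 8192), the old check passes a normal allocation while the new check rejects it.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a memory zone. All counts are in pages. */
struct zone_model {
    long free_pages;             /* every free page, reserved or not */
    long nr_reserved_highatomic; /* held back for high-order atomic requests */
    long free_cma;               /* free pages that sit in CMA areas */
    long watermark_min;
};

/* Free pages that this particular allocation is not allowed to take. */
static long unusable_free(const struct zone_model *z, bool alloc_harder,
                          bool alloc_cma)
{
    long unusable = 0;

    if (!alloc_harder)
        unusable += z->nr_reserved_highatomic;
    if (!alloc_cma)
        unusable += z->free_cma;
    return unusable;
}

/* Model of the old fast check: only CMA pages are discounted. */
static bool watermark_fast_old(const struct zone_model *z, bool alloc_cma)
{
    long usable = z->free_pages - (alloc_cma ? 0 : z->free_cma);

    return usable > z->watermark_min;
}

/* Model of the fast check with the highatomic reserve discounted as well. */
static bool watermark_fast_new(const struct zone_model *z, bool alloc_harder,
                               bool alloc_cma)
{
    long usable = z->free_pages - unusable_free(z, alloc_harder, alloc_cma);

    return usable > z->watermark_min;
}
```

In this model an allocation that is allowed to dip into the reserve (`alloc_harder` set) still passes, which matches the intent that only ordinary allocations are held back.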

…locked() OpenAnolis Bug Tracker: 0000546 commit da74240eb3fcd806edb1643874363e954d9e948b upstream. Commit 3fea5a4 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") introduced a bug in __add_to_page_cache_locked() causing the following splat: page dumped because: VM_BUG_ON_PAGE(page_memcg(page)) pages's memcg:ffff8889a4116000 ------------[ cut here ]------------ kernel BUG at mm/memcontrol.c:2924! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 35 PID: 12345 Comm: cat Tainted: G S W I 5.11.0-rc4-debug+ #1 Hardware name: HP HP Z8 G4 Workstation/81C7, BIOS P60 v01.25 12/06/2017 RIP: commit_charge+0xf4/0x130 Call Trace: mem_cgroup_charge+0x175/0x770 __add_to_page_cache_locked+0x712/0xad0 add_to_page_cache_lru+0xc5/0x1f0 cachefiles_read_or_alloc_pages+0x895/0x2e10 [cachefiles] __fscache_read_or_alloc_pages+0x6c0/0xa00 [fscache] __nfs_readpages_from_fscache+0x16d/0x630 [nfs] nfs_readpages+0x24e/0x540 [nfs] read_pages+0x5b1/0xc40 page_cache_ra_unbounded+0x460/0x750 generic_file_buffered_read_get_pages+0x290/0x1710 generic_file_buffered_read+0x2a9/0xc30 nfs_file_read+0x13f/0x230 [nfs] new_sync_read+0x3af/0x610 vfs_read+0x339/0x4b0 ksys_read+0xf1/0x1c0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Before that commit, there was a try_charge() and commit_charge() in __add_to_page_cache_locked(). These two separated charge functions were replaced by a single mem_cgroup_charge(). However, it forgot to add a matching mem_cgroup_uncharge() when the xarray insertion failed with the page released back to the pool. Fix this by adding a mem_cgroup_uncharge() call when insertion error happens. 
Link: https://lkml.kernel.org/r/[email protected] Fixes: 3fea5a4 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") Signed-off-by: Waiman Long <[email protected]> Reviewed-by: Alex Shi <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Miaohe Lin <[email protected]> Cc: Muchun Song <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: zhongjiang-ali <[email protected]> Acked-by: Xunlei Pang <[email protected]> Signed-off-by: Xunlei Pang <[email protected]>
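The charge/uncharge pairing described above can be sketched in plain C. The types and functions below (`memcg_model`, `page_model`, `charge`, `uncharge`, `add_to_cache`) are hypothetical stand-ins, not the kernel API; the `insert_fails` flag simulates a failed xarray insertion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a memory cgroup and a page. */
struct memcg_model { long nr_charged; };
struct page_model  { struct memcg_model *memcg; };

/* Models the single charge call: the page now belongs to the cgroup. */
static void charge(struct page_model *page, struct memcg_model *cg)
{
    page->memcg = cg;
    cg->nr_charged++;
}

/* Models the matching uncharge: the page is detached from the cgroup. */
static void uncharge(struct page_model *page)
{
    if (page->memcg) {
        page->memcg->nr_charged--;
        page->memcg = NULL;
    }
}

/* Returns 0 on success, -1 if the (simulated) insertion fails. */
static int add_to_cache(struct page_model *page, struct memcg_model *cg,
                        bool insert_fails)
{
    charge(page, cg);
    if (insert_fails) {
        /* Without this call the page would go back to the pool still charged. */
        uncharge(page);
        return -1;
    }
    return 0;
}
```

After a failed insertion the page carries no cgroup pointer, so a later charge of the same page starts from a clean state.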

OpenAnolis Bug Tracker: 0000546 task #29494601 commit 52f2347 upstream The slub_debug is able to fix the corrupted slab freelist/page. However, alloc_debug_processing() only checks the validity of current and next freepointer during allocation path. As a result, once some objects have their freepointers corrupted, deactivate_slab() may lead to page fault. Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the freepointer of one free object on purpose. Unfortunately, deactivate_slab() does not detect it when iterating the freechain. BUG: unable to handle page fault for address: 00000000123456f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... ... RIP: 0010:deactivate_slab.isra.92+0xed/0x490 ... ... Call Trace: ___slab_alloc+0x536/0x570 __slab_alloc+0x17/0x30 __kmalloc+0x1d9/0x200 ext4_htree_store_dirent+0x30/0xf0 htree_dirblock_to_tree+0xcb/0x1c0 ext4_htree_fill_tree+0x1bc/0x2d0 ext4_readdir+0x54f/0x920 iterate_dir+0x88/0x190 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x49/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Therefore, this patch adds extra consistency check in deactivate_slab(). Once an object's freepointer is corrupted, all following objects starting at this object are isolated. [[email protected]: fix build with CONFIG_SLAB_DEBUG=n] Signed-off-by: Dongli Zhang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Joe Jin <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: zhongjiang-ali <[email protected]> Acked-by: Xunlei Pang <[email protected]> Acked-by: Joseph Qi <[email protected]>
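The idea of validating every free pointer while walking the chain, and cutting the chain at the first bad link, can be sketched as follows. This is a toy model with assumptions: the slab is an array of `NR_OBJECTS` slots, free pointers are array indices, and all names are hypothetical. The kernel works on real pointers and uses its own validity check.

```c
#define NR_OBJECTS   8
#define FREELIST_END (-1)

/* Toy slab: next_free[i] is the index of the free object after object i. */
struct slab_model {
    int next_free[NR_OBJECTS];
};

static int valid_index(int idx)
{
    return idx >= 0 && idx < NR_OBJECTS;
}

/*
 * Walk the free chain starting at head and return how many objects can be
 * reused safely. If an object's free pointer is corrupted, that object and
 * everything after it are isolated by terminating the chain before it.
 */
static int walk_freelist(struct slab_model *slab, int head)
{
    int count = 0;
    int prev = FREELIST_END;

    while (head != FREELIST_END && count < NR_OBJECTS) {
        int next;

        if (!valid_index(head))
            break;
        next = slab->next_free[head];
        if (next != FREELIST_END && !valid_index(next)) {
            /* Corrupted link: cut the chain in front of this object. */
            if (prev != FREELIST_END)
                slab->next_free[prev] = FREELIST_END;
            break;
        }
        count++;
        prev = head;
        head = next;
    }
    return count;
}
```

The `count < NR_OBJECTS` bound is an extra guard in this sketch so that a corrupted pointer forming a cycle cannot loop forever.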

OpenAnolis Bug Tracker: 0000566
commit 77bb2f2 upstream.
There is one corner case in dma_fence_signal_locked which raises a NULL pointer dereference, as shown below.
->dma_fence_signal
    ->dma_fence_signal_locked
        ->test_and_set_bit
Here dma_fence_release is triggered because the fence refcount drops to zero.
->dma_fence_put
    ->dma_fence_release
        ->drm_sched_fence_release_scheduled
            ->call_rcu
Here the union field “cb_list” of the finished fence is set to NULL, because struct rcu_head contains two pointers, the same layout as struct list_head cb_list.
Therefore, hold a reference to the finished fence in drm_sched_process_job to prevent the NULL pointer dereference while dma_fence_signal runs on the finished fence.
[ 732.912867] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 732.914815] #PF: supervisor write access in kernel mode
[ 732.915731] #PF: error_code(0x0002) - not-present page
[ 732.916621] PGD 0 P4D 0
[ 732.917072] Oops: 0002 [#1] SMP PTI
[ 732.917682] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G OE 5.4.0-rc7 #1
[ 732.918980] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 732.920906] RIP: 0010:dma_fence_signal_locked+0x3e/0x100
[ 732.938569] Call Trace:
[ 732.939003] <IRQ>
[ 732.939364] dma_fence_signal+0x29/0x50
[ 732.940036] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[ 732.940996] drm_sched_process_job+0x34/0xa0 [gpu_sched]
[ 732.941910] dma_fence_signal_locked+0x85/0x100
[ 732.942692] dma_fence_signal+0x29/0x50
[ 732.943457] amdgpu_fence_process+0x99/0x120 [amdgpu]
[ 732.944393] sdma_v4_0_process_trap_irq+0x81/0xa0 [amdgpu]
v2: hold the finished fence at drm_sched_process_job instead of amdgpu_fence_process
v3: resume the blank line
Signed-off-by: Yintian Tao <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Shannon Zhao <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Signed-off-by: Xunlei Pang <[email protected]>
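Why taking an extra reference around the signalling step helps can be shown with a minimal single-threaded refcount model. Everything here is an assumption made for illustration: `fence_model`, `fence_get`, `fence_put` and `process_job` are hypothetical names, and the "other path dropping the last reference" is simulated by a plain call rather than real concurrency.

```c
#include <stdbool.h>

/* Hypothetical refcounted object standing in for the finished fence. */
struct fence_model {
    int  refcount;
    bool released; /* set once the last reference is dropped */
};

static void fence_get(struct fence_model *f)
{
    f->refcount++;
}

static void fence_put(struct fence_model *f)
{
    if (--f->refcount == 0)
        f->released = true;
}

/*
 * Returns true if the fence is still alive at the point where the
 * signalling code would touch it.
 */
static bool process_job(struct fence_model *finished, bool hold_ref)
{
    bool alive;

    if (hold_ref)
        fence_get(finished);

    /* Simulates another path dropping what used to be the last reference. */
    fence_put(finished);

    alive = !finished->released;

    if (hold_ref)
        fence_put(finished); /* the release happens only now */
    return alive;
}
```

With `hold_ref` set, the release is deferred until the signalling step has finished, which is the ordering the commit message asks for.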

OpenAnolis Bug Tracker: 0000566 fix #31401845 commit a0e4001 upstream Raven provides retimer feature support that requires i2c interaction in order to make it work well, all settings required for this configuration are loaded from the Atom bios which include the i2c address. If the retimer feature is not available, we should abort the attempt to set this feature, otherwise, it makes the following line return I2C_CHANNEL_OPERATION_NO_RESPONSE: i2c_success = i2c_write(pipe_ctx, slave_address, buffer, sizeof(buffer)); ... if (!i2c_success) ASSERT(i2c_success); This ends up causing problems with hotplugging HDMI displays on Raven, and causes retimer settings to warn like so: WARNING: CPU: 1 PID: 429 at drivers/gpu/drm/amd/amdgpu/../dal/dc/core/dc_link.c:1998 write_i2c_retimer_setting+0xc2/0x3c0 [amdgpu] Modules linked in: edac_mce_amd ccp kvm irqbypass binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel amdgpu(+) snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel snd_seq amd_iommu_v2 gpu_sched aes_x86_64 crypto_simd cryptd glue_helper snd_seq_device ttm drm_kms_helper snd_timer eeepc_wmi wmi_bmof asus_wmi sparse_keymap drm mxm_wmi snd k10temp fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore joydev input_leds mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 igb i2c_algo_bit hid_generic usbhid i2c_piix4 dca ahci hid libahci video wmi gpio_amdpt gpio_generic CPU: 1 PID: 429 Comm: systemd-udevd Tainted: G W 5.2.0-rc1sept162019+ #1 Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING, BIOS 2605 08/06/2019 RIP: 0010:write_i2c_retimer_setting+0xc2/0x3c0 [amdgpu] Code: ff 0f b6 4d ce 44 0f b6 45 cf 44 0f b6 c8 45 89 cf 44 89 e2 48 c7 c6 f0 34 bc c0 bf 04 00 00 00 e8 63 b0 90 ff 45 84 ff 75 02 <0f> 0b 42 0f b6 04 73 8d 50 f6 80 fa 02 77 8c 3c 0a 0f 85 c8 00 00 RSP: 
0018:ffffa99d02726fd0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa99d02727035 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff976acc857440 RBP: ffffa99d02727018 R08: 0000000000000002 R09: 000000000002a600 R10: ffffe90610193680 R11: 00000000000005e3 R12: 000000000000005d R13: ffff976ac4b201b8 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f14f99e1680(0000) GS:ffff976acc840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdf212843b8 CR3: 0000000408906000 CR4: 00000000003406e0 Call Trace: core_link_enable_stream+0x626/0x680 [amdgpu] dce110_apply_ctx_to_hw+0x414/0x4e0 [amdgpu] dc_commit_state+0x331/0x5e0 [amdgpu] ? drm_calc_timestamping_constants+0xf9/0x150 [drm] amdgpu_dm_atomic_commit_tail+0x395/0x1e00 [amdgpu] ? dm_plane_helper_prepare_fb+0x20c/0x280 [amdgpu] commit_tail+0x42/0x70 [drm_kms_helper] drm_atomic_helper_commit+0x10c/0x120 [drm_kms_helper] amdgpu_dm_atomic_commit+0x95/0xa0 [amdgpu] drm_atomic_commit+0x4a/0x50 [drm] restore_fbdev_mode_atomic+0x1c0/0x1e0 [drm_kms_helper] restore_fbdev_mode+0x4c/0x160 [drm_kms_helper] ? 
_cond_resched+0x19/0x40 drm_fb_helper_restore_fbdev_mode_unlocked+0x4e/0xa0 [drm_kms_helper] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] fbcon_init+0x471/0x630 visual_init+0xd5/0x130 do_bind_con_driver+0x20a/0x430 do_take_over_console+0x7d/0x1b0 do_fbcon_takeover+0x5c/0xb0 fbcon_event_notify+0x6cd/0x8a0 notifier_call_chain+0x4c/0x70 blocking_notifier_call_chain+0x43/0x60 fb_notifier_call_chain+0x1b/0x20 register_framebuffer+0x254/0x360 __drm_fb_helper_initial_config_and_unlock+0x2c5/0x510 [drm_kms_helper] drm_fb_helper_initial_config+0x35/0x40 [drm_kms_helper] amdgpu_fbdev_init+0xcd/0x100 [amdgpu] amdgpu_device_init+0x1156/0x1930 [amdgpu] amdgpu_driver_load_kms+0x8d/0x2e0 [amdgpu] drm_dev_register+0x12b/0x1c0 [drm] amdgpu_pci_probe+0xd3/0x160 [amdgpu] local_pci_probe+0x47/0xa0 pci_device_probe+0x142/0x1b0 really_probe+0xf5/0x3d0 driver_probe_device+0x11b/0x130 device_driver_attach+0x58/0x60 __driver_attach+0xa3/0x140 ? device_driver_attach+0x60/0x60 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x74/0xb0 ? kmem_cache_alloc_trace+0x1a3/0x1c0 driver_attach+0x1e/0x20 bus_add_driver+0x147/0x220 ? 0xffffffffc0cb9000 driver_register+0x60/0x100 ? 0xffffffffc0cb9000 __pci_register_driver+0x5a/0x60 amdgpu_init+0x74/0x83 [amdgpu] do_one_initcall+0x4a/0x1fa ? _cond_resched+0x19/0x40 ? kmem_cache_alloc_trace+0x3f/0x1c0 ? __vunmap+0x1cc/0x200 do_init_module+0x5f/0x227 load_module+0x2330/0x2b40 __do_sys_finit_module+0xfc/0x120 ? 
__do_sys_finit_module+0xfc/0x120 __x64_sys_finit_module+0x1a/0x20 do_syscall_64+0x5a/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f14f9500839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff9bc4f5a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000055afb5abce30 RCX: 00007f14f9500839 RDX: 0000000000000000 RSI: 000055afb5ace0f0 RDI: 0000000000000017 RBP: 000055afb5ace0f0 R08: 0000000000000000 R09: 000000000000000a R10: 0000000000000017 R11: 0000000000000246 R12: 0000000000000000 R13: 000055afb5aad800 R14: 0000000000020000 R15: 0000000000000000 ---[ end trace c286e96563966f08 ]--- This commit reworks the way that we handle i2c write for retimer in the way that we abort this configuration if the feature is not available in the device. For debug sake, we kept a simple log message in case the retimer is not available. Signed-off-by: Rodrigo Siqueira <[email protected]> Reviewed-by: Hersen Wu <[email protected]> Acked-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Jerry Yao <[email protected]> Acked-by: Baolin Wang <[email protected]> Signed-off-by: Xunlei Pang <[email protected]>

OpenAnolis Bug Tracker: 0000566 fix #31401845 commit 1d0e16a upstream Set ttm->sg to NULL after kfree, to avoid memory corruption backtrace: [ 420.932812] kernel BUG at /build/linux-do9eLF/linux-4.15.0/mm/slub.c:295! [ 420.934182] invalid opcode: 0000 [#1] SMP NOPTI [ 420.935445] Modules linked in: xt_conntrack ipt_MASQUERADE [ 420.951332] Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 1.5.4 07/09/2020 [ 420.952887] RIP: 0010:__slab_free+0x180/0x2d0 [ 420.954419] RSP: 0018:ffffbe426291fa60 EFLAGS: 00010246 [ 420.955963] RAX: ffff9e29263e9c30 RBX: ffff9e29263e9c30 RCX: 000000018100004b [ 420.957512] RDX: ffff9e29263e9c30 RSI: fffff3d33e98fa40 RDI: ffff9e297e407a80 [ 420.959055] RBP: ffffbe426291fb00 R08: 0000000000000001 R09: ffffffffc0d39ade [ 420.960587] R10: ffffbe426291fb20 R11: ffff9e49ffdd4000 R12: ffff9e297e407a80 [ 420.962105] R13: fffff3d33e98fa40 R14: ffff9e29263e9c30 R15: ffff9e2954464fd8 [ 420.963611] FS: 00007fa2ea097780(0000) GS:ffff9e297e840000(0000) knlGS:0000000000000000 [ 420.965144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 420.966663] CR2: 00007f16bfffefb8 CR3: 0000001ff0c62000 CR4: 0000000000340ee0 [ 420.968193] Call Trace: [ 420.969703] ? __page_cache_release+0x3c/0x220 [ 420.971294] ? amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu] [ 420.972789] kfree+0x168/0x180 [ 420.974353] ? amdgpu_ttm_tt_set_user_pages+0x64/0xc0 [amdgpu] [ 420.975850] ? 
kfree+0x168/0x180 [ 420.977403] amdgpu_ttm_tt_unpopulate+0x5e/0x80 [amdgpu] [ 420.978888] ttm_tt_unpopulate.part.10+0x53/0x60 [amdttm] [ 420.980357] ttm_tt_destroy.part.11+0x4f/0x60 [amdttm] [ 420.981814] ttm_tt_destroy+0x13/0x20 [amdttm] [ 420.983273] ttm_bo_cleanup_memtype_use+0x36/0x80 [amdttm] [ 420.984725] ttm_bo_release+0x1c9/0x360 [amdttm] [ 420.986167] amdttm_bo_put+0x24/0x30 [amdttm] [ 420.987663] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 420.989165] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x9ca/0xb10 [amdgpu] [ 420.990666] kfd_ioctl_alloc_memory_of_gpu+0xef/0x2c0 [amdgpu] Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Jerry Yao <[email protected]> Acked-by: Baolin Wang <[email protected]> Signed-off-by: Xunlei Pang <[email protected]>
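The pattern of clearing a pointer right after freeing it can be sketched in userspace C, with `free()` standing in for `kfree()`. The `tt_model` type and both functions are hypothetical; they only mirror the idea that a second teardown must not free the same memory again.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an object that owns a scatter/gather table. */
struct tt_model {
    void *sg;
};

/* Allocate the table; returns 0 on success, -1 on allocation failure. */
static int tt_populate(struct tt_model *tt)
{
    tt->sg = malloc(64);
    return tt->sg ? 0 : -1;
}

/* Free the table and clear the pointer so a repeated call is harmless. */
static void tt_unpopulate(struct tt_model *tt)
{
    free(tt->sg);
    tt->sg = NULL; /* free(NULL) is a no-op, so a second call cannot double free */
}
```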

ANBZ: #26 commit 97abb2b upstream Add the commit description from patch #1 as stand-alone documentation under Documentation/bpf, as that may be a more convenient format in the long term. Suggested-by: Stanislav Fomichev <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Qiao Ma <[email protected]> Acked-by: Tony Lu <[email protected]>

ANBZ: #208 commit c1e63117711977cc4295b2ce73de29dd17066c82 upstream. To clear a user buffer we cannot simply use memset, we have to use clear_user(). With a virtio-mem device that registers a vmcore_cb and has some logically unplugged memory inside an added Linux memory block, I can easily trigger a BUG by copying the vmcore via "cp": systemd[1]: Starting Kdump Vmcore Save Service... kdump[420]: Kdump is using the default log level(3). kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/ kdump[465]: saving vmcore-dmesg.txt complete kdump[467]: saving vmcore BUG: unable to handle page fault for address: 00007f2374e01000 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867 Oops: 0003 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014 RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86 Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81 RSP: 0018:ffffc9000073be08 EFLAGS: 00010212 RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000 RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008 RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50 R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000 R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8 FS: 00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0 Call Trace: read_vmcore+0x236/0x2c0 proc_reg_read+0x55/0xa0 vfs_read+0x95/0x190 
ksys_read+0x4f/0xc0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access Prevention (SMAP)", which is used to detect wrong access from the kernel to user buffers like this: SMAP triggers a permissions violation on wrong access. In the x86-64 variant of clear_user(), SMAP is properly handled via clac()+stac(). To fix, properly use clear_user() when we're dealing with a user buffer. Link: https://lkml.kernel.org/r/[email protected] Fixes: 997c136 ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages") Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Hongnan Li <[email protected]> Reviewed-by: Joseph Qi <[email protected]>

ANBZ: #219 commit 92063f3ca73aab794bd5408d3361fd5b5ea33079 upstream. The kernel may be built with multiple LSMs, but only a subset may be enabled on the boot command line by specifying "lsm=". Not including "integrity" on the ordered LSM list may result in a NULL deref. As reported by Dmitry Vyukov: in qemu: qemu-system-x86_64 -enable-kvm -machine q35,nvdimm -cpu max,migratable=off -smp 4 -m 4G,slots=4,maxmem=16G -hda wheezy.img -kernel arch/x86/boot/bzImage -nographic -vga std -soundhw all -usb -usbdevice tablet -bt hci -bt device:keyboard -net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic,model=virtio-net-pci -object memory-backend-file,id=pmem1,share=off,mem-path=/dev/zero,size=64M -device nvdimm,id=nvdimm1,memdev=pmem1 -append "console=ttyS0 root=/dev/sda earlyprintk=serial rodata=n oops=panic panic_on_warn=1 panic=86400 lsm=smack numa=fake=2 nopcid dummy_hcd.num=8" -pidfile vm_pid -m 2G -cpu host But it crashes on NULL deref in integrity_inode_get during boot: Run /sbin/init as init process BUG: kernel NULL pointer dereference, address: 000000000000001c PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc2+ #97 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-44-g88ab0c15525c-prebuilt.qemu.org 04/01/2014 RIP: 0010:kmem_cache_alloc+0x2b/0x370 mm/slub.c:2920 Code: 57 41 56 41 55 41 54 41 89 f4 55 48 89 fd 53 48 83 ec 10 44 8b 3d d9 1f 90 0b 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 <8b> 5f 1c 4cf RSP: 0000:ffffc9000032f9d8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888017fc4f00 RCX: 0000000000000000 RDX: ffff888040220000 RSI: 0000000000000c40 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff888019263627 R10: ffffffff83937cd1 R11: 0000000000000000 R12: 0000000000000c40 R13: ffff888019263538 R14: 0000000000000000 R15: 0000000000ffffff FS: 0000000000000000(0000) GS:ffff88802d180000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
CR2: 000000000000001c CR3: 000000000b48e000 CR4: 0000000000750ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: integrity_inode_get+0x47/0x260 security/integrity/iint.c:105 process_measurement+0x33d/0x17e0 security/integrity/ima/ima_main.c:237 ima_bprm_check+0xde/0x210 security/integrity/ima/ima_main.c:474 security_bprm_check+0x7d/0xa0 security/security.c:845 search_binary_handler fs/exec.c:1708 [inline] exec_binprm fs/exec.c:1761 [inline] bprm_execve fs/exec.c:1830 [inline] bprm_execve+0x764/0x19a0 fs/exec.c:1792 kernel_execve+0x370/0x460 fs/exec.c:1973 try_to_run_init_process+0x14/0x4e init/main.c:1366 kernel_init+0x11d/0x1b8 init/main.c:1477 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Modules linked in: CR2: 000000000000001c ---[ end trace 22d601a500de7d79 ]--- Since LSMs and IMA may be configured at build time, but not enabled at run time, panic the system if "integrity" was not initialized before use. Reported-by: Dmitry Vyukov <[email protected]> Fixes: 79f7865 ("LSM: Introduce "lsm=" for boottime LSM selection") Cc: [email protected] Signed-off-by: Mimi Zohar <[email protected]> Signed-off-by: Tianjia Zhang <[email protected]> Reviewed-by: Shile Zhang <[email protected]> Reviewed-by: Jia Zhang <[email protected]>

ANBZ: #239 commit a850e932df657c11f2030920dbda5f5621cef091 upstream. On non-preemptible kernel builds the watchdog can complain about soft lockups when vfree() is called against large vmalloc areas: [ 210.851798] kvmalloc-test: vmalloc(2199023255552) succeeded [ 238.654842] watchdog: BUG: soft lockup - CPU#181 stuck for 26s! [rmmod:5203] [ 238.662716] Modules linked in: kvmalloc_test(OE-) ... [ 238.772671] CPU: 181 PID: 5203 Comm: rmmod Tainted: G S OE 5.13.0-rc7+ #1 [ 238.781413] Hardware name: Intel Corporation PURLEY/PURLEY, BIOS PLYXCRB1.86B.0553.D01.1809190614 09/19/2018 [ 238.792383] RIP: 0010:free_unref_page+0x52/0x60 [ 238.797447] Code: 48 c1 fd 06 48 89 ee e8 9c d0 ff ff 84 c0 74 19 9c 41 5c fa 48 89 ee 48 89 df e8 b9 ea ff ff 41 f7 c4 00 02 00 00 74 01 fb 5b <5d> 41 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 f0 29 77 [ 238.818406] RSP: 0018:ffffb4d87868fe98 EFLAGS: 00000206 [ 238.824236] RAX: 0000000000000000 RBX: 000000001da0c945 RCX: ffffb4d87868fe40 [ 238.832200] RDX: ffffd79d3beed108 RSI: ffffd7998501dc08 RDI: ffff9c6fbffd7010 [ 238.840166] RBP: 000000000d518cbd R08: ffffd7998501dc08 R09: 0000000000000001 [ 238.848131] R10: 0000000000000000 R11: ffffd79d3beee088 R12: 0000000000000202 [ 238.856095] R13: ffff9e5be3eceec0 R14: 0000000000000000 R15: 0000000000000000 [ 238.864059] FS: 00007fe082c2d740(0000) GS:ffff9f4c69b40000(0000) knlGS:0000000000000000 [ 238.873089] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 238.879503] CR2: 000055a000611128 CR3: 000000f6094f6006 CR4: 00000000007706e0 [ 238.887467] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 238.895433] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 238.903397] PKRU: 55555554 [ 238.906417] Call Trace: [ 238.909149] __vunmap+0x17c/0x220 [ 238.912851] __x64_sys_delete_module+0x13a/0x250 [ 238.918008] ? 
syscall_trace_enter.isra.20+0x13c/0x1b0 [ 238.923746] do_syscall_64+0x39/0x80 [ 238.927740] entry_SYSCALL_64_after_hwframe+0x44/0xae Like in other range zapping routines that iterate over a large list, lets just add cond_resched() within __vunmap()'s page-releasing loop in order to avoid the watchdog splats. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rafael Aquini <[email protected]> Acked-by: Nicholas Piggin <[email protected]> Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Aaron Tomlin <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Reviewed-by: Xunlei Pang <[email protected]> Signed-off-by: Guixin Liu <[email protected]>
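The shape of the change, a scheduling point inside a long page-releasing loop, can be sketched in userspace. This is a loose analogy under stated assumptions: `cond_resched_model` is a hypothetical counter-only stand-in for the kernel's cond_resched(), and `release_pages_model` is not the real __vunmap() loop.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts how often the loop offered to give up the CPU. */
static unsigned long resched_calls;

/* Hypothetical stand-in for cond_resched(); it only records the call. */
static void cond_resched_model(void)
{
    resched_calls++;
}

/* Free every page in the array, offering a scheduling point after each one. */
static size_t release_pages_model(void **pages, size_t nr_pages)
{
    size_t i;

    for (i = 0; i < nr_pages; i++) {
        free(pages[i]);
        pages[i] = NULL;
        cond_resched_model();
    }
    return nr_pages;
}
```

In the kernel the scheduling point only yields when a reschedule is actually pending, so the cost on the common path is small.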

ANBZ: #442 commit 380a0091cab482489e9b19e07f2a166ad2b76d5c upstream. We got issue as follows when run syzkaller: [ 167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user [ 167.938306] EXT4-fs (loop0): Remounting filesystem read-only [ 167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0' [ 167.983601] ------------[ cut here ]------------ [ 167.984245] kernel BUG at fs/ext4/inode.c:847! [ 167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G B 5.16.0-rc5-next-20211217+ #123 [ 167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 167.988590] RIP: 0010:ext4_getblk+0x17e/0x504 [ 167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244 [ 167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282 [ 167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000 [ 167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66 [ 167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09 [ 167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8 [ 167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001 [ 167.997158] FS: 00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000 [ 167.998238] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0 [ 167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 168.001899] Call Trace: [ 168.002235] <TASK> [ 168.007167] ext4_bread+0xd/0x53 [ 168.007612] 
ext4_quota_write+0x20c/0x5c0 [ 168.010457] write_blk+0x100/0x220 [ 168.010944] remove_free_dqentry+0x1c6/0x440 [ 168.011525] free_dqentry.isra.0+0x565/0x830 [ 168.012133] remove_tree+0x318/0x6d0 [ 168.014744] remove_tree+0x1eb/0x6d0 [ 168.017346] remove_tree+0x1eb/0x6d0 [ 168.019969] remove_tree+0x1eb/0x6d0 [ 168.022128] qtree_release_dquot+0x291/0x340 [ 168.023297] v2_release_dquot+0xce/0x120 [ 168.023847] dquot_release+0x197/0x3e0 [ 168.024358] ext4_release_dquot+0x22a/0x2d0 [ 168.024932] dqput.part.0+0x1c9/0x900 [ 168.025430] __dquot_drop+0x120/0x190 [ 168.025942] ext4_clear_inode+0x86/0x220 [ 168.026472] ext4_evict_inode+0x9e8/0xa22 [ 168.028200] evict+0x29e/0x4f0 [ 168.028625] dispose_list+0x102/0x1f0 [ 168.029148] evict_inodes+0x2c1/0x3e0 [ 168.030188] generic_shutdown_super+0xa4/0x3b0 [ 168.030817] kill_block_super+0x95/0xd0 [ 168.031360] deactivate_locked_super+0x85/0xd0 [ 168.031977] cleanup_mnt+0x2bc/0x480 [ 168.033062] task_work_run+0xd1/0x170 [ 168.033565] do_exit+0xa4f/0x2b50 [ 168.037155] do_group_exit+0xef/0x2d0 [ 168.037666] __x64_sys_exit_group+0x3a/0x50 [ 168.038237] do_syscall_64+0x3b/0x90 [ 168.038751] entry_SYSCALL_64_after_hwframe+0x44/0xae In order to reproduce this problem, the following conditions need to be met: 1. Ext4 filesystem with no journal; 2. Filesystem image with incorrect quota data; 3. Abort filesystem forced by user; 4. umount filesystem; As in ext4_quota_write: ... if (EXT4_SB(sb)->s_journal && !handle) { ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)" " cancelled because transaction is not started", (unsigned long long)off, (unsigned long long)len); return -EIO; } ... We only check handle if NULL when filesystem has journal. There is need check handle if NULL even when filesystem has no journal. 
Signed-off-by: Ye Bin <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Hongnan Li <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
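The difference between the two checks can be modelled in user space. This is a minimal sketch, not the ext4 code: `struct fs_model`, `quota_write_check_old()` and `quota_write_check_new()` are hypothetical names, and the "new" variant simply assumes the fix is to reject a NULL handle regardless of whether a journal exists, as the message above asks for.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the superblock state that matters here. */
struct fs_model {
    bool has_journal;   /* models EXT4_SB(sb)->s_journal != NULL */
};

/* Old behaviour: a NULL handle is only rejected on journalled filesystems. */
static int quota_write_check_old(const struct fs_model *fs, const void *handle)
{
    if (fs->has_journal && handle == NULL)
        return -EIO;
    return 0;
}

/* Behaviour the message asks for: a NULL handle is always rejected. */
static int quota_write_check_new(const struct fs_model *fs, const void *handle)
{
    (void)fs;           /* journal state no longer influences the check */
    if (handle == NULL)
        return -EIO;
    return 0;
}
```

With a journal-less filesystem and a NULL handle, the old check lets the write proceed toward the assertion seen in the trace, while the new check returns -EIO first.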

ANBZ: #543 commit b43a9e76b4cc78cdaa8c809dd31cd452797b7661 upstream. Boyang reported that the commit c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes") causes the kernel to crash while running xfstests generic/256 on ext4 on aarch64 and ppc64le. run fstests generic/256 at 2021-07-12 05:41:40 EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: . Quota mode: none. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x96000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 64k pages, 48-bit VAs, pgdp=00000000b0502000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP Modules linked in: dm_flakey dm_snapshot dm_bufio dm_zero dm_mod loop tls rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc ext4 vfat fat mbcache jbd2 drm fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_blk virtio_net net_failover virtio_console failover virtio_mmio aes_neon_bs [last unloaded: scsi_debug] CPU: 0 PID: 408468 Comm: kworker/u8:5 Tainted: G X --------- --- 5.14.0-0.rc1.15.bx.el9.aarch64 #1 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Workqueue: events_unbound cleanup_offline_cgwbs_workfn pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO BTYPE=--) pc : cleanup_offline_cgwbs_workfn+0x320/0x394 lr : cleanup_offline_cgwbs_workfn+0xe0/0x394 sp : ffff80001554fd10 x29: ffff80001554fd10 x28: 0000000000000000 x27: 0000000000000001 x26: 0000000000000000 x25: 00000000000000e0 x24: ffffd2a2fbe671a8 x23: ffff80001554fd88 x22: ffffd2a2fbe67198 x21: ffffd2a2fc25a730 x20: ffff210412bc3000 x19: ffff210412bc3280 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 
x13: 0000000000000030 x12: 0000000000000040
x11: ffff210481572238 x10: ffff21048157223a x9 : ffffd2a2fa276c60
x8 : ffff210484106b60 x7 : 0000000000000000 x6 : 000000000007d18a
x5 : ffff210416a86400 x4 : ffff210412bc0280 x3 : 0000000000000000
x2 : ffff80001554fd88 x1 : ffff210412bc0280 x0 : 0000000000000003
Call trace:
 cleanup_offline_cgwbs_workfn+0x320/0x394
 process_one_work+0x1f4/0x4b0
 worker_thread+0x184/0x540
 kthread+0x114/0x120
 ret_from_fork+0x10/0x18
Code: d63f0020 97f99963 17ffffa6 f8588263 (f9400061)
---[ end trace e250fe289272792a ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0-2
Kernel Offset: 0x52a2e9fa0000 from 0xffff800010000000
PHYS_OFFSET: 0xfff0defca0000000
CPU features: 0x00200251,23200840
Memory Limit: none
---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

The problem happens when cgwb_release_workfn() races with cleanup_offline_cgwbs_workfn(): wb_tryget() in cleanup_offline_cgwbs_workfn() can be called after percpu_ref_exit() in cgwb_release_workfn(), which is basically a use-after-free error.

Fix the problem by removing the writeback structure from the offline list before releasing the percpu reference counter. This guarantees that cleanup_offline_cgwbs_workfn() will neither see nor access writeback structures which are about to be released.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Boyang Xue <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Tested-by: Darrick J. Wong <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Murphy Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
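The ordering rule in this fix (unlink from the list first, tear down the reference counter second) can be illustrated with a single-threaded user-space model. This is only a sketch under stated assumptions: `struct wb_model`, `wb_release()` and `cleanup_offline()` are hypothetical names, the array stands in for the offline list, and the locking that the kernel uses around the list is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a writeback structure. */
struct wb_model {
    bool on_offline_list;  /* still visible to the cleanup worker */
    bool ref_alive;        /* models the percpu ref before percpu_ref_exit() */
    int  refs;
};

/* Release path in the fixed order: unlink first, tear down the ref after. */
static void wb_release(struct wb_model *wb)
{
    wb->on_offline_list = false;
    wb->ref_alive = false;
}

/* Cleanup worker: only touches structures it can still find on the list. */
static int cleanup_offline(struct wb_model *wbs, size_t n)
{
    int visited = 0;

    for (size_t i = 0; i < n; i++) {
        if (!wbs[i].on_offline_list)
            continue;
        assert(wbs[i].ref_alive);  /* holds for every entry still listed */
        wbs[i].refs++;             /* models wb_tryget() */
        visited++;
        wbs[i].refs--;             /* models wb_put() */
    }
    return visited;
}
```

Because a released entry is no longer on the list, the worker never reaches the point where it would take a reference on a counter that has already been torn down.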

ANBZ: #543 commit 593311e85b26ecc6e4d45b6fb81b942b6672df09 upstream. The inode switching code is not suited for dax inodes. An attempt to switch a dax inode to a parent writeback structure (as a part of a writeback cleanup procedure) results in a panic like this: run fstests generic/270 at 2021-07-15 05:54:02 XFS (pmem0p2): EXPERIMENTAL big timestamp feature in use. Use at your own risk! XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk XFS (pmem0p2): EXPERIMENTAL inode btree counters feature in use. Use at your own risk! XFS (pmem0p2): Mounting V5 Filesystem XFS (pmem0p2): Ending clean mount XFS (pmem0p2): Quotacheck needed: Please wait. XFS (pmem0p2): Quotacheck: Done. XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) BUG: unable to handle page fault for address: 0000000005b0f669 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 13 PID: 10479 Comm: kworker/13:16 Not tainted 5.14.0-rc1-master-8096acd7442e+ #8 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/13/2016 Workqueue: inode_switch_wbs inode_switch_wbs_work_fn RIP: 0010:inode_do_switch_wbs+0xaf/0x470 Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85 RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002 RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0 RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228 R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130 R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0 FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000 CS: 
0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0 Call Trace: inode_switch_wbs_work_fn+0xb6/0x2a0 process_one_work+0x1e6/0x380 worker_thread+0x53/0x3d0 kthread+0x10f/0x130 ret_from_fork+0x22/0x30 Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit iTCO_wdt irqbypass drm_kms_helper iTCO_vendor_support acpi_ipmi rapl syscopyarea sysfillrect intel_cstate ipmi_si sysimgblt ioatdma dax_pmem_compat fb_sys_fops ipmi_devintf device_dax i2c_i801 pcspkr intel_uncore hpilo nd_pmem cec dax_pmem_core dca i2c_smbus acpi_tad lpc_ich ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel tg3 ghash_clmulni_intel serio_raw hpsa hpwdt scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod CR2: 0000000005b0f669 ---[ end trace ed2105faff8384f3 ]--- RIP: 0010:inode_do_switch_wbs+0xaf/0x470 Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85 RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002 RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0 RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228 R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130 R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0 FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0 Kernel panic - not syncing: 
Fatal exception
Kernel Offset: 0x15200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The crash happens on an attempt to iterate over attached pagecache pages and check the dirty flag: a dax inode's xarray contains PFNs instead of generic struct page pointers.

This happens for DAX and not for other kinds of non-page entries in the inodes because it's a tagged iteration, and shadow/swap entries are never tagged; only DAX entries get tagged.

Fix the problem by bailing out (with the false return value) of inode_prepare_wbs_switch() if a dax inode is passed.

[[email protected]: changelog addition]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Murphy Zhou <[email protected]>
Reported-by: Darrick J. Wong <[email protected]>
Tested-by: Darrick J. Wong <[email protected]>
Tested-by: Murphy Zhou <[email protected]>
Acked-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>

ANBZ: #691

reclaim_wmark() can run even after the corresponding memory cgroup has gone offline, so it is not safe to call css_get(), which requires that the caller already hold a reference to the memory cgroup. Otherwise it may cause a kernel panic like the one below:

[32258.819208] BUG: kernel NULL pointer dereference, address: 0000000000000000
[32258.827789] #PF: supervisor write access in kernel mode
[32258.834264] #PF: error_code(0x0002) - not-present page
[32258.840731] PGD 1f02aa067 P4D 1f02aa067 PUD 104837067 PMD 0
[32258.847688] Oops: 0002 [#1] SMP NOPTI
[32258.877042] Workqueue: memcg_wmark wmark_work_func
[32258.883330] RIP: 0010:reclaim_wmark+0x119/0x130
[32258.912162] RSP: 0018:ffffc90031157e60 EFLAGS: 00010206
[32258.918863] RAX: 0000000000000000 RBX: ffff889efeab0000 RCX: 0000000000000017
[32258.927814] RDX: 0000000000198340 RSI: 0000000003a1e7da RDI: 0018ad6ea0a759c0
[32258.936760] RBP: 00000036006cc432 R08: 0000000000004b2c R09: ffff88a053da2768
[32258.945683] R10: ffff88a05c271410 R11: 000000000383db12 R12: 00001d204c3bef5f
[32258.954617] R13: 0000000000000000 R14: ffff888102abc100 R15: 0000000000000000
[32258.963561] FS: 0000000000000000(0000) GS:ffff88fe7e5c0000(0000) knlGS:0000000000000000
[32258.973597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32258.980977] CR2: 0000000000000000 CR3: 000000011d0f6000 CR4: 00000000003506e0
[32258.989940] Call Trace:
[32258.993603] wmark_work_func+0x22/0x30
[32258.998729] process_one_work+0x1aa/0x340
[32259.004158] worker_thread+0x1f3/0x300
[32259.009307] ? process_one_work+0x340/0x340
[32259.014936] kthread+0x118/0x130
[32259.019467] ? __kthread_bind_mask+0x60/0x60
[32259.025200] ret_from_fork+0x1f/0x30

Use css_tryget_online() to fix this issue.

Fixes: 40969475355ab ("ck: mm, memcg: record latency of memcg wmark reclaim")
Signed-off-by: Gang Deng <[email protected]>
Reviewed-by: Xu Yu <[email protected]>
Acked-by: Xunlei Pang <[email protected]>
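The "try to get a reference, and give up if the object is already offline" pattern can be sketched in user space with C11 atomics. This is a rough model, not the cgroup implementation: `struct group_model`, `group_tryget_online()`, `group_put()` and `reclaim_model()` are hypothetical names, and the -1 return value is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a reference-counted group that can go offline. */
struct group_model {
    atomic_int  refs;
    atomic_bool online;
};

/* Take a reference only if the group is online and not already at zero. */
static bool group_tryget_online(struct group_model *g)
{
    int old = atomic_load(&g->refs);

    while (old > 0 && atomic_load(&g->online)) {
        /* On failure, old is reloaded and the conditions are rechecked. */
        if (atomic_compare_exchange_weak(&g->refs, &old, old + 1))
            return true;
    }
    return false;
}

static void group_put(struct group_model *g)
{
    atomic_fetch_sub(&g->refs, 1);
}

/* Deferred work that may run after the group has gone offline. */
static int reclaim_model(struct group_model *g)
{
    if (!group_tryget_online(g))
        return -1;          /* group is gone: skip the work entirely */
    /* ... reclaim work would run here while the reference is held ... */
    group_put(g);
    return 0;
}
```

An unconditional get would increment the counter even for an offline group; the try-get variant lets the worker back out without touching state that may be on its way to being freed.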

commit 349d43127dac00c15231e8ffbcaabd70f7b0e544 net-next.

A crash occurs when smc_cdc_tx_handler() tries to access smc_sock but smc_release() has already freed it.

[ 4570.695099] BUG: unable to handle page fault for address: 000000002eae9e88
[ 4570.696048] #PF: supervisor write access in kernel mode
[ 4570.696728] #PF: error_code(0x0002) - not-present page
[ 4570.697401] PGD 0 P4D 0
[ 4570.697716] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 4570.698228] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc4+ #111
[ 4570.699013] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/0
[ 4570.699933] RIP: 0010:_raw_spin_lock+0x1a/0x30
<...>
[ 4570.711446] Call Trace:
[ 4570.711746] <IRQ>
[ 4570.711992] smc_cdc_tx_handler+0x41/0xc0
[ 4570.712470] smc_wr_tx_tasklet_fn+0x213/0x560
[ 4570.712981] ? smc_cdc_tx_dismisser+0x10/0x10
[ 4570.713489] tasklet_action_common.isra.17+0x66/0x140
[ 4570.714083] __do_softirq+0x123/0x2f4
[ 4570.714521] irq_exit_rcu+0xc4/0xf0
[ 4570.714934] common_interrupt+0xba/0xe0

Though smc_cdc_tx_handler() checked the existence of the smc connection, smc_release() may have already dismissed and released the smc socket before smc_cdc_tx_handler() further visits it.

smc_cdc_tx_handler()           |smc_release()
if (!conn)                     |
                               |
                               |smc_cdc_tx_dismiss_slots()
                               |      smc_cdc_tx_dismisser()
                               |
                               |sock_put(&smc->sk) <- last sock_put,
                               |                      smc_sock freed
bh_lock_sock(&smc->sk) (panic) |

To make sure we won't receive any CDC messages after we free the smc_sock, add a refcount on the smc_connection for inflight CDC messages (posted to the QP but whose related CQE has not been received yet), and don't release the smc_connection until all the inflight CDC messages have completed, whether they succeeded or failed.

Using a refcount on CDC messages brings another problem: when the link is going to be destroyed, smcr_link_clear() will reset the QP, which then removes all the pending CQEs related to the QP in the CQ.
To make sure all the CQEs will always come back so the refcount on the smc_connection can always reach 0, smc_ib_modify_qp_reset() was replaced by smc_ib_modify_qp_error(). Also remove the timeout in smc_wr_tx_wait_no_pending_sends(), since we need to wait for all pending WQEs to complete, or we may encounter a use-after-free when handling CQEs.

For the IB device removal routine, we need to wait for all the QPs on that device to be destroyed before we can destroy the CQs on the device, or the refcount on smc_connection won't reach 0 and smc_sock cannot be released.

Fixes: 5f08318 ("smc: connection data control (CDC)")
Reported-by: Wen Gu <[email protected]>
Signed-off-by: Dust Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Acked-by: Tony Lu <[email protected]>
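The inflight-message accounting described in this commit can be illustrated with a small user-space model. It is a sketch only: `struct conn_model` and the `conn_*()` helpers are hypothetical names, and the model defers the free until the last completion arrives, which is one way to express "don't release until all inflight messages are done"; the message above describes waiting instead, so the mechanism here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a connection with posted-but-uncompleted messages. */
struct conn_model {
    int  inflight;   /* messages posted but not yet completed */
    bool closing;    /* release has been requested */
    bool freed;      /* connection memory is gone */
};

/* Free only when release was requested and nothing is inflight. */
static void conn_maybe_free(struct conn_model *c)
{
    if (c->closing && c->inflight == 0)
        c->freed = true;
}

static void conn_post(struct conn_model *c)
{
    assert(!c->freed);
    c->inflight++;
}

/* Completion handler: runs for successful and failed sends alike. */
static void conn_complete(struct conn_model *c)
{
    assert(!c->freed);   /* the bug was touching the socket after the free */
    c->inflight--;
    conn_maybe_free(c);
}

static void conn_release(struct conn_model *c)
{
    c->closing = true;
    conn_maybe_free(c);
}
```

The scheme only works if every posted message eventually produces a completion, which is why the commit also cares about CQEs not being discarded when the QP is torn down.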

ANBZ: #264

SMC connections might fail to be registered to a link group, for example because no link could be found to assign to the connection at creation. As a result, connection creation returns a failure and most resources related to the connection are not allocated or initialized, such as conn->abort_work or conn->lnk.

If smc_conn_free() is invoked later, it will try to access the resources related to the connection, which were never initialized, thus causing a panic.

Here is an example: an SMC-R connection failed to be registered to a link group and conn->lnk is NULL. The following crash happens when smc_conn_free() tries to access conn->lnk in smc_cdc_tx_dismiss_slots().

 BUG: kernel NULL pointer dereference, address: 0000000000000168
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 4 PID: 68 Comm: kworker/4:1 Kdump: loaded Tainted: G E 5.16.0-rc5+ #52
 Workqueue: smc_hs_wq smc_listen_work [smc]
 RIP: 0010:smc_wr_tx_dismiss_slots+0x1e/0xc0 [smc]
 Call Trace:
  <TASK>
  smc_conn_free+0xd8/0x100 [smc]
  smc_lgr_cleanup_early+0x15/0x90 [smc]
  smc_listen_work+0x302/0x1230 [smc]
  ? process_one_work+0x25c/0x600
  process_one_work+0x25c/0x600
  worker_thread+0x4f/0x3a0
  ? process_one_work+0x600/0x600
  kthread+0x15d/0x1a0
  ? set_kthread_struct+0x40/0x40
  ret_from_fork+0x1f/0x30
  </TASK>

This patch tries to fix this by resetting conn->lgr to NULL if smc_conn_create() exits abnormally because link group registration failed, thus avoiding the crash caused by accessing the uninitialized resources in smc_conn_free().

Signed-off-by: Wen Gu <[email protected]>
Acked-by: Tony Lu <[email protected]>
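The idea of clearing the link-group pointer on a failed create, so that the free path can recognise a half-built connection, can be sketched as follows. All names here (`struct smc_conn_model`, `conn_create()`, `conn_free()`) are hypothetical, and the early return in the free path on a NULL link group is an assumption of this model rather than something the message above states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lgr_model  { int id; };
struct link_model { int pending; };

/* Hypothetical model of a connection and the resources it points at. */
struct smc_conn_model {
    struct lgr_model  *lgr;
    struct link_model *lnk;
};

/* Returns 0 on success; on failure the connection keeps no lgr pointer. */
static int conn_create(struct smc_conn_model *c, struct lgr_model *lgr,
                       struct link_model *lnk)
{
    c->lgr = lgr;
    if (lnk == NULL) {       /* registration failed: no link to assign */
        c->lgr = NULL;       /* mark the connection as never registered */
        return -1;
    }
    c->lnk = lnk;
    return 0;
}

/* Returns true if resources were actually torn down. */
static bool conn_free(struct smc_conn_model *c)
{
    if (c->lgr == NULL)
        return false;        /* nothing was set up, so nothing to touch */
    c->lnk->pending = 0;     /* would dereference NULL without the guard */
    c->lgr = NULL;
    return true;
}
```

Calling `conn_free()` after a failed `conn_create()` is then a harmless no-op instead of a NULL dereference.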

ANBZ: #264

We encountered a crash in smc_setsockopt(), caused by accessing smc->clcsock after clcsock was released.

 BUG: kernel NULL pointer dereference, address: 0000000000000020
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E 5.16.0-rc4+ #53
 RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
 Call Trace:
  <TASK>
  __sys_setsockopt+0xfc/0x190
  __x64_sys_setsockopt+0x20/0x30
  do_syscall_64+0x34/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f16ba83918e
  </TASK>

This patch tries to fix it by holding clcsock_release_lock and checking whether clcsock has already been released. Since a crash with the same cause could also happen in smc_getsockopt(), this patch also checks smc->clcsock in smc_getsockopt().

Signed-off-by: Wen Gu <[email protected]>
Acked-by: Tony Lu <[email protected]>
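The "take the release lock, then check the pointer" pattern can be sketched in user space. This is a model under assumptions, not the SMC code: the names are hypothetical, a C11 `atomic_flag` spinlock stands in for the kernel mutex, and returning -EBADF for an already-released socket is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

struct inner_sock { int opt; };

/* Hypothetical model of a socket that wraps an inner socket. */
struct sock_model {
    atomic_flag        release_lock;  /* stand-in for clcsock_release_lock */
    struct inner_sock *clcsock;       /* NULL once released */
};

static void model_lock(struct sock_model *s)
{
    while (atomic_flag_test_and_set(&s->release_lock))
        ;  /* spin until the holder clears the flag */
}

static void model_unlock(struct sock_model *s)
{
    atomic_flag_clear(&s->release_lock);
}

/* Option setter: the pointer is only checked and used under the lock. */
static int model_setsockopt(struct sock_model *s, int val)
{
    int rc = 0;

    model_lock(s);
    if (s->clcsock == NULL)
        rc = -EBADF;              /* assumed error code for this sketch */
    else
        s->clcsock->opt = val;
    model_unlock(s);
    return rc;
}

/* Release path: clears the pointer under the same lock. */
static void model_release(struct sock_model *s)
{
    model_lock(s);
    s->clcsock = NULL;
    model_unlock(s);
}
```

Because both paths serialise on the same lock, the setter either sees a valid inner socket for the whole operation or sees NULL and backs out.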

ANBZ: #173 commit d298b03506d3e161f7492c440babb0bfae35e650 upstream. Ser Olmy reported a boot failure: init[1] bad frame in sigreturn frame:(ptrval) ip:b7c9fbe6 sp:bf933310 orax:ffffffff \ in libc-2.33.so[b7bed000+156000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b CPU: 0 PID: 1 Comm: init Tainted: G W 5.14.9 #1 Hardware name: Hewlett-Packard HP PC/HP Board, BIOS JD.00.06 12/06/2001 Call Trace: dump_stack_lvl dump_stack panic do_exit.cold do_group_exit get_signal arch_do_signal_or_restart ? force_sig_info_to_task ? force_sig exit_to_user_mode_prepare syscall_exit_to_user_mode do_int80_syscall_32 entry_INT80_32 on an old 32-bit Intel CPU: vendor_id : GenuineIntel cpu family : 6 model : 6 model name : Celeron (Mendocino) stepping : 5 microcode : 0x3 Ser bisected the problem to the commit in Fixes. tglx suggested reverting the rejection of invalid MXCSR values which this commit introduced and replacing it with what the old code did - simply masking them out to zero. Further debugging confirmed his suggestion: fpu->state.fxsave.mxcsr: 0xb7be13b4, mxcsr_feature_mask: 0xffbf WARNING: CPU: 0 PID: 1 at arch/x86/kernel/fpu/signal.c:384 __fpu_restore_sig+0x51f/0x540 so restore the original behavior only for 32-bit kernels where you have ancient machines with buggy hardware. For 32-bit programs on 64-bit kernels, user space which supplies wrong MXCSR values is considered malicious so fail the sigframe restoration there. Intel-SIG: commit d298b03506d3 x86/fpu: Restore the masking out of reserved MXCSR bits. This patch fixes an issue introduced in previous FPU cleanup. 
Fixes: 6f9866a166cd ("x86/fpu/signal: Let xrstor handle the features to init")
Reported-by: Ser Olmy <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Ser Olmy <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
[ Lin Wang: amend commit log ]
Signed-off-by: Lin Wang <[email protected]>
Reviewed-by: Artie Ding <[email protected]>
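The two policies described in this commit (mask reserved MXCSR bits on 32-bit kernels, reject them on 64-bit kernels) can be shown with plain integer arithmetic. This is an illustrative sketch: `restore_mxcsr()` and its `kernel_is_64bit` parameter are hypothetical, while the sample values are the ones printed in the debugging output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns false if the value must be rejected, true if it may be used.
 * On the 32-bit path the reserved bits are cleared in place.
 */
static bool restore_mxcsr(uint32_t *mxcsr, uint32_t feature_mask,
                          bool kernel_is_64bit)
{
    if (kernel_is_64bit) {
        /* Reserved bits set by user space: fail the restore. */
        if (*mxcsr & ~feature_mask)
            return false;
    } else {
        /* Old behaviour: silently mask the reserved bits out. */
        *mxcsr &= feature_mask;
    }
    return true;
}
```

With mxcsr 0xb7be13b4 and mask 0xffbf, the 32-bit path keeps 0x13b4 and succeeds, whereas the 64-bit path rejects the value because the upper 16 bits are reserved.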

ANBZ: #481 commit 8bad28d8a305b0e5ae444c8c3051e8744f5a4296 upstream. Abaci reported the below issue: [ 141.400455] hrtimer: interrupt took 205853 ns [ 189.869316] process 'usr/local/ilogtail/ilogtail_0.16.26' started with executable stack [ 250.188042] [ 250.188327] ============================================ [ 250.189015] WARNING: possible recursive locking detected [ 250.189732] 5.11.0-rc4 #1 Not tainted [ 250.190267] -------------------------------------------- [ 250.190917] a.out/7363 is trying to acquire lock: [ 250.191506] ffff888114dbcbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __io_req_task_submit+0x29/0xa0 [ 250.192599] [ 250.192599] but task is already holding lock: [ 250.193309] ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.194426] [ 250.194426] other info that might help us debug this: [ 250.195238] Possible unsafe locking scenario: [ 250.195238] [ 250.196019] CPU0 [ 250.196411] ---- [ 250.196803] lock(&ctx->uring_lock); [ 250.197420] lock(&ctx->uring_lock); [ 250.197966] [ 250.197966] *** DEADLOCK *** [ 250.197966] [ 250.198837] May be due to missing lock nesting notation [ 250.198837] [ 250.199780] 1 lock held by a.out/7363: [ 250.200373] #0: ffff888114dbfbe8 (&ctx->uring_lock){+.+.}-{3:3}, at: __x64_sys_io_uring_register+0xad/0x210 [ 250.201645] [ 250.201645] stack backtrace: [ 250.202298] CPU: 0 PID: 7363 Comm: a.out Not tainted 5.11.0-rc4 #1 [ 250.203144] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 250.203887] Call Trace: [ 250.204302] dump_stack+0xac/0xe3 [ 250.204804] __lock_acquire+0xab6/0x13a0 [ 250.205392] lock_acquire+0x2c3/0x390 [ 250.205928] ? __io_req_task_submit+0x29/0xa0 [ 250.206541] __mutex_lock+0xae/0x9f0 [ 250.207071] ? __io_req_task_submit+0x29/0xa0 [ 250.207745] ? 0xffffffffa0006083 [ 250.208248] ? __io_req_task_submit+0x29/0xa0 [ 250.208845] ? __io_req_task_submit+0x29/0xa0 [ 250.209452] ? 
__io_req_task_submit+0x5/0xa0 [ 250.210083] __io_req_task_submit+0x29/0xa0 [ 250.210687] io_async_task_func+0x23d/0x4c0 [ 250.211278] task_work_run+0x89/0xd0 [ 250.211884] io_run_task_work_sig+0x50/0xc0 [ 250.212464] io_sqe_files_unregister+0xb2/0x1f0 [ 250.213109] __io_uring_register+0x115a/0x1750 [ 250.213718] ? __x64_sys_io_uring_register+0xad/0x210 [ 250.214395] ? __fget_files+0x15a/0x260 [ 250.214956] __x64_sys_io_uring_register+0xbe/0x210 [ 250.215620] ? trace_hardirqs_on+0x46/0x110 [ 250.216205] do_syscall_64+0x2d/0x40 [ 250.216731] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 250.217455] RIP: 0033:0x7f0fa17e5239 [ 250.218034] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 48 [ 250.220343] RSP: 002b:00007f0fa1eeac48 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab [ 250.221360] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0fa17e5239 [ 250.222272] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000008 [ 250.223185] RBP: 00007f0fa1eeae20 R08: 0000000000000000 R09: 0000000000000000 [ 250.224091] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 250.224999] R13: 0000000000021000 R14: 0000000000000000 R15: 00007f0fa1eeb700 This is caused by calling io_run_task_work_sig() to do work under uring_lock while the caller io_sqe_files_unregister() already held uring_lock. To fix this issue, briefly drop uring_lock when calling io_run_task_work_sig(), and there are two things to concern: - hold uring_lock in io_ring_ctx_free() around io_sqe_files_unregister() this is for consistency of lock/unlock. - add new fixed rsrc ref node before dropping uring_lock it's not safe to do io_uring_enter-->percpu_ref_get() with a dying one. 
- check if rsrc_data->refs is dying to avoid parallel io_sqe_files_unregister

[ANCK backport notes]
This patch cannot be applied cleanly to 5.10 because the buffer registration enhancements introduced a lot of refactoring. So this backport takes the core idea, which is to mutex_unlock() before io_run_task_work_sig() and then mutex_lock() again in io_sqe_files_unregister(), to fix the issue.

Reported-by: Abaci <[email protected]>
Fixes: 1ffc54220c44 ("io_uring: fix io_sqe_files_unregister() hangs")
Suggested-by: Pavel Begunkov <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
[axboe: fixes from Pavel folded in]
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Joseph Qi <[email protected]>
Reviewed-by: Xiaoguang Wang <[email protected]>
Reviewed-by: Hao Xu <[email protected]>
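The core idea of the backport, dropping the lock around the task-work call and re-taking it afterwards, can be modelled in user space. This is a sketch under assumptions: `struct ring_model` and its helpers are hypothetical, and a boolean with assertions stands in for a non-recursive mutex so that a recursive acquisition shows up as an assertion failure instead of a hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a ring context with a non-recursive lock. */
struct ring_model {
    bool locked;        /* models uring_lock */
    int  pending_work;  /* queued task-work items */
    int  done_work;
};

static void ring_lock(struct ring_model *r)
{
    assert(!r->locked);  /* re-acquiring here is the reported deadlock */
    r->locked = true;
}

static void ring_unlock(struct ring_model *r)
{
    assert(r->locked);
    r->locked = false;
}

/* Task work takes the lock itself, like the submit path in the trace. */
static void run_task_work(struct ring_model *r)
{
    while (r->pending_work > 0) {
        ring_lock(r);
        r->pending_work--;
        r->done_work++;
        ring_unlock(r);
    }
}

/* Caller enters with the lock held and leaves with it held. */
static void files_unregister(struct ring_model *r)
{
    assert(r->locked);
    ring_unlock(r);      /* briefly drop the lock ... */
    run_task_work(r);
    ring_lock(r);        /* ... and take it back before continuing */
}
```

While the lock is dropped, other paths can run against the context, which is why the commit lists extra precautions to take before the unlock.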

ANBZ: #464 commit 79f9bc5843142b649575f887dccdf1c07ad75c20 upstream. Check for a NULL page->mapping before dereferencing the mapping in page_is_secretmem(), as the page's mapping can be nullified while gup() is running, e.g. by reclaim or truncation. BUG: kernel NULL pointer dereference, address: 0000000000000068 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 4173897 Comm: CPU 3/KVM Tainted: G W RIP: 0010:internal_get_user_pages_fast+0x621/0x9d0 Code: <48> 81 7a 68 80 08 04 bc 0f 85 21 ff ff 8 89 c7 be RSP: 0018:ffffaa90087679b0 EFLAGS: 00010046 RAX: ffffe3f37905b900 RBX: 00007f2dd561e000 RCX: ffffe3f37905b934 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe3f37905b900 ... CR2: 0000000000000068 CR3: 00000004c5898003 CR4: 00000000001726e0 Call Trace: get_user_pages_fast_only+0x13/0x20 hva_to_pfn+0xa9/0x3e0 try_async_pf+0xa1/0x270 direct_page_fault+0x113/0xad0 kvm_mmu_page_fault+0x69/0x680 vmx_handle_exit+0xe1/0x5d0 kvm_arch_vcpu_ioctl_run+0xd81/0x1c70 kvm_vcpu_ioctl+0x267/0x670 __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/[email protected] Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: Sean Christopherson <[email protected]> Reported-by: Darrick J. Wong <[email protected]> Reported-by: Stephen <[email protected]> Tested-by: Darrick J. Wong <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Yan Yan <[email protected]> Reviewed-by: Xu Yu <[email protected]> Reviewed-by: Tianjia Zhang <[email protected]>
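The NULL check described in this commit can be sketched with a small user-space model. The types and names below (`struct page_model`, `struct mapping_model`, `secret_aops`, `page_is_secret()`) are hypothetical, and the model is single-threaded: it reads the pointer once into a local, but does not reproduce the kernel's handling of concurrent updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct aops_model { int unused; };

/* Identity object: a mapping is "secret" if its ops pointer is this one. */
static const struct aops_model secret_aops = { 0 };

struct mapping_model { const struct aops_model *a_ops; };
struct page_model    { struct mapping_model *mapping; };

static bool page_is_secret(const struct page_model *page)
{
    /* Read the pointer once so the check and the use see the same value. */
    const struct mapping_model *mapping = page->mapping;

    if (mapping == NULL)
        return false;   /* mapping was cleared, e.g. by truncation */
    return mapping->a_ops == &secret_aops;
}
```

A page whose mapping has been cleared is simply reported as not secret, instead of faulting on the dereference seen in the trace.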

ANBZ: #544 commit b43a9e76b4cc78cdaa8c809dd31cd452797b7661 upstream. Boyang reported that the commit c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes") causes the kernel to crash while running xfstests generic/256 on ext4 on aarch64 and ppc64le. run fstests generic/256 at 2021-07-12 05:41:40 EXT4-fs (vda3): mounted filesystem with ordered data mode. Opts: . Quota mode: none. Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x96000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 user pgtable: 64k pages, 48-bit VAs, pgdp=00000000b0502000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] SMP Modules linked in: dm_flakey dm_snapshot dm_bufio dm_zero dm_mod loop tls rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc ext4 vfat fat mbcache jbd2 drm fuse xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce virtio_blk virtio_net net_failover virtio_console failover virtio_mmio aes_neon_bs [last unloaded: scsi_debug] CPU: 0 PID: 408468 Comm: kworker/u8:5 Tainted: G X --------- --- 5.14.0-0.rc1.15.bx.el9.aarch64 #1 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Workqueue: events_unbound cleanup_offline_cgwbs_workfn pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO BTYPE=--) pc : cleanup_offline_cgwbs_workfn+0x320/0x394 lr : cleanup_offline_cgwbs_workfn+0xe0/0x394 sp : ffff80001554fd10 x29: ffff80001554fd10 x28: 0000000000000000 x27: 0000000000000001 x26: 0000000000000000 x25: 00000000000000e0 x24: ffffd2a2fbe671a8 x23: ffff80001554fd88 x22: ffffd2a2fbe67198 x21: ffffd2a2fc25a730 x20: ffff210412bc3000 x19: ffff210412bc3280 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 x14: 0000000000000000 
x13: 0000000000000030 x12: 0000000000000040
x11: ffff210481572238 x10: ffff21048157223a x9 : ffffd2a2fa276c60
x8 : ffff210484106b60 x7 : 0000000000000000 x6 : 000000000007d18a
x5 : ffff210416a86400 x4 : ffff210412bc0280 x3 : 0000000000000000
x2 : ffff80001554fd88 x1 : ffff210412bc0280 x0 : 0000000000000003
Call trace:
 cleanup_offline_cgwbs_workfn+0x320/0x394
 process_one_work+0x1f4/0x4b0
 worker_thread+0x184/0x540
 kthread+0x114/0x120
 ret_from_fork+0x10/0x18
Code: d63f0020 97f99963 17ffffa6 f8588263 (f9400061)
---[ end trace e250fe289272792a ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0-2
Kernel Offset: 0x52a2e9fa0000 from 0xffff800010000000
PHYS_OFFSET: 0xfff0defca0000000
CPU features: 0x00200251,23200840
Memory Limit: none
---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

The problem happens when cgwb_release_workfn() races with cleanup_offline_cgwbs_workfn(): wb_tryget() in cleanup_offline_cgwbs_workfn() can be called after percpu_ref_exit() in cgwb_release_workfn(), which is basically a use-after-free error.

Fix the problem by removing the writeback structure from the offline list before releasing the percpu reference counter. This guarantees that cleanup_offline_cgwbs_workfn() will neither see nor access writeback structures which are about to be released.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Boyang Xue <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Tested-by: Darrick J. Wong <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Murphy Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>

ANBZ: #544 commit 593311e85b26ecc6e4d45b6fb81b942b6672df09 upstream. The inode switching code is not suited for dax inodes. An attempt to switch a dax inode to a parent writeback structure (as a part of a writeback cleanup procedure) results in a panic like this: run fstests generic/270 at 2021-07-15 05:54:02 XFS (pmem0p2): EXPERIMENTAL big timestamp feature in use. Use at your own risk! XFS (pmem0p2): DAX enabled. Warning: EXPERIMENTAL, use at your own risk XFS (pmem0p2): EXPERIMENTAL inode btree counters feature in use. Use at your own risk! XFS (pmem0p2): Mounting V5 Filesystem XFS (pmem0p2): Ending clean mount XFS (pmem0p2): Quotacheck needed: Please wait. XFS (pmem0p2): Quotacheck: Done. XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) XFS (pmem0p2): xlog_verify_grant_tail: space > BBTOB(tail_blocks) BUG: unable to handle page fault for address: 0000000005b0f669 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 13 PID: 10479 Comm: kworker/13:16 Not tainted 5.14.0-rc1-master-8096acd7442e+ #8 Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 09/13/2016 Workqueue: inode_switch_wbs inode_switch_wbs_work_fn RIP: 0010:inode_do_switch_wbs+0xaf/0x470 Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85 RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002 RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0 RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228 R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130 R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0 FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000 CS: 
0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0
Call Trace:
 inode_switch_wbs_work_fn+0xb6/0x2a0
 process_one_work+0x1e6/0x380
 worker_thread+0x53/0x3d0
 kthread+0x10f/0x130
 ret_from_fork+0x22/0x30
Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables nfnetlink bridge stp llc rfkill sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit iTCO_wdt irqbypass drm_kms_helper iTCO_vendor_support acpi_ipmi rapl syscopyarea sysfillrect intel_cstate ipmi_si sysimgblt ioatdma dax_pmem_compat fb_sys_fops ipmi_devintf device_dax i2c_i801 pcspkr intel_uncore hpilo nd_pmem cec dax_pmem_core dca i2c_smbus acpi_tad lpc_ich ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel tg3 ghash_clmulni_intel serio_raw hpsa hpwdt scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod
CR2: 0000000005b0f669
---[ end trace ed2105faff8384f3 ]---
RIP: 0010:inode_do_switch_wbs+0xaf/0x470
Code: 00 30 0f 85 c1 03 00 00 0f 1f 44 00 00 31 d2 48 c7 c6 ff ff ff ff 48 8d 7c 24 08 e8 eb 49 1a 00 48 85 c0 74 4a bb ff ff ff ff <48> 8b 50 08 48 8d 4a ff 83 e2 01 48 0f 45 c1 48 8b 00 a8 08 0f 85
RSP: 0018:ffff9c66691abdc8 EFLAGS: 00010002
RAX: 0000000005b0f661 RBX: 00000000ffffffff RCX: ffff89e6a21382b0
RDX: 0000000000000001 RSI: ffff89e350230248 RDI: ffffffffffffffff
RBP: ffff89e681d19400 R08: 0000000000000000 R09: 0000000000000228
R10: ffffffffffffffff R11: ffffffffffffffc0 R12: ffff89e6a2138130
R13: ffff89e316af7400 R14: ffff89e316af6e78 R15: ffff89e6a21382b0
FS: 0000000000000000(0000) GS:ffff89ee5fb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000005b0f669 CR3: 0000000cb2410004 CR4: 00000000001706e0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x15200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---

The crash happens on an attempt to iterate over attached pagecache pages and check the dirty flag: a dax inode's xarray contains pfns instead of generic struct page pointers.

This happens for DAX and not for other kinds of non-page entries in the inodes because it's a tagged iteration, and shadow/swap entries are never tagged; only DAX entries get tagged.

Fix the problem by bailing out (with the false return value) of inode_prepare_wbs_switch() if a dax inode is passed.

[[email protected]: changelog addition]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
Signed-off-by: Roman Gushchin <[email protected]>
Reported-by: Murphy Zhou <[email protected]>
Reported-by: Darrick J. Wong <[email protected]>
Tested-by: Darrick J. Wong <[email protected]>
Tested-by: Murphy Zhou <[email protected]>
Acked-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dave Chinner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Jeffle Xu <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>

ANBZ: #689

reclaim_wmark() can run even after the corresponding memory cgroup has gone offline, so it is not safe to call css_get(), which requires that the caller already hold a reference to the memory cgroup. Otherwise it may cause a kernel panic like the one below:

[32258.819208] BUG: kernel NULL pointer dereference, address: 0000000000000000
[32258.827789] #PF: supervisor write access in kernel mode
[32258.834264] #PF: error_code(0x0002) - not-present page
[32258.840731] PGD 1f02aa067 P4D 1f02aa067 PUD 104837067 PMD 0
[32258.847688] Oops: 0002 [#1] SMP NOPTI
[32258.877042] Workqueue: memcg_wmark wmark_work_func
[32258.883330] RIP: 0010:reclaim_wmark+0x119/0x130
[32258.912162] RSP: 0018:ffffc90031157e60 EFLAGS: 00010206
[32258.918863] RAX: 0000000000000000 RBX: ffff889efeab0000 RCX: 0000000000000017
[32258.927814] RDX: 0000000000198340 RSI: 0000000003a1e7da RDI: 0018ad6ea0a759c0
[32258.936760] RBP: 00000036006cc432 R08: 0000000000004b2c R09: ffff88a053da2768
[32258.945683] R10: ffff88a05c271410 R11: 000000000383db12 R12: 00001d204c3bef5f
[32258.954617] R13: 0000000000000000 R14: ffff888102abc100 R15: 0000000000000000
[32258.963561] FS: 0000000000000000(0000) GS:ffff88fe7e5c0000(0000) knlGS:0000000000000000
[32258.973597] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[32258.980977] CR2: 0000000000000000 CR3: 000000011d0f6000 CR4: 00000000003506e0
[32258.989940] Call Trace:
[32258.993603] wmark_work_func+0x22/0x30
[32258.998729] process_one_work+0x1aa/0x340
[32259.004158] worker_thread+0x1f3/0x300
[32259.009307] ? process_one_work+0x340/0x340
[32259.014936] kthread+0x118/0x130
[32259.019467] ? __kthread_bind_mask+0x60/0x60
[32259.025200] ret_from_fork+0x1f/0x30

Use css_tryget_online() to fix this issue.

Fixes: 22b17b19ef20 ("ck: mm,memcg: record latency of memcg wmark reclaim")
Signed-off-by: Gang Deng <[email protected]>
Reviewed-by: Xu Yu <[email protected]>
Acked-by: Xunlei Pang <[email protected]>
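
The difference between an unconditional get and a "tryget online" can be illustrated with a small userspace model. This is a sketch only: `obj_ref` and the `ref_*` helpers are hypothetical names invented for illustration, the real kernel helpers are implemented differently, and the online check here is not atomic with the increment, so the model is meant for single-threaded reading rather than as a concurrency-safe design.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted object such as a css. */
struct obj_ref {
    atomic_int count;    /* outstanding references */
    atomic_bool online;  /* cleared once the object goes offline */
};

static inline void ref_init(struct obj_ref *r, int initial)
{
    atomic_init(&r->count, initial);
    atomic_init(&r->online, true);
}

/* Mark the object offline; existing references stay valid. */
static inline void ref_offline(struct obj_ref *r)
{
    atomic_store(&r->online, false);
}

/*
 * Unconditional get, in the spirit of css_get(): only correct when the
 * caller already holds a reference, because nothing is checked here.
 */
static inline void ref_get(struct obj_ref *r)
{
    atomic_fetch_add(&r->count, 1);
}

/*
 * Conditional get, in the spirit of css_tryget_online(): succeeds only
 * if the object is still online and its count has not dropped to zero.
 */
static inline bool ref_tryget_online(struct obj_ref *r)
{
    int old;

    if (!atomic_load(&r->online))
        return false;

    old = atomic_load(&r->count);
    while (old > 0) {
        /* On failure, old is reloaded and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
            return true;
    }
    return false;
}

static inline void ref_put(struct obj_ref *r)
{
    int old = atomic_fetch_sub(&r->count, 1);

    assert(old > 0); /* dropping a reference that was never taken is a bug */
}

/* Models deferred work that may run late; returns true if it did the work. */
static inline bool deferred_reclaim(struct obj_ref *r)
{
    if (!ref_tryget_online(r))
        return false;   /* object is offline or gone: skip the work */
    /* ... the actual reclaim would happen here ... */
    ref_put(r);
    return true;
}
```

With this shape, work that fires after the object has gone offline becomes a no-op instead of touching state that may already be torn down.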

casparant self-assigned this Jun 13, 2019

casparant added the doc-improvement Documentation Improvement label Jun 13, 2019

gopower mentioned this issue Mar 27, 2024

Does Alibaba Cloud Linux release 3 not have the wireguard module? #17

Closed
