#23 [docs] add misc section about e1000e troubleshooting

Open
opened 3 years ago by swiftgeek · 2 comments

Example error; may happen on weird and complex routing schemes (citation needed for cause):

e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang

Possible solution, tested by Nazara:

Disable C-states:

for i in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do echo 1 > "$i"; done
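The one-liner above can be wrapped so the change is easy to inspect and undo. This is only a sketch: `CPUIDLE_ROOT`, `set_cstates` and `show_cstates` are names made up for this example, and it assumes the standard cpuidle sysfs layout. Writing to sysfs needs root, and the setting does not survive a reboot.

```shell
#!/bin/sh
# Sketch: toggle the cpuidle "disable" flag for every C-state of every CPU.
# CPUIDLE_ROOT is a hypothetical override used only in this example;
# by default it points at the real sysfs tree.
CPUIDLE_ROOT="${CPUIDLE_ROOT:-/sys/devices/system/cpu}"

# set_cstates VALUE: write 1 (disable) or 0 (re-enable) to each flag.
set_cstates() {
    for f in "$CPUIDLE_ROOT"/cpu*/cpuidle/state*/disable; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        echo "$1" > "$f"
    done
}

# show_cstates: print each flag file followed by its current value.
show_cstates() {
    for f in "$CPUIDLE_ROOT"/cpu*/cpuidle/state*/disable; do
        [ -e "$f" ] || continue
        printf '%s %s\n' "$f" "$(cat "$f")"
    done
}
```

Run `set_cstates 1` as root to apply the workaround, `show_cstates` to confirm it took effect, and `set_cstates 0` to revert.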
NRoach44 commented 3 years ago

Most easily reproduced by sending a large amount of traffic across subnets on the same interface

(e.g. 10.1.1.2 > (eth0) x200 router (eth0) > 10.1.2.2)

The relevant part of the logs:

Nov 03 04:56:27 pluto kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                                TDH                  <db>
                                TDT                  <25>
                                next_to_use          <25>
                                next_to_clean        <d8>
                              buffer_info[next_to_clean]:
                                time_stamp           <ffffdd7f>
                                next_to_watch        <db>
                                jiffies              <ffffde70>
                                next_to_watch.status <0>
                              MAC Status             <80283>
                              PHY Status             <796d>
                              PHY 1000BASE-T Status  <3800>
                              PHY Extended Status    <3000>
                              PCI Status             <10>
Nov 03 04:56:29 pluto kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                                TDH                  <db>
                                TDT                  <25>
                                next_to_use          <25>
                                next_to_clean        <d8>
                              buffer_info[next_to_clean]:
                                time_stamp           <ffffdd7f>
                                next_to_watch        <db>
                                jiffies              <ffffdf38>
                                next_to_watch.status <0>
                              MAC Status             <80283>
                              PHY Status             <796d>
                              PHY 1000BASE-T Status  <3800>
                              PHY Extended Status    <3000>
                              PCI Status             <10>
Nov 03 04:56:31 pluto kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                                TDH                  <db>
                                TDT                  <25>
                                next_to_use          <25>
                                next_to_clean        <d8>
                              buffer_info[next_to_clean]:
                                time_stamp           <ffffdd7f>
                                next_to_watch        <db>
                                jiffies              <ffffe000>
                                next_to_watch.status <0>
                              MAC Status             <80283>
                              PHY Status             <796d>
                              PHY 1000BASE-T Status  <3800>
                              PHY Extended Status    <3000>
                              PCI Status             <10>
Nov 03 04:56:33 pluto kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                                TDH                  <db>
                                TDT                  <25>
                                next_to_use          <25>
                                next_to_clean        <d8>
                              buffer_info[next_to_clean]:
                                time_stamp           <ffffdd7f>
                                next_to_watch        <db>
                                jiffies              <ffffe0c8>
                                next_to_watch.status <0>
                              MAC Status             <80283>
                              PHY Status             <796d>
                              PHY 1000BASE-T Status  <3800>
                              PHY Extended Status    <3000>
                              PCI Status             <10>
Nov 03 04:56:35 pluto kernel: ------------[ cut here ]------------
Nov 03 04:56:35 pluto kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x201/0x210
Nov 03 04:56:35 pluto kernel: NETDEV WATCHDOG: enp0s25 (e1000e): transmit queue 0 timed out
Nov 03 04:56:35 pluto kernel: Modules linked in: tun nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 8021q garp mrp stp llc btusb btintel kvm_intel bluetooth ath9k ath9k_common kvm asix usbnet mii libphy irqbypass uas ath9k_hw ath e1000e
Nov 03 04:56:35 pluto kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I     4.8.4 #1
Nov 03 04:56:35 pluto kernel: Hardware name: LENOVO *INVALID*/*INVALID*, BIOS CBET4000 libreboot-r20150518fix-688-gbcf3ffd 01/25/2016
Nov 03 04:56:35 pluto kernel:  0000000000000000 ffffffff8133b1a9 ffff8801bfc03df8 0000000000000000
Nov 03 04:56:35 pluto kernel:  ffffffff8105d52e 0000000000000000 ffff8801bfc03e48 0000000000000001
Nov 03 04:56:35 pluto kernel:  0000000000000000 ffff8801b597c000 ffff8801bfc0ee68 ffffffff8105d59f
Nov 03 04:56:35 pluto kernel: Call Trace:
Nov 03 04:56:35 pluto kernel:  <IRQ>  [<ffffffff8133b1a9>] ? dump_stack+0x46/0x5d
Nov 03 04:56:35 pluto kernel:  [<ffffffff8105d52e>] ? __warn+0xbe/0xe0
Nov 03 04:56:35 pluto kernel:  [<ffffffff8105d59f>] ? warn_slowpath_fmt+0x4f/0x60
Nov 03 04:56:35 pluto kernel:  [<ffffffff81073b36>] ? __queue_work+0x136/0x460
Nov 03 04:56:35 pluto kernel:  [<ffffffff81660371>] ? dev_watchdog+0x201/0x210
Nov 03 04:56:35 pluto kernel:  [<ffffffff81660170>] ? qdisc_rcu_free+0x40/0x40
Nov 03 04:56:35 pluto kernel:  [<ffffffff810be040>] ? call_timer_fn+0x30/0x120
Nov 03 04:56:35 pluto kernel:  [<ffffffff810be719>] ? run_timer_softirq+0x1f9/0x460
Nov 03 04:56:35 pluto kernel:  [<ffffffff8101e191>] ? timer_interrupt+0x11/0x20
Nov 03 04:56:35 pluto kernel:  [<ffffffff817d8a32>] ? __do_softirq+0xf2/0x284
Nov 03 04:56:35 pluto kernel:  [<ffffffff810629c5>] ? irq_exit+0x95/0xa0
Nov 03 04:56:35 pluto kernel:  [<ffffffff817d878f>] ? do_IRQ+0x4f/0xd0
Nov 03 04:56:35 pluto kernel:  [<ffffffff817d6dbf>] ? common_interrupt+0x7f/0x7f
Nov 03 04:56:35 pluto kernel:  <EOI>  [<ffffffff815f5aaa>] ? cpuidle_enter_state+0x12a/0x2b0
Nov 03 04:56:35 pluto kernel:  [<ffffffff8109aa22>] ? cpu_startup_entry+0x282/0x310
Nov 03 04:56:35 pluto kernel:  [<ffffffff81d7be9e>] ? start_kernel+0x3f5/0x3fd
Nov 03 04:56:35 pluto kernel: ---[ end trace f68728a0d3053b54 ]---
Nov 03 04:56:35 pluto kernel: e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly
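To check whether a machine is hitting the same problem, the kernel log can be scanned for the two messages shown above. This is a rough sketch: the helper names are hypothetical, and the match strings are copied from this log. The jiffies helper assumes `time_stamp` is the jiffies value at which the stuck descriptor was queued, which is not confirmed in this thread.

```shell
#!/bin/sh
# count_hangs: read kernel log text on stdin, print how many hang reports it contains.
count_hangs() {
    grep -c 'Detected Hardware Unit Hang'
}

# had_reset: read kernel log text on stdin, succeed if the adapter was reset.
had_reset() {
    grep -q 'Reset adapter unexpectedly'
}

# jiffies_delta NOW THEN: difference of two hex values as printed in the report
# (no 0x prefix). Counter wrap-around is not handled in this sketch.
jiffies_delta() {
    echo $(( 0x$1 - 0x$2 ))
}
```

For example, `journalctl -k | count_hangs` prints the number of reports in the current kernel log. For the first report above, `jiffies_delta ffffde70 ffffdd7f` gives 241. TDH and TDT are identical in all four reports, so the transmit ring made no progress between them.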
Leah Rowe commented 3 years ago
Owner

swiftgeek, can you submit a patch adding this to the docs?