#66 T400 (maybe X200): GbE NIC driver e1000e failing to load when using libgfxinit with legacy vga text mode startup on Trisquel 9

Closed
opened 3 years ago by vimuser · 8 comments

Same descriptor+GbE each time: the default descriptor+GbE setup, as generated by ich9gen in Libreboot 20160907 or by the latest Retroboot version, which produces the exact same checksum for this descriptor+GbE file.

UPDATE: more testing has now been performed; see the posts below.

Leah Rowe commented 3 years ago
Owner

error:

Jan 17 14:20:44 user-ThinkPad-T400 kernel: [    1.923217] e1000e 0000:00:19.0: BAR 0: can't reserve [mem 0x000a0000-0x000bffff]
Jan 17 14:20:44 user-ThinkPad-T400 kernel: [    1.923331] e1000e: probe of 0000:00:19.0 failed with error -16

If I rmmod and then modprobe the e1000e kernel module, the NIC works again.

If I test on Libreboot 20160907, this issue never appears. Not yet sure whether this is a coreboot bug, or a linux bug, or both.

Judging from the dmesg logs, the initramfs in Trisquel appears to be loading e1000e too early. It should be loading the i915 video driver first!

However, Libreboot 20160907 has no issues.
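For anyone comparing their own logs against the two error lines above, here is a minimal Python sketch that pulls the device, BAR number and memory range out of the "can't reserve" message and turns the probe error number into its errno name. The function names and regular expressions are my own assumptions, written only against the two lines quoted here; other kernel versions may word these messages differently.

```python
import errno
import re

# Pattern for the "can't reserve" line as quoted in this issue (assumed format).
BAR_FAIL = re.compile(
    r"(?P<driver>\S+) (?P<dev>[0-9a-f:.]+): BAR (?P<bar>\d+): "
    r"can't reserve \[mem 0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)\]"
)

# Pattern for the "probe of ... failed with error N" line (assumed format).
PROBE_FAIL = re.compile(
    r"probe of (?P<dev>[0-9a-f:.]+) failed with error (?P<err>-?\d+)"
)


def parse_bar_failure(line):
    """Return driver, device, BAR index and inclusive range, or None."""
    m = BAR_FAIL.search(line)
    if m is None:
        return None
    return {
        "driver": m["driver"],
        "device": m["dev"],
        "bar": int(m["bar"]),
        "start": int(m["start"], 16),
        "end": int(m["end"], 16),
    }


def parse_probe_error(line):
    """Return (device, errno name) for a failed probe line, or None."""
    m = PROBE_FAIL.search(line)
    if m is None:
        return None
    code = abs(int(m["err"]))  # the kernel logs the errno as a negative number
    return m["dev"], errno.errorcode.get(code, "unknown")
```

Fed the second log line, `parse_probe_error` maps `-16` to `EBUSY`, which fits the first line: the requested region could not be reserved.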

Leah Rowe commented 3 years ago
Owner

When it works, this is what happens:

Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.584854] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.584856] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.585260] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.758632] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:f5:f0:40:71:fe
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.758639] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.759833] e1000e 0000:00:19.0 eth0: MAC: 7, PHY: 8, PBA No: 1008FF-0FF
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.768506] e1000e 0000:00:19.0 enp0s25: renamed from eth0
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.800293] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready
Jan 17 13:57:42 user-ThinkPad-T400 kernel: [  527.984473] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready
Jan 17 13:57:47 user-ThinkPad-T400 kernel: [  532.033042] e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 17 13:57:47 user-ThinkPad-T400 kernel: [  532.033223] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s25: link becomes ready
Jan 17 13:57:52 user-ThinkPad-T400 kernel: [  537.346681] IPv6: enp0s25: IPv6 duplicate address fd65:55cd:d051:4::3f2 used by 88:25:2c:1d:05:06 detected!
Jan 17 13:57:52 user-ThinkPad-T400 kernel: [  537.602775] IPv6: enp0s25: IPv6 duplicate address fd65:55cd:d051::3f2 used by 88:25:2c:1d:05:06 detected!
Jan 17 13:58:49 user-ThinkPad-T400 kernel: [  594.116169] usb 4-2: USB disconnect, device number 3

Coreboot revision used here is: ccceb2250eeb820fccfb62d1f3ab407582d2e79f

linux kernel version: 4.15.0-121-generic from Trisquel 9 (linux-libre)

Leah Rowe commented 3 years ago
Owner

One more thing: the default descriptor+GbE in retroboot is unchanged since Libreboot 20160907 on ICH9M laptops.

Descriptor+GbE with default MAC address has the same checksum (sha512sum):

68992c19faaedbde5eab2ea6ae6a13786390a84e682bb05bf3196817cc7cfde318b98ae2da1661a51e8eee30577f18e1492f98f5d92492515ea0d8862d720606
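If you want to check your own descriptor+GbE file against that checksum without `sha512sum`, a small Python sketch along these lines should do. The helper names are hypothetical, and the path you pass in is whatever your ich9gen output file is called; nothing here is specific to Libreboot.

```python
import hashlib


def sha512_hex(path, chunk_size=65536):
    """Return the SHA-512 digest of a file as lowercase hex."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in chunks so large ROM images need not fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA-512 with an expected hex string (case-insensitive)."""
    return sha512_hex(path) == expected_hex.strip().lower()
```

Call `matches_expected(path_to_your_file, "68992c19...")` with the full checksum quoted above; it returns `True` only if the file is byte-for-byte identical to the default descriptor+GbE.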

Leah Rowe commented 3 years ago
Owner

Regardless of whether the GbE NIC works, WiFi works just fine.

Leah Rowe commented 3 years ago
Owner
15:07 <swiftgeek> leah: i would enable console at spew level in coreboot, (conform it works), and then see if it happens there as well
15:07 <swiftgeek> *confirm
15:10 <swiftgeek> also linux kernel could do strange thing as well
15:11 <swiftgeek> hmm that should be easy
15:12 <swiftgeek> leah: blacklist e1000e
15:12 <swiftgeek> then load manually
15:12 <swiftgeek> if that works fine, then issue is entirely contained in this message > BAR 0: can't reserve [mem 0x000a0000-0x000bffff]
15:17 <swiftgeek> leah: google says it could be libgfxinit related, have you tried with intel vbios?
Leah Rowe commented 3 years ago
Owner

OK, I've been talking to swiftgeek on IRC. First, some notes from him:

<swiftgeek> https://wiki.osdev.org/VGA_Hardware#Getting_started
<swiftgeek> > MMIO: The VGA uses uncached byte accesses to 0xA0000-0xBFFFF. In several cases, larger writes are also allowed.
<swiftgeek> and TL;DR, we see a linux doing a funny thing
<swiftgeek> putting BAR in VGA

Now, some dmesg logs.

Random X200 log from vendor firmware, found on the internet: https://paste.debian.net/plainh/e794e729

Now my logs, that I made myself:

with linux kernel version: 4.15.0-121-generic from Trisquel 9 (linux-libre)

NOTE: For Lenovo T400, Libreboot 20160907 is using a much older coreboot revision; specifically, git commit d83b0e9ac4174cca92ac2c3b83a7e8491a9a1ff4

Libreboot 20160907, native video init (pre-libgfxinit), legacy vga text mode startup (Intel NIC works perfectly; see entries for e1000e): https://paste.debian.net/plainh/2631b6fe

Libreboot 20160907, native video init (pre-libgfxinit), high-res vesafb startup (Intel NIC works perfectly; see entries for e1000e): https://paste.debian.net/plainh/e65b3d53

coreboot ccceb2250eeb820fccfb62d1f3ab407582d2e79f, libgfxinit, high-res vesafb startup (Intel NIC works perfectly; see entries for e1000e): https://paste.debian.net/plainh/7f9e389b

coreboot ccceb2250eeb820fccfb62d1f3ab407582d2e79f, libgfxinit, legacy vga text mode startup (Intel NIC doesn't work, see log entries related to e1000e and address 000a0000-000bffff which is meant for VGA; linux kernel seems to be trying to load e1000e Intel GbE NIC driver in that space): https://paste.debian.net/plainh/5b79de2e

Now, the final log:

In this final log, I blacklisted e1000e which was being loaded in initramfs. Now e1000e isn't being loaded at all at boot. I then ran sudo modprobe e1000e manually, after boot up at Trisquel 9 desktop, and at that point the GbE NIC worked perfectly: https://paste.debian.net/plainh/86547545

(for that last log, I did dmesg after manually loading e1000e)
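To make the collision in the failing log concrete: the range the kernel refused to reserve, 0x000a0000-0x000bffff, is exactly the legacy VGA window that swiftgeek quoted from the OSDev wiki. A minimal Python sketch of that check follows; the names are mine, and the only figures used are the two ranges already given in this thread.

```python
# Legacy VGA MMIO window, per the OSDev quote above (inclusive bounds).
VGA_START = 0xA0000
VGA_END = 0xBFFFF


def range_size(start, end):
    """Size in bytes of the inclusive range [start, end]."""
    return end - start + 1


def overlaps(start, end, other_start=VGA_START, other_end=VGA_END):
    """True if inclusive range [start, end] intersects [other_start, other_end]."""
    return start <= other_end and other_start <= end


# BAR 0 range taken from the failing e1000e log line.
bar_start, bar_end = 0x000A0000, 0x000BFFFF
bar_hits_vga = overlaps(bar_start, bar_end)
```

Here `bar_hits_vga` is `True`, and both ranges are 0x20000 bytes (128 KiB), so the NIC's BAR 0 sits squarely on top of the whole VGA window rather than just clipping it.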

Leah Rowe commented 3 years ago
Owner

According to icon, it's because of coreboot's new resource allocation logic. Basically, coreboot wasn't properly reserving lower memory on GM45 for VGA MMIO, so this will affect the X200, T500 etc. as well. Because it wasn't reserved, the resource allocator logic in coreboot was making use of that space, leading to very strange behaviour.

He wrote this patch for me to test which should hopefully fix it: https://review.coreboot.org/c/coreboot/+/49603

I am going to test it, and I will report back here.
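To illustrate the idea (this is NOT coreboot's actual allocator, just a toy model I am assuming for the sake of explanation): an allocator that hands out the lowest suitably aligned free address will happily place a 128 KiB BAR at 0xA0000 unless that range is marked as reserved. The function names and the search window below are hypothetical; only the VGA range comes from this thread.

```python
def align_up(value, alignment):
    """Round value up to a multiple of alignment (alignment must be a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def allocate(size, window_start, window_end, reserved):
    """Toy first-fit allocator.

    Returns the lowest size-aligned base in [window_start, window_end] whose
    inclusive range avoids every (start, end) pair in reserved, or None.
    """
    base = align_up(window_start, size)
    while base + size - 1 <= window_end:
        end = base + size - 1
        clash = next(
            ((r_start, r_end) for r_start, r_end in reserved
             if base <= r_end and r_start <= end),
            None,
        )
        if clash is None:
            return base
        # Skip past the reserved range and try the next aligned address.
        base = align_up(clash[1] + 1, size)
    return None


VGA = (0xA0000, 0xBFFFF)

# Hypothetical search window; the upper bound is arbitrary for this demo.
without_reservation = allocate(0x20000, 0xA0000, 0xFFFFF, reserved=[])
with_reservation = allocate(0x20000, 0xA0000, 0xFFFFF, reserved=[VGA])
```

In this toy model `without_reservation` comes out as 0xA0000, the same address as in the failing log, while `with_reservation` lands above the VGA window. The exact address in the second case means nothing; the point is only that reserving the range keeps the allocator out of it.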

Leah Rowe commented 3 years ago
Owner

That patch works.

dmesg log with txtmode setup: https://paste.debian.net/plainh/2295cbca

dmesg log with vesafb setup: https://paste.debian.net/plainh/9af9d6a6
