#3 [FIXED; it is MITIGATED, but keeping issue open for now until SOLVED; see comment] ThinkPad X200/T400 fail to reboot with Libreboot 20210522 (normal boot works fine)

Closed
opened 3 years ago by weimzh · 17 comments

Rebooting does not work after flashing Libreboot 20210522. The machine just gets stuck at a black screen.

OS is Ubuntu 18.04, but that seems irrelevant: selecting reboot in the initial GRUB menu triggers the same problem.

Both GRUB and SeaBIOS ROMs have the issue.

OSBoot 20210205 previously worked well on the same laptop.

Video: https://www.youtube.com/watch?v=IV0SLRCyY9U

Leah Rowe commented 3 years ago
Owner

Try doing a git bisect.

coreboot 4.13 is what osboot master currently uses.

libreboot 20210522 uses coreboot 4.14

so: try coreboot master (from git)

if that doesn't work with reboot, then try coreboot 4.13. if that works, great!

now do a git bisect. using git bisect, you can find which commit broke rebooting. see:

https://git-scm.com/docs/git-bisect

If you find anything, tell me! I can try this on a T400 myself. It's probably a single commit in coreboot that broke this (regression), and probably a simple fix.
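
For anyone unfamiliar with what bisecting does, here is a minimal sketch of the idea in Python. It is not how `git bisect` is implemented. The commit names and the `is_bad` check are hypothetical stand-ins for "build this revision, flash it, try to reboot". It assumes the oldest commit in the range really is good and the newest really is bad; if the "good" end is not actually good, the search cannot find anything useful.

```python
from typing import Callable, Sequence, Tuple


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of tests run).

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one build + flash + reboot attempt
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi], tests


# Toy history: 1000 placeholder commits, regression placed at index 637.
history = [f"c{i}" for i in range(1000)]
culprit, builds = first_bad(history, lambda c: int(c[1:]) >= 637)
print(culprit, builds)
```

Each test halves the remaining range, which is why even a long history needs only a handful of builds.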

Wei Mingzhi commented 3 years ago
Poster

for revision ccceb2250eeb820fccfb62d1f3ab407582d2e79f (the same revision as osboot 20210205), rebooting works if I enable microcode updates. Disabling microcode updates results in the same issue.
Wei Mingzhi commented 3 years ago
Poster

coreboot master without any patches = same issue: enable microcode updates = working, disable microcode updates = cannot reboot.

libreboot 20160907 does not have this problem on the same laptop

CPU is Core 2 Duo P8700

Leah Rowe commented 3 years ago
Owner

I changed the title to X200/T400, where previously you wrote only T400.

See: #11 (https://notabug.org/libreboot/lbmk/issues/11)

it happens on X200 as well. This seems to be on any gm45 machine.

Microcode or no-microcode shouldn't be a problem, because it previously worked with or without microcode. In December 2020 I created the tree in osboot that you used, based on coreboot 4.13.

So I know coreboot 4.13 works nicely, but coreboot 4.14 doesn't. Can you try to bisect? Don't worry if you can't, I'll do it myself when I have time. Now that I know it happens on all gm45 machines, I'll just grab random ones from my pile.

Specifically: git bisect on coreboot 4.13-4.14, with the assumption that 4.14 has b0rked reboot and that 4.13 works perfectly. Coreboot has been rewriting and re-factoring a few things, so it'll probably be a single commit somewhere that just broke something, and likely a trivial fix.

EDIT: I also edited the title to clarify that normal booting does work, regardless, and that it's only reboot which is broken.

Wei Mingzhi commented 3 years ago
Poster

> Microcode or no-microcode shouldn't be a problem, because it previously worked with or without microcode

I already ran git bisect with no result (every revision I tried failed to reboot), but later I found that coreboot 4.13 (more exactly, the same revision as osboot) also has the same issue if I don't include the microcode updates.

If I include the microcode updates then both the osboot revision and the current coreboot master revision work well.
Leah Rowe commented 3 years ago
Owner

oh, so i was mistaken then. yes, i neglected to consider that the osboot rom you used had microcode in it already.

well, can you bisect coreboot revisions between libreboot 20160907 and libreboot 20210522 then?

it's thousands of commits, but with bisect the maximum number of builds is probably like 30 or less, that you have to go through. it's a lot to ask though. i'll do it myself otherwise, when i have time
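
As a rough check on that estimate, the worst-case number of builds for a bisect grows with the base-2 logarithm of the number of candidate commits. The commit counts below are illustrative assumptions, not the real size of the coreboot history between those two releases.

```python
def max_bisect_builds(n_commits: int) -> int:
    """Worst-case number of builds to pin down one bad commit among n_commits.

    Equivalent to ceil(log2(n_commits)) for n_commits >= 1, computed with
    integers to avoid floating-point rounding.
    """
    if n_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (n_commits - 1).bit_length()


# Hypothetical range sizes, just to show the scale.
for n in (1_000, 10_000, 100_000):
    print(n, max_bisect_builds(n))
```

So even a range of a hundred thousand commits stays well under 30 builds.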

staying on libreboot 20160907 for the time being is an acceptable solution, in my opinion, until this can get fixed in an upcoming stable release. clearly there is a regression in coreboot, but it's been 5 years. a bisect can reveal when this problem started happening, though now it's between libreboot 2016 and 2021.

regarding 20160907, i think libreboot was using coreboot 4.4? maybe 4.5.

Leah Rowe commented 3 years ago
Owner

belgin on IRC discovered that commit df7aecd926 in coreboot introduced this issue. here's a view on coreboot's git site:

https://review.coreboot.org/plugins/gitiles/coreboot/+/df7aecd92643d207feaf7fd840f8835097346644

idea from qeeg: only write to smrr if lock bit isn't already set. coreboot doesn't seem to be checking for that, which could lead to a general protection fault, which might be what is happening
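
to illustrate qeeg's idea, here is a toy model in Python. It is not coreboot code and does not touch real hardware. `ToyMsr`, `GeneralProtectionFault` and both `set_smrr_*` functions are hypothetical names made up for this sketch. It assumes the register may already be locked when the setup code runs, and that writing to a locked register faults.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the fault a CPU would raise on an illegal register write."""


class ToyMsr:
    """Toy model of a lockable register; purely illustrative."""

    def __init__(self, value: int = 0, locked: bool = False) -> None:
        self.value = value
        self.locked = locked

    def write(self, value: int) -> None:
        if self.locked:
            raise GeneralProtectionFault("write to locked register")
        self.value = value


def set_smrr_unchecked(smrr: ToyMsr, base: int) -> None:
    """Always write; faults if the register is already locked."""
    smrr.write(base)


def set_smrr_checked(smrr: ToyMsr, base: int) -> bool:
    """Write only when unlocked; return True if the write happened."""
    if smrr.locked:
        return False
    smrr.write(base)
    return True


unlocked = ToyMsr()
print(set_smrr_checked(unlocked, 0x2000))        # write goes through
already_locked = ToyMsr(value=0x1000, locked=True)
print(set_smrr_checked(already_locked, 0x2000))  # write is skipped, no fault
```

this only shows the shape of the check, not whether it is the right fix.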

i've asked belgin to grab detailed coreboot logs, with and without that patch

Leah Rowe commented 3 years ago
Owner

qeeg on irc wrote this patch:

https://review.coreboot.org/c/coreboot/+/57089

it didn't fix the issue, but something similar might be what coreboot needs

Leah Rowe commented 3 years ago
Owner

I'm actually leaving this issue open, BUT:

reverting the offending commit in coreboot does fix rebooting, so this fixes the issue defined here for libreboot at least

however, simply reverting it might not be the most technically correct solution. i think a report should be made upstream, to the coreboot project, and see what they say

Leah Rowe commented 3 years ago
Owner

TODO: i'm waiting for belgin's logs (with and without the coreboot commit that broke rebooting)

Leah Rowe commented 3 years ago
Owner

reverting the coreboot commit fixes reboot anyway. see:

https://notabug.org/libreboot/lbmk/commit/4b7be665968b67463ec36b9afc7e8736be0c9b51

i could close this issue, but again i want to exhaust it first. waiting for belgin's logs and then i'll file a report on coreboot.org
Ghost commented 3 years ago

Here are the logs from the commit right before rebooting broke and the commit that broke rebooting, on an x200. Gathered with cbmem -c.

For some reason I can't upload txt files here, so here are links to them on my website.

https://belgin.ro/0_working_845a96dfd601c08f2d5c2cf362a7021f912c4857.txt

https://belgin.ro/1_broken_df7aecd92643d207feaf7fd840f8835097346644.txt
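
One way to compare two such logs is a plain unified diff. Below is a small sketch using Python's standard `difflib`. The two strings are placeholder text only, not lines from the real logs, and `log_diff` is a hypothetical helper name.

```python
import difflib
from typing import List


def log_diff(working: str, broken: str) -> List[str]:
    """Return unified-diff lines between two console logs."""
    return list(
        difflib.unified_diff(
            working.splitlines(),
            broken.splitlines(),
            fromfile="working",
            tofile="broken",
            lineterm="",
        )
    )


# Placeholder content; substitute the text of the two downloaded files.
working_log = "stage A\nstage B\nstage C"
broken_log = "stage A\nstage B changed\nstage C"
for line in log_diff(working_log, broken_log):
    print(line)
```

Lines starting with `-` appear only in the working log and lines starting with `+` only in the broken one.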

Leah Rowe commented 3 years ago
Owner

https://review.coreboot.org/plugins/gitiles/coreboot/+/667108199a04972ebd83c8a0430f2dddd6009879 is the upstream fix. i'm inclined to think that the "bug" in coreboot isn't a bug, but that coreboot is correct

no-microcode is an "error state", and everything works upstream when you include microcode updates. so if something buggy occurs without microcode updates, but doesn't with them, then i consider it not a bug. we were just lucky before, and coreboot's fix a few years ago exposed a bug that was there all along

the revert patch will simply be maintained in lbmk. we can maintain it for libreboot indefinitely. not having microcode updates is technically wrong, but it's libreboot policy to exclude them anyway, so we do what we must to get things working nicely

i could close this issue now if i want, but i'll wait first. i want to see what coreboot thinks

Leah Rowe commented 3 years ago
Owner

https://notabug.org/libreboot/lbmk/commit/4b7be665968b67463ec36b9afc7e8736be0c9b51 and https://notabug.org/libreboot/lbmk/commit/777316eb4f836563ce0e4e6f9dd2fca4312e8ac1 in libreboot are the two commits that revert what coreboot did, meaning reboot now works again in libreboot

i think we're good now
Leah Rowe commented 3 years ago
Owner

by the way:

belgin and expert975 both confirm that the revert patch in libreboot fixes reboots (in no-microcode setups). if the same revert is applied upstream, on coreboot master, reboot is fixed there too


I raised this on the lbmk IRC channel, but the only answer was that the bug is fixed, with a reference to belgin and expert975. In my case, the bug is still present.

My test device is an R400 with a freshly built libreboot image from the git repository as of 3 September 2021 (grub_r400_4mb_libgfxinit_corebootfb_deqwertz.rom). The local git repo is up to date.

Here are the logs (https://hatebin.com/lowotmgdoi) and my images (https://filebin.net/otngwl7p8wliu3pc).

I've tested multiple operating systems with the same result.

Edit: I can confirm that it isn't a device problem. I've tested a second R400 with the same result.
Leah Rowe commented 3 years ago
Owner

closing. i left this open while testing the fix

it's all good
