#289 Libreboot X200 freezing up

Closed
opened 1 year ago by plantroon · 21 comments
plantroon commented 1 year ago

My Libreboot X200 freezes up when I use it for long periods. This did not happen before Libreboot. I have no way of getting any logs; it just freezes up. Sometimes the Caps Lock LED flashes, which I presume indicates a kernel panic that I could catch somehow. I have to restart the laptop every 3 days just to keep going. I call this unusable because it happens in the most unfortunate circumstances.

Swift Geek commented 1 year ago
Collaborator

Are you using suspend? Are you using the newest version of libreboot there? What kernel version?

plantroon commented 1 year ago
Poster

I am using the newest version (20160907) and I updated the EC firmware prior to flashing.

It happens on any OS (including the BSDs) and any kernel version; it is libreboot-specific, as it did not happen before.

My primary OS is Parabola. I am using suspend, the laptop is in suspend mode most of the time. That's how I imagine any computer should work. Reboots should be only for important kernel updates. Rebooting is unacceptable in 2017.

Basically I am just asking if this is a known problem or if others are experiencing it.


I'm pretty sure it's a known problem. My X200 also has sporadic crashes, though their frequency has decreased dramatically after re-flashing the newest version of libreboot. It used to hang as often as you report; after reflashing it hangs about once every couple of weeks.

I urge you to download a fresh copy and re-flash.

Can you gather any other details? [Microcode](https://libreboot.org/docs/hardware/x200.html#ram_s3_microcode)? Type and capacity of RAM?

Swift Geek commented 1 year ago
Collaborator

Suspend does not work yet on the X200. Use hibernate if you need similar functionality; it is also more secure.

plantroon commented 1 year ago
Poster

So even without suspending for 2 weeks and swapping RAM sticks, it still managed to freeze up on the 3rd or 4th day of the power cycle.

I have 2 Samsung RAM sticks of the same kind: M471B5673EH1-CF8

One more interesting question: do you use the graphical version of the GRUB payload or the text mode one? I am asking because I think this has to do with graphics.

plantroon commented 1 year ago
Poster

Another update. The laptop went fine for 10 days (that's a record uptime on Libreboot for me). I noticed that this might be related to RAM speeds somehow, as I also see a huge slowdown of graphics on Libreboot. I do not know what triggers it. For example, glxgears then runs at about 20 FPS. Is anyone else having these problems on an X200?

Swift Geek commented 1 year ago
Collaborator

Have you run memtest? (Not including the broken test 7, block move.)

Graphics issues are related to your distro and chosen drivers

A freeze and a kernel panic are very distinct failures here, so pay attention to the LEDs that indicate which one it is.

plantroon commented 1 year ago
Poster

Is there any way to run memtest on the vesafb version of Libreboot? I also have a gut feeling that using txtmode will make these problems go away.

thum commented 1 year ago

While I haven't been looking into this deeper than just to observe behavior, my observations with a handful of X200 laptops on GNU/Linux are that using txtmode won't help with the "sporadic crashes". Neither will swapping RAM, using less RAM, using coreboot or using the latest stock boot firmware (!). It also seems to be independent of vram size.

Every configuration I've tested crashes sporadically. Sometimes four times a day, sometimes only once in two weeks. This happens when idle, though it's more likely that it will occur when watching a movie (e.g. with mpv, vlc) or under full load.

Again, those are my observations, yours may vary.

glxgears will likely run at a frame rate close to your display's refresh rate. Use `vblank_mode=0 glxgears` and you should get a lot more FPS. Setting vblank_mode=0 tells the graphics driver not to synchronize with the refresh rate of your monitor.

plantroon commented 1 year ago
Poster

@thum, thank you for sharing your experience. You are right, that is exactly the problem I am having.

In the meantime, steps I did in order to solve this problem:

  1. Suspected faulty RAM, so I bought 2 RAM sticks of the same kind and let memtest run for 24 hours. No errors found.
  2. Tried removing one RAM stick, didn't help. Thinkpad froze up after 2 days.
  3. Switched to txtmode. This didn't help either, as @thum noted.

I found that it happens at completely random times but the machine can be used fine for work and entertainment. It is my only computer and for now I learned to live with it.

You mentioned that "using the latest stock boot firmware (!)" didn't solve the problem. What do you mean? The original BIOS the computer came with? Because mine worked fine with that. I have not changed my computer use patterns in the last year, and my Thinkpad X200 was fine and 100% reliable on the original BIOS (my uptimes were as long as 60 days and I only rebooted because of bigger kernel updates).

plantroon commented 4 months ago
Poster

After buying more of these laptops, I found out that it was the RAM modules that caused the crash. The RAM modules tested are not faulty and work fine in other systems or without Libreboot.

The following model was causing the crash: Samsung 2GB 2Rx8 PC3-8500S

This model I can confirm works fine: Ramaxel 2GB 2RX8 PC3-8800S

I always used 2 matching RAM modules in my testing.

I think that both are models that came with X200 originally, but I cannot confirm because I bought all my X200 used.

plantroon commented 4 months ago
Poster

To clarify: The issue I am describing here has nothing to do with S3 suspend problems or kernel panics. I have those too but this happens regardless of whether I suspend or not and it is never a kernel panic.

plantroon commented 2 months ago
Poster

I caught this recently - and it was a kernel panic after all.

```
[15629.573376] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 0: b200004000000800
[15629.573376] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff9eec5b12> {remove_wait_queue+0x12/0x50}
[15629.573376] mce: [Hardware Error]: TSC d610edb8be0
[15629.573376] mce: [Hardware Error]: PROCESSOR 0:1067a TIME 1537107606 SOCKET 0 APIC 0 microcode 0
[15629.573376] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[15637.614690] mce: [Hardware Error]: Machine check: Processor context corrupt
[15637.614690] Kernel panic - not syncing: Fatal machine check
[15637.614690] Kernel Offset: 0x1de00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[15637.614690] Rebooting in 30 seconds..
[15637.614690] ACPI MEMORY or I/O RESET_REG.
```

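For anyone reading the log without mcelog at hand, here is a rough sketch of how the `Bank 0` status value can be split into its architectural flag bits. The function name `decode_mci_status` is made up for illustration; the bit positions are the ones documented for the x86 machine-check status registers. It does not try to interpret the model-specific parts of the value.

```python
# Architectural flag bits in the upper part of a machine-check bank status value.
STATUS_FLAGS = {
    63: "VAL",    # register contains valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # the MISC register holds extra information
    58: "ADDRV",  # the ADDR register holds a valid address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit bank status value into flag names and error codes.

    Hypothetical helper, not part of mcelog or the kernel.
    """
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if status & (1 << bit)
    ]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


if __name__ == "__main__":
    # Value copied from the "Bank 0" line in the log above.
    result = decode_mci_status(0xB200004000000800)
    print(result["flags"], hex(result["mca_error_code"]))
```

For the value in this log, the sketch reports VAL, UC, EN and PCC as set, which is consistent with the "Processor context corrupt" line. ADDRV is clear, so no fault address was recorded.
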
https://notabug.org/libreboot/libreboot/issues/493
plantroon commented 1 month ago
Poster

Thanks for reminding me about that thread of yours. I have the following microcode-related messages in my dmesg:

```
[    0.838696] microcode: sig=0x1067a, pf=0x80, revision=0x0
[    0.838811] microcode: Microcode Update Driver: v2.01 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
```

Running Debian Stable. The microcode updates are in my Libreboot ROM. I once tried loading the system microcode updates, but the freeze/crash/panic still occurs. The longest I have gone without a crash was 2 weeks.

Can you please share what your microcode-related messages look like? Also, some information on the RAM you're using would help.

Maybe I added the microcode updates to the ROM in the wrong way, since the Revision field says 0x0, whatever that means.
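
As a side note on the `sig=0x1067a` field in that dmesg line: it is the CPUID signature, which packs family, model and stepping into one number. A minimal sketch of the standard decoding follows; `decode_cpuid_signature` is a made-up name for illustration.

```python
def decode_cpuid_signature(sig: int) -> dict:
    """Decode an x86 CPUID signature into display family/model/stepping.

    Hypothetical helper for illustration only.
    """
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended model bits only count for families 6 and 15.
    if family in (0x6, 0xF):
        model |= ext_model << 4
    # The extended family bits only count for family 15.
    if family == 0xF:
        family += ext_family

    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    # Signature copied from the dmesg line above.
    print(decode_cpuid_signature(0x1067A))
```

For 0x1067a this gives family 6, model 0x17 (23), stepping 0xa (10). The same value shows up as `PROCESSOR 0:1067a` in the machine check log, so both logs refer to the same CPU type.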

Revision says 0x0 if the microcode has not been updated yet by the kernel.

The first microcode message is the same here. When an update is triggered, the following appears in dmesg: `microcode: updated to revision 0xa0e, date = 2015-07-29`

Tell me how you added the updates to Libreboot as I don't know how.

RAM is 1*CT51264BF160B

Leah Rowe commented 1 month ago
Owner

you add it from coreboot. it's in their 3rdparty repo. add it using cbfstool

plantroon commented 1 month ago
Poster

@Fedja Beader, are you saying that when you update the microcode from the OS-supplied package, you don't run into the crash? Because I already tried that.

Are you sure the microcode was updated at all? Do you see a line similar to `microcode: updated to revision 0xa0e, date = 2015-07-29` in your kernel log?
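
A small sketch of how that check could be automated against saved kernel log text. It only assumes the two message formats quoted in this thread (`microcode: sig=..., pf=..., revision=...` and `microcode: updated to revision ..., date = ...`); other kernel versions may word these lines differently, and `microcode_status` is a made-up helper name.

```python
import re

# Patterns modelled on the two dmesg lines quoted in this thread.
BOOT_RE = re.compile(
    r"microcode: sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), revision=(0x[0-9a-fA-F]+)"
)
UPDATED_RE = re.compile(
    r"microcode: updated to revision (0x[0-9a-fA-F]+), date = (\S+)"
)


def microcode_status(log_text: str) -> dict:
    """Report the boot-time revision and any later update found in the log."""
    boot = BOOT_RE.search(log_text)
    updated = UPDATED_RE.search(log_text)
    return {
        "boot_revision": int(boot.group(3), 16) if boot else None,
        "updated_revision": int(updated.group(1), 16) if updated else None,
        "update_date": updated.group(2) if updated else None,
    }


if __name__ == "__main__":
    sample = (
        "[    0.838696] microcode: sig=0x1067a, pf=0x80, revision=0x0\n"
        "[   12.000000] microcode: updated to revision 0xa0e, date = 2015-07-29\n"
    )
    print(microcode_status(sample))
```

If `updated_revision` comes back as `None` and `boot_revision` is 0, no update shows up in the log.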

For future reference, one of these two on https://github.com/platomav/CPUMicrocodes/tree/master/Intel should be the correct microcode update (PROCESSOR 0:1067a):

  • cpu1067A_platB1_ver00000A0E_2015-07-29_PRD_59BF808E.bin
  • cpu1067A_plat44_ver00000A0E_2015-07-29_PRD_A3107D75.bin

iucode-tool will decide which one.
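
To illustrate how the choice between the two files is made, here is a sketch of the usual matching rule: the CPU signature must be equal, and the CPU's platform flag must overlap the update's platform mask. It assumes that the `cpu...` and `plat...` fields in those filenames are the hex signature and platform mask, and it uses the `pf=0x80` value from the dmesg output earlier in this thread. Real tools read these fields from the microcode header, not from the filename, and `matches_cpu` is a made-up name.

```python
import re

# Assumed filename layout: cpu<SIG>_plat<MASK>_ver<REV>_...
NAME_RE = re.compile(r"cpu([0-9A-Fa-f]+)_plat([0-9A-Fa-f]+)_ver([0-9A-Fa-f]+)_")


def matches_cpu(filename: str, cpu_sig: int, cpu_pf: int) -> bool:
    """Return True if the update named by `filename` applies to this CPU.

    Simplified: ignores the extended signature table some updates carry.
    """
    m = NAME_RE.match(filename)
    if m is None:
        return False
    file_sig = int(m.group(1), 16)
    file_mask = int(m.group(2), 16)
    return file_sig == cpu_sig and (file_mask & cpu_pf) != 0


if __name__ == "__main__":
    candidates = [
        "cpu1067A_platB1_ver00000A0E_2015-07-29_PRD_59BF808E.bin",
        "cpu1067A_plat44_ver00000A0E_2015-07-29_PRD_A3107D75.bin",
    ]
    for name in candidates:
        # sig and pf as reported by dmesg in this thread
        print(name, matches_cpu(name, 0x1067A, 0x80))
```

Under those assumptions, the `platB1` file matches a CPU reporting `pf=0x80` (0xB1 has that bit set) and the `plat44` file does not.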

plantroon commented 2 weeks ago
Poster

Created a pull request to make the information from here available in the docs: https://notabug.org/libreboot/libreboot/pulls/552

Thanks to all who helped. Closing this for now as it is solved by applying microcode updates.
