#561 Grub hacker help wanted

Open
opened 5 months ago by swiftgeek · 3 comments

In this meta bug I will list the bugs/issues we currently face with GRUB in various configurations. We need help even with identifying these bugs properly:

Grub baremetal:

  • Some USB mass storage devices freeze GRUB (for minutes or indefinitely). That even includes USB floppy drives.
  • Sometimes it even happens with just any USB device, such as an HID keyboard or mouse.
  • Native AHCI starts looping after the first access to a drive (e.g. after ls).
  • cryptomount takes at least an order of magnitude longer on baremetal than with grub i386-pc.
  • A USB device plugged in at power-on (so far I have reproduced this only on the X200 and related machines) produces error: EHCI grub_ehci_pci_iter: EHCI halt timeout. It works perfectly fine when hotplugged in GRUB, and in every case under Linux.
  • AHCI/ATA devices cannot be distinguished from ATAPI ones, which causes unnecessary delays.

SeaGRUB (grub i386-pc loaded via SeaBIOS from floppy/disk image)

  • Making it recognize the filesystem of the image GRUB was loaded from. The same image dd-ed onto a pendrive doesn't seem to have issues and is recognized properly.
  • Automatic module loading doesn't work when using CBFS (prefix doesn't seem to work as intended).
strcpy commented 3 months ago

Some research into the cryptomount performance:

Doing "set debug=luks" reveals the slowdown is between the debug logs "Trying keyslot 0" and "PBKDF2 done" in disk/luks.c. The only thing of consequence between those lines is a grub_crypto_pbkdf2 on the passphrase. However, later in the same function, between "candidate key recovered" and "Slot 0 opened", there is another call to grub_crypto_pbkdf2 for the candidate key, and there is no delay between these two logs.

Additionally, manually creating a LUKS partition on a USB drive with cryptsetup --iter-time 1 luksFormat means that cryptomount finishes almost instantly, so the slowdown is tied to the cost of the passphrase key derivation. Diffing the grub_crypto_pbkdf2 implementation in the latest Libreboot release against the latest GRUB release shows nothing has changed there. Perhaps it is just a slower implementation than the one the Linux kernel has, but I don't see how that would make it slow only on baremetal, as opposed to i386-pc.
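
To illustrate why the iteration count matters so much here, the following is a minimal sketch using Python's standard hashlib, not GRUB's or cryptsetup's code. The passphrase, salt, hash choice (SHA-1) and iteration counts are assumptions picked for illustration; a real LUKS header stores its own values. It only shows that PBKDF2 cost grows roughly linearly with the iteration count, which is consistent with a volume formatted with --iter-time 1 unlocking almost instantly.

```python
import hashlib
import time


def time_pbkdf2(iterations, passphrase=b"example passphrase",
                salt=b"0123456789abcdef", dklen=32):
    """Run PBKDF2-HMAC-SHA1 once and return (derived_key, elapsed_seconds).

    All default arguments are illustrative assumptions, not values read
    from a real LUKS header.
    """
    start = time.perf_counter()
    key = hashlib.pbkdf2_hmac("sha1", passphrase, salt, iterations, dklen=dklen)
    return key, time.perf_counter() - start


if __name__ == "__main__":
    # Cost should scale roughly linearly with the iteration count.
    for iterations in (1_000, 10_000, 100_000):
        _, seconds = time_pbkdf2(iterations)
        print(f"{iterations:>7} iterations: {seconds * 1000:.1f} ms")
```

Running the same measurement inside each environment (baremetal GRUB versus i386-pc) would be the equivalent experiment; this sketch only demonstrates the scaling on a host machine.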

Edit: The pbkdf2 implementation in cryptsetup (https://gitlab.com/cryptsetup/cryptsetup/blob/master/lib/crypto_backend/pbkdf2_generic.c) looks identical to GRUB's, to my untrained eye.

strcpy commented 3 months ago

Cryptomount is also slow on coreboot 4.9, with GRUB2 master, also seemingly in the grub_crypto_pbkdf2 function.

@swiftgeek - When you say performance is better on i386-pc, is that through the stock BIOS or SeaBIOS? Thanks.

Edit: I think the issue is with GRUB's implementation of memmove (grub_memmove). It doesn't perform as well as glibc's. However, on my desktop, building with -O3 gives a ~5x performance increase over -O2 for the GRUB implementation, so simply building GRUB with -O3 might be enough to fix this. I'll try to confirm this at some point.
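
As a rough illustration of the kind of difference being discussed, here is a small Python sketch contrasting a byte-at-a-time, overlap-safe copy with a bulk copy. It is not GRUB's grub_memmove and makes no claim about how that function is actually written; the function names are hypothetical and the byte-wise loop is only an assumed stand-in for a simple, unoptimized memmove. Both functions produce the same result, but the byte-wise one does one step per byte, which is the sort of loop whose speed depends heavily on how well the compiler optimizes it.

```python
def bytewise_memmove(buf, dst, src, n):
    """Copy n bytes within buf from offset src to offset dst, one byte at a time.

    Copies forwards when dst < src and backwards when dst > src, so that
    overlapping regions are handled correctly (memmove semantics).
    """
    if dst < src:
        for i in range(n):
            buf[dst + i] = buf[src + i]
    elif dst > src:
        for i in range(n - 1, -1, -1):
            buf[dst + i] = buf[src + i]


def bulk_memmove(buf, dst, src, n):
    """Same result using one slice assignment (the source slice is copied first)."""
    buf[dst:dst + n] = buf[src:src + n]


if __name__ == "__main__":
    a = bytearray(b"abcdefgh")
    b = bytearray(b"abcdefgh")
    bytewise_memmove(a, 2, 0, 4)
    bulk_memmove(b, 2, 0, 4)
    print(a, b, a == b)
```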

Swift Geek commented 3 months ago
Collaborator

Both vendor BIOS (Phoenix) and SeaBIOS, when not using GRUB native drivers.
