How I fixed the booting problem after a kernel update in openSUSE 11.3

This is not for the experts, as there was very little to fix. It's for the Linux newbies who go "aagh!!!" when they've installed the security update that SUSE's update applet told them to install, and then find they can't boot any more. And it's so simple that it will save them a lot of surfing and reading posts too technical to understand. Only two things are required: that the "/boot" directory is on the same partition as the "/" (root) directory, and that there is another bootable Linux partition in the grub menu, because trying to work from a rescue CD didn't work too well.

One night, the automatic update applet in openSUSE 11.3 announced five critical security updates, one of which involved the kernel and had to be done manually because a Wacom USB driver's dependencies could not be satisfied. I let the other four run automatically and then, as proposed by the applet, opened YaST and, from the options offered, chose to uninstall the Wacom USB driver, as I don't have a Wacom tablet anyway. And now the kernel was to be upgraded (from 2.6.34-12, to be exact). This worried me a little, as kernel upgrades have never gone well for me the one or two times I tried (although those times I tried to "bake" a kernel following instructions, and always ended up with something unbootable). So when, after the update, my SUSE partition wouldn't boot any more, I wasn't surprised.

Having a separate "/home" partition and Zenwalk, Mint and even Windows to fall back on, I wasn't too upset. Still, it was annoying. So I surfed again. Apparently it was normal for kernel updates, especially in SUSE versions, to break support for USB drivers, leave a messed-up kernel that had to be re-downloaded, and corrupt either the menu list of grub (the default boot loader) or the file called initrd. In my grub menu list, I could see the entries for both kernel versions. Choosing the newer one gave error 21, telling me that the selected disk did not exist (although booting into Zenwalk showed me that the partition was still there, undamaged and mountable), while choosing the older one gave error 15: file not found (because that kernel had been removed, duh). Since grub still showed a menu, the most likely problem seemed to be a bad initrd (a disk image file needed for booting). The command for creating a new initrd is "mkinitrd", run presumably as root and presumably from the root directory of the afflicted Linux installation.

After a bit of messing around (booting from the install DVD, choosing "system rescue", trying to mount the partition and use chroot to run mkinitrd from the right place, and then some websurfing because it didn't work), I found another solution: I booted into Zenwalk Linux (another partition on the same disk), opened a terminal, made myself superuser with "su" and password, and typed in:

umount /dev/sda[n]
mkinitrd -d /dev/sda[n]

where the "[n]" should be replaced with the number of the right partition (which should for some reason be unmounted). This only works if /boot is a subdirectory of that partition, else I would have to mount root and /boot and use chroot and a "bind" option - good reason for noobs to install everything except /home on the same partition. (And, of course, it only worked because I had two Linux installations on the same machine.) A new initrd was written, and now I shall never know whether there was anything wrong with the old one, because I was ready to boot again and discover the root of the problem.
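For anyone who does have /boot on its own partition, the mount, "bind" and chroot route mentioned above might look roughly like the sketch below. I did not get this working from the rescue DVD myself, so treat it as an assumption-laden outline: the function name print_rescue_steps, the /mnt mount point and the device names are placeholders of mine, and the function only prints the commands so they can be read over before being typed in as root.

```shell
# Print (not run) the commands for rebuilding initrd when /boot is separate.
# $1 = root partition, $2 = /boot partition (placeholders, e.g. /dev/sdaX).
print_rescue_steps() {
    root_dev=$1
    boot_dev=$2
    mnt=/mnt    # assumed: an empty mount point on the running system

    # Mount the broken installation's root, then its /boot inside it.
    echo "mount $root_dev $mnt"
    echo "mount $boot_dev $mnt/boot"

    # Bind-mount the running system's device and kernel filesystems
    # so that tools inside the chroot can see the hardware.
    for dir in dev proc sys; do
        echo "mount --bind /$dir $mnt/$dir"
    done

    # Run mkinitrd as if booted into the broken installation.
    echo "chroot $mnt mkinitrd"
}
```

Calling print_rescue_steps /dev/sdaX /dev/sdaY lists six commands, starting with the two mounts and ending with the chroot line.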

I rebooted, chose the new kernel and again got error 21. This time I noticed that what couldn't be found was (hd1,[n]), which in grubspeak means a partition on the second disk. But the laptop only has one! Then I restarted and, on getting the grub menu, typed in what I had written down when surfing Linux forums, and should have used in the first place:

(first, get from the menu to the grub command line)
find /boot/grub/menu.lst
(I knew perfectly well where it's located, but I wanted to see if grub knew too, and it gave me the answer in grubspeak: (hd0,10) meaning the eleventh partition on the first hard disk)
root (hd0,10)
(this is grub's chroot)
kernel /boot/vmlinuz
(tells grub which kernel to use, i.e. vmlinuz in /boot of the partition I chose)
initrd /boot/initrd
(ditto for initrd)
boot
(speaks for itself)

and I booted into SUSE without a hitch. And ran mkinitrd again just to be sure. And edited /boot/grub/menu.lst, where, just as I'd suspected, the old kernel entries had not been erased, but the new kernel entries had "hd1,10" instead of "hd0,10". Did the update script assume that if the old kernel is on the first disk, the new kernel must be on the second disk, even if no second disk is installed? Of course, I changed the disk number to 0 and removed the old entries, after which I could boot from the menu again.
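Since grubspeak counts both disks and partitions from zero while Linux device names count partitions from one, it is easy to get the numbers wrong. The little function below is a sketch of the translation. The name grub_to_linux is made up for illustration, and it assumes the disks show up as /dev/sda, /dev/sdb and so on in the same order grub sees them, which is not guaranteed on every machine.

```shell
# Translate a grub (legacy) device name such as "(hd0,10)" into the
# matching Linux device name, assuming sda/sdb/... follow grub's order.
grub_to_linux() {
    spec=${1#"(hd"}      # "(hd0,10)" -> "0,10)"
    spec=${spec%")"}     # "0,10)"    -> "0,10"
    disk=${spec%,*}      # disk number, counted from zero
    part=${spec#*,}      # partition number, counted from zero

    # Disk 0 is "a", disk 1 is "b", and so on.
    letter=$(printf 'abcdefghijklmnopqrstuvwxyz' | cut -c $((disk + 1)))

    # Linux counts partitions from one, so add one.
    echo "/dev/sd${letter}$((part + 1))"
}
```

On my laptop, grub_to_linux "(hd0,10)" prints /dev/sda11, the partition that really exists, whereas the broken entry "(hd1,10)" translates to /dev/sdb11, on a disk that isn't there.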

So maybe there was nothing wrong with the initrd file(s) and the only problem had been the erroneous disk number in /boot/grub/menu.lst. In which case I could just have booted into SUSE from the grub command line and corrected menu.lst straight away. That's how simple fixing a boot problem can be!

Postscript: within a week of writing this, I had a second kernel update, to fix the aforementioned USB driver problems. Same thing: menu.lst wasn't updated properly and had to be edited. There was nothing wrong with initrd. This has happened with every subsequent kernel update, so after every update, I edit menu.lst before restarting.
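For those who would rather not edit by hand every time, the correction could be scripted along the lines below. This is only a sketch under the assumption that the machine has a single hard disk, so that every "(hd1," in menu.lst is wrong. The function name fix_menu_lst is my own invention, and it keeps a backup copy next to the original in case the edit goes wrong.

```shell
# Replace the wrong disk number (hd1) with hd0 in a grub menu file.
# $1 = path to the menu file, normally /boot/grub/menu.lst (run as root).
fix_menu_lst() {
    menu=$1

    # Keep the unedited file as menu.lst.bak.
    cp "$menu" "$menu.bak" || return 1

    # Rewrite every "(hd1," as "(hd0,"; partition numbers stay as they are.
    sed 's/(hd1,/(hd0,/g' "$menu.bak" > "$menu"
}
```

Run as root with fix_menu_lst /boot/grub/menu.lst after the update and before restarting, then look through the file to check that the entries now say (hd0,10) or whatever your partition is. Removing the entries for the old kernel still has to be done by hand.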