06-08-2006
Ken Ryan
 
Re: SuSE 10.0 Something broke: /dev/hd* and friends no longer get created, boot fails

Ken Ryan wrote:
> Hello.
>
> Last night I put a new I/O board in my machine, which means I had to
> boot for the first time in about two months. Something happened where I
> can no longer boot.
>
> A little while ago I added a rule file to /etc/udev/rules.d (a
> 99-something which attempted to set permissions on /dev/ttyS0) but never
> tested it across a boot.
>
> First, rest assured I reversed the hardware and udev change, so my
> system should be the same as it was before. When getting ready to make
> the change I did a proper shutdown etc.
>
> I was negligent in three respects: I didn't keep up with my backups
> (most recent is a week or so ago), I never boot-tested my udev rules
> change, and I've been periodically running YOU updating everything it
> suggests including the kernel and whatnot but I hadn't been rebooting
> the machine to ensure all is well. So this problem could be caused by
> something that happened or something I did as long as two months ago and
> I didn't run across it until now.
>
> Here is my machine configuration:
>
> - 2.4(?) GHz P4, 1GB RAM, NVidia video, 10/100+USB1.1+1394 combo card,
> Audigy 2, USB2+FW combo card, Promise 20269-based IDE card, two hard
> disks (hda and hdg, both WD 120GB), LG dvd/cd writer, IDE zip drive
> (Dell Dimension 8200 with a couple peripheral changes)
> - Boot on hda1, swap on hda2 and hdg2, root on md0=hda3+hdg3, /home on
> md1=hda4+hdg4. The hdg1 partition is mounted on /altboot; I was going
> to rsync /boot onto it but I never got around to it.
> - All filesystems are ext3
> - Was running KDE with the NVidia driver
>
> When I shut down to add the new board one thing was a little odd - when
> I logged out of my user KDE session I was dropped to a console prompt
> rather than an xdm screen. I assumed that was simply because I had a
> YOU kernel update that hadn't been booted yet. I logged in as
> root at the console prompt and executed 'halt'. The system seemed to
> shut down OK at that point (I use the verbose boot, no splash screen).
>
> I made the hardware change, then powered on. The kernel booted, initrd
> loaded, / passed fsck and was mounted (it forced fsck due to being 63
> days since last fsck). It detected and assembled both raid1 volumes BUT
> fsck failed on hda1 and hdg1. At first I thought "great, disk error or
> something". It dropped me to single-user, and when I tried rerunning
> fsck I realized it failed because /dev did not contain any hd* devices.
>
> I rebooted into "failsafe" with the same results except this time md0
> and md1 got fsck forced because it claimed 49710 days elapsed since last
> fsck. That makes me uncomfortable, obviously, but the root partition
> (md0) at least seemed to be OK from within single-user.
>
> It was at this point that I reverted the hardware change and my
> /etc/udev/rules.d/99-foo file (by removing the 99-foo file).
>
> Right now whether I try to boot into failsafe or normal mode I end up
> with /dev/hd* missing (/dev/md* is there). If I reboot into the same
> mode fsck doesn't get forced; if I switch from normal to failsafe or
> vice versa I get that weird 49710-day fsck (always the same number). It
> also doesn't matter whether I reboot or halt/powerdown then boot.
>
> A few things I was able to find:
>
> - It appears that /etc/init.d/boot.udev did not get run. I haven't
> figured out yet when it is supposed to run: before or after
> boot.localfs (which is where I end up in the single-user shell).
>
> - Sometimes udevd is running when I'm in single-user, sometimes not. I
> haven't figured out the pattern yet. As I write this, I booted failsafe
> and am in single-user with udevd running, and /dev is missing the hd* files.
>
> - If I run boot.udev force-reload I get a properly populated /dev.
>
> - Note: While I'm concentrating on /dev/hd* (especially /dev/hda1)
> missing, I have not checked if that is the only thing missing. As I
> write this, /dev has some files such as tty*, lp*, parport*, ippp*,
> isdn*, console, and the misc devices (zero, mem, null, etc.).
>
> - /proc and /sys are mounted and appear to be OK. Particularly I
> checked that /sys/block is OK, including /sys/block/hda/hda1.
>
> - If I cd to /dev and run 'df' I see "-" as the device and "/dev" as the
> mount point (I don't know if that's normal or not).
>
> - Booting with the installation DVD (OpenSuSE Eval DVD for 10.0) comes
> up to the installation screens OK, but the repair options don't work
> because they can't figure out where my root is. It appears to find hda1
> OK.
>
> - I tried searching google and google-groups for anything related to
> this but the only clue I was able to find was to verify /sys/block. I
> was unable to come up with a search string that produced something
> useful (a common problem with me, unfortunately).
>
> I appreciate any suggestions of what to try or what to look at.
> Hopefully this afternoon I'll have another 10.0 installation on another
> machine I can compare against, at least so I can see what is right and
> what is broken. Obviously I'm most suspicious that my attempt to use
> udev rules to modify ttyS0 permissions royally screwed things up - I'd
> never tried writing a udev rule before. I've reverted the file change
> as I mentioned, but I'd guess if the saved udevdb got messed up maybe
> that's what's wrong. I haven't posted to the udev lists, though; I want
> to see if there might be another reason or suggestion.
>
> Thanks in advance!
>
> ken
>
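
A side note on the 49710-day figure in the quoted post: 2^32 seconds divided by 86400 seconds per day comes out to 49710 whole days. So one plausible reading (a guess, not something confirmed against the fsck sources) is that the time since the last check came out slightly negative, for example because the system clock was behind the recorded check time, and was then treated as an unsigned 32-bit number. A minimal Python sketch of that arithmetic follows; the function name and the timestamps are made up for illustration.

```python
SECONDS_PER_DAY = 86400
UINT32_MODULUS = 2 ** 32


def apparent_days_since_check(last_check, now):
    """Whole days since last_check if the difference wraps as unsigned 32-bit."""
    elapsed = (now - last_check) % UINT32_MODULUS
    return elapsed // SECONDS_PER_DAY


# Hypothetical timestamps: the clock reads one hour earlier than the
# recorded time of the last check.
last_check = 1149700000
print(apparent_days_since_check(last_check, last_check - 3600))  # 49710

# A sane clock, 63 days after the last check.
print(apparent_days_since_check(last_check, last_check + 63 * SECONDS_PER_DAY))  # 63
```

If that reading is right, the number would stay the same for any small backwards clock offset, which would fit with it always being 49710.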



Further investigation shows something really bizarre.

When I run udevinfo, e.g.

udevinfo -q all -p /sys/block/hda/hda1

all the lines look OK *except* that the line

N: ttyS0

shows up in the output for every device. The same line is also in the
/dev/.udevdb files.
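
The check described above can be sketched in Python as a rough illustration. It assumes only what is shown here: that the udevinfo output contains a line of the form "N: <name>", and that for a disk partition like hda1 the node name is expected to match the last component of the sysfs path (a rule could legitimately rename a node, so a mismatch is a hint, not proof). The function names and the sample text are hypothetical; the sample is not a real capture.

```python
import posixpath


def node_name(udevinfo_output):
    """Return the value of the first 'N:' line, or None if there is none."""
    for line in udevinfo_output.splitlines():
        if line.startswith("N:"):
            return line[2:].strip()
    return None


def looks_wrong(sysfs_path, udevinfo_output):
    """True when the recorded node name differs from the kernel device name."""
    kernel_name = posixpath.basename(sysfs_path.rstrip("/"))
    return node_name(udevinfo_output) != kernel_name


# Sample text modelled on the line quoted above (not real udevinfo output).
sample = "N: ttyS0\n"
print(looks_wrong("/sys/block/hda/hda1", sample))  # True
```

Feeding it the text from "udevinfo -q all -p <path>" for each entry under /sys/block would give a quick list of which devices carry the wrong name.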

I'm certain now that my attempt to write a rule for permissions on ttyS0
is the cause of this. The question is how do I fix it? I removed the
rule I wrote, but something is remembering it. I looked around with
find and grep but I don't know udev and the SuSE boot process well at
all, so I'm having no luck figuring out where the problem is.
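
For what it's worth, the find-and-grep hunt can be written as a small Python sketch like the one below. It only reads regular files, so device nodes under /dev are never opened. The function name is made up, and which directories are worth searching is an assumption; /etc/udev/rules.d and /dev/.udevdb are simply the two places mentioned in this thread.

```python
import os


def files_mentioning(needle, roots):
    """Return sorted paths of regular files under roots that contain needle."""
    wanted = needle.encode()
    hits = []
    for root in roots:
        # os.walk yields nothing for a directory that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.isfile(path):
                    continue  # skip device nodes, sockets, dangling links
                try:
                    with open(path, "rb") as handle:
                        if wanted in handle.read():
                            hits.append(path)
                except OSError:
                    continue  # unreadable file, ignore it
    return sorted(hits)
```

Calling files_mentioning("ttyS0", ["/etc/udev/rules.d", "/dev/.udevdb"]) would list every file in those two trees that still mentions the serial port.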

Again, any tips would be immensely appreciated!

Thanks...

ken
