SUSE Advanced Troubleshooting: The Boot Process Lab



www.novell.com
Novell Training Services

ATT LIVE 2012 LAS VEGAS

SUSE Advanced Troubleshooting: The Boot Process Lab

SUS21

Novell, Inc. Copyright 2012-ATT LIVE-1-HARDCOPY PERMITTED. NO OTHER PRINTING, COPYING, OR DISTRIBUTION ALLOWED.


Proprietary Statement

Copyright © 2011 Novell, Inc. All rights reserved.

Novell, Inc., has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed on the Novell Legal Patents Web page (http://www.novell.com/company/legal/patents/) and one or more additional patents or pending patent applications in the U.S. and in other countries.

No part of this publication may be reproduced, photocopied, stored on a retrieval system, or transmitted without the express written consent of the publisher.

Novell, Inc.
404 Wyman Street, Suite 500
Waltham, MA 02451
U.S.A.
www.novell.com

Novell Trademarks

For Novell trademarks, see the Novell Trademark and Service Mark list (http://www.novell.com/company/legal/trademarks/tmlist.html).

Third-Party Materials

All third-party trademarks are the property of their respective owners.

Software Piracy

Throughout the world, unauthorized duplication of software is subject to both criminal and civil penalties.

If you know of illegal copying of software, contact your local Software Antipiracy Hotline. For the Hotline number for your area, access Novell’s World Wide Web page (http://www.novell.com) and look for the piracy page under “Programs.” Or, contact Novell’s anti-piracy headquarters in the U.S. at 800-PIRATES (747-2837) or 801-861-7101.

Disclaimer

Novell, Inc., makes no representations or warranties with respect to the contents or use of this documentation, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose.

Further, Novell, Inc., reserves the right to revise this publication and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. Further, Novell, Inc., makes no representations or warranties with respect to any software, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, Novell, Inc., reserves the right to make changes to any and all parts of Novell software, at any time, without any obligation to notify any person or entity of such changes.

Any products or technical information provided under this Agreement may be subject to U.S. export controls and the trade laws of other countries. You agree to comply with all export control regulations and to obtain any required licenses or classification to export, re-export or import deliverables. You agree not to export or re-export to entities on the current U.S. export exclusion lists or to any embargoed or terrorist countries as specified in the U.S. export laws. You agree to not use deliverables for prohibited nuclear, missile, or chemical biological weaponry end uses. See the Novell International Trade Services Web page (http://www.novell.com/info/exports/) for more information on exporting Novell software. Novell assumes no responsibility for your failure to obtain any necessary export approvals.

This Novell Training Manual is published solely to instruct students in the use of Novell networking software. Although third-party application software packages are used in Novell training courses, this is for demonstration purposes only and shall not constitute an endorsement of any of these software applications.

Further, Novell, Inc. does not represent itself as having any particular expertise in these application software packages and any use by students of the same shall be done at the student’s own risk.


Contents

Section 1 Troubleshooting
  Exercise 1.1 Troubleshooting Techniques
  Exercise 1.2 Troubleshooting Table

Section 2 Administration
  Exercise 2.1 Configuring Your Snapshot
    Task I: Take a Snapshot

Section 3 Troubleshooting Exercises
  Exercise 3.1 Troubleshooting Exercise: Root Password
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.2 Troubleshooting Exercise: Users Locked Out
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.3 Troubleshooting Exercise: Repair Filesystem Prompt
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.4 Troubleshooting Exercise: Server Hung with Blank Screen
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.5 Troubleshooting Exercise: Kernel and Initrd Messages
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.6 Troubleshooting Exercise: Server Reboots
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.7 Troubleshooting Exercise: Login Console Hang
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.8 Troubleshooting Exercise: Waiting for Device
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause

Version 1 Copying all or part of this manual, or distributing such copies, is strictly prohibited. To report suspected copying, please call 1-800-PIRATES


  Exercise 3.9 Troubleshooting Exercise: GRUB Prompt
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.10 Troubleshooting Exercise: Failed Run Level Services
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.11 Troubleshooting Exercise: Read-Only Root Filesystem
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.12 Troubleshooting Exercise: Missing Action Field
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.13 Troubleshooting Exercise: GRUB
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.14 Troubleshooting Exercise: Invalid Partition Table
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.15 Troubleshooting Exercise: Kernel Panic
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.16 Troubleshooting Exercise: Error in Service Module
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.17 Troubleshooting Exercise: Fatal modules.dep Error
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.18 Troubleshooting Exercise: Another Kernel Panic
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.19 Troubleshooting Exercise: Segmentation Fault
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.20 Troubleshooting Exercise: Respawning Too Fast
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause


  Exercise 3.21 Troubleshooting Exercise: Booting to $ Prompt
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.22 Troubleshooting Exercise: Server Hang at Boot
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.23 Troubleshooting Exercise: Power Off
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.24 Troubleshooting Exercise: Critical Data
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.25 Troubleshooting Exercise: Kernel Panic After Disk Change
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.26 Troubleshooting Exercise: Command Not Found
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.27 Troubleshooting Exercise: Waiting for Device after LUN
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause
  Exercise 3.28 Troubleshooting Exercise: Not Booting After Power Failure
    Task I: Configuration
    Task II: Troubleshooting Procedure
    Task III: Root Cause


List of Figures

Initial "Refresh" Snapshot


Section 1 Troubleshooting

Brief overview of troubleshooting techniques and the troubleshooting table.


1.1 Troubleshooting Techniques

Troubleshooting Procedure

1. Let the server boot until it fails

2. Write down verbatim what's on the screen

3. Match on-screen landmarks to the Troubleshooting Table

4. Use Boot Installed System (BIS) to bypass GRUB, kernel and ram disk issues

5. Use an administrative run level (S or 1) for daemon failures

6. Use Chroot Installed System (CIS) if all else fails

7. Address the issues and files associated with the location of the boot failure (see Troubleshooting Table)

Boot Installed System (BIS)

1. Used mostly in lines 1-7 of the Troubleshooting Table.

2. Boot from DVD1

3. Select “Installation”

4. Accept the License Agreement

5. Click “Next”, then “Next” again to skip the media check

6. Select “Repair Installed System” and “Next”

7. Select “Expert Tools”

8. Select “Boot Installed System”

NOTE: Selecting “Repair Installed System” directly from the DVD boot menu does not probe as thoroughly as selecting “Repair Installed System” from the Installation option.

Administrative Run Levels

Run levels S and 1 are the run-level equivalent of a chroot installed system (CIS). However, run levels S and 1 boot using the installed system's own boot loader, kernel and ram disk; they just don't start all the system processes that run levels 3 and 5 do. For that reason, run levels S and 1 are preferred over CIS. There are a couple of ways to change to run level S or 1. On a running system, you can simply type init 1. However, if you are troubleshooting system processes that fail at boot time, or that cause the server to misbehave as a result, you will want to reboot the server, bypass the default run level and boot directly into run level 1. To boot to run level 1, do the following:

1. Boot the server normally

2. Select the kernel you usually boot to

3. Tab or click in the “Boot Options” field

4. Append “ 1” (a space followed by the number 1) to the boot options line

5. Press Enter to boot, then type root's password when prompted

If you need network access, run “/etc/init.d/network start” or “dhcpcd eth0”.
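Once the server is up, you can confirm that it really reached an administrative run level before you start repairs. This is a minimal sketch that assumes the SysV runlevel command, which prints the previous and current run level (for example "N 1"); in_admin_runlevel is a hypothetical helper, not a SLES tool, and it takes the runlevel output as an argument so the check can be tried without rebooting.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the output of the
# SysV `runlevel` command reports an administrative run level (S or 1).
# `runlevel` prints "<previous> <current>", for example "N 1".
in_admin_runlevel() {
    # Split the output into two fields; word splitting is intended here.
    set -- $1
    case "$2" in
        S|s|1) return 0 ;;   # single-user / administrative run levels
        *)     return 1 ;;   # 2, 3, 5, or missing output
    esac
}
```

For example, in_admin_runlevel "$(runlevel)" && /etc/init.d/network start brings up networking only when the check passes.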

Chroot Installed System (CIS)

1. Used mostly in lines 8 and above of the troubleshooting table.

2. Boot from DVD1

3. Select “Rescue System”, then log in as root at the “Rescue login:” prompt

4. Your first goal is to find and mount the root “/” partition, so you can read /etc/fstab

1. Run cat /proc/partitions to find the disk devices the OS sees

2. For each device, display the partition table

ls-boot:~ # parted -s /dev/sda print
Disk geometry for /dev/sda: 0kB - 2147MB
Disk label type: msdos
Number  Start   End     Size    Type      File system  Flags
1       32kB    214MB   214MB   primary   ext2         boot, type=83
2       214MB   535MB   321MB   primary   linux-swap   type=82
3       535MB   2147MB  1612MB  extended               lba, type=0f
5       535MB   1012MB  477MB   logical   reiserfs     type=83
6       1012MB  1596MB  584MB   logical   reiserfs     type=83
7       1596MB  2147MB  551MB   logical   reiserfs     type=83

3. You can ignore type 82 swap and type 0f extended partitions

4. To find the root partition, you may need to guess and check each one. For example:

1. mount /dev/sda1 /mnt 

2. ls -l /mnt

3. If the /mnt directory listing shows etc and root directories, then it's the root partition

4. Otherwise, unmount it (umount /mnt) and repeat these steps for each partition until you find root. In this case, the root partition is /dev/sda6

5. mount /dev/sda6 /mnt 


5. Mount all additional file systems relative to /mnt

1. Once you have mounted the root filesystem, run cat /mnt/etc/fstab to see all the other filesystem mount points.

2. Mount all file systems manually as shown in /mnt/etc/fstab.

mount /dev/sda1 /mnt/boot 

mount /dev/sda5 /mnt/var 

mount /dev/sda7 /mnt/usr 

3. Recursively bind-mount the /proc, /sys and /dev filesystems into the installed system.

mount --rbind /proc /mnt/proc

mount --rbind /sys /mnt/sys

mount --rbind /dev /mnt/dev

6. Chroot to the installed system: chroot /mnt 

7. To return to the rescue system, type exit.
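Parts of the CIS procedure above can be scripted. The sketch below uses hypothetical function names (none of them are SLES tools): list_candidate_partitions reads parted -s output in the format shown earlier and skips swap (type=82) and extended (type=0f) partitions; looks_like_root applies the "etc and root directories" test; and fstab_mount_cmds prints, rather than runs, a mount command for each fstab entry relative to /mnt. It assumes a plain fstab with device, mount point and type in the first three fields, so review the printed commands before running them.

```shell
#!/bin/sh
# Hypothetical helpers for the Chroot Installed System (CIS) steps.

# Read `parted -s /dev/sdX print` output on stdin and print the number of
# each partition worth trying as root, skipping swap (type=82) and
# extended (type=0f) partitions.
list_candidate_partitions() {
    awk '$1 ~ /^[0-9]+$/ && !/type=82/ && !/type=0f/ { print $1 }'
}

# Succeed if a mounted partition looks like the root filesystem,
# i.e. it contains both an etc and a root directory.
looks_like_root() {
    [ -d "$1/etc" ] && [ -d "$1/root" ]
}

# Print (but do not run) one mount command per fstab entry, relative to a
# target directory (default /mnt). Skips comments and blank lines, the
# root filesystem (already mounted), swap, and pseudo filesystems such as
# proc and sysfs, which are bind-mounted separately.
fstab_mount_cmds() {
    fstab_file=$1
    target=${2:-/mnt}
    awk -v target="$target" '
        /^[ \t]*#/ || NF < 3 { next }
        $2 == "/" || $3 == "swap" { next }
        $3 ~ /^(proc|sysfs|debugfs|usbfs|devpts|tmpfs)$/ { next }
        { printf "mount %s %s%s\n", $1, target, $2 }
    ' "$fstab_file"
}
```

For the disk listed earlier, parted -s /dev/sda print | list_candidate_partitions prints 1, 5, 6 and 7. Printing the mount commands instead of executing them keeps a human in the loop, which matters on a system that already fails to boot.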

Flow Control

The normal boot messages scroll past on the screen very quickly. There are ways to slow them down and test each service as it loads. The boot messages are controlled by variables set in the /etc/sysconfig/boot file.

FLOW_CONTROL="yes"

Allows you to stop the boot process messages using Ctrl-S and resume them with Ctrl-Q.

PROMPT_FOR_CONFIRM="yes"

CONFIRM_PROMPT_TIMEOUT="5"

This will display the prompt:

Enter interactive startup mode? y/[n](5s)

You must answer “y” within the CONFIRM_PROMPT_TIMEOUT period to enter interactive startup mode; otherwise, the server boots normally without prompting you before it starts each system daemon. After you enter interactive startup mode, you are prompted to start each service with the following:


Start service <service_name>, (Y)es/(N)o/(C)ontinue? [y]

The CONFIRM_PROMPT_TIMEOUT value also applies to each service start prompt. This was not true in earlier versions of SLES.
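Because these options live in /etc/sysconfig/boot, you can switch them on from run level 1 or CIS by editing that file. The sketch below assumes the file uses plain NAME="value" lines; set_sysconfig_var and enable_boot_flow_control are hypothetical helpers, not SLES commands. The helper only rewrites variables that already exist in the file and assumes simple values without |, & or backslash characters, which sed would treat specially.

```shell
#!/bin/sh
# Hypothetical helper: replace an existing NAME=... line in a sysconfig-style
# file with NAME="value". Returns 1 if NAME is not already set in the file.
# Assumes value contains no |, & or backslash (special to the sed command).
set_sysconfig_var() {
    file=$1 name=$2 value=$3
    grep -q "^${name}=" "$file" || return 1
    # Edit via a temporary copy; writing back with cat keeps the
    # original file's owner and permissions.
    sed "s|^${name}=.*|${name}=\"${value}\"|" "$file" > "$file.tmp" &&
        cat "$file.tmp" > "$file" &&
        rm -f "$file.tmp"
}

# Enable the flow-control settings described above. Pass the file path:
# under CIS, before chrooting, that is /mnt/etc/sysconfig/boot.
enable_boot_flow_control() {
    f=${1:-/etc/sysconfig/boot}
    set_sysconfig_var "$f" FLOW_CONTROL yes &&
        set_sysconfig_var "$f" PROMPT_FOR_CONFIRM yes &&
        set_sysconfig_var "$f" CONFIRM_PROMPT_TIMEOUT 5
}
```

Refusing to add a variable that is missing is deliberate: a missing name usually means a typo or the wrong file, and silently appending it would hide that.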

Once the server has booted, you can use Shift-PgUp to scroll back about two screens' worth of boot messages, regardless of the /etc/sysconfig/boot settings. However, if you switch to another console (e.g., tty2), you will no longer be able to use this keystroke to review the boot messages.


1.2 Troubleshooting TableBoot Process Associated File(s) On-Screen Landmarks Troubleshooting/Potential Fixes

1 BIOS
   Files: N/A
   On-screen landmark: BIOS messages
   Troubleshooting/Potential Fixes: Update the firmware; mark the boot device “bootable” with fdisk

2 MBR
   Files: /boot/grub/stage1
   On-screen landmark: GRUB loading stage2...
   Troubleshooting/Potential Fixes: BIS; grub-install or yast bootloader

3 GRUB
   Files: /boot/grub/stage2, /boot/grub/menu.lst
   On-screen landmark: GRUB menu or grub> prompt
   Troubleshooting/Potential Fixes: BIS; grub-install or yast bootloader; check /boot/grub/menu.lst (file and device)

4 GRUB
   Files: /boot/vmlinuz, /boot/initrd
   On-screen landmark: root (hd?,?) Filesystem type is …; kernel /<path_to_vmlinuz>; initrd /<path_to_initrd>
   Troubleshooting/Potential Fixes: Reinstall the kernel RPM; mkinitrd. GRUB loads and boots the kernel.

5 kernel
   Files: /boot/vmlinuz
   On-screen landmark: Kernel driver information beginning with [ 0.0000000 ] time stamps
   Troubleshooting/Potential Fixes: BIS; reinstall the kernel RPM

6 initrd
   Files: /boot/initrd, /etc/sysconfig/kernel
   On-screen landmark: A time stamp [ 0.0000000 ] followed by module info
   Troubleshooting/Potential Fixes: BIS; cd /tmp/ramdisk; zcat /boot/initrd | cpio -ivd; mkinitrd

7 ramdisk:init
   Files: /init in /boot/initrd, /etc/sysconfig/kernel
   On-screen landmark: Starting udevd; Creating devices; Loading <module_name>. There will be a “Loading” line for each module defined in INITRD_MODULES in /etc/sysconfig/kernel.
   Troubleshooting/Potential Fixes: BIS; mkinitrd creates the ramdisk:init file.

8 sbin:init
   Files: /sbin/init, /etc/inittab
   On-screen landmark: INIT: version 2.86 booting
   Troubleshooting/Potential Fixes: Use the boot options init=/bin/bash or init=/bin/sash to bypass running /sbin/init

9 sbin:init:boot
   Files: /bin/bash, /etc/init.d/boot, /etc/init.d/boot.d/*, /etc/sysconfig/boot
   On-screen landmark: System Boot Control: Running /etc/init.d/boot; each service shows done, failed, or skipped; System Boot Control: The system has been setup
   Troubleshooting/Potential Fixes: CIS; PROMPT_FOR_CONFIRM="yes", RUN_PARALLEL="no", FLOW_CONTROL="yes" (Ctrl-S stops, Ctrl-Q resumes)

10 sbin:init:boot
   Files: /etc/init.d/boot.local
   On-screen landmark: System Boot Control: Running /etc/init.d/boot.local
   Troubleshooting/Potential Fixes: CIS

11 sbin:init
   Files: /etc/inittab
   On-screen landmark: INIT: Entering runlevel: 3
   Troubleshooting/Potential Fixes: init 1 or CIS

12 sbin:init:rc
   Files: /bin/bash, /etc/init.d/rc, /etc/init.d/rc?.d/*, /etc/init.d/before.local, /etc/init.d/after.local
   On-screen landmark: Master Resource Control: previous runlevel: N, switching to runlevel: 3; Master Resource Control: Running /etc/init.d/before.local; each service shows done, failed, or skipped; Master Resource Control: Running /etc/init.d/after.local; Master Resource Control: runlevel 3 has been reached; Skipped services in runlevel 3:
   Troubleshooting/Potential Fixes: init 1 or CIS; PROMPT_FOR_CONFIRM="yes", RUN_PARALLEL="no", FLOW_CONTROL="yes"

13 sbin:init
   Files: /etc/inittab
   On-screen landmark: N/A
   Troubleshooting/Potential Fixes: init 1 or CIS. init refers to its inittab file to know how to run the login programs.

14 sbin:init:mingetty
   Files: /etc/issue, /etc/motd, /etc/nologin, /sbin/mingetty, /etc/pam.d/login
   On-screen landmark: <contents of /etc/issue> login:
   Troubleshooting/Potential Fixes: init 1 bypasses mingetty; CIS

15 sbin:init:X
   Files: /etc/sysconfig/displaymanager, /etc/sysconfig/windowmanager
   On-screen landmark: Graphical login screen
   Troubleshooting/Potential Fixes: init 1 bypasses the X login; CIS
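Row 7 notes that the ramdisk prints one “Loading” line per module in INITRD_MODULES. The following minimal sketch lists the modules configured in /etc/sysconfig/kernel, so you can compare them with the “Loading <module_name>” lines on screen. It assumes the usual sysconfig syntax (INITRD_MODULES="mod1 mod2 ..."). The function name list_initrd_modules is hypothetical, not a SUSE tool.

```shell
#!/bin/sh
# list_initrd_modules [FILE]   (hypothetical helper, not a SUSE tool)
# Print each module named in INITRD_MODULES, one per line.
# FILE defaults to /etc/sysconfig/kernel and is assumed to contain a line like
#   INITRD_MODULES="ata_piix processor thermal fan"
list_initrd_modules() {
    kernel_cfg="${1:-/etc/sysconfig/kernel}"
    # Keep only uncommented INITRD_MODULES= lines, drop the key, use the last
    # one, strip the quotes, then split the value on whitespace.
    sed -n 's/^[[:space:]]*INITRD_MODULES=//p' "$kernel_cfg" | tail -n 1 |
        tr -d "\"'" | tr -s '[:space:]' '\n' | sed '/^$/d'
}
```

For example, run list_initrd_modules in BIS or CIS. If you change INITRD_MODULES, rebuild the ramdisk with mkinitrd, as the table suggests.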


Section 2 Administration

Exercises that help prepare for the troubleshooting labs.


2.1 Configuring Your Snapshot

The boot process labs require a lot of rebooting. Take a snapshot while the ls-boot virtual machine is running, so that boot time is minimal after you revert to it.

Objectives:

Task I: Take a Snapshot

Special Instructions and Notes:

None

Task I: Take a Snapshot

1. In VMware, click File, Open

2. Select /opt/labs/vms/ls-boot/ls-boot.vmx

3. Power on the ls-boot virtual machine.

4. Select “Boot from Hard Disk”

5. Log in as root with the password linux

6. Type hi

7. Type bplab followed by a space, but DO NOT press Enter.

8. Select VM, Snapshot, Take Snapshot

9. Name the snapshot “Revert” and click OK

10. Wait for the virtual machine state to finish saving before continuing


This exercise reduces downtime between exercises.

(End of Exercise)


Section 3 Troubleshooting Exercises

The purpose of these exercises is to create a boot failure for you to troubleshoot. The lab notes show one way to approach each lab's symptoms; there are multiple ways to troubleshoot any issue. Try to resolve the problem without looking at the lab notes. The lab notes contain the symptom, any error messages, a method for troubleshooting the issue, and the root cause.

You will become effective at troubleshooting boot-related issues as you practice the techniques taught and apply them to the exercises in this lab.


3.1 Troubleshooting Exercise: Root Password

I forgot root's password. Set root's password to "linux" and log in normally as the root user.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 1

3. Press enter to continue

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Method 1

1. Reboot with boot options parameter: init=/bin/bash

2. The passwd command is in /usr/bin, which is not mounted yet.

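3. If the root filesystem is mounted read-only, remount it read-write with mount -o remount,rw / so that passwd can write /etc/shadow. Then run mount -a to mount all remaining filesystems.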

4. Run /usr/bin/passwd to change root's password

5. Reboot

2. Method 2

1. Boot the Rescue System; at the “Rescue login:” prompt, enter root

2. chroot Installed System (CIS)

3. Run passwd to change root's password

4. Reboot
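With either method, passwd only succeeds if it can write /etc/shadow. One way to confirm that the reset actually reached the disk is to check the third shadow(5) field. That field holds the date of the last password change, counted in days since 1970-01-01. The minimal sketch below performs this check. The function name pw_changed_today is hypothetical.

```shell
#!/bin/sh
# pw_changed_today FILE USER   (hypothetical helper)
# Succeed if USER's "date of last password change" (3rd field of a
# shadow(5) file) equals today, counted in days since 1970-01-01 UTC.
pw_changed_today() {
    shadow_file="$1"
    user="$2"
    today=$(( $(date +%s) / 86400 ))
    lastchg=$(awk -F: -v u="$user" '$1 == u { print $3; exit }' "$shadow_file")
    # No entry or an empty field means the change cannot be confirmed.
    [ -n "$lastchg" ] && [ "$lastchg" -eq "$today" ]
}
```

For example, after running passwd, pw_changed_today /etc/shadow root && echo "root's password was changed today".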

Task III: Root Cause

1. Root's password was forgotten

2. Does an Automatic Repair fix this scenario? No


Although forgetting a password is not directly a boot-related issue, resetting root's password is a good skill to have.

(End of Exercise)


3.2 Troubleshooting Exercise: Users Locked Out

All my users are locked out; only root can log in. Make sure the geeko user can log in without errors. Geeko's password is linux.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 2

3. Press enter to continue

4. Some errors observed

1. Permission denied (publickey,keyboard-interactive). (ssh)

2. login[2639]: FAILED LOGIN 1 FROM /dev/tty2 FOR geeko, Authentication failure

3. login[3050]: FAILED LOGIN 1 FROM /dev/tty1 FOR UNKNOWN, User not known to the underlying authentication module

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Review the troubleshooting table and identify where you are.

2. The boot process is complete, and you are at the login.

3. Read the man pages for each of the files associated with login.

4. Does an /etc/nologin file exist? Yes.

5. Remove the /etc/nologin file.
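While /etc/nologin exists, login(1) refuses every user except root and displays the file's contents instead. The minimal sketch below covers step 4 and shows why the file was created before you remove it. The function name nologin_check is hypothetical.

```shell
#!/bin/sh
# nologin_check [FILE]   (hypothetical helper)
# If FILE (default /etc/nologin) exists, print its contents and return 1,
# because non-root logins are blocked; otherwise return 0.
nologin_check() {
    nologin_file="${1:-/etc/nologin}"
    if [ -e "$nologin_file" ]; then
        echo "logins blocked by $nologin_file:"
        cat "$nologin_file"
        return 1
    fi
    return 0
}
```

Run nologin_check first. If it reports the file and nobody intended the maintenance lock, remove it with rm /etc/nologin.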

Task III: Root Cause

1. Normal administrative feature. While /etc/nologin is present, no one except root is allowed to log in.

2. Does an Automatic Repair fix this scenario? No


Normal administrative functions can look like failures, even though they are not.

(End of Exercise)


3.3 Troubleshooting Exercise: Repair Filesystem Prompt

The system fails to boot and prompts for root's password.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 3

3. Press enter to continue

4. Some errors observed

1. fsck failed for at least one filesystem (not /).

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table.

2. Write down any errors exactly as they appear on the screen. To see errors that have scrolled past the current screen, press Shift-Up; this lets you view several previous screens before logging in. Errors in addition to the one above include:

1. Filesystem is clean failed

2. blogd: no message logging because /var file system is not accessible

3. Failed to open the device '/dev/hdb3': No such file or directory

4. You could also look in /var/log/boot.msg for these errors, but in this case that won't work, because /var was not mounted.

3. Since we see “System Boot Control:” messages but never a “Master Resource Control:” message, we are in the init boot phase of the boot process.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

4. When you see the prompts “Give root password for login:” and “Attention: Only CONTROL-D will reboot the system in this maintenance mode,” you should immediately suspect problems with the /etc/fstab configuration file.

5. Try cat /proc/partitions to see if the OS recognizes the /dev/hdb3 partition.

6. Edit /etc/fstab and confirm that the device, mount point, and mount options of each entry are valid. Comment out any entries that are not valid.

7. Comment out /dev/hdb3 from /etc/fstab and press Ctrl-D to reboot.

8. If this works, determine why /dev/hdb3 does not exist. In this case, it was an old disk that had been removed.

9. Edit /etc/fstab

10. Delete the entry “/dev/hdb3 /vol1 reiserfs acl,user_xattr 1 1”

11. Save and reboot
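Steps 5 and 6 compare /etc/fstab with the partitions the kernel actually sees. The minimal sketch below automates that comparison. It assumes the standard /proc/partitions layout, with the partition name in the fourth column. fstab_missing_devices is a hypothetical helper name, and it only checks entries written as plain /dev/ names.

```shell
#!/bin/sh
# fstab_missing_devices [FSTAB] [PARTITIONS]   (hypothetical helper)
# Print each /dev/... device in FSTAB whose kernel name (e.g. hdb3) is not
# listed in PARTITIONS (normally /proc/partitions, name in the 4th column).
# LABEL=, UUID=, /dev/disk/* and /dev/mapper/* entries are skipped because
# /proc/partitions does not list them under those names.
fstab_missing_devices() {
    fstab="${1:-/etc/fstab}"
    partitions="${2:-/proc/partitions}"
    awk '
        # First file: remember every partition name the kernel knows about
        NR == FNR { if (NF == 4) known[$4] = 1; next }
        # Second file: skip comments and blank lines in fstab
        /^[ \t]*#/ || NF == 0 { next }
        $1 ~ /^\/dev\// && $1 !~ /^\/dev\/(disk|mapper)\// {
            name = $1
            sub(/^\/dev\//, "", name)
            if (!(name in known)) print $1
        }
    ' "$partitions" "$fstab"
}
```

Run fstab_missing_devices with no arguments in the repair shell. Any device it prints, such as the /dev/hdb3 entry in this lab, is a candidate to comment out.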

Task III: Root Cause

1. Invalid /etc/fstab entry: /dev/hdb3 is a non-existent device

2. Does an Automatic Repair fix this scenario? Yes

The boot process reads /etc/fstab to find the filesystems that need to be mounted at boot time. If the last field of an entry (the fsck pass number) is non-zero, that filesystem is checked for errors. If the device cannot be found, or the filesystem check fails, the boot fails and stops in repair mode. You can usually comment out non-system filesystems in /etc/fstab so that the system boots properly for troubleshooting.
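To see which entries the boot scripts will try to check, filter /etc/fstab on that last field. The minimal sketch below does this. The function name fsck_at_boot is hypothetical, and a missing sixth field counts as 0, as documented in fstab(5).

```shell
#!/bin/sh
# fsck_at_boot [FSTAB]   (hypothetical helper)
# List "device mountpoint" for every fstab entry whose sixth field
# (fs_passno) is non-zero, i.e. the filesystems fsck checks at boot.
# Entries with fewer than six fields are treated as pass number 0.
fsck_at_boot() {
    awk '
        /^[ \t]*#/ || NF == 0 { next }
        NF >= 6 && $6 != 0 { print $1, $2 }
    ' "${1:-/etc/fstab}"
}
```

In this lab, the /dev/hdb3 entry has a pass number of 1, so it appears in the list.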

(End of Exercise)


3.4 Troubleshooting Exercise: Server Hung with Blank Screen

When I boot the server, it just hangs. The screen is completely blank.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 4

3. Press enter to continue

4. Some errors observed

1. Blank screen

2. VM attempts PXE boot

3. Operating System not found

1. Booting from local disk...

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table, and follow its “Troubleshooting/Potential Fixes” entry.

2. Did you see the boot loader menu and successfully pick a kernel to boot? No. BIS will still help, but the problem is with the boot loader itself. Execute the BIS procedure.

3. Since we saw the BIOS information on screen but not the boot loader menu, and there were no errors, the problem lies in the BIOS handing off to the stage1 boot loader.

4. BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

5. Try reinstalling the boot loader and reboot to test.


1. DVD1, Installation, Repair Installed System (RIS), Expert Tools, Install New Boot Loader, OK (no edits or changes necessary), Exit RIS and reboot.

2. NOTE: The Repair Installed System option on the DVD1 boot menu does not always work well; use Repair Installed System after selecting Installation instead.
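To check whether the disk's first sector still looks like an MBR, you can copy it and test for the 0x55 0xAA boot signature at bytes 510 and 511. The minimal sketch below performs this test. The function name mbr_has_signature and the /dev/sda device in the usage note are assumptions. A valid signature does not prove that GRUB stage1 is intact; it only catches a wiped or overwritten sector.

```shell
#!/bin/sh
# mbr_has_signature FILE   (hypothetical helper)
# Succeed if the 512-byte sector saved in FILE ends with the boot
# signature 0x55 0xAA at byte offsets 510 and 511.
mbr_has_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -A n -t x1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example, dd if=/dev/sda of=/tmp/mbr.bin bs=512 count=1, then mbr_has_signature /tmp/mbr.bin || echo "no boot signature".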

Task III: Root Cause

1. Damaged or corrupted Master Boot Record

2. Does an Automatic Repair fix this scenario? Yes

The master boot record was corrupted. Booting from the DVD for BIS bypassed the disk's MBR, which is why BIS worked. Reinstalling the boot loader resolves the issue.

(End of Exercise)


3.5 Troubleshooting Exercise: Kernel and Initrd Messages

We moved our server from one data center to another. When we boot the server, we just see some kernel and initrd message information. Sometimes the screen just goes black or the server reboots.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 5

3. Press enter to continue

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table, and follow its “Troubleshooting/Potential Fixes” entry.

2. After you pressed Enter on the kernel to boot from the boot loader, did you see any messages scroll on the screen? No. This indicates that something is wrong with whatever the boot loader is pointing to (that is, the kernel and RAM disk).

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. GRUB runs the commands of the /boot/grub/menu.lst entry in order: root, kernel, initrd, and then boot. The only command you don't see on screen is boot. Since nothing scrolled on the screen at all, you can suspect that the kernel could not execute and that something is wrong with the kernel.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

4. Load BIS, and make sure the /boot/grub/menu.lst is valid and all files are present.

5. Check the /boot/vmlinuz and /boot/initrd symbolic links referenced in menu.lst, and make sure they point to valid files in /boot.

6. Run rpm -Vf /boot/vmlinuz to verify the RPM that owns the /boot/vmlinuz file. Notice that the /boot/vmlinuz-3.0* kernel is marked with an S and a 5. Refer to the rpm man page to understand all the RPM verify codes: S means the size has changed and 5 means the MD5 checksum has changed. Since /boot/vmlinuz is a symbolic link to the vmlinuz kernel file, this is a major red flag: the Linux kernel itself has changed.

7. Try reinstalling the kernel, and reboot.

8. Boot installed system and make sure DVD installation media is mounted.

9. Install the kernel rpm, yast -i kernel-pae-base

10. rpm -Vf /boot/vmlinuz should return no output

11. Reboot
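Steps 5 and 6 can be partly scripted. The minimal sketch below uses two hypothetical helpers. check_boot_link resolves a /boot symlink and fails if its target is missing. rpm_changed_files filters rpm -V output for the S (size) and 5 (digest) codes described in step 6, plus any files rpm reports as missing. It assumes the usual rpm -V layout: an attribute string, an optional file marker, then the file name.

```shell
#!/bin/sh
# check_boot_link LINK   (hypothetical helper)
# Print the file a /boot symlink resolves to; fail if the target is missing.
check_boot_link() {
    target=$(readlink -f "$1")
    if [ -n "$target" ] && [ -f "$target" ]; then
        echo "$1 -> $target"
    else
        echo "$1 is missing or points to a missing file" >&2
        return 1
    fi
}

# rpm_changed_files   (hypothetical helper)
# Read "rpm -V" output on stdin and print files that are missing or whose
# size (S, 1st attribute column) or digest (5, 3rd column) has changed.
rpm_changed_files() {
    awk '
        $1 == "missing" { print $NF; next }
        length($1) >= 8 && (substr($1, 1, 1) == "S" || substr($1, 3, 1) == "5") {
            print $NF
        }
    '
}
```

For example: check_boot_link /boot/vmlinuz; check_boot_link /boot/initrd; rpm -Vf /boot/vmlinuz | rpm_changed_files.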

Task III: Root Cause

1. Corrupt kernel in /boot

2. Does an Automatic Repair fix this scenario? No

The kernel was damaged and needed to be reinstalled. BIS worked because it bypassed the installed kernel.

(End of Exercise)


3.6 Troubleshooting Exercise: Server Reboots

Your computer keeps rebooting. You do not have access to your installation media and so cannot use rescue mode.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 6

3. Press enter to continue

4. Do not use rescue mode.

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table, and follow its “Troubleshooting/Potential Fixes” entry.

2. Since the screen goes by too fast, it's hard to narrow down where the reboot starts. Reboot and append S to the boot options. The kernel passes S on to init, which bypasses the default runlevel and starts in single-user runlevel S. Runlevel 1 will also fail, but runlevel S prompts for root's password.

3. Change the /etc/sysconfig/boot options to the following:

1. PROMPT_FOR_CONFIRM="yes" (prompts before loading each service)

2. FLOW_CONTROL="yes" (pauses the screen with Ctrl-S and resumes with Ctrl-Q)

3. RUN_PARALLEL="no" (runs each service and waits for it before running the next)

4. Reboot and press “y” to “Enter Interactive startup mode”.

4. Load each service until you notice that the server is rebooting. Press Ctrl-S to pause the screen and observe the messages. Press Shift-Up to see messages that have scrolled off the screen.

5. Since we see “System Boot Control:” messages but never a “Master Resource Control:” message, we are in the init boot phase of the boot process.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

6. The last message before the reboot message “INIT: Switching to runlevel: 6” is “System Boot Control: Running /etc/init.d/boot.local”.

7. Reboot with the boot loader boot options parameter S, then edit and check the following files: /etc/inittab, /etc/init.d/boot.local, /etc/init.d/before.local, and /etc/init.d/after.local. (The before.local and after.local files do not exist by default, whereas boot.local exists but is usually empty.)

8. Try switching to runlevel 3 with init 3 to see whether the rebooting has stopped.

9. Edit /etc/init.d/boot.local

10. Remove the shutdown -r now command.
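Step 3 changes three variables in /etc/sysconfig/boot. If you prefer to script that edit rather than use an editor, the minimal sketch below is one option. It assumes the file uses KEY="value" lines, as shown in step 3. set_sysconfig_var is a hypothetical helper name.

```shell
#!/bin/sh
# set_sysconfig_var FILE KEY VALUE   (hypothetical helper)
# Set KEY="VALUE" in a sysconfig-style file: replace an existing
# uncommented assignment, or append one if KEY is not present.
set_sysconfig_var() {
    cfg="$1"; key="$2"; value="$3"
    sc_tmp="$cfg.tmp.$$"
    awk -v k="$key" -v v="$value" '
        $0 ~ ("^" k "=") { print k "=\"" v "\""; done = 1; next }
        { print }
        END { if (!done) print k "=\"" v "\"" }
    ' "$cfg" > "$sc_tmp" && cat "$sc_tmp" > "$cfg" && rm -f "$sc_tmp"
    # Writing back with cat keeps the original file's owner and permissions.
}
```

For example: set_sysconfig_var /etc/sysconfig/boot PROMPT_FOR_CONFIRM yes, and likewise for FLOW_CONTROL yes and RUN_PARALLEL no.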

Task III: Root Cause

1. Reboot command in the boot script: /etc/init.d/boot.local

2. Does an Automatic Repair fix this scenario? No

The init process runs /etc/init.d/boot to process all boot-level scripts, followed by /etc/init.d/boot.local. This script runs before the normal runlevels start, so an invalid command in it causes problems at boot time.
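Because boot.local, before.local, and after.local are ordinary shell scripts, a quick search for reboot-type commands can narrow down the culprit. The minimal sketch below runs that search. The function name and the list of commands it looks for are assumptions. It skips whole-line comments, and it cannot see commands hidden behind variables or other scripts.

```shell
#!/bin/sh
# find_reboot_commands FILE...   (hypothetical helper)
# Print "file:line:text" for lines that call shutdown, reboot, halt,
# poweroff, or "init 0/6" / "telinit 0/6", ignoring whole-line comments.
# Missing files are skipped silently (-s).
find_reboot_commands() {
    grep -nHsE '(^|[^[:alnum:]_])(shutdown|reboot|halt|poweroff|(tel)?init[[:space:]]+[06])([^[:alnum:]_]|$)' "$@" |
        grep -vE '^[^:]*:[0-9]+:[[:space:]]*#'
}
```

For example: find_reboot_commands /etc/init.d/boot.local /etc/init.d/before.local /etc/init.d/after.local.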

(End of Exercise)


3.7 Troubleshooting Exercise: Login Console Hang

I can log in to a virtual console, but once I log out, I cannot log back in to it. Help!

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 7

3. Press enter to continue

4. Some errors observed

1. INIT: no more processes left in this runlevel

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table, and follow its “Troubleshooting/Potential Fixes” entry.

2. The server appears to be at the very end of the boot process, which points to a configuration issue.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. What application provides the login after a reboot? mingetty

4. Did you log in successfully? Yes. So mingetty is probably working fine.

5. What application runs the mingetty login program? /sbin/init

6. What is /sbin/init's configuration file? /etc/inittab

7. Use 'man inittab' to help you understand the fields, and confirm that the values in the mingetty entries are correct.

8. Compare /etc/inittab from a working system to your faulty system.

9. You could also reinstall the aaa_base package, which owns /etc/inittab, by running yast -i aaa_base.
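Step 7 asks you to check the mingetty entries against inittab(5). Each entry there has the form id:runlevels:action:process, and the respawn action is what makes init restart a process after it exits. The minimal sketch below prints each mingetty entry and flags any action other than respawn. The function name is hypothetical. A wrong action is only one plausible cause of this symptom, so still compare the whole file with a working system.

```shell
#!/bin/sh
# inittab_getty_entries [FILE]   (hypothetical helper)
# For each uncommented inittab(5) entry (id:runlevels:action:process) that
# starts mingetty, print "id runlevels action" and flag actions other than
# "respawn", because respawn is what makes init restart the login program.
inittab_getty_entries() {
    awk -F: '
        /^[ \t]*#/ || NF < 4 { next }
        $4 ~ /mingetty/ {
            note = ($3 == "respawn") ? "" : "  (not respawn)"
            print $1, $2, $3 note
        }
    ' "${1:-/etc/inittab}"
}
```

For example, run inittab_getty_entries on the faulty system and on a working one, and compare the output.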

Task III: Root Cause

1. Incorrect /etc/inittab configuration

2. Does an Automatic Repair fix this scenario? No

Init is the parent process of all running processes, including the login programs. An invalid configuration caused init to stop respawning the login process.

(End of Exercise)


3.8 Troubleshooting Exercise: Waiting for Device

I cannot boot my system; there seems to be an issue with the file system.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 8

3. Press enter to continue

4. Some errors observed

1. Waiting for device /dev/sda2 to appear: ok

2. fsck: Error 2 while executing fsck.swap for /dev/sda2

3. fsck failed. Mounting root device read-only.

4. could not mount root filesystem – exiting to /bin/sh

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table, and follow its “Troubleshooting/Potential Fixes” entry.

2. Since the boot loader works but init does not run, the problem is narrowed to the kernel or RAM disk.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. Carefully observe the error messages shown on screen when the boot process failed.

1. Check /dev/sda2 to see what it is and why it's not showing up.

2. Consider mounting it manually to see whether it mounts read/write instead of read-only.

4. Load BIS, and investigate /dev and /dev/sda2


5. Use cat /proc/partitions to see whether sda2 is valid, and fdisk -l to see what kind of partition sda2 is.

6. sda2 is a swap partition, yet the boot process was “Waiting for device /dev/sda2 to appear.” This means it was trying to mount sda2 as the root file system instead of just enabling it with swapon.

7. Does /etc/fstab define the swap partition correctly? Yes.

8. Type mount and observe which device is the root device. (/dev/sda6)

9. GRUB tells the kernel where the root device is with the root= parameter. GRUB's configuration file is /boot/grub/menu.lst.

10. Edit /boot/grub/menu.lst, and confirm that root= is set properly.

11. Change the kernel parameter root= so that it points to the correct root partition instead of the swap partition (for example, root=/dev/sda6).

12. Save, exit and reboot
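Steps 6 through 11 come down to one question: does the root= value in menu.lst name a device that /etc/fstab uses as swap? The minimal sketch below answers it. The function name is hypothetical. It compares device names as plain text, so both files must use the same naming (for example, both /dev/sdXN or both /dev/disk/by-id/...).

```shell
#!/bin/sh
# check_root_param [MENU_LST] [FSTAB]   (hypothetical helper)
# Print the root= value of every "kernel" line in a GRUB legacy menu.lst
# and flag any value that FSTAB lists with filesystem type "swap".
check_root_param() {
    awk '
        # First file (fstab): remember devices whose 3rd field is swap
        NR == FNR { if ($1 !~ /^#/ && $3 == "swap") swap[$1] = 1; next }
        # Second file (menu.lst): inspect the arguments of kernel lines
        $1 == "kernel" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^root=/) {
                    dev = substr($i, 6)
                    flag = (dev in swap) ? "  <- listed as swap in fstab" : ""
                    print dev flag
                }
            }
        }
    ' "${2:-/etc/fstab}" "${1:-/boot/grub/menu.lst}"
}
```

For example, in BIS run check_root_param /boot/grub/menu.lst /etc/fstab before and after editing menu.lst.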

Task III: Root Cause

1. The swap partition was used instead of the root partition in the /boot/grub/menu.lst configuration

2. Does an Automatic Repair fix this scenario? Yes

GRUB passes a root= option on the kernel command line to tell the kernel the location of the root partition. The location must be correct for the system to boot properly.

(End of Exercise)


3.9 Troubleshooting Exercise: GRUB Prompt

When I boot, it stops at the grub> prompt.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 9

3. Press enter to continue

4. Some errors observed

GRUB prompt instead of the GRUB menu

GNU GRUB  version 0.94  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB lists possible command completions.  Anywhere else TAB lists the possible completions of a device/filename. ]

grub>

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table, and follow its “Troubleshooting/Potential Fixes” entry.

2. Since you don't see the GRUB menu screen but do get a GRUB prompt, you have reached stage2.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login


3. Since you are at the GRUB prompt, can you boot manually without any options? Yes. This means it's probably a configuration issue.

4. Try booting manually from the grub> prompt using the following method. Do not use BIS or CIS.

grub> find /boot/vmlinuz

 (hd0,1)

grub> root (hd0,1)

 Filesystem type is reiserfs, partition type 0x83

grub> kernel /boot/vmlinuz

   [Linux-bzImage, setup=0x1400, size=0x176b27]

grub> initrd /boot/initrd

   [Linux-initrd @ 0x2a6000, 0x149abd bytes]

grub> boot

5. What configuration file does GRUB use to display its default menu? /boot/grub/menu.lst.

6. In this case, the customer renamed the menu.lst file to menu.lst.old. You could restore this file, but for the purpose of this lab, assume you do not have a menu.lst file and recreate it.

7. Recreate menu.lst with yast bootloader: select Other, Propose New Configuration, then OK.
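A recreated menu.lst needs at least one boot stanza that points at the kernel and initrd you loaded by hand in step 4. The sketch below prints a minimal GRUB legacy stanza and is only meant to show what such an entry contains. The function name write_menu_lst, the title text, the default and timeout values, and the root=/dev/sda6 default (the root device used elsewhere in this lab) are all illustrative assumptions. yast bootloader remains the supported way to regenerate the file.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal GRUB legacy menu.lst stanza.
# $1 = GRUB root device, e.g. (hd0,1) as reported by "find /boot/vmlinuz"
# $2 = Linux root device passed to the kernel (assumed default: /dev/sda6)
write_menu_lst() {
    grub_root=$1
    linux_root=${2:-/dev/sda6}
    # "default", "timeout" and the title text are example values only
    cat <<EOF
default 0
timeout 8

title SUSE Linux Enterprise Server (recreated)
    root $grub_root
    kernel /boot/vmlinuz root=$linux_root
    initrd /boot/initrd
EOF
}

# Usage: review the output first, then redirect it into place, e.g.
#   write_menu_lst '(hd0,1)' /dev/sda6 > /boot/grub/menu.lst
```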

Task III: Root Cause

1. Missing /boot/grub/menu.lst file.

2. Does an Automatic Repair fix this scenario? Yes

The grub> prompt means the second stage GRUB boot loader is working just fine, but it cannot find the default menu configuration file. The menu configuration file is /boot/grub/menu.lst. In this case the file itself was missing.

(End of Exercise)


3.10 Troubleshooting Exercise: Failed Run Level Services

When I boot the server, I see a bunch of messages on the screen, with a lot of failed runlevel services.

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 10

3. Press enter to continue

4. Some errors observed

1. /etc/init.d/boot.d/S12boot.compliance: line 57: clear: command not found

2. Press any key to proceed with booting

3. /etc/init.d/boot.d/S13boot.klog: line 41: /var/log/boot.msg: No such file or directory

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Press Shift-PgUp to scroll back and observe other errors. Look for errors relating to mount, since it returned a non-zero exit status. The boot scripts need the mount command to mount the file systems, so this is a big red flag.

3. Some errors of interest:

1. mount: invalid option -- 'o'

2. Try 'mount --help' for more information


3. INIT: Entering runlevel: 3 (Means init has started rc)

4. The first error we see before the problem occurs is:

1. System Boot Control: Running /etc/init.d/boot

2. Mounting procfs at /proc
mount: invalid option -- n

3. Try 'mount --help' for more information

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

5. On a working system, type which mount. The executable is located at /bin/mount.

6. Reboot and enter S in the boot options to bypass all runlevels.

7. What happens when you type 'mount' or 'mount --help'?

8. Boot DVD1 and select Rescue System.

9. Mount the root filesystem to /mnt using mount /dev/sda6 /mnt

10. Copy the rescue mode's /bin/mount into your failing system's /bin directory using cp /bin/mount /mnt/bin

11. Reboot the system and reinstall the RPM that owns /bin/mount so it will pass RPM validation. Run yast -i util-linux
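In this scenario, /bin/mount had been overwritten with a copy of date, so the two files are byte-for-byte identical. One quick way to spot that kind of damage from the Rescue System is to checksum the regular files in the mounted root's /bin and report any that share the same content. The sketch below assumes the failing root is mounted at /mnt, as in step 9. The function name find_identical_files is hypothetical. Symlinks are skipped, but legitimate hard links or intentional copies would also be reported, so treat each hit as a lead to confirm with rpm -V, not as proof.

```shell
#!/bin/sh
# Hypothetical helper: list groups of regular files in a directory whose
# contents are identical (same MD5 checksum). Symlinks are skipped; hard
# links and deliberate copies still show up, so check each hit manually.
# Assumes file names without spaces.
find_identical_files() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] && [ ! -L "$f" ] && md5sum "$f"
    done | awk '
        { files[$1] = files[$1] " " $2; count[$1]++ }
        END { for (sum in count) if (count[sum] > 1) print sum ":" files[sum] }
    '
}

# Example, from the Rescue System with the failing root mounted on /mnt:
#   find_identical_files /mnt/bin
```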

Task III: Root Cause

1. Corrupted /bin/mount command

2. Does an Automatic Repair fix this scenario? No

The date command had been accidentally copied over the mount command, causing the boot failure.

(End of Exercise)


3.11 Troubleshooting Exercise: Read-Only Root Filesystem

Some services fail to load at boot, and the root filesystem is read-only. Resolve the errors and boot normally.

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 11

3. Press enter to continue

4. Some errors observed

1. mktemp: failed to create file via template `/tmp/keymap.XXXXXX` : Read-only filesystem

2. Failed services in runlevel 3: random kbd

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Identify how far the boot process got.

2. You get to a login screen, but there are errors.

3. If you have not changed screens, you can use Shift-PgUp to see more boot messages.

4. Try to find out where the errors first started.

5. The first error is after /etc/init.d/before.local.

6. Log in and see what is in /etc/init.d/before.local.

7. Change to runlevel 1 and enter root's password.

8. Try moving /etc/init.d/before.local to /root


9. Remount root as read/write

1. mount -o rw,remount /

10. Run mv /etc/init.d/before.local /root

11. Reboot to test.
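Before and after step 9, it can help to confirm how / is actually mounted. The kernel lists each mount in /proc/mounts as device, mount point, type, and options. The sketch below reads that format and prints ro or rw for a given mount point. The function name mount_mode and its optional second argument (an alternate file in /proc/mounts format, useful for testing) are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "ro" or "rw" for a mount point, based on the
# options field (4th column) of /proc/mounts or a file in the same format.
# If the mount point appears more than once, the last entry wins, since
# later entries are mounted on top of earlier ones. Returns 1 if not found.
mount_mode() {
    mnt=$1
    table=${2:-/proc/mounts}
    awk -v m="$mnt" '
        $2 == m {
            n = split($4, opts, ","); mode = "rw"
            for (i = 1; i <= n; i++) if (opts[i] == "ro") mode = "ro"
            found = 1
        }
        END { if (found) print mode; else exit 1 }
    ' "$table"
}

# Example: if this prints "ro", remount with: mount -o rw,remount /
#   mount_mode /
```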

Task III: Root Cause

1. The umount command was in /etc/init.d/before.local.

2. Does an Automatic Repair fix this scenario? No

Sometimes problems happen due to logical errors on the part of the administrator. The umount command in the /etc/init.d/before.local file had unmounted several file systems and caused some to become read-only.

(End of Exercise)


3.12 Troubleshooting Exercise: Missing Action Field

I can boot my computer, but cannot log in. I get an error message about a missing action field.

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 12

3. Press enter to continue

4. Some errors observed

1. INIT: /etc/inittab[50]: missing action field

2. INIT: no more processes left in this runlevel

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Do you get a login prompt? No.

3. How far did INIT get before failing?

1. You can tell INIT ran rc from the message “Master Resource Control: runlevel 3 has been reached”.

2. This indicates that the runlevel completed (otherwise rc would report skipped or failed services), so the problem is in running the login program.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

4. init identifies the file and line number where it thinks the problem is. In this example, you would check line 50 in the /etc/inittab file. Check your own error messages for your specific line number.

5. Use 'man inittab' to understand which field is the “action” field and what should go in it, or compare the file with a working system.

6. Load CIS, edit /etc/inittab, and fill in the correct value for the missing action field(s).

7. Change the /etc/inittab entries from this:

1:2345::/sbin/mingetty --noclear tty1
2:2345::/sbin/mingetty tty2
3:2345::/sbin/mingetty tty3
4:2345::/sbin/mingetty tty4
5:2345::/sbin/mingetty tty5
6:2345::/sbin/mingetty tty6

to this:

1:2345:respawn:/sbin/mingetty --noclear tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

8. Reboot
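The edit in step 7 can also be scripted. The sketch below uses sed to insert respawn into any mingetty entry whose action field is empty. It writes the result to standard output so you can review it before replacing the file. It assumes the entries look like the ones shown in step 7. The function name add_respawn is hypothetical. Empty action fields on other entries still need to be fixed by hand according to inittab(5).

```shell
#!/bin/sh
# Hypothetical helper: fill an empty action field with "respawn" on
# mingetty lines of an inittab-style file (format id:runlevels:action:process).
# Comment lines (starting with #) are left untouched.
# Reads the file given as $1 and writes the corrected text to stdout.
add_respawn() {
    sed 's|^\([^:#]*:[^:]*\)::\(/sbin/mingetty\)|\1:respawn:\2|' "$1"
}

# Example (from CIS, after reviewing the output):
#   add_respawn /etc/inittab > /tmp/inittab.new && cp /tmp/inittab.new /etc/inittab
```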

Task III: Root Cause

1. Missing action field in /etc/inittab configuration

2. Does an Automatic Repair fix this scenario? No

Each /etc/inittab entry requires the format id:runlevels:action:process. The action field was missing and needed a valid action as defined in inittab(5).
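The id:runlevels:action:process rule can be checked mechanically. The sketch below reports every non-comment, non-blank inittab line whose third (action) field is empty, together with its line number, in a form similar to init's /etc/inittab[50] message. The function name check_inittab_actions is hypothetical. It only checks that the field is present, not that the value is one of the actions listed in inittab(5).

```shell
#!/bin/sh
# Hypothetical helper: report inittab entries with a missing action field.
# Prints "FILE[LINE]: missing action field" for each offending line and
# returns non-zero if any were found.
check_inittab_actions() {
    awk -F: '
        /^[ \t]*#/ || /^[ \t]*$/ { next }    # skip comments and blank lines
        $3 == "" {
            printf "%s[%d]: missing action field\n", FILENAME, FNR
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example:
#   check_inittab_actions /etc/inittab
```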

(End of Exercise)


3.13 Troubleshooting Exercise: GRUB

When I boot, all I see on the screen is GRUB, and the server hangs.

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 13

3. Press enter to continue

4. Some errors observed

1. GRUB GRUB Hard Disk Error

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Since we saw the BIOS information on screen and then “GRUB”, we know that the BIOS found and started executing the first stage boot loader from the MBR. However, we could not progress from stage1 to stage2.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. Since BIS worked and the problem seems to be with the boot loader, try reinstalling the GRUB boot loader (i.e., grub-install) and reboot to test.

4. Reinstalling the boot loader fails, so verify the RPM that owns the stage1 and stage2 files (i.e., rpm -Vf /boot/grub/stage{1,2}).

5. Since the /boot/grub/stage{1,2} files are installed by grub-install and are not owned by any RPM package, we need to determine how they got into the /boot/grub directory in the first place.


6. The most likely package to have put the files in /boot/grub is the grub RPM itself. Since it does not own the files, check its pre- and post-install scripts to see whether they touch the files during installation.

1. Run rpm -q --scripts grub | grep stage

7. The grub RPM's scripts do copy the stage{1,2} files from a different location, which explains why the package does not directly own /boot/grub/stage{1,2}.

8. You could try copying the stage files to /boot/grub as the RPM did, to see if that helps: cp /usr/lib/grub/*stage1* /boot/grub/

9. If that fails, the easiest thing to try while in BIS is reinstalling the grub RPM. Run yast -i grub

10. Run grub-install

11. Reboot and retest
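rpm -V cannot see /boot/grub/stage{1,2}. One way to confirm the root cause is therefore to compare them against the copies that the grub package's script copies from. The sketch below compares each *stage* file in a reference directory with the file of the same name in /boot/grub, using cmp. It reports any file that differs or is missing. It assumes, as step 8 does, that the package copies live in /usr/lib/grub. The function name compare_stage_files is hypothetical. GRUB legacy's install can record settings inside stage2 or a stage1_5 file, so a difference there may be legitimate. A mismatch in stage1 is the stronger signal.

```shell
#!/bin/sh
# Hypothetical helper: compare the GRUB stage files in $2 (e.g. /boot/grub)
# against the reference copies in $1 (e.g. /usr/lib/grub).
# Prints one line per stage file that differs or is missing and returns
# non-zero if any problem was found.
compare_stage_files() {
    src=$1
    dst=$2
    status=0
    for ref in "$src"/*stage*; do
        [ -f "$ref" ] || continue          # glob matched nothing
        name=$(basename "$ref")
        if [ ! -f "$dst/$name" ]; then
            echo "MISSING: $dst/$name"
            status=1
        elif ! cmp -s "$ref" "$dst/$name"; then
            echo "DIFFERS: $dst/$name"
            status=1
        fi
    done
    return $status
}

# Example (from BIS):
#   compare_stage_files /usr/lib/grub /boot/grub
```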

Task III: Root Cause

1. Corrupted /boot/grub/stage1 file.

2. Does an Automatic Repair fix this scenario? No

The GRUB files in /boot/grub are not directly owned by the grub rpm package. As a result, an rpm verify did not identify the problem. Based on the location in the boot process, a grub package reinstall makes sense.

(End of Exercise)


3.14 Troubleshooting Exercise: Invalid Partition Table

I get an invalid partition table error when booting.

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 14

3. Press enter to continue

4. Some errors observed

1. Error No active partition

2. Operating System not found

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Did you see the boot loader menu and successfully pick a kernel to boot? No. BIS may still help, but the problem might be with the boot loader or the partition table itself.

3. Since we saw the BIOS information on screen and then the partition table error, it is probable that the BIOS cannot find the partition table in the MBR at all.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

4. Try BIS or Repair Installed System.

5. Repair Installed System is grayed out because, without a partition table, the installer does not know about any installed system.

6. At this point you need to do one of three things: restore the partition table from backup (sfdisk), attempt to recover the partition table from the disk (gpart), or reformat the disk and restore data from backup.

7. To restore or recover the partition table, boot the Rescue System and log in as root (Rescue login: root).

1. Run cat /proc/partitions to confirm that the disk is visible but has no partitions.

2. Restore the partition table backup file. -OR-

1. Copy the partition table backup file to the rescue system

2. Use sfdisk to restore the partition table

3. Recover the partition table from disk -OR-

1. Use gpart to recover the partition table from disk.

2. If you have a partition backup stored on the failed server, you might want to restore it once gpart has recovered enough to boot the server.

3. Try gpart -W /dev/sda /dev/sda to attempt to recover the partition table.

4. NOTE: gpart does not always work well with extended partitions.

4. Repeat the lab, but copy /boot/backup_mbr and a supportconfig to another server before starting the lab. Restore the MBR using both methods.

1. For the backup_mbr

1. dd if=backup_mbr of=/dev/sda

2. partprobe

3. fdisk /dev/sda, a, 1 to mark sda1 bootable, then w to write the change

2. For supportconfig

1. Copy the sfdisk -d section in fs-diskio.txt to its own file called partitions.

2. Run sfdisk /dev/sda < partitions

3. partprobe

8. Boot the Rescue System, log in as root (Rescue login: root), and run dhcpcd eth0 to get a network connection.

9. Recovering partition table from disk --OR--

1. Boot from DVD and select Repair Installed System. (NOTE: Selecting “Repair Installed System” after selecting installation will fail due to the missing partition table.)

2. Select Expert Tools, Recover Lost Partitions, Start

3. If this works then restore the /boot/backup_mbr and reboot

4. If it fails, try manual recovery


10. Restoring partition table from supportconfig backup --OR--

1. Create a partition backup file from supportconfig. (For this to work, you must have a supportconfig tarball that was copied off the server.)

1. The partition table backup is stored at the bottom of the fs-diskio.txt file and looks like this:

#==[ Command ]======================================#
# /sbin/sfdisk -d
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start= 63, size= 417627, Id=83, bootable
/dev/sda2 : start= 417690, size= 626535, Id=82
/dev/sda3 : start= 1044225, size= 3148740, Id= f
/dev/sda4 : start= 0, size= 0, Id= 0
/dev/sda5 : start= 1044288, size= 931707, Id=83
/dev/sda6 : start= 1976058, size= 1140552, Id=83
/dev/sda7 : start= 3116673, size= 1076292, Id=83

2. Copy the uncommented lines for the relevant device. supportconfig backs up the partition tables of all disk devices; this example shows only one.

3. Create the partition backup file (i.e., part.txt) so that it looks like this:

unit: sectors

/dev/sda1 : start= 63, size= 417627, Id=83, bootable
/dev/sda2 : start= 417690, size= 626535, Id=82
/dev/sda3 : start= 1044225, size= 3148740, Id= f
/dev/sda4 : start= 0, size= 0, Id= 0
/dev/sda5 : start= 1044288, size= 931707, Id=83
/dev/sda6 : start= 1976058, size= 1140552, Id=83
/dev/sda7 : start= 3116673, size= 1076292, Id=83

11.

1. Copy part.txt from your backup server to the rescue system's /tmp directory.

scp <backup_ip>:/directory/part.txt /tmp

2. Restore the partition table with part.txt

sfdisk /dev/sda < /tmp/part.txt

3. Reboot

12. Manually Recovering --OR--

1. Boot Rescue System


2. Since we are familiar with the file system layout, we know sda1 was /boot, but we have forgotten how big it was.

3. Run fdisk /dev/sda, then enter n (New), p (Primary), 1, accept the default beginning sector number, and enter +75M (a guess at the partition size).

4. p lists the partitions; w writes the partition table.

5. cat /proc/partitions now shows only one partition. If we got the beginning sector number right, we should be able to mount the file system, even though the device is smaller than the file system:

mount /dev/sda1 /mnt

6. Restore the backup_mbr and reread the partition table. Use cat /proc/partitions to confirm.

dd if=/mnt/backup_mbr of=/dev/sda

partprobe

7. Use fdisk /dev/sda, then a (toggle bootable flag), 1 (partition 1), w (write changes to disk).

8. reboot
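Step 10 extracts the partition table from a supportconfig by hand. The awk sketch below does the same extraction. It reads fs-diskio.txt and finds the "# partition table of DEVICE" line. It then prints the "unit: sectors" line and the partition lines that follow, which is the format sfdisk reads on standard input. It assumes the layout shown in step 10 (exact header text, no trailing spaces). The function name extract_sfdisk_dump is hypothetical. Compare its output with the original file before feeding it to sfdisk.

```shell
#!/bin/sh
# Hypothetical helper: print the sfdisk -d dump for one device from a
# supportconfig fs-diskio.txt file, in a form sfdisk can read on stdin.
# $1 = path to fs-diskio.txt, $2 = device (e.g. /dev/sda)
extract_sfdisk_dump() {
    awk -v dev="$2" '
        $0 == "# partition table of " dev { inside = 1; next }
        inside && /^#/ { exit }                 # next device or next section
        inside && /^unit:/ { print; print ""; next }
        inside && index($0, "/dev/") == 1 { print }
    ' "$1"
}

# Example, on the backup server:
#   extract_sfdisk_dump fs-diskio.txt /dev/sda > part.txt
# then, on the rescue system: sfdisk /dev/sda < /tmp/part.txt
```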

Task III: Root Cause

1. Missing partition table

2. Does an Automatic Repair fix this scenario? No

When the partition table gets damaged, it does not necessarily mean the filesystems are damaged. Restoring the partition table may allow you to recover the filesystems and boot the server properly.
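Before writing /boot/backup_mbr back with dd, it is worth confirming that the file really contains a partition table. A classic MBR is 512 bytes long. It holds boot code, then four 16-byte partition entries starting at offset 446, then the signature bytes 0x55 0xAA at offsets 510-511. Each entry has the boot flag in byte 0, the partition type in byte 4, the start sector in bytes 8-11 and the size in bytes 12-15. The sketch below checks the signature and prints each used entry with od. The function name show_mbr is hypothetical. It assumes a little-endian host (as on x86) when decoding the 32-bit fields, and it does not follow extended partitions.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check an MBR image (such as /boot/backup_mbr)
# and list its used primary partition entries.
# Assumes a little-endian host when od decodes the 32-bit sector fields.
show_mbr() {
    img=$1
    sig=$(od -An -tx1 -j510 -N2 "$img" | tr -d ' \n')
    if [ "$sig" != "55aa" ]; then
        echo "no MBR signature (found '$sig')" >&2
        return 1
    fi
    for i in 1 2 3 4; do
        off=$((446 + (i - 1) * 16))
        flag=$(od -An -tx1 -j"$off" -N1 "$img" | tr -d ' \n')
        ptype=$(od -An -tx1 -j$((off + 4)) -N1 "$img" | tr -d ' \n')
        start=$(od -An -tu4 -j$((off + 8)) -N4 "$img" | tr -d ' \n')
        size=$(od -An -tu4 -j$((off + 12)) -N4 "$img" | tr -d ' \n')
        [ "$ptype" = "00" ] && continue      # unused entry
        boot=""
        [ "$flag" = "80" ] && boot=" bootable"
        echo "entry $i: start=$start size=$size Id=$ptype$boot"
    done
    return 0
}

# Example: show_mbr /mnt/backup_mbr   (before dd if=/mnt/backup_mbr of=/dev/sda)
```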

(End of Exercise)


3.15 Troubleshooting Exercise: Kernel Panic

I get a kernel panic when I start my Linux host. It doesn't matter which kernel I boot from, I still get the error.

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 15

3. Press enter to continue

4. Some errors observed

1. VFS: Cannot open root device “sda6” or unknown-block(0,0)

2. Please append a correct “root=” boot option

3. Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table.

2. Write down the information on the screen verbatim at the point of boot failure.

3. The on-screen error messages give us three clues to check: root=, sda6, and mounting root.

4. Since we did not see “done” during boot, this issue is a good candidate for Boot Installed System (BIS).

5. Since BIS worked, the problem most likely lies in the part of the boot chain that BIS bypasses (MBR/stage1 through kernel/initrd):

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

6. Since we saw the boot loader menu during a normal boot, we can assume the boot loader is fine as well. This further narrows the problem to the kernel or RAM disk. It correlates with our clues: root= and sda6 relate to the kernel, and the RAM disk relates to the initrd.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

7. While in BIS, investigate the clues: root= (/boot/grub/menu.lst), sda6 (mount), and RAM disk (/boot/initrd*).

1. root= in /boot/grub/menu.lst is set to root=/dev/sda6, and mount shows that BIS chose /dev/sda6 as the root file system. We can therefore eliminate those two clues.

2. This leaves the RAM disk. The root file system is first mounted from within the RAM disk environment, and then remounted once the system is up.

8. You can dump the contents of the RAM disk as follows:

1. mkdir -p /tmp/ramdisk

2. cd /tmp/ramdisk

3. zcat /boot/initrd | cpio -ivd

4. Notice that you get errors instead of the contents of the initrd RAM disk.

9. Check that the INITRD_MODULES= variable in /etc/sysconfig/kernel lists all the drivers needed to reach the root file system. If you are not sure, recreate the RAM disk anyway.

10. Recreate the RAM disk and try a reboot to retest.

1. Boot Installed System

2. Run mkinitrd

3. Reboot
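The cpio errors in step 8 are the tell-tale sign, and the check can be made explicit. The kernel's initramfs unpacker accepts cpio archives in the "newc" format (magic 070701) or its CRC variant (070702). The sketch below first checks that the gzip stream decompresses cleanly. It then checks that the decompressed data starts with one of those magic strings. The function name check_initrd is hypothetical. That a given SUSE initrd is exactly a gzip-compressed newc archive is an assumption worth confirming against a working system. An initrd packed another way would be reported as bad even if it boots.

```shell
#!/bin/sh
# Hypothetical helper: check that an initrd is a gzip-compressed cpio
# archive in a format the kernel's initramfs unpacker accepts
# ("newc" magic 070701, or 070702 for the CRC variant).
# Reads via stdin redirection so a symlink such as /boot/initrd is followed.
check_initrd() {
    img=$1
    if ! gzip -dc < "$img" > /dev/null 2>&1; then
        echo "$img: gzip data is corrupt or not gzip" >&2
        return 1
    fi
    magic=$(gzip -dc < "$img" 2>/dev/null | head -c 6)
    case $magic in
        070701|070702) echo "$img: looks OK (cpio magic $magic)" ;;
        *) echo "$img: not a newc cpio archive" >&2; return 1 ;;
    esac
}

# Example (from BIS): check_initrd /boot/initrd
```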

Task III: Root Cause

1. Corrupt initrd RAM disk

2. Does an Automatic Repair fix this scenario? Yes

The kernel panic was caused by a corrupted RAM disk (/boot/initrd-*). The BIS troubleshooting technique lets you use the DVD's kernel and RAM disk to boot the server and troubleshoot the issue. If BIS works, the most common problem is a RAM disk issue, which can usually be resolved by running mkinitrd. The next most common problem is GRUB, which reinstalling the GRUB boot loader usually resolves.
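Step 9 asks you to check INITRD_MODULES in /etc/sysconfig/kernel before running mkinitrd. /etc/sysconfig files are shell-style KEY="value" assignments, so the module list can be read without sourcing the whole file. The function name initrd_modules is hypothetical. The sketch assumes the value sits on a single line in double quotes, which is the usual layout of these files.

```shell
#!/bin/sh
# Hypothetical helper: print the modules listed in INITRD_MODULES, one per
# line, from a sysconfig-style file (default /etc/sysconfig/kernel).
# Assumes a single-line assignment of the form INITRD_MODULES="mod1 mod2".
initrd_modules() {
    file=${1:-/etc/sysconfig/kernel}
    sed -n 's/^INITRD_MODULES="\(.*\)"[[:space:]]*$/\1/p' "$file" |
        tr -s ' \t' '\n\n' |
        sed '/^$/d'
}

# Example: initrd_modules /etc/sysconfig/kernel
```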

(End of Exercise)


3.16 Troubleshooting Exercise: Error in Service Module

The server hard crashed due to a power outage. Logging in as root fails with “Error in service module.”

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 16

3. Press enter to continue

4. Some errors observed

1. INIT: version 2.86 booting

2. INIT: cannot execute “/bin/sh”

3. INIT: Entering runlevel: 3

4. INIT: Id “3” respawning too fast: disabled for 5 minutes

5. INIT: no more processes left in this runlevel

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Notice the “INIT: Id '3' respawning too fast” and “no more processes left in this runlevel” messages. They indicate that /sbin/init attempted to execute the runlevel but could not, so BIS is not an option.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. Try chroot Installed System (CIS)

4. CIS failed with the error message: “chroot: failed to run command '/bin/bash': No such file or directory.” The chroot command runs a shell (here /bin/bash) from inside the new root directory. If that shell cannot be run, this is a red flag that needs to be resolved.

5. Change to /mnt/bin and look for bash. It's missing.

6. Try using the rescue system's bash to boot the server: cp /bin/bash /mnt/bin/

1. If this works, reboot and then reinstall the bash RPM to restore the correct /bin/bash on the system. You might also want to tell the customer to verify their other RPMs.

2. If it fails, you will have to manually copy the /bin/bash file from another server.
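Steps 4 and 5 amount to checking that the shells and tools that init and chroot depend on exist in the installed system. The sketch below takes the mount point of the failing root (/mnt in the Rescue System). It reports which files from a short list are missing or not executable. The list is an assumption drawn from the binaries that appear in these exercises (bash, sh, init, mount), not an official inventory. The function name check_essentials is hypothetical. test -x follows symlinks, and an absolute symlink inside /mnt would be resolved against the rescue system's root rather than /mnt.

```shell
#!/bin/sh
# Hypothetical helper: report essential executables that are missing (or not
# executable) under a mounted root file system such as /mnt.
# The file list below is an illustrative assumption, not a complete set.
check_essentials() {
    root=${1:-/mnt}
    status=0
    for f in /bin/bash /bin/sh /sbin/init /bin/mount; do
        if [ ! -x "$root$f" ]; then
            echo "missing or not executable: $root$f"
            status=1
        fi
    done
    return $status
}

# Example, from the Rescue System after mounting the root partition on /mnt:
#   check_essentials /mnt
```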

Task III: Root Cause

1. Missing /bin/bash

2. Does an Automatic Repair fix this scenario? No

Bash is the default shell and is used to start all the services on the server. Without it, nothing works correctly.

(End of Exercise)


3.17 Troubleshooting Exercise: Fatal modules.dep Error

I cannot boot my computer. I get an error message about the / (root) partition waiting to appear... not found exiting to /bin/sh. I also see a fatal modules.dep error.

Objectives:

Task I: Configuration
Task II: Troubleshooting Procedure
Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 17

3. Press enter to continue

4. Some errors observed

1. FATAL: Could not load /lib/modules/3.0*/modules.dep: No such file or directory

2. Waiting for /dev/sda6 to appear: ...Could not find /dev/sda6

3. Want me to fall back to /dev/sda6? (Y/n)

4. Waiting for /dev/sda6 to appear: ...not found -- exiting to /bin/sh

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure, otherwise follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Since the boot loader works but init does not run, the problem is narrowed to the kernel or RAM disk.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. Load BIS

4. Verify the kernel RPMs. They pass.

5. The easiest thing to try now is making a new RAM disk.


1. Check /etc/sysconfig/kernel and make sure INITRD_MODULES is correct.

2. Run mkinitrd

6. mkinitrd does not report any errors, but its output seems shorter than you would expect. Run mkinitrd on your test system and compare the output.

7. The mkinitrd output is missing the “Kernel Modules” list of modules included in the RAM disk.

8. This is the first solid lead to test. Run rpm -V mkinitrd to validate the RPM.

9. It passes, but the kernel modules still need to be included in the RAM disk in order to boot the server. Try reinstalling the mkinitrd RPM (yast -i mkinitrd) even though it passes RPM validation.

10. You could also run /sbin/mkinitrd_setup, which is called by the RPM install script. It would also fix the broken links and make mkinitrd work.
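Step 10 mentions broken links. Dangling symlinks can be listed with standard find. The sketch below prints every symlink under a directory whose target does not exist. This exercise does not say which directory holds the mkinitrd links, so the path in the usage line is a placeholder to replace with the directory on your system. The function name find_broken_links is hypothetical. Links that were deleted outright will not appear, so comparing the directory listing with your test system (as in step 6) is still worthwhile.

```shell
#!/bin/sh
# Hypothetical helper: list symlinks under a directory whose targets are
# missing. "test -e" follows the link, so it fails only for dangling links.
find_broken_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Example (replace the placeholder with the directory you are checking):
#   find_broken_links /path/to/mkinitrd/directory
```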

Task III: Root Cause

1. Missing symlink for mkinitrd

2. Does an Automatic Repair fix this scenario? No

A good troubleshooting technique is to try reinstalling the RPM package associated with files you know are having a problem, even if they pass RPM validation. The package's install scripts may do something that resolves the issue.

(End of Exercise)

3.18 Troubleshooting Exercise: Another Kernel Panic

I get a kernel panic right after attempting to mount root during the boot process.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 18

3. Press enter to continue

4. Some errors observed

1. Mounting root /dev/sda6

2. INIT: version 2.86 booting

3. Kernel panic - not syncing: Attempted to kill init!

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. You see “INIT: version 2.86 booting” just before the kernel panic. This means the kernel panicked just as init was executing, which suggests init may have an issue.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. Boot using the “boot options” parameter init=/bin/bash.

4. It fails. However, the “boot options” parameter init=/bin/sash (the standalone shell) works. Sash is a statically linked executable, so it does not depend on the system's shared libraries.

5. Because the boot process already reaches init, this issue is a better candidate for CIS than for BIS. Try CIS.

6. Check /bin/bash and /lib/libc.so.6 on the installed system.

7. Did you document any errors when attempting init=/bin/bash? Always document what you do and the outcome. CIS fails with the same error as init=/bin/bash:

“/bin/bash: relocation error: /bin/bash: symbol access, version GLIBC_2.4 not defined in file libc.so.6 with link time reference”

8. If you haven't already, manually mount all additional filesystems listed in /mnt/etc/fstab, and try chroot /mnt again.

9. CIS still fails. However, the error message suggests we may have a problem with bash or libc.so.6.

10. Try validating the RPM packages that own bash and libc.so.6.

11. Because the installed filesystems are mounted, you can also run rpm verify against them without chrooting.

12. Find out which RPM packages own /bin/bash and /lib/libc.so.6, then verify those packages.

1. rpm -qf -r /mnt /bin/bash

2. rpm -qf -r /mnt /lib/libc.so.6

3. rpm -Vr /mnt bash

4. rpm -Vr /mnt glibc

13. Notice that libc-2.11.3.so has been modified, but what does that have to do with libc.so.6?

1. ls -l /mnt/lib/libc*

2. libc.so.6 is a symbolic link to libc-2.11.3.so

14. Replace the damaged libc-2.11.3.so with a good copy. The easiest way to fix the problem is a down server update from the DVD. Try this method.

1. Boot from DVD, Select Installation, Update an Existing System

2. Update an Existing System does not work; you get the error “Switching to the installed system has failed”.

15. You will have to manually update the glibc package.

1. There are two glibc rpms, i586 and i686. Make sure you update the correct one!

2. rpm -qi -r /mnt glibc may show you which one you are using.

16. Boot into rescue mode, then mount the installation media.

17. Mount the installed system. This is the same as CIS, except that you skip the final chroot command.

18. Install the required glibc rpm from the installation media

19. Boot to rescue mode and mount all the filesystems as if you were going to chroot into the installed system.

mount /dev/sda6 /mnt

mount /dev/sda1 /mnt/boot

mount /dev/sda5 /mnt/var

mount /dev/sda7 /mnt/usr

20. Run uname -a to determine the glibc architecture to use (for example, i686).

21. Mount the installation media

mount -o ro /dev/cdrom /media/cdrom

22. Install the rpm

rpm -Uvh --force -r /mnt /media/cdrom/suse/i686/glibc-2.*rpm

23. Reboot
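Steps 8 and 19 mount the installed system's filesystems under /mnt by hand. The commands can also be derived from an fstab file. This is a minimal sketch, assuming a standard whitespace-separated fstab and that only real disk filesystems should be mounted; the helper name fstab_mount_cmds and its list of skipped filesystem types are assumptions, not part of the lab. It only prints the commands so you can review them before running anything.

```shell
# fstab_mount_cmds FSTAB [TARGET]
# Hypothetical helper: print (not run) a "mount DEVICE TARGET/MOUNTPOINT"
# command for each disk filesystem in FSTAB, parents before children.
# The awk step skips comments, short lines, swap and common virtual
# filesystems, and emits "mountpoint device" so lines sort by mount point.
# Byte-wise sorting (LC_ALL=C) puts "/" before "/boot" and "/usr" before
# "/usr/local".
fstab_mount_cmds() {
    fmc_fstab=$1
    fmc_target=${2:-/mnt}
    awk '
        /^[ \t]*#/ || NF < 3 { next }
        $3 ~ /^(swap|proc|sysfs|devpts|tmpfs|usbfs|debugfs)$/ { next }
        $2 !~ /^\// { next }
        { print $2, $1 }
    ' "$fmc_fstab" | LC_ALL=C sort | while read -r fmc_mp fmc_dev; do
        if [ "$fmc_mp" = "/" ]; then
            echo "mount $fmc_dev $fmc_target"
        else
            echo "mount $fmc_dev $fmc_target$fmc_mp"
        fi
    done
}
```

Usage: mount the root partition first (mount /dev/sda6 /mnt), then run fstab_mount_cmds /mnt/etc/fstab and review the output. The first line repeats the root mount you already performed.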

Task III: Root Cause

1. Damaged glibc library file.

2. Does an Automatic Repair fix this scenario? No

The glibc library is used by every dynamically linked executable. /sbin/init is the first program on the installed system to rely on glibc, so the damaged library caused the kernel panic.

(End of Exercise)

3.19 Troubleshooting Exercise: Segmentation Fault

It seems like the server is hung or something. Every command I type gives me a segmentation fault.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 19

3. Press enter to continue

4. Some errors observed

1. Segmentation Fault

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. The first objective is to reboot the server. Because every command seems to segfault, you have two options.

1. Reset or power off the server and reboot

2. Use the Magic SysRq keys through /proc/sysrq-trigger

1. echo s > /proc/sysrq-trigger # sync all filesystems

2. echo u > /proc/sysrq-trigger # remount filesystems read-only

3. echo b > /proc/sysrq-trigger # force a server reboot

2. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

3. We get a kernel panic at boot just after the “INIT: version 2.86 booting” message.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

4. Try chroot installed system (CIS).

5. CIS segfaults too. Manually mount all filesystems and run chroot /mnt again; it still fails. Because chroot starts the installed system's shell, we need to check /mnt/bin/bash.

6. After mounting all installed filesystems, run rpm verify on bash: rpm -Vr /mnt bash

7. The bash RPM shows no errors. Because bash is dynamically linked, it depends on shared libraries, which also need to be checked.

8. Run ldd /mnt/bin/bash to check for shared library dependencies.

9. Run an rpm verify on each shared library that bash depends on. First run rpm -qf -r /mnt /lib/libreadline.so.5 to find out which RPM owns it, and repeat this for each file listed in the ldd output for /mnt/bin/bash.

10. libreadline5 and libncurses5 seem fine, but verifying glibc reports that something is wrong with ld-2.11.3.so.

11. Reinstall the glibc rpm

12. Boot from DVD1, Select Installation, New Installation, Custom Partitioning

13. Select System View/linux/Import Mount Points..., DESELECT Format system volumes, Import

1. NOTE: If this fails, you will need to select the filesystem devices and mount points manually.

14. Edit the swap partition and configure it to be mounted.

15. Accept, “Really keep the partition unformatted?” Yes.

16. Select the software patterns you originally had on the system. In this case, Base and Minimal System, Accept.

17. Install

18. Go through the configuration phase, configuring the server as it was before the reinstall.
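Steps 8 and 9 feed every library from the ldd output into rpm -qf -r /mnt. The path extraction can be automated. This is a minimal sketch, assuming glibc-style ldd output lines such as "libc.so.6 => /lib/libc.so.6 (0x...)" and "/lib/ld-linux.so.2 (0x...)"; ldd_paths is a hypothetical helper name. Because ldd runs with the rescue system's loader, the paths may be resolved against the rescue system's libraries; the library names are what matter when you query the installed RPM database with -r /mnt.

```shell
# ldd_paths: read ldd output on stdin and print each resolved library path.
# Hypothetical helper; "=> not found" and vdso lines such as
# "linux-gate.so.1 =>  (0x...)" have no path and are skipped.
ldd_paths() {
    awk '
        $2 == "=>" && $3 ~ /^\// { print $3; next }  # name => /path (addr)
        $1 ~ /^\// { print $1 }                       # /path/to/loader (addr)
    '
}
```

Usage, after mounting the installed filesystems:

ldd /mnt/bin/bash | ldd_paths | while read -r lib; do rpm -qf -r /mnt "$lib"; done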

Task III: Root Cause

1. Damaged glibc shared library

2. Does an Automatic Repair fix this scenario? No

This is another exercise where a glibc library was damaged. A damaged glibc may call the integrity of the whole server into question. This exercise demonstrates a method of reinstalling the existing installed system, which can be referred to as the “Install the Installed System” (IIS) method. You basically reinstall the OS WITHOUT formatting the filesystems. This assumes the filesystems are intact.

(End of Exercise)

3.20 Troubleshooting Exercise: Respawning Too Fast

I cannot log in to the server. I keep getting errors that ID 1 is respawning too fast.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 20

3. Press enter to continue

4. Some errors observed

1. INIT: Id “1” respawning too fast: disabled for 5 minutes

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. You also see “Master Resource Control: runlevel 3 has been reached”. This means /sbin/init completed the boot and rc stages successfully; otherwise it would show “skipped” services. So the failure is at the login stage.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

3. After the reboot, record the error exactly as it appears on the screen. The error is: INIT: Id “1” respawning too fast: disabled for 5 minutes. The same error repeats for IDs 1 through 6.

4. The boot process is too far along for BIS to help. However, the runlevels appear to work. Try rebooting with the “boot options” parameter S, for single-user administration mode.

5. Since that worked, type init 1 to change to runlevel 1 and see whether it works too.

6. It works. Now you can troubleshoot from the administration runlevel 1.

7. The error message comes from /sbin/init and identifies the specific /etc/inittab ID that is failing.

8. Look in /etc/inittab at IDs 1 through 6 to see which application init is respawning too fast.

9. The application is /sbin/mingetty. Comparing this with your working test system shows that mingetty is the correct application to spawn. Remember that the first field is the ID and the last field is the application:

1:2345:respawn:/sbin/mingetty --noclear tty1

10. rpm verify the package that owns /sbin/mingetty.

11. Reinstall the mingetty RPM package, because the MD5 sum of /sbin/mingetty has changed since installation. Then type init 3 to change to runlevel 3 and confirm that you can log in.

1. Boot to runlevel 1

2. Mount the installation media

mount -o ro /dev/cdrom /mnt

3. Install the rpm

rpm -Uvh --force /mnt/suse/i586/mingetty-1*rpm

4. Run init 3
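Steps 8 and 9 read /etc/inittab to map the failing ID to the program init respawns. The lookup can be scripted. This is a minimal sketch, assuming the standard inittab layout id:runlevels:action:process; the helper name inittab_process is hypothetical.

```shell
# inittab_process ID [FILE]
# Hypothetical helper: print the process field (the 4th, last field) of
# the inittab entry whose id field (the 1st) equals ID.
inittab_process() {
    itp_id=$1
    itp_file=${2:-/etc/inittab}
    awk -F: -v id="$itp_id" '
        /^[ \t]*#/ { next }                   # skip comment lines
        $1 == id {
            # Rejoin fields 4..NF in case the command itself contains colons.
            proc = $4
            for (i = 5; i <= NF; i++) proc = proc ":" $i
            print proc
        }
    ' "$itp_file"
}
```

For example, inittab_process 1 prints the command init respawns for ID 1, and rpm -Vf $(inittab_process 1 | awk '{print $1}') verifies the package that owns that program.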

Task III: Root Cause

1. Corrupted /sbin/mingetty login executable.

2. Does an Automatic Repair fix this scenario? No

The mingetty binary displays the login prompt on a virtual console and hands the user name to /bin/login, which validates the credentials through the PAM stack. /sbin/init is responsible for running mingetty. Because mingetty kept failing, init kept restarting it; after too many respawn attempts in a short period, /sbin/init disabled the entry and stopped trying.

(End of Exercise)

3.21 Troubleshooting Exercise: Booting to $ Prompt

The server will only boot to the $ prompt.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 21

3. Press enter to continue

4. Some errors observed

1. could not mount root filesystem -- exiting to /bin/sh

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Copy the screen verbatim at the time of the failure, and look for additional messages or errors.

3. Some interesting messages from bottom to top are:

1. umount: /dev: device is busy

2. mount: unknown filesystem type 'reiserfs'

3. modprobe: FATAL: Error inserting reiserfs (/lib/modules/2.6.16.21-0.8-default/kernel/fs/reiserfs/reiserfs.ko): Unknown symbol in module, or unknown parameter (see dmesg)

4. reiserfs: Unknown symbol vfs_check_on

5. reiserfs: Unknown symbol vfs_check_on_mount

4. The predominant theme is that something is wrong with the reiserfs file system driver. The unknown symbols may mean an outdated modules.dep, or something may be wrong with the file system driver itself. In any case, the message suggests checking the dmesg output for more information. So we should look into these three areas for clues.

5. Did you see “done” scroll across the screen? No. Boot Installed System (BIS) should work.

6. Did you see the boot loader menu and successfully pick a kernel to boot? Yes. The boot loader is probably fine, and the problem seems to be kernel/initrd related.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

7. Confirm that the reiserfs driver is needed. Type mount. Notice two things in the output: 1) reiserfs is used for one or more devices, and 2) the root “/” partition is reiserfs. This means the RAM disk must include this driver, which requires a mkinitrd rebuild.

8. Back up the /lib/modules/$(uname -r)/modules.dep file. Run depmod -a to update modules.dep. Use diff to check for differences between the backup and the new file, and vimdiff to view them. There are none.

9. To check the reiserfs driver file, remember that all distributed kernel drivers are owned by the kernel RPM, so verify that package: rpm -V kernel-default

1. Notice that the reiserfs.ko and ext3.ko driver files have a modified MD5 sum and time stamp. This is a big red flag.

10. Reinstall the kernel rpm and retest.

11. Boot installed system

12. The uname -r command shows a “pae” kernel type.

yast -i kernel-pae
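Step 7 reads the mount output to confirm that the root partition is reiserfs. That check can be reduced to one function. This is a minimal sketch, assuming the traditional "DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)" mount output format; fstype_of is a hypothetical helper name.

```shell
# fstype_of MOUNTPOINT
# Hypothetical helper: read "mount" output on stdin, for example
#   /dev/sda6 on / type reiserfs (rw,acl,user_xattr)
# and print the filesystem type mounted on MOUNTPOINT.
fstype_of() {
    awk -v mp="$1" '$2 == "on" && $3 == mp && $4 == "type" { print $5 }'
}
```

For example, mount | fstype_of / prints reiserfs on this server, which is why the RAM disk must include the reiserfs driver.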

Task III: Root Cause

1. Corrupted file system driver .ko files.

2. Does an Automatic Repair fix this scenario? Yes

All shipping kernel drivers come packaged in the kernel RPM package. If they are bad, reinstalling the kernel RPM package restores the good driver files.
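Step 9 relies on spotting modified drivers in rpm -V output by eye. A small filter can list only the files whose digest no longer matches the RPM database. This is a minimal sketch, assuming the rpm -V attribute string (such as S.5....T) is the first field and its third character is "5" when the digest differs; changed_digests is a hypothetical helper name, and file names containing spaces are not handled.

```shell
# changed_digests: read "rpm -V" output on stdin and print the files whose
# digest (MD5) differs from the RPM database.
# Hypothetical helper; lines such as "missing  /path" are ignored because
# their first field has no "5" in the third position.
changed_digests() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}
```

For example, rpm -V kernel-default | changed_digests | grep '\.ko$' lists only the damaged driver files. Configuration files (marked c in rpm -V output) often change legitimately, which is why the .ko filter is useful here.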

(End of Exercise)

3.22 Troubleshooting Exercise: Server Hang at Boot

The server hangs at boot time. There don't seem to be any messages or errors.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 22

3. Press enter to continue

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. The last messages that match the troubleshooting table are the “Loading <module_name>” messages. However, the “INIT: version 2.86 booting” message does not appear. This indicates the problem may be with ramdisk:init or sbin:init.

3. Did you see “done” scroll across the screen? No. Boot Installed System (BIS) should work.

4. Did you see the boot loader menu and successfully pick a kernel to boot? Yes. The boot loader is probably fine, and the problem seems to be kernel/initrd related.

BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

5. BIS still hangs, so the problem may lie after the kernel/initrd stage. Try Chroot Installed System (CIS).

6. CIS worked, which means /bin/bash and glibc are probably fine.

7. Perform a basic server health check, specifically disk space on root “/”.

1. http://www.novell.com/communities/node/4097/basic-server-health-check-supportconfig

8. Because the problem may be related to ramdisk:init, and mkinitrd creates that file in the initrd, try recreating the RAM disk.

9. mkinitrd did not show any errors, and a reboot test shows the server still hangs.

10. Use CIS again and verify the files that the troubleshooting table associates with sbin:init.

11. Running rpm -Vf /sbin/init shows the MD5 sum has changed. This is a red flag and must be resolved before troubleshooting further. Run rpm -qf /sbin/init to see which rpm needs to be reinstalled.

12. Reinstall the sysvinit rpm and reboot to test.

1. Chroot Installed System

2. Mount the DVD

mount /dev/cdrom /mnt

3. Reinstall the rpm

rpm -Uvh --force /mnt/suse/i586/sysvinit*rpm

4. Reboot
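Step 7 starts with a basic health check of free space on “/”. The usage figure can be pulled out of df output in a script. This is a minimal sketch, assuming POSIX df -P output (one line per filesystem, Capacity in the fifth column, mount point in the sixth); usage_pct is a hypothetical helper name.

```shell
# usage_pct MOUNTPOINT
# Hypothetical helper: read "df -P" output on stdin and print the used
# percentage (without the % sign) of the filesystem mounted on MOUNTPOINT.
usage_pct() {
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%$/, "", $5); print $5 }'
}
```

For example, pct=$(df -P / | usage_pct /) followed by [ "$pct" -ge 95 ] && echo "root filesystem is ${pct}% full" flags a nearly full root filesystem; the 95% threshold is an arbitrary example.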

Task III: Root Cause

1. Corrupted /sbin/init

2. Does an Automatic Repair fix this scenario? No

The troubleshooting table helps narrow down where in the boot process a failure is occurring. Once that was known, CIS was used because BIS would still run the bad /sbin/init. The RPM package that owned /sbin/init needed to be replaced. Updating the server would also fix the problem if a newer sysvinit RPM package was available.

(End of Exercise)

3.23 Troubleshooting Exercise: Power Off

The server turns itself off and never comes up completely. Boot the server normally and determine the root cause for the server hang.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 23

3. Press enter to continue

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Because init 1 works, the problem is with one of the services that exist in runlevel 3 but not in runlevel 1.

3. Compare /etc/init.d/rc1.d/S* with /etc/init.d/rc3.d/S*; these are the services started in each runlevel.

4. Look at /var/log/boot.omsg for clues. boot.omsg is the “old” boot.msg file (that is, the log from the previous boot).

5. Run rpm -Vf /etc/init.d/<service> for each service start script that exists in runlevel 3 but not in runlevel 1.

6. Because more than one service has changed, you could reinstall all the affected RPMs, or try to narrow it down further by stepping through the boot process.

7. Edit /etc/sysconfig/boot and set PROMPT_FOR_CONFIRM="yes", RUN_PARALLEL="no", and FLOW_CONTROL="yes". Reboot and watch for the first sign that the server is shutting down or powering off.

8. You have a couple of seconds to answer Y or N to load each service. The prompt is Start service <name>, (Y)es/(N)o/(C)ontinue? [y]. Quickly note the name, then press Enter to load the service. Watch for anything unusual with each service, but remember you only have a couple of seconds to respond to the next prompt. If you don't respond in time, all the remaining services load automatically.

9. As soon as microcode.ctl runs, you see Start service purge-kernels, (Y)es/(N)o/(C)ontinue? [y], but it does not wait for an answer; “INIT: Sending processes the KILL signal” appears immediately, and the server is already in its shutdown procedure. Press Ctrl-S to pause the shutdown long enough to read the messages on the screen; this is what FLOW_CONTROL is for. When you are done, press Ctrl-Q to continue.

10. Boot to runlevel 1 again and look at /etc/init.d/rc3.d/*microcode.ctl and the scripts that follow it.

11. Run an rpm verify on the scripts that follow microcode.ctl. Make sure you verify the scripts in /etc/init.d and not the links in /etc/init.d/rc3.d, which are created by insserv and are not owned by any RPM package.

12. Notice that /etc/init.d/syslog has changed since installation. Reinstall the rpm that installed /etc/init.d/syslog.

13. The reinstall may not fix the problem. If so, run mv /etc/init.d/syslog /etc/init.d/syslog.old, and then reinstall the klogd rpm.

14. Boot to runlevel 1

15. Mount the DVD: mount /dev/cdrom /mnt

16. mv /etc/init.d/syslog /etc/init.d/syslog.old

17. rpm -Uvh --force /mnt/suse/i586/klogd-*.rpm

18. init 3
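Steps 3, 10, and 11 compare the rc1.d and rc3.d start links and pick out the scripts that start after microcode.ctl. Both comparisons can be scripted. This is a minimal sketch, assuming insserv-style link names of the form S<NN><service>; the function names start_services, only_in_second, and services_after are hypothetical.

```shell
# start_services DIR: print the services started from an rc directory
# (e.g. /etc/init.d/rc3.d) in start order, with the "S<NN>" prefix removed.
start_services() {
    for ss_link in "$1"/S*; do
        # Skip the literal pattern when nothing matches; keep dangling links.
        [ -e "$ss_link" ] || [ -L "$ss_link" ] || continue
        printf '%s\n' "${ss_link##*/}"
    done | sed 's/^S[0-9]*//'
}

# only_in_second DIR1 DIR2: services started in DIR2 but not in DIR1.
only_in_second() {
    ois_a=$(mktemp) ois_b=$(mktemp)
    start_services "$1" | LC_ALL=C sort -u > "$ois_a"
    start_services "$2" | LC_ALL=C sort -u > "$ois_b"
    LC_ALL=C comm -13 "$ois_a" "$ois_b"
    rm -f "$ois_a" "$ois_b"
}

# services_after DIR NAME: services started after NAME in DIR.
services_after() {
    start_services "$1" | awk -v name="$2" '
        found { print }
        $0 == name { found = 1 }
    '
}
```

Usage, from runlevel 1:

only_in_second /etc/init.d/rc1.d /etc/init.d/rc3.d

services_after /etc/init.d/rc3.d microcode.ctl | while read -r s; do rpm -Vf "/etc/init.d/$s"; done

The second command verifies the scripts in /etc/init.d rather than the rc3.d links, as step 11 requires.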

Task III: Root Cause

1. Logic error in customized /etc/init.d/syslog service

2. Does an Automatic Repair fix this scenario? No

The value of this lab is to learn how to use PROMPT_FOR_CONFIRM and FLOW_CONTROL. These are valuable troubleshooting tools for problems relating to a boot service.
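The three /etc/sysconfig/boot settings from step 7 can also be changed non-interactively, which makes it easy to switch them on for a troubleshooting boot. This is a minimal sketch, assuming the file holds one VAR="value" assignment per line; set_sysconfig is a hypothetical helper, not a SUSE tool.

```shell
# set_sysconfig FILE VAR VALUE
# Hypothetical helper: set VAR="VALUE" in a sysconfig-style file, replacing
# an existing assignment or appending one if VAR is not present.
set_sysconfig() {
    ssc_file=$1 ssc_var=$2 ssc_value=$3
    ssc_tmp=$(mktemp) || return 1
    awk -v var="$ssc_var" -v value="$ssc_value" '
        $0 ~ "^" var "=" { print var "=\"" value "\""; seen = 1; next }
        { print }
        END { if (!seen) print var "=\"" value "\"" }
    ' "$ssc_file" > "$ssc_tmp" && cat "$ssc_tmp" > "$ssc_file"
    # Writing back with cat keeps the original file's owner and permissions.
    rm -f "$ssc_tmp"
}
```

Usage, keeping a copy so the original settings can be restored afterward:

cp /etc/sysconfig/boot /etc/sysconfig/boot.orig

set_sysconfig /etc/sysconfig/boot PROMPT_FOR_CONFIRM yes

set_sysconfig /etc/sysconfig/boot RUN_PARALLEL no

set_sysconfig /etc/sysconfig/boot FLOW_CONTROL yes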

(End of Exercise)

3.24 Troubleshooting Exercise: Critical Data

The server is hung, but the customer must have access to their critical data. Fix the server and make sure there are 100 critical files and 200 important files on /data. The customer must have these files, and there is no backup.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 24

3. Press enter to continue

4. Some errors observed

1. The server is hung

2. fsck failed for at least one filesystem (not /).

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the “Troubleshooting/Potential Fixes”.

2. Make sure you look at all the errors on the screen where the boot failed, and then enter root's password for maintenance mode.

3. Errors on the screen include:

1. reiserfs_open: the reiserfs superblock cannot be found on /dev/sdb1

2. you need to run this utility with --rebuild-sb

3. Reiserfs super block in block 16 on 0x805 of format 3.6 with standard journal

4. This indicates severe damage to the /dev/sdb1 device and filesystem. Ask the customer whether they have a good backup.

5. Because this is their data volume, you can comment out the /dev/sdb1 entry in /etc/fstab and type exit to reboot normally. The server should come up fine, but without the data volume mounted.

6. If there is no backup and the /data files are critical, STOP now. The customer needs to send the drive to a third-party data recovery service (for example, Ontrack or DriveSavers) to get the files back. If they cannot afford this very expensive option, you can move forward: run reiserfsck --check /dev/sdb1

7. The fsck says a --rebuild-sb is required because the super block is gone. Run reiserfsck --rebuild-sb /dev/sdb1

8. The reiser filesystem does not keep copies of the super block throughout the filesystem, so you have to attempt to rebuild it. Read the output, but generally you can accept the defaults. The error message said the format was version 3.6. If the super block rebuild succeeds, you will be able to run a normal reiserfsck --check /dev/sdb1

9. The check says you need to rebuild the tree. This message means serious damage has occurred to the filesystem: the chances of successfully recovering the data, let alone the filesystem, have dropped from 30% to about 5% or less.

Run: reiserfsck --rebuild-tree /dev/sdb1

10. Run a --check until it comes back with “No corruptions found.”

11. Try mounting the filesystem. If it mounts, reboot by typing exit. If it fails, restore from backup.

12. Look in the /data/lost+found directory for any files that were recovered. Their original file names are lost, but you can rename them back into their original locations if you know what the names were.

13. Since this is a data volume, restore the files from backup.

14. If no backup exists, try to rename the files in lost+found back to their original locations. Yes, you have to go through them one at a time.

15. If the data is critical, do not try any recovery or fsck options; immediately refer the customer to a third-party data recovery service such as Ontrack or DriveSavers.
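Step 5 comments out the damaged /dev/sdb1 entry so the server can boot without its data volume. That edit can be made safely from a script. This is a minimal sketch, assuming a whitespace-separated fstab whose first field is the device name exactly as given; comment_out_fstab_dev is a hypothetical helper name.

```shell
# comment_out_fstab_dev FSTAB DEVICE
# Hypothetical helper: prefix "#" to every active fstab entry whose first
# field is exactly DEVICE (so /dev/sdb10 is left alone for /dev/sdb1).
# Already-commented lines start with "#", so they never match.
comment_out_fstab_dev() {
    cfd_fstab=$1 cfd_dev=$2
    cfd_tmp=$(mktemp) || return 1
    awk -v dev="$cfd_dev" '$1 == dev { print "#" $0; next } { print }' \
        "$cfd_fstab" > "$cfd_tmp" && cat "$cfd_tmp" > "$cfd_fstab"
    rm -f "$cfd_tmp"
}
```

In maintenance mode the root filesystem may still be mounted read-only, so remount it read-write first and keep a copy of the original file:

mount -o remount,rw /

cp /etc/fstab /etc/fstab.orig

comment_out_fstab_dev /etc/fstab /dev/sdb1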

Task III: Root Cause

1. Corrupted reiser filesystem with a lost superblock

2. Does an Automatic Repair fix this scenario? No

Damaged filesystems happen. Many times there is not a lot you can do about it other than

try the fsck options.

(End of Exercise)

3.25 Troubleshooting Exercise: Kernel Panic After Disk Change

We changed some disks on the server and now the kernel panics.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 25

3. Press enter to continue

4. Some errors observed

1. Kernel panic - not syncing: Attempted to kill init!

2. No init found. Try passing init= option to the kernel.

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the

“Troubleshooting/Potential Fixes”.

2. Make sure you look at all the errors on the screen where the boot failed.

3. Important messages on the screen, in addition to those listed above, include:

1. Mounting root /dev/sda5

2. /dev/sda5: clean

4. The system thinks /dev/sda5 is the root filesystem. Where is “init” found?

/sbin/init. Make sure /sbin/init exists on /dev/sda5.

5. Try Boot Installed System

6. Chroot Installed System

7. When you mount /dev/sda5 in rescue mode, it does not even have a /root or /sbin

directory. This is not the root filesystem.

8. As you explore, you find /dev/sda6 is the correct root filesystem.

9. Correct the root= option in /boot/grub/menu.lst to point to the correct root

filesystem and retest.

10. Correct the /etc/fstab to use the correct root filesystem and retest.

11. Once menu.lst and fstab are fixed, run mkinitrd to create a new initial RAM disk with the

updated root filesystem information and retest.

12. Chroot Installed System

13. sed -i -e 's!/dev/sda5!/dev/sda6!g' /boot/grub/menu.lst

14. sed -i -e 's!/dev/sda5!/dev/sda6!g' /etc/fstab

15. mkinitrd

16. reboot
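Steps 13 and 14 can be wrapped in a helper that keeps a backup of each file and replaces only whole device names, so that /dev/sda5 does not also match a longer name such as /dev/sda50. This is a minimal sketch assuming GNU sed (-i.bak and the \b word-boundary escape are GNU behavior); swap_device is a hypothetical name, not a SUSE tool.

```shell
#!/bin/sh
# Sketch: replace one device name with another in config files, keeping backups.
# Assumes GNU sed: -i.bak edits in place and saves the original as FILE.bak,
# and \b (a GNU extension) stops /dev/sda5 from matching inside /dev/sda50.
# swap_device is a hypothetical helper.

swap_device() {
  old=$1
  new=$2
  shift 2
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "skipping $f: not found" >&2
      continue
    fi
    # '!' is the s command delimiter, as in steps 13 and 14, because
    # device paths contain '/'.
    sed -i.bak -e "s!${old}\\b!${new}!g" "$f"
    echo "updated $f (original saved as $f.bak)"
  done
}
```

Inside the chroot, swap_device /dev/sda5 /dev/sda6 /boot/grub/menu.lst /etc/fstab does the same work as steps 13 and 14; you still run mkinitrd afterwards. If anything goes wrong, the .bak files hold the originals.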

Task III: Root Cause

1. The root filesystem device changed.

2. Does an Automatic Repair fix this scenario? Partially

1. df still lists /dev/sda5 as the root device.

When the root device changes, more than one file needs to be updated. This exercise is typical of systems that were installed onto a single local disk and later had SAN devices added.
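Because more than one file can name the root device, it helps to list every file that still mentions the old device before rebooting. A minimal sketch assuming GNU grep; find_device_refs is a hypothetical name, and the directories in the usage line are only an example, not a complete list of places that can reference the root device.

```shell
#!/bin/sh
# Sketch: list files under the given paths that still mention a device name.
# Assumes GNU grep: -r recurses, -l prints only file names, -w matches the
# device as a whole word so /dev/sda5 does not match /dev/sda50,
# -F treats the device name literally, -s hides errors for unreadable paths.
# find_device_refs is a hypothetical helper.

find_device_refs() {
  dev=$1
  shift
  grep -rlswF -- "$dev" "$@"
}
```

For example, run find_device_refs /dev/sda5 /etc /boot/grub inside the chroot; any file it prints still refers to the old device.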

(End of Exercise)

3.26 Troubleshooting Exercise: Command Not Found

Some commands are not found that should be found, and I get an input/output error on /usr/bin. After rebooting, my server won't come up. Boot the server successfully and find the missing commands.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 26

3. Press enter to continue

4. Some errors observed

1. command-not-found lsof (others include gc, lpr, prune)

2. fsck failed for at least one filesystem (not /).

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the

“Troubleshooting/Potential Fixes”.

2. Make sure you look at all the errors on the screen where the boot failed, and then

type root's password for maintenance mode.

3. Errors on the screen include:

1. fsck.ext3: Bad magic number in super-block while trying to open /dev/sda7

2. you might try running e2fsck with an alternate superblock: e2fsck -b 8193

<device>

4. This indicates significant damage to /dev/sda7. There is about a 30% chance we

will recover this filesystem. Ask if there is a backup. The /usr directory contains

most of the installed files for the system. A reinstall is probable.

5. Type root's password for maintenance mode, and run e2fsck /dev/sda7. It suggests using

an alternate superblock. Run mke2fs -n /dev/sda7 to determine the location

of the alternate superblocks. The -n option is non-destructive: mke2fs only displays what it would do.

6. Since the primary superblock is at the beginning of the partition, start with the last

alternates, which are farthest from it. For example, run e2fsck -y -b 294912 /dev/sda7.

Repeat for each superblock listed in the mke2fs -n /dev/sda7 output until it works

or you run out of superblocks.

7. Run e2fsck -f /dev/sda7 to force another fsck on the filesystem.

8. Try mounting the filesystem: mount /dev/sda7 /usr

9. List the files in /usr and /usr/lost+found. Try rebooting by typing exit from

maintenance mode.

10. Do you trust this system to run properly?

11. Restore from backup -OR- Reinstall the OS
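Steps 5 and 6 can be sketched as two shell functions: backup_superblocks extracts the backup superblock numbers from mke2fs -n output and orders them last-first, and try_backup_superblocks runs e2fsck with each one until a run succeeds. Both names are hypothetical, as is the E2FSCK variable used to swap in a different command. The sketch assumes the "Superblock backups stored on blocks:" section that mke2fs prints, and treats an e2fsck exit status below 4 as success (0 means no errors, 1 and 2 mean errors were corrected). Keep in mind that mke2fs -n only reports the right locations if it would lay out the filesystem with the same parameters, such as block size, that were used when it was created.

```shell
#!/bin/sh
# Sketch: try e2fsck with each backup superblock, starting from the last one.
# Assumptions: mke2fs -n output format, and "exit status < 4" meaning the
# check succeeded. backup_superblocks, try_backup_superblocks and the
# E2FSCK override are hypothetical names.

# Read mke2fs -n output on stdin; print backup superblocks, highest first.
backup_superblocks() {
  awk '/Superblock backups stored on blocks/ { grab = 1; next }
       grab && NF == 0                       { grab = 0 }
       grab { gsub(/,/, " "); for (i = 1; i <= NF; i++) print $i }' |
    sort -rn
}

# Usage: try_backup_superblocks DEVICE SUPERBLOCK...
# The filesystem on DEVICE must not be mounted.
try_backup_superblocks() {
  dev=$1
  shift
  for sb in "$@"; do
    echo "trying backup superblock $sb on $dev"
    ${E2FSCK:-e2fsck} -y -b "$sb" "$dev"
    status=$?
    # e2fsck exit codes 0-3: no errors, or errors were corrected.
    if [ "$status" -lt 4 ]; then
      echo "e2fsck succeeded with superblock $sb (exit status $status)"
      return 0
    fi
  done
  echo "no backup superblock worked on $dev" >&2
  return 1
}
```

In maintenance mode this would be used as try_backup_superblocks /dev/sda7 $(mke2fs -n /dev/sda7 | backup_superblocks), followed by e2fsck -f /dev/sda7 as in step 7.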

Task III: Root Cause

1. Corrupted ext3 filesystem /usr on /dev/sda7

2. Does an Automatic Repair fix this scenario? No

Some issues just cannot be fixed and a reinstall of the OS is the best course of action.

(End of Exercise)

3.27 Troubleshooting Exercise: Waiting for Device after LUN

We attached a LUN from the SAN to the server. The boot process keeps asking me to fall back to a different device. It works, but the system shows the wrong device. Will this hurt the server or keep it from working properly? Make sure the server boots without falling back, and that the correct root device shows up in the mount and df commands.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 27

3. Press enter to continue

4. Some errors observed

1. Waiting for device /dev/sdd6 to appear: ..Could not find /dev/sdd6

2. df -h shows /dev/sdd6 mounted to root, but that is the device that could not be

found.

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the

“Troubleshooting/Potential Fixes”.

2. Make sure you look at all the errors on the screen where the boot failed.

3. Important messages on the screen, in addition to those listed above, include the

prompt offering to fall back to a different root device.

4. The system thinks /dev/sdd6 is the root filesystem, but that device never appears.

Where is “init” found? /sbin/init. Find the device that actually contains /sbin/init.

5. Try Boot Installed System

6. Chroot Installed System

7. In rescue mode, /dev/sdd6 does not exist, so it cannot be the root filesystem.

8. As you explore, you find /dev/sda6 is the correct root filesystem.

9. Correct the root= option in /boot/grub/menu.lst to point to the correct root

filesystem and retest.

10. Correct the /etc/fstab to use the correct root filesystem and retest.

11. Once menu.lst and fstab are fixed, run mkinitrd to create a new initial RAM disk with the

updated root filesystem information and retest.

12. Chroot Installed System

13. sed -i -e 's!/dev/sdd6!/dev/sda6!g' /boot/grub/menu.lst

14. sed -i -e 's!/dev/sdd6!/dev/sda6!g' /etc/fstab

15. mkinitrd

16. reboot
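Before the reboot in step 16, you can confirm that menu.lst and fstab now agree on the root device and that the device exists. This is a sketch with hypothetical function names (root_from_menu, root_from_fstab, check_root_device), assuming the GRUB Legacy kernel line carries root=DEVICE and fstab names the root device directly rather than by LABEL= or UUID=.

```shell
#!/bin/sh
# Sketch: compare the root device named in menu.lst and fstab, and check it exists.
# Assumes root=/dev/... on GRUB Legacy kernel lines and a plain device name in
# the first fstab field; LABEL= or UUID= entries would need extra handling.

# Print each distinct root= value from non-comment lines of a menu.lst file.
root_from_menu() {
  grep -v '^[[:space:]]*#' "$1" |
    sed -n 's/.*[[:space:]]root=\([^[:space:]]*\).*/\1/p' |
    sort -u
}

# Print the device mounted on / according to an fstab file.
root_from_fstab() {
  awk '$1 !~ /^#/ && $2 == "/" { print $1 }' "$1"
}

# Usage: check_root_device MENU_LST FSTAB
# Returns 0 if consistent and present, 1 on mismatch, 2 if the device is missing.
check_root_device() {
  menu_root=$(root_from_menu "$1")
  fstab_root=$(root_from_fstab "$2")
  if [ "$menu_root" != "$fstab_root" ]; then
    echo "mismatch: menu.lst says '$menu_root', fstab says '$fstab_root'" >&2
    return 1
  fi
  if [ ! -b "$fstab_root" ]; then
    echo "warning: $fstab_root is not a block device on this system" >&2
    return 2
  fi
  echo "root device $fstab_root is consistent and present"
}
```

Inside the chroot, run check_root_device /boot/grub/menu.lst /etc/fstab. A status of 2 means both files agree on a device that is not present, which is the situation in this exercise; the existence check depends on /dev being available inside the chroot.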

Task III: Root Cause

1. The root filesystem device changed to a non-existent device.

2. Does an Automatic Repair fix this scenario? Partially

1. df still lists /dev/sdd6 as the root device.

(End of Exercise)

3.28 Troubleshooting Exercise: Not Booting After Power Failure

We think we've fixed all our hardware problems, but the server still won't boot. Fix the problems so the server boots properly.

Objectives:

Task I: Configuration

Task II: Troubleshooting Procedure

Task III: Root Cause

Special Instructions and Notes:

None

Task I: Configuration

Configures the virtual machine for the assigned lab exercise.

1. Revert your snapshot

2. Run bplab 28

3. Press enter to continue

4. Some errors observed

1. fsck failed for at least one filesystem (not /).

Task II: Troubleshooting Procedure

Try to resolve the issue without looking at the troubleshooting procedure; otherwise, follow the troubleshooting steps below.

1. Find the last on-screen landmark that matches the troubleshooting table. Follow the

“Troubleshooting/Potential Fixes”.

2. Make sure you look at all the errors on the screen where the boot failed, and then

type root's password for maintenance mode.

3. Errors on the screen include:

1. fsck.ext3: Bad magic number in super-block while trying to open /dev/sda5

2. you might try running e2fsck with an alternate superblock: e2fsck -b 8193

<device>

4. This indicates significant damage to /dev/sda5. There is about a 30% chance we

will recover this filesystem. Ask if there is a backup. The /usr directory contains

most of the installed files for the system. A reinstall is probable.

5. Type root's password for maintenance mode, and run e2fsck /dev/sda5. It suggests using

an alternate superblock. Run mke2fs -n /dev/sda5 to determine the location

of the alternate superblocks. The -n option is non-destructive: mke2fs only displays what it would do.

6. Since the primary superblock is at the beginning of the partition, start with the last

alternates, which are farthest from it. For example, run e2fsck -y -b 294912 /dev/sda5.

Repeat for each superblock listed in the mke2fs -n /dev/sda5 output until it works

or you run out of superblocks.

7. Notice that all of the alternate superblocks failed.

8. This suggests that the disk geometry has changed or that the partition table is corrupted.

9. You could try restoring the partition table from backup and seeing if that helps.

10. Restore from backup -OR- Reinstall the OS
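Step 9 assumes you have a saved copy of the partition table. sfdisk can provide one: sfdisk -d /dev/sda writes the table in a text dump format, and sfdisk /dev/sda < saved.dump writes a saved dump back to the disk. Before writing anything back, it is worth confirming that the current table really differs from the saved one. The helpers below are a hypothetical sketch (normalize_dump and partition_table_changed are not real tools) that compare two dump files while ignoring comment and blank lines.

```shell
#!/bin/sh
# Sketch: compare a saved sfdisk dump with a current one.
# Both files are assumed to be in the text format written by `sfdisk -d DISK`.
# normalize_dump and partition_table_changed are hypothetical helpers;
# they only read files and never touch a disk.

# Drop comments and blank lines so only the partition layout is compared.
normalize_dump() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1"
}

# Usage: partition_table_changed SAVED_DUMP CURRENT_DUMP
# Returns 0 and prints the differences if the layouts differ, 1 if identical.
partition_table_changed() {
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  normalize_dump "$1" > "$a"
  normalize_dump "$2" > "$b"
  if diff "$a" "$b"; then
    echo "partition table matches the saved copy"
    changed=1
  else
    changed=0
  fi
  rm -f "$a" "$b"
  return "$changed"
}
```

Take the dump while the system is healthy and keep it somewhere other than the disk it describes, for example with sfdisk -d /dev/sda > sda.dump. During troubleshooting, dump the current table to a second file and run partition_table_changed sda.dump sda.now before deciding to restore.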

Task III: Root Cause

1. Corrupted partition table or disk geometry

2. Does an Automatic Repair fix this scenario? No

Disk geometry problems generally must be fixed with a restore from backup or reinstall of the operating system. You can sometimes recover the partition table using gpart.

(End of Exercise)
