Hack 99 Recovery Roadmap


When it comes to troubleshooting startup problems, finding the right tool for the job is the key.

Would you try to crack a walnut with a bulldozer? Or pry open a door with a toothpick? Every tool has its purpose, and using the right tool for the job gets the job done quickly and easily. The same is true of the maze of troubleshooting options available for restoring Windows 2000 and Windows Server 2003 systems that fail on startup. Safe Mode, Last Known Good Configuration, Emergency Repair Disk, Recovery Console, Automated System Recovery, Windows Startup Disk: which should you use, and in which situations? This hack helps you get your toolbox in order by answering that question.

Windows 2000

I'll start with Windows 2000 and then highlight differences in troubleshooting issues on the newer Windows Server 2003 platform. Obviously, we won't be able to cover every possible scenario or even the intricate details of specific situations, but if you follow the procedures outlined in this hack, you should be able to get started and figure the rest out yourself, with the help of various Knowledge Base articles on Microsoft Product Support Services (http://support.microsoft.com).

To make things crystal clear, here's the big picture, right from the start:

  1. If the system won't boot, boot with the Last Known Good Configuration.

  2. If that fails or isn't an option, try booting into Safe Mode or one of its variants.

  3. If that fails or isn't an option, try using the Recovery Console together with a Windows Startup Disk to repair your machine.

  4. If that fails or isn't an option, try the Emergency Repair Process to repair your machine.

  5. If that fails, you'll probably have to completely rebuild your machine from tape backup media.
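The fallback order in this list can be sketched as a small lookup. This is only an illustration of the sequence, not a tool that ships with Windows; the names ROADMAP and next_tool are hypothetical.

```python
# Recovery steps for Windows 2000, in the order given in the list above.
ROADMAP = [
    "Last Known Good Configuration",
    "Safe Mode (or a variant)",
    "Recovery Console with a Windows Startup Disk",
    "Emergency Repair Process",
    "Rebuild from tape backup",
]


def next_tool(already_tried):
    """Return the first roadmap step not yet tried, or None if none remain."""
    tried = set(already_tried)
    for step in ROADMAP:
        if step not in tried:
            return step
    return None


if __name__ == "__main__":
    # Nothing tried yet, so the first step is suggested.
    print(next_tool([]))
    # The first two steps failed, so the third is suggested.
    print(next_tool(ROADMAP[:2]))
```

The list is kept in one place so that the exceptions discussed below can be handled by skipping entries rather than by rewriting the order.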

There are exceptions to this procedure, based on possible knowledge you have of what might be wrong with your machine, and we'll talk about that later. But first, let's unpack these steps one at a time.

Last Known Good Configuration

When you press F8 during the startup process (or when you see the "Please select the operating system to start" message, if you have the Recovery Console installed), the Windows Advanced Options Menu is displayed. One of the options on this menu is Last Known Good Configuration, which uses the Registry settings that Windows used for its last successful logon. Every time you boot Windows and log on successfully, this information is updated in the Registry and becomes your next version of Last Known Good Configuration. This applies only to normal mode; logging on to Safe Mode successfully does not update your Last Known Good Configuration settings.

Digging a little deeper, when you use Last Known Good Configuration, Windows restores the HKLM\SYSTEM\CurrentControlSet Registry settings from a previous set of settings, such as ControlSet001 or ControlSet002. Last Known Good Configuration also rolls back the device drivers used by your system to those that loaded during your last successful logon. However, Last Known Good Configuration cannot be used to restore missing or corrupt operating-system files.
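If you are curious which numbered control set plays which role on a running system, the HKLM\SYSTEM\Select key records this in its Current, Default, Failed, and LastKnownGood values. Here is a minimal sketch that reads them, assuming Python 3 is available on the Windows machine (on the systems discussed here, regedit is the more practical way to look). The helper names are hypothetical, and treating a value of 0 as "no such set" is my assumption.

```python
import sys


def control_set_name(number):
    """Map a Select value such as 1 to its key name, e.g. ControlSet001.

    A value of 0 is treated here as "no such control set" (an assumption).
    """
    if number == 0:
        return None
    return "ControlSet%03d" % number


def read_select_values():
    """Read the control-set numbers from HKLM\\SYSTEM\\Select (Windows only)."""
    import winreg  # standard library, available only on Windows

    names = ("Current", "Default", "Failed", "LastKnownGood")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"SYSTEM\Select") as key:
        # QueryValueEx returns a (value, type) pair; only the value is needed.
        return {name: winreg.QueryValueEx(key, name)[0] for name in names}


if __name__ == "__main__" and sys.platform == "win32":
    for name, number in read_select_values().items():
        print(name, "->", control_set_name(number))
```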

When should you use Last Known Good Configuration to recover your system? Choosing this option overwrites any Registry changes or device-driver configuration changes you made during your last successful logon session and restores your system to the previous logon session's configuration. In other words, you lose any configuration changes made and any updated drivers installed since the last successful logon to your system. As a result, you should use this tool only if you think that some configuration change you recently made or a device driver you updated might be causing your system to fail upon startup. Typically, a problem like this will cause a STOP error (blue screen) of some sort, and the message on the screen might give you a clue about which driver or service might be causing the failure.

So, the moral of the story is, if you change something, reboot, and your system won't start, try Last Known Good Configuration to restore your system.

Safe Mode

If you think your problem isn't due to a recent misconfiguration error on your part and you haven't updated any device drivers lately, then try Safe Mode if your system won't start. Safe Mode lets you start your system using a minimal set of device drivers and services. This allows you to get to a logon screen and start using Windows to look for what might be wrong. Like Last Known Good Configuration, Safe Mode is selected from the Advanced Options menu, which you reach by pressing F8. There are three versions you can use: Safe Mode, Safe Mode with Networking, and Safe Mode with Command Prompt.

I suggest you always try Safe Mode with Networking first, because you might need to access your Windows installation files on a network distribution point to repair your server (for example, by extracting driver files from a .cab file). If Safe Mode with Networking fails, try Safe Mode; if that works, then something might be wrong with your network card or networking subsystem settings. If that fails, try Safe Mode with Command Prompt so that you can at least get to the Windows command-line troubleshooting tools to look for what's wrong with your system.

You can log on to Safe Mode by using either a domain administrator account or the local administrator account. On a domain controller, the local administrator account is the only local account present on the machine, and it is stored in a minimal version of the SAM database that is found on member servers and workstations. This is also the account used to run the Recovery Console, and it uses the password you specified when you first installed Windows (unless it's been changed).

When should you use Safe Mode and its variants? Usually, you might try this if you recently installed new hardware or software on your machine, not necessarily during the most recent logon session. Once you're logged on in Safe Mode, you can start disabling hardware devices one at a time until you find exactly which device is causing the problem. Or, if the issue is software-related, you can try to reconfigure or even uninstall different applications to isolate the problem and then see if you can reboot in Normal Mode.

Some of the Windows tools you might use in Safe Mode to try to determine the cause of startup failure include Event Viewer, System Information (Start → Run → msinfo32), Device Manager, and so on. Also, successfully booting into any version of Safe Mode creates a log file named Ntbtlog.txt (found in %SystemRoot%) that describes the services started and the drivers loaded during startup. By examining this list, you might be able to determine which failed service or missing/corrupt driver is preventing Windows from starting, and then you can use the GUI tools to fix your problem.
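Scanning a long Ntbtlog.txt by eye is tedious, so here is a rough sketch of filtering it for drivers that failed to load. It assumes the log marks each entry with a "Loaded driver" or "Did not load driver" prefix; check your own log before relying on that. The function name and the sample lines are made up for illustration, and because the file's encoding can vary, reading the file into lines is left to the caller.

```python
def drivers_not_loaded(lines):
    """Return the driver paths from lines that start with the failure prefix."""
    prefix = "Did not load driver "
    result = []
    for line in lines:
        line = line.strip()
        if line.startswith(prefix):
            result.append(line[len(prefix):])
    return result


# Illustrative sample only; these lines are not taken from a real log.
SAMPLE = [
    r"Loaded driver \WINNT\System32\ntoskrnl.exe",
    r"Did not load driver \SystemRoot\System32\Drivers\example.sys",
]

if __name__ == "__main__":
    for path in drivers_not_loaded(SAMPLE):
        print(path)
```

A driver that did not load is not necessarily the culprit, since Safe Mode skips many drivers on purpose, so treat the output as a list of candidates to compare against a normal boot.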

Recovery Console

The Recovery Console is a command-line interface that you can start either by selecting it from the Boot Loader menu (if you've previously installed the Recovery Console on your server) or directly from the product CD. You must log onto the Recovery Console using the local administrator account on your machine, even if it's a domain controller. Recovery Console provides you with a minimal version of Windows that lets you run various commands to perform tasks such as copying and replacing system files, enabling or disabling problem services, repairing a boot sector, or even reformatting your drive.

Best practice is to install the Recovery Console on your machine before you need it. That way, you won't be running around looking for your product CD when a disaster occurs and you can't start your system. To install the Recovery Console on a machine, insert the product CD, open a command prompt, change to the I386 folder on the CD, and type winnt32 /cmdcons. If you haven't installed the Console and need to run it directly from the CD, insert the CD and select the Repair option.

Windows Startup Disk

Sometimes, Windows won't boot because the boot sector is damaged on your system volume or a virus has infected your master boot record. A Windows Startup Disk can be extremely handy in such circumstances. The name of the disk is a bit of a misnomer, because you can't start Windows from the disk itself. Rather, the disk can be used in conjunction with the Recovery Console to repair certain kinds of problems that might arise.

But first, here's how to create one of these disks so that you'll have it ready when you need it. Stick a blank floppy disk into your machine and double-click on My Computer. Right-click on your A: drive and select Format. Check the option for Quick Format and click Start to format your disk. Now, double-click on the C: drive to open it in My Computer, select Tools → Folder Options, and on the View tab clear the checkbox labeled "Hide protected operating system files." Drag and drop the boot.ini, ntldr, and ntdetect.com files from the root of the C: drive to your floppy, and include the Bootsect.dos and Ntbootdd.sys files if these are also present. Open a command prompt window and type attrib -h -s -r a:\*.* to set the attributes properly for the files on your floppy. Eject the floppy and label it Windows Startup Disk or something similar. Finally, hide your protected system files again in My Computer so that you don't accidentally try to delete any of them.
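Before you file the floppy away, it's worth confirming that nothing was left behind. The following sketch checks a list of filenames against the files named above; the function and constant names are hypothetical, and the comparison ignores case because the files may appear as NTLDR or ntldr.

```python
# Files the text says to copy to the floppy.
REQUIRED = ("boot.ini", "ntldr", "ntdetect.com")
# Files to include only if they exist on the C: drive.
OPTIONAL = ("bootsect.dos", "ntbootdd.sys")


def missing_startup_files(names):
    """Return the required boot files that are absent from names."""
    present = {name.lower() for name in names}
    return [name for name in REQUIRED if name not in present]


if __name__ == "__main__":
    # Example with a floppy that is missing ntdetect.com.
    print(missing_startup_files(["BOOT.INI", "NTLDR"]))
```

On a machine with Python installed, you could pass it the floppy's directory listing, for example missing_startup_files(os.listdir("A:\\")) after importing os.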

Now, if your system won't start because of a damaged master boot record, a corrupt boot sector, or missing or corrupt system files such as Ntldr or Ntdetect.com, you can start the system by using the Recovery Console (from the product CD if necessary), insert your Windows Startup Disk, and copy the files you need from the floppy to your C: drive. Then, you can run other Recovery Console commands to repair the boot sector (using the fixboot command), repair the master boot record (using the fixmbr command), and so on, until you have a working system that will start.

Emergency Repair Process

The only other thing you can usually try (short of reinstalling Windows from scratch) is the Emergency Repair Process. This feature of Windows 2000 is basically a holdover from Windows NT. The Emergency Repair Disk (ERD) itself isn't as useful (since a floppy can't contain the whole Windows 2000 Registry) as the other actions that are performed by Windows when you create this disk. In particular, when you create an ERD by starting the Backup utility and selecting Tools → Create an Emergency Repair Disk, be sure to select the "Also backup the Registry to the repair directory" option, which backs up all your Registry hives to the %SystemRoot%\Repair folder. Then, when you need to repair the Registry or replace missing or damaged files on your machine, you can press L when the startup process asks you for your ERD floppy. Doing so will ignore the floppy and use the information in the Repair directory instead. Of course, if your boot volume is badly damaged, your Repair folder might be corrupt or missing, in which case you will likely have to reinstall Windows from scratch anyway.

The repair process finds the boot.ini file, reads the ARC paths to the operating system, and then attempts to load the %systemroot%\System32\Config\Software Registry hive. If the boot.ini file is corrupt or missing or if the Software hive is corrupt, the repair fails (unless you actually do have an ERD handy, in which case the repair process can gain access to a working copy of the Setup.log file for your machine). Hopefully, you updated your ERD the last time you reconfigured your machine or installed new hardware on it to keep it current.
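If you haven't looked inside boot.ini before, an ARC path is the string that identifies the controller, disk, and partition holding a Windows installation. The sketch below pulls one apart. It handles only the common multi(...) and scsi(...) forms, the sample path is a typical illustration rather than one taken from your machine, and the function name and field names are my own.

```python
import re

# Matches paths such as multi(0)disk(0)rdisk(0)partition(1)\WINNT.
ARC_PATTERN = re.compile(
    r"^(?P<adapter_type>multi|scsi)\((?P<adapter>\d+)\)"
    r"disk\((?P<disk>\d+)\)rdisk\((?P<rdisk>\d+)\)"
    r"partition\((?P<partition>\d+)\)(?P<path>\\.*)?$",
    re.IGNORECASE,
)


def parse_arc_path(text):
    """Split an ARC path into its fields, or return None if it doesn't match."""
    match = ARC_PATTERN.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("adapter", "disk", "rdisk", "partition"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    print(parse_arc_path(r"multi(0)disk(0)rdisk(0)partition(1)\WINNT"))
```

A path that returns None here isn't necessarily invalid, since other ARC forms exist that this pattern deliberately leaves out.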

Windows Server 2003

Things are pretty much the same in Windows Server 2003, except for one major difference: the ERD of Windows 2000 has been replaced by the new Automated System Recovery (ASR) feature of Windows Server 2003 [Hack #98]. This new ASR feature is a powerful tool of last resort for restoring your system when everything else fails. ASR includes the functionality of ERD and much, much more and can really save your bacon in an emergency when your server won't start and everything else (Last Known Good Configuration, Safe Mode, Recovery Console) fails.

Right Tool for the Job

Finally, here's a list of the proper tools to use when any of these common issues prevent your system from starting:

A configuration change made during the last logon session

Try Last Known Good Configuration and reconfigure accordingly.

A device driver updated during the last logon session

Try Last Known Good Configuration and try a different driver.

Server misconfiguration

Try Safe Mode and reconfigure accordingly.

A newly installed device

Try Safe Mode and disable, reconfigure, or uninstall the device.

A newly installed application

Try Safe Mode and uninstall the application.

A newly installed hotfix or service pack

Try Safe Mode and uninstall the hotfix or service pack.

A problem service that prevents Windows from starting

Try using the Recovery Console to reconfigure or disable the service.

A corrupted boot sector or master boot record

Try using the Recovery Console and repairing the problem.

A missing or corrupt system file

Try using the Recovery Console and copying the file from a Windows Startup Disk or from the installation files on CD or a distribution point.

Registry corruption

Try using the Emergency Repair Process to restore the Registry or restore the System State from tape backup media using Safe Mode.

Massive corruption or loss of system files or the Registry

Try the Automated System Recovery (ASR) feature of Windows Server 2003 if you previously created an ASR backup set; otherwise, rebuild your server from scratch by reinstalling Windows.