Thursday, November 1, 2012

Using Linux to Find Bad Sector Data

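Most of us have probably encountered the dreaded NTFS monster.  It's usually easy to fix with chkdsk /f /r (chkntfs only controls whether that check runs at boot).  However, there is one thing that chkdsk can NOT fix: bad sectors.  The most it can do is mark them as unusable and recover whatever is still readable.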

Enter Linux.

If you have a Windows boot disk or WinRE partition, there's a good chance that you can restore a system to full working order, as long as your hard drive doesn't have bad sectors.  However, if you start seeing things like this during a chkdsk:



...then there's a very good chance that you're looking at some hardware-level corruption.  In addition to the Blue Screen Of Death (BSOD), I noticed that plugging the HDD into another computer generated a S.M.A.R.T. error!  I could still access data on the drive by mounting it in Linux:



If you don't have the ability to just arbitrarily connect a random HDD to your computer, you could always opt to create a backup over the network.  To accomplish this, Clonezilla is a solid contender.  But to do it over the network, you'll need a transfer method.  PartedMagic comes out on top by packaging Clonezilla along with a variety of backup and restore methods.  There are a number of ways you can back up and restore over the network: SSH, CIFS, and NFS, to name just a few.  I prefer SSH since its connection is secure and the encryption overhead doesn't drag the speed down too far.  If I were on a firewalled LAN, I'd probably go with CIFS.  However, SSH is ideal for internet-based backups and restores.  It also plays nice with Windows and BSD systems.

Of course, with every backup comes the risk of data corruption:



So, what to do?  Well, that's why I have decided to open my case up and hook the drive up directly to a SATA port.  Thankfully, I'm not watching any movies at the moment, so I can disconnect my DVD drive and borrow its port:



This lets me mount the drive directly, as mentioned earlier.  Now, with that setup, let's see if we can actually find out WHAT files are sitting on WHICH bad sectors, shall we?  First of all, let's see what there is to work with and see what's actually taking up space.  Then, let's look at some important directories:



Hmmm, well the Boot directory looks okay.  The others seem to be doing okay up through MSOCache.  What about ProgramData?



OH!  What's this?  Core dumped?!  When was the last time you saw the du command dump a core?  And did it flush...?  Well, there seems to be some kind of irony to all of this... There appear to be 9 of the original 11 reported bad sectors here.  That means that there are 2 more bad sectors elsewhere on the disk.  I think it's safe to say that the restoration process won't be losing any seriously important data this time around!  Of course, there's more than one way to break a window:



After manually checking the rest of the disk, I could not find the other 2 bad sectors.  It's entirely possible that they were remapped at some point.  Perhaps the failing of S.M.A.R.T. had something to do with it.  Perhaps I just don't know.  That's the risk of backup and recovery: you just never know what's going to happen.  In all likelihood, they are buried in the ProgramData directory, but du bottomed out before it got to them.  It would take some digging around to find out the exact spots... that's really beyond the scope of this article.
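For anyone who does want to dig, here is a rough Python sketch of one way to go about it: walk the mounted tree, read every file from start to finish, and note which ones fail.  This is not what I ran above, just an illustration.  The function name and the opener parameter are made up for this example, and it assumes a bad sector shows up as an I/O error (OSError) when the file is read, which may not hold for every driver.

```python
import os


def find_unreadable_files(root, chunk_size=1024 * 1024, opener=open):
    """Walk `root` and return (path, reason) pairs for anything that fails to read.

    `opener` is a hypothetical hook (defaults to the built-in open) so the
    reader can be swapped out, for example to simulate a failing disk.
    """
    failures = []

    def on_walk_error(err):
        # os.walk calls this when a directory itself cannot be listed.
        failures.append((err.filename, err.strerror or str(err)))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            try:
                with opener(path, "rb") as handle:
                    # Read the whole file in chunks and discard the data;
                    # the only goal is to touch every block of the file.
                    while handle.read(chunk_size):
                        pass
            except OSError as err:
                failures.append((path, err.strerror or str(err)))
    return failures
```

You would point it at wherever the drive is mounted, for example find_unreadable_files("/mnt/windows") if that happened to be your mount point, and print the pairs it returns.  Keep in mind that this only tells you which files are affected, not which sector numbers, and that every extra read puts more stress on a drive that is already dying.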

Of course, if you really wanted to get fancy, you could install smartmontools and get some diagnostic information straight from the drive:



And if you ask it REALLY nicely, it will give you some human-readable content:



Whoops, less than 24 hours left!  I'd better get a move on...
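If you would rather not squint at the attribute table yourself, the sector-related counters can be pulled out with a few lines of Python.  The sketch below assumes the usual ten-column table that smartctl -A prints (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and that the drive reports attributes under the common smartmontools names; some drives use different names or leave them out entirely.  The function names are my own invention.

```python
# Attribute names smartmontools commonly uses for sector trouble.
# Assumption: your drive reports them under these names.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(text):
    """Map attribute name -> raw value string from `smartctl -A` style text."""
    raw_values = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and everything else are skipped.
        if len(parts) >= 10 and parts[0].isdigit():
            # The raw value can contain spaces, so keep the rest of the row.
            raw_values[parts[1]] = " ".join(parts[9:])
    return raw_values


def sector_counts(text):
    """Return the watched sector counters that are present, as integers."""
    raw_values = parse_smart_attributes(text)
    counts = {}
    for name in WATCHED:
        raw = raw_values.get(name)
        if raw is not None and raw.split()[0].isdigit():
            counts[name] = int(raw.split()[0])
    return counts
```

Feed it the text you get from running smartctl -A against your drive as root.  Non-zero counts for pending or reallocated sectors are a good hint that the drive is on its way out.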

Addendum: It should be noted that, if you have bad sectors but the S.M.A.R.T. tools say that your disk is fine, then perhaps you should look into shrinking the NTFS partition(s) to avoid the bad sectors.
