In line with my SysRq Post comes another bit of assumed knowledge, SMART. Let’s begin at the beginning (and stick to the PC world).

Magnetic storage is fault-prone. In the old days, when you formatted a drive, part of the formatting process was to check that each block seemed to be able to store data. All the bad blocks would be listed in a “bad block list” and the file-system would never try to use them. File-system checks would also be able to mark blocks as being bad.

As disks got bigger, this meant that formatting could take hours. Drives had also become fancy enough that they could manage bad blocks themselves, and so a shift occured. Disks were shipped with some extra spare space that the computer can’t see. Should the drive controller detect that a block went bad (they had parity checks), it could re-allocate a block from the extra space to stand in for the bad block. If it was able to recover the data from the bad block, this could be totally transparent to the file-system, if not the file-system would see a read-error and have to handle it.

This is where we are today. File-systems still support the concept of bad blocks, but in practice they only occur when a disk runs out of spare blocks.

This came with a problem, how would you know if a disk was doing ok or not? Well a standard was created called SMART. This allows you to talk to the drive controller and (amongst other things) find out the state of the disk. On Linux, we do this via the package smartmontools.

Why is this useful? Well you can ask the disk to run a variety of tests (including a full bad block scan), these are useful for RMAing a bad drive with minimum hassle. You can also get the drive’s error-log which can give you some indication of it’s reliability. You can see it’s temperature, age, and Serial Number (useful when you have to know which drive to unplug). But, most importantly, you can find out the state of bad sectors. How many sectors does the drive think are bad, and how many has it reallocated.

Why is that useful?

In the event of a bad block, you can manually force a re-allocation. This way it happens under your terms, and you’ll know exactly what got corrupted.

Next, Google published a paper linking non-zero bad sector values to drive failure. Do you really want be trusting known-non-trustworthy drives with critical data?

Finally, there is a nasty RAID situation. If you have a RAID-5 array with say 6 drives in it and one fails either the RAID system will automatically select a spare drive (if it has one), or you’ll have to replace it. The system will then re-build on the new disk, reading every sector on all the other disks, to calculate the sector contents for the new disk. If one of those reads fails (bad sector) you’ll now be up shit-creek without a paddle. The RAID system will kick out the disk with the read failure, and you’ll have a RAID-5 array with two bad disks in it — one more than RAID-5 can handle. There are tricks to get such a RAID-5 array back online, and I’ve done it, but you will have corruption, and it’s risky as hell.

So, before you go replacing RAID-5 member-disks, check the SMART status of all the other disks.

Personally, I get twitchy when any of my drives have bad sectors. I have smartd monitoring them, and I’ll attempt to RMA them as soon as a sector goes bad.


Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

The trick foor dealing with >1 disk failure

Seeing as I’ve been asked, the trick for dealing with multiple disk failures is:

Firstly, we have to assume that they are soft-failures, i.e. a disk got kicked out because of a bad sector. You were re-adding it, when another drive also picked up a bad read, and got kicked too.

Now, because of bad sectors on multiple drives we are going to have corruption, but so be it.

The array can’t be started any more, and disks can’t be added to it, because both kicked disks are marked as dirty.

So you have to work out the correct order, and do a mdadm --create with exactly the right options (you can determine them from the md super-blocks). You do this with N-1 disks so it is started degraded, and no re-syncing occurs. If it looks sane, you add another disk, resync (you’ve already dealt with those bad sectors, right?), fsck, and then go looking for corruption.

When I did this, it was with reiserfs, and one of the disks in the rebuild had been out of the array for a day or so (it was the best we could do). Although I got hundreds of fsck errors (and fsck-ing reiserfs is never pretty) we never found any real data corruption on the volume. I was rather impressed.

RAID choice

For this reason alone, despite it being so much more expensive, RAID 10 is looking so much more worth it these days. o.O

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.
By submitting this form, you accept the Mollom privacy policy.