
Linux on-line Software RAID reshaping

I first read about on-line software RAID reshaping a year ago on LWN. Today I tried it on a live system (one that's too big to be backed up first :-))

I added two 250GB drives to my existing four-drive RAID5 array, making a 1.2TB array. The reshape took a while...:

md0 : active raid5 sdb1[4] sda1[5] sdf1[2] sde1[3] sdd1[1] sdc1[0]
      732587712 blocks super 0.91 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  reshape =  1.9% (4800784/244195904) finish=397.4min speed=10034K/sec

But when it was done:

$ df -h
/dev/md0              1.2T  657G  509G  57% /mnt/storage
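The 1.2T figure checks out against the RAID-5 capacity formula: usable space = (devices - 1) x per-device size, since one drive's worth of space goes to parity. A quick sketch with shell arithmetic, using the per-device size (244195904 KiB) from the mdstat output above:

```shell
# RAID-5 usable capacity = (devices - 1) * per-device size.
# Per-device size taken from the mdstat output above, in KiB.
DEV_KIB=244195904
DEVICES=6
USABLE_KIB=$(( (DEVICES - 1) * DEV_KIB ))
echo "${USABLE_KIB} KiB usable"   # prints "1220979520 KiB usable"
# 1 TiB = 1073741824 KiB
awk -v k="$USABLE_KIB" 'BEGIN { printf "%.2f TiB\n", k / 1073741824 }'
```

That works out to about 1.14 TiB, which df -h rounds to the 1.2T shown above.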

This is one of the reasons why I love software RAID: while you push a kak load more I/O through the PCI(e) bus than you would with hardware RAID, you get the flexibility of the highest-end hardware controllers on a normal PC motherboard.

And of course, should things go pear-shaped, I don't need to find an identical controller; I just need a box with 6 SATA sockets.

HOWTO reshape

Let's say you have four SATA drives, /dev/sda to /dev/sdd, and you are adding a new one, /dev/sde.

Check that everything is happy:

$ cat /proc/mdstat
md0 : active raid5 sdd1[3] sdc1[1] sdb1[2] sda1[0]
      937705728 blocks level 5, 128k chunk, algorithm 2 [4/4] [UUUU]

Partition the new drive (clone sda's partition table onto sde):

# sfdisk -d /dev/sda | sfdisk /dev/sde

Add the new drive:

# mdadm -a /dev/md0 /dev/sde1

Grow the RAID:

# mdadm --grow /dev/md0 --raid-devices=5

Watch the progress:

$ watch cat /proc/mdstat
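If you want the progress in a more compact form, you can scrape the reshape line out of /proc/mdstat. This is a sketch of a helper of my own (mdstat_progress is not a standard tool); the field layout is taken from the sample output earlier in this post:

```shell
# Print the reshape percentage and ETA from /proc/mdstat (or from a
# file given as $1, handy for testing against captured output).
mdstat_progress() {
    awk '/reshape/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /%$/)       pct = $i
            if ($i ~ /^finish=/) { sub(/^finish=/, "", $i); eta = $i }
        }
        printf "reshape %s done, ETA %s\n", pct, eta
    }' "${1:-/proc/mdstat}"
}

# On a live system:
if [ -r /proc/mdstat ]; then mdstat_progress; fi
```

Run under watch it gives a one-line status instead of the full mdstat dump.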

Grow the filesystem (reiserfs here, which can be resized while mounted):

# resize_reiserfs /dev/md0

See the extra space:

$ df -h

Horrific performance with 3ware RAID

I’ve been enjoying our server at UK2.net. It’s a pretty speedy machine (although a little light on RAM; I suspect they don’t want people running Xen), and it’s connected to a fat pipe. But I’ve been experiencing a lot of bad lockups.

I traced the problem to postmapping the uceprotect.net RBL file. They recommend that you rsync this file from them and then postmap it into a fast lookup database for Postfix, rather than using their DNSBL service. But running the postmap took my box 40 minutes. The same operation on a loaded, lower-spec, two-year-old server took 2 minutes (and yes, that server also has RAID1 on the volume concerned). On my UK2 box, while the postmap was running, the machine became totally unresponsive: it could take a minute or two to log in, serve a web page, or even execute a basic command like ps.

Clearly something wasn’t right, and it was something in the I/O system. The only suspect was the 3ware RAID controller (an 8006-2, doing RAID1). I know these controllers have a big buffer, so I looked up tuning guidance on the 3ware website. I followed it to the letter, and things didn’t really improve. I tried the deadline scheduler and tweaked the buffers, but it only got marginally better.
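For reference, the scheduler switch is just a sysfs toggle; this is a config fragment, not a recipe, and sda here is a stand-in for whatever block device your controller exposes:

```shell
# Show the available I/O schedulers; the active one is in brackets.
cat /sys/block/sda/queue/scheduler
# Select the deadline elevator (needs root; takes effect immediately).
echo deadline > /sys/block/sda/queue/scheduler
```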

Personally, I’ve always used software RAID, even for RAID5, and I’ve never seen performance that bad. And having the RAID in a portable format has really helped with recovery in the past. I understand that Windows monkeys have to use hardware RAID (because their software RAID sucks so much), but is this kind of performance normal?

I’ve asked UK2 to chuck my controller and give me software RAID :-)

Update

I’ve now got software RAID 1, and postmap runs in 25 seconds. That’s what I call a 96x speed improvement :-)
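For the record, the arithmetic (40 minutes down to 25 seconds) works out like this:

```shell
# Speedup = old time / new time, in seconds (integer division is exact here).
BEFORE_S=$(( 40 * 60 ))   # 40 minutes of postmap on the 3ware controller
AFTER_S=25                # 25 seconds on software RAID 1
echo "$(( BEFORE_S / AFTER_S ))x faster"   # prints "96x faster"
```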

Oh, and the system is totally responsive while the postmap runs.
