I’ve been enjoying our server at UK2.net. It’s a pretty speedy machine (although a little light on RAM - I suspect that they don’t want people running Xen), and it’s connected to a fat pipe. But I’ve been experiencing a lot of bad lockups.
I traced the problem to running postmap on the uceprotect.net RBL file. They recommend that you rsync this file from them, and then postmap it into a fast lookup database for postfix, rather than using their DNSRBL service. But running the postmap was taking my box 40 mins. The same operation, on a loaded, lower-spec, 2-year-old server, took 2 mins (yes, this server also has RAID1 on the volume concerned). On my UK2 box, while the postmap was running, the machine became totally unresponsive, and it could take a minute or two to log in, serve a web page, or even execute a basic command like ps.
Clearly something wasn’t right. And it was something in the IO system. The only real suspect was the 3ware RAID controller (it’s an 8006-2, doing RAID-1). I know these controllers have a big buffer, so I looked up the tuning guidance on the 3ware website. I followed it to the letter, and things didn’t really improve. I tried the deadline scheduler, and tweaking the buffers, but it only got marginally better.
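For anyone wanting to check which I/O scheduler a disk is actually using before and after a change like that, here is a minimal sketch. It assumes a kernel that exposes the scheduler through sysfs, where the file lists the available schedulers and puts the active one in square brackets. The device name `sda` and both function names are just placeholders for illustration.

```python
def parse_scheduler(text):
    """Split the contents of a sysfs scheduler file into (active, available).

    The kernel writes something like "noop anticipatory deadline [cfq]",
    with the scheduler currently in use wrapped in square brackets.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device="sda"):
    """Read the scheduler for one block device ("sda" is an assumed name)."""
    with open(f"/sys/block/{device}/queue/scheduler") as handle:
        return parse_scheduler(handle.read())
```

On such kernels, switching is done by writing one of the listed names into the same file as root, so reading it back is a quick way to confirm the change really took.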
Personally, I’ve always used software RAID, even for RAID-5, and I’ve never had bad performance like that. And having the RAID in a portable format has really helped with recovery in the past. I understand that Windows monkeys have to use hardware RAID (because their software RAID sucks so much), but is this kind of performance normal?
I’ve asked UK2 to chuck my controller and give me software RAID :-)
I’ve now got software RAID 1, and postmap runs in 25 seconds. Down from 40 minutes, that’s what I call a nearly 100x speed improvement :-)
Oh, and the system is totally responsive while the postmap runs.
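If you want to put numbers on this sort of thing yourself, the following is a rough sketch rather than the method I actually used. It runs a long job and, while the job is running, repeatedly times a trivial probe command, so you get both the job's wall-clock time and the worst stall the probe saw. The function names and the pause value are made up for illustration.

```python
import subprocess
import time


def run_timed(argv):
    """Run argv to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def worst_probe_latency(job_argv, probe_argv, pause=1.0):
    """Run job_argv; while it is running, time probe_argv over and over.

    Returns (job_seconds, worst_probe_seconds). The job time is only
    noticed between probes, so it can overshoot by up to one probe
    plus one pause.
    """
    start = time.perf_counter()
    job = subprocess.Popen(job_argv)
    worst = 0.0
    while job.poll() is None:
        worst = max(worst, run_timed(probe_argv))
        time.sleep(pause)
    if job.returncode != 0:
        raise subprocess.CalledProcessError(job.returncode, job_argv)
    return time.perf_counter() - start, worst
```

For the case above, the job would be something like `["postmap", "hash:/etc/postfix/uceprotect"]` (the path is hypothetical) and the probe simply `["ps"]`. A healthy box should show a probe latency of a fraction of a second throughout.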
Comments
I'm seeing awful performance with a 3Ware 9550SX-8LP in a twin Opteron 250 Supermicro H8DA8 box with 4GB RAM that I'm testing as a server replacement.
I've upgraded to the 3ware 9.4.1.2 codeset firmware (FE9X 3.08.02.005) and utilities, and tried the various tweaks recommended in their KnowledgeBase (blockdev --setra 16384 etc). I'm running CentOS 4.5 (2.6.9-55.ELsmp), which has the correct driver (2.26.05.007) for this codeset built in. I'm seeing only about 70MB/s read / 40MB/s write, coupled with huge load averages and a lack of responsiveness to even simple commands like 'ls' whenever any intensive disk I/O is going on. I can cope with the relatively poor MB/s throughput - it's the cliff that system performance falls off that's the killer, as you also found.
This problem applies to both the RAID 1 and "Single Disk" configurations I've tried (JBOD is no longer an option - it seems drives not previously used as JBOD units on the pre-9xxx series can't be set up from fresh as JBOD on this controller).
I'm about to try swapping out the 4x 250GB Maxtor disks for WD ones instead - but I suspect something rather more fundamental is wrong here.
If I could configure the thing as a simple passthrough SATA controller, I'd be able to run mdadm/software RAID on the raw disks perhaps, but that's not an option - everything has to be mediated through 3ware these days.
My current last hope, if the WDs don't miraculously sort it out, is to give openSUSE 10.2 a go but right now I think I'm looking at a very expensive doorstop.
Where are the benchmarks for RAID 1, that's what I'd like to know - they're curiously absent from 3ware's own performance benchmarking whitepaper, which deals only with RAID 0 and RAID 5.
Yes, I feel your pain.
I don't have any benchmarks, but I'm just not happy with the RAID1 performance.
I think these cards are tweaked for massive streaming performance, not random I/O...
Who'd want to use RAID0, anyway? :-)
No improvement with either different disks or openSUSE 10.2.
My current Googling involves the keywords 'pdflush' and 'uninterruptible'.
A 30 day eval of RHEL AS 4 Update 5 is the next thing to try - then at least I might be able to file a relevant problem report on bugzilla.
S.
same issues
Hi,
I am having very similar issues with a 9000 series card running CentOS 5.2. Did any of you solve your issues? Did you find that turning off AutoVerify helped?