@henk53 - If you are using 8KiB request sizes from PostgreSQL then I would at least try 8KiB as a stripe size. I wouldn't use 4KiB, since that is half your request size: every request would get split across two drives, halving your subsystem's available IOPS, which is the opposite of what you want.
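(If you want to double-check what request size postgres is actually issuing, the block size it was compiled with can be read from any session; a quick sketch, assuming psql is on the box:)

    # block_size is reported in bytes; 8192 is the stock build default
    psql -c "SHOW block_size;"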

As for the LVM metadatasize rounding: yes, I'm aware of that and of the 192KiB default. It /should not/ increase to 320KiB as long as you stay UNDER 256 (e.g. 255KiB). Basically LVM earmarks 3 blocks of 64KiB each (192KiB) by default for its metadata before your data starts. If you set it to 256 you land in the next 64KiB block (remember this is a 0 offset), so LVM 'pads' it out to 320 (256+64) as the new starting point for your data. So 255 should also work. The main point, however, is to check the offset with pvs (pvs -o+pe_start): as long as the starting point you see there is aligned, it doesn't really matter how you got there.
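Something like this, for example (a sketch; /dev/sdb is just a placeholder for your array device):

    # 255KiB stays under the 256KiB boundary, so pe_start should round up to 256KiB
    pvcreate --metadatasize 255k /dev/sdb
    # confirm where the data actually starts
    pvs -o +pe_start /dev/sdb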

Now the question (which I have NOT benchmarked myself) is whether this should be set to a multiple of your STRIPE size or a multiple of your data STRIPE WIDTH (i.e. assuming 5 drives in RAID-5 with a 128K stripe size, your stripe width for the array would be 128*4, minus 1 drive for parity, so 512KiB). Using your data stripe width for the LVM alignment /should/ increase performance, since you would have less stripe-width fragmentation and cut down on the read/modify/write penalties per sub-array. Remember to set your metadatasize to a stripe width boundary, not to your stripe size. It would be very interesting if you could test that; see the sketch below.
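If you do test it, the alignment for that example would look roughly like this (again /dev/sdb is a placeholder, and the geometry is the assumed 5-drive RAID-5 with 128KiB stripes):

    # stripe width = stripe size * data drives = 128KiB * 4 = 512KiB
    # stay just under the 512KiB boundary so pe_start rounds up to 512KiB
    pvcreate --metadatasize 511k /dev/sdb
    pvs -o +pe_start /dev/sdb

Newer LVM releases also have pvcreate --dataalignment, which sets the data offset directly (e.g. --dataalignment 512k) instead of getting there via metadatasize rounding.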

The numbers you get from your RAM tests are very good and strongly point to the RAID card being the bottleneck. As for why you don't see scaling with multiple cards, I think it comes down to how your I/O is being distributed by the lower layers (array & LVM striping), OR the Areca driver. Another item to check is how your interrupts are distributed across your CPUs for the cards: do you have each card going to a different CPU & core?
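To check that, roughly (a sketch; arcmsr is the Areca driver name, and the IRQ number and CPU mask below are made-up examples):

    # see which IRQs the cards are on and which CPUs have been servicing them
    grep -i arcmsr /proc/interrupts
    # pin one card's IRQ to CPU1 (mask is hex: 0x2 = CPU1, 0x4 = CPU2, ...)
    echo 2 > /proc/irq/24/smp_affinity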

As for queue depth, I have not found anything on the ARC16xx cards that sets it per drive; however, you can set it per array (/sys/block/<device>/queue/nr_requests), and the max is 256 for these cards. While you're in there, you may want to set your scheduler to deadline or noop if you haven't done so already (/sys/block/<device>/queue/scheduler).
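For example (sdb standing in for whatever your array device is):

    # bump the per-array queue depth to the card's max
    echo 256 > /sys/block/sdb/queue/nr_requests
    # the active elevator shows in [brackets]; switch it to deadline
    cat /sys/block/sdb/queue/scheduler
    echo deadline > /sys/block/sdb/queue/scheduler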