Looking for Linux file system performance tuning tips for LSI 9271-8i + 8 SSDs RAID0



RJ1
01-28-2013, 03:11 PM
System info: Supermicro X9SAE-V motherboard, Intel Xeon E3-1275V2 CPU, 32GB 1600MHz ECC memory, LSI 9271-8i at PCIe v3 x8 (with FastPath), 8 OCZ Vertex 3 120GB SSDs, RAID0

Raw disk performance as measured by Gnome Disk Utility is very good: >= 4.0 GB/s sustained read/write.

The LSI config that seems to work best for raw disk I/O: 256KB stripe, no read-ahead, always write-back, Direct IO.
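For reference, these controller properties can also be set from the command line with MegaCli. This is just a sketch; the exact property spellings and the "all logical drives / all adapters" targets are assumptions to check against your MegaCli version:

MegaCli -LDSetProp NORA -LAll -aAll            # no read-ahead
MegaCli -LDSetProp WB -LAll -aAll              # write-back
MegaCli -LDSetProp CachedBadBBU -LAll -aAll    # keep write-back even without a healthy BBU ("always write-back")
MegaCli -LDSetProp Direct -LAll -aAll          # Direct IO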

So far, I have tried a few different stripe sizes (64K, 128K, 256K) with the XFS file system, using what I believe are the optimal XFS stripe unit and width:

sunit = stripe size / 512 (mkfs.xfs usually warns that the specified stripe unit is not the same as the volume stripe unit of 8)
swidth = sunit * #drives
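
For the 256KB stripe and 8 drives used here, the numbers work out as follows (shell arithmetic; units are 512-byte sectors):

stripe_bytes=$(( 256 * 1024 ))
sunit=$(( stripe_bytes / 512 ))    # 512 sectors
swidth=$(( sunit * 8 ))            # 4096 sectors
echo "sunit=$sunit swidth=$swidth"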

Linux blockdev read-ahead is the main system parameter I have varied; a value of 16384 seems to work well. For FastPath + SSDs, LSI recommends setting the controller read-ahead policy to no read-ahead. I have tried it both ways and it does not seem to make a difference. The Linux blockdev read-ahead, however, makes a big difference.
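For reference, this is how I set and verify the blockdev read-ahead (blockdev counts in 512-byte sectors, so 16384 = 8MB; /dev/sda is the RAID volume on my system):

blockdev --setra 16384 /dev/sda    # set read-ahead to 16384 sectors (8MB)
blockdev --getra /dev/sda          # verify the current value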

I have mainly used iozone as my file system performance benchmark as it has worked well for me in the past.

I am looking for suggestions for optimizing Linux file system performance, preferably from someone who has worked with a similar setup. Right now, my sequential read performance is decent, approaching 4 GB/s, but my write performance seems to hit a ceiling of about 2.5 GB/s whether I use a RAID0 of 5 SSDs or 8 SSDs.

I have seen some people recommend ext4 instead of XFS. I also know there are a lot of Linux system variables that may be applicable. My goal is to get as close as possible to 4 GB/s sequential read/write, and I don't have a lot of time right now to try every possible configuration.

CrazyNutz
01-28-2013, 04:46 PM
Your bottleneck is probably the journal. When you write to an XFS file system, metadata updates are written to a journal ahead of the data being written.

You can try some mount options that affect the journal (I think there are different modes).

You can also put the journal on a different device, and I think you may even be able to disable it altogether (if you're not worried about data loss when the power fails; that should not be a problem if you have a good UPS).
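For the separate-device route with XFS, something like this sketch should work (/dev/sdb1 is just a hypothetical log device; note the logdev mount option has to match what mkfs was given):

mkfs.xfs -l logdev=/dev/sdb1,size=128m /dev/sda
mount -o logdev=/dev/sdb1 /dev/sda /DataRAID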

Darakian
01-28-2013, 08:00 PM
Make sure you're on the latest kernel. XFS has gone through some major updates in the last few years.

Kain665
01-28-2013, 08:15 PM
At the block device level, make sure you are using the noop I/O scheduler (the default is CFQ).
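To see which scheduler is active (the one in brackets) and switch it, assuming the array is /dev/sda:

cat /sys/block/sda/queue/scheduler         # e.g. "noop anticipatory deadline [cfq]"
echo noop > /sys/block/sda/queue/scheduler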

RJ1
01-29-2013, 01:07 PM
I am currently using the CentOS 6.3 Linux distribution. The kernel is shown as 2.6.32-279.19.1.el6.x86_64.

I can try disabling the XFS journal to see how much that affects things.

How many of the suggestions in the following two LSI support pages apply to my application?

http://mycusthelp.info/LSI/_cs/AnswerDetail.aspx?inc=8196

http://mycusthelp.info/LSI/_cs/AnswerDetail.aspx?sSessionID=3A06A11BAB2F4084BBCE86439F82FE5DKUUCCXBS&inc=8273

RJ1
01-30-2013, 05:32 AM
Here is an interesting video of David Chinner discussing recent improvements to XFS:

http://www.youtube.com/watch?v=FegjLbCnoBw

Here is an article with related discussion including comments from Chinner:

http://lwn.net/Articles/476263/

In both the video and the article comments, Chinner states that all of the XFS improvements were incorporated into RHEL 6.2, so CentOS 6.3 should also have all of the enhancements.

I tried all of the suggestions from the LSI support pages, but I only saw a very minor improvement.

I have not found a way to disable journaling altogether in XFS to see how that affects performance. If anyone knows how to do that, please let me know.

I am going to try ext4 and see how it compares to XFS for my application.

RJ1
02-01-2013, 09:10 AM
I cannot figure out how to upload files as attachments. Is this feature broken?

Here is a script I have been using to apply various Linux settings based on the LSI support pages:

-----------------------------------------------------------------------------------------------

#!/bin/bash

# mark the device as non-rotational (SSD) so the kernel skips seek-optimization heuristics
echo "0" > /sys/block/sda/queue/rotational

# set I/O scheduler to noop
echo "noop" > /sys/block/sda/queue/scheduler

# set block layer queue depth
echo "975" > /sys/block/sda/queue/nr_requests

# set device layer queue depth
echo "975" > /sys/block/sda/device/queue_depth

# disable the drm_kms_helper output-polling kernel thread (per the LSI tuning notes)
echo "N" > /sys/module/drm_kms_helper/parameters/poll

# pin each MegaRAID MSI-X interrupt to its own CPU core;
# smp_affinity takes a hex CPU bitmask, so the values below are 0x01..0x80
irqs=`cat /proc/interrupts |grep mega|awk -F ":" '{ print $1}'`
mask[0]=1
mask[1]=2
mask[2]=4
mask[3]=8
mask[4]=10
mask[5]=20
mask[6]=40
mask[7]=80
i=0
for irq in $irqs; do
# echo "irq=$irq, i=$i, mask=${mask[$i]}";
echo ${mask[$i]} > /proc/irq/$irq/smp_affinity
let i++;
done

# rq_affinity=0: do not redirect request completions to the submitting CPU
# (keep them on the cores pinned above)
echo "0" > /sys/block/sda/queue/rq_affinity

RJ1
02-01-2013, 10:09 AM
Here is the Gnome Disk Utility Benchmark results for a 256K stripe 8 SSD RAID0 configuration:

http://i1298.photobucket.com/albums/ag44/rjimg/Bench1_zps3a163256.png

RJ1
02-01-2013, 10:12 AM
Here is an example of how I have tried to create an XFS file system for a 256K stripe 8 SSD RAID0:

mkfs.xfs -f -L DataRAID \
-b size=4096 \
-d sunit=512,swidth=4096 \
-l sunit=512,lazy-count=1,version=2,size=128m \
-i attr=2 /dev/sda
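
Equivalently, mkfs.xfs accepts the geometry as a byte-size stripe unit and a stripe-unit multiplier via su/sw, which is less error-prone than counting 512-byte sectors:

mkfs.xfs -f -L DataRAID -d su=256k,sw=8 /dev/sda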


Here is the fstab entry I have tried to mount this XFS file system:

LABEL=DataRAID /DataRAID xfs rw,noatime,nodiratime,nodev,nosuid,logbsize=256k,logbufs=8,inode64,nobarrier,allocsize=64m 0 0


Here are some iozone results for this XFS file system with the LSI tweaks mentioned in a previous post. Write performance is still only around 2.6 GB/s; my goal is to get as close to 4 GB/s as possible.

------------------------------------------------------------------------------------------------

Iozone: Performance Test of File I/O
Version $Revision: 3.394 $
Compiled for 64 bit mode.
Build: linux

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
Al Slater, Scott Rhine, Mike Wisner, Ken Goss
Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer.
Ben England.

Run began: Thu Jan 31 15:27:01 2013

Record Size 1024 KB
Record Size 2048 KB
Record Size 4096 KB
Record Size 8192 KB
Record Size 16384 KB
File size set to 100663296 KB
Command line used: /usr/bin/iozone -i 0 -i 1 -i 2 -r 1m -r 2m -r 4m -r 8m -r 16m -s 96g -f /DataRAID/testfile
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
        KB  reclen    write  rewrite     read   reread  random read  random write
 100663296    1024  2634183  2877187  3985985  4002471       948822       3050027
 100663296    2048  2677945  2966218  3981357  4009041      1491174       3022288
 100663296    4096  2652620  3015210  3849320  3880013      1994893       2982310
 100663296    8192  2467574  2776431  3587221  3602532      2509243       2801943
 100663296   16384  2290530  2666110  3463176  3471063      2795685       2694036

iozone test complete.

RJ1
02-01-2013, 10:20 AM
Here is an example of how I have tried to create an EXT4 file system for a 256K stripe 8 SSD RAID0:

mkfs.ext4 -b 4096 -E stride=64,stripe-width=512 -O ^has_journal -L DataRAID /dev/sda
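
For reference, the math behind those two values (256KB stripe, 4KB ext4 blocks, 8 drives):

stride=$(( 256 / 4 ))            # stripe size / block size = 64 blocks
stripe_width=$(( stride * 8 ))   # stride * number of data drives = 512 blocks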


Here is the fstab entry I have tried to mount this EXT4 file system:

LABEL=DataRAID /DataRAID ext4 rw,noatime,nodiratime,nodev,nosuid,nobh,data=writeback,barrier=0 0 0


Here are some iozone results for this EXT4 file system with the LSI tweaks mentioned in a previous post. The XFS results look better than these EXT4 results.


------------------------------------------------------------------------------------------------

Iozone: Performance Test of File I/O
Version $Revision: 3.394 $
Compiled for 64 bit mode.
Build: linux

Run began: Thu Jan 31 11:23:57 2013

Record Size 1024 KB
Record Size 2048 KB
Record Size 4096 KB
Record Size 8192 KB
Record Size 16384 KB
File size set to 100663296 KB
Command line used: /usr/bin/iozone -i 0 -i 1 -i 2 -r 1m -r 2m -r 4m -r 8m -r 16m -s 96g -f /DataRAID/testfile
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
        KB  reclen    write  rewrite     read   reread  random read  random write
 100663296    1024  2088831  2296893  3981856  3992165       958220       2305406
 100663296    2048  2109006  2276364  3961707  3995606      1488399       2263700
 100663296    4096  2035761  2237523  3836874  3854241      1992631       2196442
 100663296    8192  1962799  2116072  3514392  3540483      2507479       2132013
 100663296   16384  2046374  2212523  3425371  3445564      2781162       2232889

iozone test complete.

RJ1
02-01-2013, 08:23 PM
Here is something I found interesting. I reran the XFS iozone benchmark with the extra iozone flag "-I" for Direct I/O. Sequential write and random read/write improved, while sequential reads were not as good at the smaller record sizes.
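As a cross-check outside iozone, the buffered-vs-direct difference can also be seen with dd (a sketch; the file path and size are just examples):

dd if=/dev/zero of=/DataRAID/ddtest bs=1M count=8192 conv=fdatasync    # buffered write, flushed at the end
dd if=/dev/zero of=/DataRAID/ddtest bs=1M count=8192 oflag=direct      # O_DIRECT write, bypasses the page cache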

Here are the iozone results:


Iozone: Performance Test of File I/O
Version $Revision: 3.394 $
Compiled for 64 bit mode.
Build: linux

Run began: Fri Feb 1 13:30:57 2013

O_DIRECT feature enabled
Record Size 1024 KB
Record Size 2048 KB
Record Size 4096 KB
Record Size 8192 KB
Record Size 16384 KB
File size set to 100663296 KB
Command line used: /usr/bin/iozone -I -i 0 -i 1 -i 2 -r 1m -r 2m -r 4m -r 8m -r 16m -s 96g -f /DataRAID/testfile
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
        KB  reclen    write  rewrite     read   reread  random read  random write
 100663296    1024  2891087  3034149  1347364  1350111      1184200       2972416
 100663296    2048  3285530  3356490  2581875  2593956      2319664       3260048
 100663296    4096  3509246  3506871  2868134  2879221      2902174       3482614
 100663296    8192  3455400  3484193  3261035  3274733      3243127       3444259
 100663296   16384  3438511  3496536  3528632  3548018      3508518       3444243

iozone test complete.