
Thread: Not getting more than 65000 IOPS with xtreme setup

    @FEAR. It's hard to give a step-by-step for all situations; as you've probably seen even in this thread, there are a lot of variables and they affect how a system is set up. As a high-level, simple process, something like this for Linux:

    • Create an array on each areca controller
      Simple method here would be to have areca arrays of the same size
    • Your OS should see XX physical disks
      where XX is the number of controllers in your system (max 4 for Areca). You can do multiple arrays on a single controller, but I won't go into that; it's the same basic principle though.
    • Under linux you have two basic options
      • mdadm (the Linux md "multiple devices" software RAID driver)
        • don't create any partition tables on the areca arrays (ie. /dev/sda only, no /dev/sda1 or whatever)
        • mdadm --create --help (displays the create help menu)
        • mdadm --create /dev/md0 --level=0 --raid-devices=XX --chunk=XX </dev/arecaraid0> </dev/arecaraid1> </dev/arecaraidX>
          set your chunk size to a multiple of your Areca data stripe width (array size minus parity disks, times your stripe size) if possible (best write performance), or at least equal to your Areca stripe size.
        • put your filesystem of choice on your new meta-device. EXT3 caps at 8TiB (forget what others on the net have said; if you're going beyond this, or think you may, don't use EXT3, use JFS or XFS). With XFS, be careful if you have any type of power outage or system freeze, as you have a higher chance of losing data.
        • mount and use your filesystem
        • If you set your chunk size to be your data stripe width, you will have some pain growing your filesystem if you add drives to the Areca disk (your data stripe width would change, and you can't change that at the mdadm level without a reformat). However, you could change the size of your drives (i.e., 500GB to 1TB, or 1TB to 2TB: keep the same number of drives but change their sizes, in which case your data stripe width remains constant). If you use a chunk size equal to your Areca stripe size, you can grow the array more easily but you lose some write performance (assuming parity RAIDs here). To grow you would:
          • add/replace drives to your areca controller
          • If you added more drives, expand your raidset
          • then modify your volumeset; you have the option to change the volumeset size, which keeps your current data in place but makes the 'disk' larger
          • reboot so that linux sees the new areca volume sizes
          • mdadm --grow /dev/md0 --size=max (grows your metadisk array at the OS level)
          • Then grow the filesystem (for JFS: mount -o remount,resize </mntpoint>; for XFS: xfs_growfs </mntpoint>). You do this with the filesystem mounted.
          • you're done
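
      The mdadm path above can be sketched as a short script. The drive counts, stripe size, and device names below are hypothetical placeholders for illustration, not taken from anyone's actual setup; adjust them to your hardware.

      ```shell
      #!/bin/sh
      # Hypothetical example: two Areca RAID-6 volumes, 10 drives each,
      # 64KiB controller stripe size.
      DRIVES=10        # drives per areca array
      PARITY=2         # RAID-6 burns 2 drives for parity
      STRIPE_KIB=64    # areca stripe size in KiB

      # data stripe width = (drives - parity) * stripe size
      CHUNK_KIB=$(( (DRIVES - PARITY) * STRIPE_KIB ))
      echo "mdadm chunk size: ${CHUNK_KIB}KiB"

      # Then stripe the raw areca volumes (no partition tables) into one
      # RAID-0 metadisk and put a filesystem on it (placeholder devices):
      # mdadm --create /dev/md0 --level=0 --raid-devices=2 \
      #       --chunk=${CHUNK_KIB} /dev/sda /dev/sdb
      # mkfs.xfs /dev/md0
      # mount /dev/md0 /mnt/bigarray
      ```

      With these numbers the chunk works out to 8 data drives * 64KiB = 512KiB, matching the data stripe width as recommended above.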
      • LVM (Logical volume manager)
        • don't create any partition tables on the areca arrays (ie. /dev/sda only, no /dev/sda1 or whatever)
        • calculate your starting offset. By default LVM uses 192KiB for metadata at the beginning of each physical volume (your Areca arrays here). You want to find the smallest multiple of your Areca data stripe width (data drives times stripe size) that is greater than or equal to 192KiB (the LVM metadata). For example, a RAID-6 array of 10 drives with a 64KiB stripe size has 10-2 = 8 data drives; 8*64 = 512KiB, which is > 192KiB, so you pad your LVM starting point to 512KiB.
        • pvcreate --metadatasize 511K --metadatacopies=2 </dev/arecaraid0> </dev/arecaraid1> </dev/arecaraidX>
        • pvscan -v (shows volumes that were added)
        • pvs -o+pe_start (shows the starting offset of the volumes; this should be 512KiB in the example above. If not, re-create with a different offset. LVM 'pads' to a 64KiB boundary, so it's a little fuzzy.)
        • vgcreate --physicalextentsize X <volumegroupname> </dev/arecaraid0> </dev/arecaraid1> </dev/arecaraidX> (pick an extent size that's reasonable, anticipating future growth. You're not limited to 65535 extents with LVM2; the default size is 4MiB. I set mine to 1GiB as I plan to grow to 100TiB or so with the current array, and will probably move to 4GiB when I rebuild at that point.)
        • lvcreate --stripes X --stripesize X --extents X --name <lvname> <vgname> </dev/arecaraid0> </dev/arecaraid1> </dev/arecaraidX> (--stripes here is the number of physical volumes to stripe across (RAID cards / Areca arrays); --stripesize needs to be 2^n and ideally should be the data stripe width of your Areca array, assuming it's 2^n, otherwise your base array stripe size; --extents is how big you want your logical volume to be. You list the Areca physical volumes to tell LVM which underlying devices it should pull extents from for your stripe; this matters more if, for example, you don't want to stripe across all physical disks, but it's good practice to state explicitly what you want to happen.)
        • put your filesystem of choice on your new meta-device. EXT3 caps at 8TiB (forget what others on the net have said; if you're going beyond this, or think you may, don't use EXT3, use JFS or XFS). With XFS, be careful if you have any type of power outage or system freeze, as you have a higher chance of losing data.
        • mount and use your filesystem
        • Same caveat as the mdadm case: if you set your LVM stripe size to be your data stripe width, you will have some pain growing your filesystem if you add drives to the Areca disk (your data stripe width would change, and you can't change the stripe size at the LVM level without recreating the volume). However, you could change the size of your drives (i.e., 500GB to 1TB, or 1TB to 2TB: keep the same number of drives but change their sizes, in which case your data stripe width remains constant). If you use a stripe size equal to your Areca stripe size, you can grow the array more easily but you lose some write performance (assuming parity RAIDs here). To grow you would:
          • add/replace drives to your areca controller
          • If you added more drives, expand your raidset
          • then modify your volumeset; you have the option to change the volumeset size, which keeps your current data in place but makes the 'disk' larger
          • reboot so that linux sees the new areca volume sizes
          • pvresize -v -d </dev/arecaraidX> (do this for each array that you expanded on the Arecas; this tells LVM to re-check the physical disk)
          • pvscan -v
          • vgdisplay <vgname>
          • lvresize --extents +X </dev/logicalvolume> (this expands your logical volume by X extents; note the '+', as without it --extents sets the absolute size instead)
          • Then grow the filesystem (for JFS: mount -o remount,resize </mntpoint>; for XFS: xfs_growfs </mntpoint>). You do this with the filesystem mounted.
          • you're done
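
      The LVM offset calculation above can be sketched the same way. Again, the drive counts, stripe size, volume group, and device names are hypothetical placeholders:

      ```shell
      #!/bin/sh
      # Hypothetical example: find the smallest multiple of the data
      # stripe width that covers LVM's 192KiB default metadata area.
      DRIVES=10
      PARITY=2
      STRIPE_KIB=64
      LVM_META_KIB=192

      WIDTH_KIB=$(( (DRIVES - PARITY) * STRIPE_KIB ))
      OFFSET_KIB=$WIDTH_KIB
      while [ "$OFFSET_KIB" -lt "$LVM_META_KIB" ]; do
          OFFSET_KIB=$(( OFFSET_KIB + WIDTH_KIB ))
      done
      echo "pad LVM data start to: ${OFFSET_KIB}KiB"

      # Asking pvcreate for one KiB less lets LVM pad pe_start up to the
      # target boundary, as in the 511K example above (placeholder devices):
      # pvcreate --metadatasize $(( OFFSET_KIB - 1 ))K --metadatacopies=2 /dev/sda /dev/sdb
      # vgcreate --physicalextentsize 1G bigvg /dev/sda /dev/sdb
      # lvcreate --stripes 2 --stripesize ${WIDTH_KIB} --extents 100%FREE \
      #          --name biglv bigvg /dev/sda /dev/sdb
      ```

      For the 10-drive RAID-6 / 64KiB example this lands the data start at 512KiB, which is what pvs -o+pe_start should confirm.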


    Ok, a little long-winded I guess, but that's an overview.
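
    One last worked number for the grow step: since lvresize --extents +X works in extents, you have to convert the space you're adding into extent counts. A quick sketch, with the grow amount and 1GiB extent size as hypothetical values:

    ```shell
    #!/bin/sh
    # Hypothetical example: grow a logical volume by 4TiB when the
    # volume group was created with 1GiB physical extents.
    GROW_TIB=4
    EXTENT_MIB=1024   # 1GiB extents, as in the vgcreate example

    # TiB -> MiB -> extents
    ADD_EXTENTS=$(( GROW_TIB * 1024 * 1024 / EXTENT_MIB ))
    echo "lvresize --extents +${ADD_EXTENTS} /dev/bigvg/biglv"
    # then grow the filesystem while mounted, e.g.:
    # xfs_growfs /mnt/bigarray
    ```

    So 4TiB at 1GiB per extent is +4096 extents; with the default 4MiB extents the same grow would be +1048576 extents, which is why picking a larger extent size up front keeps the numbers manageable.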
    Last edited by stevecs; 04-20-2009 at 04:11 AM.

