
Originally Posted by stevecs
The board I've got my eyes on now is the Supermicro X8DAH+ as it has two Tylersburg-36D chipsets on it (only one I've found so far with two). For I/O it's going to be a killer if they don't castrate something else.
It sounds interesting, thanks for the tip.
Originally Posted by stevecs
You don't really need more than 8x PCIe v1 speeds as of yet, as the cards that are available can't handle more than that anyway (IOP34x are maxed out).
Indeed, which is something else I don't understand. There don't seem to be any faster RAID cards coming any time soon; at least, no announcements have been made by Intel, Areca, or Adaptec. Clearly many of us are hitting a wall with the speed offered by the current generation of I/O processors, but there doesn't seem to be any improvement coming in the short term.
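For context, here is the back-of-the-envelope arithmetic behind that remark (the per-lane figure is the standard PCIe 1.x number; that the IOP34x-based cards top out below this ceiling is stevecs's point, not something measured here):
Code:
# PCIe 1.x: 2.5 GT/s per lane with 8b/10b encoding ~= 250 MB/s usable, per direction.
lane_mb_s = 250
lanes = 8
print(f"x8 PCIe 1.x ceiling: ~{lane_mb_s * lanes} MB/s per direction")  # ~2000 MB/s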
Originally Posted by stevecs
Here at the datacenter we don't have any SSDs really deployed to any of the clients; the largest bank of drives we have is about 500-600 (not including SANs).
Wow, that's something nice indeed to play with :P
I did have the opportunity to run some more tests today. I started with tests at an 8KiB request size. The numbers turned out to be significantly lower. Just to be sure that nothing else had changed in the system (as I mentioned, my co-worker did some tests too), I re-ran all the 4KiB tests, and for every number of threads (queue depth) they reported the same results as before. To be really sure I then re-ran all the 8KiB tests as well (taking about an hour), but these gave exactly the same results as the earlier run.
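In case anyone wants to reproduce something similar, below is a minimal sketch of the kind of test being run: a fixed number of threads issuing random reads of a fixed request size against the raw array device. This is not the harness used for the numbers in this thread; the device path, request size, thread count and duration are placeholders, and Python's threading overhead means it won't drive anywhere near the IOPS shown below, but it illustrates the shape of the workload.
Code:
#!/usr/bin/env python3
# Minimal threaded random-read sketch (Linux only; run against a raw block device as root).
# Everything here is a placeholder: /dev/md0, the 8 KiB request size, 16 threads, 30 seconds.
import mmap, os, random, threading, time

DEVICE   = "/dev/md0"     # hypothetical device path
REQ_SIZE = 8 * 1024       # request size in bytes
THREADS  = 16             # roughly comparable to the queue depth column in the results
SECONDS  = 30

fd0 = os.open(DEVICE, os.O_RDONLY)
dev_bytes = os.lseek(fd0, 0, os.SEEK_END)   # size of the block device
os.close(fd0)

ops = [0] * THREADS
stop = threading.Event()

def worker(idx):
    fd = os.open(DEVICE, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache
    buf = mmap.mmap(-1, REQ_SIZE)                     # page-aligned buffer, required for O_DIRECT
    max_block = (dev_bytes - REQ_SIZE) // REQ_SIZE
    while not stop.is_set():
        offset = random.randrange(max_block) * REQ_SIZE   # aligned random offset
        os.preadv(fd, [buf], offset)
        ops[idx] += 1
    os.close(fd)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(THREADS)]
start = time.time()
for t in threads:
    t.start()
time.sleep(SECONDS)
stop.set()
for t in threads:
    t.join()
elapsed = time.time() - start
print(f"{sum(ops) / elapsed:.0f} IOPS at {REQ_SIZE} B with {THREADS} threads")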
Here they are. This is again for 8 disks per RAID controller, 2 controllers, LVM striped, 8KiB array stripe size, NOOP scheduler, averaged over 10 passes, and this time an 8KiB request size. I specifically checked that none of the passes were out of range, and none were. Every pass reported very nearly the same number of IOPS.
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET Average 0 4 171798691840 20971520 1020.919 168.278 20541.81 0.0000 0.01 read 8192
TARGET Average 0 8 171798691840 20971520 702.962 244.393 29833.08 0.0000 0.03 read 8192
TARGET Average 0 16 171798691840 20971520 594.507 288.977 35275.51 0.0000 0.08 read 8192
TARGET Average 0 32 171798691840 20971520 560.851 306.318 37392.34 0.0000 0.18 read 8192
TARGET Average 0 64 171798691840 20971520 548.917 312.978 38205.30 0.0000 0.38 read 8192
TARGET Average 0 128 171798691840 20971520 545.725 314.808 38428.76 0.0000 0.82 read 8192
TARGET Average 0 256 171798691840 20971520 545.344 315.028 38455.59 0.0000 2.29 read 8192
As can be seen, in this case we already almost max out at 16 threads. Going beyond that only marginally increases the IOPS.
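As a quick sanity check on the table above (nothing new measured, just arithmetic): the Rate column is simply the IOPS multiplied by the 8KiB request size, expressed in MB/s.
Code:
# Cross-check two rows of the 8 KiB table: Rate (MB/s) ~= IOPS * request size.
req_size = 8192  # bytes
for iops, reported_rate in [(20541.81, 168.278), (38455.59, 315.028)]:
    print(f"{iops:.0f} * {req_size} B = {iops * req_size / 1e6:.3f} MB/s (table says {reported_rate})")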
For completeness I also tested with a 2KiB request size, although that size isn't really relevant for my live load:
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET Average 0 4 171798691840 83886080 2856.671 60.139 29364.98 0.0000 0.01 read 2048
TARGET Average 0 8 171798691840 83886080 1813.242 94.747 46263.03 0.0000 0.05 read 2048
TARGET Average 0 16 171798691840 83886080 1395.675 123.094 60104.29 0.0000 0.16 read 2048
TARGET Average 0 32 171798691840 83886080 1258.457 136.515 66657.87 0.0000 0.40 read 2048
TARGET Average 0 64 171798691840 83886080 1216.341 141.242 68965.91 0.0000 0.90 read 2048
TARGET Average 0 128 171798691840 83886080 1205.534 142.508 69584.15 0.0000 1.91 read 2048
TARGET Average 0 256 171798691840 83886080 1217.721 141.082 68887.78 0.0000 3.90 read 2048
In this case the IOPS keep increasing up to 128 threads (though the gains beyond 32 are small), and decrease very slightly at 256. With 128 threads I did notice an awkward asymmetry in the numbers reported by iostat:
Code:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.13 0.00 0.00 8.00 0.00 4.00 2.00 0.03
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.13 0.00 0.00 8.00 0.00 4.00 2.00 0.03
sdb 0.00 0.00 35684.80 0.00 68.09 0.00 3.91 66.00 1.85 0.03 100.00
sdc 0.00 0.00 35833.27 0.00 68.39 0.00 3.91 4.30 0.12 0.03 99.07
dm-0 0.00 0.00 0.00 0.13 0.00 0.00 8.00 0.00 4.00 2.00 0.03
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 73157.07 0.00 139.68 0.00 3.91 0.00 0.00 0.00 0.00
avgqu-sz and await for sdb are way more than 10 times higher than for sdc. With 256 threads I saw the same thing. With 64 threads the difference was there too, but a little smaller (almost exactly a factor of 10), as shown below; a quick check of these numbers follows after the listing:
Code:
avg-cpu: %user %nice %system %iowait %steal %idle
0.68 0.00 17.21 69.68 0.00 12.42
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 2.67 1.33 0.03
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 2.67 1.33 0.03
sdb 0.00 0.00 35177.27 0.00 67.12 0.00 3.91 47.15 1.34 0.03 100.00
sdc 0.00 0.00 35127.20 0.00 67.04 0.00 3.91 4.17 0.12 0.03 98.75
dm-0 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 2.67 1.33 0.03
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 71920.27 0.00 137.31 0.00 3.91 0.00 0.00 0.00 0.00
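A quick way to make sense of those iostat columns (this is just Little's law applied to the 64-thread sample above; no new data): avgqu-sz should be roughly r/s times await expressed in seconds. Both controllers complete about the same number of reads per second; requests to sdb simply spend about ten times longer queued.
Code:
# Little's law check on the 64-thread iostat sample: avgqu-sz ~= r/s * (await / 1000).
samples = [("sdb", 35177.27, 1.34, 47.15),
           ("sdc", 35127.20, 0.12, 4.17)]
for dev, r_per_s, await_ms, avgqu_sz in samples:
    print(f"{dev}: {r_per_s:.0f} r/s * {await_ms} ms = {r_per_s * await_ms / 1000:.1f} (iostat: {avgqu_sz})")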
During the 64-thread run, vmstat showed this:
Code:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
3 40 1572 12645792 52 15683252 0 0 9715 1252 20 17 0 3 85 12
7 63 1572 12645776 52 15683252 0 0 138445 0 50310 276128 1 17 12 70
4 46 1572 12645776 52 15683252 0 0 138288 0 50262 276661 0 17 13 69
1 53 1572 12645768 52 15683252 0 0 138387 0 50337 274764 1 17 13 69
2 40 1572 12645768 52 15683252 0 0 135370 0 49442 268276 2 17 13 68
2 55 1572 12645760 52 15683252 0 0 138110 0 50192 275198 1 18 11 71
3 45 1572 12645760 52 15683252 0 0 138575 0 50218 275155 1 17 12 70
5 53 1572 12645760 52 15683252 0 0 135070 1 49743 264514 1 17 14 69
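For what it's worth, the vmstat output lines up with iostat, assuming vmstat's bi column is in 1 KiB blocks per second (the usual procps behaviour): around 138,000 blocks/s works out to roughly 140 MB/s, in the same ballpark as the ~137 MB/s iostat reports for md0, with the box sustaining about 275k context switches per second during the run.
Code:
# Rough cross-check of vmstat against iostat for the 64-thread run (bi assumed to be 1 KiB blocks/s).
bi = 138445
print(f"~{bi * 1024 / 1e6:.0f} MB/s read via vmstat, vs ~137 MB/s for md0 via iostat")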