
Originally Posted by stevecs
The board I've got my eyes on now is the Supermicro X8DAH+ as it has two Tylersburg-36D chipsets on it (only one I've found so far with two). For I/O it's going to be a killer if they don't castrate something else.
It sounds interesting, thanks for the tip.
Originally Posted by stevecs
You don't really need more than 8x PCIe v1 speeds as of yet, as the cards that are available can't handle more than that anyway (IOP34x are maxed out).
Indeed, which is something else I don't understand. There don't seem to be any faster RAID cards coming any time soon; at least, no announcements have been made by Intel, Areca, or Adaptec. Clearly many of us are hitting a wall with the speed offered by the current generation of I/O processors, but there doesn't seem to be any improvement coming in the short term.
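For context, here is the back-of-the-envelope arithmetic behind that remark (the per-lane figure is the standard PCIe 1.x number; that the IOP34x-based cards top out below this ceiling is stevecs's point, not something measured here):
Code:
# PCIe 1.x: 2.5 GT/s per lane with 8b/10b encoding ~= 250 MB/s usable, per direction.
lane_mb_s = 250
lanes = 8
print(f"x8 PCIe 1.x ceiling: ~{lane_mb_s * lanes} MB/s per direction")  # ~2000 MB/s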
Originally Posted by stevecs
Here at the datacenter we don't have any SSDs really deployed to any of the clients; the largest bank of drives we have is about 500-600 (not including SANs).
Wow, that's something nice indeed to play with :P
I did have the opportunity to run some more tests today. I started with tests at an 8KiB request size. The numbers turned out to be significantly lower. Just to be sure that nothing else had changed in the system (as I mentioned, my co-worker did some tests too), I re-ran all the 4KiB tests, and for every number of threads (queue depth) they reported the same results as before. To be really sure I then re-ran all the 8KiB tests as well (taking about an hour), but these gave exactly the same results as the earlier run.
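In case anyone wants to reproduce something similar, below is a minimal sketch of the kind of test being run: a fixed number of threads issuing random reads of a fixed request size against the raw array device. This is not the harness used for the numbers in this thread; the device path, request size, thread count and duration are placeholders, and Python's threading overhead means it won't drive anywhere near the IOPS shown below, but it illustrates the shape of the workload.
Code:
#!/usr/bin/env python3
# Minimal threaded random-read sketch (Linux only; run against a raw block device as root).
# Everything here is a placeholder: /dev/md0, the 8 KiB request size, 16 threads, 30 seconds.
import mmap, os, random, threading, time

DEVICE   = "/dev/md0"     # hypothetical device path
REQ_SIZE = 8 * 1024       # request size in bytes
THREADS  = 16             # roughly comparable to the queue depth column in the results
SECONDS  = 30

fd0 = os.open(DEVICE, os.O_RDONLY)
dev_bytes = os.lseek(fd0, 0, os.SEEK_END)   # size of the block device
os.close(fd0)

ops = [0] * THREADS
stop = threading.Event()

def worker(idx):
    fd = os.open(DEVICE, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache
    buf = mmap.mmap(-1, REQ_SIZE)                     # page-aligned buffer, required for O_DIRECT
    max_block = (dev_bytes - REQ_SIZE) // REQ_SIZE
    while not stop.is_set():
        offset = random.randrange(max_block) * REQ_SIZE   # aligned random offset
        os.preadv(fd, [buf], offset)
        ops[idx] += 1
    os.close(fd)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(THREADS)]
start = time.time()
for t in threads:
    t.start()
time.sleep(SECONDS)
stop.set()
for t in threads:
    t.join()
elapsed = time.time() - start
print(f"{sum(ops) / elapsed:.0f} IOPS at {REQ_SIZE} B with {THREADS} threads")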
Here they are. This is again for 8 disks per RAID controller, 2 controllers, LVM striped, 8KiB array stripe size, NOOP scheduler, averaged over 10 passes, and this time an 8KiB request size. I specifically checked that none of the passes were out of range, and none were. Every pass reported very nearly the same number of IOPS.
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET Average 0 4 171798691840 20971520 1020.919 168.278 20541.81 0.0000 0.01 read 8192
TARGET Average 0 8 171798691840 20971520 702.962 244.393 29833.08 0.0000 0.03 read 8192
TARGET Average 0 16 171798691840 20971520 594.507 288.977 35275.51 0.0000 0.08 read 8192
TARGET Average 0 32 171798691840 20971520 560.851 306.318 37392.34 0.0000 0.18 read 8192
TARGET Average 0 64 171798691840 20971520 548.917 312.978 38205.30 0.0000 0.38 read 8192
TARGET Average 0 128 171798691840 20971520 545.725 314.808 38428.76 0.0000 0.82 read 8192
TARGET Average 0 256 171798691840 20971520 545.344 315.028 38455.59 0.0000 2.29 read 8192
As can be seen, in this case we already almost max out at 16 threads. Going beyond that only marginally increases the IOPS.
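As a quick sanity check on the table above (nothing new measured, just arithmetic): the Rate column is simply the IOPS multiplied by the 8KiB request size, expressed in MB/s.
Code:
# Cross-check two rows of the 8 KiB table: Rate (MB/s) ~= IOPS * request size.
req_size = 8192  # bytes
for iops, reported_rate in [(20541.81, 168.278), (38455.59, 315.028)]:
    print(f"{iops:.0f} * {req_size} B = {iops * req_size / 1e6:.3f} MB/s (table says {reported_rate})")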
For completeness I also tested with a 2KiB request size, although that size isn't really relevant for my live load:
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET Average 0 4 171798691840 83886080 2856.671 60.139 29364.98 0.0000 0.01 read 2048
TARGET Average 0 8 171798691840 83886080 1813.242 94.747 46263.03 0.0000 0.05 read 2048
TARGET Average 0 16 171798691840 83886080 1395.675 123.094 60104.29 0.0000 0.16 read 2048
TARGET Average 0 32 171798691840 83886080 1258.457 136.515 66657.87 0.0000 0.40 read 2048
TARGET Average 0 64 171798691840 83886080 1216.341 141.242 68965.91 0.0000 0.90 read 2048
TARGET Average 0 128 171798691840 83886080 1205.534 142.508 69584.15 0.0000 1.91 read 2048
TARGET Average 0 256 171798691840 83886080 1217.721 141.082 68887.78 0.0000 3.90 read 2048
In this case the IOPS keep increasing up to 128 threads (though the gains beyond 32 are small), and decrease very slightly at 256. With 128 threads I did notice an awkward asymmetry in the numbers reported by iostat:
Code:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.13 0.00 0.00 8.00 0.00 4.00 2.00 0.03
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.13 0.00 0.00 8.00 0.00 4.00 2.00 0.03
sdb 0.00 0.00 35684.80 0.00 68.09 0.00 3.91 66.00 1.85 0.03 100.00
sdc 0.00 0.00 35833.27 0.00 68.39 0.00 3.91 4.30 0.12 0.03 99.07
dm-0 0.00 0.00 0.00 0.13 0.00 0.00 8.00 0.00 4.00 2.00 0.03
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 73157.07 0.00 139.68 0.00 3.91 0.00 0.00 0.00 0.00
avgqu-sz and await for sdb are way more than 10 times higher than for sdc. With 256 threads I saw the same thing. With 64 threads the difference was there too, but a little smaller (almost exactly a factor of 10), as shown below; a quick check of these numbers follows after the listing:
Code:
avg-cpu: %user %nice %system %iowait %steal %idle
0.68 0.00 17.21 69.68 0.00 12.42
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 2.67 1.33 0.03
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 2.67 1.33 0.03
sdb 0.00 0.00 35177.27 0.00 67.12 0.00 3.91 47.15 1.34 0.03 100.00
sdc 0.00 0.00 35127.20 0.00 67.04 0.00 3.91 4.17 0.12 0.03 98.75
dm-0 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 2.67 1.33 0.03
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
md0 0.00 0.00 71920.27 0.00 137.31 0.00 3.91 0.00 0.00 0.00 0.00
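A quick way to make sense of those iostat columns (this is just Little's law applied to the 64-thread sample above; no new data): avgqu-sz should be roughly r/s times await expressed in seconds. Both controllers complete about the same number of reads per second; requests to sdb simply spend about ten times longer queued.
Code:
# Little's law check on the 64-thread iostat sample: avgqu-sz ~= r/s * (await / 1000).
samples = [("sdb", 35177.27, 1.34, 47.15),
           ("sdc", 35127.20, 0.12, 4.17)]
for dev, r_per_s, await_ms, avgqu_sz in samples:
    print(f"{dev}: {r_per_s:.0f} r/s * {await_ms} ms = {r_per_s * await_ms / 1000:.1f} (iostat: {avgqu_sz})")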
During the 64-thread run, vmstat showed this:
Code:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
3 40 1572 12645792 52 15683252 0 0 9715 1252 20 17 0 3 85 12
7 63 1572 12645776 52 15683252 0 0 138445 0 50310 276128 1 17 12 70
4 46 1572 12645776 52 15683252 0 0 138288 0 50262 276661 0 17 13 69
1 53 1572 12645768 52 15683252 0 0 138387 0 50337 274764 1 17 13 69
2 40 1572 12645768 52 15683252 0 0 135370 0 49442 268276 2 17 13 68
2 55 1572 12645760 52 15683252 0 0 138110 0 50192 275198 1 18 11 71
3 45 1572 12645760 52 15683252 0 0 138575 0 50218 275155 1 17 12 70
5 53 1572 12645760 52 15683252 0 0 135070 1 49743 264514 1 17 14 69
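For what it's worth, the vmstat output lines up with iostat, assuming vmstat's bi column is in 1 KiB blocks per second (the usual procps behaviour): around 138,000 blocks/s works out to roughly 140 MB/s, in the same ballpark as the ~137 MB/s iostat reports for md0, with the box sustaining about 275k context switches per second during the run.
Code:
# Rough cross-check of vmstat against iostat for the 64-thread run (bi assumed to be 1 KiB blocks/s).
bi = 138445
print(f"~{bi * 1024 / 1e6:.0f} MB/s read via vmstat, vs ~137 MB/s for md0 via iostat")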