Originally Posted by
stevecs
Yeah, just found the arcmsr.h and it's limited to 256 commands max for the controller:
#define ARCMSR_MAX_OUTSTANDING_CMD 256
I just took a look at the source too, but good find
If you want to skip all that, just run it with the max of 256 outstanding threads (skip the for loop and set it straight to 256), as that will flood the array.
For a quick test I shortened the loop to 128 and 256 threads (and just two passes). Oh, and this time I also compiled the code myself; in the previous test I just executed the binary that was already in the /bin directory. These are the results for the 128-thread test:
Code:
IOIOIOIOIOIOIOIOIOIOI XDD version 6.5.013007.0001 IOIOIOIOIOIOIOIOIOIOIOI
xdd - I/O Performance Inc. Copyright 1992-2007
Starting time for this run, Tue Mar 10 18:03:26 2009
ID for this run, 'No ID Specified'
Maximum Process Priority, disabled
Passes, 2
Pass Delay in seconds, 0
Maximum Error Threshold, 0
Target Offset, 0
I/O Synchronization, 0
Total run-time limit in seconds, 0
Output file name, stdout
CSV output file name,
Error output file name, stderr
Pass seek randomization, disabled
File write synchronization, disabled
Pass synchronization barriers, enabled
Number of Targets, 1
Number of I/O Threads, 128
Computer Name, mrhpgdb2, User Name, henk.dewit
OS release and version, Linux 2.6.26-1-amd64 #1 SMP Sat Jan 10 17:57:00 UTC 2009
Machine hardware type, x86_64
Number of processors on this system, 1
Page size in bytes, 4096
Number of physical pages, 8255301
Megabytes of physical memory, 32247
Seconds before starting, 0
Target[0] Q[0], /ssd/S0
Target directory, "./"
Process ID, 20815
Thread ID, 1141754192
Processor, all/any
Read/write ratio, 100.00, 0.00
Throttle in MB/sec, 0.00
Per-pass time limit in seconds, 0
Blocksize in bytes, 512
Request size, 8, blocks, 4096, bytes
Number of Requests, 32768
Start offset, 0
Number of MegaBytes, 16384
Pass Offset in blocks, 0
I/O memory buffer is a normal memory buffer
I/O memory buffer alignment in bytes, 4096
Data pattern in buffer, '0x00'
Data buffer verification is disabled.
Direct I/O, enabled
Seek pattern, queued_interleaved
Seek range, 128000000
Preallocation, 0
Queue Depth, 128
Timestamping, disabled
Delete file, disabled
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET PASS0001 0 128 17179869184 4194304 76.375 224.942 54917.49 0.0000 1.42 read 4096
TARGET PASS0002 0 128 17179869184 4194304 76.593 224.300 54760.70 0.0000 1.24 read 4096
TARGET Average 0 128 34359738368 8388608 152.927 224.681 54853.83 0.0000 1.33 read 4096
        Combined 1 128 34359738368 8388608 152.927 224.681 54853.83 0.0000 1.32 read 4096
Ending time for this run, Tue Mar 10 18:06:22 2009
The tool thus reports about 54k IOPS, which is distinctly different from the number bm-flash gives me. I'll try to test with the same number of threads that bm-flash uses (10 and 40, respectively). One other thing puzzles me: XDD reports the number of processors on the system as 1, while there are in fact 8 (on 2 physical CPUs).
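Just to double-check xdd's arithmetic (this is my own back-of-the-envelope check, not part of the run itself), the IOPS and Rate columns can be recomputed from the Bytes, Ops and Time columns of the first read pass:

```python
# Sanity-check xdd's reported figures for the 128-thread read PASS0001
# (numbers copied from the table above).
bytes_moved = 17179869184   # Bytes column
ops = 4194304               # Ops column
time_s = 76.375             # Time column, in seconds
blocksize = 4096            # request size in bytes

assert bytes_moved // ops == blocksize  # every op is one 4 KiB request

iops = ops / time_s
rate_mb_s = bytes_moved / time_s / 1e6  # xdd reports decimal MB/s

print(f"IOPS ~ {iops:.2f}, rate ~ {rate_mb_s:.3f} MB/s")
# xdd prints 54917.49 and 224.942; the tiny difference comes from the
# Time column being rounded to three decimals.
```

So the 54k figure is internally consistent: it really is ops divided by elapsed time, not some derived estimate.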
edit:
The write test finally completed:
Code:
IOIOIOIOIOIOIOIOIOIOI XDD version 6.5.013007.0001 IOIOIOIOIOIOIOIOIOIOIOI
xdd - I/O Performance Inc. Copyright 1992-2007
Starting time for this run, Tue Mar 10 18:06:28 2009
ID for this run, 'No ID Specified'
Maximum Process Priority, disabled
Passes, 2
Pass Delay in seconds, 0
Maximum Error Threshold, 0
Target Offset, 0
I/O Synchronization, 0
Total run-time limit in seconds, 0
Output file name, stdout
CSV output file name,
Error output file name, stderr
Pass seek randomization, disabled
File write synchronization, disabled
Pass synchronization barriers, enabled
Number of Targets, 1
Number of I/O Threads, 128
Computer Name, mrhpgdb2, User Name, henk.dewit
OS release and version, Linux 2.6.26-1-amd64 #1 SMP Sat Jan 10 17:57:00 UTC 2009
Machine hardware type, x86_64
Number of processors on this system, 1
Page size in bytes, 4096
Number of physical pages, 8255301
Megabytes of physical memory, 32247
Seconds before starting, 0
Target[0] Q[0], /ssd/S0
Target directory, "./"
Process ID, 20951
Thread ID, 1152395600
Processor, all/any
Read/write ratio, 0.00, 100.00
Throttle in MB/sec, 0.00
Per-pass time limit in seconds, 0
Blocksize in bytes, 512
Request size, 8, blocks, 4096, bytes
Number of Requests, 32768
Start offset, 0
Number of MegaBytes, 16384
Pass Offset in blocks, 0
I/O memory buffer is a normal memory buffer
I/O memory buffer alignment in bytes, 4096
Data pattern in buffer, '0x00'
Data buffer verification is disabled.
Direct I/O, enabled
Seek pattern, queued_interleaved
Seek range, 128000000
Preallocation, 0
Queue Depth, 128
Timestamping, disabled
Delete file, disabled
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET PASS0001 0 128 17179869184 4194304 2635.499 6.519 1591.46 0.0006 0.04 write 4096
It might be clearer to paste the figures separately from the overview data:
The write figures, 128 threads:
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET PASS0001 0 128 17179869184 4194304 2635.499 6.519 1591.46 0.0006 0.04 write 4096
TARGET PASS0002 0 128 17179869184 4194304 2911.876 5.900 1440.41 0.0007 0.04 write 4096
TARGET Average 0 128 34359738368 8388608 5543.046 6.199 1513.36 0.0007 0.04 write 4096
        Combined 1 128 34359738368 8388608 5547.000 6.194 1512.28 0.0007 0.04 write 4096
The read figures for 256 threads:
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET PASS0001 0 256 17179869184 4194304 337.441 50.912 12429.73 0.0001 0.86 read 4096
TARGET PASS0002 0 256 17179869184 4194304 76.499 224.577 54828.35 0.0000 3.09 read 4096
TARGET Average 0 256 34359738368 8388608 413.915 83.012 20266.49 0.0000 1.27 read 4096
        Combined 1 256 34359738368 8388608 413.915 83.012 20266.49 0.0000 1.27 read 4096
There is a rather large performance difference between the first and second pass. When I ran this test again with 10 passes, I got a more consistent picture:
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET PASS0001 0 256 17179869184 4194304 76.641 224.159 54726.28 0.0000 3.95 read 4096
TARGET PASS0002 0 256 17179869184 4194304 76.726 223.912 54665.97 0.0000 3.22 read 4096
TARGET PASS0003 0 256 17179869184 4194304 76.594 224.297 54760.02 0.0000 3.20 read 4096
TARGET PASS0004 0 256 17179869184 4194304 76.691 224.013 54690.69 0.0000 3.26 read 4096
TARGET PASS0005 0 256 17179869184 4194304 76.680 224.045 54698.57 0.0000 3.24 read 4096
TARGET PASS0006 0 256 17179869184 4194304 76.702 223.983 54683.23 0.0000 3.22 read 4096
TARGET PASS0007 0 256 17179869184 4194304 76.708 223.966 54679.14 0.0000 3.26 read 4096
TARGET PASS0008 0 256 17179869184 4194304 76.259 225.283 55000.76 0.0000 3.27 read 4096
TARGET PASS0009 0 256 17179869184 4194304 76.835 223.596 54588.75 0.0000 3.22 read 4096
TARGET PASS0010 0 256 17179869184 4194304 76.295 225.177 54974.74 0.0000 3.24 read 4096
TARGET Average 0 256 171798691840 41943040 765.943 224.297 54760.01 0.0000 3.31 read 4096
Combined 1 256 171798691840 41943040 766.000 224.280 54755.93 0.0000 3.29 read 4096
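For what it's worth, a quick check of how consistent those 10 passes really are (IOPS values copied from the table above, computed by hand rather than by xdd):

```python
# Pass-to-pass spread of the 10-pass, 256-queue-depth read run.
iops = [54726.28, 54665.97, 54760.02, 54690.69, 54698.57,
        54683.23, 54679.14, 55000.76, 54588.75, 54974.74]

mean = sum(iops) / len(iops)
variance = sum((x - mean) ** 2 for x in iops) / len(iops)
stddev = variance ** 0.5
cv = stddev / mean  # coefficient of variation

print(f"mean ~ {mean:.0f} IOPS, stddev ~ {stddev:.0f}, CV ~ {cv:.2%}")
# the CV comes out well under 1%, so the slow first pass of the earlier
# 256-thread run really was an outlier, not normal behaviour
```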
Finally, the write figures for 256 threads. This run again took quite some time to complete.
Code:
T Q Bytes Ops Time Rate IOPS Latency %CPU OP_Type ReqSize
TARGET PASS0001 0 256 17179869184 4194304 2615.796 6.568 1603.45 0.0006 0.10 write 4096
TARGET PASS0002 0 256 17179869184 4194304 2893.969 5.936 1449.33 0.0007 0.07 write 4096
TARGET Average 0 256 34359738368 8388608 5508.566 6.238 1522.83 0.0007 0.09 write 4096
        Combined 1 256 34359738368 8388608 5509.000 6.237 1522.71 0.0007 0.08 write 4096