Go nuts..
Edit: Out of curiosity I took the FastPath key out and ran the 1-worker config; FastPath nearly perfectly doubles IOPS after about QD4, pretty cool. :)
You can save them all to 1 big file.
Yes, and that one big file makes it less work for me, since I get less to filter :)
I'll make a graph of your result file mbreslin. I'm hoping you will do a run with # of workers = CPU threads, to investigate if you have a CPU bottleneck.
I have something to add. I just skimmed your result file, mbreslin, and it seems something went wrong when you ran the config(s). You have 5 sets of results in the file, but they all report 1 worker, and the IOPS support that. What did you do?
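If it helps, here's a rough Python sketch of the kind of check I did. It just counts the WORKER result rows per test section; the assumption about the CSV layout (a section header starting with 'Results, one result row per worker tagged WORKER) may need adjusting for your Iometer version:

import csv

runs = []  # one entry per results section: number of WORKER rows seen
with open("results.csv", newline="") as f:
    for row in csv.reader(f):
        if not row:
            continue
        first = row[0].strip()
        if first.lower().startswith("'results"):
            runs.append(0)                 # a new test section starts here
        elif runs and first.upper() == "WORKER":
            runs[-1] += 1                  # one result row per worker

for i, n in enumerate(runs, 1):
    print(f"run {i}: {n} worker(s)")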
In all test files, at the 7th run of 9, Iometer is not responding :S Why???
No idea; could the queue depth be overloading your RAID processor?
Not all the configs have 9 runs.
Here are the graphs I made of mbreslin's results. There was something strange about them, so I could only make graphs for one worker:
Attachment 104798
Attachment 104799
From the looks of it, you peak at QD 32 with 1 worker and FastPath on your current settings (scaling tails off after QD 16, aka diminishing returns). Have you got any OC? You seem to be getting insane scaling, so I would love to see what it looks like with 8-16 workers.
Given the data from 1 C300 drive, 45-50K 4KB random read IOPS, 8R0 should get you to 300-400K, or the controller/CPU limit.
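To spell out the back-of-the-envelope math, here's a quick Python sketch; the per-drive figures are the single-C300 numbers above, and perfect linear scaling is the optimistic assumption:

# Back-of-the-envelope scaling estimate for 8 drives in RAID 0 (8R0).
per_drive_iops = (45_000, 50_000)   # one C300, 4KB random read
drives = 8

low, high = (n * drives for n in per_drive_iops)
print(f"ideal 8R0 range: {low:,} - {high:,} IOPS")   # 360,000 - 400,000
# The 300-400K quoted above leaves headroom for sub-linear scaling and the
# controller/CPU limit.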
Here are my results
GullLars,
That would be 6/12 workers in mbreslin's case (980X).
GullLars, we can't break 290,000 IOPS with the 9260. The ROC has a cap near 292,000-295,000 IOPS.
252,483 IOPS at 16W QD16 isn't so bad :)
Wonder what 2 controllers would give :D
Have you tried at 512B?
Ohhhhhh I see, when I pick the drive it doesn't select it for all workers, so only the first worker ran. I'll redo it in a couple of hours with all tests; going to get some dinner now.
273,561 IOPS @ QD16, 16 workers
Here are some graphs for Tiltevros' results. I'll mesh mbreslin's, Computurd's and Tilt's results for comparative numbers on Barefoot vs C300 vs X25-M in my next post, coming up in a few mins.
Attachment 104801
Attachment 104802
As you'll see, there were a couple of holes in the results. It seems you stopped the 4W and 8W configs after total QD 128, and the 16W after QD 512. I'll use 1, 2 and 16W for the comparison mentioned above.
For some reason, as I said in my previous post, it hangs after the 7th test.
I'm posting a new CSV with a new run of the 4W and 8W tests in a sec.
Tilt,
What % is CPU utilization at when it stops responding?
(QD > 128 is of purely academic interest anyways)
When it ends and starts the ramp-up, Task Manager shows it go from 5% to 0% and then it hangs, but the maximum response time goes to something like 1991.1680 ms, lol?
Anvil, I agree, QD > 128 is unrealistic, but I include it here to show the theoretical scaling potential.
Here's a comparative IOPS and IOPS/average access time for Computurd's Barefoot array (which didn't show much difference with more workers), vs mbreslin's C300 array (which for some reason only did 1 worker), vs Tiltevros' X25-M array with 1, 2 and 16 workers.
Attachment 104803
Attachment 104804
The C300's seem to have great potential.
Tiltevros seems to need 2 threads to get full IOPS performance; 1 thread appears to hit a CPU bottleneck.
These are insane numbers guys.
EDIT: I'd love some numbers from you too, Anvil, when you get your FastPath key.
OK, new post with 4W and 8W.
Hmm,
How do you cool the controller Tilt?
It might just be saturated...
GullLars, my CPUs are running at a stock speed of 2.26 GHz, don't forget that :) and 99 MHz PCIe.
One cooler from my old 3dfx card, I know it's a shame, but I'm afraid of the BBU exploding @ 62°C :S
As a side note, you may wonder why I'm making graphs for IOPS/average access time, when IOPS = (1/average access time) * QD.
While IOPS is a good measure of throughput, and access time of response time, IOPS/access time gives you a measure of quality of service. There's little point in doubling throughput if it causes you to have 10x higher latencies. This type of graph shows where you start trading off access time for IOPS, and clearly marks the point where diminishing returns set in (the peak of the graph). This top point (or top points) will show you what load the array is most effective at.
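Here's a minimal Python sketch of how those numbers are derived; the (QD, average access time) pairs are made up purely to illustrate where the peak shows up, so plug in your own Iometer figures:

# IOPS and IOPS/access-time (quality of service) from (QD, avg latency in ms).
samples = [
    (1, 0.10), (2, 0.11), (4, 0.13), (8, 0.17),
    (16, 0.26), (32, 0.48), (64, 0.95), (128, 1.90),
]

for qd, avg_ms in samples:
    iops = qd / (avg_ms / 1000)   # IOPS = QD / average access time (in seconds)
    qos = iops / avg_ms           # throughput per millisecond of latency
    print(f"QD {qd:>3}: {iops:>9,.0f} IOPS, {qos:>10,.0f} IOPS/ms")

# The QD where IOPS/ms peaks is where additional queue depth starts buying
# IOPS mostly by trading away response time (diminishing returns).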
I'll post my results but I'll probably stick to the G2s and the SF drives.
The C300 drives at 6Gb/s just might be what makes the difference for the better scaling at low QDs.
We'll see what happens using more workers.