Ao1, can you post your XLS file?
I've posted the hIOmon PerfAnalysisExportFile here:
http://cid-541c442f789bf84e.office.l...x/.Public?uc=1
This was generated from the run shown in the screenshot below. This time the CPU is overclocked to 3.5GHz. The file opens 3 to 4 seconds faster.
• Fast IOPS 100%, Read 100%
• Busy Time 95.01 ms
• Max Read IOPS: 1.7
CPU - QX6850 (3GHz stock). Graphics card - ATI 6950 with 2GB RAM.
Err, I meant the file that opens at 3.6MB/s? ;)
Same link. It's called AS SSD 4K.xlsx. 20.3MB.
Process Monitor seems to be monitoring at the logical disk level. With hIOmon it is possible to see that at the device level the SSD is not the cause of the delay. The max IOPS are really jumping out: 1.6/1.7. It's like the CPU is waiting to process an xfer before the next one is called. Don't know really, just a guess. It speeds up when the CPU is running faster. Would be interested to see what you think. In theory, if your CPU is @ 3GHz it should take just as long to open, even with 4 x X25-E.
I don't see this as being CPU bound I/O - the process is definitely CPU bound but by Excel processing not by I/O processing.
On my laptop, I see that 32KB I/O from Excel takes a mere 0.01-0.02 milliseconds to process, and if the CPU is running at half the speed it takes ~0.02ms most of the time. That does indicate the CPU speed plays a role, but I am guessing it has more to do with MEMORY speed (since fast I/O is mostly a memory copy operation).
I don't know how to read your hIOmon screenshot, but from that it seems the fast I/O takes at least 0.14ms to finish?
You can run IoMeter with 32KB sequential reads and see the response times.
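For anyone without IoMeter to hand, a rough stand-in (just a sketch; the path below is a placeholder) is to time sequential 32KB reads with Python, bearing in mind that repeat runs will be served from the Windows cache rather than the device:
Code:
import time

PATH = r"D:\test\bigfile.bin"   # placeholder - point it at any large file
BLOCK = 32 * 1024               # 32KB per read, matching the Excel I/O size

times = []
with open(PATH, "rb", buffering=0) as f:   # buffering=0: no Python-level buffering
    while True:
        t0 = time.perf_counter()
        chunk = f.read(BLOCK)
        times.append(time.perf_counter() - t0)
        if len(chunk) < BLOCK:
            break

total = sum(times)
print(f"{len(times)} reads, {total * 1000:.2f} ms total")
print(f"avg {total / len(times) * 1e6:.1f} us/read, max {max(times) * 1e6:.1f} us/read")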
At the volume level the read time was 0.0943ms to xfer 21,366,272B. Max read IOP time 0.0006. 165 IOPS Sequential (21,135,360B). 39 IOPS Random (226,816B). Avg QD 1.019 Max 2. Those stats should be more or less the same at the device level looking at post #99.
Even though the data xfer took 0.0943ms, I still ended up waiting 15 seconds for the application to fully open.
Btw when I upped the CPU clock it automatically upped the RAM clock. :up:
Hence why I say it is not the I/O that is bottlenecking Excel, but Excel processing itself.
Hmm, now I am not sure I expressed the right unit. It takes 0.01-0.02 MICRO seconds, not milliseconds for the fast I/O (procmon Duration column).
At 0.01-0.02 microseconds per 32KB, that's around 7-14 milliseconds for 20MB. (Seems like a lot if you think that's the device transfer rate, but in reality it is the memory transfer rate - so it is really low even.)
So it's more than safe to say Excel is using the CPU - let's move on ;)
I know, hence the apparent increase of the fast I/O speed - since all it does is more or less copy the data in memory.
Quote:
Btw when I upped the CPU clock it automatically upped the RAM clock. :up:
Just did a recheck, the Duration is expressed in SECONDS. So if the 32KB I/O takes 0.01-0.02 milliseconds that's quite a bit.
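For what it's worth, a quick back-of-envelope check (assuming a ~20MB file read in 32KB chunks; the figures are approximations of the ones above) shows how much the unit matters:
Code:
# Rough sanity check - approximate numbers from the thread.
file_size = 20 * 1024 * 1024        # ~20MB workbook
io_size   = 32 * 1024               # 32KB per fast I/O
ios       = file_size // io_size    # 640 I/Os

for per_io_ms, label in ((0.00002, "0.02 microseconds"), (0.02, "0.02 milliseconds")):
    print(f"{label} per I/O -> {ios * per_io_ms:.3f} ms for the whole file")

That works out to roughly 0.013 ms total at 0.02 microseconds per I/O, versus about 12.8 ms total at 0.02 milliseconds per I/O, so the milliseconds reading is the one consistent with the 7-14 ms estimate above.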
When will the M4 be available?
Have we decided whether to go with the M4 or V3 yet?
Since m4 is cheaper, I'm going m4 -- waiting for 512GB version for the lappy and a few for storage ;) (I'll be waiting 2 months though 'till I go back to Europe ;))
I found a great IDF 2011 paper by Intel on benchmarking SSDs. Not only does it discuss the pitfalls of typical benchmarks (Anandtech included), it also looks at performance where it matters.
Even if it was assumed that benchmarks were correct, any real-life benefit is hamstrung by the OS/Apps/CPU/GPU.
https://intel.wingateweb.com/bj11/sc...734BA3BAAF85A9
Great find, Ao1. I like the way the author of that presentation (James Myers) thinks -- it makes sense. The presentation emphasizes "bursts", periods when a lot of IO is taking place, and "hourglass moments", periods when the user is waiting on the computer to finish something.
Two points from the presentation that I found interesting:
1) The average QD during the bursts was about 20
2) The hourglass moments were 80% writes.
This suggests a theory (or at least a hypothesis): Sequential write speed and high-queue depth random write speed are important for minimizing hourglass moments.
Yes, I read this a few weeks back. I have emailed Intel about whether or not they have audio of the presentation like they did for the last IDF. No response :(
Moving to traces to measure SSDs looks to be the way to go. If you look at PCMark 7, I believe they will be moving more towards this very method of quantitative performance measurement; they state that they will no longer measure theoretical peak bandwidth to gauge storage performance.
That might be a bit of an oversimplified blanket statement that would be hard to make given the data. There are differences, but they are hindered by the OS/Apps/CPU/GPU, not eliminated. As he points out on page 25:
Quote:
Even if it was assumed that benchmarks were correct, any real-life benefit is hamstrung by the OS/Apps/CPU/GPU.
That is a big difference. 3 seconds compared to 7 seconds is a large margin; percentage-wise you're looking at a 67 percent difference.
Quote:
Some SSDs in 3-4 sec range, others 5-7 secs
He does sum it up nicely towards the end by saying "ecosystem improvements needed to showcase 'SSD class' performance".
We need new filesystems. Also, new games that are coming will access data differently when they detect an SSD, and this is just the beginning of apps that will change their access patterns when SSDs are detected. It will just take time. They talked a lot about those changes in last year's IDF.
It is VERY hard to make a complete reasoning from the PDF alone; we need the audio to really put it together. They did these for the San Fran IDF, but for Beijing I can't find the whole presentations. GAH, I wish we could get them...
Hi John,
I picked up on the QD20 as well and it's been puzzling me. :confused:
I tried an experiment. I created a folder with 4,956 4k files. 19.3 MB (20,299,776 bytes).
I set hIOmon to monitor at 1 second intervals and monitored at the device level. (Not what happens at the SSD, but as close as possible within the Windows file system). I copied the folder from the E drive to the E drive.
Once it had started copying I copied it again. So, two folders were being copied at the same time. 38.6MB of 4k xfers. Surely this would show a high QD? Nope.
• Avg QD read = 1.0248. Max 2
• Avg QD write = 1.256. Max 9
If I look at the file log for \Device\Harddisk1\DR1 entries it tells me that the read time for a 25,385,984 bytes xfer was 1.27s with a 97.64% fast IOP count. (73% random/ 27% sequential).
That comes out at 19MB/s for 4K file xfers with an avg QD of 1.024, around the type of figure you could expect from AS SSD/CDM @ QD1.
Maybe the QD is measured at the SSD or maybe I am missing a trick with hIOmon?
EDIT:
I tried copying the same folder 12 times. (All at once):
• Avg QD read = 1.0245. Max 3
• Avg QD write = 1.417. Max 11
Makes hardly any difference.
Try something on a much larger scale - 4K files of say 100MB size total, copied 5 times at least. I can make a small Delphi app to automate it.
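Until that app exists, a rough Python sketch along these lines (the destination path and file count are just placeholders) could generate such a 100MB set of 4K files:
Code:
import os

DEST  = r"E:\qd_test"      # placeholder destination folder
COUNT = 25600              # 25,600 files...
SIZE  = 4 * 1024           # ...of 4KB each = 100MB total

os.makedirs(DEST, exist_ok=True)
payload = os.urandom(SIZE)           # one incompressible 4KB block, reused for speed
for i in range(COUNT):
    with open(os.path.join(DEST, f"file_{i:05d}.bin"), "wb") as f:
        f.write(payload)
print(f"wrote {COUNT} files, {COUNT * SIZE / (1024 * 1024):.0f} MB total")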
I would not expect much difference however, because what you see in terms of writes at the device level is what the cache manager sends to the device (presuming you are copying with Explorer). The CM doesn't send a lot of data asynchronously (QD > 1), so even in the case I suggested you'll likely end up with the same QD measured.
All copy operations would get cached, and then dumped at the CM's discretion, not in parallel.
If you ran the test using a utility such as Total Commander or similar that can copy files while bypassing the cache, you should see a higher QD - roughly as high as the number of copy runs going at the same time.
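To illustrate the "one outstanding stream per copy run" idea, a sketch like the one below starts several copies at once. Note that this plain-Python version still goes through the Windows cache (it does not set FILE_FLAG_NO_BUFFERING), so unlike Total Commander's unbuffered mode the cache manager may still flatten the device-level QD:
Code:
import shutil, threading

SRC     = r"E:\qd_test"    # the folder generated earlier (placeholder path)
STREAMS = 5                # number of simultaneous copy runs

def copy_run(n):
    # Each thread copies the whole folder to its own destination,
    # so several copy streams are in flight at the same time.
    shutil.copytree(SRC, rf"E:\qd_copy_{n}")

threads = [threading.Thread(target=copy_run, args=(i,)) for i in range(STREAMS)]
for t in threads:
    t.start()
for t in threads:
    t.join()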
Thanks alfaunits :)
I tried copying 12 folders at the same time and I think once Windows figured out the folders were all the same it stopped reading at the device level and started reading from cache. ~2GB of data got written at the device level, so all the copying ended up on the disk, but only ~70MB was read from the disk.
I will try Total Commander now. Good idea. That should then read/write to the disk and bypass the Windows cache, so let's see what happens then. :up:
OK, using Total Commander, copying the same folder 4 times all at once. (Got one xfer going, then started the next etc)
QD Avg ~ 1.003
Max = 4.
One copy = max QD 1. Two Copies = max QD 2. etc.
Are you measuring at the device level, Ao1?
The presentation says that loading MPEG-2 and saving MPEG-4 are very drive intensive. Dunno if that is just an example or what...
Hi Comp, yep I recorded everything at the device level. I don't think the QD changed much at any level.
Did you set up Total Commander to use the "Big file copy option" without restrictions? Its default is to use the Shell API, which in turn uses the cache.
It sounds rather strange that multiple transfers at once in TC would not get a higher QD (presuming they did happen at the same time; with 20MB/s it's hard to even start three 100MB copy streams at the same time).