Ao1, can you post your XLS file?
I've posted the hIOmon PerfAnalysisExportFile here:
http://cid-541c442f789bf84e.office.l...x/.Public?uc=1
This was generated from the run in the screenshot below. This time the CPU is overclocked to 3.5GHz. The file opens 3 to 4 seconds faster.
• Fast IOPS 100%, Read 100%
• Busy Time 95.01ms
• Max Read IOPS: 1.7
CPU - QX6850 (3GHz stock). Graphics card - ATI 6950 with 2GB RAM.
Same link. It's called AS SSD 4K.xlsx. 20.3MB.
Process Monitor seems to be monitoring at the logical disk level. With hIOmon it is possible to see that at the device level the SSD is not the cause of the delay. The max IOPS are really jumping out: 1.6/1.7. It's like the CPU is waiting to process an xfer before the next one is called. I don't really know, it's just a guess, but it speeds up when the CPU is running faster. Would be interested to see what you think. In theory, if your CPU is @ 3GHz it should take just as long to open even with 4 x X25-E.
I don't see this as being CPU-bound I/O - the process is definitely CPU bound, but by Excel processing, not by I/O processing.
On my laptop, I see that 32KB I/O from Excel takes a mere 0.01-0.02 milliseconds to process, and if the CPU is running at half the speed it takes ~0.02ms most of the time. That does indicate the CPU speed plays a role, but I am guessing it has more to do with MEMORY speed (since fast I/O is mostly a memory copy operation).
I don't know how to read your hIOmon screenshot, but from it the fast I/O seems to take at least 0.14ms to finish?
You can run IoMeter with 32KB sequential reads and see the response times.
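If you'd rather not set up IoMeter, something along these lines shows the same thing. A minimal Win32 sketch (the file name is a placeholder, not anything from this thread): it times each 32KB sequential read and prints the per-read response time. Buffered reads like these are mostly served by the cache manager's fast I/O path, which is exactly what we're comparing against here.

```c
/* Time individual 32KB sequential reads and print each response time.
 * "testfile.bin" is a placeholder. These buffered reads mostly hit the
 * cache manager ("fast I/O"); opening with FILE_FLAG_NO_BUFFERING instead
 * would show device response times, but then the buffer, offsets and
 * lengths must be sector-aligned. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileA("testfile.bin", GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    static char buf[32 * 1024];
    LARGE_INTEGER freq, t0, t1;
    DWORD got;
    QueryPerformanceFrequency(&freq);

    for (int i = 0; i < 640; i++) {            /* 640 x 32KB = 20MB */
        QueryPerformanceCounter(&t0);
        if (!ReadFile(h, buf, sizeof buf, &got, NULL) || got == 0)
            break;
        QueryPerformanceCounter(&t1);
        printf("read %3d: %8.2f us\n", i,
               (double)(t1.QuadPart - t0.QuadPart) * 1e6 / freq.QuadPart);
    }
    CloseHandle(h);
    return 0;
}
```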
At the volume level the read time was 0.0943s (~94ms, which matches the busy time above) to xfer 21,366,272 bytes. Max read IOP time 0.0006. 165 IOPS sequential (21,135,360 bytes). 39 IOPS random (226,816 bytes). Avg QD 1.019, max 2. Those stats should be more or less the same at the device level, looking at post #99.
Even though the data xfer only took 0.0943s, I still ended up waiting 15 seconds for the application to fully open.
Btw when I upped the CPU clock it automatically upped the RAM clock.
Hence why I say it is not the I/O that is bottlenecking Excel, but Excel processing itself.
Hmm, now I am not sure I expressed the right unit. Procmon's Duration column is in seconds, so a reading of 0.0000100-0.0000200 is 10-20 microseconds per I/O, i.e. the 0.01-0.02 milliseconds above.
At 0.01-0.02ms per 32KB, with 20MB / 32KB ≈ 640 I/Os, that's around 7-14 milliseconds for 20MB. (Seems like a lot if you think that's a device transfer rate, but in reality it is a memory transfer rate of roughly 1.5-3GB/s - so it is really low, even.)
So it's more than safe to say Excel is using the CPU - let's move on.
Quote: "Btw when I upped the CPU clock it automatically upped the RAM clock."
I know, hence the apparent increase in fast I/O speed - since all it does is more or less copy the data in memory.
Have we decided whether to go with the M4 or V3 yet?
I found a great IDF 2011 paper by Intel on benchmarking SSDs. Not only does it discuss the pitfalls of typical benchmarks (Anandtech included), it also looks at performance where it matters.
Even if it was assumed that benchmarks were correct, any real-life benefit is hamstrung by the OS/apps/CPU/GPU.
https://intel.wingateweb.com/bj11/sc...734BA3BAAF85A9
Great find, Ao1. I like the way the author of that presentation (James Myers) thinks -- it makes sense. The presentation emphasizes "bursts", periods when a lot of IO is taking place, and "hourglass moments", periods when the user is waiting on the computer to finish something.
Two points from the presentation that I found interesting:
1) The average QD during the bursts was about 20
2) The hourglass moments were 80% writes.
This suggests a theory (or at least a hypothesis): Sequential write speed and high-queue depth random write speed are important for minimizing hourglass moments.
Yes, I read this a few weeks back. I have emailed Intel about whether or not they have audio of the presentation like they did for the last IDF. No response.
Moving to traces to measure SSDs looks to be the way to go. If you look at PCMark7, I believe they will be moving more towards this very method of quantitative performance measurement; they state that they will no longer measure theoretical peak bandwidth to gauge storage performance.
Quote: "Even if it was assumed that benchmarks were correct, any real-life benefit is hamstrung by the OS/apps/CPU/GPU."
That might be a bit of an oversimplified blanket statement that would be hard to make given the data. There are differences, but they are hindered by the OS/apps/CPU/GPU, not eliminated, as he points out on page 25.
Quote: "Some SSDs in 3-4 sec range, others 5-7 secs"
That is a big difference. 3 seconds compared to 7 seconds is a large margin; percentage-wise you're looking at a 67 percent difference.
He does sum it up nicely towards the end by saying "ecosystem improvements needed to showcase 'SSD class' performance".
We need new filesystems. Also, new games that are coming will access data differently when they detect an SSD, and this is just the beginning of apps that will change their access patterns when SSDs are detected; it will just take time. They talked a lot about those changes at last year's IDF.
It is VERY hard to make complete sense of the PDF alone; we need the audio to really put it together. They did these for the San Francisco IDF, but for Beijing I can't find the whole presentations. GAH, I wish we could get them...
Hi John,
I picked up on the QD20 as well and it's been puzzling me.
I tried an experiment. I created a folder with 4,956 4k files. 19.3 MB (20,299,776 bytes).
I set hIOmon to monitor at 1-second intervals and monitored at the device level (not what happens inside the SSD itself, but as close to it as Windows can observe). I copied the folder from the E drive to the E drive.
Once it had started copying, I copied it again, so two folders were being copied at the same time: 38.6MB of 4K xfers. Surely this would show a high QD? Nope.
• Avg QD read = 1.0248. Max 2
• Avg QD write = 1.256. Max 9
If I look at the file log for \Device\Harddisk1\DR1 entries, it tells me that the read time for a 25,385,984-byte xfer was 1.27s with a 97.64% fast IOP count (73% random/27% sequential).
That comes out at ~19MB/s (25,385,984 bytes / 1.27s) for 4K file xfers with an avg QD of 1.024 - around the type of figure you could expect from AS SSD/CDM @ QD1.
Maybe the QD is measured at the SSD or maybe I am missing a trick with hIOmon?
EDIT:
I tried copying the same folder 12 times. (All at once):
• Avg QD read = 1.0245. Max 3
• Avg QD write = 1.417. Max 11
Makes hardly any difference.
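For what it's worth, I don't know exactly how hIOmon does its QD bookkeeping, but a time-weighted average is the usual way such counters work: the depth only rises while I/Os actually overlap in time, so near-serial traffic averages out to ~1 even when the max briefly spikes. A toy illustration of that bookkeeping, with made-up timestamps:

```c
/* Time-weighted average queue depth: QD goes up at each I/O start and down
 * at each completion; the average weights each depth by how long it was
 * held. Two I/Os that barely overlap give an average just above 1 even
 * though the max QD is 2 - which is why near-serial copies report ~1. */
#include <stdio.h>

typedef struct { double t; int delta; } Event;   /* +1 = start, -1 = done */

int main(void)
{
    /* I/O A: 0.0 - 1.0 ms; I/O B: 0.9 - 1.9 ms (only 0.1 ms of overlap) */
    Event ev[] = { {0.0, +1}, {0.9, +1}, {1.0, -1}, {1.9, -1} };
    int n = 4, depth = 0, max_depth = 0;
    double area = 0.0, prev = ev[0].t;

    for (int i = 0; i < n; i++) {
        area += depth * (ev[i].t - prev);   /* depth held since last event */
        prev = ev[i].t;
        depth += ev[i].delta;
        if (depth > max_depth) max_depth = depth;
    }
    /* Average over the busy interval [0.0, 1.9] ms -> ~1.05, max 2 */
    printf("avg QD = %.3f, max QD = %d\n", area / (prev - ev[0].t), max_depth);
    return 0;
}
```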
Try something on a much larger scale - 4K files of, say, 100MB total size, copied at least 5 times. I can make a small Delphi app to automate it.
I would not expect much difference, however, because what you see in terms of writes at the device level is what the cache manager sends to the device (presuming you are copying with Explorer). The CM doesn't send a lot of data asynchronously (QD > 1), so even in the case I suggested you'll likely end up with the same measured QD.
All copy operations get cached, and then dumped at the CM's discretion, not in parallel.
If you run the test using a utility such as Total Commander or similar that can copy files while bypassing the cache, you should see a higher QD - roughly as high as the number of copy runs going at the same time. See the sketch below for what that looks like under the hood.
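A hedged C sketch of the mechanism (file name, chunk size, and depth are illustrative, not anything a specific tool uses): open with FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED so the cache manager is bypassed, then keep several async reads in flight at once - the device then actually sees QD > 1.

```c
/* Keep DEPTH unbuffered reads outstanding at once, so the measured queue
 * depth at the device is ~DEPTH instead of 1. NO_BUFFERING requires the
 * buffer, offset and length to be sector-aligned; VirtualAlloc returns
 * page-aligned memory and 32KB chunks keep the offsets aligned. */
#include <windows.h>
#include <stdio.h>

#define DEPTH 4                 /* outstanding I/Os ~= measured QD */
#define CHUNK (32 * 1024)

int main(void)
{
    HANDLE h = CreateFileA("source.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    OVERLAPPED ov[DEPTH] = {0};
    void *buf[DEPTH];
    LONGLONG offset = 0;

    /* Issue DEPTH reads back to back: all are pending at the same time. */
    for (int i = 0; i < DEPTH; i++) {
        buf[i] = VirtualAlloc(NULL, CHUNK, MEM_COMMIT, PAGE_READWRITE);
        ov[i].hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        if (!buf[i] || !ov[i].hEvent) return 1;
        ov[i].Offset = (DWORD)offset;
        ov[i].OffsetHigh = (DWORD)(offset >> 32);
        if (!ReadFile(h, buf[i], CHUNK, NULL, &ov[i]) &&
            GetLastError() != ERROR_IO_PENDING)
            return 1;
        offset += CHUNK;
    }

    /* Wait for each read; a real copy loop would re-issue at new offsets. */
    for (int i = 0; i < DEPTH; i++) {
        DWORD got;
        GetOverlappedResult(h, &ov[i], &got, TRUE);
        printf("chunk %d: %lu bytes\n", i, got);
        CloseHandle(ov[i].hEvent);
        VirtualFree(buf[i], 0, MEM_RELEASE);
    }
    CloseHandle(h);
    return 0;
}
```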
Thanks alfaunits
I tried copying 12 folders at the same time and I think once Windows figured out the folders were all the same, it stopped reading at the device level and started reading from the cache. ~2GB of data got written at the device level, so all the copying ended up on the disk, but only ~70MB was read from it.
I will try Total Commander now. Good idea. That should read/write to the disk and bypass the Windows cache, so let's see what happens.
OK, using Total Commander, copying the same folder 4 times all at once (got one xfer going, then started the next, etc.):
QD Avg ~ 1.003
Max = 4.
One copy = max QD 1. Two copies = max QD 2, etc.
Are you measuring at the device level, Ao1?
From the presentation it says that loading MPEG2 and saving MPEG4 are very drive intensive. Dunno if that is just an example or what...
"Lurking" Since 1977
![]()
Jesus Saves, God Backs-Up *I come to the news section to ban people, not read complaints.*-[XC]GomelerDon't believe Squish, his hardware does control him!
Hi Comp, yep, I recorded everything at the device level. I don't think QD changed much at any level.
Did you set up Total Commander to use the "Big file copy option" without restrictions? Its default is to use the Shell API, which in turn uses the cache.
It sounds rather strange that multiple transfers at once in TC would not get a higher QD (presuming they did happen at the same time; at 20MB/s it's hard to even start three 100MB copy streams at the same time).