Bob nice stats there! :D
the i7 860 is 3656/12205
the i7 920 is 3673/11793
I'll work out some averages for WUs for each of them when I get home from work tonight.
emu
Most excellent work Bob...Goosey is blown away impressed. Perhaps a private penalty for all that goodness...:rofl:
OK, after tonight's update I have some PPD numbers. Only five days worth, but I think these are worth sharing. They both appear to be reasonably settled into the HFCC WUs. Remember, this is with both rigs at 3.7 gig.
920 Rig
Date      Run Time (y:ddd:hh:mm:ss)   Points   Results
10/2/09 0:008:10:59:18 32,373 53
10/1/09 0:008:22:37:26 35,344 54
9/30/09 0:006:03:52:55 24,962 39
9/29/09 0:007:18:16:47 31,486 49
9/28/09 0:008:18:14:21 35,678 55
Average PPD = 31,968.6
860 Rig
Date      Run Time (y:ddd:hh:mm:ss)   Points   Results
10/2/09 0:008:03:56:55 30,764 51
10/1/09 0:008:19:30:51 32,655 54
9/30/09 0:008:15:40:42 31,515 50
9/29/09 0:010:04:12:05 36,808 61
9/28/09 0:009:16:20:31 34,293 59
Average PPD = 33,207
So, they are in the same area. The 860 scored a little higher due to the bad day the 920 had on 9/30. I'm not sure what that was about but I think it would even out as I run more days of data.
I'm thinking about making a max clock attempt and seeing how far I can get each one. With the individual WU data, and this data, I'm pretty convinced they are on equal footing clock-for-clock. What remains to be seen is what happens to power efficiency if I force max clocks.
Clearly this data shows that if you're not going to push the rig really hard, the 860 is a more power efficient choice for pure crunching.
Bob
Great stats Bob. I'll update this post with WCG average PPD when WCG comes back up.
My feeling is that if you want power efficiency then the i7 860 is the way to go. Just don't expect to be able to OC it like the 920s; it will need much more juice. I'm really keen to find the max OC I can get at stock voltages.
Maybe I jumped the gun on the update? If so, I'll edit my data above. It looked like I had full data for 10/2....If it's more, I think I'm OK with that....:D
Bob
Bob,
I see you are dividing total points by "calendar days" which doesn't really reflect what the machines are producing due to varying amounts of run time per calendar day caused by the validation process and WU reporting timing. A more accurate calculation is the average number of points per hour of runtime.
For the 920:
962.01 hours of CPU time
159,843 total points
Average points per hour of runtime: 166.16
Multiply by 192 to get a theoretical "perfect" calendar day with 8 days of CPU run time: 31,903 PPD.
For the 860:
1091.7 hours of CPU time
166,035 total points
Average points per hour of runtime: 152.09
Multiply by 192 to get a theoretical "perfect" calendar day with 8 days of CPU run time: 29,201 PPD.
Looking at these numbers, the 920 is outperforming the 860 by about 9%.
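For anyone who wants to redo this at home, here's a quick Python sketch of the points-per-CPU-hour method (the totals are the ones quoted above; the 8-thread, 192-CPU-hour "perfect day" is the same assumption used in the post — the function name is just mine):

```python
# fallwind's method: total points / total CPU hours, then scale up to a
# theoretical "perfect" calendar day (8 threads x 24 h = 192 CPU hours).

def pph_and_ideal_ppd(total_points, cpu_hours, threads=8):
    """Return (points per CPU hour, theoretical 'perfect day' PPD)."""
    pph = total_points / cpu_hours
    return pph, pph * threads * 24

pph_920, ppd_920 = pph_and_ideal_ppd(159_843, 962.01)   # 920 rig totals
pph_860, ppd_860 = pph_and_ideal_ppd(166_035, 1091.7)   # 860 rig totals

print(f"920: {pph_920:.2f} pts/h -> {ppd_920:,.0f} PPD")  # ~166.16 -> ~31,902
print(f"860: {pph_860:.2f} pts/h -> {ppd_860:,.0f} PPD")  # ~152.09 -> ~29,201
print(f"920 lead: {(pph_920 / pph_860 - 1) * 100:.1f}%")  # ~9.2%
```

As more days of stats come in, only the two totals change; the comparison stays apples-to-apples.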
Hmm, nice pickup fallwind. This removes the fluctuation in calculated runtime due to all sorts of variables.
Using your calcs I get 164.4 PPH for the 860 and 174.4 PPH for the 920, both at 4GHz.
This would give me a 31,561 PPD average for the 860 and a 33,487 PPD average for the 920, which is about a 6% difference between the two.
So about an 8% difference between 4GHz and 3.7GHz for the 860 at least, and 5% between the 920s.
I'll have to wrap my head around that a bit. Does this mean the 920 is doing better in the quorum? The actual awarded points are subject to the quorum award. Therefore that number would fluctuate a bit?
I also guess I don't understand what the validation process and report timing have to do with the WU CPU time taken.
We've seen by the raw WU calculation that the time taken for WUs is similar. Perhaps it would be interesting to roll the claimed vs granted into the mix too. If I'm correct in understanding the math above, the granted WU sum divided by the WU time sum, should equal your number above for each machine. Yes?
Still gotta give this a bit more thought. Certainly 9% difference is significant. I may have to run a lot longer to take out the possible effect of starting the 860 rig with so many HCC WUs. When I compared WUs, I only used HFCC units.
Well, these rigs are not going anywhere for quite some time, so we can do whatever you guys want to see here...:up:
Certainly thanks for bringing that out. I've never really gotten down to brass tacks of one rig vs another. :toast:
(BTW, don't get too comfortable while my main rigs are out...This "test" is all just part of my master plan to catch back up when the rigs come home. I'm sneaking more hardware onto my account this way...:ROTF: :rofl:)
Bob
OK, since this was bugging me, I cut a bit into my normal Friday "bar-time"....:p:
I took the same 105 WU data sets I had posted previously, added up the total runtime, the total claimed, and the total granted. I then did the math. Note that the WUs report in Boinc points, so I did the final conversion (x7) to WCG points at the end of the strings.
These are all HFCC WUs.
920 rig
Average WU Time 3.779 CPU hours
Claimed points per CPU hour 28.008
Granted points per CPU hour 23.927
Claimed PPD ideal day
BOINC 5,377.471
WCG 37,642.296
Granted PPD ideal day
BOINC 4,594.027
WCG 32,158.186
860 rig
Average WU time 3.854 CPU hours
Claimed points per CPU hour 26.608
Granted points per CPU hour 22.504
Claimed PPD ideal day
BOINC 5,108.75
WCG 35,761.281
Granted PPD ideal day
BOINC 4,320.807
WCG 30,245.647
This WU analysis says that the 920 is 6.3% better when looked at this way.
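In case anyone wants to check the arithmetic: the "ideal day" numbers above are just the per-CPU-hour rates scaled to 192 CPU hours, with the x7 BOINC-to-WCG conversion at the end. A small Python sketch using the 920 rig's figures (helper name is mine):

```python
# WUs report in BOINC credits; 1 BOINC credit = 7 WCG points,
# so the WCG lines above are just 7x the BOINC ones.
BOINC_TO_WCG = 7

def ideal_day(points_per_cpu_hour, threads=8):
    """Theoretical PPD if all 8 threads crunch for a full 24 h day."""
    return points_per_cpu_hour * threads * 24

# 920 rig, from the 105-WU sample above
print(ideal_day(28.008))                 # claimed, BOINC: ~5,377
print(ideal_day(23.927))                 # granted, BOINC: ~4,594
print(ideal_day(23.927) * BOINC_TO_WCG)  # granted, WCG:   ~32,158
```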
I'm still going to give more thought to this. The numbers reported in the raw PPD above are actual production. Isn't that what the machine really did? :confused:
In any case, a few more days of data are probably needed.
Regards,
Bob
Not really, the numbers you posted are only for WUs that were reported and validated during the 24hr period used for stats, not what the machine actually "crunched". Here's an extreme example: a machine crunches through 20 WUs in 24 hours, but at 6pm when the stats are run, there are 6 work units "ready to report" and another 6 were inconclusive and therefore didn't validate. This machine will only get credit for 8 WUs, which is not a true reflection of what the machine crunched that day, because it crunched 20 WUs. Calculating average points per hour of runtime eliminates the above variables and gives you a solid number for comparing 2 machines.
Looking at your "claimed vs granted" numbers, the 2 machines are very close percentage-wise. The 920 overclaims by 14.6%; the 860 overclaims by 15.4%. So it doesn't look like either one is getting disadvantaged more than the other in the quorum.
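(For the record, the overclaim figure here is the share of the claim that gets shaved off in the quorum, i.e. (claimed - granted) / claimed. A two-line Python check against the per-CPU-hour rates quoted earlier:)

```python
def overclaim_pct(claimed_pph, granted_pph):
    """Share of the claimed credit the quorum shaved off, in percent."""
    return (claimed_pph - granted_pph) / claimed_pph * 100

print(f"920: {overclaim_pct(28.008, 23.927):.1f}%")  # 14.6%
print(f"860: {overclaim_pct(26.608, 22.504):.1f}%")  # 15.4%
```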
Here are a couple of questions on my mind: how many sticks of RAM are in each machine? Is HFCC quorum 1 or quorum 2?
I'll answer the last question first, since it might have more bearing than the rest. The 860 rig is running 2x2gig in dual channel. The 920 is running 6 gig, 3x2gig in triple channel. The 920 is running 1800MHz high grade Kingston RAM, the 860 a lower grade Corsair 1600MHz. This never made a difference before, but we are looking for the truth here....IDK. I'm running the timings in the screenies earlier. We've got this weird thing called QPI too.... LMK if something doesn't look right in any of that. I tried to match the rigs.
When we started this test, I switched over to HFCC since it had a quorum of 1. Since we started, it became a quorum of 2.....:mad: So, in my mind, it's no better than HCC in this evaluation. .......And for the record, I'd rather run HCC. That's my project of interest.
Thx for your reply on the math, I think I understand now. Realize on both of these rigs, there are no inconclusive, invalids, errors, "too lates", or any of that. I run my clocks so there are no anomalies. That's why my clocks are lower than most would report, I think....
As to "ready to report" WUs, those would fall into the next day's report; therefore, it is important to average over a longer time, no?
I'm about practical WU production, as I know you are, not "points", so no errors, etc. are allowed. That is lost production in my book and counts against the offending rig. Of the data presented here, I have zero errors, zero invalids, zero "too late", and zero inconclusive WUs.
So, my rigs report only what they have earned. The quorum may make them report late at times, but if I average over days and weeks, does that take out the variance? I think so.
Man, where the hell is Meshmesh when you need him. He had this stuff down cold....
Yo, MM, I know you're reading this. :yepp: Please have Mesh check in here and straighten me out.....:up:
Let's all get this straight, it's a really important point about gauging how our rigs are really doing.....:up:
Regards,
Bob
Hmm.. interesting. Even though I really really REALLY hate math :D
Now where does the difference in PPD lie? Certainly not dual vs. tri channel, I can assure you that. Maybe RAM timings? Unlikely though. Something to do with that QPI link, I'd say. Can you further increase the QPI mult on the 860? Or raise BCLK and lower the multiplier?
For example, I run my 750 at 200 BCLK, which gives 7.2 GT/s bandwidth with the 36x QPI mult (the highest there is for the 750).
I can and will raise the QPI on either rig, if we need to. Right now they are equal. It is clear that this makes a difference based on the initial results I had with the QPI at x16. It scored low there......
But "the how" we count a PPD difference, seems to be the issue? I don't think the 860 is slacking right now, by my criteria of no errors, etc. Real PPD. ???? IDK.
jcool, can you see a difference in the rigs based on the screenies?
Regards,
Bob
Yeah, I can't imagine dual channel vs triple channel making a difference in WCG; same goes for RAM speed and timings. I run my i7 crunchers in dual channel and they seem to produce on par with my main rig with triple. There could be a few % difference, I haven't checked that closely.
Absolutely! I think 2 weeks would be the minimum to eliminate the variables and get a good average. 30 days even better. But since there is a quorum of 2, you are always at the mercy of your wingman but with enough data points, it "should" average out eventually.
Quote:
As to "ready to report" WUs, those would fall into the next day report, therefore, it is important to average over a longer time? No?
The point I was trying to make in my previous post was that it's unfair (at this point in time) to compare the 2 machines in terms of points per calendar day. You have 5 days of stats, which is 120 hours. An i7 will produce 8 days of runtime per calendar day, or 960 hours in 5 days. The 920 is pretty much spot on with 962 hours; the 860 is way over at 1091 hours. The only way this can happen is if work units that were crunched before the test began have been validated during the test. Effectively the 860 has had a head start, so its PPD will be skewed higher if you calculate total points divided by calendar days. Dividing total points by CPU time instead of calendar days eliminates this anomaly.
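The head start is easy to see with a quick sanity check in Python (the hour totals are the ones posted earlier in the thread):

```python
# 8 threads x 24 h x 5 calendar days = the CPU hours a rig running 24/7
# can actually produce in the stats window
expected_hours = 8 * 24 * 5   # 960

print(962.01 / expected_hours)   # 920: ~1.002, spot on
print(1091.7 / expected_hours)   # 860: ~1.137, ~14% "extra" validated work
```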
Okey Dokey, I'll leave the rigs alone at 3.7 gig and let them keep piling up data. I do think the leftover HCC units may have been skewing the results a bit. Those are mostly gone now so I'll start with this week's data when it comes in. I only have four HCC units left in the 860 rig's pending validation queue. :up:
What is a little curious to me is that I have a page and a half of HFCC pending val on the 920 rig, and only one WU pending val for the 860 rig right now. I'll keep an eye on that too.
Thx for the help sorting this out.:clap:
Bob
I might be able to help out a little this week.
I got my X3440 in along with mobo. I do not have a good cooler yet, but might be able to get something to work, will have to see.
I have 2 different sets of identical RAM. I will run these in the Lynnfield and the 920. I will get the same OC on both. This should keep any variables out, except the fact that the Lynnfield might require more juice to get the same OC due to on-chip PCI-e.
Anyway, I do not have a Kill A Watt, but will try to help out as much as I can. I am curious to see if I wasted my time going Lynnfield or not. I hope not, as I could have built another 920 rig for the same price or a few $ more. Oh well, trial and error, right:ROTF:
I should have the Lynnfield up by tonight and might have to readjust one of my 920s to match. Will then try to compare.
I'm sure we all would like to know how it clocks. You may have found a gold nugget there...
Bob
I know you folks don't know me well so if you disagree please just let me know and I'll move along :-)
I think a quicker and more accurate way to create direct comparisons would be to process the same exact WUs (shut BOINC down, copy and paste the BOINC data folder, suspend network activity) and then look at total runtime for the entire set. The set would only have to be the max WUs that could be processed in 12 hours, because beyond that would not tell us anything different from a statistical perspective.
This eliminates all the caveats when trying to use points as a comparative basis (each WU set is completely different, wingmen, WCG changing target runtime length, when was the last time you ran BOINC benchies, the list goes on). It then becomes very easy to identify changes that produce significant improvements ... BCLK, vCore, QPI, RAM amount, bandwidth, timings, HD vs. SSD, graphics, power usage 110 vac vs 220 vac. You can add any data element you want and as long as you continue to use the same WU set you will have directly comparable results. I know there is a lot of conventional wisdom on some of these elements so we could skip testing those ones (or do a quick sanity check to make sure) and move on to those we do not already have a handle on.
Now someone will come along and ask if we can swap WU between different OS, the answer is no and that is where we have to handle the averages as best we can. First I would suggest that comparisons be done within compatible architectures. Instead of trial and error I could email WCG as they already have these groups defined. Once there is a *best* consensus within a group we could start to compare runtimes from X number of WUs between groups for the same subproject ... I am still trying to stay away from points because of all the issues mentioned. We certainly will need to refine this strategy because the RICE and CMD2 subprojects have soft and hard stops based on time limits if you have not finished the WU. We can figure out those deets when we get there.
If there is interest I will start to lay out what elements we want to test and create some form of data structure (Excel or maybe some free SQL tool). I think this will not only provide results we can all benefit from right away but also position us with a testing methodology to quickly evaluate new platforms, OS, crunching efficiency theories, etc.
All that being said, thanks for your patience, and is there a configuration you would like me to test on my i7 920 C0 (currently at 19*207) and are there any suggestions on a cheap and accurate multimeter?
I know the original focus is on power efficiency, but for those of us not running farms and/or not real concerned about power costs, I would suggest that a 1% improvement in runtime is significant but also at the bottom limit of what we should be looking for. Roughly, for an i7:
8 cores * 24 = 192 hours
192 * 1% = 1.92 hours
on average you would turn one extra WU every two days giving an ROI of just under 1 week for each 12 hour test that improves runtime by 1% ... yes, I'm convinced this is a good low end target.
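Spelling out that back-of-the-envelope math in Python (the ~3.8 CPU-hour average WU time is taken from the per-WU data earlier in the thread; variable names are just illustrative):

```python
cpu_hours_per_day = 8 * 24                 # 192 on an 8-thread i7
gain_per_day = cpu_hours_per_day * 0.01    # a 1% runtime win = 1.92 CPU h/day
avg_wu_hours = 3.8                         # rough HFCC average from the WU data

days_per_extra_wu = avg_wu_hours / gain_per_day
print(days_per_extra_wu)   # ~2: one extra WU roughly every two days
```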
The 860 stats from Sept 28-30 are what is throwing things off. There's well over 8 days of CPU time on each of those days, which is normal when switching from quorum 2 to quorum 1 projects. Now that the HCC WUs are gone, a clearer picture for the 860 will emerge. The PV queue for the 920 will throw off next week's stats though, so maybe we'll look at the average starting on the 1st and run the test through the 15th or something. Looking forward to next week's stats! :up:
You're absolutely right. That would give a 100% accurate picture of WCG processing power. It would involve a bit of planning and taking both machines offline for a day or so (which no cruncher likes to do! :) ) to run the batch of WUs, but would provide a clear comparison.
Quote:
I think a quicker and more accurate way to create direct comparisons would be to process the same exact WUs (shut BOINC down, copy and paste BOINC data folder, suspend network activity) and then look at total runtime for the entire set. The set would only have to be the max WUs that could be processes in 12 hours because beyond that would not tell us anything different from a statistical perspective.
I need a new HSF and quick. I was running the X3440 on the stock cooler, 3.8GHz @ 1.3V. Normal op temps were in the low 40s; ran WCG and they skyrocketed to 90-95C. I put it at 3.5GHz @ 1.2V on the stock HSF; load temps 80C. I cannot believe they provided such a crappy HSF for this CPU :mad: :shakes:
I really do not want to spend too much. Looks like the Coolermaster at $30 might be the best HSF for the cost, but I do not know that it would provide adequate cooling. Next in line would be a Megahalem for $65 plus shipping, or a Noctua for 65-70 plus shipping. So it's either go cheap or double the price for about the best air cooler atm.
I would get another TRUE if they would ever release the bolt-through kit :shakes:. I guess when I finally decide to go H2O, I will have some spare coolers:ROTF:
I am leaning towards the Megahalem, due to the complexity of the mounting for the Coolermaster, plus I want maximum performance. I also hear that the Megahalems have a better machined base than the TRUE and the mounting is far superior. Guess I will order one. Need to find another PSU too, as I am collecting too many GPUs:rofl:
Looks like I might not be able to help till I get adequate cooling; maybe I should run stock 860 clocks and see how it does:shrug:
If it's any help, I have a Megahalem on my i7 920 (1366) and at 1.26 Vcore it's at 65C (max) load with two Xigmatek fans.
Good idea! So we just run a batch of X WUs and check the completion times right?
If so, make sure to include enough WUs in the process so that highly multithreaded machines won't be half idle. Which presents a problem.. 24 WUs would be a good number for SP rigs, it's divisible by 8 (Core i5/i7), by 4 (i5 750 or older Quads), and by 12 (Gulftown). However, my Dual Gainestown as well as the Quad Opteron (16 Threads) would have an "idle problem" there.
So I'd suggest 24 "test"-WUs and just leaving the multi sockets out? Since most people crunch on single sockets.
Some thoughts on WCG benching - HTH.
Fallwind's "BOINC data snapshot" method is definitely the most accurate way of comparing crunching speeds. I did this a while back to compare speeds under XP-32 vs under XP-64 + 64-bit client, on the one QX9650 machine. (64-bit was about 0.5% faster, for HCC).
So you definitely can run the same set of data on 2 versions of Windoze on the same machine.
However, Linux BOINC data files are not compatible with Windoze ones, so you can't compare across these OSes. I don't know about Linux vs Mac OSX (both *nix).
I don't know whether you can run the same data on different machines with same OS. I know the BOINC system is fussy about device IDs, and won't let you crunch, upload and claim WUs from the wrong device, so you can't do "sneakernet", but it may let you just run foreign WUs. (I forget whether, in my win-32/Win-64 comparo, I had to give the machine the same LAN device ID)
Remember NOT to let BOINC try to upload the same set of results more than once. Run tests with network activity disabled.
To make results comparable, whether using the data snapshot method or credits awarded/hr, you need to be running the same mix of WCG projects, especially with HT (critical). Even without HT, different projects need different amounts of main memory access, and hence get different amounts of memory bus contention. Example: On my A64X2 (512kb cache/core), PerfMonitor 1.0.0.8 shows that HCMD2 gets 99.6% L2 cache success rate, while faah gets around 87%.
Easiest is to run just 1 project.
If wou want the most accurate comparisons without using the data snapshot method, and a minimum number of WUs crunched, I suggest:
* Pick a project with a quorum of 1. If HFCC has moved to Q2, I guess that leaves faah.
If that's unsuitable, pick a project with minimum variation in run-times. HCC scores here, but you'll need a larger sample size to average out the wingmen.
* If using a quorum of 1, look in your Results Status pages (not Device Stats), and pick out only those valid results that had a quorum of 1, ie don't count check-WUs or repair-WUs where a wingman got Invalid. That eliminates the wingmen.
Then calculate average Credits Awarded/hr.
There is some variation in CA for faah WUs of the same length, and I suspect that longer faah WUs get higher CA/hr.
There is probably variation in CA/hr between WU batches, too. See Sekerob's Charts.
PS 1: To avoid confusion when talking about BOINC or WCG "points", note that BOINC calls them "credits", so I suggest you use these names too, ie 1 BOINC "credit" = 7 WCG "points".
PS 2: I was hoping that i7-860 would be a clear leader in energy/credit, but it seems to need more volts than 920 @ same clock, and crunches more slowly. I'll stay with LGA775 until 32nm.
PS 3: Thanks for all of your work, as well as being first kids on the block to risk forking out $$ for Lynnfields.
This has come up a couple of times so a quick recap on Zero Redundancy (ZR) quorum:
ZR is a misleading term because for every 10 WUs that WCG sends out for these subprojects (currently HFCC and FAAH but it was also for DDDT and FLU) they will send out a second copy so it is not really zero redundancy! They do this to make sure everything is on the up and up. If you have any errored WUs on any subproject they will start sending you 2 quorum WUs for your ZR subprojects until you can prove that you are reliable again.
To see what quorum a subproject uses there is a skinny little column about in the middle of this chart ... http://www.worldcommunitygrid.org/fo...d?thread=17855