just found the spi part...again ~18-19C ambient, 1M and 32M.
1M
32M
You said to post error messages, here is one, booted at 4.76, used set fsb to go to 4.85, ran spi 1m 3x in a row, same error message each time.
Right, it's been a while since I mentioned that the program doesn't like it when the FSB or bclk is messed with.
It's a side-effect of the anti-cheat protection.
Part of the protection is specifically targeted to help guard against the "time-slowing" cheats in this thread:
http://www.xtremesystems.org/forums/...ad.php?t=46926
Those cheats are obviously a highly guarded secret, but one of the mods a while ago was gracious enough to show me one so that I can "try" to counter it.
So no, it's not a hardware error. If you tried booting at 4.85, it should be fine. I do realize this is more than a minor inconvenience, but I have no ideas on an alternate approach.
And I AM curious as to how SuperPi XS Mod 1.5 is immune to the cheat.
For the next version, I guess I should change the error message for this particular check to
"Abnormal Frequency Measurement - Unable to Validate Benchmark
Note that modifying your Bus Speed is known to cause this error.
If you used SetFSB or a similar tool to get to this frequency, try rebooting at this speed."
EDIT: And nice records btw
And new world speed record for 1M among any computer. (using a publicly available program*)
*My latest build caps the % status printing to once per second, so less time is spent printing.... lol
So I actually have a slightly faster time than that @ only 4.2 GHz (0.326654 secs)
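For anyone curious what "capping the % status printing to once per second" could look like, here is a minimal sketch in Python. It is not the program's actual code; the class name `ThrottledProgress` and its parameters are made up for illustration. The idea is simply to skip the print unless enough time has passed since the last one.

```python
import time


class ThrottledProgress:
    """Print a percentage at most once per `min_interval` seconds.

    Hypothetical helper for illustration only; not taken from y-cruncher.
    """

    def __init__(self, min_interval=1.0, clock=time.monotonic, emit=print):
        self.min_interval = min_interval
        self.clock = clock      # injectable so the behaviour can be tested
        self.emit = emit        # injectable output function
        self._last = None       # time of the last line actually printed

    def update(self, percent):
        """Report progress; returns True only if a line was printed."""
        now = self.clock()
        if self._last is None or now - self._last >= self.min_interval:
            self.emit(f"{percent:6.2f}%")
            self._last = now
            return True
        return False
```

In a tight loop you would call `update()` as often as you like; only about one call per second actually prints, so the time spent on console output stays small on short runs like 1M.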
Last edited by poke349; 09-07-2009 at 11:09 AM.
Main Machine:
AMD FX8350 @ stock --- 16 GB DDR3 @ 1333 MHz --- Asus M5A99FX Pro R2.0 --- 2.0 TB Seagate
Miscellaneous Workstations for Code-Testing:
Intel Core i7 4770K @ 4.0 GHz --- 32 GB DDR3 @ 1866 MHz --- Asus Z87-Plus --- 1.5 TB (boot) --- 4 x 1 TB + 4 x 2 TB (swap)
and another one @ 100m
96.384 - v0.4.1 x64 SSE3 - El Greco - AMD Phenom II X3 720 @ 3.5 GHz
now 83.5
"friends don't let friends run RAID-0"
Just a little heads up for v0.4.3
Automatic version detection.
As far as optimizations go...
I mentioned earlier in the summer that I had been playing with the Intel Compiler. (since enough people had suggested that I make the switch)
It took a bit of tweaking, but I finally got the Intel Compiler to produce a worthwhile speedup over Visual Studio for all SSE versions.
So thanks for that suggestion everybody.
I'll probably be complementing that with a couple of algorithmic improvements sometime in the next few weeks.
Version v0.4.2.
25,000,000 digits after OC to 3.7Ghz
Nice, haven't seen too many dual cores lately...
I suppose most of the Pi benchers are still running C2Ds, but were scared away by all the quad-core and multi-socket ownage in this thread.
Aside from that:
Now about those SetFSB and validation errors.
I've noticed that some people were mistaking sanity errors for hardware errors.
So to clear it up, Sanity Check Errors are NOT computation errors. They pop up when the program detects some abnormal situations such as clock-tampering, system speed tampering...
The (unwanted) side-effect is that SetFSB and similar tools will also trigger the error.
Different versions of the program have different levels of sensitivity. (I've been re-tuning it between versions to find a good balance between being able to catch cheats vs. minimizing false-positives.)
I've changed the error message in v0.4.3 to hopefully clear up some of the confusion... since few people will read this post after it gets buried.
Boot up Frequency: 167 MHz bclk
TurboV to: 168 MHz bclk
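To illustrate the sensitivity trade-off, here is a rough sketch in Python. It assumes the check boils down to comparing a reference frequency against a later measurement with some tolerance; the real check is not public, so the function names and the tolerance values below are hypothetical.

```python
def frequency_drift(boot_mhz, measured_mhz):
    """Relative change between the boot-time and the measured frequency."""
    return abs(measured_mhz - boot_mhz) / boot_mhz


def is_abnormal(boot_mhz, measured_mhz, tolerance):
    """Flag the run if the drift exceeds the tolerance.

    `tolerance` is an assumed tuning knob: smaller values catch more
    tampering but also produce more false positives.
    """
    return frequency_drift(boot_mhz, measured_mhz) > tolerance


# The 167 MHz -> 168 MHz bclk change above is a drift of about 0.6%.
drift = frequency_drift(167.0, 168.0)
strict = is_abnormal(167.0, 168.0, tolerance=0.005)   # 0.5% tolerance
relaxed = is_abnormal(167.0, 168.0, tolerance=0.010)  # 1.0% tolerance
```

With the strict setting a 1 MHz bclk bump already trips the check, while the relaxed setting lets it through. That is the balance being re-tuned between versions.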
*More changes to come... More optimizations...
Not like anyone cares about x86 anymore (I don't either), but I've re-tuned the multi-threading settings in the x86 binaries and now they put a much better load on the cores.
So x86 is a lot faster now. (almost as fast as x64 in some cases)
If you haven't already noticed from the screenies, there's support for SSE4.1.
(I know... I'm not being fair to AMD... But it was a little too tempting. Bulldozer will solve that.)
I've also added checksums to batch benchmarks.
Last edited by poke349; 09-23-2009 at 10:43 PM.
I will add more Pi benchmarks, but first I need to buy a new mobo... The BIOS on my EVGA nForce 650i Ultra board died after this OC.
I will add new results soon
Sorry for my english.
Sorry to hear that. Is that the second mobo that has fried on this thread?
The program can be more stressful to the system than prime95 so I hope that everyone will keep that in mind when pushing the limit with this program.
*And your english is fine. I didn't notice anything until you mentioned it.
Now for some good news...
As of Friday afternoon, I finished the last of the optimizations that I had planned for v0.4.3 and now I've locked it down for beta-testing...
Spent much of yesterday night and this morning running benchmarks on all the computers I had easy access to.
And I've updated the first post with the results...
Several things:
First, I want to make it clear that I'm not being fair (nor am I trying to be fair) to AMD by supporting SSE4.1. I optimized for whatever machines I had access to.
Second... The speedups in this version are rather astounding - especially for x86 and Core i7.
Single-threaded x86 (no SSE) is now comparable to PiFast 4.3.
Throw in SSE3 and it beats PiFast 4.3.
The specially optimized binary for Core i7 is amazingly fast compared to v0.4.2 x64 SSE3...
Just look at the results on the first post.
This took me completely by surprise when I finally ran the numbers.
So it's safe to say that all results obtained with v0.4.3 are NOT comparable to v0.4.2 and earlier...
Third... Batch mode now has checksums that are "fully compatible" with the regular benchmark checksums. When v0.4.3 is released, feel free to enter results from either normal or batch modes.
And for that matter, batch mode also gives you thread-control so you can reduce the # of threads for those smaller benchmarks where too many threads will hurt.
sweet. so the core i7 gets special treatment.
now you just need to implement this through openCL.
Not just Core i7.
There will be 5 binaries in this version.
x86
x86 SSE3
x64 SSE3
x64 SSE4.1 ~ Ushio (tuned for my LanBox)
x64 SSE4.1 ~ Nagisa (tuned for my workstation)
So two SSE4.1 versions tuned for Core i7 and Harpertown.
x64 SSE3 has been re-tuned for a smaller cache than the previous versions.
This will help out AMD chips and any non-12MB cache Core 2 Quads.
Prior to v0.4.3, x64 and x64 SSE3 were both tuned for 12MB cache... (for my workstation)...
Turns out that this was the culprit that was hurting virtually all other processors including all AMD - since nothing else had 3MB cache/thread... And not surprisingly, it hurt i7 the most.
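The cache-per-thread arithmetic behind that statement can be sketched as follows. The 12 MB / 4 threads figure comes from the post; the Core i7 920 numbers (8 MB shared L3, 8 hardware threads) are my own assumption and are labeled as such in the code.

```python
def cache_per_thread_mb(cache_mb, threads):
    """Cache available to each thread if it is shared evenly."""
    return cache_mb / threads


# From the post: a 12 MB part works out to 3 MB per thread (4 threads).
tuned_for = cache_per_thread_mb(12, 4)

# Assumption (not from the thread): Core i7 920 with 8 MB L3 and 8 threads.
core_i7_920 = cache_per_thread_mb(8, 8)
```

Under those assumptions the i7 gets about a third of the per-thread cache that the old tuning expected, which would fit with it being hurt the most.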
EDIT 1: The two SSE4.1 versions are fully compatible with each other and should theoretically run on Bulldozer as well. The only difference between them is the tuning.
EDIT 2:
The speedup via SSE4.1 is very small (a fraction of a %). So non-12MB Yorkfields will use x64 SSE3 instead of x64 SSE4.1 ~ Nagisa because of the more favorable tuning.
And it is possible to override whatever the auto-selector chooses.
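As a rough picture of what the auto-selector has to decide, here is a sketch in Python. The five binary names are from this post; the selection rules are only inferred from what is written here (i7 gets Ushio, 12MB-cache Harpertown/Yorkfield gets Nagisa, other SSE4.1 chips fall back to x64 SSE3), and `CpuInfo`, `choose_binary`, and their fields are hypothetical names, not the launcher's real interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuInfo:
    """Hypothetical summary of what a launcher might detect."""
    is_x64: bool
    has_sse3: bool
    has_sse41: bool
    is_core_i7: bool = False
    cache_mb: int = 0


def choose_binary(cpu: CpuInfo, override: Optional[str] = None) -> str:
    """Pick one of the five v0.4.3 binaries (rules are an assumption)."""
    if override is not None:
        return override                   # the user can always override
    if cpu.is_x64 and cpu.has_sse41:
        if cpu.is_core_i7:
            return "x64 SSE4.1 ~ Ushio"   # tuned for Core i7
        if cpu.cache_mb >= 12:
            return "x64 SSE4.1 ~ Nagisa"  # tuned for 12MB-cache parts
    if cpu.is_x64 and cpu.has_sse3:
        return "x64 SSE3"                 # re-tuned for smaller caches
    if cpu.has_sse3:
        return "x86 SSE3"
    return "x86"
```

For example, `choose_binary(CpuInfo(True, True, True, cache_mb=6))` would land on `x64 SSE3`, matching the non-12MB Yorkfield case described above.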
Last edited by poke349; 09-26-2009 at 04:36 PM.
Here's a few screenies of v0.4.3 on my three fastest machines: (click to enlarge)
And here's something VERY interesting that I totally wasn't expecting to see...
Temperatures!!!
prime95 - small FFTs:
69/63/67/64
IntelBurnTest - 4096MB + 8 threads:
68/62/66/63
y-cruncher v0.4.3 - x64 SSE4.1 stress test-7GB
70/65/69/65
I don't recall v0.4.2 running hotter than prime95...
Did those optimizations really make things a bit more toasty?
Screenies here: (click to enlarge)
Temps on Core 2 were evenly matched between prime95 and y-cruncher. LinPack still owns @ several degrees hotter.
Anyways... I promise that v0.4.3 won't be beta-testing for 2 months like v0.4.1 was...
Assuming no problems, I might release it in a week.
Is it possible to get speed-up from running this on a GPU? It seems that this algorithm lends itself well to being parallelized efficiently and if each computation is independent from one another then it will work well on a GPU. But it seems this program requires a large amount of memory and I don't know if the GPU can access CPU memory.
Have you looked into this at all?
Oh hey! If I'm not mistaken, you were one of the first to post a benchmark (but it was lost due to the rollback).
"It seems that this algorithm lends itself well to being parallelized efficiently and if each computation is independent from one another then it will work well on a GPU"
This is partly true. Although the algorithm is far from perfectly parallelizable, it should (in theory) be good enough to scale well into hundreds of cores.
But, there is one big problem:
GPUs have very poor support for double-precision floating-point - which is what this program relies on. A GPU crunching DP-FP isn't much better than a CPU. There is no way to efficiently use single-precision FP instead.
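A small illustration of why single precision is a poor substitute: a 32-bit float carries a 24-bit significand versus 53 bits for a double, so it cannot even hold every integer above 2^24 exactly. The sketch below only demonstrates that precision gap using Python's `struct` module; it says nothing about how the program itself uses floating point internally.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = float(2**24 + 1)          # 16,777,217: exactly representable as a double
as_single = to_float32(value)     # rounds to 16,777,216 in single precision
lost = value - as_single          # the integer is off by one
```

Any arithmetic that needs exact integer results from floating-point products runs out of room much sooner with 24 bits than with 53, so each single-precision operation can carry far less of the number.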
"But it seems this program requires a large amount of memory"
Very true.
The larger the computation, the better it scales.
When you have many cores, the minimum computation size needed to achieve decent multi-core scaling is MASSIVE - which means... it'll need a LOT of memory.
Also, the speed of the program is nearing the point where the minimum computation size that will make a "sufficiently long" benchmark is more than the total memory in the average computer.
(Right now, you can't do a ram-only Pi benchmark that lasts more than 10 minutes on an i7 machine with 6GB of ram.)
"and I don't know if the GPU can access CPU memory"
It can, but it must go through the PCIe bus - a sure bottleneck.
Bandwidth is already a problem on Core 2. A GPU would be faster computationally, but PCIe bandwidth will probably kill it.
"Have you looked into this at all?"
Pretty much the moment I got it to scale on my mom's Q6600 (back in December 2008...).
But after a bit of research I decided that it isn't the time yet.
The hardware isn't ready. (poor DP-FP, not enough memory, not enough bandwidth, no set programming standard)
The algorithm isn't ready.
After I got my Harpertown rig up and running with 8-cores and 64GB ram, I started to notice some weaknesses in the algorithm on large computations... Bad enough to prevent scaling into hundreds of threads.
As of now, these problems have only been partially solved.
And "I'm" not ready either...
I have no experience with CUDA or OpenCL...
Also, I'm still an undergrad... So wth do I know about parallel programming? lol
Hi poke, this is as far as I've got with 1B and 1G digits.
Vista HP64
500,000,000 digits
248.709 - v0.4.2 x64 SSE3 - cheapseats - Intel Core i7 920 @ 4.294 GHz (204.5x21) - 6 GB DDR3
1,000,000,000 digits
548.419 - v0.4.2 x64 SSE3 - cheapseats - Intel Core i7 920 @ 4.294 GHz (204.5x21) - 6 GB DDR3
SuperPi-size 1G digits
591.198 - v0.4.2 x64 SSE3 - cheapseats - Intel Core i7 920 @ 4.294 GHz (204.5x21) - 6 GB DDR3
Code:
y-cruncher v0.4.2 Build 7438 ( www.numberworld.org )
Copyright 2008-2009 Alexander J. Yee ( a-yee@northwestern.edu )
Distribute Freely - Please Report any Bugs
Version: x64 SSE3

  0   Benchmark Pi
  1   Validate a Pi Benchmark
  2   Batch Benchmark Pi (run multiple benchmarks)
  3   Stress Test (beta)
  4   Custom Compute a Constant
        - Compute other constants (e, Golden Ratio, etc...)
        - Choose your own settings
  5   Digit Viewer (view digits from .txt and .ycd files)
  6   Compare Digits (compare digits from different runs)
  7   About
  8   A Word of Warning...

Enter your choice:
option = 0

Benchmark Pi:
Select a Benchmark Type:
  0   Single-Threaded
  1   Multi-Threaded
option = 1

Select a Benchmark Size:
Option   Decimal Digits     Approx. Memory Needed
  1      25,000,000         117 MB
  2      50,000,000         253 MB
  3      100,000,000        458 MB
  4      250,000,000        1.19 GB
  5      500,000,000        2.39 GB
  6      1,000,000,000      4.79 GB
  7      2,500,000,000      11.5 GB
  8      5,000,000,000      23.0 GB
  9      10,000,000,000     46.0 GB
  10     25,000,000,000     116 GB
  11     50,000,000,000     250 GB
  12     100,000,000,000    467 GB
  0      I prefer SuperPi sizes... (1M, 2M, 4M...)
option = 5

Threads = 8
Allocating and Reserving Memory... 2.39 GB
Constructing FFT lookup tables...

Compute: Pi
Decimal Digits : 500,000,000
Hexadecimal Digits: 415,241,012
Mode: Ram Only

Begin Computation:
Computing: Pi
Algorithm: Chudnovsky Formula
Summing Series: 35,256,838 terms
Time: 183.983 seconds ( 0.051 hours )
InvSqrt...
Time: 6.479 seconds ( 0.002 hours )
Final Multiply...
Time: 3.589 seconds ( 0.001 hours )
Compute Pi Time: 194.061 seconds ( 0.054 hours )
Constructing Base Conversion Table:
Time: 9.623 seconds ( 0.003 hours )
Base Converting (Primary Cutting Parameters):
Time: 44.989 seconds ( 0.012 hours )
Writing Decimal Digits:
500,000,001 digits written

Total Computation Time: 248.709 seconds ( 0.069 hours )
Total Time (including writing digits): 258.591 seconds ( 0.072 hours )

Benchmark Successful. The digits appear to be OK.

Program Version: 0.4.2 Build 7438 (x64 SSE3)
Processor(s): Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
CPU Frequency: 4,089,823,801 Hz (frequency may be inaccurate)
Thread(s): 8
Digits: 500,000,000
Total Time: 248.709 seconds
Checksum: 928615c626ff9b210085a5e9d5f7efbd

Press any key to continue . . .
--------------------------------------
y-cruncher v0.4.2 Build 7438 ( www.numberworld.org )
Copyright 2008-2009 Alexander J. Yee ( a-yee@northwestern.edu )
Distribute Freely - Please Report any Bugs
Version: x64 SSE3

  0   Benchmark Pi
  1   Validate a Pi Benchmark
  2   Batch Benchmark Pi (run multiple benchmarks)
  3   Stress Test (beta)
  4   Custom Compute a Constant
        - Compute other constants (e, Golden Ratio, etc...)
        - Choose your own settings
  5   Digit Viewer (view digits from .txt and .ycd files)
  6   Compare Digits (compare digits from different runs)
  7   About
  8   A Word of Warning...

Enter your choice:
option = 0

Benchmark Pi:
Select a Benchmark Type:
  0   Single-Threaded
  1   Multi-Threaded
option = 1

Select a Benchmark Size:
Option   Decimal Digits     Approx. Memory Needed
  1      25,000,000         117 MB
  2      50,000,000         253 MB
  3      100,000,000        458 MB
  4      250,000,000        1.19 GB
  5      500,000,000        2.39 GB
  6      1,000,000,000      4.79 GB
  7      2,500,000,000      11.5 GB
  8      5,000,000,000      23.0 GB
  9      10,000,000,000     46.0 GB
  10     25,000,000,000     116 GB
  11     50,000,000,000     250 GB
  12     100,000,000,000    467 GB
  0      I prefer SuperPi sizes... (1M, 2M, 4M...)
option = 6

Threads = 8
Allocating and Reserving Memory... 4.79 GB
Constructing FFT lookup tables...

Compute: Pi
Decimal Digits : 1,000,000,000
Hexadecimal Digits: 830,482,024
Mode: Ram Only

Begin Computation:
Computing: Pi
Algorithm: Chudnovsky Formula
Summing Series: 70,513,673 terms
Time: 410.583 seconds ( 0.114 hours )
InvSqrt...
Time: 13.291 seconds ( 0.004 hours )
Final Multiply...
Time: 7.488 seconds ( 0.002 hours )
Compute Pi Time: 431.371 seconds ( 0.120 hours )
Constructing Base Conversion Table:
Time: 19.610 seconds ( 0.005 hours )
Base Converting (Primary Cutting Parameters):
Time: 97.374 seconds ( 0.027 hours )
Writing Decimal Digits:
1,000,000,001 digits written

Total Computation Time: 548.419 seconds ( 0.152 hours )
Total Time (including writing digits): 568.351 seconds ( 0.158 hours )

Benchmark Successful. The digits appear to be OK.

Program Version: 0.4.2 Build 7438 (x64 SSE3)
Processor(s): Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
CPU Frequency: 4,089,857,858 Hz (frequency may be inaccurate)
Thread(s): 8
Digits: 1,000,000,000
Total Time: 548.419 seconds
Checksum: de9afc254d422ce2181755d0eae3b9fd

Press any key to continue . . .
-----------------------------------------
y-cruncher v0.4.2 Build 7438 ( www.numberworld.org )
Copyright 2008-2009 Alexander J. Yee ( a-yee@northwestern.edu )
Distribute Freely - Please Report any Bugs
Version: x64 SSE3

  0   Benchmark Pi
  1   Validate a Pi Benchmark
  2   Batch Benchmark Pi (run multiple benchmarks)
  3   Stress Test (beta)
  4   Custom Compute a Constant
        - Compute other constants (e, Golden Ratio, etc...)
        - Choose your own settings
  5   Digit Viewer (view digits from .txt and .ycd files)
  6   Compare Digits (compare digits from different runs)
  7   About
  8   A Word of Warning...

Enter your choice:
option = 0

Benchmark Pi:
Select a Benchmark Type:
  0   Single-Threaded
  1   Multi-Threaded
option = 1

Select a Benchmark Size:
Option   Decimal Digits     Approx. Memory Needed
  1      25,000,000         117 MB
  2      50,000,000         253 MB
  3      100,000,000        458 MB
  4      250,000,000        1.19 GB
  5      500,000,000        2.39 GB
  6      1,000,000,000      4.79 GB
  7      2,500,000,000      11.5 GB
  8      5,000,000,000      23.0 GB
  9      10,000,000,000     46.0 GB
  10     25,000,000,000     116 GB
  11     50,000,000,000     250 GB
  12     100,000,000,000    467 GB
  0      I prefer SuperPi sizes... (1M, 2M, 4M...)
option = 0

Option   Size     Decimal Digits     Approx. Memory Needed
  20     1 M   -  1,048,576          10.8 MB
  21     2 M   -  2,097,152          13.8 MB
  22     4 M   -  4,194,304          22.5 MB
  23     8 M   -  8,388,608          44.1 MB
  24     16 M  -  16,777,216         84.3 MB
  25     32 M  -  33,554,432         160 MB
  26     64 M  -  67,108,864         332 MB
  27     128 M -  134,217,728        631 MB
  28     256 M -  268,435,456        1.29 GB
  29     512 M -  536,870,912        2.46 GB
  30     1 G   -  1,073,741,824      5.21 GB
  31     2 G   -  2,147,483,648      9.83 GB
  32     4 G   -  4,294,967,296      20.9 GB
  33     8 G   -  8,589,934,592      39.3 GB
  34     16 G  -  17,179,869,184     84.1 GB
  35     32 G  -  34,359,738,368     157 GB
  36     64 G  -  68,719,476,736     338 GB
  37     128 G -  137,438,953,472    629 GB
option = 30

Threads = 8
Allocating and Reserving Memory... 5.21 GB
Constructing FFT lookup tables...

Compute: Pi
Decimal Digits : 1,073,741,824
Hexadecimal Digits: 891,723,283
Mode: Ram Only

Begin Computation:
Computing: Pi
Algorithm: Chudnovsky Formula
Summing Series: 75,713,479 terms
Time: 444.407 seconds ( 0.123 hours )
InvSqrt...
Time: 13.980 seconds ( 0.004 hours )
Final Multiply...
Time: 7.747 seconds ( 0.002 hours )
Compute Pi Time: 466.143 seconds ( 0.129 hours )
Constructing Base Conversion Table:
Time: 21.508 seconds ( 0.006 hours )
Base Converting (Primary Cutting Parameters):
Time: 103.475 seconds ( 0.029 hours )
Writing Decimal Digits:
1,073,741,825 digits written

Total Computation Time: 591.198 seconds ( 0.164 hours )
Total Time (including writing digits): 612.818 seconds ( 0.170 hours )

Benchmark Successful. The digits appear to be OK.

Program Version: 0.4.2 Build 7438 (x64 SSE3)
Processor(s): Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
CPU Frequency: 4,089,825,417 Hz (frequency may be inaccurate)
Thread(s): 8
Digits: 1,073,741,824
Total Time: 591.198 seconds
Checksum: a86ee7dcafada01318a8a3ae166500f4

Press any key to continue . . .
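The "Summing Series: N terms" lines in the logs above line up with the well-known convergence rate of the Chudnovsky formula, where each term contributes roughly log10(640320^3 / 1728) = log10(53360^3), about 14.18 decimal digits. A quick sketch to check that against the three runs (the helper name `estimated_terms` is mine, and the few extra terms the program reports are presumably a safety margin, which is a guess):

```python
import math

# Digits gained per term of the Chudnovsky series (asymptotic rate).
DIGITS_PER_TERM = math.log10(53360 ** 3)   # same as log10(640320**3 / 1728)


def estimated_terms(decimal_digits):
    """Approximate number of series terms needed for the given digit count."""
    return decimal_digits / DIGITS_PER_TERM


# (digits requested, terms reported in the logs above)
reported = [
    (500_000_000, 35_256_838),
    (1_000_000_000, 70_513_673),
    (1_073_741_824, 75_713_479),
]
differences = [terms - estimated_terms(digits) for digits, terms in reported]
```

In each case the reported term count sits only a handful of terms above the estimate.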
Maximus V GENE [0086] :: 3770K L212B244 :: H70 :: EB3103A (PSC)
CL|WCL|RTL performance (SB) : 32M scaling charts : PSC WCL > CL performance bug
Nice results... Always nice to see large runs.
----------------------------------------------------------------
Anyways... Seeing as how I let a few bugs get through into v0.4.1 (even after 2 months of beta-testing...), I'll just make v0.4.3 a public beta.
Version 0.4.3 is Out!!!
Have fun with it. No more boring white output...
Feel free to override whatever the launcher selects. (Just run the binaries directly.)
Depending on your computation size, the version that the launcher chooses isn't always the best.
For example:
On Core i7, 1M is fastest using x64 SSE4.1 ~ Nagisa. (The launcher chooses x64 SSE4.1 ~ Ushio)
On dual-Harpertown (and probably 12MB Yorkfield as well), 25m is fastest using x64 SSE4.1 ~ Ushio. (The launcher chooses x64 SSE4.1 ~ Nagisa.)
The binaries are tuned using LARGE computations. So the launcher should be able to nail the best binary for computations larger than 250m.
But for small computations, anything goes. Take your pick from 5 binaries.
Last edited by poke349; 09-29-2009 at 07:48 PM.
Today is a nice day! I have an EVGA nForce 780i SLI Premium and I'm going to OC my C2D to 4 GHz xD!
But this time I won't destroy the mobo! xD
Nothing special today... I'm scared of destroying the new mobo xD.
Version v0.4.2.
50,000,000 digits after OC to 3.7 GHz
This time I didn't risk it xD
Last edited by dranzi666; 10-02-2009 at 05:24 AM. Reason: image link dead
It's CL7-7-7-20.
It's a bug that shows up after waking from sleep.
Last edited by cstkl1; 10-02-2009 at 01:03 AM.
I found the magic voltage xD and this time the system is stable after the OC, so I could run more benchmarks.
New results:
Have fun ! xD
xD
Last edited by dranzi666; 10-02-2009 at 09:07 AM. Reason: update new program version benchmark xD
CPU: Intel Core 2 Duo E6750 2.66 GHz 4 MB L2 (65nm)
MoBo #1: EVGA nForce 650i Ultra (dead)
MoBo #2: EVGA nForce 780i SLI Premium
GPU: GeForce 8600 GTS 512 MB
RAM: 2 x DDR2 1 GB PDP Patriot 1000 mhz CL 5
HDD: Seagate 320 GB SATA II Raid Edition
PSU: Amacrox Warrior 500W
CPU Cooling: Pentagram HP-90
MOUSE: Logitech G5
tried to get the CPU details right this time poke
Vista HP64
500,000,000 digits
202.881 - v0.4.3 x64 SSE4.1 - cheapseats - Intel Core i7 920 @ 4.09 GHz (4.294 GHz Turbo Boost) - 6 GB DDR3
1,000,000,000 digits
449.062 - v0.4.3 x64 SSE4.1 - cheapseats - Intel Core i7 920 @ 4.09 GHz (4.294 GHz Turbo Boost) - 6 GB DDR3
Code:
500Million
Benchmark Successful. The digits appear to be OK.
Version: 0.4.3 Build 7681 (x64 SSE4.1 ~ Ushio)
Processor(s): Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
CPU Frequency: 4,089,829,337 Hz (frequency may be inaccurate)
Thread(s): 8
Digits: 500,000,000
Total Time: 202.881 seconds
Checksum: 46117a3f76fa532b12fe3c237edb8ef8
-------------------------
1Billion
Benchmark Successful. The digits appear to be OK.
Version: 0.4.3 Build 7681 (x64 SSE4.1 ~ Ushio)
Processor(s): Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
CPU Frequency: 4,089,825,762 Hz (frequency may be inaccurate)
Thread(s): 8
Digits: 1,000,000,000
Total Time: 449.062 seconds
Checksum: afa902f4e32e19a40fe2f257f0783327
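For a rough sense of the version-to-version gain, these times can be set against the v0.4.2 results posted earlier from the same i7 920 (248.709 s and 548.419 s). A short sketch of the arithmetic, using only numbers from this thread (the `speedup` helper is just an illustrative name):

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new run is than the old one."""
    return old_seconds / new_seconds


# v0.4.2 x64 SSE3 vs v0.4.3 x64 SSE4.1 ~ Ushio, same machine and clocks.
speedup_500m = speedup(248.709, 202.881)
speedup_1b = speedup(548.419, 449.062)
```

Both sizes come out at a bit over 1.2x, which is why v0.4.3 results should not be mixed with v0.4.2 ones.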
----------------------------
these are just for comparison
1,000,000,000 digits
201x21 / uncore 16x v 20x / memory x4 v x5
x16 + x4 (465.566s)
x20 + x4 (458.798s)
x20 + x5 (457.367s)
Nice results
Interesting... so the program really loves uncore, but it doesn't respond much to memory speed. (For i7 at least, I'd expect Core 2 to be much more sensitive to memory speed.)
I guess that's a good start as far as figuring out what tweaks the program responds to.
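Putting numbers on that comparison, using only the three 1,000,000,000-digit times posted above (the dictionary layout and the `percent_faster` helper are just for illustration):

```python
# (uncore multiplier, memory multiplier) -> total time in seconds
runs = {
    ("x16", "x4"): 465.566,
    ("x20", "x4"): 458.798,
    ("x20", "x5"): 457.367,
}


def percent_faster(before, after):
    """Percentage of run time saved going from `before` to `after`."""
    return (before - after) / before * 100.0


uncore_gain = percent_faster(runs[("x16", "x4")], runs[("x20", "x4")])
memory_gain = percent_faster(runs[("x20", "x4")], runs[("x20", "x5")])
```

The uncore bump saves roughly 1.5% while the memory bump saves about 0.3%. With only one run per setting, though, the smaller figure is close to the run-to-run noise.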
some Core i5, and Phenom II Results on the latest Version..
Phenom II X2 550, Unlocked to Quad, Noctua Aircooling.
3.71Ghz
Core i5 750, Stock Cooling (could not complete 250m, too unstable sorry) NO TURBO MODE, the frequency reported is Accurate.
New version is pretty quick!
One thing I've noticed is a high level of inconsistency between runs. The first run is typically slower than the 2nd and 3rd. Then occasionally, if you keep re-running, a slower pass will appear again.
So the above are the 2nd run results, as they're typically as much as 0.5s quicker than the first, and also ever so slightly quicker than the 3rd or subsequent runs.
Yeah, new compiler + a ton of optimizations...
There's still plenty of places that can be improved, but I'm gonna call it quits until after grad-school apps are done.
I've definitely noticed it myself as well.
The first runs are probably slower because the OS hasn't fully prepared the buffer and the memory/paging stuff for it. (and maybe hasn't buffered the entire binary yet)
The inconsistencies should've been there since the first version. It's inherent because of the way the program creates and destroys threads.
It's very inconsistent and sensitive to background programs.
There are some thread-management/scheduling settings in windows that can be tweaked. Though I haven't played with them yet, it might be possible to get better consistency and some speedup by tweaking them.
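Given that variance, one common way to report a number is to throw away a warm-up run and then take the best (or median) of several timed runs. A minimal sketch of that approach in Python follows; `time_runs` is a hypothetical helper, not part of the program, and `fn` stands for whatever launches one benchmark pass.

```python
import statistics
import time


def time_runs(fn, runs=5, warmup=1, clock=time.perf_counter):
    """Time `fn` several times after discarding `warmup` untimed calls."""
    for _ in range(warmup):
        fn()                          # first pass: let the OS warm things up
    samples = []
    for _ in range(runs):
        start = clock()
        fn()
        samples.append(clock() - start)
    return {
        "best": min(samples),
        "median": statistics.median(samples),
        "samples": samples,
    }
```

Reporting the best time matches what most people post here; the median is less flattering but more robust when a background program interferes with one of the passes.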
And as soon as I release the new version...
The guy in Japan has some new numbers to show for it.
So we know who's been doing some website camping lately.
So now the question is: Does anyone have a pair of Gulftown samples?
We need someone to trip him up a bit.
2 x Intel Xeon W5590 @ 3.33 GHz (3.46 GHz Turbo Boost)
72 GB (18 x 4 GB) DDR3 ram
25m - 6.360
50m - 11.885
100m - 25.096
250m - 68.309
500m - 146.704
1b - 321.974
2.5b - 901.162
5b - 1,968.124
10b - 4,480.503
25b - 14,431.975 (4 hours, 28 minutes - swap mode)
50b - 112,256.531 (31 hours, 11 minutes - swap mode + pagefile thrashing)
1M - 0.299
2M - 0.600
4M - 1.072
8M - 2.199
16M - 4.038
32M - 7.718
64M - 16.219
128M - 34.016
256M - 72.441
512M - 156.233
1G - 343.595
2G - 756.990
4G - 1,676.540
8G - 3,916.248
16G - 8,628.893 (2 hours, 24 minutes - ram only + a bit of pagefile thrashing)
32G - 25,978.250 (7 hours, 13 minutes - swap mode)
And the big one... With significant pagefile thrashing.
Last edited by poke349; 10-09-2009 at 07:08 AM. Reason: typo