Is it not possible to disable one core on the rest of the CUs only because the BIOS isn't there yet, or because the processor itself actually doesn't support it?
Rig 1:
ASUS P8Z77-V
Intel i5 3570K @ 4.75GHz
16GB of Team Xtreme DDR-2666 RAM (11-13-13-35-2T)
Nvidia GTX 670 4GB SLI
Rig 2:
Asus Sabertooth 990FX
AMD FX-8350 @ 5.6GHz
16GB of Mushkin DDR-1866 RAM (8-9-8-26-1T)
AMD 6950 with 6970 bios flash
Yamakasi Catleap 2B overclocked to 120Hz refresh rate
Audio-GD FUN DAC unit w/ AD797BRZ opamps
Sennheiser PC350 headset w/ hero mod
Bachelor of Science in Music Production 2016, Mid 2012 mack book Pro i7 2.6 8gb ram Nvidia 250m 1gb . Pro Tools , Logic X, Presonus one, Reaper, Garage Band. Cubase, Cakewalk.
great work chew, this may not be what everyone was hoping to hear but it will surely be of assistance to people reading up to see whether BD fits their needs or not!
So on to the juicy stuff, your findings are consistent with a classic resource sharing bottleneck, or so it seems..
To sum it up:
- when a CPU heavy task fully loads both clusters in a CU, the shared resources have a hard time feeding or dealing with both clusters so you notice a bottleneck and performance per thread drops accordingly -> ergo, 4c/4cu wins
- when only one cluster is loaded, or both clusters are in use but at least one of them is not fully loaded, the shared resources can keep up, so performance per thread takes little or no hit -> ergo, clockspeed wins
This is fully understandable and seems perfectly logical
Next subject, max clocks:
I'm still confused by something here, please see which interpretation is the correct one (or if both apply):
- 4c/2cu has lower power/thermal footprint, ergo for the same voltage, same cooling, it will clock higher?
or
- Luck of the draw means that you're likely to get different OC potential on all modules, and therefore you might (probably) end up with having to choose a cluster from a module with worse potential when taking the 4c/4cu approach, therefore limiting overall clocks a bit?
Also, for this last part, do you think it plausible that with better air or water cooling (for instance, I have a custom water loop with a 360 rad) and maybe some extra volts one could push those limits and reach higher clocks? Perhaps 4.8 ~ 5GHz across all cores? Or is this a pipe dream for this stepping?
Last question:
Have you any idea of the actual power consumption difference between running 4c/2cu vs 4c/4cu vs 8c/4cu at the same clocks all around, for instance 4.6GHz? I am considering an upgrade, but at this point I'm afraid my TX650 would not cut it powering an overclocked 8c BD plus a 6970 or 6870 CF.
My personal opinion on this:
Best case scenario (and the most balanced option for varied daily usage) would be for us to be able to leave them all on, and have AOD or something clock each loaded core higher, while at the same time having some way to make the OS treat one of the clusters in each CU as a "Hyperthreading-like" affair. However, I'm guessing this would require a hefty power draw and adequate high-end cooling to control.
Crappy math:
So, taking into account an announced (and somewhat accurate, from what we can see) performance penalty of roughly 20% when fully loading both clusters (e.g. Civilization or StarCraft), we can speculate that it would take roughly 5 ~ 5.2GHz all around for an "FX-4110" to match the performance of a PII X4 @ 4 ~ 4.2GHz, whereas less intense games with 2 main and 2 secondary threads would require only around 4.3 to 4.4GHz.
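For anyone who wants to redo the crappy math with other numbers, here is a minimal sketch. It assumes the only difference is the stated per-thread penalty (i.e. equal performance per clock otherwise), which is a simplification and not something measured in this thread; the function name is made up for illustration.

```python
def required_clock(target_ghz, penalty):
    """Clock needed to match a chip running at target_ghz when
    per-thread throughput is cut by `penalty` (0.20 means 20%).
    Assumes equal per-clock performance apart from that penalty."""
    return target_ghz / (1.0 - penalty)


# PII X4 at 4.0 and 4.2 GHz, with the ~20% penalty quoted above
for target in (4.0, 4.2):
    needed = required_clock(target, 0.20)
    print(f"{target:.1f} GHz target -> {needed:.2f} GHz needed")
```

With a 20% penalty this lands in the 5 ~ 5.2GHz ballpark mentioned above; plug in a smaller penalty for the lighter-threaded case.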
The advantage of the octa-cores would be that they could "shine" in both scenarios. We would just need a way to make this hassle-free, without having to reboot or manually change these settings whenever we want to game, or encode, or whatever :P
once again, thank you guys for your awesome work! and also everyone sorry for the long post, feel free to ignore if you wish but i just wanted to see if I had all the facts straight from this "investigation"
Please do correct my conclusions if you see fit, as I have a feeling these are some of the points other people could benefit from as well
Last edited by omninmo; 10-18-2011 at 12:41 AM.
The answer could be simple. If you use 2CU/4C, there are 2 floating point units available, in 4CU/4C mode 4. So a task using 4 cores, gets access to 4 vs. 2 fpus. The performance gain in 50% int and 50 % fpu scenarios, is up to 25 %. Xbox ported games, mostly use 2 cores. So they get access to 2 fpus in both scenarios and you see performance gain of about 5 % only, from doubled cache per core.
Console ports are not the problem, there is always enough power for a FX6 or FX8 to reach >60 fps. The problem is Dirt, F1, Civ5, Crysis 2,... . Northbridge overclocking and 4CU/4C can be the difference between playable or not.
i just installed win 8 beta, on a thuban system though. It seems stable enough to work for some gaming, and i guess this scheduler is more aware of what BD needs. I am looking forward to tests coming in at the other thread covering this..
1. ASUS Sabertooth 990fx | FX 8320 || 2. DFI DK 790FXB-M3H5 | X4 810
8GB Samsung 30nm DDR3-2000 9-10-10-28 || 4GB PSC DDR3-1333 6-7-6-21
Corsair TX750W | Sapphire 6970 2GB || BeQuiet PurePower 450w | HD 4850
EK Supreme | AC aquagratix | Laing Pro | MoRa 2 || Aircooled
Asus Crosshair IV Extreme
AMD FX-8350
AMD ref. HD 6950 2Gb x 2
4x4Gb HyperX T1
Corsair AX1200
3 x Alphacool triple, 2 x Alphacool ATXP 6970/50, EK D5 dual top, EK Supreme HF
Take a look here at the picture "Fritz Chess benchmark 4.3 (watts atx 12v)":
http://www.hardware.fr/articles/842-...windows-8.html
2/4 = 60 watts (100%)
4/4 = 84 watts (140 %)
2/4 turbo = 95 watts (158 %)
8/8 = 101 watts (168 %)
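The percentages are just each reading divided by the 2/4 reading. A quick sketch to reproduce them from the wattages listed above (the dictionary keys are only labels picked for this example):

```python
# ATX 12V readings in watts, as listed above
readings = {"2/4": 60, "4/4": 84, "2/4 turbo": 95, "8/8": 101}

baseline = readings["2/4"]  # the 2/4 mode is taken as 100%

# Relative draw in percent, rounded to the nearest whole number
relative = {mode: round(100 * watts / baseline)
            for mode, watts in readings.items()}

for mode, pct in relative.items():
    print(f"{mode}: {readings[mode]} watts ({pct} %)")
```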
Total power consumption in your scenario (overclocked and with an HD6990) is about 500 watts at peak:
http://www.rage3d.com/reviews/cpu/am.../index.php?p=7
To reach 4/4 @stock in Dirt with 2/4, you nearly have to double the power draw. In other games (console ports) it's useless, because fps are mostly >60 and there is no gain in performance; the GPU is the limit.
There are two different methods to get what you want:
1. Let the FX-8 or FX-6 work in stock mode and bind the game/app manually through the task manager to cores 1357 or 0246, or use a tool like "AMD OverDrive" or "Core Affinity Resident" to build a profile. You do not need the expensive ASUS board, and you simulate the Windows 8 task manager in a smarter way than inefficient CU parking + turbo.
2. Get the FX-8 or FX-6 into 4/4 or 3/3 mode to build an X4 or X3. An ASUS board is needed. That's it, but you have to go into the BIOS every time.
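For method 1, tools that take an affinity value usually want a bitmask rather than a list of core numbers. A minimal sketch of how "0246" and "1357" turn into masks, assuming the OS numbers the two cores of each CU as neighbours (0/1, 2/3, ...), which should be checked on the actual system; `affinity_mask` is just a helper name made up here.

```python
def affinity_mask(cores):
    """Build an affinity bitmask with one bit set per listed core index."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


even_cores = affinity_mask([0, 2, 4, 6])  # one core per CU
odd_cores = affinity_mask([1, 3, 5, 7])   # the other core of each CU

print(hex(even_cores), hex(odd_cores))
```

On Windows the hex value (without the 0x prefix) is the kind of number that `start /affinity` in cmd expects, so something like `start /affinity 55 game.exe` would be one way to try it, with game.exe standing in for whatever you want to launch.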
Last edited by son14; 10-18-2011 at 01:18 AM.
Oh, and by the way folks... this might be a decent solution to let us manage our threads the way we like them and give us the extra flexibility of deciding if we want 4c/2cu, 4c/4cu or 8c/4cu on an app-to-app basis WITHOUT much hassle and without having to reboot!
http://img.tomshardware.com/us/2004/...taskassign.zip
just set your turbo OCs accordingly and setup each game in the way it will benefit you the most? win/win?
if someone could please test this, see if it plays nice with BD and let the rest of us know, I would be grateful!
EDIT: hmmm seems people are already aware of utilities like this, my bad xD
Last edited by omninmo; 10-18-2011 at 01:18 AM.
http://www.hardware.fr/articles/842-...windows-8.html
1. They see 2M/4C about 15 % behind 4M/4C overall, and in Fritz Chess 8130/6417 = 127 %.
2. In the first post here, 122 % was reached across all tests and 8813/6335 = 139 % in Fritz Chess; that Fritz Chess result is 108 % of the one in 1.
So 1. should be the task manager method and 2. deactivating cores in the BIOS.
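The ratios can be checked in a couple of lines. This sketch assumes the 108 % figure comes from comparing the two higher Fritz Chess scores (8813 vs. 8130), which is how the numbers appear to line up; `pct` is a throwaway helper.

```python
def pct(score, reference):
    """Score as a percentage of the reference, rounded to a whole number."""
    return round(100 * score / reference)


hardware_fr = pct(8130, 6417)   # Fritz Chess ratio in the hardware.fr test
first_post = pct(8813, 6335)    # Fritz Chess ratio from the first post here
between = pct(8813, 8130)       # higher score here vs. higher score there

print(hardware_fr, first_post, between)
```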
Interesting detail:
you need 6CU/6C clock by clock to beat an FX-8150 through all benchmarks
Last edited by son14; 10-18-2011 at 01:35 AM.
awesome, chew!.. Only thing missing is a current deneb OC'd under same cooling etc (that i'd like to see)
This answer is not for that question, because he was speaking about 4CU/8C vs. 4CU/4C, which is irrelevant in case of games with 4 threads or less.
So, let's go back to 4CU/4C vs. 2CU/4C.
"If you use 2CU/4C, there are 2 floating point units available, in 4CU/4C mode 4. So a task using 4 cores, gets access to 4 vs. 2 fpus. The performance gain in 50% int and 50 % fpu scenarios, is up to 25 %. Xbox ported games, mostly use 2 cores. So they get access to 2 fpus in both scenarios and you see performance gain of about 5 % only, from doubled cache per core."
You're searching for the answer in the right direction, floating-point wise. Although, let's not forget this FPU is 2-way SMT, with double the resources of K10's FPU: it's 2xFMAC vs. 1xFADD+1xFMUL, FMAC being 1xFADD+1xFMUL combined. The thing is that all of this computing power can only be fully utilized with FMA code, because you can't have an FADD and an FMUL started independently in the same cycle on one FMAC unit, only if it's an FMA instruction.
Thus, for legacy code, 4CU/4C mode means double the resources are usable all the time per hw thread (not shared with a second thread), so it can have independent FADDs and FMULs started in the same cycle on the two FMAC units. (An extra bonus is that it can also have 2xFADD or 2xFMUL, not just 1xFADD+1xFMUL like on K10.) In 2CU/4C mode, it can usually have only 1xFADD OR 1xFMUL started in the same cycle per hw thread (mostly one FMAC unit is available, because the second thread engages the other one).
(And so, I think there won't be such a significant gain in performance for FMA-heavy code going from 2CU/4C to 4CU/4C. EDIT: Or, even if there is, both cases will be significantly faster than the same algorithm with legacy code.)
"...So a task using 4 cores..."
(I think it's not the best wording, because apps with fewer than four threads can also be "using 4 cores" due to Windows constantly moving threads between cores. So, it's about how many threads an application has.)
"The problem is Dirt, F1, Civ5, Crysis 2,... . Northbridge overclocking and 4CU/4C can be the difference between playable or not."
Definitely worth testing.
So, you're saying that if we enable the 2nd core here, then we can't disable any other single cores within a CU?
Wow, I'd forgotten about this little utility. Perhaps it could be enhanced and adapted to BD, so all of it could work fully automatically! That would be even more useful!
Last edited by dess; 10-18-2011 at 06:15 AM.
If core 2 is enabled, CU0 would be running in dual-core mode.
This means all of the remaining CUs must be running in dual-core mode too.
You can disable more cores while core 2 is enabled, but then you would need to turn a whole compute unit off.
I think the limitation is because each core needs to have a pair.
At least the schematics show all of the individual cores connected. I might be wrong though.
interesting thread here
Intel Core i5 6600K + ASRock Z170 OC Formula + Galax HOF 4000 (8GBx2) + Antec 1200W OC Version
EK SupremeHF + BlackIce GTX360 + Swiftech 655 + XSPC ResTop
Macbook Pro 15" Late 2011 (i7 2760QM + HD 6770M)
Samsung Galaxy Note 10.1 (2014) , Huawei Nexus 6P
[history system]80286 80386 80486 Cyrix K5 Pentium133 Pentium II Duron1G Athlon1G E2180 E3300 E5300 E7200 E8200 E8400 E8500 E8600 Q9550 QX6800 X3-720BE i7-920 i3-530 i5-750 Semp140@x2 955BE X4-B55 Q6600 i5-2500K i7-2600K X4-B60 X6-1055T FX-8120 i7-4790K
Please, please, please tell me all (or at least some) of this testing is with 64-bit applications on a 64-bit OS.
For the application I am working on, the 64-bit version is 40% faster than the 32-bit version. There are twice as many general purpose registers in 64-bit mode. My application makes heavy use of 64-bit integers and no floating point. It is like Fritz, except it is a breadth-first search instead of a depth-first search. It will use all the memory that you can throw at it and thrash it hard.
It wouldn't be that hard to add a dialog box to let you set the threads to run on the desired cores.
My next machine was going to be a dual socket C32 machine; I need the memory slots more than I need processing cores. But after reading Anand's "Rendering and HPC Benchmark Session Using Our Best Servers", I have concerns about the memory performance of BD in dual socket boards.
@chew*, thanks for all your testing on this matter, you sir are awesome for going out of your way to help the community
_________________________________________________
............................ImAcOmPuTeRsPoNgE............................
MY HEATWARE 76-0-0
It would probably do well as a 6 core; the third core pair appears to be the real sore thumb on this chip.
That's what always drops out in Prime95.
I gamed till like 5am this morning at 4.6 with 4 cores, tbh I can not tell the diff between this rig and my sandy rig.
Both run very smooth with no glitches.
I did have some hitches with a Kingston SSD though, not agreeing with the CH5 too well, might be firmware.
It's fine with a normal sata drive.
heatware chew*
I've got no strings to hold me down.
To make me fret, or make me frown.
I had strings but now I'm free.
There are no strings on me
Last edited by chew*; 10-18-2011 at 10:53 AM.
Once again thanx for all the usual posts and info. So refreshing not having to hear about BD bashing.
Kudos to Chew and the other xtreme addicts!
INTEL 2600K @ 4.5ghz 24/7 Corsair H100
ASUS P8Z68-V PRO
2 x CORSAIR 4GB DDR3 1600 (CL8)
4TB Seagate SATA2
SAPPHIRE 7950 (GPU 1100 | MEM 1500)
Cosmos S
Asus XONAR DX
Corsair 850W PSU