Ah, missed that :( Too bad...
Why?
Speed, why not, but the Q6600 is an entry-level quad; it's not like using the lowest-clocked Barcelona vs the highest Intel offering... :shrug:
FSB, should he add an IMC inside his Q6600? Why not raise the FSB to the max with the stock cooler on both platforms... :)
4 cores vs 8 cores...
When I buy a CPU, I'd like to know how it performs for a given price, not how it would perform if it were worse...
You are right when it comes to finding which one is the best performance/cost-ratio platform.
But you are wrong if you consider which is the best platform. Then you need to look at how far it can go in clock (and it will take some time for us to know that in any situation), and how well they perform at given clocks. ;)
With 8 cores it will need to go over the HT link, so the speedup won't be as fast as the native 4-core speedup. In unganged mode it uses 2 IMCs; in ganged mode it uses 1 IMC... I guess.. hehe?
I have missed the 3.4GHz like you before :)
-----------------------------------------------------------------
So from http://www.adobeforums.com/webx/.3bc4aee5
So the Multiprocessor Speedup is lower than on the eight-core Barcelona system, but the single-CPU score should be higher even at 2GHz, leading to a higher multi-CPU score. :shrug:Quote:
Processor : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
MHz : 2660
Number of CPUs : 8
Operating System : WINDOWS 32 BIT 5.2.3790
Graphics Card : Quadro FX 1500/PCI/SSE2
Resolution : <fill this out>
Color Depth : <fill this out>
# ************************************************** *
Rendering (Single CPU): 2701 CB-CPU
Rendering (Multiple CPU): 15867 CB-CPU
Multiprocessor Speedup: 5.88
Shading (OpenGL Standard) : 3852 CB-GFX
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>..
Processor : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
MHz : 2660
Number of CPUs : 8
Operating System : WINDOWS 32 BIT 5.2.3790
Graphics Card : GeForce 8800 GTX/PCI/SSE2
Resolution : <fill this out>
Color Depth : <fill this out>
# ************************************************** *
Rendering (Single CPU): 2679 CB-CPU
Rendering (Multiple CPU): 15466 CB-CPU
Multiprocessor Speedup: 5.77
Shading (OpenGL Standard) : 4308 CB-GFX
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >.
64 bit OS/64 bit Cinebench:
Processor : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
MHz : 2660
Number of CPUs : 8
Operating System : WINDOWS 64 BIT 5.2.3790
Graphics Card : Quadro FX 1500/PCI/SSE2
Resolution : <fill this out>
Color Depth : <fill this out>
# ************************************************** *
Rendering (Single CPU): 3069 CB-CPU
Rendering (Multiple CPU): 18357 CB-CPU
Multiprocessor Speedup: 5.98
Shading (OpenGL Standard) : 3794 CB-GFX
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>
Processor : Intel(R) Xeon(R) CPU X5355 @ 2.66GHz
MHz : 2660
Number of CPUs : 8
Operating System : WINDOWS 64 BIT 5.2.3790
Graphics Card : GeForce 8800 GTX/PCI/SSE2
Resolution : <fill this out>
Color Depth : <fill this out>
# ************************************************** *
Rendering (Single CPU): 3038 CB-CPU
Rendering (Multiple CPU): 18314 CB-CPU
Multiprocessor Speedup: 6.03
Shading (OpenGL Standard) : 4506 CB-GFX
(3038/2.66*2.4 = 2741, or 3038/2.66*3.4 = 3883, similar to the 3881 shown by siyah at 3.4GHz on the Q6600; so at 2GHz the single-CPU score should be ~2280, and the multi-CPU score ~7900 for the Q6600 and ~13770 for the eight-core Xeon system.)
If I do the same calculation taking the Barcelona results into account: single-CPU score = 1907*3.1/2 = 2955; even with a Multiprocessor Speedup of only 4 this would give 11800 for the multi-CPU score...
yeah, got it almost instantly after i have made the post before :D
I calculate it this way.. 3.1GHz gives 9.7k in 32-bit C10 with a 3.88x speedup.. in 64-bit, kyosen's result shows a ~20% single-threaded improvement plus a 4x speedup.. so my estimated score is 9.7k / 3.88 x 120% x 4.
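The back-of-the-envelope estimates above can be sketched like this (linear clock scaling is only an approximation, and the numbers are the ones quoted in this thread):

```python
# Rough Cinebench score estimates by linear clock scaling,
# using the scores quoted in this thread. Real scores don't
# scale perfectly linearly with clock, so treat as ballpark.

def scale_score(score, clock_from_ghz, clock_to_ghz):
    """Scale a CB score linearly with core clock."""
    return score * clock_to_ghz / clock_from_ghz

# Xeon X5355 64-bit single-CPU score (3038 @ 2.66GHz), rescaled:
print(round(scale_score(3038, 2.66, 2.4)))   # 2741
print(round(scale_score(3038, 2.66, 3.4)))   # 3883 (close to siyah's 3881)

# Barcelona estimate: 9.7k 32-bit C10 at 3.1GHz with a 3.88x speedup,
# ~20% 64-bit single-thread gain, assumed 4x multi-CPU speedup:
single_64 = 9700 / 3.88 * 1.20
print(round(single_64 * 4))                  # 12000
```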
Yeah, roughly there:
http://www.techreport.com/articles.x/13470/11
There's the catch in where you start applying the law. It's not a law; it's a theoretical prediction. That's also how it's explained in the link you posted about the so-called Amdahl's Law. Physical laws are absolutes given system constraints and controlled factors.
The prediction holds true when you only vary one system component, such as the CPU, but not when more than one is manipulated (such as increasing DDR speed + CPU speed, in which case you can see more than 100% scaling). It also relies on knowing the absolute peak performance of a CPU as the base marker for judging scaling. We don't have enough of these facts, so talking in absolute comparisons would be conjecture, or estimates at best, and inaccurate. We need more information and testing to know the theoretical peak performance of the CPU components; only then can we judge achieved performance as a percentage of the maximum possible for a given CPU. That's where you'll find the maximum always stays consistent and the performance scaling always stays below 100%.
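For reference, the prediction being discussed can be sketched in a few lines; the parallel fraction p = 0.95 below is a made-up illustrative value, not a measured one:

```python
# Amdahl's-law prediction of multi-core speedup: with a serial
# fraction (1 - p) that never parallelizes, speedup on n cores
# stays below n. p = 0.95 here is just an example value.

def amdahl_speedup(p, n):
    """Predicted speedup for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8):
    print(n, round(amdahl_speedup(0.95, n), 2))
# 2 -> 1.9, 4 -> 3.48, 8 -> 5.93; the 8-core case lands near the
# ~5.9x Multiprocessor Speedup seen on the Xeon results above.
```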
Thank you for your welcome words and a great thank you to kyosen for the help.
The Opteron 2344 is the second Barcelona system I configured, and we got this board after reading this thread in your forum.
Now, with kyosen's help, I can OC the 2344.
At this moment it runs @1.9GHz and already outperforms my 2347 system.
Ok, I was a bit confused what you were trying to say. A more appropriate descriptor would have been "DCT" and not "IMC". IMC is a generic term which most people use to describe AMD's memory controller as a whole. I apologize for assuming you were like "most people" :p:
At 3.4GHz/1134MHz and enough background processes, he got ~1550 CB higher than a Yorkfield at 3GHz (1333FSB) running DDR3-1333 RAM in Vista SP1 beta and nearly 3000 CB higher than the QX6850.
On vr-zone this should be 32-bit cinebench score :shrug:
Vr-zone QX9650: 11857
techreport QX9650: 13256
http://www.techreport.com/articles.x/13470/11
If I take the results I've posted earlier in this thread, there is around a 14% difference between the 32-bit and 64-bit scores. So
11857*1.14 = 13516, which is not so far off techreport's 13256.
Yeah, it seems the score difference is down to 32-bit (VR-Z) vs 64-bit (TR) AND P35 (TR) vs X38 (VR-Z). The RAM stays the same.
Good quick correction. :p: I was just about to mention it. Quote:
The TR score was 12% higher than VR-Z's with just those two differences, which means we need similar testing to compare scores, or we can't really judge one over the other with any CPU. Quote:
If I take the results I've posted earlier in this thread, there is around a 14% difference between the 32-bit and 64-bit scores. So
11857*1.14 = 13516, which is not so far off techreport's 13256
One thing we need to remember is that there is going to be some degree of variability in these benchmarks, which may explain some of the difference in the reported scores. Occasionally the variance can be quite large even on the same system. I've witnessed this multiple times as a former benchmark freak. Any review really needs to run these tests multiple times and take an average. Don't get me wrong, I'm not saying AMD is suddenly going to take the lead, but it does explain some of the variability.
You're describing one of the two available DCT-modes, the so-called Ganged Mode. But that protocol is far from new - the very same one was used by socket 940/939: one controller operates two channels in lockstep, almost like in RAID 0.
K10 features two independent controllers, which allows it to send independent commands to each channel (in Unganged Mode). That's the most important new feature of the K10 IMC.
Sorry, but that's just not a fair comparison. The i865 is unable to continuously access both channels in parallel, probably because the FSB is not fully dedicated to the RAM (unlike AMD's memory bus). And thanks to the dual independent controllers of the K10, loaded latency is reduced, unrelated data can be fetched simultaneously, and the channels can even operate in different directions at the same time. AFAIK, no comparable features are offered by any Intel chipset.
CPU-Z latest beta screenshot
Now it can show each core's clock... we can select the core with a mouse right-click.
And it also shows Core VID:)
http://www.oohashi.jp/c-board/file/C...core_clock.png
Thanks to Franck, as usual:up:
3DMark06 with GeForce8800GT on WinXP x64
2350(B1)-2.2G = 220x10, GeForce 8800GT,
3DMark Score, CPU Score = 9973, 3212
http://www.oohashi.jp/c-board/file/C...core_clock.png
In previous test on WinXP 32bit: 10148, 3267
http://www.oohashi.jp/c-board/file/3....2G-220x10.png
So I couldn't find any merit in WinXP x64 for 3DMark06, as expected.
NorthBridge multiplier again
This time I rebooted after changing the register,
and I could confirm that the NB clock was down on the BIOS display.
http://www.oohashi.jp/c-board/file/O...NB1.0_BIOS.jpg
And SuperPI4M time at same Core clock and different NB clock:
NB-1.8G: 3m48.406s, http://www.oohashi.jp/c-board/file/S...-1.8_WinXP.png
NB-1.6G: 3m48.016s, http://www.oohashi.jp/c-board/file/S...-1.6_WinXP.png
NB-1.4G: 3m50.500s, http://www.oohashi.jp/c-board/file/S...-1.4_WinXP.png
NB-1.2G: 3m53.532s, http://www.oohashi.jp/c-board/file/S...-1.2_WinXP.png
NB-1.0G: 3m58.359s, http://www.oohashi.jp/c-board/file/S...-1.0_WinXP.png
NB >= 1.6G looks like enough for a single-threaded program with dual DDR2-667.
There may be a difference between NB-1.6G and NB-1.8G with a DDR2-800 setting.
BTW, I've learned that a reboot (warm reset) is needed for changing the HT Link multiplier too.
Yeah, tictac and macci were/are right, as usual :yepp:
Folding@Home
Quick test on 2350(B1)-2.0G, DDR2-667.
Screenshot at 7% steps:
http://www.oohashi.jp/c-board/file/F...0_step-35k.png
Thanks for the test... :up:
Something seems weird about that SMP test... my Opteron is only 6 minutes slower than that. I would have thought it would be better. I guess my Opteron is running 800MHz faster, but still, I was expecting it to be faster than that.
Just a quick guesstimate:
my result: 15m24s = 924s for each 5000 steps with a 2.0G x4-core K10 Opteron
your score: 15m24s + 6m = 1284s with a 2.8G x2-core K8 Opteron
So, 1284/x * 2.8/2.0 /y = 924
...here x is efficiency of increased cores x2->x4, and y is performance gain per core.
for example, if y is ~1.05, x is ~1.85 from the formula above...
...yeah x should be within 2.0 in this case.
In my experience for SuperPI, y=~1.05 is feasible, at least on my board and current BIOS, so far.
I don't know usual efficiency x for Folding@Home, but 1.85 looks feasible too...
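Rearranging that formula makes it easy to tabulate: from 1284/x * (2.8/2.0) / y = 924 we get x*y = 1284*1.4/924, so picking a per-core gain y pins down the core-scaling efficiency x. A quick sketch using only the numbers from this post:

```python
# kyosen's rough estimation formula, rearranged to solve for the
# core-scaling efficiency x given an assumed per-core gain y.
# Times are from the thread: K8 dual @ 2.8G vs K10 quad @ 2.0G.

K8_TIME, K10_TIME = 1284.0, 924.0   # seconds per 5000 steps
CLOCK_RATIO = 2.8 / 2.0

def core_scaling(y):
    """Efficiency x of going 2 -> 4 cores, implied by per-core gain y."""
    return K8_TIME * CLOCK_RATIO / (K10_TIME * y)

for y in (1.0, 1.02, 1.05, 1.08, 1.2):
    print(f"y = {y:.2f} -> x = {core_scaling(y):.2f}")
# y = 1.05 -> x = 1.85, matching the example in the post above.
```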
are you saying that, from this comparison, that the performance gain from K8 to K10 is about... 5%?
2.0ghz Barcelona gets 924s
2.8ghz K8 Opteron gets 1284s
2.0 x 8 = 16,000mhz
2.8 x 4 = 11,200mhz
16,000/924 = 17.316
11,200/1284 = 8.723
17.316/8.723 = 1.985 speedup factor.
edit - I'm assuming a dual quad and a dual dual here, but halve the numbers (1 proc each) and you get the same... unless there's a number off somewhere, in which case my bad ;)
Does the SMP client work one work unit across all cores or is it one per core? I assume it is one per core in this calculation. One across all cores will be different numbers.
The ~5% performance gain from K8 to K10 is based on my own SuperPI4M results.
I don't know whether the gain is the same for both SuperPI and F@H at this moment.
The intention of my previous post was just to suggest a rough estimation formula.
Under assumption of that formula,
*if x = 2.0 (ideal scaling), y < 1.0, i.e. the gain from K8 to K10 is negative... that's not feasible.
*if x = ~1.95, y = 1.0, i.e. no gain from K8 to K10... that's not feasible either.
*if x = ~1.90, y = ~1.02
*if x = ~1.85, y = ~1.05
*if x = ~1.8, y = ~1.08
*if x = ~1.6, y = ~1.2
...
x = 1.8~1.85 looks feasible to me as a result for a multi-threaded program,
so y = 1.08~1.05... and that's not inconsistent with my SuperPI 1M & 4M results.
The thing is, for multithreaded programs you can't fairly compare K8 vs K10, because there aren't any quad-core K8s. The only way to do a fair comparison would be to disable two of the cores when testing.
The SMP client spawns 4 total threads that actively do calculations. If you have less than 4 cores, 2 threads are shared on a core.
If you have MORE than 4 cores (dual socket quad cores), then you have to run multiple instances of the client to load up all cores.
Here is a SS from my server:
http://img474.imageshack.us/img474/6...3223xe1.th.jpg
Dual G0 Xeon E5310 @ 2.0GHz.
On a single client this machine normally gets about 13.8-14.2 min per 1%. When you load up 2 clients, the time increases to 14.4 min per 1% per client. I can only assume that's because both processors share the same FSB and memory.
Which part of the core does the SMP client use most (what does it depend most on)?
How does it scale with frequency?
*edit* nvm,
Core 2 quad (Kentsfield) 2.96 GHz: 1 SMP instance 2653: 8:5x
Core 2 quad (Kentsfield) 2.96 Ghz: 2 SMP 2653: 15:45
OS: tweaked WinXP pro
IP-35E, 370 fsb, 2 GB Adata 1110 MHz CL5, performance level 5
If I could get 450+ fsb it would run even faster, but this mobo is not a good clocker.
Concerning the unusually high speedup in 64-bit Cinebench: notice that with 4 threads the total L2 cache available to the application is 4x0.5MB = 2MB, while a single thread only has 0.5MB available to it.
It is known that it is sometimes possible to get more than 100% (n*100%) speedup because of the bigger aggregate cache available when the application runs several threads.
Note that Core 2 shows a significantly lower speedup because of the shared nature of its L2.
I know the L3 cache is being recognized, but is it being utilized?
Well about the ganged and unganged memory, I asked my professor about it, so hopefully he'll give a good response.
Excerpt from BKDG For AMD Family 10h Processors Page 60.
2.8 DRAM Controllers (DCTs)
The DCTs support DDR2 DIMMs or DDR3 DIMMs. Products may be configurable between DDR2 and DDR3 operation.
A DRAM channel is the group of the DRAM interface pins that connect to one series of DIMMs. The processor supports two DDR channels. The processor includes two DCTs. Each DCT controls one 64-bit DDR DIMM channel.
For DDR products, DCT0 controls channel A DDR pins and DCT1 controls channel B DDR pins. However, the processor may be configured: (1) to behave as a single dual-channel DCT; this is called ganged mode; or
(2) to behave as two single-channel DCTs; this is called unganged mode.
A logical DIMM is either one 64-bit DIMM (as in unganged mode) or two identical DIMMs in parallel to create a 128-bit interface (as in ganged mode). See section 1.5.2 [Supported Feature Variations] on page 20 for information about supported package/DRAM configurations.
For DDR products, when the DCTs are in ganged mode, as specified by [The DRAM Controller Select Low Register] F2x110[DctGangEn], then each logical DIMM is two channels wide. Each physical DIMM of a 2-channel logical DIMM is required to be the same size and use the same timing parameters. Both DCTs must be programmed with the same information (see section 2.8.1 [DCT Configuration Registers] on page 61). When the DCTs are in 64-bit mode, a logical DIMM is equivalent to a 64-bit physical DIMM and each channel is controlled by a different DCT.
Hi, guys! Is it now confirmed that the RAM clock is derived only from the external clock ("Ext.CLK", "System Clock", etc.), and not from the NB clock? (I think the IMC is part of the NB, so that would seem logical to me.)
ps. BTW, I'm sorry, but why do you keep writing FSB and "bus speed" in relation to the external clock? It's only a clock, not a bus, and especially not a Front Side Bus like Intel's. (The HT bus clock is derived from it by a multiplier, as we know.)
Another Quote from AMD's k10 dev guide
Quote:
2.6 The Northbridge (NB)
Each processor includes a single Northbridge that provides the interface to the local CPU core(s), the interface to system memory, the interface to other processors, and the interface to system IO devices. The NB includes all power planes except VDD; see section 2.4.1 [Processor Power Planes And Voltage Control] on page 25 for more information.
The NB of each node is responsible for routing transactions sourced from CPU cores and links to the appropriate CPU core, cache, DRAM, or link. See section 2.9.3 [Access Type Determination] on page 107 for more information.
2.6.1 Northbridge (NB) Architecture
Major NB blocks are: System Request Interface (SRI), Memory Controller (MCT), DRAM Controllers (DCTs), L3 cache, and Cross Bar (XBAR). SRI interfaces with the CPU core(s). MCT maintains cache coherency and interfaces with the DCTs; MCT maintains a queue of incoming requests called MCQ. XBAR is a switch that routes packets between SRI, MCT, and the links.
The MCT operates on physical addresses. Before passing transactions to the DCTs, the MCT converts physical addresses into normalized addresses that correspond to the values programmed into [The DRAM CS Base Address Registers] F2x[1, 0][5C:40]. Normalized addresses include only address bits within the DCTs’ range.
The normalized address varies based on DCT interleave and hoisting settings in [The DRAM Controller Select Low Register] F2x110 and [The DRAM Controller Select High Register] F2x114 as well as node interleaving based on [The DRAM Base/Limit Registers] F1x[1, 0][7C:40].
So SRI, XBAR, MCT, L3, DCT0 and DCT1 all run at NB speed?Code:
Core 0 ----|
Core 1 ----|
Core 2 ----|---- SRI ---- XBAR ---- cHT
Core 3 ----|                |
                           MCT --- L3
                            |
                     ---------------
                     |             |
                   DCT0          DCT1
                     |             |
                  LDIMM0        LDIMM1
Or do the DCTs run at the RAM clock?
I have to answer myself and admit that I was wrong :(
With the steps posted by tictac I only changed the Vcore of the first core; to change the second, third, up to the eighth core, I have to switch cores in CrystalCPUID and then reload with MSR Editor.
I made a regdump with CPU-Z and I wonder about the MSR code of cores 1-3
Now I inserted second CPU and made a new regdump.
Second CPU runs with 1.10V and the MSR Code is
40 = 1.15V
48 = 1.10V
Now I changed every register to 38 = 1.20V and began a stress test at 2.0GHz with my Barcelona 2344
kyosen
Could you please run a SiSoft Sandra 2007 benchmark on a 32-bit system to obtain this kind of result:
http://www.expreview.com/img/news/07...6047070_rs.jpg
I'd like to know if the low memory score is strange or is due to sandra.
thanks
sorry.. indiana, I can't get back to you properly.. I am viewing this from Opera Mini on a mobile phone.. each processor has its own MSRs.. 64 stands for P-state 0.. 65 for P-state 1.. and so on..
yeah, you did it right.. change the CPU ID in CrystalCPUID, then open MSR Editor for each processor... :up: :up:
P-states other than 0 are used for the power-saving CnQ implementation..
Maybe I just missed it, but how is the memory clock calculated? Is the calculation different between Barcelona and Phenom?
OK, the memory clock is derived from the bus clock, but how?
How do you get 333MHz for DDR2-667 on Barcelona, and 400MHz for DDR2-800 or 533MHz for DDR2-1066 on Phenom?
i came out with this.... but no confirmation yet :shrug:
Memory Clock Speed
Memory clock = NB Speed / memory divider
Code:
NB Speed   HTT      DDR400   DDR533   DDR667   DDR800   DDR1066
1600MHz    200MHz   200MHz   266MHz   320MHz   400MHz   533MHz
1800MHz    200MHz   200MHz   257MHz   300MHz   360MHz   450MHz
Memory divider
DDR 400 = NB Multiplier
DDR 466 = NB Multiplier - 1
DDR 533 = NB Multiplier - 2
DDR 667 = NB Multiplier - 3
DDR 1066 = NB Multiplier - 4
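That divider rule can be sketched in a few lines. This is tictac's speculation ("no confirmation yet"), not confirmed K10 behaviour, and only the offsets listed above that agree with the table are included:

```python
# Sketch of the speculated K10 memory-clock rule from this post:
#   memory clock = NB speed / divider, divider = NB multiplier - offset.
# Unconfirmed; offsets are the ones listed in the post above.

OFFSET = {"DDR400": 0, "DDR466": 1, "DDR533": 2, "DDR667": 3}

def mem_clock(htt_mhz, nb_mult, ddr):
    """Memory clock in MHz for a given HTT clock and NB multiplier."""
    nb_speed = htt_mhz * nb_mult
    divider = nb_mult - OFFSET[ddr]
    return nb_speed / divider

print(round(mem_clock(200, 8, "DDR667")))   # 1600/5 = 320, as in the table
print(round(mem_clock(200, 9, "DDR667")))   # 1800/6 = 300
print(round(mem_clock(200, 9, "DDR533")))   # 1800/7 = 257
```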
indiana_74... a few more tweaks to set your HT link speed and NB speed:
Link : http://www.xtremesystems.org/forums/...=164768&page=2
This is interesting. Thanks for the info!
The cores and the NB are separated from each other, so the voltages can be adjusted separately too. With a locked Phenom, where you can't raise the multiplier, just the HT freq, does the NB freq go up with it? Or can it be controlled separately too? And the L3 and RAM freq?
It would go up with it, but you can probably lower the NB's own multiplier. (Hopefully AMD won't lock it downwards. edit: that would be a stupid idea.)
L3 freq is the same as NB's, isn't it?
BTW, HT freq is already a derived clock, using the HT multiplier - you mean the external clock/system clock. (HT clock = ext. clock * HT multiplier.)
"L3 freq is the same as NB's, isn't it?"-> yes and OCing the whole NB would easily bring down latency of L3 and mem. latency.Now the NB runs at rather low 1.8Ghz.WIth the NB clock up to 2.5Ghz,we could see a nice boost in some cache sensitive apps.Also,every single Phenom we saw had its ram clocked low with poor timing :(.I can't believe people who could get their hands on such a sparse part at this moment ,couldn't get some decent LL DDR2 kit :shrug:
Hi Tictac !
Are you sure the memory clock is computed from the NB speed?
I'd rather assume that :
Memory clock = HTT Speed / memory divider
But it's just a guess ...
If you're correct, that would mean that memory clock also depends on the single/dual plane setting (the NB speed depends on it).
yeah.. the memory speed is computed from the NB speed.. the HT link is used from the NB to the chipset or to other CPUs.. the NB clock bridges the NB to the 4 CPU cores... so I go with NB speed..
If I'm understanding the AMD K10 Dev Guide correctly, the DCTs run in the NB at NB speed.
As a side note for video benchmarks: setting the DCT/DRAM to 128-bit mode will not support 32-byte bursts; only 64-bit mode supports 32-byte bursts. That may make a big difference when running video benches...
Yes, K10 won't support 32-byte granularity in dual channel (single controller), just like any other AMD DDR2 rig. Of all the K8 systems only socket 939 rev E supports 32-byte mode, but none of the earlier 939 revisions or any AM2 CPUs do. Remember that the channels are accessed in perfect parallel (lockstep), and that DDR2 has a minimum prefetch width of 4 columns.
1 4n-prefetch x 2 ch = 8 columns = 64 bytes.
On K10, unganged mode (2x 64-bit independent) supports the 32-byte burst; ganged mode (2x 64-bit in lockstep) does not. That might explain some discrepancies in the benches.
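The burst-size arithmetic behind that claim is simple enough to spell out (just the 4n-prefetch reasoning from the posts above, not anything new):

```python
# Minimum burst size behind the 32-byte vs 64-byte claim:
# DDR2 has a 4n prefetch, so each 64-bit (8-byte) channel moves
# at least 4 beats x 8 bytes = 32 bytes per access.

PREFETCH_BEATS = 4    # DDR2 minimum burst length
CHANNEL_BYTES = 8     # 64-bit channel width in bytes

unganged = PREFETCH_BEATS * CHANNEL_BYTES       # one independent channel
ganged = PREFETCH_BEATS * CHANNEL_BYTES * 2     # two channels in lockstep

print(unganged)  # 32 -> unganged mode can serve 32-byte bursts
print(ganged)    # 64 -> ganged (lockstep) minimum is 64 bytes
```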
Bits and bytes - original post edited ... :)
I tried Vcore & Vnb mods, and got 2.6G @ Vcore=1.38V, Vnb=1.35V.
I pushed it further, saw a 2.67G SuperPI1M run, and then
the system froze... it's dead, as if prepared for the arrival of Intel's 45nm :hitself:
Last screenshot:
http://www.oohashi.jp/c-board/file/S...K10B1-2.6G.png
Sad news indeed.
One or both CPUs dead? The mobo?
A digital VRM, perchance?
when you get your rig going again any chance we could get a sciencemark run?
@kyosen
Due to the optimization of Cinebench 10 for the Intel Core architecture, is it possible to run a test with Cinebench 9.5 to get a fairer comparison between K10 and Core2Q?
-------------------------------------------
"One major improvement in CINEMA 4D has been the fine-tuned optimization for the Intel® Core™ microarchitecture, which took advantage of MAXON’s ten-years experience in multi-threading. [...] On the next-gen Intel® Core™ 2 processor codenamed Yorkfield, CINEMA 4D performs twice as fast as on an Intel® Core™ 2 Duo processor. CINEMA 4D takes advantage of various multi-core processors from Intel, e.g. Intel® Xeon®, Intel® Core™ 2 Duo and Intel® Core™ 2 Quad processors."
"In order to fully exploit the power of Intel processors, MAXON keeps improving its software code using Intel® C++ Compiler. [...] The Intel®
VTune™ Performance Analyzer and Intel® Threading Tools support MAXON’s developers in controlling performance improvements and identify performance
related issues.
Thanks to the accurate software tuning and the latest Intel technology advancements, CINEMA 4D now runs more than twice as fast as it did only one year ago - comparing the CINEMA 4D performance on the next-gen Quad-Core processor codenamed "Yorkfield" and on the Intel® Core™ 2 Extreme X6800 Dual-Core processor running at the same frequency - as measured by MAXON's benchmarking tool CINEBENCH."
-------------------------------------
My finding on benchmarking is that the software optimization is certainly a little biased. Intel is always releasing new C++ compilers which can greatly optimize for their own processors. The performance gain is doubtful (real performance gain vs score gain).
Anyway, it's like going back to the old days of AMD, when only a few benchmarks could keep up with Intel but the prices were much lower. I don't see running an AMD platform as riding a car travelling at 60km/h while Intel's platform is at 60mph.
That certainly holds true for this hardware comparison of Barcelona 1.9GHz v Xeon 1.86GHz:
http://www.hardwarezone.com/articles...?cid=2&id=2411
Where did you see a fair comparison? Is not using the full capabilities of the new architecture supposed to show something, or do you believe Maxon has only just discovered VTune?
Look at the slide around page 25:
http://www.securitytechnet.com/resou...2/PCS027PS.pdf
So the conclusion is: if you're interested in CINEMA 4D performance, you take the fastest build and compare platforms. If not, and you want to do something really fair, you write your own code, optimized on one side for Intel's next-gen CPU and on the other side for AMD's next-gen CPU. :shrug:
Is it doubtful?
http://www.trustedreviews.com/images...e/5887-mp3.gif
There used to be a program that would disable the Intel processor checks, to allow optimized code to run on processors other than "GenuineIntel"... might help, might not..
EDIT
Wow, the Harpertown and Clovertown systems in that hardwarezone writeup are apples and oranges... 8x 533MHz FB-DIMMs in the Clovertown and 4x 667MHz FB-DIMMs in the Harpertown.. it's best to use 4 sticks (8 is slower), and they're not matched for speed at all..
It's proof of the compiler bias. The Intel compiler improves performance over the MS compiler only on Intel machines. That's why I've always stated that real benchmarks are compiled benchmarks, with the compiler and flags disclosed. And no, it doesn't mean that SPEC is the only benchmark to be trusted, but it does mean there will be more trustworthy benchmarks on Linux than on Windows.
But let's avoid a flame war over OSes here. :)
Yes and no. Because then why would other compilers give larger performance increases when told to use SSE2 and SSE3 on AMD machines?
I mean, if icc gives a performance increase from "do not use SSE extensions" to "use SSE extensions" of, let's say, 15% on an Intel CPU and of just 10% on an AMD CPU, then that would mean you are right. But when the same code, compiled with the MS compiler, Sun compiler, Portland Group compiler or even gcc, shows that turning on SSE yields similar increases on both architectures (say, 15% and 14%), then you have to start considering that, yes, the Intel compiler is consciously causing problems for AMD machines. ;)
If you take the case I gave earlier of LAME compiled with the Intel or the MS compiler, you see that with the Intel compiler it's faster on Intel, but even on AMD it's a little faster than with the MS compiler. The increases can't be considered only in percentage terms if, on an AMD CPU, the best compiler is Intel's... So I agree that the Intel compiler is optimized for Intel CPUs, but not that "the Intel compiler consciously causes problems for AMD machines".
http://www.tgdaily.com/content/view/34799/118/
Do you believe that when AMD releases a new math library to support Barcelona, it will increase computing power on Intel CPUs by the same ratio? Obviously not, but I won't say AMD is consciously causing problems for Intel machines.
"And no, it doesn't mean that specs are the only bench to be trusted, but yes it means that there will be more trustable benchies on linux than on windows."
At least, spec allow to compare best performance for a given set of software at optimum speed, this is probably a better bench to compare 2 architectures. But you will not ask software developper like maxon to sell slower software because it will be most fair for amd. And this is an other side of the problem. :shrug:
It's me again :)
I think this doesn't work.
I tried to push my 2344 HE to 2GHz but it was not stable; 1.9GHz = not stable, 1.8GHz = not stable :(
I used this method to give it more Vcore.
I pushed it to 1.8GHz with 1.2V Vcore (stock is 1.1 and 1.15V) and it was not stable.
Now I used a meter to check the wattage of the complete system, and it shows me that there is NO difference between 1.1V and 1.25V Vcore when the system is running at 100% :shakes:
That is not possible; there must be a difference!
The CPU does not change the Vcore :confused:
What can i do now?
OK, let's take a look at this.
Note: these calculations are by no means accurate. I have assumed the current a CPU draws is constant, etc. But at least it provides an indication.
At stock, the processor's TDP is 68W. Stock is 1.1V, so 68/1.1 = 6.18A. Now if you feed it an extra 0.15V, that means 6.18*0.15 = 0.93 extra watts, a lot of which isn't actually being used (TDP is a theoretical maximum), and therefore the difference is not large enough to show up when measuring power, especially not for the whole system.
I thought power draw scales with voltage squared, or so I was told a while ago on these forums...
So if it's 68W @ 1.1V, it should be (1.25/1.1)^2*68 = ~88W, or 20W more at 1.25V.
When you factor in C/W values... if you are ~30C above ambient at stock, raising the Vcore from 1.1 to 1.25 should add very roughly 8.8C to your load temps.
Again, these are very rough equations, but I've found they are pretty close to real-world data. Even if your CPU uses fewer watts than its TDP (and it actually does use less), it's correct to within a certain margin of error, because the difference between actual power draw and TDP is on both sides of the equation.
I experimented with them on my system to "guess" the core temp at a certain speed/voltage, and found they were within +/- 1C of the real values.
If something is unclear in my post, don't shoot me... I tried my best with my English lol
edit: Cronos, you were 3min late lol ;)
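The V^2 estimate above can be sketched like this. It assumes dynamic power scales with Vcore squared at fixed frequency and uses TDP as the baseline, so it's a rough upper bound, not a measurement:

```python
# Rough power/temperature estimate from the post above, assuming
# power ~ Vcore^2 at fixed clock and TDP as the stock baseline.

TDP_W = 68.0              # stock TDP at stock Vcore
V_STOCK = 1.10
C_PER_W = 30.0 / TDP_W    # ~0.44 C/W if load is ~30C over ambient at stock

def power_at(vcore):
    """Estimated power draw (W) at a given Vcore."""
    return TDP_W * (vcore / V_STOCK) ** 2

extra_w = power_at(1.25) - TDP_W
print(round(power_at(1.25)))         # 88 (W at 1.25V, ~20W over stock)
print(round(extra_w * C_PER_W, 1))   # 8.7 (extra load C; the post rounds to ~8.8)
```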
Okay, so do I understand you right that there must be a difference when I raise the Vcore from 1.1V to 1.2V?
Note that the CPUs are working at their limit when I measure the wattage.
All eight cores are fully loaded with QMC work units, and on my other systems I can see a difference even when I change it by only 0.05V.