The thing is, for multithreaded programs, you can't fairly compare the K8 with the K10 because there aren't any quad-core K8s. The only way to do a fair comparison would be to disable two of the cores when testing.
The SMP client spawns 4 total threads that actively do calculations. If you have fewer than 4 cores, 2 threads share a core.
If you have MORE than 4 cores (dual socket quad cores), then you have to run multiple instances of the client to load up all cores.
Here is a SS from my server:
Dual G0 Xeon E5310 @ 2.0GHz.
On a single client on this machine I normally get about 13.8-14.2 min per 1%. When you load up 2 clients, the time increases to 14.4 min per 1% per client. I can only assume that is b/c both processors share the same FSB and memory.
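A rough sketch of what those numbers mean for total throughput. It assumes 14.0 min per 1% for the single client (the midpoint of the quoted 13.8-14.2 range); the function name is made up for illustration.

```python
def percent_per_hour(minutes_per_percent: float, clients: int = 1) -> float:
    """Work-unit progress per hour, summed over all clients on the machine."""
    return clients * 60.0 / minutes_per_percent

# Single client: midpoint of the quoted 13.8-14.2 min range (assumption).
single = percent_per_hour(14.0)
# Two clients, each at the quoted 14.4 min per 1%.
dual = percent_per_hour(14.4, clients=2)

# 2.0 would mean the second client costs nothing at all.
scaling = dual / single

print(f"1 client : {single:.2f} %/hour")
print(f"2 clients: {dual:.2f} %/hour total")
print(f"scaling  : {scaling:.2f}x")
```

So the second client comes close to doubling the output of the box, and the shared FSB and memory cost only a few percent per client.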
Which part of the core does the SMP client use most (what does it depend most on)?
How does it scale with frequency?
*edit* nvm,
Last edited by Sparky; 11-07-2007 at 10:43 AM.
The Cardboard Master Crunch with us, the XS WCG team
Intel Core i7 2600k @ 4.5GHz, 16GB DDR3-1600, Radeon 7950 @ 1000/1250, Win 10 Pro x64
Core 2 quad (Kentsfield) 2.96 GHz: 1 SMP instance 2653: 8:5x
Core 2 quad (Kentsfield) 2.96 GHz: 2 SMP 2653: 15:45
OS: tweaked WinXP pro
IP-35E, 370 fsb, 2 GB Adata 1110 MHz CL5, performance level 5
If I could get 450+ fsb it would run even faster, but this mobo is not a good clocker.
Concerning the unusually high speedup in 64-bit Cinebench, notice that with 4 threads the total L2 cache available to the application is 4 x 0.5MB = 2MB, while a single thread has only 0.5MB available to it.
It is known that a speedup of more than n*100% is sometimes possible because of the bigger aggregate cache available when the application is running several threads.
Note that Core 2 has a significantly lower speedup because of the shared nature of its L2.
i know L3 cache is being recognized, but is it being utilized?
Athlon XP-M 2500+ 0343MPMW The King is Dead!
Phenom II X6 1090T 1025GPMW Long Live the King!
-------------------------------------------
I'm from the church of the operating room
Well about the ganged and unganged memory, I asked my professor about it, so hopefully he'll give a good response.
Excerpt from BKDG For AMD Family 10h Processors Page 60.
2.8 DRAM Controllers (DCTs)
The DCTs support DDR2 DIMMs or DDR3 DIMMs. Products may be configurable between DDR2 and DDR3 operation.
A DRAM channel is the group of the DRAM interface pins that connect to one series of DIMMs. The processor supports two DDR channels. The processor includes two DCTs. Each DCT controls one 64-bit DDR DIMM channel.
For DDR products, DCT0 controls channel A DDR pins and DCT1 controls channel B DDR pins. However, the processor may be configured: (1) to behave as a single dual-channel DCT; this is called ganged mode; or
(2) to behave as two single-channel DCTs; this is called unganged mode.
A logical DIMM is either one 64-bit DIMM (as in unganged mode) or two identical DIMMs in parallel to create a 128-bit interface (as in ganged mode). See section 1.5.2 [Supported Feature Variations] on page 20 for information about supported package/DRAM configurations.
For DDR products, when the DCTs are in ganged mode, as specified by [The DRAM Controller Select Low Register] F2x110[DctGangEn], then each logical DIMM is two channels wide. Each physical DIMM of a 2-channel logical DIMM is required to be the same size and use the same timing parameters. Both DCTs must be programmed with the same information (see section 2.8.1 [DCT Configuration Registers] on page 61). When the DCTs are in 64-bit mode, a logical DIMM is equivalent to a 64-bit physical DIMM and each channel is controlled by a different DCT.
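To make the ganged/unganged distinction concrete, here is a small sketch that models the two modes as described in the excerpt (one 128-bit logical DIMM versus two 64-bit ones) and computes the theoretical peak bandwidth. The class and field names are my own, and DDR2-800 is picked only as an example speed grade.

```python
from dataclasses import dataclass


@dataclass
class DctMode:
    name: str
    logical_channels: int  # independent channels seen by the memory controller
    width_bits: int        # width of one logical DIMM

    def peak_gb_per_s(self, mega_transfers: int) -> float:
        """Theoretical peak: channels * bytes per transfer * MT/s."""
        return self.logical_channels * (self.width_bits // 8) * mega_transfers / 1000


GANGED = DctMode("ganged", logical_channels=1, width_bits=128)
UNGANGED = DctMode("unganged", logical_channels=2, width_bits=64)

for mode in (GANGED, UNGANGED):
    # DDR2-800 = 800 MT/s (example value, not taken from the BKDG excerpt).
    print(f"{mode.name:9s} {mode.logical_channels} x {mode.width_bits}-bit -> "
          f"{mode.peak_gb_per_s(800):.1f} GB/s peak")
```

Peak bandwidth is the same on paper in both modes; the difference is whether the two DCTs act as one wide channel or can service two requests independently.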
Hi, guys! Is it now confirmed that the ram clock is derived only from the external clock ("Ext.CLK", "System Clock", etc.), and not from the NB clock? (I think the IMC is part of the NB, so it would seem logical to me.)
P.S. BTW, I'm sorry, but why do you keep writing FSB and "bus speed" in relation to the external clock? This is only a clock, not a bus, and especially not a Front Side Bus like Intel's. (The HT bus clock is derived from it by a multiplier, as we know.)
Another Quote from AMD's k10 dev guide
2.6 The Northbridge (NB)
Each processor includes a single Northbridge that provides the interface to the local CPU core(s), the interface to system memory, the interface to other processors, and the interface to system IO devices. The NB includes all power planes except VDD; see section 2.4.1 [Processor Power Planes And Voltage Control] on page 25 for more information.
The NB of each node is responsible for routing transactions sourced from CPU cores and links to the appropriate CPU core, cache, DRAM, or link. See section 2.9.3 [Access Type Determination] on page 107 for more information.
2.6.1 Northbridge (NB) Architecture
Major NB blocks are: System Request Interface (SRI), Memory Controller (MCT), DRAM Controllers (DCTs), L3 cache, and Cross Bar (XBAR). SRI interfaces with the CPU core(s). MCT maintains cache coherency and interfaces with the DCTs; MCT maintains a queue of incoming requests called MCQ. XBAR is a switch that routes packets between SRI, MCT, and the links.
The MCT operates on physical addresses. Before passing transactions to the DCTs, the MCT converts physical addresses into normalized addresses that correspond to the values programmed into [The DRAM CS Base Address Registers] F2x[1, 0][5C:40]. Normalized addresses include only address bits within the DCTs’ range.
The normalized address varies based on DCT interleave and hoisting settings in [The DRAM Controller Select Low Register] F2x110 and [The DRAM Controller Select High Register] F2x114 as well as node interleaving based on [The DRAM Base/Limit Registers] F1x[1, 0][7C:40].
So SRI, XBAR, MCT, L3, DCT0 and DCT1 all run on NB speed?
Code:
Core 0 ---- |
Core 1 ---- |---- SRI ---- XBAR ---- cHT
Core 2 ---- |               |
                           MCT --- L3
Core 3 ---- |              ---
                          |   |
                        DCT0 DCT1
                          |   |
                     LDIMM0 LDIMM1
Or do the DCTs run on the RAM clock?
I have to answer myself and admit that I was wrong.
With the steps posted by tictac I only changed the vcore of the first core. To change the second, the third and so on up to the eighth core, I have to switch cores in CrystalCPUID and then reload with MSR Editor.
I made a regdump with CPU-Z and I wonder about the MSR code of cores 1-3
Now I inserted second CPU and made a new regdump.
Second CPU runs with 1.10V and the MSR Code is
40 = 1.15V
48 = 1.10V
Now I changed every register to 38 = 1.20V and began a stress test at 2.0GHz with my Barcelona 2344
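The three code/voltage pairs quoted in this post line up on a straight line if the codes are read as hex. The sketch below fits that line from the posted values only; it is an inference from these three points, not a formula taken from AMD documentation, and the function names are made up.

```python
# Codes and voltages exactly as quoted in the post (codes read as hex).
OBSERVED = {0x38: 1.20, 0x40: 1.15, 0x48: 1.10}


def fit_line(points: dict[int, float]) -> tuple[float, float]:
    """Return (base, step) so that volts = base - step * code,
    using the lowest and highest observed codes."""
    lo, hi = min(points), max(points)
    step = (points[lo] - points[hi]) / (hi - lo)
    base = points[lo] + step * lo
    return base, step


BASE_V, STEP_V = fit_line(OBSERVED)


def code_to_volts(code: int) -> float:
    """Voltage predicted by the fitted line (only checked on the 3 points above)."""
    return round(BASE_V - STEP_V * code, 4)


print(f"fit: volts = {BASE_V:.4f} - {STEP_V:.5f} * code")
for code, volts in sorted(OBSERVED.items()):
    print(f"0x{code:02X} -> {code_to_volts(code):.4f} V (post says {volts:.2f} V)")
```

The middle point (40 = 1.15V) is not used for the fit, so it serves as a check that the relation really is linear over this range. Outside the 38-48 range this is pure extrapolation.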
Last edited by indiana_74; 11-09-2007 at 05:46 AM.
kyosen
Could you please test a sisoft sandra 2007 benchmark on a 32 bit system to obtain such kind of results:
http://www.expreview.com/img/news/07...6047070_rs.jpg
I'd like to know if the low memory score is strange or is due to sandra.
thanks
Sorry, indiana, I can't get back to you properly.. I am viewing this from Opera Mini on a mobile phone.. each processor has its own MSRs.. 64 stands for P-state 0.. 65 for P-state 1.. and so on..
Yeah, you did it right.. change the CPU ID in CrystalCPUID, then open MSR Editor for each processor...
P-states other than 0 are used for the power-saving CnQ implementation..
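A small sketch of the bookkeeping described above: one MSR per P-state, and the same edit repeated once per core. It assumes that "64" and "65" are the low byte of MSR addresses C001_0064h, C001_0065h and so on, and that there are five such registers; both are my reading and are not stated in the thread. The code only builds the list of edits and does not touch any hardware.

```python
# Assumption: the post's "64", "65" are the low byte of MSR addresses
# C001_0064h, C001_0065h, ... (my reading, not stated in the thread).
PSTATE_MSR_BASE = 0xC001_0064
NUM_PSTATE_MSRS = 5  # assumption: registers 64h through 68h


def pstate_msr(pstate: int) -> int:
    """MSR address that holds the definition of the given P-state."""
    if not 0 <= pstate < NUM_PSTATE_MSRS:
        raise ValueError(f"P-state {pstate} out of range")
    return PSTATE_MSR_BASE + pstate


def edit_plan(num_cores: int, pstate: int = 0) -> list[tuple[int, int]]:
    """(core, msr) pairs to visit: the same MSR has to be written once per core."""
    return [(core, pstate_msr(pstate)) for core in range(num_cores)]


# Two quad cores = 8 cores, as in indiana_74's setup.
for core, msr in edit_plan(num_cores=8):
    print(f"core {core}: select it in CrystalCPUID, then edit MSR 0x{msr:08X}")
```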
Last edited by tictac; 11-09-2007 at 07:39 AM.
Maybe I just missed it, but how is the memory clock calculated? Is there a difference in the calculation between Barcelona and Phenom?
IQ_NOT_LESS_OR_EQUAL
outdated hardware
Ok, memory clock is obtained from bus clock but how?
So how do you get 333MHz for DDR2-667 on Barcelona, and 400MHz for DDR2-800 or 533MHz for DDR2-1066 on Phenom?
i came out with this.... but no confirmation yet
Memory Clock Speed
Memory clock = NB Speed / memory divider
Code:
NB Speed  HTT     DDR400  DDR533  DDR667  DDR800  DDR1066
1600MHz   200MHz  200MHz  266MHz  320MHz  400MHz  533MHz
1800MHz   200MHz  200MHz  257MHz  300MHz  360MHz  450MHz
Memory divider
DDR 400 = NB Multiplier
DDR 466 = NB Multiplier - 1
DDR 533 = NB Multiplier - 2
DDR 667 = NB Multiplier - 3
DDR 800 = NB Multiplier - 4
DDR 1066 = NB Multiplier - 5
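As a sanity check on this (still unconfirmed) theory, the sketch below takes the clock table from this post and works backwards to the divider each entry implies, expressed as an offset from the NB multiplier. It assumes NB multiplier = NB speed / HTT; the function name is made up.

```python
# Clocks in MHz, exactly as posted. NB multiplier is taken as NB speed / HTT.
HTT = 200
TABLE = {
    1600: {"DDR400": 200, "DDR533": 266, "DDR667": 320, "DDR800": 400, "DDR1066": 533},
    1800: {"DDR400": 200, "DDR533": 257, "DDR667": 300, "DDR800": 360, "DDR1066": 450},
}


def implied_offsets(nb_mhz: int, row: dict[str, int]) -> dict[str, int]:
    """For each memory setting, how far the divider sits below the NB multiplier."""
    nb_mult = nb_mhz // HTT
    return {label: nb_mult - round(nb_mhz / clock) for label, clock in row.items()}


for nb_mhz, row in TABLE.items():
    print(f"NB {nb_mhz} MHz (x{nb_mhz // HTT}): {implied_offsets(nb_mhz, row)}")
```

Both rows of the table imply the same offsets: 0 for DDR400, 2 for DDR533, 3 for DDR667, 4 for DDR800 and 5 for DDR1066.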
indiana_74... a few more tweaks to set your HT link speed and NB speed
Link : http://www.xtremesystems.org/forums/...=164768&page=2
Last edited by tictac; 11-09-2007 at 09:12 AM.
This is interesting. Thanks for the info!
The cores and the NB are separate from each other, so their voltages can be adjusted separately too. With a locked Phenom, where you can't raise the multiplier, just the HT freq, does the NB freq go up with it? Or can it be controlled separately too? And the L3 and RAM freq?
It would go with it, but you can probably lower the NB's own multiplier. (Hopefully AMD won't lock it downwards. edit: would be a stupid idea.)
L3 freq is the same as NB's, isn't it?
BTW, HT freq is already a derived clock, using the HT multiplier - you mean the external clock/system clock. (HT clock = ext. clock * HT multiplier.)
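The relation above, sketched in code. The 200 MHz external clock and the x9 NB multiplier (giving the 1.8GHz NB mentioned in this thread) come from the discussion; the HT multiplier of 9 and the 220 MHz overclock are placeholder values for illustration only.

```python
def derived_clocks(ext_mhz: float, ht_mult: float, nb_mult: float) -> dict[str, float]:
    """Clocks derived from the external (reference) clock by their multipliers."""
    return {
        "HT": ext_mhz * ht_mult,
        "NB/L3": ext_mhz * nb_mult,  # L3 runs at NB speed, as discussed in this thread
    }


# Stock: 200 MHz external clock, NB x9 -> 1800 MHz. HT x9 is a placeholder.
print("stock      :", derived_clocks(200, ht_mult=9, nb_mult=9))

# Raising the external clock drags every derived clock up with it...
print("220 MHz    :", derived_clocks(220, ht_mult=9, nb_mult=9))

# ...unless the NB multiplier is lowered to compensate.
print("220, NB x8 :", derived_clocks(220, ht_mult=9, nb_mult=8))
```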
Last edited by dess; 11-09-2007 at 01:19 PM. Reason: spelling
"L3 freq is the same as NB's, isn't it?" -> Yes, and OCing the whole NB would easily bring down the latency of both the L3 and memory. Right now the NB runs at a rather low 1.8GHz. With the NB clock up at 2.5GHz, we could see a nice boost in some cache-sensitive apps. Also, every single Phenom we saw had its RAM clocked low with poor timings. I can't believe that people who could get their hands on such a scarce part at this moment couldn't get a decent LL DDR2 kit.