Page 14 of 16
Results 326 to 350 of 380

Thread: Barcelona Opteron 2350(B1) arrived

  1. #326
    Xtreme Enthusiast
    Join Date
    Jun 2007
    Posts
    546
    The thing is, for multithreaded programs you can't fairly compare K8 and K10, because there aren't any quad-core K8s. The only way to do a fair comparison would be to disable two of the K10's cores when testing.

  2. #327
    Xtreme Enthusiast
    Join Date
    Apr 2007
    Posts
    772
    Quote Originally Posted by kyosen View Post
    Just quick guesstimation:
    my result: 15m24s=924s for each 5000 steps with 2.0G x4 cores K10 Opteron
    your score: 15m24s + 6m = 1284s with 2.8G x2 cores K8 Opteron
    So, 1284/x * 2.8/2.0 /y = 924
    ...here x is efficiency of increased cores x2->x4, and y is performance gain per core.
    for example, if y is ~1.05, x is ~1.85 from the formula above...
    ...yeah x should be within 2.0 in this case.
    In my experience for SuperPI, y=~1.05 is feasible, at least on my board and current BIOS, so far.
    I don't know usual efficiency x for Folding@Home, but 1.85 looks feasible too...
    FYI, this is about on par with Kentsfield at 2.0GHz.

    I pull down about 14min flat per 1% on a 2653 WU on my octacore Clovertown server (running 2 instances - one per socket).
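
    For anyone who wants to play with kyosen's estimate quoted above, here is a minimal sketch that solves his relation for x given an assumed y. The function name and argument names are mine, not from any tool; the numbers are the ones in the quote.

```python
# kyosen's relation: t_k8 / x * (f_k8 / f_k10) / y = t_k10
# x = scaling efficiency going from 2 to 4 cores
# y = per-core performance gain of K10 over K8 (an assumption, e.g. ~1.05)

def core_scaling(t_k8_s, t_k10_s, f_k8_ghz, f_k10_ghz, y):
    """Solve the relation above for x, given an assumed per-core gain y."""
    return t_k8_s * (f_k8_ghz / f_k10_ghz) / (y * t_k10_s)

t_k10 = 15 * 60 + 24      # 924 s per 5000 steps, 2.0 GHz x4 K10
t_k8 = t_k10 + 6 * 60     # 1284 s, 2.8 GHz x2 K8
x = core_scaling(t_k8, t_k10, 2.8, 2.0, y=1.05)
print(f"x = {x:.2f}")
```

    With y = 1.05 this lands at roughly 1.85, which matches the figure in the quote and stays below the ideal 2.0.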

  3. #328
    Xtreme Enthusiast
    Join Date
    Apr 2007
    Posts
    772
    Quote Originally Posted by STEvil View Post
    Does the SMP client work one work unit across all cores or is it one per core? I assume it is one per core in this calculation. One across all cores will be different numbers.
    The SMP client spawns 4 threads in total that actively do calculations. If you have fewer than 4 cores, two threads share a core.

    If you have MORE than 4 cores (dual socket quad cores), then you have to run multiple instances of the client to load up all cores.

    Here is a SS from my server:


    Dual G0 Xeon E5310 @ 2.0GHz.

    On a single client on this machine I normally get about 13.8-14.2 min per 1%. When you load up 2 clients, the time increases to 14.4 min per 1% per client. I can only assume that is b/c both processors share the same FSB and memory.
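
    To put the two-client slowdown in perspective, here is a rough sketch of the combined throughput relative to a single client. It assumes both clients crunch the same kind of WU and that time per 1% is the only thing that matters; the function name is just illustrative.

```python
def throughput_gain(single_min, dual_min_each, clients=2):
    """Work finished per unit time with `clients` instances, relative to one."""
    return clients * single_min / dual_min_each

# Single client: 13.8 to 14.2 min per 1%; two clients: 14.4 min per 1% each.
low = throughput_gain(13.8, 14.4)
high = throughput_gain(14.2, 14.4)
print(f"{low:.2f}x to {high:.2f}x")
```

    So the second client costs each instance a few percent, but total output is still close to double.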

  4. #329
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    Which part of the core does the SMP client use most (what does it depend most on)?
    How does it scale with frequency?

  5. #330
    Xtremely High Voltage Sparky's Avatar
    Join Date
    Mar 2006
    Location
    Ohio, USA
    Posts
    16,040
    *edit* nvm,
    Last edited by Sparky; 11-07-2007 at 10:43 AM.
    The Cardboard Master
    Crunch with us, the XS WCG team
    Intel Core i7 2600k @ 4.5GHz, 16GB DDR3-1600, Radeon 7950 @ 1000/1250, Win 10 Pro x64

  6. #331
    Xtreme Enthusiast
    Join Date
    Jun 2007
    Posts
    546
    Quote Originally Posted by KTE View Post
    Which part of the core does the SMP client use most (what does it depend most on)?
    How does it scale with frequency?
    P2653
    X3210 @ 2.00Ghz : 13:15
    X3210 @ 2.96Ghz : 9:30
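
    A quick sketch of what those two numbers say about frequency scaling. I am assuming both times are minutes:seconds for the same amount of work on P2653; the helper name is made up for this example.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

t_slow = to_seconds("13:15")          # X3210 @ 2.00 GHz
t_fast = to_seconds("9:30")           # X3210 @ 2.96 GHz
speedup = t_slow / t_fast             # measured speedup
clock_ratio = 2.96 / 2.00             # ideal speedup if scaling were perfect
efficiency = speedup / clock_ratio    # fraction of the ideal actually achieved
print(f"speedup {speedup:.2f}x for {clock_ratio:.2f}x clock ({efficiency:.0%})")
```

    In other words, a 48% clock increase gives about a 39% speedup here, so the client scales well with frequency but not perfectly.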

  7. #332
    Xtreme Member
    Join Date
    Jul 2007
    Location
    Finland
    Posts
    105
    Core 2 quad (Kentsfield) 2.96 GHz: 1 SMP instance 2653: 8:5x
    Core 2 quad (Kentsfield) 2.96 Ghz: 2 SMP 2653: 15:45

    OS: tweaked WinXP pro
    IP-35E, 370 fsb, 2 GB Adata 1110 MHz CL5, performance level 5

    If I could get 450+ fsb it would run even faster, but this mobo is not a good clocker.

  8. #333
    Xtreme Enthusiast
    Join Date
    Mar 2007
    Posts
    557
    Concerning the unusually high speedup in 64-bit Cinebench: notice that with 4 threads, the total L2 cache available to the application is 4 x 0.5MB = 2MB, while a single thread only has 0.5MB available to it.

    It is known that a speedup of more than n*100% is sometimes possible because of the bigger aggregate cache available when an application is running several threads.

    Note that Core 2 has a significantly lower speedup because of the shared nature of its L2.

  9. #334
    Xtreme Enthusiast
    Join Date
    Jun 2007
    Posts
    546
    Quote Originally Posted by JVguest View Post
    Core 2 quad (Kentsfield) 2.96 GHz: 1 SMP instance 2653: 8:5x
    Core 2 quad (Kentsfield) 2.96 Ghz: 2 SMP 2653: 15:45

    OS: tweaked WinXP pro
    IP-35E, 370 fsb, 2 GB Adata 1110 MHz CL5, performance level 5

    If I could get 450+ fsb it would run even faster, but this mobo is not a good clocker.
    I think I'll have to investigate what impact FSB and memory speed have on SMP results, because you are getting about 7% more performance than me on the same project with a faster FSB and faster memory.
    Last edited by Start; 11-07-2007 at 03:26 PM.

  10. #335
    Xtreme Addict
    Join Date
    Mar 2005
    Location
    Dallas, TX USA
    Posts
    1,381
    i know L3 cache is being recognized, but is it being utilized?
    Athlon XP-M 2500+ 0343MPMW The King is Dead!
    Phenom II X6 1090T 1025GPMW Long Live the King!

    -------------------------------------------
    I'm from the church of the operating room

  11. #336
    Xtreme Enthusiast
    Join Date
    Jun 2007
    Posts
    546
    Well about the ganged and unganged memory, I asked my professor about it, so hopefully he'll give a good response.

  12. #337
    Xtreme Addict
    Join Date
    Sep 2007
    Location
    Munich, DE
    Posts
    1,401
    Excerpt from BKDG For AMD Family 10h Processors Page 60.

    2.8 DRAM Controllers (DCTs)

    The DCTs support DDR2 DIMMs or DDR3 DIMMs. Products may be configurable between DDR2 and DDR3 operation.

    A DRAM channel is the group of the DRAM interface pins that connect to one series of DIMMs. The processor supports two DDR channels. The processor includes two DCTs. Each DCT controls one 64-bit DDR DIMM channel.
    For DDR products, DCT0 controls channel A DDR pins and DCT1 controls channel B DDR pins. However, the processor may be configured:
    (1) to behave as a single dual-channel DCT; this is called ganged mode; or
    (2) to behave as two single-channel DCTs; this is called unganged mode.
    A logical DIMM is either one 64-bit DIMM (as in unganged mode) or two identical DIMMs in parallel to create a 128-bit interface (as in ganged mode). See section 1.5.2 [Supported Feature Variations] on page 20 for information about supported package/DRAM configurations.

    For DDR products, when the DCTs are in ganged mode, as specified by [The DRAM Controller Select Low Register] F2x110[DctGangEn], then each logical DIMM is two channels wide. Each physical DIMM of a 2-channel logical DIMM is required to be the same size and use the same timing parameters. Both DCTs must be programmed with the same information (see section 2.8.1 [DCT Configuration Registers] on page 61). When the DCTs are in 64-bit mode, a logical DIMM is equivalent to a 64-bit physical DIMM and each channel is controlled by a different DCT.
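
    As a way of keeping the two modes straight, here is a tiny sketch of what the excerpt describes. The class and property names are my own, and the boolean only stands in for F2x110[DctGangEn]; this does not touch any real register.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DctConfig:
    ganged: bool  # stands in for F2x110[DctGangEn]

    @property
    def logical_dimm_width_bits(self) -> int:
        # Ganged: two identical DIMMs in parallel form one 128-bit logical DIMM.
        # Unganged: each logical DIMM is a single 64-bit physical DIMM.
        return 128 if self.ganged else 64

    @property
    def independent_channels(self) -> int:
        # Ganged: behaves as one dual-channel DCT.
        # Unganged: behaves as two single-channel DCTs.
        return 1 if self.ganged else 2

for cfg in (DctConfig(ganged=True), DctConfig(ganged=False)):
    print(cfg, cfg.logical_dimm_width_bits, cfg.independent_channels)
```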


  13. #338
    Registered User
    Join Date
    Feb 2007
    Posts
    6
    Quote Originally Posted by tictac View Post


    One question: I have two Barcelonas in my system. Is it correct that register C0010065 is for the second CPU?

    In my system the first CPU has a VCore of 1.15V and the second has 1.1V at default; the register shows the values 40 and 44 when read with rdmsr.

  14. #339
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Hi, guys! Is it now confirmed that the RAM clock is derived only from the external clock ("Ext.CLK", "System Clock", etc.), and not from the NB clock? (I think the IMC is part of the NB, so it would seem logical to me.)

    P.S. By the way, why do you keep writing "FSB" and "bus speed" when referring to the external clock? It is only a clock, not a bus, and especially not a Front Side Bus like Intel's. (The HT bus clock is derived from it by a multiplier, as we know.)

  15. #340
    Xtreme Addict
    Join Date
    Sep 2007
    Location
    Munich, DE
    Posts
    1,401
    Another Quote from AMD's k10 dev guide
    2.6 The Northbridge (NB)
    Each processor includes a single Northbridge that provides the interface to the local CPU core(s), the interface to system memory, the interface to other processors, and the interface to system IO devices. The NB includes all power planes except VDD; see section 2.4.1 [Processor Power Planes And Voltage Control] on page 25 for more information.
    The NB of each node is responsible for routing transactions sourced from CPU cores and links to the appropriate CPU core, cache, DRAM, or link. See section 2.9.3 [Access Type Determination] on page 107 for more information.

    2.6.1 Northbridge (NB) Architecture

    Major NB blocks are: System Request Interface (SRI), Memory Controller (MCT), DRAM Controllers (DCTs), L3 cache, and Cross Bar (XBAR). SRI interfaces with the CPU core(s). MCT maintains cache coherency and interfaces with the DCTs; MCT maintains a queue of incoming requests called MCQ. XBAR is a switch that routes packets between SRI, MCT, and the links.
    The MCT operates on physical addresses. Before passing transactions to the DCTs, the MCT converts physical addresses into normalized addresses that correspond to the values programmed into [The DRAM CS Base Address Registers] F2x[1, 0][5C:40]. Normalized addresses include only address bits within the DCTs’ range.
    The normalized address varies based on DCT interleave and hoisting settings in [The DRAM Controller Select Low Register] F2x110 and [The DRAM Controller Select High Register] F2x114 as well as node interleaving based on [The DRAM Base/Limit Registers] F1x[1, 0][7C:40].
    Code:
    Core 0 ----
               |
    Core 1 ----                 
               |---- SRI ---- XBAR ---- cHT
    Core 2 ----                 |
               |               MCT --- L3
    Core 3 ----                 |
                               ---
                              |   |
                            DCT0 DCT1
                              |   |
                          LDIMM0 LDIMM1
    So do SRI, XBAR, MCT, L3, DCT0 and DCT1 all run at NB speed?
    Or do the DCTs run at the RAM clock?

  16. #341
    Registered User
    Join Date
    Feb 2007
    Posts
    6
    Quote Originally Posted by indiana_74 View Post
    One question: I have two Barcelonas in my system. Is it correct that register C0010065 is for the second CPU?

    In my system the first CPU has a VCore of 1.15V and the second has 1.1V at default; the register shows the values 40 and 44 when read with rdmsr.
    I have to answer myself and admit that I was wrong.

    With the steps posted by tictac I only changed the vcore of the first core. To change the second, third, and so on up to the eighth core, I have to switch cores in CrystalCPUID and then reload with MSR Editor.

    I made a regdump with CPU-Z and I wonder about the MSR code of cores 1-3.

    Now I inserted the second CPU and made a new regdump.

    The second CPU runs at 1.10V and the MSR codes are:

    40 = 1.15V
    48 = 1.10V

    Now I changed every register to 38 = 1.20V and began a stress test at 2.0GHz with my Barcelona 2344.
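
    The three code/voltage pairs above fit a simple pattern, sketched below. This is an assumption fitted to those three data points only: I treat the posted hex value as a byte whose upper 7 bits hold the VID, with 12.5mV steps down from 1.55V. The function name is made up, and the exact field layout should be checked against the BKDG before relying on it.

```python
def vid_byte_to_volts(byte_value):
    """Decode a posted MSR byte into a core voltage (assumed encoding)."""
    vid = byte_value >> 1                  # assumed: VID sits in the upper 7 bits
    return round(1.55 - 0.0125 * vid, 4)   # assumed: 12.5 mV per VID step

for code in (0x38, 0x40, 0x48):
    print(f"{code:02X} = {vid_byte_to_volts(code):.2f}V")
```

    It reproduces 38 = 1.20V, 40 = 1.15V and 48 = 1.10V from the post.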
    Last edited by indiana_74; 11-09-2007 at 05:46 AM.

  17. #342
    Xtreme Enthusiast
    Join Date
    Feb 2007
    Posts
    508
    kyosen

    Could you please run the SiSoft Sandra 2007 benchmark on a 32-bit system to obtain results like these:
    http://www.expreview.com/img/news/07...6047070_rs.jpg

    I'd like to know whether the low memory score is genuinely strange or is due to Sandra.
    thanks

  18. #343
    Fused
    Join Date
    Dec 2003
    Location
    Malaysia
    Posts
    2,769
    Sorry, indiana, I can't get back to you properly; I am viewing this from Opera Mini on a mobile phone. Each processor has its own MSRs: 64 stands for P-state 0, 65 for P-state 1, and so on.

    Yeah, you did it right: change the CPU ID in CrystalCPUID, then open MSR Editor for each processor.

    P-states other than 0 are used for the power-saving CnQ implementation.
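
    As a small illustration of the numbering, assuming the P-state registers simply follow one another starting at C0010064 (the helper name is hypothetical, and how many P-states exist is not stated here, so no upper bound is enforced):

```python
PSTATE0_MSR = 0xC0010064  # "64" in the shorthand above, i.e. P-state 0

def pstate_msr(n):
    """Return the assumed MSR address for P-state n (0 = full speed)."""
    if n < 0:
        raise ValueError("P-state index must be non-negative")
    return PSTATE0_MSR + n

print(f"P-state 1 -> {pstate_msr(1):08X}")
```

    That is why C0010065 is P-state 1 of whichever core is currently selected, not a register belonging to the second CPU.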
    Last edited by tictac; 11-09-2007 at 07:39 AM.

  19. #344
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Posts
    504
    Maybe I just missed it, but how is the memory clock calculated? Is there a difference in the calculation between Barcelona and Phenom?
    IQ_NOT_LESS_OR_EQUAL

    outdated hardware

  20. #345
    Xtreme Enthusiast
    Join Date
    Feb 2007
    Posts
    508
    Quote Originally Posted by Sunfire View Post
    Maybe I just missed it, but how is the memory clock calculated?
    Is this what you are asking about?
    Quote Originally Posted by cpuz View Post
    Hi Kyosen,
    I got your PM, we'll work on that.
    Everest is assuming that memory clock is obtained from bus clock on K10, and CPU-Z computes it from CPU clock (as for K8).
    IMO Everest is correct, but Tamas & I have no confirmation at the moment. We're hoping a benchmark can tell us what the real clock is.

    Nice rig, I'm impatient to see results on the new BA stepping

  21. #346
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Posts
    504
    OK, the memory clock is obtained from the bus clock, but how?
    So how do you get 333MHz for DDR2-667 on Barcelona, and 400MHz for DDR2-800 or 533MHz for DDR2-1066 on Phenom?
    IQ_NOT_LESS_OR_EQUAL

    outdated hardware

  22. #347
    Fused
    Join Date
    Dec 2003
    Location
    Malaysia
    Posts
    2,769
    Quote Originally Posted by Sunfire View Post
    OK, the memory clock is obtained from the bus clock, but how?
    So how do you get 333MHz for DDR2-667 on Barcelona, and 400MHz for DDR2-800 or 533MHz for DDR2-1066 on Phenom?
    I came up with this, but there is no confirmation yet:

    Memory Clock Speed
    Code:
    NB Speed   HTT      DDR400   DDR533   DDR667   DDR800   DDR1066

    1600MHz    200MHz   200MHz   266MHz   320MHz   400MHz   533MHz
    1800MHz    200MHz   200MHz   257MHz   300MHz   360MHz   450MHz
    Memory clock = NB Speed / memory divider

    Memory divider
    DDR 400 = NB Multiplier
    DDR 466 = NB Multiplier - 1
    DDR 533 = NB Multiplier - 2
    DDR 667 = NB Multiplier - 3
    DDR 800 = NB Multiplier - 4
    DDR 1066 = NB Multiplier - 5
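
    Here is a minimal sketch of the rule so the table can be regenerated for other NB speeds. Keep in mind it encodes the unconfirmed theory above, nothing official. The offsets for DDR800 and DDR1066 (4 and 5) are inferred from the table's 400MHz and 533MHz columns at 1600MHz, DDR466 is left out because the table has no column for it, and results are truncated to whole MHz the way the table shows them.

```python
# Amount subtracted from the NB multiplier to get the memory divider.
DIVIDER_OFFSET = {"DDR400": 0, "DDR533": 2, "DDR667": 3, "DDR800": 4, "DDR1066": 5}

def memory_clock_mhz(nb_mhz, htt_mhz, setting):
    """Memory clock = NB speed / (NB multiplier - offset), in whole MHz."""
    nb_multiplier = nb_mhz // htt_mhz
    divider = nb_multiplier - DIVIDER_OFFSET[setting]
    return nb_mhz // divider

for nb in (1600, 1800):
    row = [memory_clock_mhz(nb, 200, s) for s in DIVIDER_OFFSET]
    print(nb, row)
```

    Both rows of the table come out of this, including the 360MHz and 450MHz figures at an 1800MHz NB.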

    indiana_74... few more tweak to set your HT link speed and NB Speed
    Link : http://www.xtremesystems.org/forums/...=164768&page=2
    Last edited by tictac; 11-09-2007 at 09:12 AM.

  23. #348
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Posts
    504
    Quote Originally Posted by tictac View Post
    I came up with this, but there is no confirmation yet:

    Memory Clock Speed
    Code:
    NB Speed   HTT      DDR400   DDR533   DDR667   DDR800   DDR1066

    1600MHz    200MHz   200MHz   266MHz   320MHz   400MHz   533MHz
    1800MHz    200MHz   200MHz   257MHz   300MHz   360MHz   450MHz
    Memory clock = NB Speed / memory divider

    Memory divider
    DDR 400 = NB Multiplier
    DDR 466 = NB Multiplier - 1
    DDR 533 = NB Multiplier - 2
    DDR 667 = NB Multiplier - 3
    DDR 800 = NB Multiplier - 4
    DDR 1066 = NB Multiplier - 5

    indiana_74... few more tweak to set your HT link speed and NB Speed
    Link : http://www.xtremesystems.org/forums/...=164768&page=2
    This is interesting. Thanks for the info!
    The cores and the NB are separate from each other, so their voltages can be adjusted separately too. With a locked Phenom, where you can't raise the multiplier, only the HT freq, does the NB freq go up with it? Or can it be controlled separately too? And what about the L3 and RAM freq?
    IQ_NOT_LESS_OR_EQUAL

    outdated hardware

  24. #349
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    103
    Quote Originally Posted by Sunfire View Post
    With a locked Phenom, where you can't raise the multiplier, only the HT freq, does the NB freq go up with it? Or can it be controlled separately too? And what about the L3 and RAM freq?
    It would go up with it, but you can probably lower the NB's own multiplier. (Hopefully AMD won't lock it downwards. Edit: that would be a stupid idea.)
    L3 freq is the same as NB's, isn't it?

    BTW, HT freq is already a derived clock, using the HT multiplier - you mean the external clock/system clock. (HT clock = ext. clock * HT multiplier.)
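
    To make the relationship concrete, here is a rough sketch assuming both the NB clock and the HT clock are the external clock times their own multiplier. The NB multiplier of 9 matches the 1800MHz NB at a 200MHz external clock mentioned earlier in the thread; the HT multiplier of 5 and the 240MHz overclock are only examples, and the function name is made up.

```python
def derived_clocks(ext_mhz, nb_mult, ht_mult):
    """Clocks derived from the external (system) clock via multipliers."""
    return {"NB": ext_mhz * nb_mult, "HT": ext_mhz * ht_mult}

stock = derived_clocks(200, nb_mult=9, ht_mult=5)
raised = derived_clocks(240, nb_mult=9, ht_mult=5)       # NB and HT rise too
compensated = derived_clocks(240, nb_mult=8, ht_mult=4)  # multipliers lowered
print(stock, raised, compensated)
```

    Raising only the external clock drags the NB and HT along, and dropping their multipliers pulls them back toward stock, provided AMD leaves those multipliers unlocked downwards.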
    Last edited by dess; 11-09-2007 at 01:19 PM. Reason: spelling

  25. #350
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by dess View Post
    It would go up with it, but you can probably lower the NB's own multiplier. (Hopefully AMD won't lock it downwards. Edit: that would be a stupid idea.)
    L3 freq is the same as NB's, isn't it?

    BTW, HT freq is already a derived clock, using the HT multiplier - you mean the external clock/system clock. (HT clock = ext. clock * HT multiplier.)
    "L3 freq is the same as NB's, isn't it?" -> Yes, and overclocking the whole NB would easily bring down both L3 and memory latency. Right now the NB runs at a rather low 1.8GHz. With the NB clock up at 2.5GHz, we could see a nice boost in some cache-sensitive apps. Also, every single Phenom we have seen had its RAM clocked low with poor timings. I can't believe that people who could get their hands on such a scarce part at this moment couldn't get a decent low-latency DDR2 kit.
