Page 17 of 29
Results 401 to 425 of 713

Thread: K10 Scores starting to surface

  1. #401
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Lightman View Post
    This is not entirely true...

    I read somewhere a long time ago that the L3 in K10 acts more like a memory layer. In other words, it is clocked by the IMC independently of all 4 cores, and on a diagram I would put it after the crossbar...
    That's why L3 latency can vary from the core's point of view (the cache latency itself is probably constant). It is similar to how DDR2-800 latency (again from the CPU's point of view) differs from DDR2-667 latency (same timings, of course).


    Edit: JumpingJack, you are typing too fast. I had barely read page 16 and typed my response and, surprise, here is another page with new info making my post partially obsolete.
    I am not sure I understand whether you understand what I am trying to say ...

    A shared resource clocked at one speed, serving 4 other resources clocked at different speeds, necessitates asynchronous communication... there is no other way... thus AMD must provide functionality to account for floating clocks between 4 cores and one memory pool, the L3.... just adding circuits to do this work will incur latency...

    Add on top of that: 1:1 divider latency < 3:2 divider latency < 2:1 divider latency... hence the 'observed' latency from any core is variable...... at least if you read Kanter's article, this is what the FIFO buffers do... he did not mention the x-bar.
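To make the "observed latency is variable" point concrete, here is a minimal sketch. It assumes the L3 access takes a fixed time in its own clock domain plus a fixed synchronizer/FIFO penalty, and simply re-expresses that time in core cycles. The function name and every number (clocks, cycle counts) are hypothetical illustrations, not K10 measurements.

```python
def observed_latency_core_cycles(core_mhz, l3_mhz, l3_cycles, sync_cycles):
    """Return one L3 access latency expressed in core clock cycles.

    l3_cycles   -- cycles the L3 needs in its own clock domain (assumed)
    sync_cycles -- extra L3-domain cycles spent crossing clock domains (assumed)
    """
    latency_ns = (l3_cycles + sync_cycles) * 1000.0 / l3_mhz
    return latency_ns * core_mhz / 1000.0


if __name__ == "__main__":
    # Hypothetical L3 domain at 2000 MHz, 20 cycles of array access, 4 of sync.
    for core_mhz in (2000, 3000, 4000):  # 1:1, 3:2 and 2:1 core:L3 ratios
        cycles = observed_latency_core_cycles(core_mhz, 2000, 20, 4)
        print(f"core {core_mhz} MHz: {cycles:.1f} core cycles")
```

The latency in nanoseconds is the same in all three cases; only the count in core cycles grows with the clock ratio, which is one way to read the 1:1 < 3:2 < 2:1 ordering.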

    There is research ongoing to work on achieving both low BW and low latency asynchronous networking, but there has always been this fundamental trade-off:
    Previously published NoCs which provide GS are ÆTHEREAL [18][9] and NOSTRUM [14]. Both are synchronous and employ variants of time division multiplexing (TDM) for providing per connection bandwidth (BW) guarantees. TDM has the drawback of the connection latency being inversely proportional to the BW, thus connections with low BW and low latency requirements, e.g. interrupts, are not supported.
    http://www.ee.technion.ac.il/courses...OC-async05.pdf

    Not quite the paper I would have chosen, but it is a recent one I could find that summarizes the issue at hand, so I can quote a source rather than have you take my word for it .... i.e. connection latency is hard to get very low in networks where a global clock is not available.... here he discusses time division multiplexing, a type of clock dividing.
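The TDM trade-off in the quoted passage can be sketched with a toy model. It assumes a frame of equal time slots and a connection whose slots are evenly spaced within the frame; the function name and all numbers are hypothetical and are not taken from the paper.

```python
import math


def tdm_connection(slots_per_frame, slots_owned, slot_ns, link_gbps):
    """Return (guaranteed bandwidth in Gbps, worst-case wait in ns).

    Assumes the connection's slots are spread evenly over the frame, so the
    longest gap between two of its slots is ceil(frame / owned) slot times.
    """
    bandwidth_gbps = link_gbps * slots_owned / slots_per_frame
    worst_wait_ns = math.ceil(slots_per_frame / slots_owned) * slot_ns
    return bandwidth_gbps, worst_wait_ns


if __name__ == "__main__":
    # Hypothetical 16 Gbps link, 32 slots per frame, 1 ns per slot.
    for owned in (16, 4, 1):
        bw, wait = tdm_connection(32, owned, 1.0, 16.0)
        print(f"{owned:2d} slots: {bw:4.1f} Gbps guaranteed, up to {wait:4.1f} ns wait")
```

A connection that reserves little bandwidth (one slot, as an interrupt might) also gets the longest wait, which is the "low BW and low latency not supported" drawback.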

    Edit: Found another paper which is much more detailed, and has some info on the FIFO implementation over a global clock:
    Simulation results for the FIFO and the two versions of the adder are given in Table 1. The
    optimized adder has 2-input C-elements while the other adder is using 4-input C-elements.
    The operations/second indicate the number of logic evaluations done per second in each
    basic cell. Cycle time is the fastest time at which the pipeline can send out successive data
    values. Latency is the time it takes for data to go from the input of the circuit until it is
    finally ready at the output. Pipelined systems work on the principle of reducing the cycle
    time at the cost of increased latency. The next section examines how an enhancement to
    the system can reduce the latency even further.
    http://www.collectionscanada.ca/obj/...11/MQ34126.pdf
    (see page 73). This is an old paper, but he is showing 18 ns latency for a straight up FIFO buffer. This is a large number, and not to be considered true or accurate wrt K10.
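The cycle-time versus latency principle in the quote can be illustrated with a simple model. It assumes a fixed amount of logic delay split evenly across stages, with each stage adding a fixed overhead (latch or handshake). The function name and the numbers are hypothetical; they are unrelated to the 18 ns figure in the paper and to K10.

```python
def pipeline_timing(logic_ns, stages, stage_overhead_ns):
    """Return (cycle time, end-to-end latency) in ns for an evenly split pipeline."""
    cycle_ns = logic_ns / stages + stage_overhead_ns
    latency_ns = cycle_ns * stages
    return cycle_ns, latency_ns


if __name__ == "__main__":
    # Hypothetical 12 ns of logic and 0.5 ns of overhead per stage.
    for stages in (1, 4, 8):
        cycle, latency = pipeline_timing(12.0, stages, 0.5)
        print(f"{stages} stage(s): cycle {cycle:.1f} ns, latency {latency:.1f} ns")
```

More stages shorten the cycle time (higher throughput), but each stage's overhead is paid once per stage, so total latency rises.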

    Jack
    Last edited by JumpingJack; 09-01-2007 at 03:40 PM.

  2. #402
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by JumpingJack View Post
    I am not sure I understand whether you understand what I am trying to say ...

    A shared resource clocked at one speed, serving 4 other resources clocked at different speeds, necessitates asynchronous communication... there is no other way... thus AMD must provide functionality to account for floating clocks between 4 cores and one memory pool, the L3.... just adding circuits to do this work will incur latency...

    Add on top of that: 1:1 divider latency < 3:2 divider latency < 2:1 divider latency... hence the 'observed' latency from any core is variable...... at least if you read Kanter's article, this is what the FIFO buffers do... he did not mention the x-bar.

    Jack
    I understand what you're trying to say; that's why I added the edit.
    As I said, I read a long time ago (probably on RWT, but not coming from DK) that the L3 will operate in a similar way to normal memory and that it will be possible to clock it independently from the cores.
    If I'm following your reasoning correctly, you're saying that the L3 will be clocked from the highest-frequency core in the CPU (2GHz K10-->2GHz L3), which in my opinion is not the case.

    Of course asynchronous clocking will add latency, but it might be a good trade-off compared to the gains in power/flexibility. (Besides, look at the L3 latency numbers: they are high for a CPU cache, so clearly we have lots of logic circuitry in between.)

    Well, in the end we will find out shortly

    Edit: I'm just wondering why AMD would release different Phenom models with differently clocked HTT buses (per the official roadmaps)?? The answer could be that, together with the increased HTT speed, the L3 cache (and IMC) is also clocked higher, and that gives some tangible performance improvement.
    Last edited by Lightman; 09-01-2007 at 03:43 PM.
    RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SDD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W

    RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingstone HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU

    SmartPhone Samsung Galaxy S7 EDGE
    XBONE paired with 55'' Samsung LED 3D TV

  3. #403
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Lightman View Post
    I understand what you're trying to say; that's why I added the edit.
    As I said, I read a long time ago (probably on RWT, but not coming from DK) that the L3 will operate in a similar way to normal memory and that it will be possible to clock it independently from the cores.
    If I'm following your reasoning correctly, you're saying that the L3 will be clocked from the highest-frequency core in the CPU (2GHz K10-->2GHz L3), which in my opinion is not the case.

    Of course asynchronous clocking will add latency, but it might be a good trade-off compared to the gains in power/flexibility. (Besides, look at the L3 latency numbers: they are high for a CPU cache, so clearly we have lots of logic circuitry in between.)

    Well, in the end we will find out shortly

    Edit: I'm just wondering why AMD would release different Phenom models with differently clocked HTT buses (per the official roadmaps)?? The answer could be that, together with the increased HTT speed, the L3 cache (and IMC) is also clocked higher, and that gives some tangible performance improvement.
    I don't know how the L3 will be clocked; it will, however, need one clock, and as Informal and others push the detail envelope, I am beginning to understand some of the L3 details that I had not really considered.

    Your edit could be correct too....

    AMD has had quite a bit of experience getting the best clock/latency performance out of differently clocked agents; the IMC is a good example, as are the HT links, all of which run on clocks different from the core's but put data into the core....

    It is interesting but ultimately irrelevant; performance will be whatever it is overall.... and we are hoping it is better than the showing that started this thread.

  4. #404
    Xtreme Cruncher
    Join Date
    May 2007
    Posts
    570
    Here are my results from my Opteron 2218. If you guys want me to run any tests on my quad to compare with the Phenom, let me know.


  5. #405
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by leoftw View Post
    Here are my results from my Opteron 2218. If you guys want me to run any tests on my quad to compare with the Phenom, let me know.
    Do you have a way to run 64-bit?

  6. #406
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Lightman View Post
    I understand what you're trying to say; that's why I added the edit.
    As I said, I read a long time ago (probably on RWT, but not coming from DK) that the L3 will operate in a similar way to normal memory and that it will be possible to clock it independently from the cores.
    If I'm following your reasoning correctly, you're saying that the L3 will be clocked from the highest-frequency core in the CPU (2GHz K10-->2GHz L3), which in my opinion is not the case.

    Of course asynchronous clocking will add latency, but it might be a good trade-off compared to the gains in power/flexibility. (Besides, look at the L3 latency numbers: they are high for a CPU cache, so clearly we have lots of logic circuitry in between.)

    Well, in the end we will find out shortly

    Edit: I'm just wondering why AMD would release different Phenom models with differently clocked HTT buses (per the official roadmaps)?? The answer could be that, together with the increased HTT speed, the L3 cache (and IMC) is also clocked higher, and that gives some tangible performance improvement.

    No problem ... when I get into detailed discussions like this, I tend to be verbose ... this being a public forum, a number of people read what we write, so I post a lot of references and quotes... don't take that as an affront to your knowledge base .... what I try to do is provide ample detail so others, who may not completely follow, gain some level of understanding... (it also helps me learn more as I go along)

    Jack

  7. #407
    Xtreme Cruncher
    Join Date
    May 2007
    Posts
    570
    Quote Originally Posted by JumpingJack View Post
    Do you have a way to run 64-bit?
    I'd have to install 64-bit for that, sorry. I'm having enough problems with these damn 32-bit Vista drivers :P

  8. #408
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by leoftw View Post
    I'd have to install 64-bit for that, sorry. I'm having enough problems with these damn 32-bit Vista drivers :P
    No biggy....

    From 32-bit to 64-bit comparisons, for version 9.5 anyway, the K8 arch can gain 10-15% in Cinebench by my recollection, so comparing 64-bit to 64-bit gives a better feel for K10 vs. K8 ....

  9. #409
    Xtreme Cruncher
    Join Date
    May 2007
    Posts
    570
    sorry about that guys

  10. #410
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by bobjr View Post
    <censored>

    This is a good way to earn a ban. he... you edited it

  11. #411
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by JumpingJack View Post
    Quote Originally Posted by bobjr View Post
    <censored>

    This is a good way to earn a ban. he... you edited it
    Yea, it is, and I'm probably one of the easier going Mods here.
    I think he and I will have a little talk..
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  12. #412
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by JumpingJack View Post
    No problem ... when I get into detailed discussions like this, I tend to be verbose ... this being a public forum, a number of people read what we write, so I post a lot of references and quotes... don't take that as an affront to your knowledge base .... what I try to do is provide ample detail so others, who may not completely follow, gain some level of understanding... (it also helps me learn more as I go along)

    Jack


    @leoftw
    How is your system configured memory-wise? Do you have DIMMs populated for both CPU sockets?

    Can you run SuperPi? It is not x64 optimized, so scores will be very comparable with your system. Same goes for the CPU-Z cache latency test.
    Thanks for your effort!

    EDIT: I just noticed an over 4x speedup in the multi-CPU test! Why is that?? Did you run the 1-CPU test at lower clocks???
    Last edited by Lightman; 09-02-2007 at 01:01 AM.

  13. #413
    Xtreme Addict
    Join Date
    Jan 2007
    Location
    Brisbane, Australia
    Posts
    1,264
    Excellent stuff leoftw. That's exactly what we need for a compo.

    Any chance you could run the other single-thread benchmarks we saw??

    Super pi 1M
    CPUmark99

  14. #414
    Xtreme Enthusiast
    Join Date
    Apr 2006
    Location
    Brasil
    Posts
    534
    Quote Originally Posted by Lightman View Post
    EDIT: I just noticed an over 4x speedup in the multi-CPU test! Why is that?? Did you run the 1-CPU test at lower clocks???
    Probably because 1 core at full load is a small total CPU load (25%), so Cool 'n Quiet keeps all cores at lower clocks...

    leoftw, try turning Cool 'n Quiet off.
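A rough sketch of why a throttled single-thread run would inflate the multi-CPU speedup. It assumes the benchmark score scales linearly with clock and perfectly with core count; the function name and the clock values are hypothetical, not leoftw's actual settings.

```python
def apparent_speedup(cores, full_mhz, single_run_mhz):
    """Multi-CPU score divided by 1-CPU score under ideal scaling.

    full_mhz       -- clock during the multi-CPU run (all cores loaded)
    single_run_mhz -- clock during the 1-CPU run (possibly throttled)
    """
    return cores * full_mhz / single_run_mhz


if __name__ == "__main__":
    # Hypothetical: 4 cores at 2600 MHz, single-thread run held at 2000 MHz.
    print(apparent_speedup(4, 2600, 2600))  # no throttling
    print(apparent_speedup(4, 2600, 2000))  # single-thread run throttled
```

With equal clocks the ideal speedup is capped at the core count, so anything above 4x on four cores points to the 1-CPU run having been slower than it should be.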

  15. #415
    Xtreme Enthusiast
    Join Date
    Aug 2003
    Posts
    567
    informal
    Go laugh on the floor at yourself.
    http://www.techarp.com/showarticle.a...tno=424&pgno=2
    This cache is 32-way set associative and is based on a non-inclusive victim cache architecture.

    This is from BIOS and Kernel Developer's Guide for AMD Family 10h Processors documentation.
    Last edited by VVJ; 09-02-2007 at 08:19 AM.

  16. #416
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by VVJ View Post
    informal


    This is from BIOS and Kernel Developer's Guide for AMD Family 10h Processors documentation.
    This is not surprising.... the associativity of a cache is the number of cache lines (or memory blocks) in each set. The associativity will increase with the size of the cache if the number of sets is held fixed, and since AMD will ultimately raise or lower the L3 cache size, the set associativity must change.

    For example, AMD has a 2 MB, 32-way associative L3; a 4 MB one would be 64-way set associative, and a 6 MB one would be 96-way associative. Since they allow an associativity of 16 in their BIOS guide, it appears AMD may at some point be willing to release a chip with a 1 MB L3 cache (perhaps; just because it is there does not mean there are plans).

    Intel's Wolfdale will be 24-way associative for 6 MB but 12-way associative for 3 MB. Intel has not changed their caching for Wolfdale over Conroe other than raw size, because their 2 MB Allendale is 8-way associative, while the 4 MB Conroe is 16-way associative.
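The arithmetic behind these numbers is size = sets x ways x line size. A small sketch, assuming 64-byte cache lines and assuming (as the post does) that the set count stays fixed when the cache is scaled; the helper names are made up for illustration and this is not a statement of how AMD or Intel actually scale their caches.

```python
MB = 1024 * 1024


def num_sets(size_bytes, ways, line_bytes=64):
    """Number of sets for a given size and associativity (64 B lines assumed)."""
    return size_bytes // (ways * line_bytes)


def ways_for(size_bytes, sets, line_bytes=64):
    """Associativity needed to reach a given size with a fixed set count."""
    return size_bytes // (sets * line_bytes)


if __name__ == "__main__":
    l3_sets = num_sets(2 * MB, 32)  # the 2 MB, 32-way L3 from the BKDG quote
    print("L3 sets:", l3_sets)
    for size_mb in (1, 2, 4, 6):
        print(f"{size_mb} MB -> {ways_for(size_mb * MB, l3_sets)}-way")
    # The Intel L2 figures in the post all give the same set count:
    for size_mb, ways in ((2, 8), (3, 12), (4, 16), (6, 24)):
        print(f"{size_mb} MB, {ways}-way -> {num_sets(size_mb * MB, ways)} sets")
```

Under these assumptions the 16-way setting corresponds to 1 MB, and the four Intel configurations share one set count, which is consistent with the post's reading that only size and ways change.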

    Jack
    Last edited by JumpingJack; 09-02-2007 at 08:34 AM.

  17. #417
    Xtreme Member
    Join Date
    Jun 2005
    Location
    Bulgaria, Varna
    Posts
    447
    I wonder, will the L3 cache in K10 act also as a snoop filter in multiprocessor systems?

  18. #418
    Xtreme Enthusiast
    Join Date
    Aug 2003
    Posts
    567
    JumpingJack, the shared L3 cache is a configurable part of the Northbridge. The Northbridge may also not include an L3 cache at all.

  19. #419
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by VVJ View Post
    informal


    This is from BIOS and Kernel Developer's Guide for AMD Family 10h Processors documentation.
    K10 design can support up to 8MB L3 cache then...
    Interesting !

  20. #420
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by VVJ View Post
    JumpingJack, the shared L3 cache is a configurable part of the Northbridge. The Northbridge may also not include an L3 cache at all.
    Hmmmmm... this is the first time I have heard of cache associated with the northbridge....

  21. #421
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Quote Originally Posted by JumpingJack View Post
    Hmmmmm... this is the first time I have heard of cache associated with the northbridge....
    Socket 7 and earlier motherboards had L2 caches (or L3 if you had a K6-III) attached to the northbridge, and these caches were clocked by the FSB. That was not that long ago ....

  22. #422
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Lightman View Post
    Socket 7 and earlier motherboards had L2 caches (or L3 if you had a K6-III) attached to the northbridge, and these caches were clocked by the FSB. That was not that long ago ....
    You may be thinking of the time when L2 was off die, in which case it was loaded off the northbridge... through the backside bus then later on the frontside bus. In fact, the term northbridge is historic, in that in diagrams, this chip (with the memory controller) and the L2 were north of the CPU...

    Later, the northbridge moved down below the CPU between the CPU and southbridge, but the L2 was still distinct.

    Here is an example, but the diagram is after the north


    Here is another example:


    Of course, I could be wrong -- I am going from memory mostly. Hang tight, I will go look up the 'evolution of the northbridge' paper that shows the history, if I can find it.

    EDIT: I knew there was a configuration for off-die L2 cache through the backside bus.... still cannot find the paper, but found the block diagram:
    Last edited by JumpingJack; 09-02-2007 at 01:26 PM.

  23. #423
    Xtreme Mentor
    Join Date
    Nov 2005
    Location
    Devon
    Posts
    3,437
    Yes Jack, on K7 the L2 was driven from the BackSide Bus (cache was 2/3 or 3/5 of the core clock and packed on the Slot A module), but earlier than that I'm not sure TBH!

    It was my personal experience which led me to believe that the L2 cache on motherboards was connected to the Northbridge, because CPU performance varied considerably depending on the motherboard you used... (I'm speaking solely about cache-intensive tests).
    Besides, if the L2 cache on older motherboards was driven by a CPU backside bus, then how on earth could a very old P60 know about the 2MB cache on my Epox board??

    Edit: I found something!

    M1541 includes the higher CPU bus frequency (up to 100 MHz) interface for all Socket-7 compatible processors, PBSRAM and Memory Cache L2 controller to reduce cost and enhance performance, high performance FPM/EDO/SDRAM DRAM controller, PCI 2.1 compliant bus interface, smart deep buffer design for CPU-to-DRAM, CPU-to-PCI, and PCI-to-DRAM to achieve the best system performance. It also has the highly efficient PCI fair arbiter. M1541 also provides the most flexible 64-bit memory bus interface for the best DRAM upgrade-ability and ECC/Parity design to enhance the system reliability.
    Link
    Last edited by Lightman; 09-03-2007 at 12:12 AM.

  24. #424
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by Lightman View Post
    Yes Jack, on K7 the L2 was driven from the BackSide Bus (cache was 2/3 or 3/5 of the core clock and packed on the Slot A module), but earlier than that I'm not sure TBH!

    It was my personal experience which led me to believe that the L2 cache on motherboards was connected to the Northbridge, because CPU performance varied considerably depending on the motherboard you used... (I'm speaking solely about cache-intensive tests).
    Besides, if the L2 cache on older motherboards was driven by a CPU backside bus, then how on earth could a very old P60 know about the 2MB cache on my Epox board??

    Edit: I found something!



    Link
    Nice.

  25. #425
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    a nice picture reference for mobos http://redhill.net.au/b/b-92.html
    but it ends at 2002
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was
