Page 58 of 181
Results 1,426 to 1,450 of 4519

Thread: AMD Zambezi news, info, fans !

  1. #1426
    Xtreme Member
    Join Date
    Apr 2005
    Location
    London, UK
    Posts
    261
    Quote Originally Posted by Lightman View Post
    That's the famous banned on XS person we shouldn't even link to his blog here. [OBR if you're confused]
    There are more pics but don't bother looking, because they might be fake.
    Man, why do people even bother posting anything from him? He admitted to faking BD tests a while ago.
    And to answer freeloader's question: the FPS are so low because they come from OBR's imagination.
    It is hard enough to imagine that a Phenom II X6 would be half as fast as a 6-core i7.

  2. #1427
    Xtreme Member
    Join Date
    May 2007
    Location
    Sweden
    Posts
    127
    OBR is a piece of [...]. 100% fake, with too much spare time for making all those BS images... He does NOT have a Bulldozer sample and never has. Wait for the CPU and don't waste so much energy on speculating.

    Best regards from Sweden!
    Ivy Bridge 3770K @ ????MHz
    6c Intel Xeon X7460 24MB cache 16GB RAM 22TB HDD fileserver
    Dual Intel Xeon E5620 workstation
    SB 2600K @ 5016MHz 1.37v HT on AIR primestable
    AMD Athlon X3 425 @ B25 4GHz+ AIR
    AMD Athlon X2 6400+ @ 3811MHz AIR
    AMD Athlon X2 3600+ @ 3200MHz AIR
    AMD Athlon XP 1700+ @ 2714MHz AIR
    Thermalright Ultra-120 Extreme
    Corsair 8GB XMS3 2000MHz
    ATI Radeon HD5850 @ 1000MHz+/1200MHz+
    Windows 7 Enterprise x64
    Corsair HX750W

  3. #1428
    Xtreme Addict
    Join Date
    Dec 2008
    Location
    Sweden, Linköping
    Posts
    2,034
    Giving OBR attention is like feeding a troll; they will just continue. Every time someone talks about OBR, Bulldozer IPC goes down, so please stop it.
    SweClockers.com

    CPU: Phenom II X4 955BE
    Clock: 4200MHz 1.4375v
    Memory: Dominator GT 2x2GB 1600MHz 6-6-6-20 1.65v
    Motherboard: ASUS Crosshair IV Formula
    GPU: HD 5770

  4. #1429
    Xtreme Member
    Join Date
    Jan 2011
    Location
    145.21.4.???
    Posts
    319
    Man, every post that mentioned THAT IDIOT has been deleted by the admins of this forum. Don't waste your mind and resources discussing anything about him.



    EDIT:

    It seems some guys emulated an FX-6110 using an ES FX-8130P and compared it to the 2600K and 1100T. I don't have time to go through every result, and I don't know whether it reflects any problem.

    http://www.f-paper.com/?i708835-Phot...lation-testing
    Last edited by undone; 08-08-2011 at 04:42 AM.

  5. #1430
    Xtreme Mentor
    Join Date
    Feb 2009
    Location
    Bangkok,Thailand (DamHot)
    Posts
    2,693
    september / october
    Intel Core i5 6600K + ASRock Z170 OC Formula + Galax HOF 4000 (8GBx2) + Antec 1200W OC Version
    EK SupremeHF + BlackIce GTX360 + Swiftech 655 + XSPC ResTop
    Macbook Pro 15" Late 2011 (i7 2760QM + HD 6770M)
    Samsung Galaxy Note 10.1 (2014) , Huawei Nexus 6P
    [history system]80286 80386 80486 Cyrix K5 Pentium133 Pentium II Duron1G Athlon1G E2180 E3300 E5300 E7200 E8200 E8400 E8500 E8600 Q9550 QX6800 X3-720BE i7-920 i3-530 i5-750 Semp140@x2 955BE X4-B55 Q6600 i5-2500K i7-2600K X4-B60 X6-1055T FX-8120 i7-4790K

  6. #1431
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    I've read the article. It doesn't look so good for Bulldozer single-thread performance. I hope for AMD's sake that they've concentrated more on IPC than on simply adding more cores. I guess we will know in about five to six weeks.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  7. #1432
    Xtreme Member
    Join Date
    Apr 2005
    Location
    London, UK
    Posts
    261
    Quote Originally Posted by undone View Post
    Seems some guys emulate FX-6110 by using ES FX-8130p and compared to 2600k and 1100T. I don't have time to observe every result, and don't know whether it could reflect any problem.

    http://www.f-paper.com/?i708835-Phot...lation-testing
    Did I understand it correctly? They took a Phenom II X6, overclocked it, gave it 1866MHz RAM and called it an FX-6110? Man, that's even worse than OBR.
    It's like simulating a Core 2 Duo using a Pentium 4 :/

  8. #1433
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    A BD module may have 20% better IPC than a K10.5 core because of better memory reordering, a faster cache hierarchy, a wider front end, beefier branch prediction, a bigger L2 cache, and a faster memory controller and L3 cache. Also, a BD module can execute 2 ALU, 2 AGU, 2 intSSE and 2 fpSSE operations per cycle per thread. The BD module is a 4-issue design versus the 3-issue K10.5. The L1D cache is write-through and smaller than K10's, but the WT performance penalty is compensated by the WCC (Write Coalescing Cache). L1D is also 4 times smaller than in K10.5, but it is 4-way associative; a 16K 4-way WT cache may have a 92% hit rate, while a 64K 2-way L1D has a slightly better hit rate, about 94%. That isn't a problem, because the branch predictor is better and L2 is bigger with a better hit rate. The 4-cycle load-to-use latency is hidden by the 2-4 stage longer pipeline.
    The BD pipeline is optimised for 15-20% higher frequency than the K10 pipeline at the same process node. Because of that, with Turbo Core 2, BD single-thread performance may be much better than K10.5 and roughly on par with Sandy Bridge.
    Multithreaded performance isn't that much better. It may be 35-45% better than the six-core Thuban, and maybe a little bit lower than the 6-core SB-E and Westmere.
    FX8 has 8 small cores, but SB-E has six fat, hyperthreaded cores with at least 30-35% better IPC. However, FX8 with 4 modules may have low TDP and high frequency, much higher than the six-core i7.
    My assumption is that the 4-module BD has 10% better multithreaded performance than a 4-core SB at the same thermals, if we account for Amdahl's law, Turbo Core 2, memory bandwidth and latency, and shared module resources. Single-thread performance may be on par with SB within the same thermal envelope.
    In comparison with Westmere, the 4-module BD has 10-15% lower multithreaded performance. Maybe the 10-core Komodo will outpace the six-core Westmere and be on par with SB-E.
    Here is my little study on predicting BD performance.

    Quote Originally Posted by muziqaz View Post
    Did I understand it correctly? They took Phenom II x6 overclocked it, gave it 1866Mhz ram and called it a FX-6110? Man that's even worse than OBR
    It's like simulating core 2 duo using pentium 4 :/
    LOL!

    My predictions are based on math. Lightly multithreaded software has lower parallelisation, around 0.7, and heavily multithreaded code has 0.95 parallelisation. FP-intensive code also contains a lot of integer code. Cinebench, for example, has 0.6 IPC for integer and 0.7 IPC for FP on a K10 core. The FPU is underutilised, because the max for FP on a K10 core is 2 packed FP-SSE operations, or 2x integer SSE and shuffle. With a BD module the max is 2 packed FP SSE or FMA + 2 integer SSE or FMA or 1 int SSE + 1 shuffle.
    As is easily seen, the FP unit is underutilised with IPC = 0.7. With 2 int threads, FP IPC for both threads combined can go up to 2, but hardly more.
    Per-thread FP IPC can be 0.8, and int IPC can be 0.7. That is 15-20% better per-core IPC than K10. With 33% more cores, 15% better average IPC and 15% higher frequency, overall multithreaded performance in FP-intensive applications can be up to 55-60% better than Thuban. Even if IPC per core is the same or a little lower because of shared resources, with higher frequency and more cores performance in such FP-intensive applications can rise by up to 40%, which is good.
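The two parallelisation figures quoted in this post (around 0.7 for light and 0.95 for heavy multithreading) can be plugged into Amdahl's law to see how much six to eight cores buys on its own. This is only a rough sketch: the function names are made up for illustration, and it assumes identical cores at identical clocks, which ignores shared module resources and turbo.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gain_from_extra_cores(parallel_fraction: float,
                          old_cores: int = 6,
                          new_cores: int = 8) -> float:
    """Relative gain from going old_cores -> new_cores at equal IPC and clock."""
    return (amdahl_speedup(parallel_fraction, new_cores)
            / amdahl_speedup(parallel_fraction, old_cores))


for p in (0.70, 0.95):
    # 0.70 and 0.95 are the parallelisation values quoted in the post
    print(f"parallelisation {p:.2f}: 6 -> 8 cores gives x{gain_from_extra_cores(p):.3f}")
# parallelisation 0.70: 6 -> 8 cores gives x1.075
# parallelisation 0.95: 6 -> 8 cores gives x1.235
```

Under these assumptions, 33% more cores is worth roughly 7.5% in lightly threaded code and roughly 23.5% in heavily threaded code, before any IPC or frequency gain is applied.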
    Last edited by drfedja; 08-08-2011 at 06:29 AM.
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE

  9. #1434
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    10% better in multithreading than SB 4c/8t sounds too low.
    you state that single thread perf will be much higher than thuban, but keep in mind that if it's 10% faster clocks, 10% higher IPC (after the loss due to 2 cores in one module), and 33% more cores than thuban, it should be 60% faster than thuban in multithreading (think cb11.5)

    if you believe that 1 core vs 1 core of BD vs SB will be pretty close to each other, then why do you think 8 cores will struggle against 8 threads?
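The 60% figure in this post follows from multiplying the three gains together with perfectly linear core scaling. A minimal sketch of that arithmetic, where `linear_scaling` is just an illustrative helper name and linear scaling is the assumption being made:

```python
def linear_scaling(clock_gain: float, ipc_gain: float, core_gain: float) -> float:
    """Combined speedup if clock, IPC and core count all scale perfectly."""
    return (1.0 + clock_gain) * (1.0 + ipc_gain) * (1.0 + core_gain)


# 10% clocks, 10% IPC, and 8 cores instead of 6 (33% more), as in the post
estimate = linear_scaling(0.10, 0.10, 8 / 6 - 1)
print(f"{estimate:.2f}x Thuban")  # 1.61x Thuban
```

Any serial portion in the workload would pull the core-count factor below 1.33, so this is an upper bound for these inputs rather than a prediction.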
    2500k @ 4900mhz - Asus Maxiums IV Gene Z - Swiftech Apogee LP
    GTX 680 @ +170 (1267mhz) / +300 (3305mhz) - EK 680 FC EN/Acteal
    Swiftech MCR320 Drive @ 1300rpms - 3x GT 1850s @ 1150rpms
    XS Build Log for: My Latest Custom Case

  10. #1435
    Xtreme Cruncher
    Join Date
    Apr 2008
    Location
    Ohio
    Posts
    3,119
    They did more to BD than just increase clock speed. And I didn't see what they did with the turbo for the X6. Did they have it off?
    ~1~
    AMD Ryzen 9 3900X
    GigaByte X570 AORUS LITE
    Trident-Z 3200 CL14 16GB
    AMD Radeon VII
    ~2~
    AMD Ryzen ThreadRipper 2950x
    Asus Prime X399-A
    GSkill Flare-X 3200mhz, CAS14, 64GB
    AMD RX 5700 XT

  11. #1436
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    Quote Originally Posted by Manicdan View Post
    10% better in multithreading than SB 4c/8t sounds too low.
    you state that single thread perf will be much higher than thuban, but keep in mind that if its 10% faster clocks, 10% higher IPC (after the loss due to 2 cores in one module), and 33% more cores than thuban, it should be 60% faster than thuban in multithreading (think cb11.5)

    if you believe that 1 core vs 1 core of BD vs SB will be pretty close to each other, then why do you think 8 cores will struggle against 8 threads?
    Depends on the type of workload. If module utilisation is high, the difference will be lower. An 8-core BD has only 4 FPU units.
    Compared to Thuban, IPC per module could be 20% higher, but IPC per core may be equal or a little bit higher, something around 10%. Threads don't scale linearly with core count, because of Amdahl's law.
    For 33% more cores, with Amdahl's law and CB parallelisation of 95%, with same-IPC cores you can squeeze out only 23.5% more performance. As core and thread count grow, performance converges to a constant value.
    For a 20% IPC improvement for a BD module vs a K10 core, there could be an 8-9% improvement in BD core IPC, given that BD core IPC = 0.9 x BD module IPC per thread.
    The calculation for CB is 1.09 (IPC) x 1.1 (frequency) x 1.23 (core scaling) = 1.47x Thuban 1100T. With lower parallelisation or higher module utilisation, the difference could be lower, close to 35%.
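The chain of factors in this post can be reproduced step by step. The sketch below uses only the numbers given here (20% module IPC gain, a 0.9 per-thread share, 10% frequency, 95% parallelisation, 6 versus 8 cores); the variable names are illustrative, and the 0.9 factor is the poster's assumption, not a measured value.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


module_ipc_gain = 1.20      # BD module vs K10 core, as assumed in the post
per_thread_share = 0.90     # BD core IPC = 0.9 x BD module IPC per thread
frequency_gain = 1.10       # 10% higher clock

per_core_ipc_gain = module_ipc_gain * per_thread_share              # about 1.08
core_scaling = amdahl_speedup(0.95, 8) / amdahl_speedup(0.95, 6)    # about 1.235

total = per_core_ipc_gain * frequency_gain * core_scaling
print(f"per-core IPC gain: {per_core_ipc_gain:.2f}")
print(f"core scaling 6 -> 8: {core_scaling:.3f}")
print(f"estimate: {total:.2f}x Thuban 1100T")  # estimate: 1.47x Thuban 1100T
```

With the post's rounded inputs (1.09 and 1.23) the product also comes out at about 1.47, so the rounding does not change the headline number.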
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE

  12. #1437
    Xtreme 3D Team
    Join Date
    Jan 2009
    Location
    Ohio
    Posts
    8,499
    Quote Originally Posted by muziqaz View Post
    Did I understand it correctly? They took Phenom II x6 overclocked it, gave it 1866Mhz ram and called it a FX-6110? Man that's even worse than OBR
    It's like simulating core 2 duo using pentium 4 :/
    X6 at 3.8 GHz and FX-8130P with 6 cores at 3.8 GHz. :)

  13. #1438
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    BD has 8x 128bit FPUs, OR 4x 256bit FPUs

    funny how you come to 1.47x when you already factored in the inefficiencies, then drop it down to 35% just because?
    2500k @ 4900mhz - Asus Maxiums IV Gene Z - Swiftech Apogee LP
    GTX 680 @ +170 (1267mhz) / +300 (3305mhz) - EK 680 FC EN/Acteal
    Swiftech MCR320 Drive @ 1300rpms - 3x GT 1850s @ 1150rpms
    XS Build Log for: My Latest Custom Case

  14. #1439
    Xtreme Member
    Join Date
    Apr 2005
    Location
    London, UK
    Posts
    261
    Quote Originally Posted by BeepBeep2 View Post
    X6 at 3.8 and FX-8130P at 6 cores 3.8.
    So why not include 8130's performance at 3.8ghz?

  15. #1440
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    The BD FPU can issue up to four instructions, but the front end can issue a maximum of 4 instructions + branch fusion. With two threads with high ILP, a BD core can retire up to 2 instructions per cycle, which in reality can't go over 1.6-1.8. For example, a Phenom II core can reach 2.4 IPC in Linpack out of a max of 3 IPC. A BD module can probably reach up to 3.5-3.6 IPC with two threads, which is 1.7-1.8 IPC per thread. With such a heavy workload, a BD core can issue fewer instructions than a K10.5 core. But this is the exception rather than the rule. In that case an 8-core BD (4 modules) can retire up to 14.4 instructions per cycle. A Phenom II X6 can retire the same 14.4 IPC with six cores.
    For example: CB10 has 1.3 IPC on a K10 core, but CB10 can reach 1.5 IPC on an SB core. This is 50% faster than Phenom II X6 in such a workload with 33% more cores.
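The 14.4 figure for both chips is simply per-unit IPC multiplied by the number of units. A small sketch of that bookkeeping, using the upper-end estimates from this post (1.8 IPC per BD core, 3.6 per module, 2.4 per Phenom II core in Linpack); `aggregate_ipc` is an illustrative name only:

```python
import math


def aggregate_ipc(units: int, ipc_per_unit: float) -> float:
    """Chip-wide instructions retired per cycle for identical units."""
    return units * ipc_per_unit


bd_by_core = aggregate_ipc(8, 1.8)     # 8 BD cores at 1.8 IPC each
bd_by_module = aggregate_ipc(4, 3.6)   # same chip counted as 4 modules at 3.6
phenom_x6 = aggregate_ipc(6, 2.4)      # 6 K10.5 cores at 2.4 IPC each

print(f"BD (8 cores):   {bd_by_core:.1f}")
print(f"BD (4 modules): {bd_by_module:.1f}")
print(f"Phenom II X6:   {phenom_x6:.1f}")
print("same aggregate:", math.isclose(bd_by_core, phenom_x6))
```

At equal aggregate IPC, any difference in this heavy-workload case would have to come from clock speed, which is the point the post is making.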

    Quote Originally Posted by Manicdan
    BD has 8x 128bit FPUs, OR 4x 256bit FPUs

    funny how you come to 1.47x when you already factored in the inefficiencies, then drop it down to 35% just because?
    I have corrected my calculations. There was an error. ~30% is the case for high-ILP code, with IPC up to 2 per thread, or ~50% difference with lower ILP.
    Attachment 118752

    I've done a simulation sheet for a six-core 3.2 GHz 6120P 95W BD. It is 0-35% faster than Thuban 1100T. IPC is a little higher because of the lower frequency; IPC doesn't scale linearly with frequency.
    Attachment 118753
    Last edited by drfedja; 08-08-2011 at 10:05 AM.
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE

  16. #1441
    Xtreme Member
    Join Date
    Nov 2006
    Posts
    324
    Quote Originally Posted by undone View Post
    Man, every post that mentioned about THAT IDIOT had been deleted by admin in this forum. Don't waste your mind and resources to discuss anything about him.
    EDIT:
    Seems some guys emulate FX-6110 by using ES FX-8130p and compared to 2600k and 1100T. I don't have time to observe every result, and don't know whether it could reflect any problem.

    http://www.f-paper.com/?i708835-Phot...lation-testing
    Agreed about mentioning the "unmentionable" :)
    but that link is no better:

    Attachment 118755
    Windows 8.1
    Asus M4A87TD EVO + Phenom II X6 1055T @ 3900MHz + HD3850
    APUs

  17. #1442
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    I made a small comparison between Core duo, SB, K8, K10 and BD.
    https://public.sheet.zoho.com/publis...v-architekture
    If something is wrong just comment and I will correct the mistakes.

  18. #1443
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    Quote Originally Posted by TESKATLIPOKA View Post
    I made a small comparison between Core duo, SB, K8, K10 and BD.
    https://public.sheet.zoho.com/publis...v-architekture
    If something is wrong just comment and I will correct the mistakes.
    You may want to add L1, L2 and L3 cache speed to that chart. Nice graph.
    As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"

  19. #1444
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    I don't know their cache speeds, just the latency.

  20. #1445
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    L1 load-to-use latency is 4 cycles. Branch mispredict latency is 16 cycles, which implies that the integer pipeline is 16 stages long. Also, some simple ALU operations have 2 cycles longer latency. The BD pipeline must be 14-16 stages for integer and 19-21 stages for FP.
    A BD module has 2x (2 ALU + 2 AGLU).

    BD L1D can do a 1x256-bit load (AVX) and a 1x128-bit store at the same time, or 2x128-bit loads and no store, because a BD core is limited to two memory operations per cycle by its 2 AGUs.
    Other combinations are 2x128-bit load, 1x128-bit load and 1x128-bit store, or 2x64-bit store.
    Like SB, BD L1D has a data cache bandwidth of 384 bits/cycle in both directions.

    K10 L1D can do 2x128-bit loads, 1x128-bit load + 1x64-bit store, or 2x64-bit stores, effectively 256 bits/cycle.
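The per-cycle figures above can be tallied from the listed load/store combinations. The sketch below just encodes the combinations named in this post and sums the widths; the dictionary and function names are made up for illustration, and nothing here is taken from vendor documentation.

```python
# One cycle of L1D traffic per combination, as (kind, width_in_bits) pairs.
K10_COMBOS = {
    "2x128-bit load": [("load", 128), ("load", 128)],
    "1x128-bit load + 1x64-bit store": [("load", 128), ("store", 64)],
    "2x64-bit store": [("store", 64), ("store", 64)],
}
BD_COMBOS = {
    "1x256-bit load + 1x128-bit store": [("load", 256), ("store", 128)],
    "2x128-bit load": [("load", 128), ("load", 128)],
    "1x128-bit load + 1x128-bit store": [("load", 128), ("store", 128)],
    "2x64-bit store": [("store", 64), ("store", 64)],
}


def bits_per_cycle(ops):
    """Total bits moved in one cycle by a list of (kind, width) operations."""
    return sum(width for _, width in ops)


def peak_bits(combos):
    """Best case across all listed combinations."""
    return max(bits_per_cycle(ops) for ops in combos.values())


for name, combos in (("K10", K10_COMBOS), ("BD core", BD_COMBOS)):
    for label, ops in combos.items():
        print(f"{name:8s} {label:34s} {bits_per_cycle(ops):4d} bits/cycle")
    print(f"{name:8s} peak: {peak_bits(combos)} bits/cycle")
```

By this tally the K10 peak of 256 bits/cycle is reached only with two loads, while the store-only case moves 128 bits/cycle.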
    Last edited by drfedja; 08-09-2011 at 05:13 AM.
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE

  21. #1446
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    I will use the value 14-16 for the pipeline, at least until we know the real number.

    2x64-bit store effectively 256-bits/cycle.
    you meant 128-bits/cycle, right?

    You are right, I wrote double the number of LS units. I will fix it right away.

    If anything else is wrong just say it.
    Last edited by TESKATLIPOKA; 08-09-2011 at 06:09 AM.

  22. #1447
    Xtreme Member
    Join Date
    Apr 2007
    Location
    Serbia
    Posts
    102
    Quote Originally Posted by TESKATLIPOKA View Post
    I will add the value 14-16 for pipeline at least until we won't know the real number.

    2x64-bit store effectively 256-bits/cycle.
    you meant 128-bits/cycle, right?
    I mean 2x64-bit stores for 10h, or 2x128-bit loads. Because 10h can't execute 256-bit AVX instructions, it loads data in 128-bit chunks.
    This is 128 bits/cycle for stores, or 256 bits/cycle for loads.
    In the Bulldozer core (not module), there is a 256-bit load + 128-bit store at the same time. A Bulldozer module can do double that.
    A Bulldozer core can calculate 2 addresses at the same time because it has 2 AGUs (address generation units).

    A Sandy Bridge core can also do 2 address operations at once, because it has 2 L/S AGUs. It has a slightly different approach for stores: the SB store unit is attached to the scheduler.

    You are right, I wrote double the amount of LS units, I will repair It right away.

    If anything else is wrong just say it.
    Yes, per core it has 2 ALUs and 2 AGUs. I've made a detailed diagram of the Bulldozer module, K10, Nehalem and, of course, the Sandy Bridge HT core architecture.
    Attachment 118765
    Last edited by drfedja; 08-09-2011 at 07:20 AM.
    "That which does not kill you only makes you stronger." ---Friedrich Nietzsche
    PCAXE

  23. #1448
    Xtreme Member
    Join Date
    Mar 2008
    Posts
    358
    BLAH BLAH BLAH ..................



  24. #1449
    Xtreme Member
    Join Date
    Jan 2011
    Location
    Slovakia
    Posts
    169
    drfedja great work, I love these diagrams. If you don't mind I will link them to another forum I frequently visit.

  25. #1450
    Xtreme Addict
    Join Date
    Dec 2007
    Location
    Hungary (EU)
    Posts
    1,376
    Quote Originally Posted by drfedja View Post
    I mean 2x64-bit store for 10h or 2x128-bit load. Because 10h can't execute AVX 256 instructions, it can load data in 128-bit chunks.
    This is 128-bits /cycle for stores, or 256-bits/cycle for loads.
    In the Bulldozer core(not module), there is 256-bit load + 128-bit store in the same time. With Bulldozer module there is double of that operations.
    Bulldozer core can calculate 2 adresses at same time because it has 2 AGU - adress generation units.

    Sandy core can do also 2 adress operations at once, because it has 2 L/S AGU. It has slightly different approach for store. SB store unit is attached to scheduler


    Yes, per core it has 2 ALU and 2 AGU. I've made detail diagram for Bulldozer module, K10, Nehalem and of course of Sandy Bridge HT core architecture.

    BTW, you interpret the Address Generation Units as units for calculating linear addresses as well as INC/LEA values. The Optimization Guide also refers to them as simple integer execution units (AGLU).

    Would you briefly explain what kind of operations these units can execute?

    Thanks
