Page 23 of 23
Results 551 to 560 of 560

Thread: AMD Bulldozer Thread

  1. #551
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Hans de Vries View Post
    The Open64 compiler produces up to 25% faster code than Intel's latest version 12 compilers, even
    though the intentionally crippled results submitted by Intel were run on a 40% higher clocked Bulldozer.....


    Open64 4.2.5.2 Compiler suite: (SPEC results submitted by Dell)
    2.6 GHz Bulldozer: SPEC_int_rate 134, SPEC_FP_rate 100

    Intel Studio XE 12.0.3.176 compilers: (SPEC results submitted by Intel)
    3.6 GHz Bulldozer: SPEC_int_rate 115, SPEC_FP_rate 79.8

    http://www.spec.org/cpu2006/results/res2011q4/

    Hans
    I'm sorry, but I couldn't find your DELL result. It looks to me as if you took the highest scores and divided them by 4 to get the same number of cores. For example:

    For FP, you got the 100 from here : http://www.spec.org/cpu2006/results/...107-18771.html
    For INT, you got the 134 from here : http://www.spec.org/cpu2006/results/...107-18768.html

    Also, you quoted peak values instead of base values:

    DELL Opteron 6276 2.6GHz SpecInt_rate/FP_rate: 117 / 93
    Intel FX 3.6GHz SpecInt_rate/FP_rate: 106 / 79

    Are the systems really comparable? A desktop system with 8GB RAM running Windows 7 vs. a 128GB server running Red Hat Linux 6.1?
    Apart from the HW differences, it is kind of expected that AMD's in-house compiler (?) produces better results than ICC, which is probably oblivious to BD's existence. Since ICC knows neither BD's caveats nor its new instruction sets (XOP, FMA), the scores are not unexpected.
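
    To make the peak vs. base point concrete, here is a minimal Python sketch that just redoes the arithmetic on the scores quoted in this thread. Nothing is re-measured; the `SpecRate` class and the `advantage` helper are names made up for this example. Keep in mind that rate scores do not scale linearly with clock, so the clock figure is only there for context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecRate:
    """One set of scores as quoted in the thread (hypothetical container)."""
    label: str
    clock_ghz: float
    int_rate: float
    fp_rate: float


def advantage(a: float, b: float) -> float:
    """Return how much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# Peak values as quoted by Hans de Vries.
open64_peak = SpecRate("Open64 4.2.5.2 (peak)", 2.6, 134.0, 100.0)
icc_peak = SpecRate("Intel 12.0.3.176 (peak)", 3.6, 115.0, 79.8)

# Base values as quoted in the reply.
open64_base = SpecRate("Opteron 6276 2.6GHz (base)", 2.6, 117.0, 93.0)
icc_base = SpecRate("FX 3.6GHz (base)", 3.6, 106.0, 79.0)

for amd, intel in ((open64_peak, icc_peak), (open64_base, icc_base)):
    print(f"{amd.label} vs. {intel.label}")
    print(f"  int_rate advantage: {advantage(amd.int_rate, intel.int_rate):.1f}%")
    print(f"  fp_rate advantage:  {advantage(amd.fp_rate, intel.fp_rate):.1f}%")
    print(f"  clock advantage of the second system: "
          f"{advantage(intel.clock_ghz, amd.clock_ghz):.1f}%")
```

    The "up to 25%" only shows up for the peak FP numbers; with base values the gap is smaller on both int and fp.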
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  2. #552
    Xtreme Enthusiast
    Join Date
    Aug 2005
    Posts
    519
    I tested Bulldozer yesterday on some of my real-world h264 compression work. The X6 is way better. I suspect task scheduling and process/core assignment have a lot to do with how big the difference is, but I have no problem with Intel's HT whatsoever.

    I think AMD would have been better off adding two cores to the X6 and improving it a bit. This 'work in progress' stuff is not good.
    2x Dual E5 2670, 32 GB, Transcend SSD 256 GB, 2xSeagate Constellation ES 2TB, 1KW PSU
    HP Envy 17" - i7 2630 QM, HD6850, 8 GB.
    i7 3770, GF 650, 8 GB, Transcend SSD 256 GB, 6x3 TB. 850W PSU

  3. #553
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by Hans de Vries View Post
    The Open64 compiler produces up to 25% faster code than Intel's latest version 12 compilers, even
    though the intentionally crippled results submitted by Intel were run on a 40% higher clocked Bulldozer.....


    Open64 4.2.5.2 Compiler suite: (SPEC results submitted by Dell)
    2.6 GHz Bulldozer: SPEC_int_rate 134, SPEC_FP_rate 100

    Intel Studio XE 12.0.3.176 compilers: (SPEC results submitted by Intel)
    3.6 GHz Bulldozer: SPEC_int_rate 115, SPEC_FP_rate 79.8

    http://www.spec.org/cpu2006/results/res2011q4/

    Hans
    Andreas Stiller (c't magazine) wrote in his article that the Intel 12.1 compilers create ~25% faster code (SPECfp_rate2006) compared to 12.0 while still using SSE3. AVX256 doesn't help much. AVX128 might show better performance by using the 3-operand format (although FP moves are free). FMA4 is not being used, as everyone would expect.

    On i7-2600K the 12.1 compilers create ~9% faster code in SPECfp vs. 12.0:

    Intel Compiler 12.1 results for i7-2600K

    Intel Compiler 12.0 results for i7-2600K

    According to him, patching the GenuineIntel string and processor family in the SPECint executables resulted in a 45% boost in libquantum and ~20% in Xalancbmk.
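
    For anyone wondering why patching a vendor string can change performance at all, here is a toy Python sketch of the difference between dispatching on the CPUID vendor string and dispatching on feature flags. It is purely an illustration under my own assumptions: the function names and the "optimized"/"generic" paths are hypothetical and are not taken from any real compiler runtime.

```python
def pick_path_by_vendor(vendor, features):
    """Toy dispatcher that trusts the feature flags only for one vendor string."""
    if vendor == "GenuineIntel" and "sse3" in features:
        return "optimized"
    return "generic"


def pick_path_by_features(vendor, features):
    """Toy dispatcher that ignores the vendor and looks only at feature flags."""
    return "optimized" if "sse3" in features else "generic"


# A Bulldozer-like CPU: AMD vendor string plus the relevant feature flags.
bulldozer = ("AuthenticAMD", frozenset({"sse2", "sse3", "avx", "xop", "fma4"}))

# The same CPU after the vendor string check has been patched.
patched = ("GenuineIntel", bulldozer[1])

for name, cpu in (("unpatched", bulldozer), ("patched", patched)):
    print(name,
          "| vendor-based:", pick_path_by_vendor(*cpu),
          "| feature-based:", pick_path_by_features(*cpu))
```

    In this toy model the hardware is identical in both cases; only the vendor check decides whether the faster path is taken.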
    Last edited by Dresdenboy; 12-09-2011 at 01:56 AM.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  4. #554
    Xtreme Member
    Join Date
    Sep 2010
    Posts
    139
    Quote Originally Posted by Dresdenboy View Post
    Andreas Stiller (c't magazine) wrote in his article that the Intel 12.1 compilers create ~25% faster code (SPECfp_rate2006) compared to 12.0 while still using SSE3. AVX256 doesn't help much. AVX128 might show better performance by using the 3-operand format (although FP moves are free). FMA4 is not being used, as everyone would expect.

    On i7-2600K the 12.1 compilers create ~9% faster code in SPECfp vs. 12.0:

    Intel Compiler 12.1 results for i7-2600K

    Intel Compiler 12.0 results for i7-2600K

    According to him, patching the GenuineIntel string and processor family in the SPECint executables resulted in a 45% boost in libquantum and ~20% in Xalancbmk.
    The dual quad-core Opterons (6204, I think; posted on SemiAccurate) do blow the Zambezi score out of the water in the SPEC_rate scores, though:
    about 50% more in int_rate and close to 100% more in fp_rate (there are no non-rate scores for those).
    This gives them a 50% advantage in SPEC FP_rate over the 2700K while equalling the SPEC INT_rate of the same 2700K.

    So it clearly does have an impact (or the submitted Zambezi score is crippled... or the Opteron score is rigged..).
    Last edited by flyck; 12-09-2011 at 06:57 AM.

  5. #555
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by flyck View Post
    The dual quad-core Opterons (6204, I think; posted on SemiAccurate) do blow the Zambezi score out of the water in the SPEC_rate scores, though:
    about 50% more in int_rate and close to 100% more in fp_rate (there are no non-rate scores for those).
    This gives them a 50% advantage in SPEC FP_rate over the 2700K while equalling the SPEC INT_rate of the same 2700K.

    So it clearly does have an impact (or the submitted Zambezi score is crippled... or the Opteron score is rigged..).
    The 6204 is a special model for specific data stuffing tasks. Clocks aside, it has much more uncore per 4 cores than Zambezi has for 8 cores.
    Zambezi:
    4M/8C
    4x2MB L2
    1x8MB L3
    2xDDR3

    Opteron 6204:
    2x1M/2C
    2x2MB L2
    2x8MB L3
    8xDDR3
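
    Taking the two lists above at face value, a quick Python sketch of the per-core uncore resources looks like this. The dictionary layout and the `per_core` helper are made up for this example, and the 4-core count for the 6204 configuration is read off the "2x1M/2C" line.

```python
def per_core(total, cores):
    """Return the share of a resource available to each core."""
    return total / cores


# Figures copied from the lists in this post, not from a datasheet.
zambezi = {"cores": 8, "l3_mb": 8, "ddr3_channels": 2}
opteron_6204 = {"cores": 4, "l3_mb": 16, "ddr3_channels": 8}

for name, chip in (("Zambezi", zambezi), ("Opteron 6204", opteron_6204)):
    l3 = per_core(chip["l3_mb"], chip["cores"])
    channels = per_core(chip["ddr3_channels"], chip["cores"])
    print(f"{name}: {l3:.2f} MB L3 per core, {channels:.2f} DDR3 channels per core")
```

    If the listed figures hold, the 6204 setup has several times the L3 and memory channels per core, which would fit rate-style benchmarks that lean on memory throughput.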
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  6. #556
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by Dresdenboy View Post
    The 6204 is a special model for specific data stuffing tasks. Clocks aside, it has much more uncore per 4 cores than Zambezi has for 8 cores.
    Zambezi:
    4M/8C
    4x2MB L2
    1x8MB L3
    2xDDR3

    Opteron 6204:
    2x1M/2C
    2x2MB L2
    2x8MB L3
    8xDDR3
    I don't believe that's a 2-module chip; it scores far better than a 2-module chip would.
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  8. #558
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    The numbers seem off, too:

    2x 1M/2C
    but then also 2x8MB L3.
    That means it's an MCM of 1/4 chips.
    2500k @ 4900mhz - Asus Maxiums IV Gene Z - Swiftech Apogee LP
    GTX 680 @ +170 (1267mhz) / +300 (3305mhz) - EK 680 FC EN/Acteal
    Swiftech MCR320 Drive @ 1300rpms - 3x GT 1850s @ 1150rpms
    XS Build Log for: My Latest Custom Case

  9. #559
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by demonkevy666 View Post
    I don't believe that's a 2-module chip; it scores far better than a 2-module chip would.
    I said that it is different.

    Edit/Addendum:

    6200 series: G34 socket with 4 memory channels.
    L3 is given as 16MB on this site:
    http://www.amd.com/de/products/serve...l-numbers.aspx
    So this means 2 dies.

    It looks like it doesn't even have a turbo mode. So it might be meant for specific high-frequency trading tasks, like pattern matching across tons of data (tick data). Memory throughput is as important as latency then.
    Last edited by Dresdenboy; 12-10-2011 at 08:03 AM.
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  10. #560
    Xtreme Member
    Join Date
    Sep 2008
    Location
    Italy
    Posts
    246
    http://www.xtremehardware.com/

    The Best Scene of Hardware In Italy

    Follow Us And Add Me http://www.facebook.com/?ref=tn_tnmn...00003658778509
