AMD Bulldozer Thread

**savantu** · 12-08-2011, 07:34 AM

Originally Posted by Hans de Vries

The Open64 compiler produces up to 25% faster code than Intel's latest version 12 compilers, even
though the intentionally crippled results submitted by Intel were run on a 40% higher clocked Bulldozer.....

Open64 4.2.5.2 Compiler suite: (SPEC results submitted by Dell)
2.6 GHz Bulldozer: SPEC_int_rate 134, SPEC_FP_rate 100

Intel Studio XE 12.0.3.176 compilers: (SPEC results submitted by Intel)
3.6 GHz Bulldozer: SPEC_int_rate 115, SPEC_FP_rate 79.8

http://www.spec.org/cpu2006/results/res2011q4/

Hans

I'm sorry, but I couldn't find your DELL result. It looks to me like you took the highest scores and divided by 4 to get the same number of cores. For example:

For FP, you got the 100 from here : http://www.spec.org/cpu2006/results/...107-18771.html
For INT, you got the 134 from here : http://www.spec.org/cpu2006/results/...107-18768.html

Also, you quoted peak values instead of base values:

DELL Opteron 6276 2.6GHz SpecInt_rate/FP_rate : 117 / 93
Intel-submitted FX 3.6GHz SpecInt_rate/FP_rate : 106 / 79

Are the systems really comparable? A desktop system with 8GB RAM running Windows 7 vs. a 128GB server running Red Hat Linux 6.1?
Apart from the HW differences, it is kind of expected that AMD's in-house compiler (?) produces better results than ICC, which is probably oblivious to BD's existence. Since ICC doesn't know BD's caveats and new instruction sets (XOP, FMA), the scores are not unexpected.
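
To make the peak vs. base difference concrete, here is a minimal Python sketch that recomputes the compiler gap from the figures quoted above. The numbers are copied from this thread and not re-checked against spec.org; the dictionary keys and function names are made up for illustration, and the per-GHz figure assumes rate scores scale linearly with clock, which is only a rough approximation.

```python
# Scores and clocks exactly as quoted in this thread; not re-checked against spec.org.
CLOCK_GHZ = {"open64_dell": 2.6, "icc_intel": 3.6}
PEAK = {
    "open64_dell": {"int_rate": 134.0, "fp_rate": 100.0},
    "icc_intel": {"int_rate": 115.0, "fp_rate": 79.8},
}
BASE = {
    "open64_dell": {"int_rate": 117.0, "fp_rate": 93.0},
    "icc_intel": {"int_rate": 106.0, "fp_rate": 79.0},
}


def score_ratio(scores, metric):
    """Open64/Dell score divided by ICC/Intel score for one metric."""
    return scores["open64_dell"][metric] / scores["icc_intel"][metric]


def per_ghz_ratio(scores, metric):
    """Same ratio after dividing each score by its clock.

    Crude normalization: assumes the score scales linearly with frequency.
    """
    dell = scores["open64_dell"][metric] / CLOCK_GHZ["open64_dell"]
    intel = scores["icc_intel"][metric] / CLOCK_GHZ["icc_intel"]
    return dell / intel


if __name__ == "__main__":
    for label, scores in (("peak", PEAK), ("base", BASE)):
        for metric in ("int_rate", "fp_rate"):
            print(
                f"{label:4} {metric:8} "
                f"raw x{score_ratio(scores, metric):.2f} "
                f"per-GHz x{per_ghz_ratio(scores, metric):.2f}"
            )
```

On these numbers the FP_rate gap is about 25% with peak values and about 18% with base values, before any correction for clock speed or for the different platforms.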

**R101** · 12-09-2011, 01:16 AM

I tested Bulldozer yesterday on some of my real-world h264 compression work. X6 is way better. I suspect task scheduling and process/core assignment have something to do with how big the difference is, but I have no problem with Intel's HT whatsoever.

I think AMD would have been better off adding two cores to the X6 and improving it a bit. This 'work in progress' stuff is not good.

**Dresdenboy** · 12-09-2011, 01:33 AM

Originally Posted by Hans de Vries

The Open64 compiler produces up to 25% faster code than Intel's latest version 12 compilers, even
though the intentionally crippled results submitted by Intel were run on a 40% higher clocked Bulldozer.....

Open64 4.2.5.2 Compiler suite: (SPEC results submitted by Dell)
2.6 GHz Bulldozer: SPEC_int_rate 134, SPEC_FP_rate 100

Intel Studio XE 12.0.3.176 compilers: (SPEC results submitted by Intel)
3.6 GHz Bulldozer: SPEC_int_rate 115, SPEC_FP_rate 79.8

http://www.spec.org/cpu2006/results/res2011q4/

Hans

Andreas Stiller (c't mag) wrote in his article that the Intel 12.1 compilers create ~25% faster code (SPECfp_rate2006) compared to 12.0 while still using SSE3. AVX256 doesn't help much. AVX128 might show better performance by using the 3-operand format (although FP moves are free). FMA4 is not being used, as everyone would expect.

On i7-2600K the 12.1 compilers create ~9% faster code in SPECfp vs. 12.0:

Intel Compiler 12.1 results for i7-2600K

Intel Compiler 12.0 results for i7-2600K

Patching the GenuineIntel string and processor family in the SPECint executables resulted in a 45% boost in libquantum and ~20% in Xalancbmk according to him.

**flyck** · 12-09-2011, 06:55 AM

Originally Posted by Dresdenboy

Andreas Stiller (c't mag) wrote in his article that the Intel 12.1 compilers create ~25% faster code (SPECfp_rate2006) compared to 12.0 while still using SSE3. AVX256 doesn't help much. AVX128 might show better performance by using the 3-operand format (although FP moves are free). FMA4 is not being used, as everyone would expect.

On i7-2600K the 12.1 compilers create ~9% faster code in SPECfp vs. 12.0:

Intel Compiler 12.1 results for i7-2600K

Intel Compiler 12.0 results for i7-2600K

Patching the GenuineIntel string and processor family in the SPECint executables resulted in a 45% boost in libquantum and ~20% in Xalancbmk according to him.

The dual quad-core Opteron 6204 (I think; posted on SemiAccurate) does blow the Zambezi score out of the water in the SPEC_rate scores, though:
about 50% more in integer rate and close to 100% more in FP rate (there are no non-rate scores for those).
This gives it a 50% advantage in SPEC FP_rate compared to the 2700K, while equalling the 2700K's SPEC INT_rate.

So it clearly does have an impact (or the submitted Zambezi score is crippled, or the Opteron score is rigged).

**Dresdenboy** · 12-09-2011, 07:15 AM

Originally Posted by flyck

The dual quad-core Opteron 6204 (I think; posted on SemiAccurate) does blow the Zambezi score out of the water in the SPEC_rate scores, though:
about 50% more in integer rate and close to 100% more in FP rate (there are no non-rate scores for those).
This gives it a 50% advantage in SPEC FP_rate compared to the 2700K, while equalling the 2700K's SPEC INT_rate.

So it clearly does have an impact (or the submitted Zambezi score is crippled, or the Opteron score is rigged).

The 6204 is a special model for specific data stuffing tasks. Clocks aside, it has much more uncore per 4 cores than Zambezi has for 8 cores.
Zambezi:
4M/8C
4x2MB L2
1x8MB L3
2xDDR3

Opteron 6204:
2x1M/2C
2x2MB L2
2x8MB L3
8xDDR3

**demonkevy666** · 12-09-2011, 11:41 AM

Originally Posted by Dresdenboy

The 6204 is a special model for specific data stuffing tasks. Clocks aside, it has much more uncore per 4 cores than Zambezi has for 8 cores.
Zambezi:
4M/8C
4x2MB L2
1x8MB L3
2xDDR3

Opteron 6204:
2x1M/2C
2x2MB L2
2x8MB L3
8xDDR3

I don't believe that's a 2-module chip; it scores far better than a two-module chip would.

**Manicdan** · 12-09-2011, 11:51 AM

the numbers seem off too

2x 1M/2C
but then also 2x8MB L3
that means it's an MCM of 1/4 chips

**Dresdenboy** · 12-09-2011, 03:17 PM

Originally Posted by demonkevy666

I don't believe that's a 2-module chip; it scores far better than a two-module chip would.

I said that it is different.

Edit/Addendum:

6200 series: G34 socket with 4 memory channels.
L3 is given as 16MB on this site:
http://www.amd.com/de/products/serve...l-numbers.aspx
So this means 2 dies.

It looks like it doesn't even have turbo mode. So it might be there to do some specific high-frequency trading tasks like pattern matching of tons of data (tick data). Memory throughput is as important as latency then.
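
As a rough illustration of "more uncore per core", the figures from this thread can be put side by side in a small Python sketch. The `Chip` class and its field names are hypothetical, and the 6204's memory channel count is an assumption: it uses the 4 channels per G34 socket from the addendum rather than the "8xDDR3" in the earlier list, which presumably counts a two-socket system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Per-package resources as listed in this thread (illustrative only)."""
    name: str
    cores: int
    l2_mb: int
    l3_mb: int
    mem_channels: int

    def per_core(self):
        """Return each shared resource divided by the core count."""
        return {
            "l2_mb": self.l2_mb / self.cores,
            "l3_mb": self.l3_mb / self.cores,
            "mem_channels": self.mem_channels / self.cores,
        }


# Zambezi: 4 modules / 8 cores, 4x2MB L2, 1x8MB L3, 2 DDR3 channels.
ZAMBEZI = Chip("Zambezi", cores=8, l2_mb=4 * 2, l3_mb=8, mem_channels=2)

# Opteron 6204: 2 dies x 1 module x 2 cores, 2x2MB L2, 2x8MB L3.
# Assumption: 4 memory channels per G34 socket, not the 8 from the list.
OPTERON_6204 = Chip("Opteron 6204", cores=2 * 1 * 2, l2_mb=2 * 2, l3_mb=2 * 8,
                    mem_channels=4)

if __name__ == "__main__":
    for chip in (ZAMBEZI, OPTERON_6204):
        stats = chip.per_core()
        print(f"{chip.name:13} L2/core {stats['l2_mb']:.2f} MB, "
              f"L3/core {stats['l3_mb']:.2f} MB, "
              f"channels/core {stats['mem_channels']:.2f}")
```

Under those assumptions L2 per core is the same on both chips (1 MB), while the 6204 has four times the L3 and four times the memory channels per core.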

**tom1** · 12-16-2011, 05:07 AM

other

http://www.xtremehardware.it/eng-rev...-201112156202/
