RiG1: Ryzen 7 1700 @4.0GHz 1.39V, Asus X370 Prime, G.Skill RipJaws 2x8GB 3200MHz CL14 Samsung B-die, TuL Vega 56 Stock, Samsung SS805 100GB SLC SSD (OS Drive) + 512GB Evo 850 SSD (2nd OS Drive) + 3TB Seagate + 1TB Seagate, BeQuiet PowerZone 1000W
RiG2: HTPC AMD A10-7850K APU, 2x8GB Kingston HyperX 2400C12, AsRock FM2A88M Extreme4+, 128GB SSD + 640GB Samsung 7200, LG Blu-ray Recorder, Thermaltake BACH, Hiper 4M880 880W PSU
SmartPhone Samsung Galaxy S7 EDGE
XBONE paired with 55'' Samsung LED 3D TV
@ Lightman
That was funny indeed!
Iron Lung 3.0 | Intel Core i7 6800k @ 4GHz | 32GB G.SKILL RIPJAW V DDR4-3200 @16-16-16-36 | ASUS ROG STRIX X99 GAMING + ASUS ROG GeForce GTX 1070 STRIX GAMING | Samsung 960 Pro 512GB + Samsung 840 EVO + 4TB HDD | 55" Samsung KS8000 + 30" Dell u3011 via DisplayPort - @ 6400x2160
http://www.anandtech.com/show/5176/a...unt-12b-not-2b
This is a bit unusual. I got an email from AMD PR this week asking me to correct the Bulldozer transistor count in our Sandy Bridge E review. The incorrect number, provided to me (and other reviewers) by AMD PR around 3 months ago was 2 billion transistors. The actual transistor count for Bulldozer is apparently 1.2 billion transistors. I don't have an explanation as to why the original number was wrong, just that the new number has been triple checked by my contact and is indeed right. The total die area for a 4-module/8-core Bulldozer remains correct at 315mm2.
Phenom II x6 1055T | ASRock 880G Ex.3 | 560Ti FrozrII 1GB| Corsair Vengeance 1600 2x4GB | Win7 64 | M4 128GB
VR Box - i5 6600 | MSI Mortar | Gigabyte G1 GTX 1060 | Viper 16GB DDR4 2400 | 256 SSD | Oculus Rift CV1 + Touch
The only thing with the number of transistors that bugs me is why it took them so long to realise they'd given out bad information. I mean seriously, is their QA so bad that even their counting/marketing is faulty?
The people in a big company might do very good work, but it can get lost or degraded by processes and poor decisions further up the hierarchy.
I saw a review where the reviewer had a retail FX that had 24MB of cache rather than 16MB. Could another 8MB of L3 be those missing transistors? I wouldn't know, so I'm just asking the question.
Could it also be that the server variants have the full complement of cache, since their lower speeds fit the TDP, while the retail parts cannot stay inside the TDP with all that cache, so some of it has been disabled?
The transistors might still be there but just not used in retail parts.
The marketing dept has then failed to reconcile the server transistor count (used) with the retail transistor count (used), which then compounded the 'so many transistors for so little performance' argument? If so, no wonder they got culled.
Just some musings from me here.
BSN* is reporting that AMD's CFO Thomas Seifert was allegedly let go yesterday... If true, then it's one of those management decisions that make zero sense (like Killebrew, Moorhead, David Hoff, Rick Bergman, Dirk Meyer, etc.). AMD's debt right now is around 1.8B, while back in 2009, when he was appointed to the position, it was ~7B.
Since Interlagos uses two of the same dies as the desktop variant and the die shot has been shown to the public, there would have been comments if there were more than the known 8MB of L3.
BTW, the transistor density of the modules (213M in 30.9 mm²) is rather high (6.89M/mm²), and the 1.2B number also seems a bit too low.
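Just to put rough numbers on that, here is a back-of-envelope check of my own, using only the figures quoted in this thread plus the usual ~6 transistors per SRAM bit rule of thumb (the 6T assumption is mine, not AMD's):

```c
/* Back-of-envelope check using the numbers quoted above:
 * 213M transistors per module on 30.9 mm^2, 315 mm^2 total die,
 * 1.2B claimed total, 8MB shared L3 (~6T per SRAM bit assumed). */
#include <stdio.h>

int main(void)
{
    const double module_xtors = 213e6;                  /* per module            */
    const double module_area  = 30.9;                   /* mm^2                  */
    const double die_area     = 315.0;                  /* mm^2                  */
    const double claimed      = 1.2e9;                  /* AMD's corrected count */
    const double l3_bits      = 8.0 * 1024 * 1024 * 8;  /* 8MB L3                */

    printf("module density : %.2fM/mm^2\n", module_xtors / module_area / 1e6); /* ~6.89 */
    printf("die avg density: %.2fM/mm^2\n", claimed / die_area / 1e6);         /* ~3.81 */
    printf("4 modules alone: %.0fM\n", 4 * module_xtors / 1e6);                /* ~852M */
    printf("8MB L3 cells   : %.0fM (at 6T/bit)\n", 6 * l3_bits / 1e6);         /* ~403M */
    /* 852M + 403M already exceeds 1.2B before counting the northbridge,
     * memory controllers and I/O - which is why 1.2B looks a bit low.    */
    return 0;
}
```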
[MOBO] Asus CrossHair Formula 5 AM3+
[GPU] ATI 6970 x2 Crossfire 2GB
[RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
[CPU] AMD FX-8120 @ 4.8 GHz
[COOLER] XSPC Rasa 750 RS360 WaterCooling
[OS] Windows 8 x64 Enterprise
[HDD] OCZ Vertex 3 120GB SSD
[AUDIO] Logitech S-220 17 Watts 2.1
One hundred years from now It won't matter
What kind of car I drove What kind of house I lived in
How much money I had in the bank Nor what my clothes looked like.... But The world may be a little better Because, I was important In the life of a child.
-- from "Within My Power" by Forest Witcraft
The internet is full of people (besides google cache) creating quick copies:
http://investorvillage.com/smbd.asp?...g&mid=11217202
Since AMD did not want to submit Spec_INT/FP scores, Intel did it instead, just to rub salt in the wound:
SPECint_base2006/SPECfp_base2006 (autoparallel=yes)
i7-2700k (3.5/3.9 GHz) 45.5 / 56.1
FX-8150 (3.6/4.2 GHz) 20.8 / 25.7
X6-1100T (3.3/3.7 GHz) 25.0 / 32.2
http://www.spec.org/cpu2006/results/res2011q4/
In the most widely used and accepted industry-standard benchmark, this clearly shows that BD has significantly lower IPC than K10.5, despite a 300-500 MHz clock advantage.
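Put another way, normalizing those base scores by base clock gives a rough per-GHz picture (a simplification, since SPEC doesn't scale perfectly linearly with clock and turbo is ignored):

```c
/* Rough per-GHz view of the SPECint/fp base scores listed above (base clocks). */
#include <stdio.h>

int main(void)
{
    struct { const char *name; double ghz, sint, sfp; } cpu[] = {
        { "i7-2700K", 3.5, 45.5, 56.1 },
        { "FX-8150",  3.6, 20.8, 25.7 },
        { "X6-1100T", 3.3, 25.0, 32.2 },
    };

    for (int i = 0; i < 3; i++)
        printf("%-9s  int/GHz %.2f  fp/GHz %.2f\n",
               cpu[i].name, cpu[i].sint / cpu[i].ghz, cpu[i].sfp / cpu[i].ghz);
    /* FX-8150 comes out ~24% lower in int/GHz and ~27% lower in fp/GHz
     * than the X6-1100T in this particular (ICC, autoparallel) run.    */
    return 0;
}
```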
wowa.. I love bulldozers and this cpu is giving them a bad name, also making it harder for me to find awesome bulldozer media on the internet :p
TJ07BW | i7 980x | Asus RIII | 12GB Corsair Dominator | 2x Sapphire 7950 Vapor-X | WD 640GB / SG 1.5TB | Corsair HX1000W | 360mm TFC Rad + Swiftech GTZ + MCP655 | Dell U2711
Autoparallel FAIL for ICC, actually. It doesn't seem to work well for more than 4 cores (the i7-3960X loses significantly to the i7-2700K in lots of tests). They definitely should have run it with OMP_NUM_THREADS=4 for the FX8150 and set core affinity to the first core of each module. But it is Intel, and I don't think they have any intention of making AMD's processor look good.
And comparing SSE3 code for AMD and AVX code for Intel is totally irrelevant.
It is Spec_INT, more or less single threaded. Autoparallel cannot offer significant speedups.
Why? AVX and SSE have the same throughput for BD, since it did not spend any transistors on optimizing for AVX. The only question is whether having used FMA would have made a difference.
It offers both speedups (e.g. libquantum) and slowdowns (e.g. h264ref) depending on the subtest and hardware. To limit the slowdowns, Intel set the number of threads to the number of cores. For example, the slowdown of the 3960X compared to the 2700K in h264ref correlates perfectly with the fact that the FX8150 loses the most in this subtest compared to the 1100T.
What? You know there is a reason for 3-operand instructions. If you don't understand it from a developer's viewpoint, at least you can rely on tests comparing SSE and AVX versions of x264 or other software.
And besides this, ICC doesn't allow SSE instructions above SSE3 on AMD hardware, so it is AVX (which is a full superset of SSE instructions) for Intel vs SSE without SSSE3, SSE4.1 and SSE4.2 for AMD.
That's mostly down to the compilers cracking libquantum (known for a long time). And what's the problem with setting the number of threads to the number of cores? It's the fairest approach.
The difference between 128-bit SSE and 128-bit AVX is extremely limited even on SB. Constantly referring to the speedups of AVX (256-bit) vs. SSE on SB and extrapolating that to BD is faulty; SB executes 256-bit instructions in a single cycle, while BD breaks them into 2x128-bit ones. I'll restate my point: BD's AVX speedups are limited because, due to time pressure, it wasn't designed to perform with AVX, just to be compatible with it. The difference between 128-bit SSE and 128/256-bit AVX on BD is going to be in the noise region (actually 256-bit AVX is discouraged since it incurs penalties in the breaking-up and recombining phases).
There is a risk of slowing an application down when autoparallelization is done on hardware with a lot of shared resources between cores, due to the overhead the additional threads add. And for the record, I do not blame Intel for running the test that way, as they are not supposed to know how to get the best performance out of AMD's processor. I just say that a better result on the 8150 could be achieved if autoparallelization were limited to 4 threads with thread affinity distributed across the modules.
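For illustration, here is one way to pin four OpenMP threads to one core per module under Linux (a sketch only; the assumption that cores 0, 2, 4 and 6 sit in different modules is mine, and a real SPEC run would normally do this through the OpenMP runtime's environment variables rather than in code):

```c
/* Sketch: 4 threads, one per Bulldozer module, assuming cores 0/2/4/6
 * are the "first" cores of the four modules. Build with -fopenmp.    */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const int module_core[4] = {0, 2, 4, 6};  /* assumed core-to-module mapping */
    omp_set_num_threads(4);                   /* one thread per module          */

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(module_core[tid], &set);
        sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = calling thread */
        printf("thread %d pinned to core %d\n", tid, module_core[tid]);
        /* ...the parallelized work would go here... */
    }
    return 0;
}
```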
You are wrong. The 3-operand SSE5 instructions that later appeared as 3-operand AVX versions of SSE instructions reduce the number of registers needed in code, reduce latencies by removing unnecessary MOVs, and reduce code size. All of this allows higher utilization of the FP functional units and increases performance without any need to increase their theoretical throughput. x264 is a pretty good example of such a gain: vector-int throughput is the same on SB, and yet there is quite a substantial boost in performance on both SB and the FX8150.
And as I said before, not only does ICC refuse to build AVX code for BD, it also disallows a large part of the SSEx instruction set.
If you want to compare single-threaded performance between the FX8150, i7 2700 and PhIIX6 using ICC and SPECInt, then you should turn off autoparallelization, build for an SSE3 target, and then compare the resulting performance. Everything else is just marketing.
The Open64 compiler produces up to 25% faster code than Intel's latest version 12 compilers, even though the intentionally crippled results submitted by Intel were run on a 40% higher clocked Bulldozer...
Open64 4.2.5.2 Compiler suite: (SPEC results submitted by Dell)
2.6 GHz Bulldozer: SPEC_int_rate 134, SPEC_FP_rate 100
Intel Studio XE 12.0.3.176 compilers: (SPEC results submitted by Intel)
3.6 GHz Bulldozer: SPEC_int_rate 115, SPEC_FP_rate 79.8
http://www.spec.org/cpu2006/results/res2011q4/
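Taking those two submissions at face value, the ratios work out like this (a quick check of the quoted numbers only; it says nothing about whether the two system configurations are otherwise comparable):

```c
/* Ratio check of the Open64 vs ICC rate results quoted above. */
#include <stdio.h>

int main(void)
{
    double o64_int = 134.0, o64_fp = 100.0, o64_ghz = 2.6;  /* Dell / Open64 run */
    double icc_int = 115.0, icc_fp =  79.8, icc_ghz = 3.6;  /* Intel / ICC run   */

    printf("int rate: Open64 +%.1f%%\n", (o64_int / icc_int - 1) * 100);   /* ~16.5% */
    printf("fp  rate: Open64 +%.1f%%\n", (o64_fp  / icc_fp  - 1) * 100);   /* ~25.3% */
    printf("clock   : ICC run +%.0f%%\n", (icc_ghz / o64_ghz - 1) * 100);  /* ~38%   */
    return 0;
}
```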
Hans
Last edited by Hans de Vries; 12-08-2011 at 05:49 AM.
~~~~ http://www.chip-architect.org ~~~~ http://www.physics-quest.org ~~~~