From the link, the first paragraph:
Dirk Meyer, chief executive of Advanced Micro Devices, said Friday the chipmaker plans to "ramp up" production of next-generation 32-nanometer processors in the middle of next year with volume production starting in the fourth quarter.
More information from GloFo slides: http://phx.corporate-ir.net/External...xUeXBlPTM=&t=1
The left side of the bar represents RISK production (!). This is the point at which you stop running test vehicles through a process and start running the first customer designs, meaning customers get back first silicon samples. So *that* is what GloFo is promising AMD for Q3 2010: first 32nm silicon. No wonder all the 32nm products are listed as 2011.
See the 6th slide.
Adobe is working on Flash Player support for 64-bit platforms as part of our ongoing commitment to the cross-platform compatibility of Flash Player. We expect to provide native support for 64-bit platforms in an upcoming release of Flash Player following the release of Flash Player 10.1.
I believe the bold part is wrong. First silicon samples are not risk production. First silicon samples are being made now (test wafers), and Bergman said the first BD samples will come early next year. So by the end of next year they will have EVT samples ready, and H1 2011 is a logical target for launch (both server and client).
Like I said a few posts back, Bergman let it slip that BD cores may already be taped out or set to tape out soon:
http://www.semiaccurate.com/2009/11/...rth-has-moved/
Bulldozer has taped out, the earth has moved
More analyst day dirt dug up
by Charlie Demerjian
November 11, 2009
THREE VERY INTERESTING tidbits snuck out in the Q&A session at the AMD analyst day today. It seems that Fusion and the new cores have taped out and are at the fabs.
The new cores were said to begin sampling to OEMs in 2010. When pressed on the timing of tapeouts, one AMD spokesperson said that the fabs were 'running product now'. That means the chips have taped out and the fun is about to begin.
Next up was the process the Fusion cores will be on. The first of them will be made on a silicon-on-insulator (SOI) process, something that makes a lot of sense. It is much easier to port a GPU from bulk silicon to SOI than to do things the other way around. The answer did not preclude bulk silicon variants of Fusion in the future, but since the first generation cores are not made on it, I would not expect that to happen for a while.
The last bit was confirmation of what we have known, or at least have strongly suspected for a while, that the first generation of Fusion products will be a 'stars' core. The optimistic view of this is that AMD is reusing the old K10 variant for time to market reasons. Basically the uncore was done first, and since it is modular, why not use it?
If you are pessimistic, you could see this as the Bulldozer and Bobcat cores being massively late. Given that they were on the roadmap for 45nm and delayed about 2 years ago to 32nm, this has a ring of truth to it. Because it was a planned move, and one that rationalizes a likely untenable earlier schedule, I don't think this is a delay, or even a bad thing. The 'delay' probably avoided another "Barcelona".
In the end, it looks like AMD is on track. 2010 will likely be full of pain, but you can finally see the light at the end of the tunnel. The first of the new parts have taped out, so it is only a matter of time before details start leaking. Then we will know if the grand plan is working, at least on a technical level. S|A
Uhn... not to make anyone angry... but why does that remind me of Dunnington...
Hmm, on second thought, it's quite different. An extra scheduler for FP shared between two cores seems interesting, but the basic setup still resembles the Dunnington scheme quite strongly (2 cores share the 2nd level cache and then 4+ cores share the 3rd level cache).
Dunno, why? In what regard? It doesn't remind me of Dunnington at all.
A design approach like this has never been tried before (shared front end with hardware multithreading based on sharing some parts of the core, namely integer execution units, while having an SMT-like FPU/SIMD unit that is dual-thread capable and 256b wide; an L1I cache shared among the int clusters; globally shared L2 and L3; AVX support; advanced turbo clock/boost features for individual parts within a "core"; etc.).
Dunnington is basically a monolithic design based on 3 merged Penryn dice with a huge L3 to alleviate the FSB bottleneck (which is still there...). Nothing novel uarchitecture-wise.
The BD design approach has only been discussed in several academic papers thus far, and in Fred Weber's (AMD) presentation from 2005. No company has ever made an MPU based on this multithreaded design. It's a risky move, but they seem pretty confident in the abilities of the new core.
Last edited by informal; 11-11-2009 at 03:43 PM.
Edited original post. Again, my first impression was "wth, looks like Dunnington" (cache-wise). I corrected that.
The core has an interesting concept, but no 1st level cache for FP seems quite risky if the 2nd level cache isn't fast enough.
Do you even read any post beyond the first sentence?
SSE will be serviced directly from the shared L2 cache over a double-width L2 bus (compared to today's 128b L2). The possible L1 bottleneck would be alleviated this way, since FP/SSE stuff would be serviced over another bus from the 2nd level shared cache, leaving L1I and L1D exclusively for the two 2-way int clusters.
Didn't see these two posted yet:
[two slide images]
Faceman
Face, thanks for posting the slides from the PDF. I didn't have the time to upload these; they are a good addition to the discussion. Only you should have posted them in the other (official) thread too.
I'd expect they did their jobs well and addressed all the potential bottlenecks as well as they possibly could. The FPU will rely heavily on the L2 cache, so I'd expect it to be low latency.
Last edited by informal; 11-11-2009 at 04:18 PM.
I've added Face's post in the "official" thread: http://www.xtremesystems.org/forums/...d.php?t=238702
Please, let's all go there!
AnandTech said that the FPU will use both L1 caches.
http://www.anandtech.com/cpuchipsets...oc.aspx?i=3674 (just below the second image)
I have one doubt about this core. Will it be able to greatly improve single threaded performance?
Why would that be important in the year 2011? Hell, who cares about single-thread performance now?
But that's a legit question, so let's move it to the "official" thread: http://www.xtremesystems.org/forums/...d.php?t=238702
Actually it uses only the L1I cache, but yeah, it is a shared resource too. Good catch.
As for the single-thread question, the module looks like it was designed to improve single-thread performance by utilizing 2x2-way execution units, making it a 4-way capable design. Each of the 2-way cores would deliver 90% of today's Phenom 3-way execution strength, and it's a proven fact that going beyond 2-way in the classical sense (simple decoding expansion) yields very small gains: I remember a figure of 10% improvement for 2-way -> 3-way. I can't recall the number for going 4-way, but it isn't high at all.
So you'll have two 2-way integer cores trying to extract maximum performance from both serial and parallel code, and the dual-threaded SIMD unit will try to extract the highest ILP level from SIMD code.
Edit (on the topic of throughput performance): John Fruehe from AMD said that engineers told him that adding the second integer core to one BD module costs only 5% of extra die area but gains 80% more throughput performance! Talk about an efficiency gain. And we know that one "integer" core within a module is projected to be at 90% of the performance of Athlon IIs (the mainstream MPU today), as stated in the Bobcat presentation slides.
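To put the 5% / 80% figures in perspective, here is a rough back-of-the-envelope sketch. It assumes both percentages are relative to a module with a single integer core (the quote does not say what the 5% is measured against), and the "two full cores" case is a purely hypothetical comparison point where doubling the area doubles the throughput. All names are made up for illustration.

```python
# Rough throughput-per-area comparison for the quoted "5% area, 80% throughput" claim.
# Assumption: a module with ONE integer core is the baseline (area 1.0, throughput 1.0).

def throughput_per_area(throughput: float, area: float) -> float:
    """Relative throughput delivered per unit of relative die area."""
    return throughput / area

# Baseline: one integer core in the module.
baseline = throughput_per_area(1.0, 1.0)

# Quoted claim: second integer core adds 5% area and 80% throughput.
shared_module = throughput_per_area(1.8, 1.05)

# Hypothetical comparison: two fully independent cores (2x area, 2x throughput).
two_full_cores = throughput_per_area(2.0, 2.0)

print(f"baseline:       {baseline:.2f}")
print(f"shared module:  {shared_module:.2f}")
print(f"two full cores: {two_full_cores:.2f}")
```

Under those assumptions the shared module comes out at roughly 1.71x the throughput per unit area of the baseline, while simply duplicating a whole core stays at 1.0x. If the 5% actually refers to something else (total die area, for example), the numbers shift accordingly.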
Last edited by informal; 11-11-2009 at 05:58 PM.
Bulldozer is very impressive. Can't wait for Bulldozer vs. Sandy Bridge/Ivy Bridge benchmarks. Hopefully Bulldozer smokes SB so that Intel fast-tracks its future releases. Currently they are on cruise control, releasing CPUs at a relaxed pace.