ALL HAIL DRESDENBOY!
Hyperthreading = PWNT
^^ In the above quote there is a chance it's 2x 4-way int clusters instead of DDboy's speculation about 2x 2-way, since AMD lists 4 "pipes" in the BD module diagram. But I have no idea whether the ones mentioned there are simple or complex instructions. The patents mention a possible total of 8 (eight!) instructions being executed in parallel (due to the ability to execute 4 additional fastpath ones in the same clock cycle).
Once again I say:
AMD = http://3.bp.blogspot.com/_0vwWXTM55N....FullOfWin.jpg
I just checked again and it is indeed 4-way execution, with 2x 2-way clusters within one module (CPU core), and these two share one wide (256b) SIMD unit. The front end for 4x 4-way would be way too complex and expensive, at least for this generation of products. But it is still an option for future iterations of this (previously) unseen design approach. The fastpath comment still stands (even more so now), since 4 fastpath on top of 4 complex instructions gives us precisely a total of 8 instructions in one cycle, as dresdenboy found out in his research.
What is amazing is the level of detail he "guessed"; he has been correct in almost every part of his speculations. I remember Savantu and his bashing of ddboy's blog: how it is just pure wishful thinking and imagination, how semi companies patent useless stuff all the time, etc. Looks like he is this year's honorable bunnysuit winner :p:.
Yes, for now, but it is a mini-revolution in 2011 :). The approach is novel and deserves applause since it's a brave move from AMD.
CMT has been all paper for years now. There are academic research papers, but not one firm has ever even presented a possible design solution. The design is much more potent than half-threading (SMT in Intel's way of doing things), since resource sharing is done much better in hardware (via a common front end, separate int execution units that can share data, and one shared dual-threaded SIMD unit: a best-of-both-worlds approach). How it will work in practice we'll have to wait and see, but AMD stated that one small Bobcat core (based on the same Bulldozer) is at the 90% level of today's mainstream performance, all at a very low power draw.
edit: let's not forget Hans de Vries and his chip-architect website, which detailed this very same approach 7 years ago (IIRC). This was the original Hammer design, not the Sledgehammer aka K8 which AMD launched back in 2003 (not to say K8 wasn't good, quite the opposite). Back in those days Hans presented a possible future core from AMD that resembles exactly what dresdenboy depicted in his diagrams and what AMD presented today :).
If I understand correctly, 2 cores share 8 int pipelines. So on a dual core with a dual-threaded application you get up to 8 int/clock.
And on the same processor, with a single-threaded application you can also get up to 8 int/clock, because the pipelines are shared across the 2 cores.
On a quad, a multithreaded bench with 4 threads can get up to 16 int/clock, and with only 2 threads you can still get up to 16 int/clock if the "good cores" are used. With only one thread, 8/clock.
Phenom II is based on Athlon, with only 3/clock/core.
The performance increase could be amazing if they increase the L3 to feed that monster.
Madcho, you are mixing some things up. You need to rewatch the webcast and look again at dresdenboy's blog.
Anyhow, Charlie D. has a new dirty tidbit :D:
http://www.semiaccurate.com/2009/11/...rth-has-moved/
Quote:
Bulldozer has taped out, the earth has moved
More analyst day dirt dug up
by Charlie Demerjian
November 11, 2009
THREE VERY INTERESTING tidbits snuck out in the Q&A session at the AMD analyst day today. It seems that Fusion and the new cores have taped out and are at the fabs.
The new cores were said to begin sampling to OEMs in 2010. When pressed on the timing of tapeouts, one AMD spokesperson said that the fabs were 'running product now'. That means the chips have taped out and the fun is about to begin.
Next up was the process the Fusion cores will be on. The first of them will be made on a silicon-on-insulator (SOI) process, something that makes a lot of sense. It is much easier to port a GPU from bulk silicon to SOI than to do things the other way around. The answer did not preclude bulk silicon variants of Fusion in the future, but since the first generation cores are not made on it, I would not expect that to happen for a while.
The last bit was confirmation of what we have known, or at least have strongly suspected for a while, that the first generation of Fusion products will be a 'stars' core. The optimistic view of this is that AMD is reusing the old K10 variant for time to market reasons. Basically the uncore was done first, and since it is modular, why not use it?
If you are pessimistic, you could see this as the Bulldozer and Bobcat cores being massively late. Given that they were on the roadmap for 45nm and delayed about 2 years ago to 32nm, this has a ring of truth to it. Because it was a planned move, and one that rationalizes a likely untenable earlier schedule, I don't think this is a delay, or even a bad thing. The 'delay' probably avoided another "Barcelona".
In the end, it looks like AMD is on track. 2010 will likely be full of pain, but you can finally see the light at the end of the tunnel. The first of the new parts have taped out, so it is only a matter of time before details start leaking. Then we will know if the grand plan is working, at least on a technical level. S|A
I've updated my blog regarding Bulldozer's FMAC units.
The information provided during the Analyst Day simply was not enough to satisfy me (and maybe most of us) ;)
Thanks for the update dresdenboy! :up:
It's amazing how many things "you got right" :). I still remember some skeptical Intel fans (savantu, where art thou?) who claimed that your patent-based research would not be successful at all, since companies "patent all kinds of stuff daily", and that the Bulldozer you predicted was just wishful thinking. We all know how that turned out :D.
Very interesting find on the possible FMAC structure (especially that not-so-confidential-anymore paper :)).
That rules out some sort of micro-op fusion like the Core architecture has?
Quote:
This way no [instruction] fusion of FADDs and FMULs (in today's code) is necessary, which would have not only added complexity in the decoders but would only work for certain combinations.
So, "in English for the rest of us", roughly how much performance will BD have over Phenom II?
No one here has any idea.
It will surely have i7-level power (probably more), but I think it is really up in the air. From what I understand, this design is very different/new, and because of that it is hard to tell what kind of performance it will yield. Any of the gurus care to correct me?
At the very least 50% over Phenom II, because AMD's engineers know very well that if a minimum of 50% can't be achieved it would be doomed, as Intel will be launching a new architecture to counter Bulldozer.
From the paper, it is very clear that Bulldozer is going to benefit from the new design in terms of power dissipation and much higher IPC in the ALU and FPU. Hopefully Bulldozer can make use of a built-in GPU to do much of the FPU-intensive work.
Expected to be about 80%~100% over the current Phenom II in certain areas like encoding and ALU work; overall about 60%.
I don't think that's accurate. Since CMT isn't something they're likely to just tack on at the end and AMD is likely to be experimenting with pieces on silicon at this point, I think it's rather more likely that it isn't just some neat concept paper. At the very least, its physical implementation has probably been designed.
Particle is correct, since Mr. Bergman stated in the Q&A session of the Analyst Day that they are twiddling around with the first samples at this moment and that they will be shipping the product to their partners (for evaluation and testing purposes) in the first half of 2010, right around the time the whole range of Magny Cours and Lisbon products is launched.
Didn't AMD say SMT was nothing for them and they focused on CMP?
Quote:
Originally Posted by http://www.sun.com/processors/throughput/faqs.html#5
At the time that was said, less than 0.1% of software supported this, and VMs were only used on servers.
Now VMs are reaching the desktop level, and more and more software is taking advantage of multi-core and multi-threading.
Things change and so do trends. Intel once thought their CPUs would reach 10GHz within a few years. Didn't that seem right at the time they said it??
Do not just take a paragraph out of context.
http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3
Quote:
• Two integer clusters share fetch and decode logic but have their own dedicated Instruction and Data cache
• Integer clusters can not be shared between threads: integer cores act like a Chip Multi Processing (CMP) CPU.
• The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
• L1-caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
• Up to 4 modules share a L3-cache and Northbridge
• Two times 4 Bulldozer modules (2 x 8 "cores" or 16 cores) are about 60 to 80% faster than the twelve core Opteron 6100 CPU in SPECInt_rate.
Very interesting article, for whoever was asking about the L1 instruction cache and CMP related info. Lastly, SPECInt_rate: hehe, I am too tired to use the calculator, so someone put those percentages into numerical values.
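For the SPECInt_rate numbers, here is a rough back-of-envelope sketch that turns the quoted "60 to 80% faster" into per-core ratios. It only uses the figures from the AnandTech bullet (16 cores vs. the twelve-core Opteron 6100); the function name `per_core_ratio` is made up for illustration, and nothing here is a measured result. Clocks are unknown, so this says nothing about per-clock performance.

```python
# Back-of-envelope only: inputs are the figures quoted from AnandTech,
# not measurements. per_core_ratio is a hypothetical helper name.

def per_core_ratio(speedup_pct, new_cores=16, old_cores=12):
    """Relative SPECint_rate throughput per core, new chip vs. old chip."""
    total_ratio = 1 + speedup_pct / 100.0      # e.g. 60% faster -> 1.6x
    return total_ratio * old_cores / new_cores  # normalize by core count

for pct in (60, 80):
    total = 1 + pct / 100.0
    print(f"+{pct}% total -> {total:.2f}x per chip, "
          f"{per_core_ratio(pct):.2f}x per core")
```

So if the quoted range holds, each of the 16 "cores" would deliver roughly 1.2x to 1.35x the SPECint_rate throughput of a Magny Cours core, on top of there being four more of them.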
First of all, we don't know the clocks of Interlagos ATM. Second, there will also be a 2P version of the 16 core variant (4 modules/8 cores in an MCM via Direct Connect 2.0, resulting in 16 cores within a single MPU; 4 DDR3 channels). That one will have massive int/fp rate results. And judging by dresdenboy's latest blog post about the actual implementation of the FMAC units (bridged, as described in the patents), the fp/sse part will be brutally strong.