^^ In the above quote there is a chance it's 2x 4-way int clusters instead of DDboy's speculation about 2x 2-way, since AMD lists 4 "pipes" in the BD module diagram. But I have no idea whether the pipes shown there handle simple or complex instructions. The patents mention a possible total of 8 (eight!) instructions being executed in parallel (due to the ability to execute an additional 4 fastpath ones in the same clock cycle).
Last edited by informal; 11-11-2009 at 01:10 PM.
I just checked again and it is indeed 4-way execution, with 2x 2-way clusters within one module (CPU core), and these two share one wide (256b) SIMD unit. The front end for 4x4-way would be far too complex and expensive, at least for this generation of products. But it is still an option for future iterations of this (previously) unseen design approach. The fastpath comment still stands (even more so now), since 4 fastpath on top of 4 complex instructions gives us precisely a total of 8 instructions in one cycle, as dresdenboy found out in his research.
What is amazing is the level of detail he "guessed"; he has been correct in almost every part of his speculations. I remember Savantu and his bashing of ddboy's blog: how it is just pure wishful thinking and imagination, how semi companies patent useless stuff all the time, etc. Looks like he is this year's honorable bunnysuit winner.
Yes, for now, but it will be a mini-revolution in 2011. The approach is novel and needs to be applauded, since it's a brave move from AMD.
CMT has been all paper for years now; there are academic research papers, but not one firm has ever even presented a possible design solution. The design is much more potent than half-threading (SMT in Intel's way of doing things), since resource sharing is done much better in hardware (via a common front end, separate int execution units that can share data, and one shared dual-threaded SIMD unit: a best-of-both-worlds approach). How it will work in practice we'll have to wait and see, but AMD stated that one small Bobcat core (based on the same Bulldozer) is at the 90% level of today's mainstream performance, all with a very low power draw.
edit: let's not forget Hans de Vries and his chip-architect website, which detailed this very same approach 7 years ago (IIRC). This was the original Hammer design, not the Sledgehammer aka K8 which AMD launched back in 2003 (not to say K8 wasn't good, quite the opposite). Back in those days Hans presented a possible future core from AMD that resembles exactly what dresdenboy depicted in his diagrams and what AMD presented today.
Last edited by informal; 11-11-2009 at 01:38 PM.
If I understand correctly, 2 cores share 8 int pipelines. So on a dual core with a dual-threaded app you have up to 8 int/clock.
And on the same processor, with a single-threaded app you can also have up to 8 int/clock, because the pipelines are shared between the 2 cores.
On a quad, with a multithreaded bench running 4 threads you can have up to 16 int/clock, and with only 2 threads you can have up to 16 int/clock if the "good cores" are used. With only one thread, 8/clock.
Phenom II is based on Athlon, with only 3/clock/core.
The performance increase could be amazing if they increase L3 to feed that monster.
Madcho, you are mixing some things up. You need to rewatch the webcast and look again at dresdenboy's blog.
Anyhow, Charlie D. has a new dirty tidbit:
http://www.semiaccurate.com/2009/11/...rth-has-moved/
Bulldozer has taped out, the earth has moved
More analyst day dirt dug up
by Charlie Demerjian
November 11, 2009
THREE VERY INTERESTING tidbits snuck out in the Q&A session at the AMD analyst day today. It seems that Fusion and the new cores have taped out and are at the fabs.
The new cores were said to begin sampling to OEMs in 2010. When pressed on the timing of tapeouts, one AMD spokesperson said that the fabs were 'running product now'. That means the chips have taped out and the fun is about to begin.
Next up was the process the Fusion cores will be on. The first of them will be made on a silicon-on-insulator (SOI) process, something that makes a lot of sense. It is much easier to port a GPU from bulk silicon to SOI than to do things the other way around. The answer did not preclude bulk silicon variants of Fusion in the future, but since the first generation cores are not made on it, I would not expect that to happen for a while.
The last bit was confirmation of what we have known, or at least have strongly suspected for a while, that the first generation of Fusion products will be a 'stars' core. The optimistic view of this is that AMD is reusing the old K10 variant for time to market reasons. Basically the uncore was done first, and since it is modular, why not use it?
If you are pessimistic, you could see this as the Bulldozer and Bobcat cores being massively late. Given that they were on the roadmap for 45nm and delayed about 2 years ago to 32nm, this has a ring of truth to it. Because it was a planned move, and one that rationalizes a likely untenable earlier schedule, I don't think this is a delay, or even a bad thing. The 'delay' probably avoided another "Barcelona".
In the end, it looks like AMD is on track. 2010 will likely be full of pain, but you can finally see the light at the end of the tunnel. The first of the new parts have taped out, so it is only a matter of time before details start leaking. Then we will know if the grand plan is working, at least on a technical level. S|A
Last edited by informal; 11-12-2009 at 05:52 AM.
I've updated my blog regarding Bulldozer's FMAC units.
The information provided during the Analyst Day simply was not enough to satisfy me (and maybe most of us)!
Thanks for the update dresdenboy!
It's amazing how many things "you got right". I still remember some skeptical Intel fans (savantu, where art thou?) who claimed that your patent-based research would not be successful at all, since companies "patent all kinds of stuff daily" and the Bulldozer you predicted was just wishful thinking. We all know how that turned out.
Very interesting find on the fmac possible structure(especially that not-so-confidential-anymore paper).
That rules out some sort of micro-op fusion like the Core architecture has? This way no [instruction] fusion of FADDs and FMULs (in today's code) is necessary, which would not only have added complexity in the decoders but would also only work for certain combinations.
So, "in English for the rest of us", how much performance will BD have over Phenom II, roughly?
no one here has any idea.
Fast computers breed slow, lazy programmers
The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
http://www.lighterra.com/papers/modernmicroprocessors/
Modern Ram, makes an old overclocker miss BH-5 and the fun it was
It will most probably have i7-level power, but I think it is really up in the air. From what I understand, this design is very different and new, and because of this it is hard to tell what kind of power it will yield. Any of the gurus care to correct me?
[MOBO] Asus CrossHair Formula 5 AM3+
[GPU] ATI 6970 x2 Crossfire 2Gb
[RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
[CPU] AMD FX-8120 @ 4.8 ghz
[COOLER] XSPC Rasa 750 RS360 WaterCooling
[OS] Windows 8 x64 Enterprise
[HDD] OCZ Vertex 3 120GB SSD
[AUDIO] Logitech S-220 17 Watts 2.1
Main Rig:
Processor & Motherboard:AMD Ryzen5 1400 ' Gigabyte B450M-DS3H
Random Access Memory Module:Adata XPG DDR4 3000 MHz 2x8GB
Graphic Card:XFX RX 580 4GB
Power Supply Unit:FSP AURUM 92+ Series PT-650M
Storage Unit:Crucial MX 500 240GB SATA III SSD
Processor Heatsink Fan:AMD Wraith Spire RGB
Chasis:Thermaltake Level 10GTS Black
At the very least 50% over Phenom II, because AMD's engineers know very well that if a minimum of 50% couldn't be achieved, it would be doomed, as Intel will be launching a new architecture to counter Bulldozer's.
From the paper, it is very clear that Bulldozer is going to benefit from the new design in terms of power dissipation and much higher IPC in ALU and FPU. Hopefully Bulldozer can make use of a built-in GPU to do much of the FPU-intensive work.
I expect about 80%~100% over the current Phenom II in certain areas like encoding and ALU, and about 60% overall.
I don't think that's accurate. Since CMT isn't something they're likely to just tack on at the end and AMD is likely to be experimenting with pieces on silicon at this point, I think it's rather more likely that it isn't just some neat concept paper. At the very least, its physical implementation has probably been designed.
Particle's First Rule of Online Technical Discussion:
As a thread about any computer related subject has its length approach infinity, the likelihood and inevitability of a poorly constructed AMD vs. Intel fight also exponentially increases.
Rule 1A:
Likewise, the frequency of a car pseudoanalogy to explain a technical concept increases with thread length. This will make many people chuckle, as computer people are rarely knowledgeable about vehicular mechanics.
Rule 2:
When confronted with a post that is contrary to what a poster likes, believes, or most often wants to be correct, the poster will pick out only minor details that are largely irrelevant in an attempt to shut out the conflicting idea. The core of the post will be left alone since it isn't easy to contradict what the person is actually saying.
Rule 2A:
When a poster cannot properly refute a post they do not like (as described above), the poster will most likely invent fictitious counter-points and/or begin to attack the other's credibility in feeble ways that are dramatic but irrelevant. Do not underestimate this tactic, as in the online world this will sway many observers. Do not forget: Correctness is decided only by what is said last, the most loudly, or with greatest repetition.
Rule 3:
When it comes to computer news, 70% of Internet rumors are outright fabricated, 20% are inaccurate enough to simply be discarded, and about 10% are based in reality. Grains of salt--become familiar with them.
Remember: When debating online, everyone else is ALWAYS wrong if they do not agree with you!
Random Tip o' the Whatever
You just can't win. If your product offers feature A instead of B, people will moan how A is stupid and it didn't offer B. If your product offers B instead of A, they'll likewise complain and rant about how anyone's retarded cousin could figure out A is what the market wants.
Particle is correct, since Mr Bergman stated in the Q&A session of the Analyst Day that they are working with the first samples at this moment and that they will be shipping the product to their partners (for evaluation and testing purposes) in the first half of 2010, just by the time the whole range of Magny Cours and Lisbon products is launched.
Didn't AMD say SMT was nothing for them and they focused on CMP?
Originally Posted by http://www.sun.com/processors/throughput/faqs.html#5
At the time that was said, less than 0.1% of software supported this, and VMware was only used on servers.
Now VMware is entering the desktop, and more and more software is taking advantage of multi-core and multi-threading.
Things change and so do trends. Intel once thought their CPUs would reach 10GHz in a few years. Didn't that seem right at the time they said it?
Do not just take a paragraph out of context.
http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3
• Two integer clusters share fetch and decode logic but have their own dedicated Instruction and Data cache
• Integer clusters can not be shared between threads: integer cores act like a Chip Multi Processing (CMP) CPU.
• The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
• L1-caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
• Up to 4 modules share a L3-cache and Northbridge
• Two times 4 Bulldozer modules (2 x 8 "cores" or 16 cores) are about 60 to 80% faster than the twelve core Opteron 6100 CPU in SPECInt_rate.
Very interesting article for whoever was asking about L1 instruction cache and CMP related info. Lastly, SPECInt_rate, hehe, I am too tired to use the calculator; can someone put those percentages into numerical values?
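For anyone who wants the numbers, here is a rough Python sketch that turns the quoted 60 to 80% figure into a chip-level multiplier and a per-core one. It is back-of-the-envelope only: it assumes SPECInt_rate scales linearly with core count and it ignores clock speed (which is not known for these parts), and the function name `per_core_ratio` is made up for illustration.

```python
def per_core_ratio(speedup, new_cores=16, old_cores=12):
    """Per-core throughput of the new chip relative to the old one.

    speedup   -- whole-chip ratio, e.g. 1.6 for "60% faster"
    new_cores -- 2 x 8 Bulldozer "cores" from the quoted article
    old_cores -- twelve-core Opteron 6100 from the quoted article
    Assumes throughput splits evenly across cores (a simplification).
    """
    return speedup * old_cores / new_cores


for gain_pct in (60, 80):
    speedup = 1 + gain_pct / 100
    print(f"{gain_pct}% faster overall: {speedup:.2f}x per chip, "
          f"{per_core_ratio(speedup):.2f}x per core")
```

Under those assumptions, the 16-core part would be 1.6x to 1.8x the twelve-core chip overall, which works out to roughly 1.2x to 1.35x per core.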
Coming Soon
First of all, we don't know the clocks of Interlagos ATM. Second, there will also be a 2P version of the 16-core variant (4 modules/8 cores in MCM via Direct Connect 2.0, resulting in 16 cores within a single MPU; 4 DDR3 channels). That one will have massive int/fp rate results. And judging by the latest entry on Dresdenboy's blog about the actual implementation of the FMAC units (bridged, as described in patents), the fp/sse part will be brutally strong.
Last edited by informal; 11-24-2009 at 01:09 PM.