To make estimates, guesses, or expectations about launch dates a bit less prone to attack, simply add an error margin (the scientific way). So if I'm expecting a CPU to launch in Q2 but am not that sure about it, I could say first half or middle of that year, or simply Q2 +/- one quarter. If it finally turned out to be Q1, Q2, or Q3, I wouldn't have been wrong.
Sure, if someone is already expecting a certain month, then naming a quarter could be a way to add an error margin to it. But I don't think there are many posters besides John who are already working with month granularity in their minds regarding BD or Llano.
@John:
Will there be any microarchitectural updates on BD at the Financial Analyst Day?
I'm currently collecting data supporting an assumption that BD might not use the same clock in different parts of a module. This seems to be an interesting concept.
JF posted a new thing on his blog:
http://blogs.amd.com/work/2010/10/25/the-new-flex-fp/
Nothing really new, but interesting reading.
Damn, I need BD now. Blazing fast software rasterization here I come!
I think that with FMA-optimized software, 2P Interlagos will be one uber fast number-crunching machine (compiler support should be there by the time it launches).
Matthias, there will be some analyst day updates. I expect possibly one or two nuggets that we can share. There is data in the slides, but it obviously needs to get through the final executive reviews.
This is for financial analysts, not industry analysts, so don't expect any deep technical discussions; they are more interested in business discussions.
No prob. You could still write additional blogs like the one about Flex FP. Some interesting BD tech is still behind the curtain. But I can understand why you don't want to share info about it. Sometimes I've thought that by speculating in multiple directions I am actually camouflaging the microarchitecture instead of revealing it.
I listed your Flex FP blog here with other neat stuff:
http://citavia.blog.de/2010/10/26/mo...lysis-9794436/
Last edited by Dresdenboy; 10-27-2010 at 02:35 PM.
I don't work for GF so I can't comment on them. Dirk continues to reiterate that BD is still on schedule.
sshhhhh! tis a secret
Well, for a Bulldozer module you have 3 schedulers, (I am going off memory right now), 2 integer schedulers at 40 entries each and an FP scheduler at 60 entries. So you have 140 total entries for 2 integer threads and 1 FP thread.
If you look at SB, they have 1 scheduler that has to handle 1 thread (1 hyperthread) and 1 FP. I believe they only have 54 entries on that scheduler (Based on Real World Tech article).
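A quick tally of the numbers quoted above, as a minimal sketch. The entry counts are the from-memory figures from these posts, not confirmed specs, and splitting them evenly across two hardware threads (two BD integer threads, two SB hyperthreads) is an assumption made only for the comparison.

```python
# Scheduler entry counts as quoted in the posts above (from memory, unconfirmed).
bd_module = {"int_sched_0": 40, "int_sched_1": 40, "fp_sched": 60}
sb_core = {"unified_sched": 54}  # figure attributed to the Real World Tech article

# Assumption for illustration: both designs run two hardware threads.
THREADS = 2

def total_entries(schedulers):
    """Sum the entries of all schedulers in one module or core."""
    return sum(schedulers.values())

def entries_per_thread(schedulers, threads=THREADS):
    """Naive even split of scheduler entries across hardware threads."""
    return total_entries(schedulers) / threads

bd_total = total_entries(bd_module)
sb_total = total_entries(sb_core)
bd_per_thread = entries_per_thread(bd_module)
sb_per_thread = entries_per_thread(sb_core)

print(f"BD module: {bd_total} entries, {bd_per_thread:.0f} per thread")
print(f"SB core:   {sb_total} entries, {sb_per_thread:.0f} per thread")
```

The even split is crude: the FP scheduler in a BD module is shared, and a unified scheduler can give all its entries to one thread when the other is idle.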
What about Phenom II?
It makes sense for Bulldozer to have bigger schedulers. According to Dresdenboy, Bulldozer has longer instruction latencies, so it is possible that each instruction would wait longer in the scheduler queue for an available execution slot (depending on the actual instruction throughput). On the other hand, SB has 168 slots in its uop reorder buffer vs. 128 slots in each Bulldozer core. So depending on how you position Bulldozer vs. SB (core vs. core or module vs. core), it is a bit better or a bit worse. Also, separate FP/INT schedulers are nothing new for AMD. AMD has stuck with this approach since K7. So there's no clear winner, since Intel's unified scheduler has proven very effective.
AMD is switching to a unified integer scheduler per core. This will allow greater flexibility when executing integer instructions compared to K8/10h.
As for instruction latencies, yes, they are longer, but BD has other ways of masking this, plus this is a hint about the possible high frequency targets for BD cores.
The number of scheduler entries also depends on the clock frequency target (FO4 pipeline stage delay). This could be a "knee of the curve" type optimization in Bulldozer: less FO4 increases latencies but limits scheduler entries.
OTOH an average scheduler size only influences overall performance by a small amount. It depends much more on branch prediction, memory prefetches (no data - nothing to schedule), mem subsystem efficiency, front end, etc.
You could try the simple scheduler simulation available on my blog. I'll soon update it with one having different instruction latencies.
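To make the window-size vs. latency trade-off concrete, here is a toy model. It is not the simulation from the blog, just a minimal sketch under stated assumptions: every instruction has one dependency on a random earlier instruction, all instructions share the same latency, the front end can always refill the window, and the oldest ready instructions issue first. The function name and all parameters (`window_size`, `latency`, `issue_width`, `dep_distance`) are made up for illustration.

```python
import random

def simulate(window_size, latency, n_instr=2000, issue_width=4,
             dep_distance=8, seed=1):
    """Return the IPC of a toy out-of-order core with one scheduler window."""
    rng = random.Random(seed)
    # Each instruction depends on one earlier instruction, 1..dep_distance back.
    # A negative index means "no dependency" (only the first few instructions).
    deps = [i - rng.randint(1, dep_distance) for i in range(n_instr)]
    done_at = [None] * n_instr  # cycle at which each result becomes available
    window = []                 # instructions waiting in the scheduler, oldest first
    next_fetch = 0
    issued = 0
    cycle = 0
    while issued < n_instr:
        # Fill free scheduler entries in program order (idealized front end).
        while next_fetch < n_instr and len(window) < window_size:
            window.append(next_fetch)
            next_fetch += 1
        # An instruction is ready once its producer's result is available.
        ready = [i for i in window
                 if deps[i] < 0
                 or (done_at[deps[i]] is not None and done_at[deps[i]] <= cycle)]
        # Issue the oldest ready instructions, up to issue_width per cycle.
        for i in ready[:issue_width]:
            done_at[i] = cycle + latency
            window.remove(i)
            issued += 1
        cycle += 1
    return n_instr / cycle

if __name__ == "__main__":
    for lat in (1, 4):
        for size in (1, 8, 24, 40, 64):
            print(f"latency={lat} window={size:2d} IPC={simulate(size, lat):.2f}")
```

Running it for a few sizes shows where additional entries stop helping for a given latency in this model. Branch mispredictions, cache misses, and front-end limits are all left out, so it says nothing about real BD or SB performance.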
Yep. And a unified FP/INT scheduler may bring another few percent of perf (at least according to theoretical studies), but I guess AMD won't take this route because of the future APU approach (which aims to replace the traditional FPU).
Yep, and the bigger scheduler size is one of the ways to mask the high latency of some instructions. But rather than making some prediction about performance by mentioning higher-latency instructions (which by itself means nothing without a "big picture"), my point was that schedulers of different architectures are not comparable just by their size. Sometimes bigger does not mean better (just like the P4 with its frequency).
My "big picture" looks more like there will first be an updated BD architecture (BD2) before making bigger changes to the µArch again. There currently is no need for making any circuits in the critical path prepared for some future changes. This would just cost time, performance, area, verification effort etc.
In this big picture we should also stop thinking in the old ways. Why does a scheduler have to be big and cover most stalls? In the past this was because there was a fixed power budget. Not making use of it while stalling simply reduced performance. Now you could switch off (clock gate) the core during a stall. This saves power. Saved power could be turned into work/performance at another place, even at another time. Overall performance not lost - scheduler window size ok.
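The power argument can be put into rough numbers. This is only a back-of-the-envelope sketch: the stall fraction and wattages are invented for illustration, and the model ignores gating overhead, leakage, and wake-up latency.

```python
def average_power(stall_fraction, active_w, stalled_w):
    """Time-weighted average power of a core that stalls part of the time."""
    return (1 - stall_fraction) * active_w + stall_fraction * stalled_w

# Hypothetical numbers, not measurements of any real core.
STALL_FRACTION = 0.3  # share of cycles the core waits on a stall
ACTIVE_W = 10.0       # power while executing
GATED_W = 2.0         # power while clock gated during a stall

# Old way: the core keeps burning active power while stalled.
ungated = average_power(STALL_FRACTION, ACTIVE_W, ACTIVE_W)
# New way: the core is clock gated during stalls.
gated = average_power(STALL_FRACTION, ACTIVE_W, GATED_W)
# Power freed up for work elsewhere, e.g. another core or a later boost.
headroom = ungated - gated

print(f"ungated: {ungated:.1f} W, gated: {gated:.1f} W, headroom: {headroom:.1f} W")
```

With these made-up figures, gating during stalls frees roughly a quarter of the budget, which is the sense in which a smaller scheduler window need not cost overall performance.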
Another question is, at which point it will be useful to bring shader-like computing resources closer to the GP (x86) core. One way could be to add APU resources as additional cluster(s), fed by a common front end. Then they would have their own scheduling etc. AMD has patents for decoding mixed ISA streams. So they could build a decoder, which directs decoded or even translated instruction packets to their appropriate targets. There could further be some instruction buffers/execution caches etc.
Last edited by Dresdenboy; 10-28-2010 at 06:30 AM.