Well it's not like Giga would say it's real, even if it was :)
But hard to say.. the coming weeks should be pretty interesting no matter what!
hah someone edited the chat part of the picture LMAO
Even if it's fake, the very quick Super Pi time brought up a discussion at OCF:
Since each module has shared components, and the hardware does the splitting of resources, even if Super Pi is a single-threaded program, it will be split up between the two "cores" in the single module because the resource allocation is done via on-chip hardware, and not by software. ;) Which is like getting two cores to compute Super Pi, but it's still single-threaded.
the pic is real. BD is real. look again, it's up.
Everyone is wrong sometimes :D
I used Photoshop CS3 with a compression of 6, and while it's not as extreme, his might have been made with an older or newer version. OR not even Photoshop for that matter.
I assume it abruptly stops like that because the aliasing of text results in small color changes, which can compress. Big empty white boxes are solid and so any compression doesn't stand out.[/theory]
Your "compression boxes" are not perfectly uniform like they are in the other shot... also look at your box that says DDR3...
That's why I made the disclaimer about what version PShop I used, and the fact the other pic may not ever have been saved in any version of PShop :\ Compare AIDA and Notepad though, you'll see the plain as day boxes around the text.
Also a huge difference is the Windows DPI is cranked up on his, which will impact the severity of the compression artifacting. I was going to change mine, but already changed themes for this endeavor haha
Again, I know mine isn't a perfect match, but it shows that it will at least artifact that way
One other thing, there are two fonts in CPU-Z:
http://i140.photobucket.com/albums/r...1/unledpjq.png
The HT in HT Link on CPU-Z is in the same blue as the field values, a mistake or very odd compression?
Wrong. In single-threaded mode, one execution unit can use all the resources of the front end and deliver massive throughput. Also, with the aggressive Turbo mode it will be even faster in single-threaded workloads.
No one has pointed out that Bulldozer can execute one thread across the two integer units.
If it does so anyways, then I'm very glad.
As I said, AMD is REALLY holding Bulldozer behind the gates.
Each integer unit has its own scheduler. They can't combine the two integer units to process one instruction or one thread.
The two 128-bit FPU's belong to each of the two cores in the module, but share a single scheduler.
Only certain 256-bit FP instructions can be spanned across the two 128-bit FPU's - I'd guess the OS will see it as being just one of the two cores in the module being used, even though the other core's FPU is busy, which would still allow integer instructions to run on that core.
Or would it be possible for it to simultaneously process (1x256-bit OR 2x128-bit) FP instructions as well as integer instructions on both cores?
http://www.rumorpedia.net/wp-content...dbulldozer.jpg
here you go higher resolution. LOL but two active windows??? :shrug:
That's the Windows System page that you took the screen crop from (Control Panel\All Control Panel Items\System). The font is the same though, it just looks a little different due to all the dissimilar characters. Compare the "m" though, in "memory" and "(tm)": you can see they are of the same size and font.
The HT color IS blue like the specs, as you pointed out. The font size is also down a point or two from the non-GB version, but I suspect that was done to allow all of the lines in the Instructions field to be displayed. The color I can only assume is due to a coding goof by the programmer, but it is odd nonetheless.
Aside from that, the only other oddity I've spotted (on account of your post when I went to compare fonts) is the control panel path in the basic system info screen. Not much can be seen due to the IM message window, but what we CAN see is: the trailing point of a "▶", and "System and Se". Either it's "System and Settings" or "System and Search", neither of which is what my Win7 Ultimate shows (at least the 32bit ver I've got on this laptop).
http://maxpain.lackeydom.net/System.PNG
:confused: That's the exact same picture, Skull lol Res, file size, even link (and filename as a result) :p:
What do you mean by "two active windows" though? CPU-Z, or?
the chat window and the CPU-Z window both have a red X... so how can both be active windows at the same time?
Aye, good catch. That reminds me of the one I forgot to mention: I think there might be a setting deeper in Windows to disable the Aero Peek on the task bar windows (disabling it is just for the button on the far right side), but... If I hover over the thumbnail like he has, which is the only way to get the yellow balloon to pop up, it makes all the windows transparent (see pic attachment).
Also, there is another snafu, which falls in line with the dual-illuminated red X issue (hovering the cursor over an inactive window's X will make it red, but it also glows and looks different). The window shadowing actually indicates that THREE windows are 'active': the MSN chat, CPU-Z, and then Notepad as well (but its X is offscreen).
I'm beginning to think there is no such thing as the perfect fake on things such as this (where so much is unknown) when you have enough eyes :yepp:
EDIT: aha-ha DUHH Helps to at least TAKE the screenshot I was going to attach >_< *sigh* I resized it 50%, since no use in using a full-res shot. Here it is though...
Nicely spotted :up:
With regards to Formula350's image, I think it's worth pointing out that if you just hover the icon on your taskbar and not over the larger "thumbnail" that pops up, you won't activate the Aero Peek feature and will only see the thumbnail plus the little tooltip with the title of the window. With that being said, I've noticed something further that would be evidence against it being legitimate.
You'll notice that in the chat window's context menu, "Video from Bing" is highlighted, which means the mouse pointer would have to be hovering over it when the screenshot was taken. At the same time though, the "thumbnail" for the CPU-Z readme text file is active, which means that the mouse pointer must be hovering over the taskbar. The mouse pointer would have to be in two places at once, which isn't possible....
simply put, that chat window was added to the image
Oh nice one.:up:
Besides, two Recycle Bins? :rofl: Lower left and upper right. Not saying it's impossible, just stupid.
And why would anyone set the title bars to be that gigantic? I always make them smaller after install (the smaller one in the pic). The same goes for the border padding.
Guys, it was already reported that the screenshot is several screens merged... no need to speculate on why there are two active windows... doesn't mean it's fake, I could do that too :shrug:
ya, it was just something they were speculating on in the comments where the pic is hosted. *shrugs* if BD even comes close to that it will be awesome. It would mean that BD turbo @ 3.8, which appears to be stock for that model, = i7 2600K @ 4.5GHz :)
in their chat they wanted to be famous... LOL
Some people really have nothing better to do than to stir up some fanbois on the net :P
http://i.imgur.com/xblF4.gif
That's when you're dealing with a traditional core with one FPU and one integer unit.
Also, do traditional cores have a single scheduler for the integer plus FPU?
What I'm talking about is, if you have a 256-bit AVX instruction, could AMD just make it so it just registers as using one of the two cores in the module? even though it needs to use both the 128-bit FPU pipelines (which belong to two cores)
https://sites.google.com/site/apokal...nstruction.png
So if the OS sees the 256-bit AVX instruction as just being processed by core 1, then it allows core 2 to process integer instructions
No I don't think so. Both Core 1 and 2 would be in use (technically).
@Apokalipse, Dolk:
Nice to see at least some technical discussion going on, other than analyzing the wave of faked BD screenshots ^^
To your discussion:
Well, after providing 2 instruction streams to a BD module, the OS is kind of out of the loop of executing them.
The fetch/decode/dispatch path works on 2 threads, capable of switching between them on a per-cycle-basis. The dispatch unit dispatches groups of "cops" (complex ops, AMD term) belonging to either thread (I hope, my English skills aren't killing the message here ;)). Such a dispatch group might contain ALU, AGU and FP ops of one thread. I assume the FP ops or the according dispatch group have a tag denoting their respective thread.
From now on the integer and the FP schedulers are kind of independently working on the cops/uops they received, until control flow changes. The FP scheduler "sees" a lot of uops asking for ready inputs and producing outputs. Except for control flow changes, it actually doesn't even need to know which thread an op belongs to. The thread relation is assured by a register mapping (which has to be done for OOO execution anyway). So the FPU could execute uops as their inputs become available.
If there are 256b ops, they're already decoded as 2 uops ("Double decode" type) to do calculations on both 128b wide halves of the SIMD vector. I don't think that there is an explicit locking mechanism. The FPU could basically issue both uops in the same cycle or - if there are ready uops of the other thread - issue them to one 128b unit during 2 consecutive cycles (the software optimization manual shows latencies of 256b ops supporting this). This actually depends on the implemented policy: issue uops based on their age only, or issue them ensuring balanced execution of both threads.
But what's important to your discussion: this happens independently from both integer cores.
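To make that issue-policy point a bit more concrete, here's a purely illustrative toy model (my own sketch in C, not AMD's actual hardware logic): uops tagged with their thread, two 128-bit issue slots per cycle, and a simple oldest-ready-first pick. The op names and ages are made up for the example; a real scheduler obviously tracks dependencies, ports and much more.

Code:
/* Toy model of a shared FP scheduler: two 128-bit issue slots per cycle,
 * uops tagged by thread, oldest-ready-first policy.  Illustration only. */
#include <stdio.h>

#define NOPS 6

struct uop {
    int thread;        /* which integer core the op came from (0 or 1)  */
    int age;           /* smaller = dispatched earlier                  */
    int issued;        /* already sent to a 128-bit pipe?               */
    const char *name;
};

int main(void) {
    /* Thread 0 has a 256-bit AVX add already decoded into two 128-bit
     * halves; thread 1 has plain 128-bit ops.  Ages interleave.        */
    struct uop q[NOPS] = {
        {0, 0, 0, "t0: vaddps lower half"},
        {0, 1, 0, "t0: vaddps upper half"},
        {1, 2, 0, "t1: addps"},
        {1, 3, 0, "t1: mulps"},
        {0, 4, 0, "t0: addps"},
        {1, 5, 0, "t1: addps"},
    };

    for (int cycle = 0; cycle < 3; cycle++) {
        printf("cycle %d:", cycle);
        for (int slot = 0; slot < 2; slot++) {       /* two 128-bit pipes */
            int pick = -1;
            for (int i = 0; i < NOPS; i++)           /* oldest ready uop  */
                if (!q[i].issued && (pick < 0 || q[i].age < q[pick].age))
                    pick = i;
            if (pick >= 0) {
                q[pick].issued = 1;
                printf("  [pipe %d] %s", slot, q[pick].name);
            }
        }
        printf("\n");
    }
    return 0;
}

With this policy both halves of the 256b op go out in the same cycle on the two pipes; a "balanced" policy would instead interleave thread 1's ops and push the second half to the next cycle - exactly the trade-off described above.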
Thanks for the nice explanation, Dresden!
Hopefully it won't be too long till we are able to torture this design with a variety of workloads and see how it performs. Also I think OS scheduling will be quite important for overall CPU performance due to its module-based nature, but that's the same for any HT-enabled CPU.
Overall I hope AVX performance won't be that far off a 4-core SB, while SSE should be higher. Now we just need some AVX-enabled apps to start rolling out in bigger quantities.
new article at the AMD blog about C-states (C6)
http://blogs.amd.com/work/2011/05/16...AMD+at+Work%29
Isn't that basically the same info that was already posted back in Feb. ? :confused:
Yes, but one interesting thing stood out for me:
Quote:
With the new “Bulldozer” core, when both cores in a module are idle for a pre-determined period of time, the power is gated off to the module. Our design engineers estimate that this will drop the power consumption by up to 95% in idle over the previous generation of processor cores.

I wonder how that's estimated, and what the best case scenario is here. Possibly a 12 core Opteron (what are they idling at?), compared to a .... any Bulldozer with all but one module sleeping?
Another interesting bit about the article is JF commenting on someone and thus also confirming that turbo boost over all cores is mostly related to non-FPU workloads, e.g. all-integer applications will benefit from an (at least) 500MHz higher clock (~16% extra performance through turbo). Most people already speculated the turbo was related to the FPU, but now it is somewhat confirmed.
Wonder if that is anything like the Bobcat power draw I saw happening at idle...
The ASRock E350M-1 with the iGPU enabled would drop down to around only 0.836V when idle! Load was as high as 1.36V, but that wasn't ever constant either. This was all while running with C'n'Q OFF and Windows Power Plan set to Min-100% and Max-100% on CPU, so 0.8xxV at full speed (1600MHz) for dual core o_0
On the Sapphire IPC-E350M1 I received, the iGPU was bad in some way :shrug: The system had corrupted graphics immediately at POST, and would lock up on even the BIOS logo. Tossing in a discrete card and hastily turning off the iGPU in the BIOS (before it crashed) rectified the issue, which BTW a BIOS update didn't fix either. Anyways, the point of that is that the iGPU is disabled, and Sapphire's method completely turns it off, compared to ASRock which seems to just... I dunno... remove its device ID so Windows doesn't detect it? Same situation as above, the idle voltage with no iGPU is something like 0.58V!! Again, still at 1600MHz, just idling in Windows (with programs open, not just after a fresh boot).
The other thing is that the voltage is not static, but fluctuates between specific voltages based on load. I had seen basically everything between 0.5V and 1.35V (on the Sapphire) in 0.1V increments, with the typical range being around 1.15V to 1.25V for your every-day web browsing. Essentially what seemed to be occurring was the system automatically switching power phases on/off, like what a number of motherboards now do. Granted, what I saw might be exclusive to AMD's mobile line, but I have a feeling we'll see close to the same results with most/all of the Bulldozer platforms too. Hell, that might be one of the main reasons for 990FX! :shrug:
[/rambling]
You might want to measure the voltages with a voltmeter, if possible on that board, cause I don't think going as low as you describe is possible with silicon-based semiconductors, but I might be wrong. :)
I'd be more than happy to, but I'm not entirely sure where I'd need to measure at :\ I have no qualms about poking around, even at the small SMD caps, but if I took that approach it'd literally take me a week to get lucky enough to find it lol I'll drag out my spare PSU and HDD, boot the board and probe around the major components (MOSFETs to start).
To expand on this bit of info, I found this when looking for hints on 'Dozer pricing with Google lol It's from a 20 Questions AMD blog, from Aug 2010...
It seems as though we won't really be able to utilize a Bulldozer chip to its full extent until Windows 8 :eek: :( I'm no coder, but I have a feeling it's not going to be as easy a thing to make Windows 7 capable of that where M$ can just roll out a 50MB patch, but I'll cross my fingers just in case!! :yepp:
Quote:
Originally Posted by John Fruehe@AMD
EDIT: JUST in case there was any confusion still, here are some comments he posted from the same Blog entry:
"John Fruehe September 13, 2010
A module will have 2 cores in it. It will be seen by the hardware and software as 2 cores. The module will essentially be invisible to the system."
"John Fruehe September 8, 2010
We have already said one core per thread, period. But a single thread gets all of the front end, all of the FPU and all of the L2 cache if there is not a second thread on the module."
dunno :rolleyes:
I think at launch 150 euro, and the Crosshair V at 180-200 euro.
The Sabertooth must cost 30-50 euro less than the CH V, otherwise nobody will prefer it to a CH V http://news.softpedia.com/images/new...Detailed-3.jpg.
But I think the Sabertooth will be very interesting, looking at the specs
-sorry for my basic english
EDIT
posted here http://www.xtremesystems.org/forums/...72#post4851872
Hmm, that isn't the TUF model shown in the leaked roadmap, unless they plan on having a TUF Armored and Non-TUF model o_0 I was kinda digging that P67 model heh
EDIT: OK I guess technically it's a TUF, but I think everyone would consider a TUF Sabertooth to be synonymous with an Armored board :\
Looks like the same image posted on page 14 or 15 :D
EDIT: Dug up some better pics
http://img.hexus.net/v2/news/asus/TUFST990FX2L.jpg
http://img.hexus.net/v2/news/asus/TUFST990FX1L.jpg
Can anyone explain the function of the second oval crystal location next to the SB? Seems like all the AMD boards have two (at least the 4 I currently have, 890GX/FX and two E350), one being unpopulated... Is that for use if the manufacturer wants to offer the ability to adjust the SB's clock itself?
Also somewhat disconcerting (IMO) is the SB heatsink looking plastic haha It does look to come in contact with whatever that other chip is, above the cap/crystal that sits to the right of the black SATA ports...:shrug:
Jetway(Magic-pro) has one also
http://limages.vr-zone.net/body/1226...y_990x.png.png
http://vr-zone.com/articles/jetway-s...aks/12269.html
Had Jetway put a PCIe x1 above the first PEG slot and put on right-angle SATA ports, that'd be a really nice entry level 990X board I think.
EDIT: Though one redeeming point in my eyes is that mini-PCIe slot! I'm becoming a real fan of those as it gives you the ability to run WiFi, cheaply, and not need a full sized ATX card. Which on this board would not be feasible if running Crossfire/SLI as the x1 and PCI slots would all be blocked (the second x1 wouldn't technically, but putting a card there might impede airflow).
With most of the cases I have used I like the placement of the SATA ports. They are above the Graphics card and I always have problems with access to the angled ports.
I agree with you Charged!
For someone who builds a machine and then never touches it again the angled ports usually result in a cleaner install, but for folks like us (most XS members) the angled SATA ports are a PITA.
It's not my deciding factor when picking a board, but I'm not a big fan of the angled ports...
They make swapping gear a lot more difficult most of the time.
It's quite early info, but still official and very confusing:
http://www.anandtech.com/show/2881
AMD refers to the module as being two tightly coupled cores, which starts the path of confusing terminology. A few of you wondered how AMD was going to be counting cores in the Bulldozer era; I took your question to AMD via email:
Also, just to confirm, when your roadmap refers to 4 bulldozer cores that is four of these cores:
http://images.anandtech.com/reviews/.../bulldozer.jpg
Or does each one of those cores count as two? I think it's the former but I just wanted to confirm.
AMD responded:
Anand,
Think of each twin Integer core Bulldozer module as a single unit, so correct.
As said before, maybe AMD is happy to begin with a dual (two modules), tricore (three modules, one inactive) and a quadcore (4 modules) bulldozer and see the extra integer cores as "extra execution power" like Intel's Hyperthreading technology.
The Interlagos dual 4-module MCM CPU will be a server Opteron part, but maybe also the hexacore (two Bulldozer dies with one inactive module) and eightcore (two full Bulldozer dies).
Much talk for it, and much against it! One month left to go.
Hm, I remember John saying that they will not use modules for marketing, it will be cores.
@Module, cores:
It wasn't even a clear thing for AMD in the beginning.
1. The engineers developed BD and called a module a "core" (in patents), since in their view they developed a much bigger core than 10h and added a copy of the integer execution resources (which is basically similar to a simplified CPU - not x86-compatible, it just works on already translated (decoded) instructions, like a RISC CPU). The module could also be seen as a heavily optimized dual core processor. It basically has everything, while sharing those parts where sharing makes sense.
2. Someone possibly thought that these small cores will perform much better than logical cores in a hypothetical SMT machine, so this needs to be marketed somehow.
What they are depends on the applied definitions of "core".
*cough* I see we skipped my above post, albeit a long one, quoting John on it being 1 Module = 2 Cores (and by default, 2 threads):up:
In the interest of getting to the heart of it:
"John Fruehe September 13, 2010
A module will have 2 cores in it. It will be seen by the hardware and software as 2 cores. The module will essentially be invisible to the system."
A bolt of lightning from the AMD Gods of Thunder? Which, defying belief, burns into their skin:
"Henceforth let it be known: a Bulldozer module will have TWO cores! Thee Interlagos will provide ye with up to 16 cores; Valencia with up to 8; Zambezi as well with up to 8. We have spoken, and it was good!" :D
Good one Formula :rofl:
Quote:
"Henceforth let it be known: a Bulldozer module will have TWO cores! Thee Interlagos will provide ye with up to 16 cores; Valencia with up to 8; Zambezi as well with up to 8. We have spoken, and it was good!"
Will the 4 core / 2 module CPU be a handicapped 8c/4-module part, or only 4c/2m on die?
AMD started paying for working dies rather than wafers so I guess the chance you are gonna see locked/faulty cores just dropped massively.
http://www.theinquirer.net/inquirer/...ies-32nm-chips
Dunno, I think that's quite a good question... At first I wanted to say "they are based on a modular setup, so you'll have true 4/6/8 chips", but then it clicked: these aren't like CPUs where you can just socket in a module after it's been fabbed lol So you'll have to start out with a chip being 4/6/8, and if it ends up having bad modules, one can only assume it would have them turned off. Though there might be a means to laser cut the "power line" to essentially turn off that module completely, so then it wouldn't be a 4C 120W chip because the other 2 modules are still being powered.
I don't think, though this is a total barely-educated guess, that AMD is going to be having just 8C dies fabbed and disabling cores (by whatever means) to have them meet their respective CPU model. It'd be easier to do I guess, but I feel as though they went with this module design for some reason similar to the whole "modular" thought. *shrug* I'm not going to pretend to know how things are going to work, due to the limited amount of info we have (and that seems to be AMD's plan heh), but that is just how it makes sense to me :)
(Off topic: XLR8, you're not by chance "bOingball" are you? I only ask due to your forum avatar heh)
Each module has two integer cores and two 128-bit FPU's.
the two 128-bit FPU's belong to two cores. They just have a single FP scheduler.
It means fewer transistors used, as well as allowing one 256-bit AVX instruction (decoded into two 128-bit micro-ops) to run on both 128-bit FP pipelines simultaneously.
Thus 256-bit AVX can complete in one cycle while at the same time keeping each core's FP pipeline 128-bit
Having a 256b FP pipeline per core would be faster than using modules if there are enough 256b instructions in enough threads. However, it would be a massive increase in die area for a tiny performance gain.
One thing Bulldozer also has is a combined multiply-add instruction, which takes half the time of separate multiply and add instructions (and also has higher precision).
SB has 256-bit FP pipelines per core, but lacks a combined multiply-add instruction.
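For anyone wondering what a combined multiply-add actually looks like in code, here's a minimal sketch using compiler intrinsics. Caveat from me: Bulldozer's own extension is FMA4 and its intrinsics are named differently, so the FMA intrinsic below is only meant to show the idea (one fused instruction with a single rounding step, instead of a separate multiply and add); the values are made up.

Code:
/* Separate multiply+add vs. a fused multiply-add (illustration only;
 * build with AVX and FMA support enabled, e.g. gcc -mavx -mfma).     */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256 a = _mm256_set1_ps(1.5f);
    __m256 b = _mm256_set1_ps(2.0f);
    __m256 c = _mm256_set1_ps(0.5f);

    /* Two instructions, two rounding steps: round(round(a*b) + c). */
    __m256 separate = _mm256_add_ps(_mm256_mul_ps(a, b), c);

    /* One fused instruction, one rounding step: round(a*b + c).    */
    __m256 fused = _mm256_fmadd_ps(a, b, c);

    float s[8], f[8];
    _mm256_storeu_ps(s, separate);
    _mm256_storeu_ps(f, fused);
    printf("separate: %f   fused: %f\n", s[0], f[0]);
    return 0;
}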
Mats, John mentioned Int cores a couple of times, but only when trying to postulate what IS a core :)
I think we are digging too deep, we have two cores in a module, end of discussion ;)
What a ridiculous thing to argue over. The term "core" doesn't have a solid definition. Not one person here has given a definition of core that would be universally accepted within computer science.
I think this eternal arguing about if it's a core or not exists just because people have nothing better to talk about (like benchmarks?).
I agree that some of it is fueled by boredom and anxiety. But I don't expect it to stop once they are out. IMO, the ambiguity of the terms virtually guarantees it.
It's not ambiguous. One module is two cores.
Yes they share resources. But it's still two cores.
One x86 core consists of an integer unit and floating point unit.
One bulldozer module contains two of each.
The FPU's just share a single scheduler, allowing them to ALSO process 1 x 256-bit instruction (decoded into 2 x 128-bit micro-ops) on top of being able to process their own threads separately.
micro-ops belonging to two threads can be issued by the FP scheduler simultaneously.
Or another way of looking at it:
a set of hardware capable of processing its own thread independently, without sharing execution pipelines with another thread (HyperThreading). A module can process two independent threads simultaneously (it has the hardware to do it, with both integer and FP instructions)
The way it can use two 128-bit FP pipelines to process one 256-bit AVX instruction could be considered a kind of "reverse hyperthreading"
It only works on instructions wide enough to be able to span across multiple cores
That's just your definition however. One could just as easily say that a core is everything from instruction fetch through instruction retire.
Even by your definition it wouldn't be two full cores because it isn't two full floating point units. It is one FPU that can do work for either "core". Take away one half of that and the CPU couldn't compute the full instruction set.
IMO, "core" is a bad term.
It's not just my definition.
Also, each core in a BD module can and does do its own fetch and retire with its own thread.
Yes, it uses shared hardware to do it. But it still works that way.
Yes it is.
Sharing the FP scheduler (capable of handling two threads of FP micro-ops simultaneously) is just more flexible than two independent schedulers, AND uses less transistors.
Yes it can.
It just means it has to take two cycles to process each 128-bit micro-op of a 256b AVX instruction.
Actually, BD still can do it that way even with the shared FP scheduler - if one FP pipeline is busy, it can send both 128-bit micro-ops of the 256-bit AVX instruction through one 128-bit pipeline. It just takes two cycles instead of one.
In short, it is not really "1.5 cores". It is 2 cores with a kind of "reverse hyperthreading" for specific 256b FP instructions that can span across two cores, implemented by sharing some hardware, which also saves transistors.
RE: Disabled Units
AMD never did a native "triple core", therefore I doubt they will do a native "triple module" (6 core CPU)
It will be a native "4 module" with one disabled.
As for their quad core (two module), that is another story... it is more likely to have a native version down the road, but it wouldn't be Zambezi
It's AMD's definition, you have simply accepted it. Try and find me a computer science source that agrees with that definition. You won't, because before now instruction fetch and retire were traditionally considered part of the core.
I agree that it is more flexible and I like that design. But being flexible doesn't mean that it is really two separate FP units. Take away one half and it wouldn't be able to process 256-bit instructions. It is designed to be ganged together to process them. One half can't do half a 256-bit instruction in two steps, that would require a redesign of the FP unit.
Quote:
Yes it is.
Sharing the FP scheduler (capable of handling two threads of FP micro-ops simultaneously) is just more flexible than two independent schedulers, AND uses less transistors.
Yes it can.
It just means it has to take two cycles to process each 128-bit micro-op.
From John Fruehe's FlexFP article:
Quote:
The beauty of the Flex FP is that it is a single 256-bit FPU that is shared by two integer cores.
Not just AMD's.
They still are with BD modules.
You do have two sets of fetch, retire operations occurring in the one module simultaneously.
Yes it does it with shared hardware. But it still functions the same way as two cores, with the only exception being that specific 256b instructions can span two cores (by being decoded into 2 x 128b micro-ops).
Which is NOT like hyperthreading. It's exactly the opposite ("reverse hyperthreading")
Being able to process two threads simultaneously without sharing pipelines does.
Quote:
I agree that it is more flexible and I like that design. But being flexible doesn't mean that it is really two separate FP units.
Yes it CAN process 256-bit instructions even with one 128-bit pipeline.
Quote:
Take away one half and it wouldn't be able to process 256-bit instructions.
256-bit AVX instructions are decoded into 2 x 128-bit micro-ops anyway. That's how it's able to span across two 128-bit FP pipelines (and they are separate pipelines). They don't process a single 256b micro-op.
With separate schedulers, it would just mean always taking two cycles to process each 128-bit micro-op (one half of the 256b AVX instruction) in the one pipeline.
With a shared scheduler. it means that not only is it possible to process both 128-bit micro-ops in one pipeline (in two cycles), it's also possible to process them in two 128-bit pipelines (in one cycle).
Hence "reverse hyperthreading"
Yes it can. And does even with the shared scheduler if one FP pipeline is busy.
Quote:
It is designed to be ganged together to process them. One half can't do half a 256-bit instruction in two steps, that would require a redesign of the FP unit.
Then feel free to prove it. Find a source outside AMD in the computer science field that agrees with this definition and we will talk. Otherwise it is pointless.
I'm not going to take your word for it. Firstly because reputable individuals in the computer science industry don't necessarily agree with your definition. Secondly because you obviously don't know what you are talking about. One instruction does not equal one cycle, for example.
Are you deliberately ignoring the rest of my explanations?
A BD module is functionally the same as two cores, with the added ability to process specific 256b instructions using both 128b FP pipelines of each core simultaneously.
You seem to be trying to artificially narrow the definition of a core.
One macro-op doesn't, no. And I never said it did.
Quote:
you obviously don't know what you are talking about. One instruction does not equal one cycle, for example.
In fact I even said that a 256b AVX instruction can take two cycles (2 x 128b micro-ops in one pipeline) - and that's not counting the decode, retire, etc.
No, I'm dismissing them because they are an obvious attempt to rationalize your position. That a shared scheduler can operate on independent threads, for example, is irrelevant to the point of what actually constitutes a core.
The definition of what is part of the core varies by who you are talking to (intel, amd, my CS professors, etc) and over time. Early patents related to BD had the core and module terms reversed, so clearly it isn't as cut and dry as you suggest. Traditionally everything that wasn't part of IO, memory controller, and cache hierarchy was part of the core. Integration changes all of that and makes the existing terms ambiguous. If future versions of the bulldozer arch allow separate int cores to work on the same thread or eager execution allows two cores to work on one thread then are we to consider a module only one core?
So each half of the FP unit can work on only half of an AVX instruction while the other half can process something separate? No, I don't think so. To process a 256-bit instruction both halves are obligatory from John's description. They can't process half of an AVX instruction and leave the other half for later.
Quote:
A BD module is functionally the same as two cores, with the added ability to process specific 256b instructions using both 128b FP pipelines of each core simultaneously.
It is quite the opposite. You are saying that you have the only proper definition of core. I am saying the term is ambiguous because there are multiple definitions being used by various people.
Quote:
You seem to be trying to artificially narrow the definition of a core.
Except that one pipeline can't do both halves of an AVX instruction. That would mean that the circuitry for computing all parts of an AVX instruction are present in both halves of the FP unit. That would be counterproductive and defeat the purpose of sharing it in the first place.
Quote:
In fact I even said that a 256b AVX instruction can take two cycles (2 x 128b micro-ops in one pipeline) - and that's not counting the decode, retire, etc.
Like I said, you have set yourself on one definition when, frankly, it is a really ridiculous thing to argue over. I don't see the point of discussing it further. Does a 4870 have 1 core, 160 cores, or 800 cores? What would we call a cpu if it had a hundred INT units, 20 FP units, and one frontend/backend? Who cares. What matters is how a particular architecture works with how we use our computers.
And rationalisation is bad how?
I'm not simply defining something for the sake of defining it like you are.
That doesn't even make sense. How could it not be relevant in the context of Bulldozer?
Quote:
That a shared scheduler can operate on independent threads, for example, is irrelevant to the point of what actually constitutes a core.
All you're doing is arguing definitions.
Quote:
The definition of what is part of the core varies by who you are talking to (intel, amd, my CS professors, etc) and over time.
Early patents related to BD had the core and module terms reversed, so clearly it isn't as cut and dry as you suggest. Traditionally everything that wasn't part of IO, memory controller, and cache hierarchy was part of the core. Integration changes all of that and makes the existing terms ambiguous. If future versions of the bulldozer arch allow separate int cores to work on the same thread or eager execution allows two cores to work on one thread then are we to consider a module only one core?
If you're going down that route, you COULD technically define 1+1=3 as being true.
A BD module functions the same as two cores, with the added ability to schedule two 128-bit micro-ops from one AVX instruction on the two 128-bit FP pipelines.
Yes.
Quote:
So each half of the FP unit can work on only half of an AVX instruction while the other half can process something separate?
EVERY 256b AVX instruction is decoded into two 128-bit micro-ops, then sent to the FP scheduler.
The FP scheduler can send both 128b micro-ops for that AVX instruction to one FP pipeline or both, depending on what's available.
It is not possible to process one 256b micro-op on 128-bit pipelines even if there are two of them. They have to be decoded into two 128-bit micro-ops.
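Just to make the "two 128-bit halves" point concrete at the software level (my own illustration, nothing more - the actual split into micro-ops happens in the decoder, not in your code): a 256-bit AVX add is arithmetically the same as two 128-bit adds on the lower and upper halves of the YMM register.

Code:
/* One 256-bit add vs. the same result built from two 128-bit halves.
 * Illustration only; the hardware does this split at decode time.    */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256 a = _mm256_set_ps(8, 7, 6, 5, 4, 3, 2, 1);
    __m256 b = _mm256_set1_ps(10.0f);

    /* A single 256-bit AVX instruction (vaddps ymm, ymm, ymm). */
    __m256 full = _mm256_add_ps(a, b);

    /* The same operation as two 128-bit adds, one per half.    */
    __m128 lo = _mm_add_ps(_mm256_castps256_ps128(a),
                           _mm256_castps256_ps128(b));
    __m128 hi = _mm_add_ps(_mm256_extractf128_ps(a, 1),
                           _mm256_extractf128_ps(b, 1));
    __m256 halves = _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);

    float x[8], y[8];
    _mm256_storeu_ps(x, full);
    _mm256_storeu_ps(y, halves);
    printf("full: %.0f..%.0f   halves: %.0f..%.0f\n", x[0], x[7], y[0], y[7]);
    return 0;
}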
You clearly don't understand how x86 decoding works.
Quote:
No, I don't think so. To process a 256-bit instruction both halves are obligatory from John's description. They can't process half of an AVX instruction and leave the other half for later.
An AVX instruction IS NOT processed as one 256b micro-op. It is processed as two separate 128b micro-ops.
I'm not even arguing definitions. I'm actually arguing functionality.
Quote:
It is quite the opposite. You are saying that you have the only proper definition of core.
It can do both of them. It can't do both of them at the same time, but one after the other.
Quote:
Except that one pipeline can't do both halves of an AVX instruction.
You won't always have both FP pipelines available. And it does save transistors.
Quote:
That would mean that the circuitry for computing all parts of an AVX instruction are present in both halves of the FP unit. That would be counterproductive and defeat the purpose of sharing it in the first place.
If both 128b micro-ops get simultaneously processed on both 128b pipelines 50% of the time, you will still have better performance than if it happened 0% of the time (like with separate schedulers).
Irrelevant, since we are talking about x86 cores and BD.
Quote:
Like I said, you have set yourself on one definition when, frankly, it is a really ridiculous thing to argue over. I don't see the point of discussing it further. Does a 4870 have 1 core, 160 cores, or 800 cores?
That would be completely different, and not analogous to BD.
Quote:
What would we call a cpu if it had a hundred INT units, 20 FP units, and one frontend/backend?
Dont feed the troll.
I like this argument, you wouldn't believe how much I'm learning, keep it up apok!
You literally have no idea what you are talking about. I was going to bow out of this pointless discussion, but I don't want your misinformation poisoning good minds like poor Manicdan's. That is how the game of internet-telephone starts. I haven't programmed in assembly in a REALLY long time, but I figure anything is better than letting such obvious misinformation spread.
Firstly, there are no 256-bit instructions. x86 instructions can be up to 15 bytes in length. Though the size of the instruction can vary because of the various prefixes, registers, etc - so most are smaller, sometimes significantly so.
If you don't believe me feel free to reference the AMD64 Architecture Programmer's Manual, Volume 3
On page 1:
Quote:
An instruction can be between one and 15 bytes in length. Figure 1-1 shows the byte order of the instruction format.

The AMD64 Architecture Programmer's Manual, Volume 4: 128-Bit and 256-Bit Media Instructions includes the same diagram and references the previous volumes.
So if instructions aren't 32, 64, 128, or 256 bits in length then what does it mean to have a 256-bit instruction?
They both say the following in the definitions section:
Quote:
256-bit media instructions
Instructions that use the 256-bit YMM registers.

Essentially 64-bit/128-bit/256-bit instructions are instructions that add support for larger registers, new memory addressing modes, new capabilities, etc. When the processor breaks those down into one or two macro-ops it isn't because the instruction is actually too big for a single unit to handle. It does this because macro-ops have a simpler fixed-length encoding that the processor can operate on more efficiently, among other things. A single x86 instruction might become two macro-ops because the instruction is actually telling the processor to do several steps that can be handled by different execution units.
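A tiny illustration of that definition (my own sketch, nothing authoritative): the "256-bit" refers to the width of the YMM registers the data sits in, not to the length of the instruction's encoding, which still fits within the normal 1-15 byte x86 limit.

Code:
/* A "256-bit media instruction" operates on 256-bit YMM registers
 * (32 bytes of data per operand); the instruction encoding itself is
 * still only a few bytes long.  Build with AVX enabled (e.g. -mavx).  */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float out[8];

    __m256 v = _mm256_loadu_ps(in);   /* fill one 256-bit YMM register  */
    v = _mm256_add_ps(v, v);          /* one 256-bit media instruction  */
    _mm256_storeu_ps(out, v);

    printf("register width: %zu bytes, result[7] = %.0f\n",
           sizeof(__m256), out[7]);   /* 32 bytes, 16                   */
    return 0;
}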
Speaking of being handled by different execution units, some instructions can be handled by one of many units and some instructions have to be handled by a specific unit. This is just as true for bulldozer as it is for past processors.
Please reference the Software Optimization Guide for AMD Family 15h [ie, bulldozer] Processors.
In the integer core there are 4 pipelines, 2 INT and 2 AGU. But the units are not identical. Some instructions have to be processed by a particular unit. Examine the diagram on page 36 to see why. Refer to table 10 and you will find lots of instructions that can be done by either INT unit such as ADD or PUSH. While other instructions are done by a specific pipe, such as DIV on pipe 0 and MUL on pipe 1.
This brings us to your claim that either half of the FPU can do its own AVX instruction simultaneously. This is wrong. On the optimization guide, page 38:
Quote:
Only 1 256-bit operation can issue per cycle, however an extra cycle can be incurred as in the case of a FastPath Double if both micro ops cannot issue together.

In other words, the entire FPU can only operate on 1 AVX instruction in any given cycle. And that instruction can be delayed if the AVX instruction decodes into 2 macro-ops and one of them requires a pipe currently in use by another instruction. Examine Figure 3 on the same page and Table 8 on page 232 and you will see why this is the case.
Just as with the integer units, the FPU units have different capabilities. Some instructions can be done on one of many available pipes while other instructions need a specific pipe because it is the only one with that capability. Shuffles on pipe 1, AVX FPMAL on pipe 2 or 3, AVX FPFMA on pipe 0 only, and so on. Refer to table 12 for further examples.
The FP unit needs all of these execution units to execute the whole instruction set. It is an entire unit with its own scheduler, retire, etc. To take out "half" of that would require a redesign of the unit. So no, the FPU isn't like two independent units connected by "reverse hyperthreading". It is a full floating point unit in its own right that can do work for either of the threads from either integer core.
A quote from the BD article in IEEE micro:
http://www.computer.org/portal/web/c...109/MM.2011.23
Quote:
The FPU is a coprocessor model shared between two integer cores via two-way multithreading. The FPU has its own out-of-order engine along with the execution units and register file, and interfaces with the DE to receive Cops, the load/store unit to receive and send load/store data, and the integer cores’ retire unit to handshake on completion and retire.
I was about to say the same thing Manicdan... While arguing does generally suck, I don't mind this kind because I learn from it haha I've been into computers for coming up on 2 decades (come 2014), but never been into the architectural details like this :)
So to Solus, regardless of who is right or wrong, I'm learning from the back-and-forthness of this debate. It's not so much the info being given about Bulldozer, wrong or right, that I'm learning from, but the specifics on how certain segments of the CPU function at a core level. Indeed, there still is the issue of incorrect info, which can mean I'm (or Manic is) not getting the right specifics... but in the end it'll be sorted, and means I still come out ahead :p: I've been reading all of the posts, so it's not like I intend on stopping heh
So anyways... I was doing some painstaking translation over at zol.com.cn (where a fair bit of the leaked images stem from), and one topic talked about the potential drawbacks of running a 'Dozer on an AM3 pseudo-plus board. I can't remember, but did we discuss the fact that AM3 would only default to a CPU-NB speed of 2000MHz, whereas the 'Dozer's default is 2600MHz? Meaning that unless a person were to overclock/change multipliers, it would be taking a performance hit from that...
20 pages in the thread, it's a little hard to keep track of what all exactly we've discussed lol (basically what I've read here and what I've read elsewhere)
Frankly, you shouldn't be using an argument to learn arch details. When it comes to arguments people tend to reject factual information that disagrees with their view and/or mold the rest around whatever preconceived notion they already had. Everyone does it to a degree no matter how hard we try. This article titled The Science of Why We Don't Believe Science is a good read on that topic.
I think Apokalipse is wrong and just making stuff up. Dresdenboy's link and John Fruehe's article on FlexFP back up my position. But I haven't programmed in assembly in a long time and I could easily have gotten specific details wrong. Or I could be unconsciously rationalizing a false position. That's why I posted the links to the official AMD documents. You don't have to take my word or Apokalipse's, you can go read them and decide for yourself. Not to mention that they are packed with lots of great information that I didn't remotely touch on in my post.
basically BD will be equal to SB in AVX if all things are equal, due to having 4 pipelines of 256bit
but since not everything is on AVX yet, what happens when there are only 128bit commands? then BD has 8 pipelines, and SB has only 4.
so in your opinion the ONLY thing that's missing from calling all 8 BD cores actual cores, is that it's missing 4 AVX pipes?
In my opinion BD is 8 cores because that is how AMD defines their cores. Not because the traditional definition of core necessarily agrees or disagrees with their usage. We aren't in the multiprocessor era where one die is one core and the memory controller, some cache, etc lives outside the core. The definition of the term core was already getting less clear by the time AMD made the first dual-core.
How BD performs will likely depend highly on the application. I am rooting for AMD to have monster multithreaded performance because that is exactly what I need. I do heavy multitasking, deconvolution of 3d image stacks, scientific volume rendering, video/audio encoding, multiple VMs, etc and all are dependent on MT performance, int and fp. Single threaded performance isn't going to matter much at all as far as my web browsing or word processing are concerned.
Here is everything I got (see: understood) from the 20 Questions John answered on his AMD blog.
-BD modules = 2 cores. Call them what you want, they are two cores and will always be treated, by the OS, as two cores.
-Each core's FPU can be looked at one of two ways, but the end result is essentially the same. Either A) It's two individual 128bit FPUs per core that can only work in parallel to process 256bit AVX extensions/threads/processes (whatever is the correct term lol) or B) Each core shares 1/2 (128bits) of a 256bit FPU, but is only 256bit for AVX ext/thr/proc.
So I'm able to conclude:
a. Each core's FPU can process its own information on a single cycle, but still only has a single scheduler. So I assume a core will receive its orders ever so slightly delayed after the other core.
b. A BD has one FPU capable of 256bit (for AVX), per module. Thus, an FX 4000 series will have 2 modules, for 4 cores, with 2 256bit FPUs; an FX 6000 series will have 3 modules (working, disregarding any potential 'disabled' ones), for 6 cores, with 3 256bit FPUs; an FX 8000 series will have 4 modules, for 8 cores, with 4 256bit FPUs.
That about the gist of it?
The FPU is its own unit. Neither core really owns it or owns half of it. The threads from either INT core can send FP commands to the FPU, but only one thread at a time. The commands are decoded, renamed, scheduled, and buffered to wait for execution on the appropriate pipe.
From page 37 of the bulldozer optimization guide I posted a link to above:
Skipping over a few points to page 38:
Quote:
FPU Features Summary and Specifications:
• The FPU can receive up to four ops per cycle. These ops can only be from one thread, but the thread may change every cycle. Likewise the FPU is four wide, capable of issue, execution and completion of four ops each cycle. Once received by the FPU, ops from multiple threads can be executed.
• Within the FPU, up to two loads per cycle can be accepted, possibly from different threads.
• There are four logical pipes: two FMAC and two packed integer. For example, two 128-bit FMAC and two 128-bit integer ALU ops can be issued and executed per cycle.
• Two 128-bit FMAC units. Each FMAC supports four single precision or two double-precision ops.
Please examine Figure 3 on page 38 to help understand the first and last bullet point. I'll BRB if you want further clarification.
Quote:
Only 1 256-bit operation can issue per cycle, however an extra cycle can be incurred as in the case of a FastPath Double if both micro ops cannot issue together.
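Side note from me: taking those quoted figures at face value, a quick back-of-the-envelope for peak FP throughput per module (my assumption: counting a fused multiply-add as two FLOPs):

Code:
/* Back-of-the-envelope peak FP throughput per Bulldozer module, based
 * on the quoted figures (two 128-bit FMAC units, 4 SP / 2 DP ops each).
 * Assumption: an FMA counts as two FLOPs.                              */
#include <stdio.h>

int main(void) {
    const int fmac_units    = 2;  /* two 128-bit FMAC pipes per module  */
    const int sp_per_fmac   = 4;  /* single-precision lanes per FMAC    */
    const int dp_per_fmac   = 2;  /* double-precision lanes per FMAC    */
    const int flops_per_fma = 2;  /* multiply + add                     */

    printf("peak SP: %d FLOPs/cycle per module\n",
           fmac_units * sp_per_fmac * flops_per_fma);   /* 16 */
    printf("peak DP: %d FLOPs/cycle per module\n",
           fmac_units * dp_per_fmac * flops_per_fma);   /* 8  */
    return 0;
}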
No, that's a good tid-bit of info, thanks :up:
Came across some pics of MSI's 990FXA-GD80, posted by MSI's Sales Manager heh While unlikely they'll get pulled, just in case I yoinked em and tossed them online. :D
http://pro-clockers.com/images/revie...20teaser/1.jpg
Specs:
http://pro-clockers.com/images/revie...90fxa-gd80.jpg
More here: http://pro-clockers.com/images/revie...gd80%20teaser/
Source: MyGarage.ro
nah, this time I will skip the MSI FXA board. Had the 790FX and 890FXA, good boards, but they couldn't hold voltages steady :( Will be trying the ASUS Crosshair I suppose.
I'm partial to Gigabyte, yet if their AMD offerings don't jump on the black PCB bandwagon, I'll be rather displeased lol The price I didn't pay for this ASRock was too good to pass up :rofl: No, I did want it and am quite happy with it for the most part. Sure, there are things I'd like to be changed/different (BIOS options mainly), but it's an older board and that's unlikely :P I haven't been able to stress it any though since my 555BE is an epic POS, confirmed first by my 890GX and solidified by this FX. Don't know how cdawall got it to over 4GHz, but I can't even get it over 3.6 stable :(
But that's off topic! I'd love to get a CVF, but most of what budget I have will be spent on an FX 8000 series :D So the MSI 890FXA-UD65 will have to suffice if/when something new comes along.
gigabyte hopefully will have a Tonka Bulldozer version heatsink on their boards :) called the G1 Tonka
http://img.donanimhaber.com//images/...ab_dh_fx57.jpg
Ouch, this doesn't look very good, BD 8C being positioned slightly lower than 4C Core i7, though I guess that makes sense, most desktop workloads use 4 cores at most, and isn't one SNB core as big as 2 BD cores? So it'd be a miracle if per core perf. would be close, turbo can mitigate that somewhat but not enough. I hope they do better in server, that's hugely more important for AMD.
Bulldozer will be late ... Expect it more around August ...
w0mbat posted this ASUS-slide in the "Preliminary Bulldozer and Llano Pricing Revealed"-thread (http://www.xtremesystems.org/forums/...0&postcount=32) which I thought was interesting.
http://www.xtremesystems.org/forums/...1&d=1305925796
Now as I said in that thread (http://www.xtremesystems.org/forums/...6&postcount=37) I tried to figure out the frequencies that he had censored.
And as no one in that thread seemed interested I thought I'd try to post it here. Anyone have any idea what the slide says?
When I zoomed in and studied the pixels it looks like the FX-8110 has a stock frequency of 3.6 GHz and a maximum frequency of 4.0 GHz through AMD's Turbo CORE. The FX-8130P is harder to make out but it looks to have a stock frequency of 3.8 GHz and a maximum frequency of 4.2 GHz through AMD's Turbo CORE.
Anyone have any other ideas? Or is this old news? I haven't seen the slide anywhere else and I thought it'd be interesting to try and figure out the actual frequencies.