There is a good article from Czech VIP user no-x:
http://translate.google.cz/translate...je-radic-gddr5
AMD Kaveri has two memory controllers: DDR3 and GDDR5 - analysis
I can see it as a possibility, as a stopgap solution until DDR4 becomes more widely available.
Heck, AMD will have two major players coding for it, so why not?
The question is, what's in the Steambox?
Recently, we exclusively unveiled that Kaveri, successor to the current "Trinity" high-end APU (Fusion A8 and A10 family), features a GDDR5 memory interface. This time we will talk about architectural enhancements of AMD's upcoming mainstream APU Kaveri as well as enhancements of the Steamroller cores, which will also make their way into servers and high-end desktop systems in 2014. The information comes from a "Preliminary BIOS and Kernel Developer's Guide for AMD Family 15h Models 30h-3Fh Processors" document (you can find a similar document here, dated January 2012), available to interested developers.
Store to load forwarding optimization
Dispatch and retire up to 2 stores per cycle
Improved memfile, from last 3 stores to last 8 stores, and allow tracking of dependent stack operations.
Load queue (LDQ) size increased to 48, from 44.
Store queue (STQ) size increased to 32, from 24.
Increase dispatch bandwidth to 8 INT ops per cycle (4 to each core), from 4 INT ops per cycle (4 to just 1 core). 4 ops per cycle per core remains unchanged.
Accelerate SYSCALL/SYSRET.
Increased L2 BTB size from 5K to 10K and from 8 to 16 banks.
Improved loop prediction.
Increase PFB from 8 to 16 entries; the 8 additional entries can be used either for prefetch or as a loop buffer.
Increase snoop tag throughput.
Change from 4 to 3 FP pipe stages.
http://www.brightsideofnews.com/news...-unveiled.aspx
And it looks like 3CU/6C APUs! (Or maybe 4CU/8C too?) Nice one: CPU with a 2500 MHz DDR3 IMC, iGPU with GDDR5 (for the highest model). I'm looking forward to it, and to FX SR too :)
Flanker, thanks for posting the relevant news, mate :). I'm still reading it, looks interesting. Will comment later :)
edit:
Wow, BSN found a massive gold mine of info, some of which we haven't seen before :).
What Flanker quoted above was unknown before.
The document lists the following changes to improve instructions per clock (IPC):
Store to load forwarding optimization <- big improvement (store handling sucked in BD/PD)
Dispatch and retire up to 2 stores per cycle <- same as above
Improved memfile, from last 3 stores to last 8 stores, and allow tracking of dependent stack operations. <- complements the above
Load queue (LDQ) size increased to 48, from 44. <- solid improvement to the load subsystem
Store queue (STQ) size increased to 32, from 24. <- complements the above memory store subsystem changes
Increase dispatch bandwidth to 8 INT ops per cycle (4 to each core), from 4 INT ops per cycle (4 to just 1 core). 4 ops per cycle per core remains unchanged. <- massive improvement in MT workloads
Accelerate SYSCALL/SYSRET. <- I have no idea how much faster this change makes SYSCALL/SYSRET, probably a noticeable improvement
Increased L2 BTB size from 5K to 10K and from 8 to 16 banks. <- solid improvement
Improved loop prediction. <- solid improvement (don't know how good though)
Increase PFB from 8 to 16 entries; the 8 additional entries can be used either for prefetch or as a loop buffer. <- prefetch was already solid in BD/PD, making it better cannot hurt
Increase snoop tag throughput. <- no clue
Change from 4 to 3 FP pipe stages. <- don't know what to think of this. It's listed as an improvement, so fewer stages is good (a shorter pipeline usually means better IPC).
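To put a number on the dispatch change in the list above, here is a rough sketch of the per-core dispatch ceiling. It assumes (my simplification, not something the document states) that when both cores of a module are busy the old 4-op dispatch is shared evenly between them on average. The function name is made up for illustration, and this is only an upper bound, not a performance prediction.

```python
def per_core_dispatch_ceiling(module_ops_per_cycle, active_cores, core_limit=4):
    """Upper bound on INT ops dispatched per cycle to each active core.

    Assumption: module-level dispatch bandwidth is split evenly between
    the active cores, and no core can accept more than core_limit ops.
    """
    if active_cores not in (1, 2):
        raise ValueError("a module has 1 or 2 active integer cores")
    return min(core_limit, module_ops_per_cycle / active_cores)


OLD_MODULE_OPS = 4  # BD/PD: 4 INT ops per cycle, to just one core at a time
NEW_MODULE_OPS = 8  # Steamroller: 4 INT ops per cycle to each core

for cores in (1, 2):
    old = per_core_dispatch_ceiling(OLD_MODULE_OPS, cores)
    new = per_core_dispatch_ceiling(NEW_MODULE_OPS, cores)
    print(f"{cores} active core(s): old {old:.0f}, new {new:.0f} ops/cycle/core")
```

Under that assumption nothing changes for a single thread per module (4 ops per cycle either way), while with both cores loaded the ceiling per core doubles from 2 to 4, which is why the gain shows up in MT workloads.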
Why don't they use DDR4 or GDDR6? 3 and 5 are kind of old right now, eh?
DDR4 isn't exactly out yet, nor will it be when this launches, so that would make for some useless chips and boards. I'd think Kaveri's successor will jump on the DDR4 wagon ASAP, probably with a new socket.
I'm hoping Kaveri is offered on FM2, otherwise I will wait for DDR4 to upgrade. A 6-core APU sounds nice though...
I have a feeling SR will be AMD's comeback.
GDDR5 will be ideal for the iGPU. I heard the performance of this new iGPU will be on the level of the HD 7750, so very good. DDR3 has limited bandwidth, and DDR4 will theoretically be about the same at the beginning (the first platform with DDR4 could be Broadwell in Q2/Q3 2014).
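For a feel of the gap, peak bandwidth is just transfer rate times bus width. The sketch below uses illustrative configurations that are my assumptions, not figures from the article: dual-channel DDR3-2133 (2 x 64-bit) and a 128-bit GDDR5 interface at 4500 MT/s per pin.

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts: effective transfers per second, in millions (MT/s)
    bus_width_bits:    total width of the memory interface in bits
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * bytes_per_transfer / 1000


# Illustrative, assumed configurations (not taken from the thread)
configs = {
    "dual-channel DDR3-2133, 128-bit": (2133, 128),
    "GDDR5 at 4500 MT/s, 128-bit": (4500, 128),
}

for name, (rate, width) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

With those assumed numbers the GDDR5 setup has roughly twice the peak bandwidth on the same bus width. Real sustained bandwidth is lower in both cases.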
APU Kaveri will be very interesting, but what about Steamroller FX?:) Any news?
Good find Flanker and nice analysis informal :up:
Looks to me more like extracting additional instruction-level parallelism, not thread-level parallelism.
Lower latency means the same max theoretical IPC, but a lower branch misprediction penalty and less waiting for the results of previous operations, so it should be a nice increase in real-world apps. Also, this seems unusual: so far most architectures have evolved from shorter to longer pipelines, not the other way.
About the memory controllers: I hope the MC can work in both modes (DDR3 and GDDR5) and selects one mode at boot, similar to how Deneb had a DDR2/DDR3 controller. Another possibility is selecting the mode during packaging (blowing on-chip fuses), in which case SKUs will be locked to one or the other type of memory, probably GDDR5 for mobile and ULV chips and DDR3 for desktop. I hope it's the former, but the latter seems more likely.
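A toy model of the latency point, assuming the change really is a drop from 4 to 3 cycles of FP latency and that the unit is fully pipelined (one op issued per cycle). Both are assumptions for illustration, and the function names are made up.

```python
def cycles_independent(n_ops, latency):
    """Cycles to finish n independent ops on a fully pipelined unit.

    One op can start every cycle, so latency is paid only once.
    """
    return latency + n_ops - 1 if n_ops > 0 else 0


def cycles_dependent(n_ops, latency):
    """Cycles to finish a chain where each op needs the previous result.

    The next op cannot start until the previous one completes,
    so the full latency is paid for every op.
    """
    return n_ops * latency


for latency in (4, 3):
    print(
        f"latency {latency}: "
        f"independent {cycles_independent(100, latency)} cycles, "
        f"dependent chain {cycles_dependent(100, latency)} cycles"
    )
```

Peak throughput barely moves (103 vs 102 cycles for 100 independent ops), but a fully dependent chain goes from 400 to 300 cycles. Real code sits somewhere between the two extremes.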
Yep, I think the IMC will work in dual mode, just like graphics cards can work with either cheaper DDR3 or GDDR5 :). AMD is doing the same thing with the new Kaveri; the downside of this approach is somewhat more die area in the IMC department (and more complexity). As for the FP unit and latency, you are probably right, but I wonder if BSN didn't just misunderstand the document (which we cannot see), in which AMD lists the change to the FP unit as a "trimmed down" FlexFP with 3 pipelines (as they call it). What it basically says is that they axed one "MMX" pipeline (128-bit), which was used for common SSE ops if I recall correctly, and they will now use the other 2 FMACs to help execute those same ops (this is my interpretation: SSE ops would therefore be executed on all of pipes 0-1-2 instead of only pipes 2 and 3 as before). Max FP throughput would still be unchanged, except for instruction latency changes of course, since only 2 128-bit FMACs would be used, for a total of 16 FP ops per cycle per module, which is the same max as PD (8 single-precision FP ops per FMAC when FMA is used).
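The 16 ops per cycle figure above can be checked with a few lines. This is just the arithmetic from the post (2 FMACs, 128 bits wide, an FMA counted as 2 ops); the function name is made up for illustration.

```python
def peak_fp_ops_per_cycle(fmac_count, fmac_width_bits, element_bits, ops_per_fma=2):
    """Peak FP ops per cycle per module when every FMAC issues an FMA.

    An FMA counts as 2 ops (one multiply plus one add) per SIMD lane.
    """
    lanes = fmac_width_bits // element_bits
    return fmac_count * lanes * ops_per_fma


single = peak_fp_ops_per_cycle(fmac_count=2, fmac_width_bits=128, element_bits=32)
double = peak_fp_ops_per_cycle(fmac_count=2, fmac_width_bits=128, element_bits=64)
print(f"single precision: {single} ops/cycle/module")
print(f"double precision: {double} ops/cycle/module")
```

Each 128-bit FMAC holds 4 single-precision lanes, so 4 lanes x 2 ops = 8 per FMAC and 16 per module, matching the number quoted for PD.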
Whatever the case is when it comes to FP, I have no doubt that it will be noticeably faster than PD is today. AMD lists various numbers ranging from ~20% to 30%, which I think is in line with the changes they made to the core. Add in the fact that we will have a 3-module Kaveri as the mainstream APU in Q4, and it could easily be the case that a 3M 3.8-4.2 GHz Kaveri will be on par with the 4M 8350 in MT workloads and massively ahead in ST ones (~30%). This should be enough to invalidate the FX8xxx in the short term until the new FX9x comes, based on a 5M SR in 2014 (which is what I hope they will do, since this is what the server segment will have in store: 5M ~3.5 GHz parts for single socket and 10M MCM 2.5-3 GHz parts for the multisocket segment).
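A quick sanity check of the 3M vs 4M claim, as a back-of-the-envelope sketch. It assumes equal clocks and perfectly linear scaling with module count, neither of which is guaranteed, and the function names are made up for illustration.

```python
def relative_mt_throughput(modules, per_module_gain, baseline_modules=4):
    """MT throughput relative to a baseline chip with baseline_modules.

    Assumes equal clocks and perfect scaling across modules.
    per_module_gain is the per-module speedup, e.g. 0.30 for +30%.
    """
    return modules * (1 + per_module_gain) / baseline_modules


def break_even_gain(modules, baseline_modules=4):
    """Per-module gain needed for `modules` to match the baseline chip."""
    return baseline_modules / modules - 1


for gain in (0.20, 0.30):
    ratio = relative_mt_throughput(3, gain)
    print(f"+{gain:.0%} per module: 3M reaches {ratio:.1%} of the 4M part")

print(f"break-even gain for 3M vs 4M: {break_even_gain(3):.1%}")
```

At +30% per module a 3M chip lands within a few percent of the 4M part at the same clock, so "on par in MT" needs the upper end of the quoted range or a small clock advantage.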
Great info, finally!
What an amazing upgrade this would be for those running 7700 cards and budget FX CPUs. A 2 for 1 :)
Initially much would hinge on drivers, no doubt. No matter what, it sounds like tremendous bang for the buck! Great overall arch moving forward.
Some good news here:
http://www.xtremesystems.org/forums/...-2013-XBitLabs
I believe we will see SR FX later... Yes, good news +1 ,-)
3M/6T FM2 compatible Kaveri APUs branded as Opterons in 2013? :)
I think in 2014 we will see a classic AM4 socket or something bigger than FM2.
I've read Kaveri will support DDR4. Multiple sources say that there might be a new socket (FM3), but yeah, should still be compatible with FM2.
http://gamingio.com/2013/03/amd-kave...ry-controller/
Hot news from Ars Technica concerning Kaveri.
http://arstechnica.com/information-t...ear-in-kaveri/
Fudzilla:
Chipmaker AMD is getting all enthusiastic over Heterogeneous Systems Architecture (HSA) as its cunning plan for the future.
Recently it has been talking to Ars Technica about something else dubbed "heterogeneous Uniform Memory Access" (hUMA) which is its take on HSA. HSA involves developing systems with multiple different kinds of processor, connected together and operating as peers. Normally it is CPUs and GPUs.
Armed with another set of acronyms AMD talks about splitting workloads between a CPU and a GPU, and the creation of a general purpose GPU (GPGPU). But a GPGPU is awkward for software developers, some of whom might think that GP stands for guinea pig and others are not happy that the CPU and GPU have their own pools of memory.
HUMA is AMD's way around this problem. Using HUMA, the CPU and GPU share a single memory space and the GPU can directly access CPU memory addresses, allowing it to both read and write data that the CPU is also reading and writing. It is also cache coherent so the CPU and GPU will always see a consistent view of data in memory. If a processor makes a change then the other processor will see it.
We will first see HUMA in the chip codenamed Kaveri. It mixes up to three compute units using AMD's Bulldozer-derived Steamroller cores with a GPU. The GPU will have full access to system memory. It should be out in the second half of the year.
It appears likely that the chip AMD is designing for the PlayStation 4 later this year will also be a HSA system.
And...
-Much easier for programmers
-No need for special APIs
-Move CPU multi-core algorithms to the GPU without recoding for absence of coherency
-Allow finer grained data sharing than software coherency
-Implement coherency once in hardware, rather than N times in different software stacks
-Prevent hard to debug errors in application software
-Operating systems prefer hardware coherency - they do not want the bug reports to the platform
-Probe filters and directories will maintain power efficiency
-Full coherency opens the doors to single source, native and managed code programming for heterogeneous platforms
-Optimal architecture for heterogeneous computing on APUs and SOCs.
Looks good for future APU systems.
Could it be soldered GDDR5 for notebooks and DDR3 DIMMs for desktops? I don't believe RAM manufacturers will make a DIMM module just for one platform; that never went well in the past.
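For anyone wondering what the shared memory space changes in practice, here is a toy analogy in plain Python. It is not real GPU code and not any AMD or HSA API: the "GPU" is just a loop, and the two function names are made up. The first function mimics separate memory pools (copy in, compute, copy back), the second mimics one shared pool where both sides touch the same bytes.

```python
def run_with_copies(host_data):
    """Separate pools: data must be copied to the 'device' and back."""
    device_data = bytearray(host_data)       # explicit copy to the "GPU" pool
    for i in range(len(device_data)):        # stand-in for a GPU kernel
        device_data[i] = (device_data[i] * 2) % 256
    before_copy_back = bytes(host_data)      # host still sees the old values
    host_data[:] = device_data               # explicit copy back to the host
    return before_copy_back, host_data


def run_shared(host_data):
    """Shared pool: the 'device' works directly on the host's bytes."""
    device_view = memoryview(host_data)      # no copy, same underlying memory
    for i in range(len(device_view)):        # stand-in for a GPU kernel
        device_view[i] = (device_view[i] * 2) % 256
    return host_data                         # results are visible immediately


stale, copied = run_with_copies(bytearray([1, 2, 3]))
print("with copies:", list(stale), "->", list(copied))
print("shared:     ", list(run_shared(bytearray([1, 2, 3]))))
```

The results are identical; the difference is that the copy-based version needs two transfers and the programmer has to keep track of which side holds the current data, which is the bookkeeping the quoted list says hardware coherency removes.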
Also, can GDDR5 even be socketed on a DIMM? Just asking.
If you've kept yourself informed up to now, then no need to read the link by djohny. Nothing new.
No offense intended, just trying to save people some time.
Interesting info about next high performance chip. Thanks to yuri.cs from CZ forum for the link :)
http://www.rage3d.com/articles/hardware/amd_worldcast/
Kaveri will have significantly more memory bandwidth than any APU (or CPU with IGD) in the past.
How is it done? There is only one real way to do it.
Take a look at the memory prices and you're likely to figure it out.
The cost is not much higher (if any) and the high bandwidth will be available for the normal users too.
The solution should be good through at least a couple of generations.
While GDDR5 would offer some serious bandwidth, it is not the most desirable solution for a desktop or notebook system as the memory might need to be expanded or replaced.
This is just pure speculation, of course.
;)
I'm looking forward to SR FX; if it is ready in Q1 2014, I'll be shocked :)
http://www.hardwareluxx.de/community...or-955355.html
Only the clocks seem very high... if it is true...
PS: nice joke :D
http://www.ocaholic.ch/modules/news/...p?storyid=6786