
IN HIS KEYNOTE, Chuck Moore basically laid out the K8L in slightly more detail than we did. Either way, here are the highlights.

First, it has a shared, expandable L3 cache, necessary because it is a native quad-core design. The one massive enhancement to the mix is that AMD finally has the ability to change core voltages independently for power savings. It can now also change the north bridge voltage independently of the cores. This is a huge win: we are told voltage differentials, and the problems with them, were one of the main scaling headaches of the K8 core to this point.

Next is memory. The new core will support 48-bit addressing and 1GB pages. Cray and SGI will be very happy with this, until they hit that memory wall again. There is also official co-processor support, strongly hinted to be on an HTX card. The key here is that the platform will be aware of them, versus having to hack them in.

The other whopper Chuck dropped was that DDR2 is coming and DDR3 is in the wings when the spec 'settles down'. Old news, FB-DIMMs are the future, right? AMD has said it is supporting them, but the big news is that it is not forcing support. Unlike Intel, whose Blackford supports only FBD, AMD will let you choose. This seems to strongly suggest that the controller on the later gens will be quite flexible indeed.

Next up is RAS, another area where AMD is sorely lacking. It is addressing the major sore points with support for memory mirroring, data poisoning support, and HT retry. It looks like it is following the IBM roadmap more than the Intel one here.

IPC is also going up in a big way. It is doing the obvious doubling of SSE/FP resources, old news now, but it goes a lot deeper than that. There are a bunch of added instructions, starting with the bit manipulation instructions LZCNT and POPCNT. It also added SSE extensions EXTRQ/INSERTQ and MOVNTSD/MOVNTSS. No word on SSE4 though.
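For readers who have not met them, here is a minimal Python sketch of what the two bit manipulation instructions compute. The function names and the 32-bit default width are illustrative assumptions, not anything from the keynote; the real instructions do this in a single operation on a register.

```python
def popcnt(value: int, width: int = 32) -> int:
    """Count the set bits in the low `width` bits, as POPCNT does."""
    mask = (1 << width) - 1
    return bin(value & mask).count("1")


def lzcnt(value: int, width: int = 32) -> int:
    """Count the leading zero bits of a `width`-bit operand, as LZCNT does.

    A zero input yields `width`, since every bit is a leading zero.
    """
    mask = (1 << width) - 1
    return width - (value & mask).bit_length()


if __name__ == "__main__":
    print(popcnt(0b1011))   # three bits set
    print(lzcnt(1))         # only the lowest bit set in a 32-bit word
```

Doing this in software takes a loop or a lookup table, which is why having it in hardware matters for the number-crunching crowd.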

The last bit is much more aggressive prefetch to 'feed the beast'. It has gone from 16B to 32B, an obvious step with the added SSE number crunching power. On top of this, it has out of order loads, and other tweaks to use the available bandwidth in a much more efficient manner.

For those who thought K8L was more or less a tweaked K8, you are wrong. It looks like no part of the core has been left unmolested by the elves working the CAD stations. It looks like AMD will have a credible response to the Intel MCW architecture, and 2007 will be a fight after all. µ



AMD readies hounds to blood Intel's next gen hares

AMD IS NOT sitting on its hands in the face of the Intel Merom attack. It has the imminent Rev F cores coming, and after that, there is the somewhat mysterious K8L, as it has been called, no doubt followed by others. If the current gen parts are Rev E and the next is Rev F, what are G and H, and when will they be out? E is 2005, F is 2006, so that would put G in 2007 and H in 2008.

We all know about F, code named Santa Rosa for the 2xx/8xx Opterons, and Santa Ana for the 1xx. They bring a new memory controller, DDR2-800, and all the long talked about goodies to the mix, and should be out at Computex. One thing it is not talking about on these beasties is enhanced memory RAS and quad-port crossbars.

AMD is not talking about releasing QC parts until G, but sources tell us if sales start flagging, or if Intel starts kicking them around, it could be pulled in to F this year. It is a marketing thing, just like DDR2 on E was, and of late, AMD seems to be very good at timing.

The parts are out there, and we are hearing various things, all centering around about a 10% performance gain, clock for clock. Server folks with 1207 parts tell us the gain is lower, desktop and gaming folks are aiming higher. It could just be variants among pre-release parts, or it could be the memory RAS taking a bite out of latency on the server parts. Either way, look for a bump.

So, what follows F? How about G? Names you say? 2xx/8xx is Deerhound and 1xx is also a hound, but no names yet, sorry. When? 2007, Q2/3 or so, which means they will be taping out any day now. They plug into 1207 and bring a lot to the table.

Everything I read tells me these will be quad core, and I have not seen a reference to a dually anywhere, but they no doubt will exist. The QC parts on 65nm are said to be about 250mm², but I have not personally taken a ruler to one. This is pretty impressive if you consider that it has a 2MB L3 cache. It looks like the F cache shrink will be carried over.

For features, it looks like G will drop all pretences of DDR1 and all of no one will stay up at night crying over it. In its place comes the next gen of Pacifica, basically almost full I/O virtualisation. We told you about the doubled FP units a bit ago, and they look to be 2x 64 not a widening to 128 bit. This should be one hell of a kick in the pants for the HPC set.

The HPC crowd will also love the memory controller enhancements. Think 1GB pages and 48-bit physical addressing for a total of 256TB of RAM. Can you say Cray and SGI doing the happy dance? This part should have a happy home in large servers.
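The 256TB figure falls straight out of the address width. A quick sketch of the arithmetic follows; the 48 bits and the 1GB page size come from the text above, the 4KB comparison is the standard small x86 page, and the helper names are made up for illustration.

```python
ADDRESS_BITS = 48        # physical address width quoted above
LARGE_PAGE = 1 << 30     # 1GB page, as quoted above
SMALL_PAGE = 4 << 10     # standard 4KB x86 page, for comparison


def addressable_bytes(bits: int) -> int:
    """Bytes reachable with an address of `bits` bits."""
    return 1 << bits


def pages_needed(total_bytes: int, page_size: int) -> int:
    """Pages of `page_size` bytes needed to map `total_bytes` (rounded up)."""
    return -(-total_bytes // page_size)


total = addressable_bytes(ADDRESS_BITS)
total_tb = total // (1 << 40)                      # binary terabytes
large_pages = pages_needed(total, LARGE_PAGE)
small_pages = pages_needed(total, SMALL_PAGE)

if __name__ == "__main__":
    print(f"{total_tb} TB addressable")
    print(f"{large_pages} pages at 1GB vs {small_pages} pages at 4KB")
```

Mapping the full space takes 2^18 entries with 1GB pages against 2^36 with 4KB pages, which is the reason big-memory machines care about page size as much as address width.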

The problem for large Opteron machines is that they tend to suck at the 8S node, mainly because cache coherency traffic eats all of the HT bandwidth. If you want to fix this problem, you need to have a much better filtering system like that of Horus, which explains why AMD hired the guy who designed it. G will have just such an enhanced filter, but the reasons why, other than making 8-ways functional, will have to wait for H.

Oh, H you say? Cerebus on 2xx/8xx and Wolfhound on 1xx? Yeah, they will go 16-way glueless, and it should actually not stink this time. Why? That is a long story, go get a drink and relax while you read. If you see references to AMD touting 64P systems, it is really this 16S 4C that they are talking about.

The first thing is that AMD will add a 6MB L3, but we have heard different numbers floating around. This will ease a lot of the pressure on the HT system, as will a boost to 2.6GHz. It looks like systems based on H will eat a huge chunk of the low end of the high end market or something like that. My head hurts from thinking about it.

The other big problem in this area is latency. AMD gets killed by the added latency of cache coherency when you go from 2S to 4S, and at 8S, goodnight Gracie. To handle this, you would need better topologies to reduce the number of hops seen by the traffic.

H addresses this in two ways. The most obvious is that they can do non-flat designs, i.e. diagonal links, and if they have a clue, and I think they do, hypercube configs. Here's to hoping. They also can split up HT links from 1x16 to 2x8, so if you need lower latency, you can trade peak bandwidth for it. This should make for some interesting routing layouts.
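To put rough numbers on why topology matters, here is a small Python sketch that counts hops between sockets in two layouts. Both are illustrative assumptions rather than anything AMD has disclosed: a flat "ladder" (two rows of sockets, each linked to its neighbours) and a hypercube. It uses a plain breadth-first search and ignores link width, routing tables and I/O links entirely.

```python
from collections import deque


def ladder(n: int) -> dict:
    """Flat layout: a 2 x (n/2) grid, each socket linked to its neighbours."""
    cols = n // 2
    adj = {i: set() for i in range(n)}
    for c in range(cols):
        top, bottom = c, c + cols
        adj[top].add(bottom)
        adj[bottom].add(top)
        if c + 1 < cols:
            adj[top].add(top + 1)
            adj[top + 1].add(top)
            adj[bottom].add(bottom + 1)
            adj[bottom + 1].add(bottom)
    return adj


def hypercube(n: int) -> dict:
    """Hypercube: sockets whose IDs differ in exactly one bit are linked.

    Assumes n is a power of two.
    """
    dims = n.bit_length() - 1
    return {i: {i ^ (1 << d) for d in range(dims)} for i in range(n)}


def hops_from(adj: dict, start: int) -> dict:
    """Breadth-first search: fewest hops from `start` to every socket."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def hop_stats(adj: dict) -> tuple:
    """Return (worst-case hops, average hops) over all socket pairs."""
    hops = [d for s in adj for t, d in hops_from(adj, s).items() if t != s]
    return max(hops), sum(hops) / len(hops)


if __name__ == "__main__":
    for name, build in (("ladder", ladder), ("hypercube", hypercube)):
        worst, average = hop_stats(build(16))
        print(f"16S {name}: worst {worst} hops, average {average:.2f}")
```

With 16 sockets the ladder tops out at 8 hops while the hypercube never needs more than 4, at the cost of four links per socket. That is where splitting a 1x16 link into 2x8 would come in: more, narrower links per socket buy a shorter path.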

When you are talking boxes that big though, RAS is a topic that you can't avoid. AMD is not sitting still, and H will bring a huge raft of RAS enhancements. Instead of aping Intel, AMD is going to ape IBM. Nice target to pick for the market segment, but no specifics yet. The one thing that stands out looking at the Intel vs IBM RAS roadmaps is memory mirroring and RAID, but both sides will be there in 2008.

What are you going to feed this beast with? How about 4 FBD channels? Whether it is FBD 1 or 2 is still up in the air right now, but it will most likely be 2. AMD is down on FBD right now mainly because it has failed to live up to the power numbers, but latency is not all that hot either. There is no real choice in the end, but H looks like the longest they can possibly put it off.

There is a lot more in H, and it will come out sooner or later, as will all the details of G. F is all over the place, so there are few surprises still lurking in it. It looks like Intel will be back for late 2006 and early 2007, AMD for the following 12 months, and then comes the Nehalem vs Rev H battle. How that turns out is anyone's guess, but it sure will be fun to watch. Now that you know both sides, place your bets. µ
Will it give Core 2 serious competition?