Intel Xeon 5570: Smashing SAP records.

**savantu** · 12-18-2008, 10:49 PM

Originally Posted by Shadowmage

The cores on the same silicon die are not connected using HT. My guess is that they have some dedicated coherency traffic buses. If what you said is correct, then AMD's processor diagram will have the L1, L2, and L3 caches all connected to some arbiter which is then connected to the HT port. Instead, note that L1 feeds into L2 which feeds into L3. In other words, like with Nehalem, the L3 handles all the coherency traffic - it just takes a little longer to process the data.

AMD has a pseudo-exclusive cache relationship (data cannot be found in two cache levels at the same time, although “pseudo” means that there are a few exceptions). Things get problematic when you have an L3 miss: you need to check what the other cores have in their caches.

With an inclusive cache, an L3 miss guarantees that the data is not in the other cores' caches, so a memory request is sent right away.

Nehalem's inclusive relationship and the flag system they use to maintain coherency give them little to no headaches about coherency traffic.
AMD's caches burn more BW and latency on this problem, that's all.
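To make the difference concrete, here is a minimal sketch of the lookup path described above. It is a toy model, not how either chip is actually built: the function name `lookup`, the use of Python sets as caches, and the example addresses are all assumptions made for illustration.

```python
def lookup(addr, l3, private_caches, inclusive):
    """Return (where the line was found, how many private caches were probed)."""
    if addr in l3:
        return "L3", 0
    if inclusive:
        # An inclusive L3 holds a superset of every private cache,
        # so an L3 miss proves that no core has the line.
        return "memory", 0
    probes = 0
    for core, cache in enumerate(private_caches):
        probes += 1  # each private cache has to be checked in turn
        if addr in cache:
            return f"core{core}", probes
    return "memory", probes


# Hypothetical contents: four cores, two of them holding one line each.
private = [{0x10}, {0x20}, set(), set()]
exclusive_l3 = {0x30}              # holds nothing that the cores hold
inclusive_l3 = {0x10, 0x20, 0x30}  # duplicates everything the cores hold

print(lookup(0x40, exclusive_l3, private, inclusive=False))
print(lookup(0x40, inclusive_l3, private, inclusive=True))
```

In this model the miss on `0x40` costs four probes before going to memory with the exclusive-style L3, and none with the inclusive one.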

What's the cache coherency traffic like on Nehalem? It seems to me that AMD is interconnect-bound because they are still waiting for HT3. Intel has less of a problem because QuickPath is higher performance than AMD's current HT implementation.

Your argument is pinpointing the problem on the wrong component. The traffic caused by the exclusive caches is reasonable. It's just that AMD needs a bandwidth improvement for their intra-chip interconnects.

That is simply not true. AMD isn't bottlenecked by interconnects; it is precisely the coherency traffic which kills it. Huge amounts of BW are wasted on maintaining coherency.

In a Nehalem multi-CPU system, you only need to maintain the coherency of the L3s. Furthermore, Intel implemented a directory-based coherence protocol which is point-to-point instead of broadcast.

Not so with the Opteron, because data in L1/L2 is more or less guaranteed not to be in the L3. They also use a snoop-based protocol, in which the caches listen in on the transport of variables to any of the CPUs and update their own copies of these variables if they have them. Snooping logic in the processor broadcasts a message over the bus each time a word in its cache has been modified. The snooping logic also snoops on the bus looking for such messages from other processors.
Since K8/K10 use 64-byte lines, can you imagine the traffic in a 4-socket system to maintain coherency? What about 8 sockets? Yeah, HT 3.0 will help, but it is a band-aid curing the symptoms by brute force (more BW) and not the disease (a better cache coherency protocol).
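A rough way to see why socket count matters, taking the broadcast-versus-directory framing above at face value. This is only a message count under assumptions made here: every coherent request probes every other socket in the broadcast case, a directory contacts only the listed sharers, and responses, probe filters and link topology are ignored. The function names and the figure of 1000 requests per socket are hypothetical.

```python
def broadcast_probes(sockets, requests_per_socket):
    """Total probes when each request is broadcast to all other sockets."""
    return sockets * requests_per_socket * (sockets - 1)


def directory_probes(sockets, requests_per_socket, avg_sharers):
    """Total probes when only the sharers named by a directory are contacted."""
    return sockets * requests_per_socket * avg_sharers


for n in (2, 4, 8):
    # Assumption: on average one other socket shares each requested line.
    print(n, broadcast_probes(n, 1000), directory_probes(n, 1000, 1))
```

With these made-up inputs the broadcast count goes 2000, 12000, 56000 for 2, 4 and 8 sockets, while the directory count goes 2000, 4000, 8000. The broadcast case grows with the square of the socket count, which is the point being argued.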

Why do you think Newisys tried to build Horus, a directory-based chipset, to overcome this?

**gosh** · 12-19-2008, 02:49 AM

savantu:
Inclusive vs exclusive cache
Why are the L1 and L2 caches smaller on i7 compared to Deneb?
What about redundant data in cache?
What is the hit rate for the cache (associativity)?
Do you have any thoughts about manufacturing yields comparing i7 and Deneb?

There has been some discussion on the internet that Intel focused very much on servers with the i7 (performance); that was AMD's strong area where Intel was behind. With the i7, Intel may have focused too much on server performance?

What AMD has done seems to be the opposite (if desktop is the opposite). I think they have some good news for gamers etc. with this design for the Deneb. Maybe Intel knows this and that's why they are releasing this type of information (letting go of the NDA).
The reason why they didn't focus on server performance could be that it is enough now. Companies don't need more CPU performance there; they want CPUs that draw less power, and maybe they focused on that area.

**gallag** · 12-19-2008, 03:05 AM

Originally Posted by gosh

The reason why they didn't focus on server performance could be that it is enough now. Companies don't need more CPU performance there; they want CPUs that draw less power, and maybe they focused on that area.

They will love i7 for servers then. They will be able to cut their power usage in half. Throw out those four-socket systems and replace them with duals. Lovely.

**gosh** · 12-19-2008, 03:17 AM

Originally Posted by gallag

They will love i7 for servers then. They will be able to cut their power usage in half. Throw out those four-socket systems and replace them with duals. Lovely.

The problem (as I see it) is that servers are on 24/7, and there aren't that many servers that need raw CPU performance. Fast disks are probably more important in general. Sometimes there might be high CPU loads, but that is probably not a common scenario.

**gallag** · 12-19-2008, 03:39 AM

Originally Posted by gosh

The problem (as I see it) is that servers are on 24/7, and there aren't that many servers that need raw CPU performance. Fast disks are probably more important in general. Sometimes there might be high CPU loads, but that is probably not a common scenario.

Ahhh, so now server CPU performance is not important. Well, it can just join all those other things that became unimportant as soon as Intel excelled in them.

Why does AMD do so well in server land atm, and what will change to make CPU performance negligible once i7 comes to the server sector?

**gosh** · 12-19-2008, 03:53 AM

Originally Posted by gallag

Ahhh, so now server CPU performance is not important. Well, it can just join all those other things that became unimportant as soon as Intel excelled in them.

Why does AMD do so well in server land atm, and what will change to make CPU performance negligible once i7 comes to the server sector?

I only explained what I think is important, and the same was important "yesterday" too. Power consumption has always been important for servers, but it is much more important today. I read something about AMD's server CPU sales: even with the current Opteron, they sell more of the slower CPUs that draw less power than of the faster, more power-hungry ones (if what I read was right).

If you read forums where people are looking for home servers etc., they always list power consumption as a very important attribute. My own server draws 52 watts at idle.

**Hornet331** · 12-19-2008, 03:59 AM

Originally Posted by gosh

If you read forums where people are looking for home servers etc., they always list power consumption as a very important attribute. My own server draws 52 watts at idle.

I just burst out laughing when I read that sentence...

**gallag** · 12-19-2008, 04:01 AM

All we have heard for the last few years is that all that matters is server performance. It was, so to speak, the jewel in AMD's crown. It just seems that every time Intel dominates a new performance metric, the goalposts shift on how important it is. It seems that since the launch of i7, all that matters is GPU-limited gaming performance.

**gosh** · 12-19-2008, 04:08 AM

Originally Posted by gallag

All we have heard for the last few years is that all that matters is server performance. It was, so to speak, the jewel in AMD's crown.

It was performance/watt. And if you go back, CPUs then weren't that fast compared to CPUs today (the new Opteron is of course faster than the old one).

**Donnie27** · 12-19-2008, 06:09 AM

Originally Posted by gosh

It was performance/watt. And if you go back, CPUs then weren't that fast compared to CPUs today (the new Opteron is of course faster than the old one).

Yet still slower, and it has been slower for almost 2 years now. AMD's main advantage was bandwidth from SOC and point-to-point. Dual socket was lost to Xeon-Bensley, as I showed you when you were Duby229. Not only was the Intel platform faster, but it drew less power as well.

New Opterons are faster than the old ones but still slower than their Intel counterparts. Power draw will make some, maybe a few, pause, but others will still get power savings from using Nehalem. Why? Look at the power used over the time it took to finish the project. Let's say you're right: you draw less power, but the job takes longer. Over the whole project, that can mean you used more energy anyway.
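The arithmetic behind that is just energy = average power × time. A small worked example, with wattages and run times that are invented purely for illustration (they are not measurements of any Xeon or Opteron system):

```python
def energy_kwh(avg_power_watts, hours):
    """Energy used by a job that draws avg_power_watts for the given hours."""
    return avg_power_watts * hours / 1000


# Hypothetical numbers: the faster box draws more but finishes sooner.
fast_box = energy_kwh(300, 10)  # 300 W for 10 hours
slow_box = energy_kwh(200, 18)  # 200 W for 18 hours

print(f"fast box: {fast_box:.1f} kWh, slow box: {slow_box:.1f} kWh")
```

Here the 300 W machine uses 3.0 kWh for the job and the 200 W machine uses 3.6 kWh, so the lower-power box ends up costing more energy for the same work.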

http://www.reuters.com/article/techn...24573820080708

DreamWorks picks Intel over AMD for chip supply

They used to use AMD processors exclusively. A friend told me about DW testing a Nehalem long before Intel let the rest know about it. Just like with Apple, maybe they'll tell us just how long ago that was. Please don't say DW were duped by Intel's clever marketing.

http://tech.blorge.com/Structure:%20...t-in-the-cold/

AMD’s dual-core Opteron processors will be replaced over the next 18 months – a move involving some 1,000 workstations and 1,500 server units. DreamWorks has agreed to buy Intel’s Nehalem 8-core processor for the high-end workstations, and Larrabee processors for the servers. Larrabee processors will have between 10 and 100 cores, according to Intel.

"He said that the studio’s recent offering, Kung Fu Panda, had its final touches done on computers using both Intel and AMD processors."

Maybe you know more than they do

**JumpingJack** · 12-19-2008, 06:25 AM

Originally Posted by Stukov

Uh, the TLB was a bug from previous designs of L3 cache that was never fixed. It's Fudzilla but it was the first in google search http://www.fudzilla.com/index.php?op...=6242&Itemid=1

DO NOT TRUST FUDzilla to get anything right. His post on this point, and any point about a TLB, is complete and total hogwash.

**savantu** · 12-19-2008, 09:05 AM

Originally Posted by gosh

savantu:
Inclusive vs exclusive cache
Why are the L1 and L2 caches smaller on i7 compared to Deneb?

AMD has always preferred larger L1s with low associativity; Intel went for small L1s with high associativity. I would assume the hit rate is higher on Intel's more highly associative caches.
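A toy simulation shows the one effect associativity has on hit rate, namely conflict misses. This is a minimal sketch under assumptions made here (LRU replacement, 64-byte lines, a hypothetical 8-line cache and a made-up two-address trace); it says nothing about the real hit rates of either chip, and capacity matters as well.

```python
from collections import OrderedDict


def hit_rate(addresses, total_lines, ways, line_size=64):
    """Hit rate of a set-associative LRU cache over a trace of byte addresses."""
    num_sets = total_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        line = addr // line_size
        entries = sets[line % num_sets]
        if line in entries:
            hits += 1
            entries.move_to_end(line)        # mark as most recently used
        else:
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict the least recently used line
            entries[line] = True
    return hits / len(addresses)


# Two addresses that fall into the same set, accessed alternately.
trace = [0, 8 * 64] * 100

print(hit_rate(trace, total_lines=8, ways=1))  # direct-mapped
print(hit_rate(trace, total_lines=8, ways=2))  # 2-way, same capacity
```

With the same total capacity, the direct-mapped cache thrashes on this trace and never hits, while the 2-way cache misses only on the first two accesses.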

Intel chose, based on extensive simulation, a small but very fast L2. It had to be small because of the inclusive relationship with the L3. Results show that their approach is outstanding performance-wise.

Even so, there were serious debates inside Intel over the size of the L2s; many advocated a larger one.

http://www.realworldtech.com/page.cf...2808015436&p=1

What about redundant data in cache ?

That's the drawback of the inclusive approach; using smaller L1/L2 is one fix to the problem. The other is to increase the size of the L3.
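A back-of-the-envelope count of how much distinct data the L2 and L3 can hold together. The cache sizes below are not from this thread; they are the commonly published figures for a quad-core i7 (256 KB L2 per core, 8 MB L3) and Deneb (512 KB L2 per core, 6 MB L3), plugged in as assumptions. L1 is left out, and the exclusive figure is an upper bound since AMD's L3 is only pseudo-exclusive.

```python
def unique_capacity_kb(cores, l2_kb, l3_kb, inclusive):
    """Upper bound on distinct data held across the L2s and the shared L3."""
    if inclusive:
        # Everything in the L2s is duplicated in the L3.
        return l3_kb
    # Exclusive: L2 and L3 contents do not overlap, so the sizes add up.
    return l3_kb + cores * l2_kb


i7 = unique_capacity_kb(cores=4, l2_kb=256, l3_kb=8192, inclusive=True)
deneb = unique_capacity_kb(cores=4, l2_kb=512, l3_kb=6144, inclusive=False)

print(f"i7: {i7} KB unique, Deneb: {deneb} KB unique")
```

Under these assumptions both come out at 8192 KB, which illustrates the trade-off: the inclusive design spends 1 MB of its L3 on duplicates and makes up for it with a bigger L3 and smaller L2s.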

What is the hit rate for the cache (associativity)?

Intel's is generally higher, as can be seen from David Kanter's review.

Do you have any thoughts about manufacturing yields comparing i7 and Deneb?

They have the same die size. Given Intel's prowess in manufacturing, I'd assume their yields are better. I don't have hard data on this; it is just a hunch based on past performance.

There has been some discussion on the internet that Intel focused very much on servers with the i7 (performance); that was AMD's strong area where Intel was behind. With the i7, Intel may have focused too much on server performance?

How so? From all reviews, Nehalem stomps the desktop world with ease, especially in multimedia benchmarks. It has superb all-around performance.
Once graphics drivers are optimized for it, it will increase its lead in games.

What AMD has done seems to be the opposite (if desktop is the opposite). I think they have some good news for gamers etc. with this design for the Deneb. Maybe Intel knows this and that's why they are releasing this type of information (letting go of the NDA).

Gamers are 1% of the market. Barely relevant, and Deneb still has to prove that it can beat Kentsfield/Yorkfield at the same clock.

**Clairvoyant129** · 12-19-2008, 09:17 AM

Originally Posted by gosh

It was performance/watt. And if you go back, CPUs then weren't that fast compared to CPUs today (the new Opteron is of course faster than the old one).

Gosh, I know you have no clue about servers but that's ok.

Everyone can now pick up dual socket Xeons to replace their quad socket Opterons and have same/better performance while using less than half the power.

There has been some discussion on the internet that Intel focused very much on servers with the i7 (performance); that was AMD's strong area where Intel was behind. With the i7, Intel may have focused too much on server performance?

I don't get it. An i7 920 @ 2.66GHz is on par with a QX9650 @ 3GHz; how is that focusing too much on server performance? Seeing all the reviews for Deneb, it will still be slower than the current Core 2 Quads. So what's your argument then? That AMD focused too much on server performance from the beginning?

**Movieman** · 12-19-2008, 09:19 AM

Originally Posted by Zucker2k

Come on Dave, don't be greedy. Let's all have a go at it with our very subjective purchases, unbridled, unfettered; doesn't get any real world than that. Anyway, I have a head start: http://www.xtremesystems.org/forums/...d.php?t=211079

I wasn't being greedy. I only asked for one of each!

**~~DoubleZero~~** · 12-19-2008, 11:42 AM

Originally Posted by gallag

They will love i7 for servers then. They will be able to cut their power usage in half. Throw out those four-socket systems and replace them with duals. Lovely.

Sure they can, if they plan to run synthetic benchmarks 24/7 with HT on.

**Shintai** · 12-19-2008, 11:51 AM

Originally Posted by DoubleZero

Sure they can, if they plan to run synthetic benchmarks 24/7 with HT on.

Speaking from first-hand experience, that is not true. They really do have quad-socket performance in dual socket.

**gosh** · 12-19-2008, 12:06 PM

Originally Posted by Clairvoyant129

Gosh, I know you have no clue about servers but that's ok.
Everyone can now pick up dual socket Xeons to replace their quad socket Opterons and have same/better performance while using less than half the power.

What type of server software needs this hardware speed? How many concurrent users (employees) do you think need to be using the server before it has trouble handling the load?
What is the idle power for i7?

Originally Posted by savantu

They have the same die size. Given Intel's prowess in manufacturing, I'd assume their yields are better. I don't have hard data on this; it is just a hunch based on past performance.

Can Intel turn off one or more cores and sell it as X3, X2 etc. with this design?

Originally Posted by savantu

Nehalem stomps the desktop world with ease, especially in multimedia benchmarks. It has superb all-around performance.

There are some big English sites that have shown good results. But if you read foreign reviews, the results are not as good (good but not super). It is very fast in memory-intensive applications. That is needed for server software, not on desktops (latency is much more important on desktops).

**_Lone_Wolf_** · 12-19-2008, 12:12 PM

Obviously at 65nm AMD weren't able to give K10 sufficient die area to implement a fully inclusive L3 cache, but that isn't the case at 45nm and lower.
I would assume comprehensive simulations were run to test such a design revision, but given that the change wasn't made, would it be reasonable to conclude that any performance gains were deemed not to warrant the required man-hours of engineering effort? Granted, the time-to-market pressure on Shanghai has been great, but the design choices were most likely made before Barcelona even launched.

**~~Zucker2k~~** · 12-19-2008, 12:14 PM

Originally Posted by Movieman

I wasn't being greedy. I only asked for one of each!

My bad, Dave. I would be very interested in seeing results from a trusted person like you. Hopefully AMD/Intel reps on this forum will accept the challenge.

Originally Posted by gosh

What type of server software needs this hardware speed? How many concurrent users (employees) do you think need to be using the server before it has trouble handling the load?
What is the idle power for i7?

Can Intel turn off one or more cores and sell it as X3, X2 etc. with this design?

If I understand you right, you're saying Intel's latest dualcores for the server platform are so fast they're useless. Wow, that's dangerous talk. We don't want Intel's engineers to rest on their laurels. We need even faster chips, with ultra low power consumption. Don't you agree?

**Shintai** · 12-19-2008, 12:14 PM

Originally Posted by gosh

What type of server software needs this hardware speed? How many concurrent users (employees) do you think need to be using the server before it has trouble handling the load?
What is the idle power for i7?

Well, since AMD people have shouted virtualization since Core 2 Xeons, let's try that. Otherwise there are PLENTY of DB applications, web services, terminal services etc. Plus, in all the cases where you would have bought a quad-socket machine, you just get a Xeon 5500-based one.

And the idle power... seriously. You are trolling, gosh.

http://www.anandtech.com/cpuchipsets...spx?i=3453&p=3
http://techreport.com/articles.x/15818/14
http://www.xbitlabs.com/articles/cpu..._18.html#sect0

It's nothing you can't find on your own. But there is a reason you've got 10% of the posts here.

Originally Posted by gosh

Can Intel turn off one or more cores and sell it as X3, X2 etc. with this design?

I don't see why not. But Intel earlier said such things would be scrapped. AMD's reason for doing so was a horrible manufacturing process combined with a very large die size.

And there are no such CPUs on Intel's Nehalem roadmaps. There are 3 core designs: Bloomfield, Lynnfield and Havendale.

**Hornet331** · 12-19-2008, 12:26 PM

Originally Posted by Shintai

Well, since AMD people have shouted virtualization since Core 2 Xeons, let's try that. Otherwise there are PLENTY of DB applications, web services, terminal services etc. Plus, in all the cases where you would have bought a quad-socket machine, you just get a Xeon 5500-based one.

And the idle power... seriously. You are trolling, gosh.

http://www.anandtech.com/cpuchipsets...spx?i=3453&p=3
http://techreport.com/articles.x/15818/14
http://www.xbitlabs.com/articles/cpu..._18.html#sect0

It's nothing you can't find on your own. But there is a reason you've got 10% of the posts here.

I don't see why not. But Intel earlier said such things would be scrapped. AMD's reason for doing so was a horrible manufacturing process combined with a very large die size.

And there are no such CPUs on Intel's Nehalem roadmaps. There are 3 core designs: Bloomfield, Lynnfield and Havendale.

Yeah, I don't think we will ever see S1366 dual-cores. Though on the other hand, I think we may see dual-core Lynnfields without IGP.

**gosh** · 12-19-2008, 12:30 PM

Originally Posted by Shintai

And the idle power... seriously. You are trolling, gosh.

http://www.anandtech.com/cpuchipsets...spx?i=3453&p=3
http://techreport.com/articles.x/15818/14
http://www.xbitlabs.com/articles/cpu..._18.html#sect0

The problem is that there are some sites (English sites) that always say good things about Intel.

Here is a review from a site in another country:
http://sweclockers.com/articles_show.php?id=6125&page=5

**gosh** · 12-19-2008, 12:36 PM

About cache hitrate

Originally Posted by savantu

Intel's is generally higher, as can be seen from David Kanter's review.

Isn't the i7 L3 cache 16-way associative? And Deneb's is 48-way associative?

What does that mean in hit rate?

I know that i7 runs its cache at a higher clock (more power needed); maybe they use that to compensate?

**demonkevy666** · 12-19-2008, 12:41 PM

Originally Posted by savantu

AMD has a pseudo-exclusive cache relationship (data cannot be found in two cache levels at the same time, although “pseudo” means that there are a few exceptions). Things get problematic when you have an L3 miss: you need to check what the other cores have in their caches.

With an inclusive cache, an L3 miss guarantees that the data is not in the other cores' caches, so a memory request is sent right away.

Nehalem's inclusive relationship and the flag system they use to maintain coherency give them little to no headaches about coherency traffic.
AMD's caches burn more BW and latency on this problem, that's all.

That is simply not true. AMD isn't bottlenecked by interconnects; it is precisely the coherency traffic which kills it. Huge amounts of BW are wasted on maintaining coherency.

In a Nehalem multi-CPU system, you only need to maintain the coherency of the L3s. Furthermore, Intel implemented a directory-based coherence protocol which is point-to-point instead of broadcast.

Not so with the Opteron, because data in L1/L2 is more or less guaranteed not to be in the L3. They also use a snoop-based protocol, in which the caches listen in on the transport of variables to any of the CPUs and update their own copies of these variables if they have them. Snooping logic in the processor broadcasts a message over the bus each time a word in its cache has been modified. The snooping logic also snoops on the bus looking for such messages from other processors.
Since K8/K10 use 64-byte lines, can you imagine the traffic in a 4-socket system to maintain coherency? What about 8 sockets? Yeah, HT 3.0 will help, but it is a band-aid curing the symptoms by brute force (more BW) and not the disease (a better cache coherency protocol).

Why do you think Newisys tried to build Horus, a directory-based chipset, to overcome this?

An inclusive L3 cache in Nehalem and a non-inclusive L3 cache in K10/Shanghai. Nice post, but HT is also coherent.

Was there a reason for building the L3 cache like that? Does anyone know why? (I don't know or remember why.)

I don't think HT 3.0 will be big enough for a 4-socket Shanghai.

**demonkevy666** · 12-19-2008, 12:43 PM

Originally Posted by gosh

About cache hitrate

Isn't the i7 L3 cache 16-way associative? And Deneb's is 48-way associative?

What does that mean in hit rate?

I know that i7 runs its cache at a higher clock (more power needed); maybe they use that to compensate?

Good luck finding that CPU-Z screenshot of the cache associativity >_>...
