Intel Xeon 5570: Smashing SAP records.

12-18-2008, 03:41 AM
Stukov

Quote:

Originally Posted by savantu

BS.

There is no relation between TLB bugs on similar designs (K10/Nehalem), not to mention CPUs designed 10 years apart.

Each patient with its own disease.

Uh, the TLB bug came from previous L3 cache designs and was never fixed. It's Fudzilla, but it was the first result in a Google search: http://www.fudzilla.com/index.php?op...=6242&Itemid=1

Quote:

We learned that the good old K6-III had a similar L3 cache errata to what we now know as the TLB bug, and the K10 simply inherited this almost a decade (nine years) old problem. After more than a year AMD finally found a solution for TLB and the new B3 CPUs will finally fix this raging issue.

The K6-III was the first CPU to have L3 cache from AMD and it had some similar issues. At that time AMD didn’t make a big deal out of it and it all went by quietly, but at that time AMD didn’t really have a strong server market presence, as TLB mostly affects servers, not desktops.

It is good to know that AMD solved this issue, but the issue itself ended up costing AMD billions. What can you do, sooner or later you have to slip.
12-18-2008, 04:09 AM
Macadamia

Quote:

Originally Posted by Shintai

I would look at the cache architecture if I were you ;)

The AMD chips suffer from none of that, so core scaling would be as simple as i7's etc. Just add cores and L3 (if you wish).

Once CHTT 3+ gets onto Istanbul I don't think you'd see real problems with even 24 cores on 4P.
12-18-2008, 04:24 AM
Clairvoyant129

Quote:

Originally Posted by gosh

They were early, yes! And because of that the advantages weren't that obvious. You need to understand how the processor works in order to understand the advantages. Applications weren't that multithreaded either, so few applications were able to show the performance gain from this type of processor. Developers also need to adapt to the main market, and that means adapting to Core 2. Developing for a processor (even if that processor has a more modern design) that has a tiny market share isn't economically smart.

???

While Phenom does scale better than Core 2s, that's where all the advantages end. With Nehalem, all the advantages AMD had are gone.

Also, you seem to be implying that Phenom could have made the same kind of impact Nehalem did for the server market if developers had taken advantage of multi-core CPUs. :rolleyes:

http://www.enumae.com/images/Nehalem%20965-900.png

4 core i7 965 trading blows with dual socket 8 core Opterons. Xeon Nehalems will completely annihilate any advantages AMD had.
12-18-2008, 04:37 AM
gosh

Quote:

Originally Posted by Clairvoyant129

???

While Phenom does scale better than Core 2s, that's where all the advantages end. With Nehalem, all the advantages AMD had are gone.

I was talking about design. If you get a C2D running at 20 GHz, then it doesn't matter whether you have an i7 or a Deneb; the C2D will win anyway.

Also, I think that the most important attribute today is how much performance you get relative to the energy used.
12-18-2008, 04:46 AM
Ghostbuster

Quote:

Originally Posted by gosh

They were early, yes! And because of that the advantages weren't that obvious. You need to understand how the processor works in order to understand the advantages. Applications weren't that multithreaded either, so few applications were able to show the performance gain from this type of processor. Developers also need to adapt to the main market, and that means adapting to Core 2.

And the problem is that you yourself don't even fully understand how a processor works at all. Furthermore, Core 2 Duo (and Core Duo earlier) was tearing up the highway when it first came out, and that was even before developers "need to adapt to Core 2". :nono:

Quote:

Originally Posted by gosh

Developing for a processor (even if that processor has a more modern design) that has a tiny market share isn't economically smart.

"Modern design"? Then we can also conclude Itanic has a modern design, since it also had integrated L3 cache before Phenom did (as did Gallatin years before). It's not called "economically smart", more like "progress" and "design decisions". The processor has to serve both markets. :doh:

Quote:

Originally Posted by Stukov

Uh, the TLB bug came from previous L3 cache designs and was never fixed. It's Fudzilla, but it was the first result in a Google search: http://www.fudzilla.com/index.php?op...=6242&Itemid=1

Fudzilla again and his endless banter about TLB bugs; I wonder how "L3 cache" on the motherboard gets affected. Anyway, it wasn't really L3 cache; it's because the K6-III already had integrated L2 cache that it had to treat the external cache as L3. That external cache is considered L2 with other processors. And I did have a K6-III; with "L3" cache or without "L3" cache there was simply no performance difference seen (tested on those motherboards with "false cache")... :D
12-18-2008, 05:31 AM
Donnie27

Quote:

Originally Posted by LOE

you miss trolling? sure you would like to troll back :D

the lack of those is the result of 2 simple things - first, everyone expects Nehalem-based servers to perform well in a server environment, and second, I doubt many people actually care about SAP. In fact, I don't think you will find people who use it and actually benefit from that performance wasting time on internet forums

I know only one company using it, and the people who manage it don't care about anything else than getting their job done

No, it's got nothing to do with trolling LOL! No, NOT everyone thought Nehalem would be a monster and there are too many posts to delete from folks saying just that.

Note, I would like to see Apache etc. benchmarks. It's also another double standard and false performance measure, as usual. What do I mean, LOE? Just getting the job done is now the new acceptable standard, after speed mattered more when Opteron ruled the roost. That, my friend, is the definition of a troll. If speed, interconnect and bandwidth mattered then, they should matter now as well. Note #2: I made these same arguments FOR OPTERON, so was I trolling then? The moving finish line is what causes flames in the first place. Not to mention it is the height of hypocrisy.
12-18-2008, 06:03 AM
Donnie27

Quote:

Originally Posted by gosh

You misunderstand what I was writing. I am not that good at English and didn't want to get involved in a totally different discussion.

http://www.xtremesystems.org/forums/...3&postcount=65

Maybe it was a bad translation or something but have we forgotten the Pentium Pro? I was NOT trying to trash you out here. They figured out a better way to get the job done and dumped it. They have more experience with *real L3 than AMD. Intel's SRAM is faster than AMD's.

For example, one guy went off the deep end and said I said that AMD didn't use smart memory. I'd said it wasn't as advanced as Conroe's: AMD's buffering routines (smart cache, smart memory access, etc.) are just now catching up to Conroe and are not as good as Penryn's. This I was told by an AMD-leaning programmer who's more honest than Fan. He added that Intel needed it more than AMD since Intel wasn't using an IMC. This is relevant because Nehalem is using an even more improved buffering scheme, TLB, a bigger IMC, QPI, etc. Then we're surprised by its performance?
12-18-2008, 06:07 AM
Jacky

Quote:

Originally Posted by DoubleZero

HT is great... for benchmark apps. The guys drooling over the SAP records must have missed the reviews of i7 where it gets amazing results with HT on in... benchmark apps, and then in real-world apps it gets the same or even worse results than with HT off.

What you are saying is clearly FUD. I've read about 20 Nehalem reviews and I'm pretty sure that, on average, it is much faster with HT. If HT helps, it helps by a lot; if it does harm, then only slightly. Most servers run a single application, so they can disable HT if necessary.

Quote:

Originally Posted by justapost

Of course that chip will be available only round 04/09. 8 cores on 45nm? At what frequency?

I've seen speculative calculations putting it anywhere from 2.3-2.6 GHz. I'd assume the exact clock speed is irrelevant; we know the range and what the architecture is capable of, and it should be pretty darn impressive.
I just wanted to point out that you were talking about the very far future. AMD has to weather the storm - 2009 2P Nehalem *and* a weak economy - first.
12-18-2008, 06:13 AM
Donnie27

Quote:

Originally Posted by Jacky

What you are saying is clearly FUD. I've read about 20 Nehalem reviews and I'm pretty sure that, on average, it is much faster with HT. If HT helps, it helps by a lot; if it does harm, then only slightly. Most servers run a single application, so they can disable HT if necessary.

I've seen speculative calculations putting it anywhere from 2.3-2.6 GHz. I'd assume the exact clock speed is irrelevant; we know the range and what the architecture is capable of, and it should be pretty darn impressive.
I just wanted to point out that you were talking about the very far future. AMD has to weather the storm - 2009 2P Nehalem *and* a weak economy - first.

QFT! But when it comes to a "weak economy", Intel has to weather the storm too.
12-18-2008, 06:41 AM
gosh

Quote:

Originally Posted by Donnie27

Maybe it was a bad translation or something but have we forgotten the Pentium Pro?

will try again here

i7 uses very similar technology to Phenom. Anandtech applauded i7 for its design and in the same article criticized AMD's Phenom. That was the problem (what I meant).
What's good about the L3 cache is that threads on each core are able to share memory without going to main memory. That makes it simpler for programmers to create threads without initializing each thread with its own memory pool, etc. Threads can share memory in a much more effective way. Phenom chose an L3 cache for this, and so did i7. If some other processor chose another technique that gets the same result, fine! The issue isn't specifically the L3 cache or who was first. The issue is that multiple cores need a technology that makes it possible (simpler) for the programmer to thread the application he/she is writing and get a boost for doing so.
It is strange that Anandtech applauds one CPU that has a good technique for scaling and at the same time criticizes another that used a similar technique and did it before i7. If Anandtech wants a reputation for being neutral, this is very strange behavior. I don't think anyone who reads these tests believes that they are neutral, and maybe they know that it doesn't matter anyway. Just feed those who need information and they will spread the word all over the internet. What Intel has succeeded enormously at is its marketing strategy.
12-18-2008, 07:05 AM
Shintai

Quote:

Originally Posted by gosh

i7 uses very similar technology to Phenom. Anandtech applauded i7 for its design and in the same article criticized AMD's Phenom. That was the problem (what I meant).

Phenom and i7 use similar technology? Yes, OK... an F1 racer and a Lada are also similar. 4 wheels, engine, steering wheel. Yep!

In that case Phenom looks like Gallatin CPUs, or Itaniums etc.
12-18-2008, 07:23 AM
gosh

Quote:

Originally Posted by Shintai

Phenom and i7 use similar technology? Yes, OK... an F1 racer and a Lada are also similar. 4 wheels, engine, steering wheel. Yep!

I don't think I can explain it in simpler terms; if you don't get it, then I am sorry.
12-18-2008, 10:50 AM
Donnie27

Quote:

Originally Posted by gosh

will try again here

i7 uses very similar technology to Phenom. Anandtech applauded i7 for its design and in the same article criticized AMD's Phenom. That was the problem (what I meant).
What's good about the L3 cache is that threads on each core are able to share memory without going to main memory. That makes it simpler for programmers to create threads without initializing each thread with its own memory pool, etc. Threads can share memory in a much more effective way. Phenom chose an L3 cache for this, and so did i7. If some other processor chose another technique that gets the same result, fine! The issue isn't specifically the L3 cache or who was first. The issue is that multiple cores need a technology that makes it possible (simpler) for the programmer to thread the application he/she is writing and get a boost for doing so.
It is strange that Anandtech applauds one CPU that has a good technique for scaling and at the same time criticizes another that used a similar technique and did it before i7. If Anandtech wants a reputation for being neutral, this is very strange behavior. I don't think anyone who reads these tests believes that they are neutral, and maybe they know that it doesn't matter anyway. Just feed those who need information and they will spread the word all over the internet. What Intel has succeeded enormously at is its marketing strategy.

First, did you see the results? You know you can't keep dissing not only Anand but anyone not agreeing with what you think. L3? Intel knows how to do L3 and it is NOT like AMD's is better or anything. The first Conroe started as a mobile processor. Nehalem started life as a server processor.

Stop calling any and everyone who buys Intel suckers and dupes of Intel marketing. That's pretty lame on your part. Intel's success is easily due to great products and always some advantage over AMD. When AMD had faster processors, Intel had better platforms. When AMD got better platform support, it got volume constrained. Then when AMD had everything in place, including deals with Dell, Intel dropped the Conroe bomb and blew it all up. Marketing alone, as you're implying, doesn't work for anyone =P If it did, Ford, GM and Chrysler would be more popular than Honda and Toyota, who they out-market 19 to 1 :rolleyes:

Wes Fink who writes for Anandtech.com is one of the biggest AMD fans online LOL!
12-18-2008, 10:52 AM
savantu

Quote:

Originally Posted by gosh

will try again here

i7 uses very similar technology to Phenom. Anandtech applauded i7 for its design and in the same article criticized AMD's Phenom. That was the problem (what I meant).
What's good about the L3 cache is that threads on each core are able to share memory without going to main memory. That makes it simpler for programmers to create threads without initializing each thread with its own memory pool, etc. Threads can share memory in a much more effective way. Phenom chose an L3 cache for this, and so did i7. If some other processor chose another technique that gets the same result, fine! The issue isn't specifically the L3 cache or who was first. The issue is that multiple cores need a technology that makes it possible (simpler) for the programmer to thread the application he/she is writing and get a boost for doing so.
It is strange that Anandtech applauds one CPU that has a good technique for scaling and at the same time criticizes another that used a similar technique and did it before i7. If Anandtech wants a reputation for being neutral, this is very strange behavior. I don't think anyone who reads these tests believes that they are neutral, and maybe they know that it doesn't matter anyway. Just feed those who need information and they will spread the word all over the internet. What Intel has succeeded enormously at is its marketing strategy.

Gosh/Duby229/Kassler - you're clueless now as you have always been, sorry to be so blunt.

The way Nehalem's and K10's L3 caches work is completely different: one is inclusive, the other is exclusive. Get the point? While both a Ford Model T and a Mercedes S600 have 4 wheels, that's where the similarities end.

Nehalem scales better, you know why? Because data from all the L1s and L2s on the chip is also found in the L3. When there is a cache miss in the L1 or L2, the data is searched for in the L3; if it's not there, a memory request is sent.

On K10, when a cache miss occurs in the L1 or L2, the L3 is searched and a request is sent to the remaining cores to search their L1s and L2s. Only after the reply comes from all cores is data requested from RAM.

That's extra latency => imagine with multiple threads and multiple misses. Nehalem has no such problems. Intel did its homework right.

Intel was 1st with DC, AMD did it right.
AMD was 1st with single-die QC, Intel did it right.
Too bad zealots cannot give credit where it's due.
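
To make the two lookup paths described above concrete, here is a toy sketch. It is not how either chip is actually implemented: it assumes a hypothetical 4-core part, treats each cache as a plain set of line names, and only counts how many other cores get probed on a miss. The function names and the data are made up for illustration.

```python
# Toy model of the two lookup paths described above. It only counts probes
# sent to other cores; it does not model timing or real coherency states.

def inclusive_lookup(line, l3, private_caches, requester):
    """Inclusive L3: anything held in a core's L1/L2 is also in the L3."""
    if line in l3:
        return ("L3", 0)
    # An L3 miss proves no core holds the line, so go straight to memory.
    return ("memory", 0)


def exclusive_lookup(line, l3, private_caches, requester):
    """Exclusive L3: a line may sit only in another core's L1/L2."""
    if line in l3:
        return ("L3", 0)
    others = [c for c in private_caches if c != requester]
    # Every other core is probed, and all replies are awaited.
    holders = [c for c in others if line in private_caches[c]]
    if holders:
        return (f"core {holders[0]}", len(others))
    return ("memory", len(others))


# Hypothetical 4-core chip: core 1 holds line "A", nobody holds "B".
private = {0: set(), 1: {"A"}, 2: set(), 3: set()}
inclusive_l3 = {"A"}   # must duplicate whatever the cores hold
exclusive_l3 = set()   # holds only lines that are not in a core's cache

for line in ("A", "B"):
    print(line,
          inclusive_lookup(line, inclusive_l3, private, 0),
          exclusive_lookup(line, exclusive_l3, private, 0))
```

In this model the inclusive path never probes another core, while the exclusive path probes all three on every L3 miss. The flip side, which the model also shows, is that the inclusive L3 has to spend capacity duplicating line "A".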
12-18-2008, 11:00 AM
gosh

Quote:

Originally Posted by Donnie27

First, did you see the results?

I just explained why Anand isn't a site that I trust.
I have both Intel (C2D) and AMD; when I am working, my Intel doesn't run well compared to my Phenoms. Maybe if you like to OC, but you don't need to do that for a Phenom. It is very irritating to get that uneven performance when more than one application is running (of course there is a difference in core count, but the Phenom is less expensive than the C2D). Review sites (English sites) don't seem to be sites that you can trust.
It's very good if this i7 brings performance; that is something that Phenom users have been experiencing for some time now.
12-18-2008, 11:01 AM
Shadowmage

Quote:

Originally Posted by savantu

Gosh/Duby229/Kassler - you're clueless now as you have always been, sorry to be so blunt.

The way Nehalem's and K10's L3 caches work is completely different: one is inclusive, the other is exclusive. Get the point? While both a Ford Model T and a Mercedes S600 have 4 wheels, that's where the similarities end.

Nehalem scales better, you know why? Because data from all the L1s and L2s on the chip is also found in the L3. When there is a cache miss in the L1 or L2, the data is searched for in the L3; if it's not there, a memory request is sent.

On K10, when a cache miss occurs in the L1 or L2, the L3 is searched and a request is sent to the remaining cores to search their L1s and L2s. Only after the reply comes from all cores is data requested from RAM.

That's extra latency => imagine with multiple threads and multiple misses. Nehalem has no such problems. Intel did its homework right.

Intel was 1st with DC, AMD did it right.
AMD was 1st with single-die QC, Intel did it right.
Too bad zealots cannot give credit where it's due.

I don't think that you understand the pros and cons of inclusive vs. exclusive caches. Generally, exclusive caches are much more efficient than inclusive caches. The only real benefit of the inclusive cache is its implementation simplicity.

Your supposed "con" is actually a pro, since the only reason they do the additional searching is to minimize the number of memory requests. If it were actually a con, they could easily send off a memory request in parallel with the searching.
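
The trade-off in that last point can be put as rough arithmetic. The sketch below is only illustrative: the latencies and the remote hit rate are made-up numbers, not measurements of K10 or Nehalem, and the function names are hypothetical. It compares probing first and then going to memory against issuing the memory read alongside the probes.

```python
# All numbers are made up for illustration; they are not measurements.

def serial_miss_ns(t_probe, t_mem):
    """Probe the other cores first, then go to memory."""
    return t_probe + t_mem


def parallel_miss_ns(t_probe, t_mem):
    """Issue the memory read alongside the probes; wait for the slower one."""
    return max(t_probe, t_mem)


def wasted_reads(n_misses, remote_hit_rate):
    """Speculative memory reads thrown away because another core had the line."""
    return n_misses * remote_hit_rate


T_PROBE, T_MEM = 20, 60                  # hypothetical latencies in ns
print(serial_miss_ns(T_PROBE, T_MEM))    # 80
print(parallel_miss_ns(T_PROBE, T_MEM))  # 60
print(wasted_reads(1000, 0.25))          # 250.0
```

Under these assumptions the parallel scheme hides the probe latency, but a quarter of the memory reads are wasted bandwidth, and the returned data still has to be dropped whenever a probe finds a newer copy in another core's cache.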
12-18-2008, 11:50 AM
savantu

Quote:

Originally Posted by Shadowmage

I don't think that you understand the pros and cons of inclusive vs. exclusive caches. Generally, exclusive caches are much more efficient than inclusive caches. The only real benefit of the inclusive cache is its implementation simplicity.

That might have been true in the single-core era, when you did not need to worry about what other cores had in their caches.
Nehalem's L3 handles all the coherency traffic and acts like a snoop filter.

Quote:

Your supposed "con" is actually a pro, since the only reason they do the additional searching is to minimize the number of memory requests. If it were actually a con, they could easily send off a memory request in parallel with the searching.

Really? And how do you maintain cache coherency?
12-18-2008, 11:52 AM
Donnie27

Quote:

Originally Posted by gosh

I just explained why Anand isn't a site that I trust.
I have both Intel (C2D) and AMD; when I am working, my Intel doesn't run well compared to my Phenoms. Maybe if you like to OC, but you don't need to do that for a Phenom.

OK so let me try this one more time!?

You're comparing a dual-core Intel to a quad-core AMD processor/s? Then you're wondering why the dual is not running as fast. Do you think what you know about dual cores has anything to do with a Nehalem? No wonder you're :confused:

Quote:

It is very irritating to get that uneven performance when more than one application is running (of course there is a difference in core count, but the Phenom is less expensive than the C2D). Review sites (English sites) don't seem to be sites that you can trust.
It's very good if this i7 brings performance; that is something that Phenom users have been experiencing for some time now.

If WHAT?

Anand didn't do the test/s, what does he personally have to do with HP and others running the test? Please explain? I thought you understood that Anand DIDN'T run the tests. Anand could have been Joel at Sudhian, Rahul Sood, Chris Tom and any other AMD Fan for that matter, he was just reporting an event. So you don't trust AMD's biggest partner HP?

I'm not overclocking and my little simple 3GHz Wolfdale is even more impressive as a low noise Quiet Rig:up: Overclocking is a sport, NOT a need. If you need to overclock, then maybe it's time for an Upgrade.

For the record, Phenom II is still at the top of my shopping list. Price to performance ratio for ME! I don't have to bash AMD to buy Intel or bash Intel because I'm buying AMD, that's silly!

Nehalem is not C2D

P.S. $400 limit for parts.
12-18-2008, 11:54 AM
Shintai

Cache coherency traffic is a huge bottleneck for AMD 4 and 8 socket systems. It just drains the HT links with those requests. Even with faster HT speeds you still waste a huge chunk on it.

I guess Shadowmage is unaware of multicores and multisockets ;)

Donnie27: Forget gosh. He is on his 3rd account for a reason.
12-18-2008, 12:04 PM
Movieman

OK guys.
Let's stop with the clueless and stupid comments to the other posters huh?
If someone doesn't agree with you it doesn't mean they are clueless or stupid.
They might know more than you or less.
Time and testing of this product and the new AMD products will tell who built the better mousetrap.
Right now from what I've seen it looks like they have both done well.
Which is better? We will know soon.:up:
I guess what I'd like to add in here is don't always feel that you have to have the last word or argue ad nauseam hoping that the other guy will cave in and say that you are God come to earth.
Just doesn't happen.
So make your point as clearly as you can, support it with whatever facts you have at hand and if someone disagrees tell yourself that you've done what you can and move on.
Now personally I'd love to take one setup from AMD, one from Intel, set them up and see what does what. That is the only way we will get the truth.
12-18-2008, 12:19 PM
Donnie27

Quote:

Originally Posted by Shintai

Cache coherency traffic is a huge bottleneck for AMD 4 and 8 socket systems. It just drains the HT links with those requests. Even with faster HT speeds you still waste a huge chunk on it.

I guess Shadowmage is unaware of multicores and multisockets ;)

Donnie27: Forget gosh. He is on his 3rd account for a reason.

Yea, I know.
12-18-2008, 12:32 PM
Donnie27

Quote:

Originally Posted by Movieman

OK guys.
Let's stop with the clueless and stupid comments to the other posters huh?
If someone doesn't agree with you it doesn't mean they are clueless or stupid.
They might know more than you or less.
Time and testing of this product and the new AMD products will tell who built the better mousetrap.
Right now from what I've seen it looks like they have both done well.
Which is better? We will know soon.:up:
I guess what I'd like to add in here is don't always feel that you have to have the last word or argue ad nauseam hoping that the other guy will cave in and say that you are God come to earth.
Just doesn't happen.
So make your point as clearly as you can, support it with whatever facts you have at hand and if someone disagrees tell yourself that you've done what you can and move on.
Now personally I'd love to take one setup from AMD, one from Intel, set them up and see what does what. That is the only way we will get the truth.

You got it!
12-18-2008, 12:37 PM
Shadowmage

Quote:

Originally Posted by savantu

That might have been true in the single-core era, when you did not need to worry about what other cores had in their caches.
Nehalem's L3 handles all the coherency traffic and acts like a snoop filter.

Really? And how do you maintain cache coherency?

The cores on the same silicon die are not connected using HT. My guess is that they have some dedicated coherency traffic buses. If what you said is correct, then AMD's processor diagram will have the L1, L2, and L3 caches all connected to some arbiter which is then connected to the HT port. Instead, note that L1 feeds into L2 which feeds into L3. In other words, like with Nehalem, the L3 handles all the coherency traffic - it just takes a little longer to process the data.

Quote:

Originally Posted by Shintai

Cache coherency traffic is a huge bottleneck for AMD 4 and 8 socket systems. It just drains the HT links with those requests. Even with faster HT speeds you still waste a huge chunk on it.

I guess Shadowmage is unaware of multicores and multisockets ;)

What's the cache coherency traffic like on Nehalem? It seems to me that AMD is interconnect-bound because they are still waiting for HT3. Intel has less of a problem because QuickPath is higher performance than AMD's current HT implementation.

Your argument is pinpointing the problem on the wrong component. The traffic caused by the exclusive caches is reasonable. It's just that AMD needs a bandwidth improvement for their intra-chip interconnects.
12-18-2008, 01:33 PM
gosh

Quote:

Originally Posted by Shintai

Donnie27: Forget gosh. He is on his 3rd account for a reason.

Wrong! It's my tenth ;)
This and Kassler are the names that I use.
If you do heavy multithreading, the behavior of these new processors will be very good. It's Core 2 that I think is bad; i7 = good, but I don't like to pay more than I need to, and AMD's Phenoms are much more economical.

http://img219.imageshack.us/img219/5056/screenbx8.jpg
12-18-2008, 04:30 PM
Zucker2k

Quote:

Originally Posted by Movieman

OK guys.
Let's stop with the clueless and stupid comments to the other posters huh?
If someone doesn't agree with you it doesn't mean they are clueless or stupid.
They might know more than you or less.
Time and testing of this product and the new AMD products will tell who built the better mousetrap.
Right now from what I've seen it looks like they have both done well.
Which is better? We will know soon.:up:
I guess what I'd like to add in here is don't always feel that you have to have the last word or argue ad nauseam hoping that the other guy will cave in and say that you are God come to earth.
Just doesn't happen.
So make your point as clearly as you can, support it with whatever facts you have at hand and if someone disagrees tell yourself that you've done what you can and move on.
Now personally I'd love to take one setup from AMD, one from Intel, set them up and see what does what. That is the only way we will get the truth.

Very well said, Dave. :clap: Personally, I move for the XS Multi-Platform Challenge. Let the hardware do the talking for a change. Any takers?

All times are GMT -8.

XtremeSystems