Clovertown FSB Bump To Match Barcelona



mascaras
02-07-2007, 06:42 AM
Clovertown FSB Bump To Match Barcelona


We heard Intel might consider bumping the quad-core Clovertown FSB to 1600MHz from the current 1333MHz to match up with the AMD quad-core Barcelona. Another strategy Intel is considering is to pull the 45nm Harpertown schedule forward to this year instead. Harpertown will have 2x6MB of L2 cache and a 1333MHz FSB. Intel is keeping a close watch on AMD's Barcelona to plan its next step.

http://www.vr-zone.com/index.php?i=4607



cheers

Mr. Popo
02-07-2007, 06:53 AM
Xtreme Competition, Xtreme Hardware. :rocker:

Omastar
02-07-2007, 07:09 AM
YEAH THAT ^


1600FSB on Intel's FSB architecture is more than I ever thought I would see.

Well it's pretty doable on Core architecture.

MAS
02-07-2007, 07:14 AM
intel's 1600 and amd's 5200 - the huge difference, it won't help intel

xlink
02-07-2007, 07:15 AM
I'm running 1600FSB on my current setup with room to expand.

I've got room until about 1660ish (on stock voltage), so there's a small margin of headroom.

Movieman
02-07-2007, 07:25 AM
When the guys at 2cpu were working out the 1333 mod for the clovers that we're using now, they came across an "unused" VID contact point.
This may be for 1600FSB but when tried it bluescreened.

NiCKE^
02-07-2007, 07:32 AM
intel's 1600 and amd's 5200 - the huge difference, it won't help intel
And you know this how?

gOJDO
02-07-2007, 08:26 AM
@MAS
The difference between AMD HTT and Intel FSB is that FSB is a parallel connection, while HTT is a point-to-point connection.

Intel's FSB is 64bit width, AMD's cHT links are 16bit width bi-directional. The first MP Barcelona CPUs will use HTT1 which is 2000MHz, not 5200MHz. So, if we apply the math:
1600MHz * 64bit = 12.8GB/s total bandwidth
2000MHz * 16bit = 4GB/s per direction or 8GB/s total bandwidth

But there are also other factors that are affecting the effectiveness of the CPU connections: ODMC(on-die memory controller), shared caches, large cache capacity, CPU prefetchers, etc.
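
For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes 1 GB = 10^9 bytes and treats the quoted MHz figures as effective transfer rates (MT/s). The helper name is made up for illustration, and these are theoretical peaks only, not measured throughput.

```python
def peak_bandwidth_gb_s(transfers_mt_s, width_bits, directions=1):
    """Theoretical peak in GB/s, assuming 1 GB = 10**9 bytes (hypothetical helper)."""
    bytes_per_transfer = width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer * directions / 1e9


fsb = peak_bandwidth_gb_s(1600, 64)                     # one shared 64-bit bus
ht_one_way = peak_bandwidth_gb_s(2000, 16)              # one 16-bit link, one direction
ht_total = peak_bandwidth_gb_s(2000, 16, directions=2)  # same link, both directions

print(f"FSB 1600: {fsb:.1f} GB/s")
print(f"HT 2000:  {ht_one_way:.1f} GB/s per direction, {ht_total:.1f} GB/s total")
```

Note that the FSB figure is shared by everything on that bus, while the HT figure is per link.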

nn_step
02-07-2007, 09:02 AM
sure that really competes against over 17GB of bandwidth per proc that AMD has

Movieman
02-07-2007, 09:14 AM
I ran some tests here on the clovertown rig and there's no doubt that in the bandwidth area AMD owns Intel.
The memory benchmark using Sandra 7 was approx. 7200/7800 MB/s.
A long way from what the AMD's are showing.

Warship
02-07-2007, 09:47 AM
sure that really competes against over 17GB of bandwidth per proc that AMD has

GB = GigaByte
Gb = GigaBit

nn_step
02-07-2007, 09:53 AM
GB = GigaByte
Gb = GigaBit
I am familiar with that, hence I specifically used GB
http://img360.imageshack.us/img360/8470/sandra500py7.jpg

gOJDO
02-07-2007, 10:12 AM
@nn_step
Stop spreading miss-informations!

1. That is an OC-ed Quad FX system:
- HTT OC-ed by 25%
- CPU OC-ed by 15.4%
- ODMC OC-ed by 25%
2. It is not per processor, but total system memory bandwidth, with NUMA enabled

BTW, total system memory bandwidth IS NOT the bandwidth of the cHT links connecting the CPUs.

FghtinIrshNvrDi
02-07-2007, 10:13 AM
wow, I didn't think FSB would ever get to that point... I always assumed they'd have an IMC before they beefed it up to those speeds.

Ryan

gOJDO
02-07-2007, 10:23 AM
miss informations?
It gives a good indication to my opinion.
Missinformations indeed!
The cHT link is 2000MHz, with 8GB/s total theoretical bandwidth, not 2500MHz with 10GB/s
The ODMC is 800MHz with 12.8GB/s total theoretical bandwidth, not 1000MHz with 16GB/s.

The 17GB/s is the total system bandwidth that only one CPU can have, when NUMA is enabled, not the PER PROCESSOR memory bandwidth!
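
The stock vs. overclocked numbers above can be reproduced with a quick sketch. It assumes bandwidth scales linearly with clock, which is only true for the theoretical peak. The function and dictionary names are hypothetical.

```python
def overclocked(stock_value, overclock_percent):
    """Scale a stock clock or peak bandwidth by an overclock percentage."""
    return stock_value * (1 + overclock_percent / 100)


# Stock figures as quoted in the post; the 25% overclock is from the earlier list.
stock = {
    "cHT clock (MHz)": 2000,
    "cHT bandwidth (GB/s)": 8.0,
    "memory clock (MHz)": 800,
    "memory bandwidth (GB/s)": 12.8,
}

oc = {name: overclocked(value, 25) for name, value in stock.items()}

for name in stock:
    print(f"{name}: stock {stock[name]:g}, overclocked {oc[name]:g}")
```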

vitaminc
02-07-2007, 10:35 AM
Won't matter for single or dual socket systems.

It would help a lot for quad or more socket systems, but AMD still reigns in that space with K8.

Revv23
02-07-2007, 10:49 AM
i feel like FSB is just marketing...

I mean, we have 500 FSB on overclocked Intel already, but show me where that is faster.

IMO this doesn't mean much.

I mean, either way, all this tells us is that Intel doesn't know what to expect with Barcelona, and that they are ready to respond, like a good company should be.

[XC] Lead Head
02-07-2007, 11:56 AM
@MAS
The difference between AMD HTT and Intel FSB is that FSB is a parallel connection, while HTT is a point-to-point connection.

Intel's FSB is 64bit width, AMD's cHT links are 16bit width bi-directional. The first MP Barcelona CPUs will use HTT1 which is 2000MHz, not 5200MHz. So, if we apply the math:
1600MHz * 64bit = 12.8GB/s total bandwidth
2000MHz * 16bit = 4GB/s per direction or 8GB/s total bandwidth

But there are also other factors that are affecting the effectiveness of the CPU connections: ODMC(on-die memory controller), shared caches, large cache capacity, CPU prefetchers, etc.

HTT1 was used on Socket 754 CPUs, HTT2 is used on Socket 939, 940 , AM2, and 1207. You are already wrong there with saying "Barcelona will be using HTT1". Now, I find it funny you compare Intel's top end FUTURE server FSB, with AMD's Current consumer level HTT. You should also compare AMD's current, AND future top end.

Lets break this down.

HTT1--Socket 754--800(1600MHz)--Max 6.4GB/s

HTT2--Socket 939/940/AM2/1207--1000Mhz(2000) For consumer level--8.00GB/s. Server level uses 1400 (2800) MHz -- 11.2GB/s x3 for a max of 33.6 GB/s for the Opterons.

HTT3--Socket AM3/1207 (more?)--2600 (5200) MHz--20.80 GB/s x3 for max of 62.4GB/s of bandwidth
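
The breakdown above follows one rule: effective rate x 2 bytes (16-bit link) x 2 directions, then x3 for three links. A rough sketch, assuming 1 GB = 10^9 bytes; the clock figures are the ones quoted in this post (not checked against AMD documentation) and the names are made up for illustration.

```python
# Figures below are the poster's, not verified against AMD documentation.
LINK_WIDTH_BYTES = 2   # one 16-bit HyperTransport link
DIRECTIONS = 2         # each link carries traffic both ways at once
LINKS_PER_CPU = 3      # the post assumes three links per Opteron

# Label -> effective transfer rate in MT/s (double the quoted clock).
rates_mt_s = {
    "HTT1, 800 MHz": 1600,
    "HTT2, 1000 MHz": 2000,
    "HTT2, 1400 MHz (claimed server rate)": 2800,
    "HTT3, 2600 MHz": 5200,
}


def per_link_gb_s(rate_mt_s):
    """Peak GB/s for one link, both directions, with 1 GB = 10**9 bytes."""
    return rate_mt_s * 1e6 * LINK_WIDTH_BYTES * DIRECTIONS / 1e9


for label, rate in rates_mt_s.items():
    link = per_link_gb_s(rate)
    total = link * LINKS_PER_CPU
    print(f"{label}: {link:.1f} GB/s per link, {total:.1f} GB/s over three links")
```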

gOJDO
02-07-2007, 01:06 PM
HTT1 was used on Socket 754 CPUs, HTT2 is used on Socket 939, 940 , AM2, and 1207. You are already wrong there with saying "Barcelona will be using HTT1".
Right, it is HTT2. :)

Now, I find it funny you compare Intel's top end FUTURE server FSB, with AMD's Current consumer level HTT. You should also compare AMD's current, AND future top end.
The topic is about Clovertown vs first releases of Barcelona. That's why I was comparing a 1600MHz FSB vs 2000MHz HTT.


Lets break this down.

HTT1--Socket 754--800(1600MHz)--Max 6.4GB/s

HTT2--Socket 939/940/AM2/1207--1000Mhz(2000) For consumer level--8.00GB/s.
Right.

Server level uses 1400 (2800) MHz -- 11.2GB/s x3 for a max of 33.6 GB/s for the Opterons.
No. Servers use 1000MHz(2000) - 8GB/s for max of 24GB/s for the Opterons.

HyperTransport™ technology provides a scalable bandwidth interconnect between processors, I/O subsystems, and other chipsets, with up to three coherent HyperTransport technology links providing up to 24.0 GB/s peak bandwidth per processor
http://www.amd.com/gb-uk/Processors/ProductInformation/0,,30_118_8796_14281,00.html


HTT3--Socket AM3/1207 (more?)--2600 (5200) MHz--20.80 GB/s x3 for max of 62.4GB/s of bandwith
s1207 runs at 2000MHz HTT2, although the CPU might support HTT3 or higher. s1207+ mainboards will support HTT3.
Anyway the first release of MP Barcelona(2xxx and 8xxx) CPUs will be with 1000MHz(2000) HTT2.
http://images.dailytech.com/nimage/3796_large_Opty-list.png
http://www.dailytech.com/AMD+Quadcore+Opteron+Models+Unveiled/article5992.htm

grimREEFER
02-07-2007, 01:13 PM
lotsa bandwith means nothing if the rest of the processor is crap...not saying it will be , but that kinda applies to current amd products.

nn_step
02-07-2007, 01:14 PM
lotsa bandwith means nothing if the rest of the processor is crap...not saying it will be , but that kinda applies to current amd products.
excuse me as I consider how stupid the comment you just said is

[XC] Lead Head
02-07-2007, 01:22 PM
lotsa bandwith means nothing if the rest of the processor is crap...not saying it will be , but that kinda applies to current amd products.

Even compared to C2D they are not crap

Movieman
02-07-2007, 01:31 PM
Can we not get into another of the 37 billion arguments about the benefits of one company's products versus another's?
How many times does it have to be said that it's the competition between these two giants that is good for all of us?
I'm basically an Intel guy but I'd love to have a lot of the systems that AMD has made.
Memory bandwidth, no question, AMD OWNS Intel and that's just a fact.
They just did it better, so why argue over it?
I've got about as fast a server platform as exists from Intel today and testing in SiSoft Sandra 7's memory benchmark shows approx. 7200/7800 MB/s, and AMD just blows that away.
Now as to computational power these 8 cores are just about the same as a 16 core Opty 880 rig.
There's the answer, they both have plusses and minuses..
Thank both companies for if there was just one and no competition between them we'd be using machines comparable to P3-1000's and paying $5000.00 each for them..Think on that a minute please.

Flak Monkey
02-07-2007, 01:36 PM
This gets me excited for Barcelona. If Intel is worried it might actually be something worth getting.

gOJDO
02-07-2007, 01:42 PM
K8 has more than sufficient bandwidth, and it is not crap. It was the best desktop CPU for 2.5 years before C2D, and it is still the best for 4P/8P x86 servers. I am sure that K8L will use that bandwidth better than K8.

rozzyroz
02-07-2007, 01:44 PM
off topic, but back on the original topic, i thought that intel was releasing a chipset that had 2 fsb's? has that idea been scrapped? or is this going to be a 1600fsb x 2?

gOJDO
02-07-2007, 01:48 PM
It will be 2x1600MHz FSB, if the rumor is true. Meanwhile Intel is preparing a quad-FSB chipset with a snoop filter for Penryn, which will be able to keep eight 6MB L2 caches coherent (16 cores, 4 MCM quad-core CPUs).

Nedjo
02-07-2007, 02:02 PM
It's ridiculous to compare Intel's FSB architecture with AMD's HTT+ODMC.

Each core in a Core 2 Duo/Quad or Xeon CPU shares that same FSB for intercommunication and for communication with the memory controller inside the northbridge!

It takes some serious frequency, well beyond 1600 MHz, to match AMD's Direct Connect Architecture and ODMC.

So until Intel comes to market with CSI, there's no point comparing Intel's quad-pumped bus with a HyperTransport link.

Nedjo
02-07-2007, 02:05 PM
Meanwhile Intel is preparing a quad FSB chipset with snoop filter for Penryn, which will be able to cohere 8 6MBL2 caches(16 cores, 4 MCM quadcore CPUs)
Link, please!

gOJDO
02-07-2007, 02:53 PM
Link, please!
Of course ;)
Intel Clarksboro chipset for Caneland platform:
http://techreport.com/etc/2006q4/fall-idf/server-roadmap.gif

We know a little bit more about the plans for the higher-end Xeon MP. The "Caneland" platform will replace today's quad-socket, dual-bus arrangement with a four-socket layout that has point-to-point connections to each of the four sockets. Caneland's Clarksboro chipset will also feature a bus snoop filter with a 64MB cache, intended to cut the bandwidth needed on the bus. One of the processors that will drop into Caneland's four sockets will be the chip code-named "Tigerton," a Core microarchitecture-based CPU that will follow in the mold of the Xeon 7100 series. Tigerton will have quad cores and, one can infer, a large amount of L3 cache onboard like the Xeon 7100.
http://techreport.com/etc/2006q4/fall-idf/index.x?pg=1

Intel quadcore, quad FSB, 16 cores system demonstration in October 2006 video (http://blip.tv/file/get/Vnunet-intel4wayquad505.flv)

gOJDO
02-07-2007, 03:15 PM
It is a very complicated and very expensive northbridge chipset:
http://www.techreport.com/etc/2006q4/tigerton/chipset.jpg
It is handling the 4 FSB's and has 64MB cache, which is more than enough to cohere 4 Penryn quadcore CPUs with 12MB L2 per CPU.
The mainboard for such a monster of a chipset will also be very complicated and very expensive.

doompc
02-07-2007, 03:30 PM
Intel 5000X chipset has 2 FSBs and quad-channel FB-DIMM memory controller.
With 1333FSB and 667MHz FBDIMMs the total memory bandwidth is about 6.6GB/s with Dual Core processors.

With 1600FSB this might get to about 8GB/s, but memory latency will still be much higher than on AMD's.
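
The "about 8GB/s" estimate looks like a straight linear scaling of the 6.6GB/s figure with FSB speed. A minimal sketch of that assumption (hypothetical function name; real systems need not scale linearly, since the FB-DIMM side may become the limit):

```python
def scaled_bandwidth(measured_gb_s, old_fsb_mhz, new_fsb_mhz):
    """Scale a bandwidth figure linearly with FSB speed (a simplifying assumption)."""
    return measured_gb_s * new_fsb_mhz / old_fsb_mhz


estimate = scaled_bandwidth(6.6, 1333, 1600)
print(f"Estimated bandwidth at 1600FSB: {estimate:.1f} GB/s")
```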

But when you add more processors (dual core to quad core) to the system, per-CPU memory bandwidth gets lower on Intel systems; that's what we see in gamepc's Clovertown review:
http://www.gamepc.com/labs/view_content.asp?id=x5355&page=5

However, AMD systems, which are already not bandwidth limited, get more memory bandwidth as we add more processors to the system (2P to 4P) due to NUMA.

But even if that 40% advantage of Barcelona over Clovertown is true, it means that Intel needs a 3.3GHz Clovertown to beat a 2.4GHz Barcelona, which is not difficult to do, especially with 45nm parts.

I think it will be a close race between them.
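
The 3.3GHz figure in this post comes from multiplying the rival clock by the claimed advantage. A quick sketch, assuming performance scales linearly with clock speed (a simplification) and taking the rumored 40% at face value; the function name is hypothetical.

```python
def required_clock_ghz(rival_clock_ghz, rival_advantage):
    """Clock needed to match a rival that is `rival_advantage` faster per clock.

    Assumes performance scales linearly with clock speed, which is a simplification.
    """
    return rival_clock_ghz * (1 + rival_advantage)


needed = required_clock_ghz(2.4, 0.40)
print(f"Clovertown clock needed to match a 2.4GHz Barcelona: {needed:.2f} GHz")
```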

xlink
02-07-2007, 03:45 PM
@MAS
The difference between AMD HTT and Intel FSB is that FSB is a parallel connection, while HTT is a point-to-point connection.

Intel's FSB is 64bit width, AMD's cHT links are 16bit width bi-directional. The first MP Barcelona CPUs will use HTT1 which is 2000MHz, not 5200MHz. So, if we apply the math:
1600MHz * 64bit = 12.8GB/s total bandwidth
2000MHz * 16bit = 4GB/s per direction or 8GB/s total bandwidth

But there are also other factors that are affecting the effectiveness of the CPU connections: ODMC(on-die memory controller), shared caches, large cache capacity, CPU prefetchers, etc.

remember it's point-to-point on the AMD side.
that means there are multiple 4/8GB buses

AMD's always had the bandwidth advantage on the bus system(as of K8) Intel is certainly catching up and in some scenarios wins but in the real world, this is not the case.

if only AMD had better chips... like k10

celemine1Gig
02-07-2007, 03:47 PM
Intel seems to have had 400MHz/1600MHz (quad-pumped) up its sleeve for quite some time now. You only have to take a look at clock generator offerings for, say, i955X boards and you'll notice that they have official support for 400MHz (although the boards don't yet).
So, no real surprise that they are now thinking about actually using the 400MHz FSB soon.

LyP0
02-07-2007, 03:48 PM
And if that happens we will have the same situation as we had with the P4 and Athlon XP, and the P4 (Prescott and later) and the AMD 64. Intel tries to get higher frequencies and AMD tries to do more with every clock pulse. And then the race for GHz will continue, because I don't see Intel or AMD launching an octa-core in the near future.

saaya
02-07-2007, 03:55 PM
We heard Intel might consider bumping the quad-core Clovertown FSB to 1600MHz from the current 1333MHz to match up with the AMD quad-core Barcelona. Another strategy Intel is considering is to pull the 45nm Harpertown schedule forward to this year instead. Harpertown will have 2x6MB of L2 cache and a 1333MHz FSB. Intel is keeping a close watch on AMD's Barcelona to plan its next step.

i thought the fsb doesnt have a big impact on kentsfield performance?
push ahead 45nm chips to 2007? :confused: but they were 2007 and they just pushed it back to 2008 a few weeks ago :confused2

is this real news or more a blog of what some guy speculates? :eh:

Ironmon1
02-07-2007, 04:25 PM
It's ridicules to compare Intel's FSB architecture with AMD HTT+ODMC.

Each Core in Core 2 Duo/Quad or Xeon CPU is sharing that same FSB for intercommunication, and communication to the mem. ctrl inside of northbridge!

It takes some serious frequency, much beyond 1600 MHz to match AMD Direct Connect architectonic, and ODMC.

So until Intel doesn't come on market with CSI, there's no point of comparing Inte's Quad Pumped Bus with Hyper Transport Link
First things first: Core 2 Duo cores intercommunicate over the L2 cache. Also, it has been shown that the FSB isn't the limiting factor in Intel CPUs. In fact, although it is possible to saturate the FSB, it almost never happens.

xlink
02-07-2007, 04:32 PM
First things first: Core 2 Duo cores intercommunicate over the L2 cache. Also, it has been shown that the FSB isn't the limiting factor in Intel CPUs. In fact, although it is possible to saturate the FSB, it almost never happens.
FSB/memory bandwidth is the limiting factor when you have 8+ cores in a server system.

On a desktop system it's not an issue, but when you've got 2-4 times the core count it's a different story.

LyP0
02-07-2007, 04:34 PM
Pushed back? Hmm, I hope 45nm will arrive this year; sounds like a nice upgrade from my current Sempy 2800+. Kentsfield has 4 cores to feed, so I wouldn't be surprised if a higher FSB helped.
btw welcome back saaya

gOJDO
02-07-2007, 04:44 PM
i thought the fsb doesnt have a big impact on kentsfield performance?
It is somewhat different for servers.
1. Servers and desktop computers run different kinds of software and are used by different numbers of users/clients.
2. Servers have much more RAM and handle many more I/O and RAM operations.
3. The speed of the FSB on servers is very important because it affects the communication between the dies of the MCM CPUs, especially when a die from one CPU is communicating with a die of another.

zir_blazer
02-07-2007, 04:54 PM
HTT1 was used on Socket 754 CPUs, HTT2 is used on Socket 939, 940 , AM2, and 1207. You are already wrong there with saying "Barcelona will be using HTT1". Now, I find it funny you compare Intel's top end FUTURE server FSB, with AMD's Current consumer level HTT. You should also compare AMD's current, AND future top end.

Lets break this down.

HTT1--Socket 754--800(1600MHz)--Max 6.4GB/s

HTT2--Socket 939/940/AM2/1207--1000Mhz(2000) For consumer level--8.00GB/s. Server level uses 1400 (2800) MHz -- 11.2GB/s x3 for a max of 33.6 GB/s for the Opterons.

HTT3--Socket AM3/1207 (more?)--2600 (5200) MHz--20.80 GB/s x3 for max of 62.4GB/s of bandwith
Socket 939 Newcastles and Clawhammers used Hyper Transport 1.1. From Winchester onwards, they use Hyper Transport 2 that works at 1.4 GHz Bilinear. There was a press release on the Hyper Transport site about "The new 90nm A64 3000+ and 3200+ that uses HT 2.0" some years ago, but I wasn't able to find it elsewhere. Notice also that many motherboards with nForce 4 and other modern chipsets seem to support a 7x Hyper Transport multiplier for achieving 1.4 GHz without touching the base clock, so I suppose that they DO support HTT 2.0 unofficially.

Cuthalu
02-07-2007, 05:01 PM
Intel seems to have had the 400MHz/1600MHz(quadpumped) up their sleeve for quite some time now. You only have to take a look at clock generator offerings for for example i955x boards and you'll notice that they have official support for 400MHz (although the boards don't yet).
So, no real surprise that they now think about actually using the 400Mhz FSB soon.

Indeed. I've used a 965 at ~1750MHz under full system stress without problems 24/7. I would be surprised if Intel couldn't do well beyond 2GHz when those 45nm processors come out. :)

BrownTown
02-07-2007, 07:37 PM
I don't think FSB speed has anything to do with the process used in the CPU.

[cTx]Philosophy
02-07-2007, 07:43 PM
I just wanna see AMD VS Intels Quad, see who is bringing home the bacon per'se

Cuthalu
02-08-2007, 12:19 AM
I don't think FSB speed has anything to do with the process used in the CPU.

Yes, but logically as time passes by, they have time to optimise or develop new ways to achieve a lot higher FSB.

Nanometer
02-08-2007, 12:34 AM
excuse me as I consider how stupid the comment you just said is

But at the same time we know how much AMD's insane bandwidth is doing for them... Absolutely nothing. AMD has insane amounts of bandwidth, but who cares if that doesn't translate into real-world performance?

[XC] Lead Head
02-08-2007, 03:50 AM
Which account for the TINIEST FRACTION of sales.

If they account for the tiniest fraction of sales, why do Tyan, Supermicro, etc..bother releasing lots of 4-8P Opteron boards?

Motiv
02-08-2007, 04:02 AM
again? I never called you names, it was you who was insulting
what names - thats the truth - you did admit
and it shows, you crap over every good thing said about AMD in this forum with no exceptions

Simply put him on ignore. Wind up merchants rely on people being wound up, so if you believe he is here for no more than that, then place those people on ignore.

gOJDO
02-08-2007, 06:08 AM
that's why they also stil perform right on par with core2duo in gaming :)
Oh...and even PentiumD is on par with both.:rolleyes:

doompc
02-08-2007, 06:56 AM
Objection!

Movieman
02-08-2007, 07:08 AM
Guys! JEEZUS! Will you people please relax a little.
................................. PLEASE!
I honestly can't see why people get so damned upset over this.
AMD's BW is better, That's a fact and I'm an Intel guy saying it.
Facts are facts so accept it and get beyond it.
Intel's computational power I honestly believe is superior, but that could change with AMD's next release.
Let them battle it out and you and I are the winners.
It isn't something worth getting upset at or fighting with the people here over.

gOJDO
02-08-2007, 07:37 AM
I agree. I can't notice the difference in performance in MS SpiderSolitare between my AthlonXP 1.47GHz and my C2D @3.2Ghz:D

Yoxxy
02-08-2007, 08:15 AM
No one seemed to comment that version of Sisoft that nn_step showed is open to a pretty easy bug. Just keep rerunning the app and it will keep increasing your bandwidth because of cache. It is pretty easy to score 15-20 gig/sec on Intel as well :).

xlink
02-08-2007, 09:05 AM
I don't think FSB speed has anything to do with the process used in the CPU.
it certainly does.

if intel moves onto the 45nm process on their CPUs, they have free 65nm fabs for their chipsets.

The Ghost
02-08-2007, 09:59 AM
it certainely does.

if intel moves onto the 45nm process on their CPUs, they have free 65nm fabs for their chipsets.
I don't think that Intel is just going to stop its 65nm fabs when the 45nm CPUs come out; like any fab, it takes a while for a company to ramp up a new one.

intel may have a free 65nm fab by the middle of 2008

the way amd is going , they will never have a free fab

Lithan
02-08-2007, 09:59 AM
Any more personal attacks and this thread is getting locked. Any more personal attacks from the same people, and I'm asking for some vacation time for them. It matters not how cleverly you think you word these. That is all.

s7e9h3n
02-08-2007, 12:07 PM
No one seemed to comment that version of Sisoft that nn_step showed is open to a pretty easy bug....
Maybe cause it's not a bug? :fact:

http://img179.imageshack.us/img179/2549/511memser8.jpg

Yoxxy
02-08-2007, 12:17 PM
I guess until Everest is shown it still seems as if it is prefetch. Hard to believe any Sandra shots these days, as you can get 20,000 meg/sec on Intel as well.

I don't doubt that those are real scores, it's just that the Sandra algorithm is bugged nowadays; try Everest.

EDIT: Also didn't see it is 4 gigs, I guess I am unfamiliar with how 2x processors and 4 dimms would do. Apparently you have shown it does quite well though, but would still like to see everest.

NiCKE^
02-08-2007, 12:18 PM
Lavalys makes Everest ;)

s7e9h3n
02-08-2007, 01:24 PM
I guess until Everest is shown it still seems as if it is prefetch. Hard to believe any Sandra shots these days, as you can get 20,000 meg/sec on Intel as well.

I don't doubt that those are real scores it is just sandra algorithm is bugged now days, try everest.

EDIT: Also didn't see it is 4 gigs, I guess I am unfamiliar with how 2x processors and 4 dimms would do. Apparently you have shown it does quite well though, but would still like to see everest.
Everest doesn't recognize NUMA, or my chipset for that matter :p:

doompc
02-08-2007, 02:36 PM
s7e9h3n, can you test the memory latency for the CPU1 trying to get some data from the memory on CPU2's memory controller ?

s7e9h3n
02-08-2007, 03:03 PM
s7e9h3n, can you test the memory latency for the CPU1 trying to get some data from the memory on CPU2's memory controller ?
What benchmark will give this info?

nn_step
02-08-2007, 03:14 PM
s7e9h3n, can you test the memory latency for the CPU1 trying to get some data from the memory on CPU2's memory controller ?
I'll save him the trouble, that latency is exactly the same as if he was running Socket AM2 with the exact same speed and timings. In other words pretty good

s7e9h3n
02-08-2007, 04:45 PM
I'll save him the trouble, that latency is exactly the same as if he was running Socket AM2 with the exact same speed and timings. In other words pretty good
Actually, it's not....if NUMA is active and node interleaving is disabled, CPU1 (Physical) will have to access memory THROUGH CPU2's (Physical) memory controller. Therefore, an increase in latency will occur. In a single (Physical) cpu setup such as AM2, memory is directly accessed by the cpu(s). I'm still trying to comprehend this SMP stuff, so I may be wrong here......

nn_step
02-08-2007, 04:51 PM
Actually, it's not....if NUMA is active and node interleaving is disabled, CPU1 (Physical) will have to access memory THROUGH CPU2's (Physical) memory controller. Therefore, an increase in latency will occur. In a single (Physical) cpu setup such as AM2, memory is directly accessed by the cpu(s). I'm still trying to comprehend this SMP stuff, so I may be wrong here......
Well, you are partially correct. If one processor needs to access another processor's memory, there is a penalty involved (which grows as the number of processors increases). However, NUMA is an implementation that localizes data in the memory closest to (i.e. fastest for) the CPU that is doing work with that data.
Thus (in theory) the access latency should be identical to that of a single processor, when reading data from its own Memory but greater when accessing the memory of another processor.

doompc
02-08-2007, 05:30 PM
That's exactly what I want to know, how big is that penalty?

It would take at least:
1 HTT cycle to ask for the other processor's attention
1 to send the command
4 to send the address (HTT is 16 bits wide, a memory address is 64 bits long)
+ the time the other CPU would take to get that data (in memory or cache)
1 HTT cycle to ask for CPU1's attention
1 cycle to send the data (if it is just 1 byte)

Since HTT works at 2GT/s, 8 cycles mean 4ns on top of the memory latency.
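
The estimate above can be written out as a small sketch. It follows this post's simplified step list, not the actual HyperTransport packet format, and the function names and defaults are made up for illustration.

```python
import math


def link_transfers(address_bits=64, link_width_bits=16, payload_transfers=1):
    """Count link transfers in the simplified request/reply model from the post."""
    request = 1 + 1 + math.ceil(address_bits / link_width_bits)  # attention + command + address
    reply = 1 + payload_transfers                                # attention + data
    return request + reply


def transfers_to_ns(transfers, rate_gt_s=2.0):
    """Time on the link: at 1 GT/s one transfer takes 1 ns."""
    return transfers / rate_gt_s


n = link_transfers()
print(f"{n} transfers take {transfers_to_ns(n):.1f} ns at 2 GT/s, on top of the memory access itself")
```

This only counts time on the wire; the remote controller's own lookup time is extra and is not modeled here.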

Question, do the CPUs "ask each other" if they have some data in cache if there is a cache miss?
It would be better to ask data that other cpu has in cache than to access the memory...

accord99
02-09-2007, 01:33 AM
That's exactly what I want to know, how big is that penalty?

It would take at least:
1 HTT cycle to ask for the other processor's attention
1 to send the command
4 to send the address (HTT is 16 bits wide, a memory address is 64 bits long)
+ the time the other CPU would take to get that data (in memory or cache)
1 HTT cycle to ask for CPU1's attention
1 cycle to send the data (if it is just 1 byte)

Since HTT works at 2GT/s, 8 cycles mean 4ns on top of the memory latency.

Techreport's latency test suggests around a 10ns increase in memory latency with a second socket.

http://www.techreport.com/reviews/2006q4/quad-fx/index.x?pg=5



Question, do the CPUs "ask each other" if they have some data in cache if there is a cache miss?
It would be better to ask data that other cpu has in cache than to access the memory...
Opterons have to ask every other socket first before accessing local memory.

doompc
02-09-2007, 05:16 AM
Thank you accord99.

nn_step
02-09-2007, 08:02 AM
this should help you understand it better
http://www.realworldtech.com/page.cfm?ArticleID=RWT121106171654

saaya
02-09-2007, 03:32 PM
It is somehow different for servers.
1. servers and desktop computers are running different kinds of software and are being exploited by different number of users/clients.
2. servers have much more RAM and are handling much more I/O & RAM operations
3. the speed of the FSB on the servers is very important because it affects the communication between the dies of the MCM CPUs, especially when a die from one CPU is communicating with a die of another

I thought this was about desktop chips. You think they will bump the FSB of servers to 1600? I don't think so; they are usually very careful with that, since servers need to be pretty stable.

gOJDO
02-09-2007, 06:25 PM
@Saaya
I really don't think that Intel will introduce FSB 1600, but it still is an option. Anyway, faster FSB will improve the performance scalability of MP C2 servers.

nn_step
02-09-2007, 06:28 PM
@Saaya
I really don't think that Intel will introduce FSB 1600, but it still is an option. Anyway, faster FSB will improve the performance scalability of MP C2D servers.
Not really, considering Intel has yet to bring Conroe and Kentsfield to the 4+P world; they are still competing against K8 with Prescott until Tigerton.