K10 Scores starting to surface

**Thesavage** · 09-01-2007, 11:30 AM

Also has anybody noticed how small the page file was during these benchmarks?
http://forum.coolaler.com/showthread...161127&page=12

**Movieman** · 09-01-2007, 11:31 AM

Almost puts on Mod hat but decides not to..yet:
I can see this heading downhill in a hurry..

**Zytek_Fan** · 09-01-2007, 12:32 PM

Originally Posted by Movieman

Almost puts on Mod hat but decides not to..yet:
I can see this heading downhill in a hurry..

All the K10 threads go downhill in a hurry

**informal** · 09-01-2007, 12:50 PM

Originally Posted by mstp2009

Yeah, AMD isn't a souce.

First it's not a souce for sure

Second, AMD never ever mentioned the L3 cache latency in your linked document, so no, they did not say what the value is in that document.
Like I said, no source on your side that disputes my link.

**mstp2009** · 09-01-2007, 12:57 PM

Originally Posted by informal

First it's not a souce for sure

Second, AMD never ever mentioned the L3 cache latency in your linked document, so no, they did not say what the value is in that document.
Like I said, no source on your side that disputes my link.

And your link could be written by a 12 yo expressing his OPINION.

MY POINT was that AMD themselves have NOT specified that latency is variable.

Since variable latency would be a significant deviation from current CPU design specs (with questionable benefit, if any), the assumption should be made that the L3 cache has a fixed latency (i.e. a fixed number of cycles).

But in AMD Fanboi land, anything is possible.

How's that REVERSE Hyperthreading coming along?

**~~BeardyMan~~** · 09-01-2007, 12:59 PM

Originally Posted by informal

First it's not a souce for sure

Second, AMD never ever mentioned the L3 cache latency in your linked document, so no, they did not say what the value is in that document.
Like I said, no source on your side that disputes my link.

Is it me or did you miss an essential part of his sentence?

"We can see what AMD might have done in K10's L3 cache."

he doesn't claim anything...

**informal** · 09-01-2007, 01:08 PM

Originally Posted by mstp2009

And your link could be written by a 12 yo expressing his OPINION.

MY POINT was that AMD themselves have NOT specified that latency is variable.

Since variable latency would be a significant deviation from current CPU design specs (with questionable benefit, if any), the assumption should be made that the L3 cache has a fixed latency (i.e. a fixed number of cycles).

But in AMD Fanboi land, anything is possible.

How's that REVERSE Hyperthreading coming along?

So the techarp quote is not good enough? You still think it was a 12 yo kid who wrote it (and not the 12 yo who is disagreeing with it)?

Let's see, would one man named David Kanter, a little older than the supposed techarp kid of 12 years, be a good enough source for you?

Start rolling on that floor and laughing at yourself, as of now:

http://www.realworldtech.com/page.cf...1607033728&p=7

Originally Posted by David Kanter

A round-robin algorithm is used to give access to one of the four cores each cycle. The latency to the L3 cache has not been disclosed, but it depends on the relative northbridge and core frequencies – for reasons which we will see later.

I made it in red in case you can't see it from that big avatar...

In gamer's terms: pwnd.

Edit:
even more pwng coming along:

From a circuit level perspective, the changes between the K8 and Barcelona were extremely significant. Barcelona is specified to operate at a wide range of voltages, from 0.8-1.4V. However, unlike its predecessor, each core in Barcelona has a dedicated clock distribution system (including PLL) and power grid. The frequency for each core is independent of both the other cores, and the various non-core regions; the voltage for all four cores is shared, but separate from the non-core. As a result, power can be aggressively managed by lowering frequency and voltage whenever possible. To support independent clocking and modular design, asynchronous dynamic FIFO buffers are used to communicate between different cores and the northbridge/L3 cache. These FIFOs absorb any global skew or clock rate variation, but the latency for passing through depends on the skew and frequency variance – which is why the L3 cache latency is variable. The northbridge and L3 cache compose roughly 20% of the die and share a voltage and clock domain that is independent of the four cores, which is essential for mobile applications. Previously, the northbridge clock and voltage was tied to the processors, so systems with integrated graphics could not reduce the processor voltage or frequency to deep power saving states. Separate sleep states, voltages and frequencies for the northbridge and processors should lower AMD’s average power dissipation which will help in the mobile market.

**informal** · 09-01-2007, 01:09 PM

Originally Posted by BeardyMan

Is it me or did you miss an essential part of his sentence?

"We can see what AMD might have done in K10's L3 cache."

he doesn't claim anything...

I think you quoted the wrong man, since I did give the link for that patent...

**mstp2009** · 09-01-2007, 01:12 PM

Originally Posted by informal

So the techarp quote is not good enough? You still think it was a 12 yo kid who wrote it (and not the 12 yo who is disagreeing with it)?

Let's see, would one man named David Kanter, a little older than the supposed techarp kid of 12 years, be a good enough source for you?

Start rolling on that floor and laughing at yourself, as of now:

http://www.realworldtech.com/page.cf...1607033728&p=7

I made it in red in case you can't see it from that big avatar...

In gamer's terms: pwnd.

WTF? That's not pwnage you n00b.

It said "The latency to the L3 cache has not been disclosed . . .".

"but it depends on the relative northbridge and core frequencies" simply means THEY DON'T KNOW. Not that it is variable.

Go back and pass your GMAT and then we will talk. Reading comprehension seems to be a problem for you.

We're still waiting on the glorious Reverse Hyperthreading, BTW.

**informal** · 09-01-2007, 01:17 PM

Originally Posted by mstp2009

WTF? That's not pwnage you n00b.

It said "The latency to the L3 cache has not been disclosed . . .".

"but it depends on the relative northbridge and core frequencies" simply means THEY DON'T KNOW. Not that it is variable.

Go back and pass your GMAT and then we will talk. Reading comprehension seems to be a problem for you.

We're still waiting on the glorious Reverse Hyperthreading, BTW.

LMAO, you really are clueless, aren't you? And having the nerve to call me a "noob". Sure thing.
Go ahead genius, read the next page:

From a circuit level perspective, the changes between the K8 and Barcelona were extremely significant. Barcelona is specified to operate at a wide range of voltages, from 0.8-1.4V. However, unlike its predecessor, each core in Barcelona has a dedicated clock distribution system (including PLL) and power grid. The frequency for each core is independent of both the other cores, and the various non-core regions; the voltage for all four cores is shared, but separate from the non-core. As a result, power can be aggressively managed by lowering frequency and voltage whenever possible. To support independent clocking and modular design, asynchronous dynamic FIFO buffers are used to communicate between different cores and the northbridge/L3 cache. These FIFOs absorb any global skew or clock rate variation, but the latency for passing through depends on the skew and frequency variance – which is why the L3 cache latency is variable. The northbridge and L3 cache compose roughly 20% of the die and share a voltage and clock domain that is independent of the four cores, which is essential for mobile applications. Previously, the northbridge clock and voltage was tied to the processors, so systems with integrated graphics could not reduce the processor voltage or frequency to deep power saving states. Separate sleep states, voltages and frequencies for the northbridge and processors should lower AMD’s average power dissipation which will help in the mobile market.

Pwnd!

**mstp2009** · 09-01-2007, 01:20 PM

Originally Posted by informal

And having the nerve to call me a "noob". Sure thing.

If the shoe fits . . .

You know the one thing you accomplished here - to really show off your interpersonal skills.

You are almost as annoying as a Mac user. Almost.

Reminds me of an argument I saw you have last year about the SSE units of K10.

You swore up and down that they were double in number and not bit size (64-128). You got pwned there and still would never admit you were wrong.

**informal** · 09-01-2007, 01:24 PM

Originally Posted by mstp2009

If the shoe fits . . .

You know the one thing you accomplished here - to really show off your interpersonal skills.

You are almost as annoying as a Mac user. Almost.

And you still don't have the guts to accept the fact that you were wrong! And when you can't counter any more you start name calling... Really shows a lot about your character, doesn't it?

Talk about annoying, when I argue with a man who accepts no facts unless they fit his prearranged view.

PS Aren't you the guy who got banned at AMDzone for heavy trolling in every thread? Was there another forum, [H]?
PPS Did you give proof for 3GHz 25W Penryns for laptops? Oh wait, it is the same "God" analogy coming again, no?

**onewingedangel** · 09-01-2007, 01:27 PM

Why can't everyone just wait the week or two left? Then a lot of arguments can be settled...

**~~BeardyMan~~** · 09-01-2007, 01:29 PM

Originally Posted by onewingedangel

Why can't everyone just wait the week or two left? Then a lot of arguments can be settled...

Indeed, many are selling the skin of the bear before it's even nailed down by a nice winchester

**JumpingJack** · 09-01-2007, 01:31 PM

Originally Posted by informal

LMAO, you really are clueless, aren't you? And having the nerve to call me a "noob". Sure thing.
Go ahead genius, read the next page:
Pwnd!

Informal, be careful and read Kanter's note carefully, this is beginning to come into focus.... it was a good link... but you may be misinterpreting what Kanter is saying....

I am, myself, trying to understand this at a level of detail that makes sense, this is much more complicated than what we are assuming....

Each core (that is execution core and the dedicated cache) will be clocked independently, a major power saving feature of K10. So in order to share a cache at L3 level, it will need to send data asynchronously to differently clocked cores... wow, this is complicated.... so what AMD has done (per Kanter) is build a 'translator', or a FIFO buffer to send data to and from the L3 -- this is not the same as dynamically adjusting L3 clock or latency, what it is doing is dynamically adjusting a clock divider to synch L3 with variable speed cores, now this variable L3 latency makes much much more sense.

Any asynchronous communication will incur extra latency (over a simple 1:1) simply as a result of clock mismatch ... this is a given ... (this is why C2D shows a dip in performance in DDR2-533 to DDR2-667 to DDR2-800 as dividers beyond 1:1 introduce extra latency).

So with this understanding, the observed latency (which is actually the important part) will be variable, not because the L3 cache has variable latency but because it has to be synchronized through the FIFO buffers to cores of variable clocks.

Damn should have paid more attention to Kanter's article too....

Guys I am learning a lot here... thanks.

Jack

**mstp2009** · 09-01-2007, 01:32 PM

Originally Posted by informal

PS Aren't you the guy who got banned at AMDzone for heavy trolling in every thread? Was there another forum, [H]?

Nope. Never graced that forum.

PPS Did you give proof for 3GHz 25W Penryns for laptops? Oh wait, it is the same "God" analogy coming again, no?

My statement was that they were possible w/ 45nm high-K. 35W 2.8GHz ones were listed. And they weren't even low-voltage.

Pwned AGAIN.

**Donnie27** · 09-01-2007, 01:32 PM

Originally Posted by Zytek_Fan

All the K10 threads go downhill in a hurry

Can't blame it on me LOL!

**Movieman** · 09-01-2007, 01:35 PM

moving this thread to the AMD section per several requests..

**mstp2009** · 09-01-2007, 01:35 PM

Originally Posted by JumpingJack

Informal, be careful and read Kanter's note carefully, this is beginning to come into focus.... it was a good link... but you may be misinterpreting what Kanter is saying....

I am, myself, trying to understand this at a level of detail that makes sense, this is much more complicated than what we are assuming....

Each core (that is execution core and the dedicated cache) will be clocked independently, a major power saving feature of K10. So in order to share a cache at L3 level, it will need to send data asynchronously to differently clocked cores... wow, this is complicated.... so what AMD has done (per Kanter) is build a 'translator', or a FIFO buffer to send data to and from the L3 -- this is not the same as dynamically adjusting L3 clock or latency, what it is doing is dynamically adjusting a clock divider to synch L3 with variable speed cores, now this variable L3 latency makes much much more sense.

Any asynchronous communication will incur extra latency (over a simple 1:1) simply as a result of clock mismatch ... this is a given ... (this is why C2D shows a dip in performance in DDR2-533 to DDR2-667 to DDR2-800 as dividers beyond 1:1 introduce extra latency).

So with this understanding, the observed latency (which is actually the important part) will be variable, not because the L3 cache has variable latency but because it has to be synchronized through the FIFO buffers to cores of variable clocks.

Damn should have paid more attention to Kanter's article too....

Guys I am learning a lot here... thanks.

Jack

Thank you Jack.

So absolute latency - the transfer from FIFO buffer to L3 - is constant; it would just be the fill time of the FIFO buffer that is variable b/c it has async communication with each of the 4 cores (unless they are at full speed, one would assume).

The observed latency would be:

Core to FIFO Buf latency + FIFO Buf to L3 latency

Correct?
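
To put rough numbers on that sum, here is a minimal Python sketch. It assumes the two-term model exactly as written above; every value (core clock, northbridge clock, L3 cycle count, FIFO delay) is a made-up illustration input, not a disclosed K10 figure, and a real FIFO delay would itself move with the clock ratio rather than stay fixed.

```python
def observed_l3_latency(core_ghz, nb_ghz, l3_nb_cycles, fifo_ns):
    """Toy two-term model: observed latency = FIFO crossing + fixed L3 access.

    All four parameters are hypothetical illustration inputs.
    Returns (nanoseconds, core cycles).
    """
    l3_ns = l3_nb_cycles / nb_ghz   # fixed cycle count in the northbridge clock domain
    total_ns = fifo_ns + l3_ns      # core-to-FIFO part plus FIFO-to-L3 part
    return total_ns, total_ns * core_ghz


# Same L3 and same FIFO delay, seen from a full-speed core and a throttled one.
full_speed = observed_l3_latency(core_ghz=2.0, nb_ghz=2.0, l3_nb_cycles=20, fifo_ns=2.0)
throttled = observed_l3_latency(core_ghz=1.0, nb_ghz=2.0, l3_nb_cycles=20, fifo_ns=2.0)
print(full_speed, throttled)
```

In this sketch the time in nanoseconds is identical for both cores, but the count in core cycles differs, which is one way a "fixed" L3 can still look variable from the core side.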

**JumpingJack** · 09-01-2007, 01:37 PM

Originally Posted by onewingedangel

Why can't everyone just wait the week or two left? Then a lot of arguments can be settled...

Nahhhh, we would get bored.

**~~BeardyMan~~** · 09-01-2007, 01:37 PM

Originally Posted by informal

I think you quoted the wrong man, since I did give the link for that patent...

I know, you may slap me for that

**JumpingJack** · 09-01-2007, 01:43 PM

Originally Posted by mstp2009

Thank you Jack.

So absolute latency - the transfer from FIFO buffer to L3 - is constant; it would just be the fill time of the FIFO buffer that is variable b/c it has async communication with each of the 4 cores (unless they are at full speed, one would assume).

The observed latency would be:

Core to FIFO Buf latency + FIFO Buf to L3 latency

Correct?

Yeah, and I cannot believe I was so ignorant not to think this through. The L3 cache is shared, but each core is throttled independently. The overall intrinsic latency of the L3 will be like any cache, it is fixed by the size and quality of the process technology as well as the speed paths set at design.

However, since each core will throttle depending on load, the clock for each core can be different than L3... necessitating an asynchronous bus such that each core can still access the data....

Now, simple asynchronous communications will always have variable latency (as a function of the ratio or divider) as one agent will need to wait on the other at some point. Example: say you have a divider of 6:5, and let's call the agents A and B, so it is 6:5 A:B. To make this easy, let's say a 1-bit line, so in 5 clock ticks it will send 5 bits for agent B, but agent A has put 6 clock ticks into the queue, so one cycle will be left hanging until the next revolution around.... temporally this would make no difference, but agent A is only as fast as agent B.....

But you also have to add into the mix the physical latency of the circuit to do this work.... it is a trade off, one that AMD obviously believes is better in the long run... so long as L3 'observed' latency is much less than that to main memory, there is a benefit.
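
The 6:5 example can be played through in a few lines of Python. This is only a toy model under loud assumptions: both clocks start aligned, a value sent on an A tick is picked up on the next B edge at or after it, and queueing and synchronizer delays are ignored. The function name and the choice of time unit are made up for illustration and say nothing about how AMD's FIFOs actually work.

```python
from fractions import Fraction
from math import ceil


def crossing_waits(a_ticks, b_ticks, n):
    """Wait from sender tick k to the next receiver edge, for k = 0..n-1.

    Time unit: the interval in which A ticks a_ticks times and B ticks
    b_ticks times. Toy model only; no queueing, no synchronizer delay.
    """
    waits = []
    for k in range(n):
        sent = Fraction(k, a_ticks)                         # time of A's k-th tick
        received = Fraction(ceil(sent * b_ticks), b_ticks)  # next B edge at or after it
        waits.append(received - sent)
    return waits


waits = crossing_waits(6, 5, 7)
print([str(w) for w in waits])
```

The wait is different for every tick in the cycle and then repeats once the two clocks line up again, so the crossing delay depends on where in the 6:5 pattern a request happens to land.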

**informal** · 09-01-2007, 01:49 PM

However, since each core will throttle depending on load, the clock for each core will be different than L3... necessitating an asynchronous bus such that each core can still access the data....

Now, simply asynchronous communications will always have variable latency as one agent will need to wait on the other at some point. Example: say you have a divider of 6:5, and let's call the agents A and B, so it is 6:5 A:B. To make this easy, let's say a 1-bit line, so in 5 clock ticks it will send 5 bits for agent B, but agent A has put 6 clock ticks into the queue, so one cycle will be left hanging until the next revolution around.... temporally this would make no difference, but agent A is only as fast as agent B.....

Yes Jack,this is a simplified reason why it occurs.
I thought you saw Kanter's article long ago(since it is online since the middle of May i think)

**JumpingJack** · 09-01-2007, 01:51 PM

Originally Posted by informal

Yes Jack,this is a simplified reason why it occurs.
I thought you saw Kanter's article long ago(since it is online since the middle of May i think)

I did, and I read it too... it just did not register enough to turn on the lights as this topic has evolved.

**Lightman** · 09-01-2007, 02:05 PM

Originally Posted by JumpingJack

Informal, be careful and read Kanter's note carefully, this is beginning to come into focus.... it was a good link... but you may be misinterpreting what Kanter is saying....

I am, myself, trying to understand this at a level of detail that makes sense, this is much more complicated than what we are assuming....

Each core (that is execution core and the dedicated cache) will be clocked independently, a major power saving feature of K10. So in order to share a cache at L3 level, it will need to send data asynchronously to differently clocked cores... wow, this is complicated.... so what AMD has done (per Kanter) is build a 'translator', or a FIFO buffer to send data to and from the L3 -- this is not the same as dynamically adjusting L3 clock or latency, what it is doing is dynamically adjusting a clock divider to synch L3 with variable speed cores, now this variable L3 latency makes much much more sense.

Any asynchronous communication will incur extra latency (over a simple 1:1) simply as a result of clock mismatch ... this is a given ... (this is why C2D shows a dip in performance in DDR2-533 to DDR2-667 to DDR2-800 as dividers beyond 1:1 introduce extra latency).

So with this understanding, the observed latency (which is actually the important part) will be variable, not because the L3 cache has variable latency but because it has to be synchronized through the FIFO buffers to cores of variable clocks.

Damn should have paid more attention to Kanter's article too....

Guys I am learning a lot here... thanks.

Jack

This is not entirely true...

I read somewhere a long time ago that the L3 in K10 acts more like a memory layer. In other words, it is clocked by the IMC independently from all 4 cores, and on a diagram I would put it after the crossbar...
That's why L3 latency can vary from the core's point of view (the cache latency itself is probably constant). It is similar to how DDR2-800 latency (again from the CPU's point of view) is different compared to DDR2-667 (same timings of course).
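
The DDR2 comparison is easy to check with arithmetic. A minimal Python sketch, assuming CL5 for both speeds (the timing value is picked for illustration, the post only says "same timings") and using the usual rule that the DDR2 clock is half the data rate:

```python
def cas_latency_ns(data_rate_mts, cas_cycles):
    """CAS latency in nanoseconds for a DDR part.

    The memory clock is half the data rate (two transfers per clock),
    and CAS latency is counted in clock cycles.
    """
    clock_mhz = data_rate_mts / 2
    return cas_cycles * 1000 / clock_mhz


ddr2_800 = cas_latency_ns(800, 5)   # CL5 is an assumed timing
ddr2_667 = cas_latency_ns(667, 5)
print(f"DDR2-800 CL5: {ddr2_800:.2f} ns, DDR2-667 CL5: {ddr2_667:.2f} ns")
```

Same cycle count, different wall-clock time, which is the same effect being described for the L3 as seen from a core.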

Edit: JumpingJack, you're typing too fast

I had barely read page 16 and typed my response, and here, surprise! Another page with new info making my post partially obsolete
