K8L, just like K8, is going to be featured in a few sockets.Quote:
Originally Posted by awhir
And like always, any socket that it fits into (without the removal or bending of pins) will work, though not always at optimal efficiency.
Interesting:) ....I have to get more info on this AMD stuff.
The way K10 will achieve its 40% advantage:
http://img201.imageshack.us/img201/8...taskingzq7.png
:)
We've already had confirmation that 40% overall is simply not happening
Randy Allen and Patrik Patla (AMD directors) told us about 40 per cent, and suddenly brentpresley appears and tells usQuote:
that 40% overall is simply not happening
Take a better look at the picture: the 40% advantage is for a heavy multitasking environment; the 10% is for single-threaded applications.
Funny, in my favorite programming class the teacher whipped code optimization and code splitting into us: knowing when to use integer approximation and when to do massively parallel floating point. Fun class, but definitely not for beginners.Quote:
Originally Posted by brentpresley
Everyone that owns a C2D depends on how well it OCs!!! ... they'd be 30%-50% less powerful if they didn't scale as well.Quote:
Originally Posted by brentpresley
Well....how many of you power users OC your C2D's? 100% of you?
When you start talkin' OC's then the performance gap will only grow further.
We all know a C2D HAS to be overclocked in order to attain the performance levels everyone talks of, so why should it be any different for AMD?
All we can go off are performance estimates and a speculative 10% from s7's under-NDA guy.
If that is the only figure we know of, then we can probably expect around 15% more speed than a C2 (clock for clock). Until other figures are released, it is nothing but a pointless argument.
Agreed :)Quote:
Originally Posted by brentpresley
Hear, hear :)Quote:
Originally Posted by Motiv
Just to finish the SSE FUD that somehow got started: look at C2D vs. CD. Is the C2D like 6x faster? C2D has 6x higher potential SSE throughput, but it's not really that much of the total code that's SSE.
Less dreaming, more reality please. x87-to-SSE patches for games don't even bring that much.
And SSE is still widely missing in many places... MS tries to force this with no x87 in 64-bit, but mandatory SSE. However... don't dream... SSE is a nice boost, but no miracle. It's more a matter of cleaning up the stupid x87 and getting it removed from the CPU over time.
Absolutely, and fortunately a well-made and well-documented program can be updated rather quickly. I remember helping on a project to convert an audio encryption routine from integer to SSE3; it took a couple of days, but the performance boost was huge.
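Not that project, obviously, but for anyone curious what this kind of conversion looks like in practice, here is a minimal sketch: a scalar loop over 16-bit audio samples next to the same loop rewritten with SSE intrinsics. I've used plain SSE2 XOR for simplicity; the function names, the XOR-keystream "encryption" and the sample format are all just assumptions for illustration.
Code:
/* Hypothetical example: "encrypt" 16-bit audio samples by XORing them
 * with a keystream.  Scalar version vs. SSE2 version of the same loop. */
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

void scramble_scalar(int16_t *samples, const int16_t *key, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] ^= key[i];                      /* one sample per iteration */
}

void scramble_sse2(int16_t *samples, const int16_t *key, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                   /* eight samples per iteration */
        __m128i s = _mm_loadu_si128((const __m128i *)(samples + i));
        __m128i k = _mm_loadu_si128((const __m128i *)(key + i));
        _mm_storeu_si128((__m128i *)(samples + i), _mm_xor_si128(s, k));
    }
    for (; i < n; i++)                             /* scalar tail */
        samples[i] ^= key[i];
}
The vector version touches eight samples per instruction, which is where that kind of boost comes from - but as the quoted post says, it only helps for the fraction of the run time actually spent in loops like this.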
Quote:
Originally Posted by brentpresley
So it's ultimately about how important performance is to you.
the clue is "estimates"
I hope it's at least half true ;) it would still give C2D a run for its money.
I definitely agree with you there. Heck, take ten seconds to look at Microsoft source code and you'll wonder how the hell they got it to run. Some of them just seem to love their "goto" statements. But I must admit their binary interfaces and the assembly they use for them are extremely well made.Quote:
Originally Posted by brentpresley
Unfortunately the technically skilled aren't the ones writing the most code.
And if you really want to see a 300% speed increase, transcribe .Net programs to pure C code. Talk about a huge improvement.
Some interesting bits:
Quote:
A 65nm silicon-on-insulator process is used for producing the near-450-million-transistor device, with dual stress liners, and a silicon-germanium process is used to speed up the pFETs. Eleven layers of copper and low-k dielectrics connect the device.
At 95 degrees Celsius, modelling suggests the processor will run at between 2.2 and 2.8GHz at 1.15 volts. Each of the four cores includes eight temperature sensors. The on-chip northbridge contains a further six.
The memory interface is 400 to 800Mbps from a 1.7 to 1.9 volt supply for DDR2, and 800 to 1,600Mbps from 1.4 to 1.6 volts for DDR3.
The HyperTransport interface supports legacy HT1 and HT2 modes as well as HT3 at 2.4Gbps, with a peak of 5.2Gbps.
Source: http://www.edn.com/article/CA6415782.html?partner=enews
Enjoy!
:)
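Side note on those memory numbers: the per-pin data rates above are what turn into the aggregate bandwidth figures quoted later in the thread (e.g. 12.8GB/s for a single on-die controller). A quick back-of-the-envelope in C, assuming the usual dual-channel 2x64-bit (128-bit total) integrated memory controller - theoretical peak only, not sustained throughput:
Code:
/* Peak-bandwidth arithmetic for the quoted DDR2/DDR3 data rates.
 * Assumes a 128-bit (dual-channel) memory interface; illustrative only. */
#include <stdio.h>

int main(void)
{
    const double bus_bytes = 128.0 / 8.0;                  /* 16 bytes per transfer */
    const double rates_mtps[] = { 400.0, 800.0, 1600.0 };  /* MT/s, from the article */

    for (int i = 0; i < 3; i++) {
        double gb_per_s = rates_mtps[i] * 1e6 * bus_bytes / 1e9;
        printf("%4.0f MT/s x 16 bytes = %.1f GB/s peak\n", rates_mtps[i], gb_per_s);
    }
    return 0;   /* DDR2-800 works out to 12.8 GB/s, the figure quoted below */
}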
But it's not slower; sometimes it's faster, sometimes it's slower depending on the application, just like the K8. Overall, it still remains the fastest 64-bit x86 processor available today.Quote:
Originally Posted by LOE
We see one or two unrealistic scenarios where this happens, and they require specific situations that benefit from the Quad FX's additional memory controller. However, in a single-socket system, the desktop versions of Barcelona will only have 1 memory controller and 12.8GB/s of memory bandwidth.Quote:
Heavy multithreading - we already see Quad FX running inferior chips outperforming Core 2 Quad in heavy multithreaded scenarios; that gap will only grow bigger when K10 comes out.
Most other heavy multi-threaded scenarios have the QX6700 beating the Quad FX just as easily as it does in single-threaded scenarios.
A C2D can execute 1 128-bit multiply, 1 128-bit add plus a load, store and jump in the same cycle.Quote:
Are you sure? :nono: C2D can process one 128-bit SSE instruction per cycle; do you mean the Pentium has a 21.33-bit (128/6) SSE engine? :rofl:
:up: :party3:Quote:
Originally Posted by Lightman
All CPUs speed up in 64-bit due to the larger number of registers and the standard SSE2 instructions.Quote:
Originally Posted by accord99
But Core2 does not speed up as much as K8 since MacroFusion doesn't work in long mode.
On SSE execution K10 has a small advantage.
Core2 has 3 SSE units plus one load unit and one store unit.
K8 has 3 FPUs (which execute SSE) plus a load/store unit that does two loads/stores per cycle; on K10 the FPUs are widened to 128 bits, so it can do 3 128-bit SSE ops per cycle plus 2 loads/stores.
So Core2 does 3 SSE ops, 1 load and 1 store per cycle. K10 does 3 SSE ops plus either 1 load and 1 store, 2 loads, or 2 stores.
http://www.xbitlabs.com/articles/cpu...amd-k8l_5.html
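To make those per-cycle numbers a bit more concrete, here is the kind of loop the argument is really about: a single-precision y += a*x kernel written with SSE intrinsics. Each unrolled step wants one 128-bit multiply, one 128-bit add, two loads and a store, so the execution-unit and load/store limits listed above map almost directly onto its peak throughput. Purely an illustration - the function name and structure are mine, not from either vendor's documentation.
Code:
/* Illustration: an SSE "y += a*x" loop whose steady-state rate is bounded
 * by how many 128-bit SSE ops, loads and stores the core can issue per cycle. */
#include <xmmintrin.h>   /* SSE intrinsics */
#include <stddef.h>

void saxpy_sse(float *y, const float *x, float a, size_t n)
{
    __m128 va = _mm_set1_ps(a);                      /* broadcast the scalar */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);             /* load  */
        __m128 vy = _mm_loadu_ps(y + i);             /* load  */
        vy = _mm_add_ps(vy, _mm_mul_ps(va, vx));     /* mulps + addps */
        _mm_storeu_ps(y + i, vy);                    /* store */
    }
    for (; i < n; i++)                               /* scalar tail */
        y[i] += a * x[i];
}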
Quote:
Originally Posted by savantu
I don't know why 0.9 to 1.2 keeps sticking in my head, but on some code bases, yes, the P4 could do that 0.9 to 1.2 (some apps within the SPECint bench showed numbers this high):
http://www.princeton.edu/~jdonald/re...uck_pact03.pdf
(EDIT: it was reading this paper some time ago that stuck 0.9 to 1.2 in my head, because my first thought was wow... a P4 can actually do that :) )Quote:
The benchmarks that perform best in this environment are mcf, art and swim at 93%, 97% and 98% of peak respectively. eon and wupwise have relatively high instruction throughput of 0.9 and 1.2 IPC respectively, while mcf and swim have relatively low IPCs of .08, .2 and .4 (all IPCs measured in ops). Not unexpectedly, then, those applications with low instruction throughput demands due to poor memory performance are less affected by the statically partitioned execution resources. See Figure 1 for a summary of results from these runs.
IPC, of course, is very code dependent (compiler optimizations, instruction ordering, etc.) and depends on how efficiently the architecture extracts ILP, combined with all sorts of other factors. The truth is I have looked over probably half a dozen to a dozen papers where IPC is measured/calculated; HT helps, and I have seen IPC as high as 1.6 on some code bases. However, the original point stands that it really, really stunk in a general sense... a long pipeline running code not optimized for that situation will generally crater the efficiency.
Another example of how well and how poorly the P4 can do IPC-wise:
http://www.geocities.com/ykchen913/p...ions/CAECW.pdf
In H.264, the IDCT chain could get as high as 1.16 (see Table 4). This is a good paper, as it also shows that FSB utilization on a P4 is quite low even with a high L2 miss rate... this is on a 533MHz FSB... and multimedia is likely to place the highest demand on the FSB.
Anyway, I do believe C2D is significantly higher than 1.0 IPC on average (some workloads will be low, of course, others high), but I have not found any studies or data that have measured it.
Barcelona appears to be heading for a good IPC boost; achieving something higher than C2D will be a true accomplishment, since C2D already did a good job in this department. I am anxious to see the data.
Jack
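For anyone wondering where numbers like those come from: IPC is just retired instructions divided by core clock cycles, and on real hardware both counts come from the CPU's performance counters. As a rough sketch of the idea, here's a minimal Linux perf_event example - a much more recent interface than the tools of that era, and the dummy workload, counter selection and stripped error handling are all mine:
Code:
/* Sketch: measure IPC of a code region via Linux perf_event
 * (instructions retired / core cycles).  Error handling trimmed. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <asm/unistd.h>
#include <sys/ioctl.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;           /* start stopped, enable around the region */
    attr.exclude_kernel = 1;
    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    int fd_ins = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    int fd_cyc = open_counter(PERF_COUNT_HW_CPU_CYCLES);

    ioctl(fd_ins, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(fd_cyc, PERF_EVENT_IOC_ENABLE, 0);

    volatile double x = 0.0;                         /* stand-in workload */
    for (long i = 0; i < 50000000; i++)
        x += (double)i * 1.0000001;

    ioctl(fd_ins, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(fd_cyc, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t ins = 0, cyc = 0;
    read(fd_ins, &ins, sizeof(ins));
    read(fd_cyc, &cyc, sizeof(cyc));
    printf("instructions=%llu cycles=%llu IPC=%.2f\n",
           (unsigned long long)ins, (unsigned long long)cyc,
           cyc ? (double)ins / (double)cyc : 0.0);
    return 0;
}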