No, like I said before, I've run my X2 at 2.0GHz (with Cinebench 10 x64), and the 1 CPU result was 1905 (versus 1896 for this K10).
I can believe K10 would run rings around K8 at the same clock. The mere doubling of SSE throughput means that a lot of HPC apps will run nearly twice as fast. Core 2 is in many cases nearly twice as fast as the P4 clock for clock, and in HPC in some cases up to 4 times as fast.
The 32-byte fetch means K10 should be a floating-point monster (HPC). The load forwarding capabilities of K8 are quite deficient (none!) compared to Core 2 (load forwarding was already in the Pentium Pro), which means that their inclusion in K10 will give an even bigger boost than Core 2 got from it. Too bad the clock rate is low and the cache is relatively small.
The 32-byte fetch can help with decoding long instructions, but K10 is still limited by its 3-wide pipeline (Core has a 4-wide pipeline). It can't help in legacy code with short instructions.
But Core(tm) features a 64-byte fetch buffer which can help short loops run faster (on any code).
Core(tm) is still better in almost everything related to the memory subsystem.
Quote:
The load forwarding capabilities of K8 are quite deficient (none!) compared to Core 2 (load forwarding was already in the Pentium Pro), which means that their inclusion in K10 will give an even bigger boost than Core 2 got from it. Too bad the clock rate is low and the cache is relatively small.
http://www.xbitlabs.com/articles/cpu...k10.html#sect0
As a result, we see that the memory subsystem of K10 processors has undergone some positive improvements. But we still have to say that it still potentially yields to the memory subsystem in Intel processors in some characteristics. Among these features are: the absence of speculative loading at unknown address past the write operations, lower L1D cache associativity, narrower bus between L1 and L2 caches (in terms of data transfer rate), smaller L2 cache and simpler prefetch. Despite all the improvements, Core 2 prefetch is potentially more powerful than K10 prefetch. For example, K10 has no prefetch at instruction addresses so that we could keeps track of individual instructions, as well as no prefetch from L2 to L1 that could hide L2 latency efficiently enough. These factors can have different effects on various applications, but in most cases they will determine higher performance of Intel processors.
Until I see a reputable site trusted by XS post a full review, it's all speculation. Let's wait and see mature results, enough with the speculation.
1st point. That's a loop detector. I'm interested in pure SIMD FP, where 32 byte fetch should help.
2nd point. AMD has implemented a write burst buffer and real RAM prefetcher into the IMC. Intel probably has better prefetchers. The real hurt is the low launch clockspeed if true.
People seem to forget how badly one early K7 sucked (simply put).
Link: http://firingsquad.com/hardware/k7550preview/page7.asp
It even loses to the K6-3 at the same clock. It has abysmal FPU performance, and the same goes for integer.
I do :) So few good ones are out there that leaks of the so-called bad ones would likely be seen first.
Yet, I'll say what I've said from day one. If AMD had something to show, they or someone friendly to them would have shown it by now. I hope like hell I'm wrong on this one.
The AMD stated working latency is "less than 38 cycles and depends on the clock speed of the northbridge". Higher clock speeds offset the latency, as in all processors. The L3 cache is just the shared victim cache for the L2 cache, nothing more. It works very well to reduce RAM<->CPU latency for the K10, as the larger L2 does in Core 2.
K8 had a 12 stage pipeline, Barcelona a 12 stage, and Core 2 a 14 stage.
K8 L2 latency is 12 clock cycles, Core 2 is 14 and Barcelona is 12.
K8 L2 cache bus width is 128-bit, Core 2 is 256-bit, Barcelona is 128-bit.
SSE engine width of K8 was 64-bit (2 per clock), Core 2 was 128-bit (3 per clock) and Barcelona is 128-bit (2 per clock).
L1+L2 cache latency is 15 cycles for the K8, 17 cycles for Core 2, and 15 for Barcelona IIRC.
Correction: L1+L2 cache access combined latency is median 13 cycles for Core 2 and Barcelona. That's twice as much data in the same time frame accessed by Core 2 due to the double bus width between L2<->Core.
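The bus-width point is easy to check with some quick arithmetic. A minimal sketch, using the L2<->core bus widths quoted above and assuming a 64-byte cache line (the line size and the helper names are my own assumptions, not from any datasheet quoted in this thread):

```python
# Rough arithmetic only: bus widths are the figures quoted in this thread,
# the 64-byte cache line is an assumption made for illustration.

def bytes_per_cycle(bus_width_bits):
    """Bytes the L2<->core bus can move in one clock cycle."""
    return bus_width_bits // 8

def cycles_per_line(bus_width_bits, line_bytes=64):
    """Cycles needed to move one cache line across the bus (rounded up)."""
    per_cycle = bytes_per_cycle(bus_width_bits)
    return -(-line_bytes // per_cycle)  # ceiling division

bus_widths = {"K8": 128, "Core 2": 256, "Barcelona": 128}

for name, width in bus_widths.items():
    print(name, bytes_per_cycle(width), "bytes/cycle,",
          cycles_per_line(width), "cycles per 64-byte line")
```

At equal latency, the 256-bit bus moves twice the bytes per cycle of the 128-bit one, which is where the "twice as much data in the same time frame" figure comes from.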
There are many more improvements; of these, the larger stack load and the reordering of loads/stores have the potential to make the most difference. Many of the improvements are identical to what was done with Yonah -> Core 2. Many are more specific, and some are even more advanced.
Based on the technicalities, the improvement seems like this:
K8 -> K10 is like Yonah -> Core 2. As I mentioned before, I think it's a retail clock speed yield race, nothing more. We'll wait and see how it pans out in reality.
Coolaler isn't trusted by XS?
From what I recall, the early Kentsfield benchmarks found on that forum were quite accurate, and so were the Core 2 overclocks. Faking a benchmark like this would be really stupid, seeing how easy it would be to prove the forgery once the CPUs went retail.
The results are disappointing taking Intel's current and future (45nm) offerings into consideration, but they are not unreasonable. They show a few percent performance increase over the K8.
There will probably be benchmarks where the K10 scores like the K8, and there will probably be benchmarks where the performance delta is bigger than in Cinebench (say circa 20-30%).
http://firingsquad.com/hardware/k7550preview/
Take a good look at that. Those were pretty bad K7 pre-launch benchmarks. And we all know the K7's story, don't we?
Better wait a few more days. You never know with AMD, I remember T-Bred A and B. Well that was a surprise too, many thought AMD couldn't get the K7 over 2GHz properly when T-Bred A came out.
I'd say don't worry too much, AMD has a reputation for exceeding everyone's expectations. It has happened so many times before.
Edit, wow informal, I for sure take tooo long to write a reply.
Anyway, that's the same point I wanted to present here.
Kinda funny to note that both Pentium III and K7 did not reach their pinnacle until they were well in the shade of anticipation for the newer tech. Barton and Tualatin, too few remember ye.
K8 has gone through a lot too, but besides the X2... it hardly feels like a mutated alien in the face of its original self.
Guys,
I would like to mention that AMD said that software using the FPU needs a recompile to take full advantage of the new Barcelona FPU, right?
So, running current benchmarks like Cinebench or Super PI with apparently 'old' code doesn't show the full speedup the K10 would get.
Dunno if Super PI uses the FPU, but basically a lot of scientific programs use it.
Please correct me if I am wrong here.
I love how everyone is coming up with conspiracy theories. I totally trust Coolaler.
I love how discussions like these bring out the stupid in stupid people.
*sigh*
First off, it's not Coolaler, it's some guy on the site's forum. Secondly, remember the original Athlon previews? Even if somebody credible were to bench the chip, the results would still be bogus if it were bugged. Thirdly, Rahul Sood has stated that the 3GHz Phenom is considerably faster than all of AMD and Intel's current offerings. These benchmarks fly against Rahul's claims, and, quite frankly, I trust Rahul Sood just a bit more than some guy on some forum who's possibly using a bugged chip/mainboard/BIOS to boot!
How about the AM2 previews? They turned out to be correct. Performance decrease vs DDR-400 unless paired with DDR2-800 memory.
Sood also questioned the initial Conroe benchmarks so I doubt he really knows much more than anybody else.
Quote:
Thirdly, Rahul Sood has stated that the 3GHz Phenom is considerably faster than all of AMD and Intel's current offerings. These benchmarks fly against Rahul's claims, and, quite frankly, I trust Rahul Sood just a bit more than some guy on some forum who's possibly using a bugged chip/mainboard/BIOS to boot!