Probably wasn't worth the effort at the time, as they weren't bandwidth limited even with dual-core chips; with quad-core chips it makes a lot more sense, though.
Quote:
Originally Posted by doompc
Hmmmm. Looks like the actual number of TLB entries is hidden from the OS kernel.
There are some crazy New Zealanders who have some measuring software...
Well uOpt, do what you have to in order to find out.
We must know the secret.
Here it is, from Intel's system programmer's manual (updated Oct 30 to include Core2):
Quote:
- Intel Core 2 Duo processors: DTLB0, 16 entries; DTLB1, 256 entries, 4 ways.
- Pentium 4 and Intel Xeon processors: 64 entries, fully set associative; shared with large page data TLBs.
- Intel Core Duo, Intel Core Solo processors, Pentium M processor: 128 entries, 4-way set associative.
- Pentium and P6 family processors: 64 entries, 4-way set associative; fully set associative for Pentium processors with MMX technology.
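If you want to pull the raw data for your own chip, the cache/TLB descriptors behind that table come from CPUID leaf 2. Here is a minimal sketch (assuming GCC or Clang on an x86 Intel CPU) that just dumps the descriptor bytes; you still have to look the values up in the manual's table yourself:
Code:
/* Dump the raw CPUID leaf 2 cache/TLB descriptor bytes.
 * The byte values map to TLB and cache sizes via the table in Intel's
 * system programmer's manual (the same table quoted above). */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int regs[4]; /* EAX, EBX, ECX, EDX */

    if (!__get_cpuid(2, &regs[0], &regs[1], &regs[2], &regs[3])) {
        fprintf(stderr, "CPUID leaf 2 not supported\n");
        return 1;
    }

    for (int r = 0; r < 4; r++) {
        /* Bit 31 set means the register holds no valid descriptors. */
        if (regs[r] & 0x80000000u)
            continue;
        for (int b = 0; b < 4; b++) {
            /* The low byte of EAX is the call count, not a descriptor. */
            if (r == 0 && b == 0)
                continue;
            unsigned char desc = (regs[r] >> (8 * b)) & 0xff;
            if (desc != 0)
                printf("descriptor 0x%02x\n", desc);
        }
    }
    return 0;
}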
Doesn't look like the new quad-core processors will be very good energy-wise. Already up 200 MHz, and it's 200 W. However, performance will probably kick ass ... at stock, anyway.
Because the K8 has a 64-bit L2 cache :) .
Quote:
Originally Posted by doompc
No, the K8's L2 cache width is 128 bits. It was 64 bits on the K7.
It's 64 bits wide but dual-ported; basically you can't read more than 64 bits or write more than 64 bits to the L2 cache in 1 clock. :stick:
Quote:
Originally Posted by zir_blazer
From the X-bit Labs article:
http://www.xbitlabs.com/articles/cpu...amd-k8l_6.html
Quote:
The L2 cache (paired with the L1 cache) is exclusive: the data in the L1 and L2 caches are not duplicated. The L1 and L2 caches exchange data across two unidirectional buses (one goes from the L1 to the L2 and one goes from the L2 to the L1), each 64 bits or 8 bytes wide (Figure 6). With this organization, the processor receives data from the L2 cache at a rather slow rate of 8 bytes per clock (8 clocks to transfer a 64-byte line). As a result, the data transfer latency is high, especially when two or more lines in the L2 cache are being accessed simultaneously. The latency is somewhat compensated by the increased number of cache hits due to the high associativity of the L2 cache, which is 16, and due to the larger total amount of cache memory (thanks to the exclusive design).
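A quick back-of-the-envelope check of the numbers in that quote (the 2.6 GHz clock below is just an example figure, not from the article):
Code:
/* 64-byte line over an 8-byte-per-clock L1<->L2 bus, per the X-bit quote. */
#include <stdio.h>

int main(void)
{
    const int line_bytes = 64;        /* K8 cache line size */
    const int bus_bytes_per_clk = 8;  /* one 64-bit unidirectional bus */
    const double clock_ghz = 2.6;     /* example core clock, not from the article */

    int clocks_per_line = line_bytes / bus_bytes_per_clk;   /* = 8 clocks */
    double gb_per_s = bus_bytes_per_clk * clock_ghz;        /* per direction */

    printf("%d clocks per line, %.1f GB/s per direction\n", clocks_per_line, gb_per_s);
    return 0;
}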
Fred, what you posted is true. Except right now it really doesn't matter, as C2D can't, for the most part, do any more than 3 vector ops per cycle most of the time. However, if you read the white paper I linked to not long ago, you will see that Intel's SSE4 instructions that will come out with Wolf/York, 30 new instructions, are almost all used to increase vector performance.
[ Vector processing on Nehalem ] is the thread where you can find the white paper, in the News section. This is when you will see the Intel CPUs shine. Because Intel waited for Wolf/York, these SSE4 instructions will be beneficial right away, as the programmers already have the instructions.
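For a taste of what those vector instructions look like in practice, here is a tiny example using DPPS, the SSE4.1 packed dot-product instruction that eventually shipped with Wolf/York (build with -msse4.1; the input values are just sample data):
Code:
/* DPPS: one SSE4.1 instruction computes a 4-element dot product. */
#include <stdio.h>
#include <smmintrin.h>

int main(void)
{
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f); /* lanes: 1,2,3,4 */
    __m128 b = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f); /* lanes: 5,6,7,8 */

    /* Mask 0xF1: multiply all four lanes, put the sum in lane 0. */
    __m128 dot = _mm_dp_ps(a, b, 0xF1);

    printf("dot product = %.1f\n", _mm_cvtss_f32(dot)); /* 1*5+2*6+3*7+4*8 = 70 */
    return 0;
}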
Quote Fred Pohl
This table explains a lot of things right away. And the most important thing is that the processors with Core microarchitecture have “wider” architecture that allows processing more instructions per clock cycle than CPUs with K8 microarchitecture. Although the execution units of both competing processor architectures can process up to three x86 and x87 instructions per clock cycle, Core Microarchitecture should prove more efficient with SSE instructions. While K8 processors can perform only one 128bit command per clock, Core can process up to three commands like that.
Moreover, Core Microarchitecture boasts another great advantage: a more advanced decoding system. Together with the four decoders, macrofusion technology allows decoding up to five instructions per clock (in an ideal case). The competitor processors can only decode three instructions simultaneously. All this indicates that the decoders of Core Microarchitecture based CPUs will be able to better load the processor execution units by performing up to four instructions per clock in the most optimal conditions. In this case the overall command execution will go 33% faster than on AMD's K8 processors.
AMD will lead (C) (by AMD) -)
http://www.overclockers.ru/images/ne.../15/k8l_01.gif
AMD confirms 40 percent K8L superiority
What do we see?
if (K8L == 1.4*K8 && C2D == 1.2*K8) K8L = (1.4/1.2)*C2D; // ≈ 1.16*C2D
Maybe 1.16x in a single-core battle. In a multithreaded environment it may be much faster (and dissipate less energy).
Can't wait.
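A runnable version of the shorthand above, assuming both the 40% and 20% figures are taken at face value (they are vendor/marketing numbers, not measurements):
Code:
/* If K8L = 1.4 x K8 and C2D = 1.2 x K8, then K8L vs C2D = 1.4 / 1.2. */
#include <stdio.h>

int main(void)
{
    const double k8l_vs_k8 = 1.4;  /* AMD's claimed 40% uplift over K8 */
    const double c2d_vs_k8 = 1.2;  /* assumed ~20% Core 2 lead over K8 */

    /* Prints 1.17; the 1.16 above is the same ratio, truncated. */
    printf("K8L vs C2D: %.2fx\n", k8l_vs_k8 / c2d_vs_k8);
    return 0;
}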
The new roadmap from AMD's Analyst Day shows the first Deerhounds around April 2007.
http://techreport.com/etc/2006q4/amdfad02.jpg
Note the * at the 40%. Now tell me what's behind that *.
Quote:
Originally Posted by MAS
I see only mid-2007,
and not Deerhound, but Barcelona-Opteron and Agena-FX.
* is the multiplication operation, if you didn't know.
Precisely, it's nothing more than a marketing number that has never seen the light of day yet.
Though MAS obviously is dreaming of something else while trying to be funny.
something else rather ))
Yeah, I guess AMD will wait 1 year after Intel's Core 2 arch just to introduce an underperforming new arch :rolleyes: . /* Fat chance, Pops! */
Quote:
Originally Posted by Shintai
AMD is not crazy enough to spill the beans on performance too early. They will "shock" us as they did in the past. The AM2 and Socket F launch was just a preparation & migration to the higher-bandwidth platform needed for the new core's arrival next spring.
And btw, they sell their whole production atm; furthermore, they can't keep up with the demand. All this while not fully converted to 65nm... Kinda amazing IMO.
And don't forget that AMD has already crossed the bridges known as point-to-point interconnects and the IMC, the ones Intel has been struggling for years to cross while being a multiple times larger company :rolleyes:
Interesting outlook you have. But let's be fair about what Intel is trying to accomplish with CSI. It will be a radical design compared to HT, which, by the way, in its first implementation came from DEC.
Quote:
Originally Posted by informal
I haven't seen the white papers on CSI, so I won't speculate on it. But the rumors are looking good.
On K8L, I think it's best we wait to see how it performs before we put Intel in its grave.
We have all seen what 4x4 did as compared to the hype.
We have gotten a glimpse of AMD on 65nm as compared to the hype.
So let's just wait for the results when K8L comes out. Unless you saw something in the recent K8L demo running Task Manager that you liked.
It is not clear from the sheet whether they mean a 40% speed increase per core or whether they mean that moving a "typical" application from dual-core to quad-core gives an overall speedup of 40%.
Quote:
Originally Posted by MAS
Since this is very marketing-speechish, I would think it is the latter.
However, a 40% speedup per core at the same clock speed is not impossible, I'd say. Core 2 is already 20-30% faster, and they would have a year more to fiddle. And they could be counting in better SSE units.
As I said, hype is hype; reality is something different altogether.
Quote:
Originally Posted by MAS
New steppings can help Brisbane reach 3.4 and even 3.5 GHz, sooner or later.
Besides, the 3.1 GHz OC is only a single result.
Will wait for statistics.
Oh Brent!! No one has tried Brisbane on phase yet :) We don't know the absolute limits of this shrink.
Quote:
Originally Posted by brentpresley
Besides, I'm comfortable with 3.1 GHz on air; my previous X2 3800+ S939 did only 2.5 GHz @ 1.65 V and 2.74 GHz @ 1.45 V (the latter one) stable.
We have gone over this twice.
Quote:
Originally Posted by brentpresley
Conroe has 2 more stages on a similar process.
It is only logical that Conroe on average will clock better. However, that also means that Conroe will perform poorly in overclocks when 45nm comes around, probably averaging 3.5-3.7 GHz.