
Thread: Intel Details Nehalem uArch Improvements - 256KB L2, 8MB L3 Confirmed

  1. #126
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by xlink View Post
    This is NetBurst on steroids. It's wider, more accurate, and emphasizes width over length, unlike the original. It's the return of the 20-stage pipeline, and I think that this time the world is ready for it, unlike the last. Just based on the fact that it's using DDR3 on a 196-bit bus, something tells me it'll be very bandwidth hungry. VERY. Why the heck do you think they redesigned the cache architecture they'd been using for the past decade?
    Nehalem won't be 20 stages. It will be somewhere around what we have today, ~12-14. It is an evolutionary step from Core 2.

    DDR3 on a 192-bit bus is the evolutionary step. More cores simply need more bandwidth, and Nehalem is designed to scale to 8 cores (16 threads).

    I think they redesigned the cache structure mainly due to SMT and the speed of the shared cache. Plus it's the exact same design as Itanium, so they have more knowledge and experience with it.
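    As a rough back-of-the-envelope sketch of why the 192-bit bus matters: theoretical peak bandwidth is just bus width times transfer rate. The 64-bit channel width and the DDR3-800/1066/1333 data rates below are assumptions for illustration (nothing here is a confirmed Nehalem spec), and real sustained bandwidth will be lower than these peaks.

```python
CHANNEL_WIDTH_BITS = 64  # assumption: one standard 64-bit DDR3 channel


def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9


def bandwidth_table(channel_counts=(2, 3), data_rates=(800, 1066, 1333)):
    """Return {(channels, data_rate): GB/s} for each combination."""
    return {
        (ch, rate): peak_bandwidth_gbs(ch * CHANNEL_WIDTH_BITS, rate)
        for ch in channel_counts
        for rate in data_rates
    }


if __name__ == "__main__":
    for (ch, rate), gbs in bandwidth_table().items():
        print(f"{ch} x 64-bit @ DDR3-{rate}: {gbs:.1f} GB/s peak")
```

    At the same data rate, three channels (192 bits) give 1.5x the peak of two channels (128 bits), which is the whole point of the wider bus.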
    Crunching for Comrades and the Common good of the People.

  2. #127
    Wanna look under my kilt?
    Join Date
    Jun 2005
    Location
    Glasgow-ish U.K.
    Posts
    4,396
    Quote Originally Posted by Movieman View Post
    Now we know that the Harpertowns (Penryns) get an approximate 10% increase clock for clock over the Clovertowns (Kentsfields, C2D), and what I'm hearing is that Nehalem will be 20-30% better clock for clock than the Harpertowns.
    Hey Dave, do you know (or are you allowed to say) if that's across the board, an average increase, or only in a few certain situations?

    Quote Originally Posted by T_M View Post
    Not sure i totally follow anything you said, but regardless of that you helped me come up with a very good idea....
    Quote Originally Posted by soundood View Post
    you sigged that?

    why?
    ______

    Sometimes, it's not your time. Sometimes, you have to make it your time. Sometimes, it can ONLY be your time.

  3. #128
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by K404 View Post
    Hey Dave, do you know (or are you allowed to say) if that's across the board, an average increase, or only in a few certain situations?

    I'm hearing it's a HUGE increase in apps like encoding (maybe 30%), and his "impressions" were that it was approx. 20% better across the board.
    Now that's not giving away any secrets, as that is one person's impressions with no benchmarks shown.
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  4. #129
    Xtreme Mentor
    Join Date
    May 2007
    Posts
    2,792
    K10h is a 12-stage pipeline, 65nm, 283mm², 463M transistor, 23.x FO4 delay design. It was not made for high clocks in any way; as presented at one of the global IEEE 2006 conferences, AMD intended Barcelona to reach 2-2.8GHz at its rated supply Vdd. Intel Core 2 is a 21 FO4 depth design AFAIK and Penryn is at ~18 FO4; it is supposed to have been reduced substantially since HKMG integration.

    The IBM Power6 is not the lowest nor the only architecture with a 13 FO4 inversion delay; it just happens to be very well tuned for absolute speed and performance. P3 had a 15 FO4 depth, Willamette P4 8-10 FO4, Alpha 21264 15 FO4, and so on. None of those could achieve what IBM did.

    However, I don't think the IBM Power6 bears any relevance to desktop computers. It is a major success in its HPC market and trumped anything any competitor had to offer in 2007, including Harpertown and Itanium 2 Montecito. It's the only CPU to hold all 4 major industry records in one go: transactions, Java, throughput and floating point. It beat Harpertown 3.16GHz, 8 cores vs 8 cores, in Int too. Best in SAP, TPC-C OLTP, OASO, SPECjbb2005, Linpack HPC and so on, last I checked in late 2007. For instance, in TPC-C:

    System                         Cores  CPU                                    tpmC
    Bull Escala PL1660R            16     IBM Power6 4.7GHz                      1,616,162
    NEC Express5800/1320Xf         32     Intel Dual-Core Itanium 2 9050 1.6GHz  1,245,516
    Bull Escala PL1660R            4      IBM Power6 4.7GHz                      404,462
    HP ProLiant ML370G5 X5460 QC   8      Intel X5460 3.16GHz                    275,149

    As you can see, it trumps anything for what it was designed to do.
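    Since those systems have very different core counts, a quick way to compare them is to normalize tpmC per core. This is only a sketch using the four results listed above; the per-core figure ignores memory, storage and database software differences between the systems, so treat it as a rough indicator rather than a benchmark in itself.

```python
# (system, cores, tpmC) as listed in the TPC-C results above
RESULTS = [
    ("Bull Escala PL1660R, IBM Power6 4.7GHz", 16, 1_616_162),
    ("NEC Express5800/1320Xf, Itanium 2 9050 1.6GHz", 32, 1_245_516),
    ("Bull Escala PL1660R, IBM Power6 4.7GHz", 4, 404_462),
    ("HP ProLiant ML370G5, Intel X5460 3.16GHz", 8, 275_149),
]


def tpmc_per_core(tpmc: int, cores: int) -> float:
    """Throughput normalized by core count."""
    return tpmc / cores


def ranked_per_core(results=RESULTS):
    """Return (system, cores, tpmC per core) sorted best-first."""
    rows = [(name, cores, tpmc_per_core(tpmc, cores)) for name, cores, tpmc in results]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, cores, per_core in ranked_per_core():
        print(f"{name:48s} {cores:2d} cores  {per_core:>10,.0f} tpmC/core")
```

    Both Power6 entries land at roughly 101,000 tpmC per core, against roughly 39,000 for the Itanium 2 box and 34,000 for the X5460 box.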

    What it does show is that those who usually guess SOI is the only clocking restriction are wrong: if you look at IBM's technical documentation, Power6 scales to 6.1GHz on air with low LpolySi tuning, using SOI at a 1.3V Vdd supply. That is far more than anything else out there, including HKMG 45nm CPUs. The official 3.2GHz IBM Power6 is rated for less than 100W TDP at 65nm SOI, a big achievement. The 4.7GHz part is rated for a maximum of 160W TDP with a massive 790M transistors inside a big 341mm² package, a wowzer achievement, especially at the same pipeline, instructions per cycle and latch cycle overhead carried over from 90nm. No other chip from AMD/Intel at 65nm or 45nm can do sub-200W TDP at those specs or temperatures (sub-60C on air, with a 105C limit). To compare, the Power5+ 1.9GHz is a 200W TDP CPU at 389mm² and 276M transistors. Intel's "Montecito" Itanium 9000 running at 1.6GHz, with a hefty 104W TDP, is less than half as powerful as IBM Power6, which just shows how brilliant the engineering on Power6 really is. And yet people forget the +25W minimum TDP of the memory controller on Power6, which Itanium 2 doesn't have. Comparing Kentsfield 2.67GHz (65nm MCM, 130W TDP, 286mm², 582M transistor package) to IBM Power6 4.7GHz at 160W gives us:

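    (Kentsfield vs Power6)
    Clock:                         2.67GHz vs 4.7GHz
    TDP:                           130W (+35W NB) vs 160W
    Transistors:                   582M vs 790M
    Clock per watt:                20.5MHz/W vs 29.38MHz/W
    Power density:                 0.455W/mm² vs 0.469W/mm²
    Power per million transistors: 0.22W/MilT vs 0.20W/MilT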

    In all respects it is a far better CPU at 65nm, but it isn't a chip intended for the desktop market, hence comparisons with our market shouldn't be made to judge absolute numerical performance, although electrically you can.
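    For anyone who wants to check the ratios in the Kentsfield vs Power6 comparison, they fall straight out of the clock, TDP, area and transistor figures quoted in this post. A small sketch (the class and field names are made up for illustration; the Kentsfield figures use the 130W CPU TDP only, without the +35W NB):

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    clock_mhz: float
    tdp_w: float
    area_mm2: float
    transistors_m: float  # millions of transistors

    def mhz_per_watt(self) -> float:
        return self.clock_mhz / self.tdp_w

    def watts_per_mm2(self) -> float:
        return self.tdp_w / self.area_mm2

    def watts_per_mtransistor(self) -> float:
        return self.tdp_w / self.transistors_m


KENTSFIELD = Chip("Kentsfield 2.67GHz", 2670, 130, 286, 582)
POWER6 = Chip("Power6 4.7GHz", 4700, 160, 341, 790)

if __name__ == "__main__":
    for chip in (KENTSFIELD, POWER6):
        print(
            f"{chip.name}: {chip.mhz_per_watt():.2f} MHz/W, "
            f"{chip.watts_per_mm2():.3f} W/mm2, "
            f"{chip.watts_per_mtransistor():.2f} W/MilT"
        )
```

    If you add the +35W NB to the Kentsfield side (165W total), its clock per watt drops to about 16.2MHz/W, which widens the gap further.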
    Quote Originally Posted by IBM
    However, on a CPW per watt basis, the Power6-based server is 67 percent more power efficient than the Power5 box.

    IBM also provided a comparison with the earlier iSeries Model 870 machine, equipped with 16 of IBM's Power4 cores running at 1.3 GHz, which consumed 6,000 kilowatt-hours for the base CEC. A four-core Power6-based System i 570 does essentially the same amount of work, but burns less than 1/4 of the juice as the Power4-based machine at the CEC level.
    Getting these speeds on any architecture is not just about one timing: jitter, skew, latch and wiring delays all add up, and they can increase delays and lower your clocks and per-cycle performance. IBM mainly had to employ very high-speed, low-delay wiring and, above all, Dual Stress Liners within the transistors. Look at the low Vdd it needed for high frequencies: at 0.9V, Fmax is 3GHz.

    Anyway, as for L3 cache, many have had it, but not more than one Intel SKU before the K10h architectural lineup in the desktop market AFAIR (?). Alpha EV5 was the first that I can recall, with others such as >Power4, UltraSPARC IV+, Madison/9M, etc. It mainly helps only when your L2 and L1 are saturated by high memory access or large matrix array applications, such as databases. It was always mainly a server design bonus, hence not featured much on the desktop, but now that seems to be changing, and it's quite obviously led by AMD with their MPU+IMC design. Not that Intel didn't know this before AMD; they just couldn't produce a chip with it before 45nm.

    Someone also mentioned that the AMD K10h L3 cache is different to that of the upcoming "Nehalem" (it's not called Nehalem), which is inclusive rather than exclusive. This isn't entirely true either; AMD's design is not specifically inclusive or exclusive, but a bit of both:
    The L3 cache is specifically designed with data sharing in mind. This entails three particular changes from AMD’s traditional cache hierarchy. First, it is mostly exclusive, but not entirely so. When a line is sent from the L3 cache to an L1D cache, if the cache line is shared, or is likely to be shared, then it will remain in the L3 – leading to duplication which would never happen in a totally exclusive hierarchy
    Also the L3 is 20% of the die in K10h.

    Well, the additions for Nehalem are good on paper, but fingers crossed, as Native+IMC is too difficult a design to get running without problems, esp. at your first go. 45nm HKMG helps a lot, but not as much as 32nm would. I'm fearing the prices on these, as clocks are far harder to get, yields much lower and defect rates very high; hence, price is where it'll bite us in the hind, unless AMD has something Intel fears by 13th October '08.

    *Cache arrangement is nearly exactly the same as AMD K10h, no doubt, though Intel chose to keep it mainly inclusive. The L3 cache by nature of redundant access is slower than L2 and L1 but far quicker than RAM access. That large size of L3 will only help with large matrix applications, mainly in videoing/imaging/large gaming/server apps but beyond 8 or so MB data access, they might have paramount performance scaling issues when all caches are full with the same replete data, the latency will build. That's the problem with keeping it inclusive, they need speed and very low latency for it.
    *IMC built within limits the current delta between IMC-Core and withholds overclocking/speeds. Each MB PWM design now has to provide separate power for the IMC and not just the separate cores. Same for VMods.
    *IMC also increases power/TDP much, especially with triple channel memory support. You can add 30-60W of minimum to maximum power here at just 2.0-2.8GHz clocks, maybe even more so with SMT and QPI support being internal. Maximum theoretical AC and DC power consumption becomes much higher through individual latch testing.
    *Triple channel memory is essentially needed, and I reckon it's a clever move, because a Native+IMC design suffers from low real bandwidth, and worse so for write/copy bandwidth than for read. Individual DRAM access by each core is the best way to go; it should improve read bandwidth over current Penryn and improve, or at least keep level, write/copy bandwidth. Just having an IMC with 3 controllers doesn't guarantee this at all, though.
    *4 vs 3 instructions executed at a time means it will obviously be quicker than Penryn per clock, unless something down the line holds it back, latencies being my fear. I'm hoping there will be major improvements here, especially with the SSE 4.2 instruction updates once they're supported in software.
    *I don't like the sound of keeping a small L2 and a large L3; this is more of a server segment design win. L2 will be far quicker than L3, but slower than L1. The Native+IMC approach requires a large L2 for speed in smaller apps and a large L3 for speed in larger apps, so it suffers from too little L2 in the smaller desktop apps. Apps like SuperPi will probably see a big hit from this, not just at 1M but even at 32M, although not as big a hit as the exclusive L3 on AMD K10h causes.
    *Unfortunately, Native+IMC also means lower bins, lower clocks, lower overclocks, higher defect rates, higher TDPs and a good chance of cold bugs and low clocks held back by the IMC, especially if they are in sync. You have to realize that binning and chip sorting is more than three times as difficult with 4 cores+IMC in one package. I hope they make the IMC run on a separate clock and PLL from the cores, fully adjustable, or IMC/MEM oc will also be poorer and very hard compared to modern Core 2 oc, which is easy. They have only quantified DDR3 800-1333 support, which gives me the shivers about these clock restrictions: since Nehalem is set for production in Q4 '08, I would've expected them to add DDR3 1600 support by then, unless something is holding things back here.
    *IMC clocking depends on the delta between IMC and MEM gate currents and voltages, so this could be a very tricky area to get working with DDR3 at 1.5V unless the IMC gate voltages are at the required deltas.

    Can't wait to see it in action. It is a revolutionary design for Intel, completely different to their previous CPU designs: very clearly a new architecture. They've now chosen the same desktop architectural design as AMD, to compete. It's the right way to go, but bringing SMT aka HyperThreading back again is not a good idea unless the single-threaded front end performance is weak or the clocks are lower than Penryn. That isn't a good sign for multi-threaded performance and clock speeds. We know software developers at Intel have been ringing developers' ears since before Penryn about how poor multi-core parallelism is everywhere on the desktop and home market (videos on their site show perf. scaling being majorly lost from 4 cores to 6 cores), but let's hope they've improved this through the cache data fetch and eviction algorithms; the larger TLB and BTB can do this. This is mainly where SMT will help most.

    As for people fighting cases of this firm vs that, you're all wrong as all electrical designs and knowledge of anything and everything is mostly copied and passed on = it's not called copying though, it's called sharing. How you teach your child, how you know anything about computers is mostly through reading or being told, which again is sharing by some male/female somewhere. How I draw a picture by watching a video of someone drawing it, doesn't mean I copied or that it isn't an achievement if I made it good or better. And FWIW, K10h featured many improvements which were exactly what Core 2 received from Pentium M.

    I hope they do launch sub-120W TDP 2.8GHz quad-core Nehalem CPUs by October. Would be a major achievement to pull off with plus 1.1x Penryn performance per clock, not sure how many recognize ardently how difficult it is to produce especially at the same fabrication node as your current SKU lineups. Monstrously difficult job, go visit a fab and you'll realize much better. Just had a little time to sit down today.

  5. #130
    Xtreme Mentor
    Join Date
    Mar 2006
    Posts
    2,978
    Quote Originally Posted by duploxxx View Post
    That was the original Barcelona L3 speed. By the time you see Nehalem you will also see Shanghai, with several redesigns including this L3 speed, and you just have to check the AMD forum to see what performance difference this makes on Phenom CPUs...
    Again, we don't know. AMD's L3 cache operates at NB speed, but this is not what kills them on latency so much as the fact that they need FIFO buffers to absorb the clock skew.... Even with this new information, we don't have much info on Nehalem; heck, Intel may have done a similar dividing of the clock domains, in which case they will also have similar latency hits in the cache hierarchy.

    It is funny to watch people talk about the 'Return of the 20 stage pipeline', etc. above when that info has never been released or disclosed by Intel... people are talking like they know the detailed spec. I find it amusing.
    One hundred years from now It won't matter
    What kind of car I drove What kind of house I lived in
    How much money I had in the bank Nor what my cloths looked like.... But The world may be a little better Because, I was important In the life of a child.
    -- from "Within My Power" by Forest Witcraft

  6. #131
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by JumpingJack View Post
    Again, we don't know. AMD's L3 cache operates at NB speed, but this is not what kills them on latency so much as the fact that they need FIFO buffers to absorb the clock skew.... Even with this new information, we don't have much info on Nehalem; heck, Intel may have done a similar dividing of the clock domains, in which case they will also have similar latency hits in the cache hierarchy.

    It is funny to watch people talk about the 'Return of the 20 stage pipeline', etc. above when that info has never been released or disclosed by Intel... people are talking like they know the detailed spec. I find it amusing.

    Didn't you know? Fanboys know everything.

  7. #132
    Xtreme Member
    Join Date
    Dec 2007
    Location
    Karachi, Pakistan
    Posts
    389
    So, should we expect a TLB erratum from Intel in Nehalem?

  8. #133
    Xtreme Enthusiast
    Join Date
    Oct 2006
    Posts
    617
    i wonder how overclocking will be handled... it's so easy and intuitive to o/c when you have a range of CPUs with different front side busses, but soon neither AMD nor Intel will have that
    IMC clock/overclock constraints will be lesser on the mainstream model assuming it has an off-die MC. maybe only the server/extreme-parts will coldbug... that'd be ironic

  9. #134
    Xtreme Member
    Join Date
    Oct 2004
    Location
    USSR
    Posts
    281
    maybe overclocking will not be possible with nehalem and Intel will force us to buy those 1000 $/€ cpus if we want high end stuff ¬_¬

  10. #135
    Xtreme Addict
    Join Date
    Jun 2007
    Location
    Thessaloniki, Greece
    Posts
    1,307
    Quote Originally Posted by savantu View Post
    The 12MB L3 on an Itanium 2 Montecito has 14 cycles of latency. The L2 has 5 for Int and 7 for FP.

    Core 2 has 14 cycles for L2 ; K8 has 12 for L2 ; K10 L2 is 15 , L3 is 30 to 45.
    Not true for K10. L2 is 12 cycles just like K8. Also L3 latency is higher than 30.
    See here http://www.digit-life.com/articles3/...ma-phenom.html

    Quote Originally Posted by JumpingJack View Post
    This gate thickness is about 0.15-0.2 nm thinner than either AMD or Intel at 65 nm (their reported thicknesses were 1.3 nm and 1.25 nm respectively as I recall). Translation, IBM's power 6 is a power sucker.

    http://www.research.ibm.com/journal/rd/516/berridge.pdf
    Just a minor correction: AMD's figures are quite old and may very well have changed since then because of CTI.
    Quote Originally Posted by JumpingJack View Post
    AMD CPUs don't have a FSB, how do those get overclocked?
    Well, AMD chips have a reference HTT clock, not a PCI-E clock. If future mainstream Intel CPUs get their clock from the PCI-E bus, then overclocking is going to be severely limited, because increasing PCI-E clocks too much causes data corruption and stability problems for various hardware. Let's hope Intel will put PCI-E on a separate clock domain and get the CPU speed from somewhere else.
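    To put a number on that worry: if the CPU clock is simply reference x multiplier and PCI-E sits on the same reference, every percent of CPU overclock is also a percent of PCI-E overclock. A minimal sketch, assuming the nominal 100MHz PCI-E reference and a made-up multiplier of 20 (not a real Intel spec; how much overshoot a given card tolerates is not something this models):

```python
PCIE_NOMINAL_MHZ = 100.0  # nominal PCI-E reference clock


def shared_reference_overclock(ref_mhz: float, cpu_multiplier: int):
    """CPU clock and PCI-E overshoot (%) when both hang off one reference."""
    cpu_mhz = ref_mhz * cpu_multiplier
    pcie_over_pct = (ref_mhz / PCIE_NOMINAL_MHZ - 1.0) * 100.0
    return cpu_mhz, pcie_over_pct


def separate_domain_overclock(cpu_ref_mhz: float, cpu_multiplier: int):
    """Same CPU overclock with PCI-E kept on its own fixed clock domain."""
    return cpu_ref_mhz * cpu_multiplier, 0.0


if __name__ == "__main__":
    MULTIPLIER = 20  # hypothetical multiplier, for illustration only
    for ref in (100.0, 105.0, 110.0, 120.0):
        cpu, over = shared_reference_overclock(ref, MULTIPLIER)
        print(f"ref {ref:.0f}MHz -> CPU {cpu:.0f}MHz, PCI-E +{over:.0f}%")
```

    With a separate clock domain the second function applies instead, and the PCI-E overshoot stays at zero no matter how far the CPU reference is pushed.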
    @KTE Simply amazing and very informative post. Thank you
    Last edited by BrowncoatGR; 03-19-2008 at 07:53 AM. Reason: typo
    Seems we made our greatest error when we named it at the start
    for though we called it "Human Nature" - it was cancer of the heart
    CPU: AMD X3 720BE@ 3,4Ghz
    Cooler: Xigmatek S1283(Terrible mounting system for AM2/3)
    Motherboard: Gigabyte 790FXT-UD5P(F4) RAM: 2x 2GB OCZ DDR3 1600Mhz Gold 8-8-8-24
    GPU:HD5850 1GB
    PSU: Seasonic M12D 750W Case: Coolermaster HAF932(aka Dusty )

  11. #136
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by KTE View Post
    K10h is a 12-stage pipeline, 65nm, 283mm², 463M transistor, 23.x FO4 delay design. It was not made for high clocks in any way; as presented at one of the global IEEE 2006 conferences, AMD intended Barcelona to reach 2-2.8GHz at its rated supply Vdd. Intel Core 2 is a 21 FO4 depth design AFAIK and Penryn is at ~18 FO4; it is supposed to have been reduced substantially since HKMG integration.

    The IBM Power6 is not the lowest nor the only architecture with a 13 FO4 inversion delay; it just happens to be very well tuned for absolute speed and performance. P3 had a 15 FO4 depth, Willamette P4 8-10 FO4, Alpha 21264 15 FO4, and so on. None of those could achieve what IBM did.
    Power 6 is in-order and has a far higher power budget.
    However, I don't think the IBM Power6 bears any relevance to desktop computers. It is a major success in its HPC market and trumped anything any competitor had to offer in 2007, including Harpertown and Itanium 2 Montecito. It's the only CPU to hold all 4 major industry records in one go: transactions, Java, throughput and floating point. It beat Harpertown 3.16GHz, 8 cores vs 8 cores, in Int too. Best in SAP, TPC-C OLTP, OASO, SPECjbb2005, Linpack HPC and so on, last I checked in late 2007. For instance, in TPC-C:

    Bull Escala PL1660R 16-cores IBM Power6 4.7 GHz 1,616,162tpmC
    NEC Express5800/1320Xf 32-cores Intel Dual-Core Itanium 2 9050 1.6GHZ 1,245,516tpmC
    Bull Escala PL1660R 4-core IBM Power6 4.7 GHz 404,462tpmC
    HP ProLiant ML370G5 X5460 QC 8-core Intel X5460 3.16GHz 275,149tpmC

    As you can see, it trumps anything for what it was designed to do.
    It's not the core, it's the I/O that rocks => with 300GB/s of I/O per CPU and 5000+ pins.

    In single-threaded tasks it gets obliterated by Core and sometimes even by Itanium Montecito, although the latter has a 3x frequency and 30x BW gap. What does that say about the core?

    What it does show is that those who usually guess SOI is the only clocking restriction are wrong: if you look at IBM's technical documentation, Power6 scales to 6.1GHz on air with low LpolySi tuning, using SOI at a 1.3V Vdd supply. That is far more than anything else out there, including HKMG 45nm CPUs. The official 3.2GHz IBM Power6 is rated for less than 100W TDP at 65nm SOI, a big achievement. The 4.7GHz part is rated for a maximum of 160W TDP with a massive 790M transistors inside a big 341mm² package, a wowzer achievement, especially at the same pipeline, instructions per cycle and latch cycle overhead carried over from 90nm. No other chip from AMD/Intel at 65nm or 45nm can do sub-200W TDP at those specs or temperatures (sub-60C on air, with a 105C limit).
    Hold on a little.

    With unlimited TDP, yeah, you could get to 6GHz. Too bad Power 6 has a fairly substantial problem once you go above 4.7GHz: leakage skyrockets.
    At 5.2GHz you already burn 200W.



    I suggest you read this : http://www.realworldtech.com/forums/...85903&roomid=2

    To compare, the Power5+ 1.9GHz is a 200W TDP CPU at 389mm² and 276M transistors. Intel's "Montecito" Itanium 9000 running at 1.6GHz, with a hefty 104W TDP, is less than half as powerful as IBM Power6, which just shows how brilliant the engineering on Power6 really is. And yet people forget the +25W minimum TDP of the memory controller on Power6, which Itanium 2 doesn't have. Comparing Kentsfield 2.67GHz (65nm MCM, 130W TDP, 286mm², 582M transistor package) to IBM Power6 4.7GHz at 160W gives us:

    2.67GHz vs 4.7GHz
    130W (+35W NB) vs 160W
    582M vs 790M
    20.5MHz/W vs 29.38MHz/W
    0.455W/mm² vs 0.469W/mm²
    0.22W/MilT vs 0.20W/MilT

    In all respects it is a far better CPU at 65nm, but it isn't a chip intended for the desktop market, hence comparisons with our market shouldn't be made to judge absolute numerical performance, although electrically you can.
    Somehow you've lost a 128MB L3 for a Power MCM, but what's 128MB anyway?

    Anyway, as for L3 cache, many have had it, but not more than one Intel SKU before the K10h architectural lineup in the desktop market AFAIR (?). Alpha EV5 was the first that I can recall, with others such as >Power4, UltraSPARC IV+, Madison/9M, etc. It mainly helps only when your L2 and L1 are saturated by high memory access or large matrix array applications, such as databases. It was always mainly a server design bonus, hence not featured much on the desktop, but now that seems to be changing, and it's quite obviously led by AMD with their MPU+IMC design. Not that Intel didn't know this before AMD; they just couldn't produce a chip with it before 45nm.
    Complete and utter BS. Why would Intel put an L3 on a chip just for the sake of it? Core 2 could have had 2x 512KB L2s and a 2MB L3. But why do it when you already have the best answer: fast and large L2s?

    L3s are needed more for QC because of coherency issues and saturation of a shared L2.

    Well, the additions for Nehalem are good on paper, but fingers crossed, as Native+IMC is too difficult a design to get running without problems, esp. at your first go. 45nm HKMG helps a lot, but not as much as 32nm would. I'm fearing the prices on these, as clocks are far harder to get, yields much lower and defect rates very high; hence, price is where it'll bite us in the hind, unless AMD has something Intel fears by 13th October '08.
    You know that how? Magic crystal ball? If AMD couldn't get it right, that means Intel won't either?
    *IMC also increases power/TDP much, especially with triple channel memory support. You can add 30-60W of minimum to maximum power here at just 2.0-2.8GHz clocks, maybe even more so with SMT and QPI support being internal. Maximum theoretical AC and DC power consumption becomes much higher through individual latch testing.
    More BS. An MC burns less than 10W.

    *Triple channel memory is essentially needed, and I reckon it's a clever move, because a Native+IMC design suffers from low real bandwidth, and worse so for write/copy bandwidth than for read. Individual DRAM access by each core is the best way to go; it should improve read bandwidth over current Penryn and improve, or at least keep level, write/copy bandwidth. Just having an IMC with 3 controllers doesn't guarantee this at all, though.
    The stupidest claim of the month.
    If native+IMC suffers from low BW, how does a non-native, non-IMC design do? Crawl?

    Got bored with the rest of your post. Sorry, but it's obvious you had too much free time lately to write such rubbish.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  12. #137
    Xtreme Mentor
    Join Date
    Aug 2006
    Location
    HD0
    Posts
    2,646
    Quote Originally Posted by KTE View Post
    *IMC also increases power/TDP much, especially with triple channel memory support. You can add 30-60W of minimum to maximum power here at just 2.0-2.8GHz clocks, maybe even more so with SMT and QPI support being internal.
    I'd think it'd be lower than that, as current NBs don't draw that much and the IMC isn't even everything that is on the NB. Plus, in the past, integration lowered overall power consumption, so it should be less than that, assuming yields aren't an issue. We'll see though.

  13. #138
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    The MCH itself doesn't draw much power: 5-10W perhaps in the enthusiast parts and less in the normal ones. The main power sucker in an NB is the PCIe controller.

    I think P965, as an example, had a 21W TDP but drew around 8-9W in actual usage, and that was on 90nm. This is 45nm.

    To say 30-60W is just clueless fearmongering.
    Last edited by Shintai; 03-19-2008 at 09:49 AM.

  14. #139
    Xtreme Mentor
    Join Date
    Aug 2006
    Location
    HD0
    Posts
    2,646
    Quote Originally Posted by Shintai View Post
    Nehalem won't be 20 stages. It will be somewhere around what we have today, ~12-14. It is an evolutionary step from Core 2.

    DDR3 on a 192-bit bus is the evolutionary step. More cores simply need more bandwidth, and Nehalem is designed to scale to 8 cores (16 threads).

    I think they redesigned the cache structure mainly due to SMT and the speed of the shared cache. Plus it's the exact same design as Itanium, so they have more knowledge and experience with it.
    I'd heard it was going to be before.

    ehh.

  15. #140
    Coat It with GOOOO
    Join Date
    Aug 2006
    Location
    Portland, OR
    Posts
    1,608
    Quote Originally Posted by Den Leiw View Post
    maybe overclocking will not be possible with nehalem and Intel will force us to buy those 1000 $/€ cpus if we want high end stuff ¬_¬
    From what I understand, the QPI links do set the base frequency that the processor multiplier operates on to dictate the final clock speed. Most boards should have the option to scale the base frequency of the QPI links. So anything with Tylersburg as a QPI hub will be overclockable from that knob there.

    Questions still arise about how memory overclocking comes in. If one of the AMD guys could help me out as to how they set up their memory clocking, it would be appreciated, as I'm far too used to chipset straps and memory ratios. From my understanding, in K10, AMD uses memory ratios based off the HTT reference frequency, ranging from 3:3 --> 3:8 for DDR2-400 --> DDR2-1066. Nehalem will most likely use a similar setup, with the memory frequency being set off this base frequency.
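    To make the ratio idea concrete, here is a small sketch of how a ref:mem ratio would turn a reference clock into a DDR2 speed grade. It assumes a 200MHz nominal HTT reference and takes the 3:3 to 3:8 range from my understanding above; it is not pulled from an AMD document, and the function name is made up.

```python
from fractions import Fraction

HTT_REF_MHZ = 200  # assumption: nominal HTT reference clock


def ddr2_data_rate(ref_mhz, ratio_ref: int, ratio_mem: int) -> float:
    """Effective DDR data rate (MT/s) for a ref:mem ratio such as 3:8."""
    mem_clock_mhz = Fraction(ref_mhz) * ratio_mem / ratio_ref
    return float(mem_clock_mhz * 2)  # DDR: two transfers per memory clock


if __name__ == "__main__":
    for mem in (3, 4, 5, 6, 8):
        stock = ddr2_data_rate(HTT_REF_MHZ, 3, mem)
        oc = ddr2_data_rate(220, 3, mem)  # reference raised by 10%
        print(f"3:{mem} -> DDR2-{stock:.0f} stock, DDR2-{oc:.0f} at 220MHz ref")
```

    With the 200MHz reference, 3:3 gives DDR2-400 and 3:8 gives DDR2-1066, matching the range above; raising the reference scales every memory speed by the same percentage unless you drop to a lower ratio.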

    The problem that comes up is that the mainstream stuff (Lynnfield/Havendale) is going to have both integrated PCI lanes and even graphics. They receive their clock frequency directly from the PLL, and thus all control over the reference and clock splitting is done on die. Since neither of the big 2 has done this yet, mainstream overclocking might become a thing of the past, depending on how it's implemented.
    Main-- i7-980x @ 4.5GHZ | Asus P6X58D-E | HD5850 @ 950core 1250mem | 2x160GB intel x25-m G2's |
    Wife-- i7-860 @ 3.5GHz | Gigabyte P55M-UD4 | HD5770 | 80GB Intel x25-m |
    HTPC1-- Q9450 | Asus P5E-VM | HD3450 | 1TB storage
    HTPC2-- QX9750 | Asus P5E-VM | 1TB storage |
    Car-- T7400 | Kontron mini-ITX board | 80GB Intel x25-m | Azunetech X-meridian for sound |


  16. #141
    Xtreme Member
    Join Date
    Dec 2007
    Location
    Raleigh, NC
    Posts
    172
    Talk about a serious inter-generational leap.... Everyone's excited about this...I'm just a bit concerned about whether mobo makers will get it right at launch...

  17. #142
    Xtreme Enthusiast
    Join Date
    Jan 2005
    Location
    Frederick, MD
    Posts
    513
    So we might have to buy the 3 channel version w/ northbridge so we can oc?
    Core i5 750 3.8ghz, TRUE 120 w/Panaflo M1A 7v
    ASRock P55 Deluxe
    XFX 5870
    2x2GB GSkill Ripjaw DDR3-1600
    Samsung 2233RZ - Pioneer PDP-5020FD - Hyundai L90D+
    Raptor WD1500ADFD - WD Caviar Green 1.5TB
    X-FI XtremeMusic w/ LN4962
    Seasonic S12-500
    Antec P182

  18. #143
    Coat It with GOOOO
    Join Date
    Aug 2006
    Location
    Portland, OR
    Posts
    1,608
    Quote Originally Posted by shiznit93 View Post
    So we might have to buy the 3 channel version w/ northbridge so we can oc?
    Maybe; it depends on how open the on-die clock regulation for the mainstream chips turns out to be.


  19. #144
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by shiznit93 View Post
    So we might have to buy the 3 channel version w/ northbridge so we can oc?
    Right now there isn't a 3 Ch version with a Northbridge. There was talk of legacy chips with a North Bridge, but these were what we call Celerons today.
    An IBM guy told me, "Meant for Budget or Light Duty Workstations." Even he admitted that was still up in the air. Oh, and NOT Socket 775 compatible.
    Quote Originally Posted by Movieman
    With the two approaches to "how" to design a processor WE are the lucky ones as we get to choose what is important to us as individuals.
    For that we should thank BOTH (AMD and Intel) companies!


    Posted by duploxxx
    I am sure JF is relaxed and smiling these days with there intended launch schedule. SNB Xeon servers on the other hand....
    Posted by gallag
    there yo go bringing intel into a amd thread again lol, if that was someone droping a dig at amd you would be crying like a girl.
    qft!

  20. #145
    Coat It with GOOOO
    Join Date
    Aug 2006
    Location
    Portland, OR
    Posts
    1,608
    It's not really a northbridge, it's more of a QPI hub and PCIe controller.


  21. #146
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by Blauhung View Post
    From what I understand, the QPI links do set the base frequency that the processor multiplier operates on to dictate the final clock speed. Most boards should have the option to scale the base frequency of the QPI links. So anything with Tylersburg as a QPI hub will be overclockable from that knob there.

    Questions still arise about how memory overclocking comes in. If one of the AMD guys could help me out as to how they set up their memory clocking it would be appreciated, as I'm far too used to chipset straps and memory ratios. From my understanding, in K10, AMD uses memory ratios based off the HTT reference frequency ranging from 3:3 --> 3:8 for DDR2-400 --> DDR2-1066. Nehalem will most likely use a similar setup with the memory frequency being set off this base frequency.

    The problem comes up, that the mainstream stuff (Lynnfield/Havendale) are going to have both integrated PCI lanes and even graphics. They receive their clock frequency directly from the PLL and thus all control over the reference and clock splitting is all done on die. Since none of the big 2 have done this yet, mainstream overclocking might become a thing of the past depending on how it's implemented.
    This might not be the right thing to say on XtremeSystems.org, but will overclocking have the same effect? Will extra MHz scale as well? The performance increases talked about so far are 10 to 30% clock for clock. Will we even worry about NOT being able to overclock? You'd have to overclock older Intel and AMD processors just to keep up with them.
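As a rough sketch of the clocking scheme described in the quoted post: the core clock is the base (reference) frequency times the CPU multiplier, and the memory clock is that same reference scaled by a memory ratio. The 200 MHz HTT reference, the 133/150 MHz base clocks, the 20x multiplier, and the function names below are all assumptions for illustration, not figures from this thread.

```python
from fractions import Fraction


def core_clock_mhz(base_mhz, multiplier):
    """Final core clock = base (reference) frequency x CPU multiplier."""
    return base_mhz * multiplier


def memory_clock_mhz(reference_mhz, ratio):
    """Memory clock for a 'reference:memory' ratio such as (3, 8)."""
    ref_part, mem_part = ratio
    return reference_mhz * Fraction(mem_part, ref_part)


# Assumed 200 MHz HTT reference; ratios 3:3 through 3:8 as in the quoted post.
HTT_REF_MHZ = 200
for mem_part in range(3, 9):
    clock = memory_clock_mhz(HTT_REF_MHZ, (3, mem_part))
    # DDR transfers twice per clock, so the effective rate is 2 x clock.
    print(f"3:{mem_part} -> {float(clock):.1f} MHz clock, {float(clock * 2):.1f} MT/s")

# Raising the base frequency raises the core clock at a fixed multiplier
# (hypothetical 20x part, 133 MHz stock vs. 150 MHz overclocked base).
print(core_clock_mhz(133, 20), core_clock_mhz(150, 20))
```

Under these assumptions, 3:3 lands on a 200 MHz memory clock (DDR2-400 territory) and 3:8 on roughly 533 MHz (DDR2-1066 territory), which matches the range quoted above. If the reference clock is generated on die and locked, both knobs go away at once.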

  22. #147
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    To sum up the northbridge, 3ch, etc.:

    Server/Extreme: Socket 1366, 3ch IMC, PCIe/QPI on Northbridge, ICH10 Southbridge.

    Performance/Mainstream: Socket 1160, 2ch IMC, on-die PCIe, Ibex Peak Southbridge. (No Northbridge)

    Value/Budget: Socket 1160, 2ch IMC, on-die PCIe, on-die GPU (IGP), Ibex Peak Southbridge. (No Northbridge)

    Intel basically removed all flexibility on mainstream/performance. A big F U to nVidia and their locked chipsets for SLI.
    Crunching for Comrades and the Common good of the People.

  23. #148
    Xtreme Enthusiast
    Join Date
    Jan 2005
    Location
    Frederick, MD
    Posts
    513
    Right, I didn't mean northbridge literally. I was asking if we might have to overclock with socket 1366 to avoid a possible bottleneck from the PCIe controller.

    Donnie27, yes it's nice that Nehalem will be faster clock for clock and that we probably won't have to oc as much to get similar performance, but that's not the issue. Say a 3.5ghz Penryn = 3.0ghz Nehalem for the sake of argument. To get a 3.5ghz Penryn you can buy one that's 2.4ghz stock for under $300 (probably a lot cheaper by Q4), but if Nehalem can't OC then you will have to buy the full bona fide version for close to $1000. Big problem.
    Last edited by shiznit93; 03-19-2008 at 11:23 AM.

  24. #149
    I am Xtreme
    Join Date
    Jul 2004
    Location
    Little Rock
    Posts
    7,204
    Quote Originally Posted by shiznit93 View Post
    Right, I didn't mean northbridge literally. I was asking if we might have to overclock with socket 1366 to avoid a possible bottleneck from the PCIe controller.

    Donnie27, yes it's nice that Nehalem will be faster clock for clock and that we probably won't have to oc as much to get similar performance, but that's not the issue. Say a 3.5ghz Penryn = 3.0ghz Nehalem for the sake of argument. To get a 3.5ghz Penryn you can buy one that's 2.4ghz stock for under $300 (probably a lot cheaper by Q4), but if Nehalem can't OC then you will have to buy the full bona fide version for close to $1000. Big problem.
    But who knows the Nehalem prices? I don't believe Intel will sell 2.4GHz Nehalems for $1000, or even half that amount. Intel knows the market has gotten way too used to lower prices, except for a few suckers buying Extreme Edition for e-peen reasons. IMHO, there will be that $1299 model, maybe even a second $700 model, but the rest will be affordable. Left out of this is that almost nothing has been said about the dual-core versions. The DC versions will be even cheaper, have Hyperthreading 2 for 4 threads, and totally obliterate whatever few dual cores are still on the market. No, I don't think there is any way in hell Intel will pull an AMD X2 on us.

  25. #150
    Xtreme Enthusiast
    Join Date
    Jan 2005
    Location
    Frederick, MD
    Posts
    513
    When did I say a 2.4ghz Nehalem will be $1000? I was implying that, assuming overclocking flexibility is gone in the mainstream version, a Nehalem with comparable performance to an overclocked $300 Penryn (say 3.6ghz) will be much more expensive, since you will have to buy the actual ghz you are shooting for.

    Here is a roadmap I saw recently, http://pc.watch.impress.co.jp/docs/2...gai403_03l.gif. The first Nehalem parts it shows are socket 1366 versions @ 3.2ghz and 3.0ghz, and if you look to the right at the price axis, they fall in the $800-1000 range. The mainstream parts don't show up until Q2 09, and they start with a 2.4ghz part in the $200-400 range. Assuming they don't OC and are 30% faster than Penryn clock for clock (generous assumption for the sake of argument), that still only makes the 2.4 Nehalem equivalent to roughly a 3.1ghz Penryn (2.4 x 1.3), which you can have right now for $300. Most of us here aren't memory bandwidth limited anyway (Nvidia wake the F up plz), so the IMC might not be enough of a reason to switch over if overclocking is bad.

    Many people here OC to set records or to Fold, WCG, etc. They could probably use the memory bandwidth and SMT even if the chip doesn't overclock well. I do it to play games and save money. If the price/performance isn't there on Nehalem then it makes no sense for me to buy one. But if there is still room to OC despite the IMC, QPI, and PCIe on die then I'll be very happy.
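For anyone who wants to play with the numbers, here is a minimal sketch of the clock-for-clock arithmetic in this post. It assumes performance scales linearly with clock speed and with the per-clock gain, which is a simplification; the function names are made up for illustration.

```python
def equivalent_clock_ghz(new_clock_ghz, per_clock_gain):
    """Clock an older chip needs to match a newer chip that is
    per_clock_gain (e.g. 0.30 for 30%) faster per clock."""
    return new_clock_ghz * (1.0 + per_clock_gain)


def required_clock_ghz(old_clock_ghz, per_clock_gain):
    """Clock the newer chip needs to match an older chip at old_clock_ghz."""
    return old_clock_ghz / (1.0 + per_clock_gain)


def implied_gain(old_clock_ghz, new_clock_ghz):
    """Per-clock gain implied by 'old chip at X = new chip at Y'."""
    return old_clock_ghz / new_clock_ghz - 1.0


# 2.4ghz Nehalem at an assumed 30% per-clock gain, expressed in Penryn ghz.
print(f"{equivalent_clock_ghz(2.4, 0.30):.2f}")

# Nehalem clock needed to match a Penryn overclocked to 3.6ghz.
print(f"{required_clock_ghz(3.6, 0.30):.2f}")

# Gain implied by the earlier '3.5ghz Penryn = 3.0ghz Nehalem' example.
print(f"{implied_gain(3.5, 3.0):.1%}")
```

The second figure is the one that matters for the price argument: without overclocking headroom, you have to buy a stock Nehalem at that clock rather than a cheaper part pushed up to it.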
    Last edited by shiznit93; 03-19-2008 at 12:10 PM.
