Results 76 to 100 of 719

Thread: AMD cuts to the core with 'Bulldozer' Opterons

  1. #76
    Xtreme Member
    Join Date
    Nov 2008
    Posts
    117
    Is there a Bulldozer wafer at CES, JF-AMD?
    When AMD had 64-bit and Intel had only 32-bit, they tried to tell the world there was no need for 64-bit. Until they got 64-bit.
    When AMD had IMC and Intel had FSB, they told the world "there is plenty of life left in the FSB" (actual quote, and yes, they had *math* to show it had more bandwidth). Until they got an IMC.
    When AMD had dual core and Intel had single core, they told the world that consumers don't need multi core. Until they got dual core.
    When Intel was using MCM, they said it was a better solution than native dies. Until they got native dies. (To be fair, we knocked *unconnected* MCM, and still do, we never knocked MCM as a technology, so hold your flames.)
    by John Fruehe

  2. #77
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by JF-AMD View Post
    It's about cache efficiency. Today, when there is a cache miss, the thread stalls while the core waits for the data to be fetched from memory. While that thread is stalled, SMT will dump the cache, insert a new thread, run that, then return the cache contents for the old thread (that just got the memory data.)

    I know that is a REALLY simplistic description but should help you visualize.

    HT originally came about in P4 because they had a very long pipeline and one cache miss had lots of penalty associated. But as they shortened the pipeline (i.e. Core2) they tossed out HT because they no longer needed that band-aid.

    If you take that same logic and extend it, as a microarchitecture, you should always be striving to reduce cache misses as much as possible. As you reduce the misses, you increase the efficiency. That is good. But the cache misses give you the "opportunity" that you need for SMT to work. So as primary core efficiency goes up, the SMT efficiency generally goes down.
    I know what pipeline stalls are. By the way, a stall can have causes other than a cache miss, a branch misprediction for example. But my question was not about cache misses or pipeline stalls; it was about core efficiency (in other words, utilization of core resources). Some time ago I saw research on instruction-level parallelism. Its main conclusion was that the average program sustains about 2 instructions per cycle. Phenom has more than 6 execution units and 3 pipelines per core, and Nehalem has 5 execution ports and 4 pipelines per core, which means core resource utilization on those CPUs is under 50% on average. I would say the main advantage of HT is not the ability to run one thread while another stalls, but the ability to run both threads in parallel, with each thread scheduling instructions onto whatever execution units are free in a given cycle. This is why Nehalem, with its shorter pipeline and faster caches, still benefits from HT more than Pentium 4 did: Nehalem simply has more execution resources.
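    A toy model of that argument, as a minimal sketch: it assumes a hypothetical 4-wide core and a made-up workload in which each thread has 0 to 4 independent instructions ready per cycle (about 2 on average, matching the figure cited above). It is not a model of any real Phenom or Nehalem pipeline; it only shows why a second thread can fill issue slots that one thread leaves empty.

```python
import random


def issue_slot_utilization(issue_width, threads, cycles=10_000, seed=0):
    """Fraction of issue slots filled in a toy core model.

    Assumption (hypothetical workload): every cycle each thread has between
    0 and 4 independent instructions ready, i.e. about 2 on average.
    """
    rng = random.Random(seed)
    issued = 0
    for _ in range(cycles):
        ready = sum(rng.randint(0, 4) for _ in range(threads))
        issued += min(ready, issue_width)  # the core cannot issue more than its width
    return issued / (cycles * issue_width)


one_thread = issue_slot_utilization(issue_width=4, threads=1)
two_threads = issue_slot_utilization(issue_width=4, threads=2)
print(f"1 thread:  {one_thread:.0%} of issue slots used")
print(f"2 threads: {two_threads:.0%} of issue slots used")
```

    Under these assumptions a single thread fills roughly half of the issue slots, and a second thread raises that noticeably without doubling it, because in some cycles the two threads together have more ready instructions than the core can issue.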

    The ability for parallelism to increase has more to do with the OS schedulers for the most part. OS's deployed 3 years ago were written when single cores ruled the earth. OS's deployed today were focused more on dual core and even to a small extent quad core, so they do a better job of scheduling. OS's that you will use in 3 years will do much better than today's. It is all a progression. Saying you don't need more cores in the future because today's OS's don't utilize all of the cores is like saying that a 1TB drive is too big. Give people enough storage space and they will fill it. Give them enough cores and they will figure out how to use them.

    My notebook probably has 50 different services running (and 3-4 actual programs). There is always a use for more cores, the OS just needs to come along for the ride - and that will be happening.
    I did not speak about "thread level parallelism" but about "instruction level parallelism".

    Sorry for my bad English.

  3. #78
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by STaRGaZeR View Post
    JF-AMD, can you comment on Bulldozer's single thread performance? Or at least on the approach you've taken.
    I won't comment on single thread performance because I do not have the numbers. The Bulldozer architecture is a multi-core, multithreaded architecture.

    Single core performance is going to become less important over time as applications become more multithreaded. I am on the server side, I don't deal with client systems or applications at all. For my customers single threaded performance is far less important because all of the apps are multi-threaded.

    The proper server metric is throughput, which is a measure of all cores, not the clock speed of one core.

    Customers that care about single threaded performance in servers would be staying with dual cores because those have the highest clocks. But we see customers abandoning duals in the server world like crazy.

    Bulldozer will have more cores than Magny-Cours, but the increase in performance between the two will be higher than the percentage increase in cores, so, to answer in a roundabout way, the cores will be faster, not slower.
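    The arithmetic behind that last statement, as a small sketch. The core counts and the throughput gain below are hypothetical placeholders, not AMD figures; the point is only that when total throughput grows by a larger percentage than the core count, throughput per core must have gone up.

```python
def per_core_speedup(core_increase, throughput_increase):
    """Ratio of new to old per-core throughput.

    Both arguments are fractional increases, e.g. 0.33 for +33%.
    """
    return (1 + throughput_increase) / (1 + core_increase)


# Hypothetical figures for illustration only: 12 -> 16 cores, +50% throughput.
cores_old, cores_new = 12, 16
core_increase = cores_new / cores_old - 1
speedup = per_core_speedup(core_increase, throughput_increase=0.50)
print(f"cores: +{core_increase:.0%}, per-core throughput: x{speedup:.3f}")
```

    If throughput grew by exactly the same percentage as the core count, the ratio would be 1.0 (cores no faster); anything above 1.0 means each core is doing more work than before.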

  4. #79
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Selling a slower chip would be suicide. AMD doesn't want to commit suicide.

    Bulldozer will be faster than the last CPU.

    QED.

  5. #80
    Xtreme Mentor
    Join Date
    Nov 2006
    Location
    Spain, EU
    Posts
    2,949
    Quote Originally Posted by JF-AMD View Post
    I won't comment on single thread performance because I do not have the numbers. The Bulldozer architecture is a multi-core, multithreaded architecture.

    Single core performance is going to become less important over time as applications become more multithreaded. I am on the server side, I don't deal with client systems or applications at all. For my customers single threaded performance is far less important because all of the apps are multi-threaded.

    The proper server metric is throughput, which is a measure of all cores, not the clock speed of one core.

    Customers that care about single threaded performance in servers would be staying with dual cores because those have the highest clocks. But we see customers abandoning duals in the server world like crazy.

    Bulldozer will have more cores than Magny-Cours, but the increase in performance between the two will be higher than the percentage increase in cores, so, to answer in a roundabout way, the cores will be faster, not slower.
    That's true, and from a server point of view that's perfectly acceptable. However, an increase in single-thread performance, meaning an increase in the performance of each core, will have a direct impact on throughput. Also, the Bulldozer architecture is going to be in desktop and mobile too, where single-thread performance is still much more important than multithreaded performance, and I can't see that changing for a long time. If BD cores are not "significantly" faster than Shanghai, I can't see AMD changing today's situation, where the competition is faster at almost all levels, including the server space, where they're attacking you with parts that are less expensive (for them) and offer better perf/watt and better absolute performance.
    Friends shouldn't let friends use Windows 7 until Microsoft fixes Windows Explorer (link)


    Quote Originally Posted by PerryR, on John Fruehe (JF-AMD) View Post
    Pretty much. Plus, he's here voluntarily.

  6. #81
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by STaRGaZeR View Post
    That's true, and from a server point of view that's perfectly acceptable. However, an increase in single-thread performance, meaning an increase in the performance of each core, will have a direct impact on throughput. Also, the Bulldozer architecture is going to be in desktop and mobile too, where single-thread performance is still much more important than multithreaded performance, and I can't see that changing for a long time. If BD cores are not "significantly" faster than Shanghai, I can't see AMD changing today's situation, where the competition is faster at almost all levels, including the server space, where they're attacking you with parts that are less expensive (for them) and offer better perf/watt and better absolute performance.
    i think the problem is the balance between tech that works great now, and tech that will last a long time.

    most people are still happy with the X2 4400+ they bought 5 years ago. i just upgraded my 4850 to a 5850 and saw no increase in WoW, so i guess that game is somehow cpu limited even though my PII 940 never has one core above 70%. some things really just don't scale today, but if everyone only offered quads and above, maybe software would advance faster. it would be nice to know the single-threaded jump, since many consumer programs probably won't be rewritten for quads and above. but i think if i got an 8 core BD, i should hopefully be set for 4 years before i would need to upgrade again.

  7. #82
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    @JF-AMD Is this a true approximation?

    [image]

    Can the two 128-bit FMACs be combined for 256-bit operations?
    Coming Soon

  8. #83
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by vietthanhpro View Post
    Is there a Bulldozer wafer at CES, JF-AMD?
    no, no wafer yet

  9. #84
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by ajaidev View Post
    @JF-AMD Is this a true approximation?

    [image]

    Can the two 128-bit FMACs be combined for 256-bit operations?
    That's a diagram of a QC Zambezi, yes. The FPU (two 128-bit units) in one module can be used either by one int core (256b) or by both int cores sharing it SMT-style (2x128b). Each of the 128b units in the FPU is FMAC capable, so in effect it is much faster than what we have today in K10 or Nehalem.

  10. #85
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by ajaidev View Post
    @JF-AMD Is this a true approximation?

    [image]

    Can the two 128-bit FMACs be combined for 256-bit operations?
    Sorry, that image is blocked for me.

    The two 128-bit FMACs can be combined into a single 256-bit unit. This can be done on the fly, per cycle, so either integer core could have a dedicated 256-bit unit OR both could have 128-bit simultaneously.
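    One way to picture the per-cycle sharing described above, as a rough sketch. The function name and the arbitration rule are assumptions made for illustration: the thread does not say what happens when both integer cores have a 256-bit operation ready in the same cycle, so this model simply gives each core one 128-bit unit in that case.

```python
def allocate_fmacs(core0_wants, core1_wants):
    """Split two 128-bit FMAC units between two integer cores for one cycle.

    Each argument is the widest FP operation that core has ready this cycle:
    0 (nothing), 128 or 256 bits. Returns the bits granted to (core0, core1).
    """
    for width in (core0_wants, core1_wants):
        if width not in (0, 128, 256):
            raise ValueError(f"unsupported width: {width}")
    if core0_wants and core1_wants:
        # Assumed policy: when both cores are active, each gets one unit.
        return (128, 128)
    # Only one core (or neither) is active, so it can take what it asked for.
    return (core0_wants, core1_wants)


for request in [(256, 0), (128, 128), (0, 128), (256, 256)]:
    print(request, "->", allocate_fmacs(*request))
```

    Because the decision is made again every cycle, a core running alone can use the full 256 bits and drop back to 128 bits as soon as its neighbour has FP work.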

  11. #86
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by STaRGaZeR View Post
    That's true, and from a server point of view that's perfectly acceptable. However, an increase in single-thread performance, meaning an increase in the performance of each core, will have a direct impact on throughput. Also, the Bulldozer architecture is going to be in desktop and mobile too, where single-thread performance is still much more important than multithreaded performance, and I can't see that changing for a long time. If BD cores are not "significantly" faster than Shanghai, I can't see AMD changing today's situation, where the competition is faster at almost all levels, including the server space, where they're attacking you with parts that are less expensive (for them) and offer better perf/watt and better absolute performance.
    The one thing I won't do is comment on the client space. I've been in servers for almost 20 years, so my frame of reference is a bit skewed.

    Bulldozer cores should be faster than Magny-Cours cores.

  12. #87
    Xtreme Mentor
    Join Date
    Nov 2006
    Location
    Spain, EU
    Posts
    2,949
    Quote Originally Posted by Manicdan View Post
    i think the problem is the balance between tech that works great now, and tech that will last a long time.

    most people are still happy with the X2 4400+ they bought 5 years ago. i just upgraded my 4850 to a 5850 and saw no increase in WoW, so i guess that game is somehow cpu limited even though my PII 940 never has one core above 70%. some things really just don't scale today, but if everyone only offered quads and above, maybe software would advance faster. it would be nice to know the single-threaded jump, since many consumer programs probably won't be rewritten for quads and above. but i think if i got an 8 core BD, i should hopefully be set for 4 years before i would need to upgrade again.
    If you take your time and test some games you'll see how CPU limited they are where it matters the most, min FPS. Crysis (yes, Crysis) runs exactly the same through almost all of the first and second levels, with or without AA, on my 5850. Play with CPU frequency and you'll see some surprising numbers. Not to mention that any "old" game (FEAR is old here) is single threaded only, and the situation is exactly the same as in Crysis. Walk into a crowd in Assassin's Creed and play with GPU and CPU power, and more interesting numbers will arise. Getting a constant 120 FPS with VSync is almost impossible in the majority of games because of the CPU, and this is with the fastest Intel processor, overclocked.
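    A simple way to see why a faster GPU or a lower resolution may not help, as a sketch with made-up frame times: if the CPU and GPU work on frames in a pipelined fashion, the slower of the two stages sets the frame rate. Real engines are messier than this, so treat it as an illustration only.

```python
def fps(cpu_ms, gpu_ms):
    """Frames per second when the slower of the two stages sets the pace."""
    return 1000.0 / max(cpu_ms, gpu_ms)


# Hypothetical per-frame costs in milliseconds.
baseline = fps(cpu_ms=12.0, gpu_ms=8.0)      # CPU limited
faster_gpu = fps(cpu_ms=12.0, gpu_ms=5.0)    # GPU upgrade, same CPU
faster_cpu = fps(cpu_ms=8.0, gpu_ms=8.0)     # CPU overclock, same GPU
print(f"baseline {baseline:.1f}, faster GPU {faster_gpu:.1f}, faster CPU {faster_cpu:.1f}")
```

    In this model the GPU upgrade changes nothing while the CPU is the longer stage, which is the pattern to look for when varying CPU frequency or resolution in a real game.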

  13. #88
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by STaRGaZeR View Post
    If you take your time and test some games you'll see how CPU limited they are where it matters the most, min FPS. Crysis (yes, Crysis) runs exactly the same through almost all of the first and second levels, with or without AA, on my 5850. Play with CPU frequency and you'll see some surprising numbers. Not to mention that any "old" game (FEAR is old here) is single threaded only, and the situation is exactly the same as in Crysis. Walk into a crowd in Assassin's Creed and play with GPU and CPU power, and more interesting numbers will arise. Getting a constant 120 FPS with VSync is almost impossible in the majority of games because of the CPU, and this is with the fastest Intel processor, overclocked.
    i think what i need to play with is resolution. i bet i get the same fps at 1920x1200 as i do at 2560x1600 in WoW. the really annoying part with WoW was that no matter what settings i lowered, the fps did not go up by much at all. turned off shadows, particle effects, projected textures, no matter what, i could only get maybe 30-40% more frames with the settings severely reduced. not sure what a cpu does for WoW, but that game sure could use an update (or a bigger monitor, i so wish i could play with eyefinity before i have to give the card up)

  14. #89
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Thanks informal and JF-AMD for clearing that up. Now I was thinking about how STORE is handled in Bulldozer: if, say, the result of a 256-bit operation has to be stored, can this be done in one cycle or will it need two cycles?

    Is a 256-bit STORE possible in one cycle, or does it have to be broken into two 128-bit STOREs?

  15. #90
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    My degree is in economics, you're outside my range on that one. However, I seem to recall the answer is yes, but there is no knowledge behind my answer, only overhearing a similar response from an engineer in a meeting.

  16. #91
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by JF-AMD View Post
    My degree is in economics, you're outside my range on that one. However, I seem to recall the answer is yes, but there is no knowledge behind my answer, only overhearing a similar response from an engineer in a meeting.
    no problem, thanks anyway. will wait and see if something pops up at CES ("well, one can hope")

    Damn, i so want to work for AMD, my dream job lol.

  17. #92
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    You won't hear anything about Bulldozer from CES. This far out from launch, all news will come out of AMD Analyst Day presentations. We do them twice a year, spring and fall. The fall one was just in November and that is where the bulk of the Bulldozer data came from. Also keep an eye on my blogs, I will talk about Bulldozer from time to time and there will be data clarifications.

  18. #93
    Xtreme Member
    Join Date
    May 2005
    Posts
    159
    JF,
    Do you have any information on AMD's turbo implementation on Bulldozer?
    Do you also have any information on the type of L3 cache Bulldozer will use?

    Cheers.
    Quote Originally Posted by Movieman
    been lots of years since I played with an AMD and this is just an hour so bear with me..
    My first thoughts on it is that it's fast, it's smoothe and it's fun.
    Quote Originally Posted by Movieman
    Yes, the i7 does have the edge in pure grunt but then again the AMD has that little something I can't quite put my finger on except to use that word 'smoother" and that will get me flamed faster than posting kiddy :banana::banana::banana::banana: on the Christian networks site.
    Main Rig: Phenom II 550 (x4) @3.9Ghz - Gigabyte 6950@6970 - Asus M4A-785D M Pro - Samsung HDs 2x2TB,1x1.5TB,2x1TB - Season X-650 | OpenCL mining rigs: 2x Phenom II 555(x4) - 1xMSI 890FXA-GD70 - 1xGB 990FXA-UD7 (SICK ) - 1xHD6990 - 1x6950@70 - 6x5850 - 2xCooler Master Silent Pro Gold 1kW

  19. #94
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by JkS View Post
    JF,
    Do you have any information on AMD's turbo implementation on Bulldozer?
    Do you also have any information on the type of L3 cache Bulldozer will use?

    Cheers.
    There will be a turbo-type function in Bulldozer. I am not a big fan of features like this (and have said so on many occasions) because most server customers want lower power more than higher performance. I get more people asking me how to downclock the processor (we have a feature that turns off P-States) to reduce power consumption. I can't give details but I believe we have a better implementation.

    As for the caches we have not released cache size data, that typically comes out at launch. Releasing cache sizes makes it easier for the other guys to try to model performance. The less they know, the better.

  20. #95
    Xtreme Member
    Join Date
    Nov 2007
    Posts
    151
    Quote Originally Posted by JF-AMD View Post
    There will be a turbo-type function in Bulldozer. I am not a big fan of features like this (and have said so on many occasions) because most server customers want lower power more than higher performance. I get more people asking me how to downclock the processor (we have a feature that turns off P-States) to reduce power consumption. I can't give details but I believe we have a better implementation.
    Lower power P-states make a ton of sense. The interesting thing is that all of these chips were always capable of running at more states than listed in the Thermal Spec guide. I never understood why AMD chose to limit it and hide all of the P-state setup in the BIOS. I reprogrammed the ACPI tables on all of my Opteron systems to add a new lower power state. It was easy enough once I had read enough documentation, but there's no good reason for this stuff to be hidden from users like that. (Of course, I'm also of the opinion that hiding *anything* is wrong, in the BIOS or wherever. Open source firmware, please.)

    AMD's OverDrive software for desktop/consumers was a step forward, making tweaks more accessible. But *Under*drive is more interesting, for machines that are on 24/7 but not always crunching at full load.
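    A back-of-the-envelope sketch of why an extra low P-state is attractive for machines that idle a lot. It uses the usual first-order approximation that dynamic power scales with voltage squared times frequency; the P-state table is hypothetical (not taken from any AMD thermal spec guide), and leakage power is ignored.

```python
# Hypothetical P-state table: (name, frequency in GHz, core voltage in volts).
P_STATES = [
    ("P0", 2.6, 1.30),
    ("P1", 2.0, 1.15),
    ("P2", 1.4, 1.05),
    ("P3", 0.8, 0.95),  # the extra low-power state added by hand
]


def dynamic_power(freq_ghz, voltage):
    """First-order dynamic power in arbitrary units: P ~ V^2 * f."""
    return voltage ** 2 * freq_ghz


baseline = dynamic_power(*P_STATES[0][1:])
relative = {name: dynamic_power(f, v) / baseline for name, f, v in P_STATES}
for name, ratio in relative.items():
    print(f"{name}: {ratio:.0%} of P0 dynamic power")
```

    Because voltage enters squared, a state that lowers both frequency and voltage saves proportionally more power than the clock reduction alone would suggest.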

  21. #96
    Xtreme Addict
    Join Date
    Feb 2006
    Location
    northern ireland
    Posts
    1,008
    Welcome JF-AMD, I have read your posts on AMDZone over time and you seem to bring some good info to the forums, but I think it is only fair that you inform people that you are a marketing guy at AMD, so when you post here you are working and will naturally make things look good for AMD and awful for Intel.

  22. #97
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    @gallag He is doing the same thing Mark does for Intel, nothing wrong with that. Given that he cleared up some doubts i had about 256-bit execution, i would say it's a good thing to have marketing people with some inside knowledge, otherwise it's just people making claims and cursing...

  23. #98
    Xtreme Addict
    Join Date
    Feb 2006
    Location
    northern ireland
    Posts
    1,008
    Quote Originally Posted by ajaidev View Post
    @gallag He is doing the same thing Mark does for Intel, nothing wrong with that. Given that he cleared up some doubts i had about 256-bit execution, i would say it's a good thing to have marketing people with some inside knowledge, otherwise it's just people making claims and cursing...
    I agree, I just think that if you work in marketing for any company, be it Intel, Nvidia, ATI or AMD, then you should say so.

  24. #99
    Xtreme Enthusiast
    Join Date
    Nov 2009
    Posts
    526
    Maybe all industry reps could just add it to their signature.

    "I work for company X as Y"

    Would be good.


    Is Bulldozer supposed to be compatible with 800 series motherboards?

  25. #100
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    i hope to see a 900 series in the near future with dual CPU and Lucid onboard.
