Results 251 to 275 of 403

Thread: AMD to Disclose Details About Bulldozer Micro-Architecture in August

  1. #251
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by madcho View Post
    I would be seriously disappointed if AMD chose to use an API to access the accelerator.

    The best way would be to "simply" extend x86-64 into a new instruction set.
    APIs provide a level of abstraction that is very useful for targeting different hardware. the only reason to use x86 would be to save costs on designing the ISA. x86-64 is essentially x86 with 2x the registers and 64-bit support. at its core it is still plagued with the performance issues of x86. it's very wasteful, and consequently instruction decoding is a bottleneck in virtually every x86 cpu.

    what you have suggested has sort of been done with SSE. i have wondered what a cpu that just did SSE would be capable of in terms of performance and compatibility; it's probably not so great. when you have that much compute density you really need to save memory bandwidth with streaming. SSE doesn't handle that as well as it could.
    Yes, it's not easy work, but it would be a much faster way to get a performance improvement: update the compilers and recompile the code.

    And kill that f*cking x87, free some die space.
    you don't update a compiler for a new architecture or in some cases even extensions, you start from scratch. that's why compilers take forever to mature.
    And JF : directcompute is proprietary.
    which has advantages over an open standard, such as being much faster to support new features. no board of people filibustering APIs = things get done.

    also directcompute is arguably more vendor neutral than opencl or opengl. khronos group allows for proprietary extensions in OGL & OCL whereas DC doesn't. nvidia sort of abuses the extension system, which is no surprise.

  2. #252
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by gOJDO View Post
    Yeap. Obviously I'm just pi$$ing off some die-hard AMD fanboys.

    This thread is funny. It's a 10 pages waste in the database, so I'm adding fuel for the next 10

    @informal
    Intel is better than AMD
    No you're not. This isn't the first time but it's the last.
    You are approx. 30 seconds from being history on this forum.
    Sayonara!
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us, get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  3. #253
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Since directcompute will run on intel or AMD and run on the ~90% of the market that runs windows, it doesn't rank as proprietary in my mind.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  4. #254
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    Quote Originally Posted by JF-AMD View Post
    Since directcompute will run on intel or AMD and run on the ~90% of the market that runs windows, it doesn't rank as proprietary in my mind.
    It's compatible with 95% of the market, but it's not open, so it's proprietary.

    Yeah, it's good to write code that is compatible with the CPUs/GPUs on the market, but that's no reason to say "we want to do open standards" when in reality it's not true.

    And now steam is available on linux, with some real games. And what if we want to switch to linux? That's not really possible, because ATI's drivers are really bad on that OS.

    AMD needs to improve its communication, for sure.

  5. #255
    Xtreme Member
    Join Date
    Jan 2004
    Location
    uk
    Posts
    159
    Quote Originally Posted by hollo View Post
    that's the programmer's fault, not intel's
    how can an app programmer fix that?

    HTT shows up as extra CPUs in the OS. You can maybe lay the blame on the OS, in that it should show a difference between a virtual CPU and a real CPU, but things like mysql get told it's a real CPU.

  6. #256
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,886
    Quote Originally Posted by Chrysalis View Post
    how can an app programmer fix that?

    HTT shows up as extra CPUs in the OS. You can maybe lay the blame on the OS, in that it should show a difference between a virtual CPU and a real CPU, but things like mysql get told it's a real CPU.

    it's the programmer who has to code for intel's HT implementation iirc

    it's the same as intel implementing AVX or amd implementing it... no program will benefit from its use until they code for it .....
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD View Post
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  7. #257
    Xtreme Enthusiast
    Join Date
    Nov 2009
    Posts
    526
    You can't be serious. HT isn't anything like AVX or other extensions. For HT you should not need to do anything beyond programming your program to use multiple cores. If HT is broken in your workload, then switch it off.

    And no, it's not the programmer's fault. It is Intel's fault.

  8. #258
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Actually for HT, you do want to program for it. Just telling it that you have more cores is a fiasco because you'll end up scheduling too many threads to one core while other cores are sitting idle.

    Remember that with HT, 2 cores do not execute at the same time, one has to wait for the other. So if you had a quad core with HT and you wanted to schedule 4 threads, you would put them all on different cores for the best speed. If you just said take the first 4 cores, you would get two threads sharing the first core, 2 sharing the second, and two idle cores.

    In a server workload, HT gives you a 10-20% increase in throughput, so do you want 2 threads sharing for that 10-20% increase, or do you want all of your threads on individual cores?
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  9. #259
    Xtreme Enthusiast
    Join Date
    Mar 2005
    Posts
    644
    There are very good examples of that, but that is because the Windows XP Task Scheduler lacks intelligence. When Intel had the Pentium D Smithfield and a Pentium Extreme Edition version of it with Hyper Threading (Core count and Frequency being the same), the Pentium D was faster in all applications with 2 Threads, because Windows XP on the EE usually assigned them to the first two Cores, which were the Processor's first Physical Core and its own Logical Core, leaving the entire second Core idle.
    I don't think that you should need to program with Hyper Threading in mind. It happens only because you have to make sure that your application is aware of Hyper Threading, so that it does not use Logical Cores before the Physical Cores are fully used. However, that is a workaround to compensate for the Task Scheduler's stupidity in not knowing which Core is Physical and which is Logical, and in not assigning Threads first to the Physical Cores and then to the Logical ones. If the Task Scheduler were more intelligent, you wouldn't need any Hyper Threading specific considerations, because the OS would take care of it.
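The "physical first, then logical" assignment order argued about in this thread can be sketched in a few lines of Python. This is only an illustration, not how any real OS scheduler is implemented: the function names and the `CORE_OF` map are made up for the example, and the numbering (logical CPUs 2k and 2k+1 sharing physical core k) is an assumption about a hypothetical quad core with HT.

```python
from collections import defaultdict


def physical_first_order(core_of):
    """Order logical CPUs so every physical core gets one thread before any gets two.

    core_of maps a logical CPU id to the id of the physical core it belongs to.
    """
    siblings = defaultdict(list)
    for cpu in sorted(core_of):
        siblings[core_of[cpu]].append(cpu)

    order = []
    depth = 0
    # Round-robin over cores: first sibling of every core, then the second, ...
    while len(order) < len(core_of):
        for core in sorted(siblings):
            if depth < len(siblings[core]):
                order.append(siblings[core][depth])
        depth += 1
    return order


def busy_cores(cpus, core_of):
    """Number of distinct physical cores touched by the chosen logical CPUs."""
    return len({core_of[c] for c in cpus})


# Hypothetical quad core with HT: logical CPUs 2k and 2k+1 share physical core k.
CORE_OF = {cpu: cpu // 2 for cpu in range(8)}

naive = list(range(4))                       # "take the first 4 cores"
spread = physical_first_order(CORE_OF)[:4]   # one thread per physical core

print(busy_cores(naive, CORE_OF), busy_cores(spread, CORE_OF))
```

Under that assumed numbering, the naive pick keeps only two physical cores busy with four threads, while the physical-first order touches all four.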

  10. #260
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    That just shows you that Windows XP is an old (and obsolete) OS. More modern OSes like Windows 7 and Linux handle that much better.
    "When in doubt, C-4!" -- Jamie Hyneman

    Silverstone TJ-09 Case | Seasonic X-750 PSU | Intel Core i5 750 CPU | ASUS P7P55D PRO Mobo | OCZ 4GB DDR3 RAM | ATI Radeon 5850 GPU | Intel X-25M 80GB SSD | WD 2TB HDD | Windows 7 x64 | NEC EA23WMi 23" Monitor |Auzentech X-Fi Forte Soundcard | Creative T3 2.1 Speakers | AudioTechnica AD900 Headphone |

  11. #261
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Quote Originally Posted by JF-AMD View Post
    Remember that with HT, 2 cores do not execute at the same time, one has to wait for the other. So if you had a quad core with HT and you wanted to schedule 4 threads, you would put them all on different cores for the best speed. If you just said take the first 4 cores, you would get two threads sharing the first core, 2 sharing the second, and two idle cores.
    Just some nitpicking here:
    What you describe is "fine-grain temporal multithreading" (Wiki), with one thread per pipeline stage and clock cycle. This is true only for some parts of an SMT pipeline. In particular, some of the execution units could execute thread 0 while the others are used for thread 1 in the same clock cycle. But resource contention could happen easily.
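A toy model may make the difference between the two schemes easier to see. It is a deliberately crude sketch, not a description of any real pipeline: the issue width, the per-cycle demand lists, the rule that thread 0 picks its slots first, and the fact that unissued work is simply dropped rather than carried over are all assumptions made for the illustration.

```python
def issued_temporal(demand0, demand1, width):
    """Fine-grained temporal multithreading: only one thread may issue per cycle."""
    total = 0
    for cycle, (d0, d1) in enumerate(zip(demand0, demand1)):
        active = d0 if cycle % 2 == 0 else d1   # threads alternate cycle by cycle
        total += min(active, width)
    return total


def issued_smt(demand0, demand1, width):
    """SMT: both threads may issue in the same cycle, competing for the same slots."""
    total = 0
    for d0, d1 in zip(demand0, demand1):
        used0 = min(d0, width)
        used1 = min(d1, width - used0)          # thread 1 gets whatever is left
        total += used0 + used1
    return total


WIDTH = 4               # hypothetical number of issue slots per cycle
light = [2, 2, 2, 2]    # each thread can fill half of the slots
heavy = [4, 4, 4, 4]    # each thread can fill every slot by itself

print(issued_temporal(light, light, WIDTH), issued_smt(light, light, WIDTH))
print(issued_temporal(heavy, heavy, WIDTH), issued_smt(heavy, heavy, WIDTH))
```

With two "light" threads the SMT variant fills slots the temporal variant leaves empty; with two "heavy" threads both variants issue the same total, which is the resource contention case.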
    Now on Twitter: @Dresdenboy!
    Blog: http://citavia.blog.de/

  12. #262
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by zir_blazer View Post
    There are very good examples of that, but that is because the Windows XP Task Scheduler lacks intelligence. When Intel had the Pentium D Smithfield and a Pentium Extreme Edition version of it with Hyper Threading (Core count and Frequency being the same), the Pentium D was faster in all applications with 2 Threads, because Windows XP on the EE usually assigned them to the first two Cores, which were the Processor's first Physical Core and its own Logical Core, leaving the entire second Core idle.
    I don't think that you should need to program with Hyper Threading in mind. It happens only because you have to make sure that your application is aware of Hyper Threading, so that it does not use Logical Cores before the Physical Cores are fully used. However, that is a workaround to compensate for the Task Scheduler's stupidity in not knowing which Core is Physical and which is Logical, and in not assigning Threads first to the Physical Cores and then to the Logical ones. If the Task Scheduler were more intelligent, you wouldn't need any Hyper Threading specific considerations, because the OS would take care of it.
    I call BS on that, since Windows XP is aware of HT cores.
    http://download.microsoft.com/downlo...ad_Windows.doc

    Read this doc; it describes how thread assignment in XP/2003 works. Since Windows XP, the scheduler always tries to utilize idle physical cores before logical cores. But if the software you're using is hardcoded to use certain cores, without asking the OS which cores are SMT cores and which are not, it's not the fault of the OS, since it provides the possibility to do that.
    Last edited by Hornet331; 06-28-2010 at 04:51 AM.
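As a rough illustration of "asking the OS which cores are SMT cores", here is a sketch for Linux rather than Windows XP, since Linux exposes the sibling relationship as plain text under `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The helper names and the `SAMPLE` data are invented for the example, and the sample describes a hypothetical dual core with HT, not a real machine. On Windows the comparable information comes from the `GetLogicalProcessorInformation` call, which is not shown here.

```python
import glob
import re


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,4" or "0-1,8-9" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_siblings(sibling_lists):
    """Collapse per-CPU sibling lists into one tuple of logical CPUs per physical core."""
    return sorted({tuple(parse_cpu_list(s)) for s in sibling_lists.values()})


def read_sibling_lists():
    """Read sibling lists from Linux sysfs; returns {} where the files are absent."""
    result = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    for path in glob.glob(pattern):
        cpu = int(re.search(r"cpu(\d+)/topology", path).group(1))
        with open(path) as f:
            result[cpu] = f.read()
    return result


# Sample contents for a hypothetical dual core with HT, not read from a real machine.
SAMPLE = {0: "0,2\n", 1: "1,3\n", 2: "0,2\n", 3: "1,3\n"}
cores = group_siblings(SAMPLE)          # one entry per physical core
one_per_core = [c[0] for c in cores]    # one logical CPU from each physical core
```

An application that insists on pinning its own threads could use `one_per_core` as its first choices and only fall back to the remaining siblings once every physical core is taken.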

  13. #263
    Xtreme Enthusiast
    Join Date
    Mar 2005
    Posts
    644
    Quote Originally Posted by Hornet331 View Post
    I call BS on that, since Windows XP is aware of HT cores.
    http://download.microsoft.com/downlo...ad_Windows.doc

    Read this doc; it describes how thread assignment in XP/2003 works. Since Windows XP, the scheduler always tries to utilize idle physical cores before logical cores. But if the software you're using is hardcoded to use certain cores, without asking the OS which cores are SMT cores and which are not, it's not the fault of the OS, since it provides the possibility to do that.
    I know that WXP is supposed to be HT aware, yet there were still quite a few instances where it performed significantly worse with it enabled. I recall that the PEE (pun intended) was bashed quite a bit for that 5 years ago.
    I have been looking for those Benchmarks, but I wasn't able to find them at the popular reviewers (Anandtech, X-Bit Labs, Tech Report), where it looks to perform basically the same if the application uses just two Threads. There was at least one Article that I saw that reported the issues and why, but I don't recall where I saw it.

  14. #264
    Xtreme Enthusiast
    Join Date
    Apr 2007
    Posts
    772
    Quote Originally Posted by JF-AMD View Post
    Actually for HT, you do want to program for it. Just telling it that you have more cores is a fiasco because you'll end up scheduling too many threads to one core while other cores are sitting idle.

    Remember that with HT, 2 cores do not execute at the same time, one has to wait for the other. So if you had a quad core with HT and you wanted to schedule 4 threads, you would put them all on different cores for the best speed. If you just said take the first 4 cores, you would get two threads sharing the first core, 2 sharing the second, and two idle cores.

    In a server workload, HT gives you a 10-20% increase in throughput, so do you want 2 threads sharing for that 10-20% increase, or do you want all of your threads on individual cores?
    What a load of CRAP. There is so much wrong with these statements I just don't know where I should begin.

    First, The thread scheduler of the OS takes care of this, you NEVER have to code your application to say what "core" you want it to run on.

    Good thread schedulers fill up "real" cores before they "double up" to an HT core. That's just the way of things, and has been for a very long time. Since Win Server 2003 and Linux 2.6.x at the very least.

    Second, A "virtual" core never "waits" on the real core, or vice versa, to complete its computations. TWO THREADS can be pushed down the same "real + virtual" core at the same time.

    XBitLabs has the best Diagram I have seen for this:



    Is the solution proposed in BD better? Likely. But does Intel's solution improve overall IPC and resource utilization? Absolutely.



    I run servers both with AMD Istanbuls and Intel Nehalems in a cloud environment and push the limits of how many threads can be concurrently run on very very high-end hardware. It's how I make money - how many VM Servers can we push onto a real server without degrading performance.

    I love the AMDs because they are CHEAP, low power, and do the job WELL, but I'll be frank with you right here and now: Even fully loaded on all "real" and "virtual" cores, the Nehalem cloud server runs circles around the Istanbul ones that we have deployed in terms of number of customers that can be crammed onto the system without "overloading" the CPU resources. So while the Intel system costs more, on other metrics, it costs me less: LESS power per customer consumed, MORE customers per 2U of rack space (less datacenter costs). So in the end, it is about a wash from my perspective.



    JF-AMD - Just because you have a particular agenda to push, please stop spreading FUD about the competition when it is clear you don't have the technical background to do so. Not trying to be rude, but you yourself have said you are a marketing guy, not an engineer.

  15. #265
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by mstp2009 View Post
    What a load of CRAP. There is so much wrong with these statements I just don't know where I should begin.

    First, The thread scheduler of the OS takes care of this, you NEVER have to code your application to say what "core" you want it to run on.


    Good thread schedulers fill up "real" cores before they "double up" to an HT core. That's just the way of things, and has been for a very long time. Since Win Server 2003 and Linux 2.6.x at the very least.

    Second, A "virtual" core never "waits" on the real core, or vice versa, to complete its computations. TWO THREADS can be pushed down the same "real + virtual" core at the same time.


    XBitLabs has the best Diagram I have seen for this:



    Is the solution proposed in BD better? Likely. But does Intel's solution improve overall IPC and resource utilization? Absolutely.
    .................................................. ........
    Ahhhh you should also do a little checking....

    The thread scheduler issues priority levels, yes, but those 32 levels can overflow quite easily, and then what? The second core is used; that is one of the reasons HT has a negative impact on some programs.

    The virtual core has no resources of its own, so it shares the real core's resources. Now when a specific amount of resources is in use, the virtual thread cannot be initialized until resources are free.
    Last edited by ajaidev; 06-28-2010 at 06:20 AM.
    Coming Soon

  16. #266
    Xtreme Addict
    Join Date
    May 2005
    Posts
    1,341
    Quote Originally Posted by mstp2009 View Post
    What a load of CRAP. There is so much wrong with these statements I just don't know where I should begin.

    First, The thread scheduler of the OS takes care of this, you NEVER have to code your application to say what "core" you want it to run on.

    Good thread schedulers fill up "real" cores before they "double up" to an HT core. That's just the way of things, and has been for a very long time. Since Win Server 2003 and Linux 2.6.x at the very least.

    Second, A "virtual" core never "waits" on the real core, or vice versa, to complete its computations. TWO THREADS can be pushed down the same "real + virtual" core at the same time.

    XBitLabs has the best Diagram I have seen for this:

    Is the solution proposed in BD better? Likely. But does Intel's solution improve overall IPC and resource utilization? Absolutely.



    I run servers both with AMD Istanbuls and Intel Nehalems in a cloud environment and push the limits of how many threads can be concurrently run on very very high-end hardware. It's how I make money - how many VM Servers can we push onto a real server without degrading performance.

    I love the AMDs because they are CHEAP, low power, and do the job WELL, but I'll be frank with you right here and now: Even fully loaded on all "real" and "virtual" cores, the Nehalem cloud server runs circles around the Istanbul ones that we have deployed in terms of number of customers that can be crammed onto the system without "overloading" the CPU resources. So while the Intel system costs more, on other metrics, it costs me less: LESS power per customer consumed, MORE customers per 2U of rack space (less datacenter costs). So in the end, it is about a wash from my perspective.



    JF-AMD - Just because you have a particular agenda to push, please stop spreading FUD about the competition when it is clear you don't have the technical background to do so. Not trying to be rude, but you yourself have said you are a marketing guy, not an engineer.
    Whether to use SMT for virtualization depends on the application type. When you have several applications with constant CPU load that depend on each other, forget about HT: scheduling fails on those HT cores and performance decreases. When you have a bunch of small independent VMs, sure, go ahead with HT; it will increase the consolidation.

    On Istanbul vs Nehalem, what cat are you dragging in? This has been known for a year by now: Istanbul just didn't have the CPU speed to tackle the high-end Nehalem. Try again with a few MC and see what virtualization beasts these are, with a much lower price and support for more memory.
    Quote Originally Posted by Movieman View Post
    Fanboyitis..
    Comes in two variations and both deadly.
    There's the green strain and the blue strain on CPU.. There's the red strain and the green strain on GPU..

  17. #267
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by mstp2009 View Post
    Even fully loaded on all "real" and "virtual" cores, the Nehalem cloud server runs circles around the Istanbul ones that we have deployed in terms of number of customers that can be crammed onto the system without "overloading" the CPU resources.
    If I had posted up that Magny Cours kicks Harpertown to the curb there would be 100 people posting up that it is not a fair comparison.

    I recognize that you are making comparisons off of your own servers, but let's be clear here, you can always find some data point at some place in time to back up your assumptions. But today's discussion is not about Nehalem and Istanbul, it is about Magny Cours and Westmere. And in that arena, we are doing fine in performance, power consumption, and most importantly, price.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  18. #268
    Xtreme Enthusiast
    Join Date
    Apr 2007
    Posts
    772
    Quote Originally Posted by ajaidev View Post
    Ahhhh you should also do a little checking....

    The thread scheduler issues priority levels, yes, but those 32 levels can overflow quite easily, and then what? The second core is used; that is one of the reasons HT has a negative impact on some programs.

    The virtual core has no resources of its own, so it shares the real core's resources. Now when a specific amount of resources is in use, the virtual thread cannot be initialized until resources are free.
    That is of course a given. Anyone overriding the core assignment should REALLY know what they are doing, and that is a special case. The discussion was on DEFAULT behaviour on systems where core assignment is based on thread scheduler.

    Quote Originally Posted by duploxxx View Post
    Whether to use SMT for virtualization depends on the application type. When you have several applications with constant CPU load that depend on each other, forget about HT: scheduling fails on those HT cores and performance decreases. When you have a bunch of small independent VMs, sure, go ahead with HT; it will increase the consolidation.

    On Istanbul vs Nehalem, what cat are you dragging in? This has been known for a year by now: Istanbul just didn't have the CPU speed to tackle the high-end Nehalem. Try again with a few MC and see what virtualization beasts these are, with a much lower price and support for more memory.

    Sorry, but MC and Westmere were too expensive as of 2 mos ago when we did our latest deployments. We will re-evaluate them at the next expansion (say 3 mo or so).

  19. #269
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by ajaidev View Post
    Ahhhh you should also do a little checking....

    The thread scheduler issues priority levels, yes, but those 32 levels can overflow quite easily, and then what? The second core is used; that is one of the reasons HT has a negative impact on some programs. The virtual core has no resources of its own, so it shares the real core's resources. Now when a specific amount of resources is in use, the virtual thread cannot be initialized until resources are free.

    Do you have some data to show where Nehalem HT actually has a negative impact? And I don't mean single or pseudo-threaded game engines.

    HT allows, first and foremost, an increase in throughput. You can have a core with HT disabled which does 100 work units per thread in a given time frame. You enable 2-thread HT and now you do 70 work units per thread in the same amount of time.

    Does HT have a negative impact? From a thread point of view, yes, you're 30% slower per thread. But from a workload point of view? No, you've done 40% more work (2x70=140 work units).
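The per-thread versus per-core arithmetic can be written out as a tiny Python sketch. The 100 and 70 work units are the illustrative figures from this post, not measurements, and `ht_tradeoff` is just a name chosen for the example.

```python
def ht_tradeoff(units_single, units_per_thread_ht, threads=2):
    """Return (per-thread change, per-core throughput change) relative to HT off."""
    per_thread = units_per_thread_ht / units_single - 1.0
    total = threads * units_per_thread_ht / units_single - 1.0
    return per_thread, total


per_thread, total = ht_tradeoff(100, 70)
print(f"per thread: {per_thread:+.0%}, per core: {total:+.0%}")
```

In this simple model, two-thread HT raises per-core throughput whenever each thread keeps more than half of its HT-off speed; at exactly 50 work units per thread it breaks even.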

    Especially in Nehalem (in the P4, HT did not have that many units to start with; its main task was to hide memory latency), HT is a definite plus.

    AMD's approach in BD is totally different. A BD module is basically a souped-up core with double the INT units or, conversely, it's a module with 2 INT cores and a shared FP unit.
    Their idea is that it's not worth tinkering with the core itself, but simply cramming more cores (or clusters if you want) on the same die. Improvements in process tech allow you to put 6-10 cores in a reasonable die area, the next process gives 12-16, then 30 and so on. Why bother with SMT when you'll end up with dozens of "real" cores, as many as the number of threads today? When your resources are more limited, it's not worth doing SMT. Simply do CMT, copy and paste as many cores as possible on a die and you're done.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  20. #270
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    linpack is so arithmetic intensive that all HT does is cut the amount of cache in half. it's just overhead. the amount of operands accessed from main memory is usually under 1%.

    http://www.anandtech.com/show/3470

    on the other hand HT can increase performance by up to 70%. with the 3rd channel it's easily 2x faster than other 45nm processors.
    http://blogs.sun.com/4HPCISVs/entry/..._on_nehalem_an

  21. #271
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by savantu View Post
    Do you have some data to show where Nehalem HT actually has a negative impact? And I don't mean single or pseudo-threaded game engines.

    HT allows, first and foremost, an increase in throughput. You can have a core with HT disabled which does 100 work units per thread in a given time frame. You enable 2-thread HT and now you do 70 work units per thread in the same amount of time.

    Does HT have a negative impact? From a thread point of view, yes, you're 30% slower per thread. But from a workload point of view? No, you've done 40% more work (2x70=140 work units).

    Especially in Nehalem (in the P4, HT did not have that many units to start with; its main task was to hide memory latency), HT is a definite plus.

    AMD's approach in BD is totally different. A BD module is basically a souped-up core with double the INT units or, conversely, it's a module with 2 INT cores and a shared FP unit.
    Their idea is that it's not worth tinkering with the core itself, but simply cramming more cores (or clusters if you want) on the same die. Improvements in process tech allow you to put 6-10 cores in a reasonable die area, the next process gives 12-16, then 30 and so on. Why bother with SMT when you'll end up with dozens of "real" cores, as many as the number of threads today? When your resources are more limited, it's not worth doing SMT. Simply do CMT, copy and paste as many cores as possible on a die and you're done.
    Here are some examples:

    http://blogs.amd.com/work/2010/01/21...out-the-cores/

    Also, look at LINPACK (HPC) for another example of turning off HT and getting higher throughput.

    Games are different than servers. Games generally have a lot more gaps in processing so they take better advantage of HT than servers.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  22. #272
    Banned
    Join Date
    Jan 2010
    Posts
    263
    Quote Originally Posted by JF-AMD View Post
    Here are some examples:

    http://blogs.amd.com/work/2010/01/21...out-the-cores/

    Also, look at LINPACK (HPC) for another example of turning off HT and getting higher throughput.

    Games are different than servers. Games generally have a lot more gaps in processing so they take better advantage of HT than servers.
    Frankly, this is a non-issue. All servers can be, and are, configured/customized to maximize efficiency in whatever software environment they are deployed in. If HT hurts performance in a particular software environment, it's simply turned off. This is not even a question of whether it does more good than harm, so let's keep it on. It is configured to maximize output right off the bat.

    One other thing, Intel is capable of putting more cores on a silicon too (with HT, of course) so what's AMD going to do when Intel matches your cores?
    Last edited by OhNoes!; 06-28-2010 at 09:58 AM.

  23. #273
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    how big is a SMT core or module vs a CMT core or module on 32nm? ignoring L3

  24. #274
    Banned
    Join Date
    Jan 2010
    Posts
    263
    Quote Originally Posted by Manicdan View Post
    how big is a SMT core or module vs a CMT core or module on 32nm? ignoring L3
    With regard to what uarchs?

    Edit: Bulldozer/Westmere of course. I don't have the numbers, but it all has to do with design complexity and TDP. Intel already has 48 cores on a die with a 125W TDP. Also, the Larrabee design should give us a big clue as to what Intel is capable of.
    Last edited by OhNoes!; 06-28-2010 at 10:17 AM.

  25. #275
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    the ones that make the most sense to compare
