Page 12 of 17
Results 276 to 300 of 403

Thread: AMD to Disclose Details About Bulldozer Micro-Architecture in August

  1. #276
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
1 BD module will probably end up the same size as or smaller than one SB core with L2. As long as AMD manages to match or stay close to Intel's total thread count (cores with SMT), they will do great.

  2. #277
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
well, AMD basically destroyed the meaning of the word module. It generally refers to smaller units that make up a core, with well-defined functionality (i.e. a NAND/NOR/XOR gate or an adder/multiplier). An AMD "module" is a core. From the looks of CMT it is probably bigger than an SMT counterpart, assuming everything else about the cores is constant. The thing is, there are more potential performance gains from CMT. It's basically hyperthreading but more aggressive, with the possibility of a better risk/reward ratio.

in the end, the nonsense arguments against hyperthreading will become the basis of Intel fanboys' nonsense arguments against CMT, a full circle.

  3. #278
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    question:
Is BD going to be compatible with the current MC dual socket boards?
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    Quote Originally Posted by Frisch View Post
    If you have lost faith in humanity, then hold a newborn in your hands.

  4. #279
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by Chumbucket843 View Post
    well, AMD basically destroyed the meaning of the word module. it generally refers to smaller units that make up a core. they have well defined functionality (i.e. a NAND/NOR/XOR gate or an adder/multiplier). an AMD "module" is a core. from the looks of CMT it is probably bigger than an SMT counterpart assuming that everything else about the cores are constant. the thing is there is more potential performance gains from CMT. it's basically hyperthreading but more aggressive, with the possibility of a better risk/reward ratio.

    in the end the nonsense arguments against hyperthreading will become the basis of intel fanboys' nonsense arguments against CMT, a full circle.
exactly what I'm trying to get at: everything has its pros and cons

SMT is simple: smaller and possibly faster
CMT is complex: bigger but with better IPC

in the end it's going to be pretty damn close in terms of power required for the performance, and the only thing left is pricing.

  5. #280
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Movieman View Post
    question:
    Is BD going to be compatable with the current MC dual socket boards?
Yes. You can plug 2 Interlagos (16-core) beasts into your board next year.

    Quote Originally Posted by Chumbucket843 View Post
    well, AMD basically destroyed the meaning of the word module. it generally refers to smaller units that make up a core. they have well defined functionality (i.e. a NAND/NOR/XOR gate or an adder/multiplier). an AMD "module" is a core. from the looks of CMT it is probably bigger than an SMT counterpart assuming that everything else about the cores are constant. the thing is there is more potential performance gains from CMT. it's basically hyperthreading but more aggressive, with the possibility of a better risk/reward ratio.

    in the end the nonsense arguments against hyperthreading will become the basis of intel fanboys' nonsense arguments against CMT, a full circle.
I haven't seen "module" generally used to refer to smaller units inside a core. AMD defines a module as an "optimized dual core". The optimized part refers to the fact that the (super beefed up) front end is shared among dedicated execution units and is able to maximize the work done by catching peaks and valleys and cramming as many instructions into the same units as possible at a time.
Also, I'm not sure how you come to the conclusion that one module (2 cores) will probably be bigger than an SB core, which already in the Nehalem generation is huge compared to one 10h core inside Shanghai. Int execution units are relatively small compared to, say, the FPU or the rest of the core. The FPU will be rather large, but very strong too, thanks to the FMAC capability that can dramatically improve performance. It's also shared, so it can behave in an SMT-like manner too.

  6. #281
    Banned
    Join Date
    May 2006
    Location
    Brazil
    Posts
    580
    Quote Originally Posted by informal View Post
    1 BD module will probably end up the same size or smaller than one SB core with L2. As long as AMD manages to cover or stay close to the total number of intel's threads(cores with SMT) they will do great.
smaller than ~20mm²?

I don't think so

  7. #282
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by -Sweeper_ View Post
smaller than ~20mm²?

I don't think so
One Llano core with L2 is sub-10mm². That's a fully dedicated core with no parts shared with other cores (dedicated L2 too). One BD module that has shared parts (front end, L2), even with increased logic (invested to improve per-core IPC), will probably land under 20mm².
    Last edited by informal; 06-28-2010 at 11:33 AM.

  8. #283
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by informal View Post
    yes . You can plug in 2 interlagos (16cores) beasts in your board ,next year .

    Now if we can only get them to 3+GHz..

  9. #284
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Movieman View Post

    Now if we can only get them to 3+GHz..
32 cores at your disposal, with FMAC-capable FPUs.
If the boards support minor HTT bumps it should be doable. But who knows, maybe they will ship with clocks close to the ones you desire so you wouldn't have to bother. But if that happens, I suppose you will just up the desired target to 4+GHz, won't you?

  10. #285
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by informal View Post
    32 cores at your disposal,with FMAC capable FPU .
    If the boards support minor HTT bumps it should be doable.But who knows,maybe they will ship with clocks close to the ones you desire so you wouldn't have to bother . But somehow if that happens,I suppose you will just up the desired target to 4+Ghz ,won't you?
Of course.. This is XtremeSystems!
Slightly OT, but that dual MC system I have, although running "slow" at 1900MHz, is what you'd call a "good" system.
Max of 39-41C at 100% load in a 72-74F room.
Stable, no hiccups at all.
In terms of WCG production it's equal to my dual Westmere quad E5640s with HT turned on..
Those have 16 threads at 3022MHz (slight OC from the 2800 default)
Still tweaking those Intel quads..

  11. #286
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by Movieman View Post
    question:
    Is BD going to be compatable with the current MC dual socket boards?
    Yes.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  12. #287
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by informal View Post
    I haven't seen where generally module refers to smaller units inside a core.
    http://lmgtfy.com/?q=define%3A+module
    AMD defines a module as an "optimized dual core" . The optimized part refers to the fact that the (super beefed up) frond end is shared among dedicated execution units and is able to maximize the work done by catching peaks and values and cramming as much instructions in the the same units as possible at the time.
I don't care what AMD refers to it as; a core is completely independent. The ALU(s), which should NOT be the focus of an architecture, depend on the rest of the core to fetch, retire, and do everything else besides arithmetic. The front end has only been beefed up 33%; that's not enough to feed double the resources.
    Also I'm not sure how you come to the conclusion that one module(2 cores) will be probably bigger than a SB core,which already in Nehalem generation is huge compared to one 10h core inside Shangai. Int exec. units are relatively small compared to the say FPU or the rest of the core. The fpu unit will be rather large,but will be very strong too,thanks to the FMAC capability that can dramatically improve performance.It's also shared so it can behave in SMT-like manner too.
I'm not sure how you came to the conclusion that I was comparing Sandy Bridge to Bulldozer. I'm not even talking about BD here.

I simulated a processor with SMT and CMT. They are the same in every way except for the variable.

BTW, I'm not a fan of FMA. The tradeoffs are pretty ugly compared to a MADD ALU: you can make pure addition/multiplication faster but ruin the benefit of FMA, or you can ruin the performance of pure + or * but make MADD slightly faster and more accurate.
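Since FMA keeps coming up: the accuracy half of that tradeoff is easy to demonstrate. A fused multiply-add rounds once, while a separate multiply and add round twice. A quick Python sketch (the fused path is modeled exactly with fractions here, purely as an illustration):

```python
from fractions import Fraction

# a is exactly representable in binary64, but a*a is not:
# a*a = 1 + 2**-29 + 2**-60 needs more than 53 significand bits.
a = 1.0 + 2.0**-30

# Unfused: the product rounds first (losing the 2**-60 term),
# then the subtraction runs on the already-rounded product.
unfused = a * a - 1.0

# A fused multiply-add rounds only once; model that with exact
# rational arithmetic followed by a single conversion to float.
exact = Fraction(a) * Fraction(a) - 1
fused = float(exact)

print(unfused == fused)  # False: double rounding lost a bit
```

The unfused path returns 2**-29 exactly, while the single-rounding path keeps the 2**-60 term; that lost bit is the kind of accuracy difference the MADD-vs-FMA tradeoff is about.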

  13. #288
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by OhNoes! View Post
    If HT hurts performance in a particular software environment, it's simply turned off. This is not even a question of it
    Believe it or not, a good number of the customers that I have talked to had never considered turning off HT because they assumed it would give them more performance.

Nowhere that I have ever seen documented has Intel said that it could result in lower performance. So, intuitively, if the vendor is telling you that it doubles the threads and gives you more performance, why would you try turning it off to see if you got better performance?

    If it had been marketed differently I would buy your argument, but I have talked to a lot of customers who believe the opposite.

  14. #289
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Chumbucket843 View Post
    http://lmgtfy.com/?q=define%3A+module

    i dont care what AMD refers to it as. core is completely independent. the ALU('s), which should NOT be the focus of an architecture, are dependent on the rest of the core to fetch, retire, and do everything else besides arithmetic. the front end has only been beefed up 33%. that's not enough to feed double the resources.

    im not sure how you came to the conclusion that i was comparing sandy bridge to bulldozer. im not even talking about BD here.

    i simulated a processor with SMT and CMT. they are the same in every way except for the variable.

    btw, im not a fan of FMA. the tradeoffs are pretty ugly compared to a MADD ALU. you can make pure addition/multiplication faster but ruin the benefit of FMA or you can ruin the performance of pure + or * but make MADD slightly faster and accurate.
Google jokes are getting old, you know. AMD defines the module the way it does because it is not a core per se. There are dedicated ALUs and FPUs inside the module, and the way it's organized is deliberate (maximize the throughput of such a structure and minimize the core logic area investment at the same time). The front end will not be the bottleneck, as Chuck Moore stated on several occasions.
As for the module size, I'm sorry, my mistake. I thought you were comparing BD and SB.

  15. #290
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    Quote Originally Posted by JF-AMD View Post
    Believe it or not, a good number of the customers that I have talked to had never considered turning off HT because they assumed it would give them more performance.

    Nowhere, that I have ever seen documented, has Intel said that it could result in lower performance. So, intuitively, if the vendor is telling you that it doubles the threads and gives you more performance, why would you try to turn it off to see if you got better performance?

    If it had been marketed differently I would buy your argument, but I have talked to a lot of customers who believe the opposite.
You're correct on this.
HT has its place, but not all the time.
Not that it's really relevant in this discussion, but don't turn on HT on a dual Westmere in Vantage or you'll see it come to a dead stop in the second CPU test, and I mean dead stop. No bluescreen, it just stops dead, and that's on a machine that will score over 57,000 CPU in Vantage with HT turned off.
I know, I've seen it with my own eyes..

  16. #291
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Chumbucket843 View Post
    linpack is so arithmetic intensive that all HT does is cut the amount of cache in half. it's just overhead. the amount of operands accessed from main memory is usually under 1%.

    http://www.anandtech.com/show/3470

    on the other hand HT can increase performance by up to 70%. with the 3rd channel it's easily 2x faster than other 45nm processors.
    http://blogs.sun.com/4HPCISVs/entry/..._on_nehalem_an
You've shown the worst example for HT. Linpack uses Intel libraries which are assembler-optimized; you cannot get any higher efficiency than Linpack on CPUs, so obviously HT has nothing to contribute there.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  17. #292
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
HT is not a bad tech on a wide superscalar CPU. The P4's pipeline was too long, and HT wasn't really useful back then because all apps were single-threaded. But the advantage was getting some free CPU time for the system.

Now Intel has improved HT and it is more efficient on a 4-issue core. And the negative effect is "canceled" by Turbo in benchmarks. But when the CPU is running hot, Turbo in multithreaded loads doesn't give back the performance loss; it's easier to use the marketing line that there is no more performance loss with HT disabled.

CMT seems very different from HT. I think AMD had the Core 2's shared L2 cache in mind when they created CMT.

Share units that are rarely used, or used only a small fraction of the time, to get 2 cores with fewer transistors. But AMD didn't want to lose single-thread IPC, and they finally got their CPU to 4-wide.

I was disappointed when K10 didn't come with this.

A CMT module is 2 wider cores glued together at the parts that are not heavily thermally stressed. And finally the L2 is shared by two cores, which I think is a good idea. It's going to improve L2 performance by reducing the bandwidth used between core caches, the biggest black mark on the current Phenom II architecture.

BD seems a good piece of silicon. We just need to wait a bit more to see frequency, wattage, IPC, and most importantly how long it will take AMD to ramp up.

And how big will the L2 be? 2MB? 1MB? 512KB?

  18. #293
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by savantu View Post
    You've shown the worst example for HT. Linpack uses Intel libraries which are assembler optimized, you cannot get any higher efficiency than Linpack on CPUs; obviously HT has nothing to help here.
I showed you exactly what I intended: hyperthreading causing negative performance. Nice straw man, though. Here are some more.

ASM optimizations are usually not that useful. Compiler designers know way more about optimizations than most programmers (it's a big part of their job): they know when x87 is faster than SSE, which FP instructions have higher accuracy, which instructions have the lowest latency, etc. Well-written source code with inline assembly can be faster than pure ASM, with fewer bugs, less development time, and more portability. MKL is still a great lib, though.

  19. #294
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by savantu View Post
    You've shown the worst example for HT. Linpack uses Intel libraries which are assembler optimized, you cannot get any higher efficiency than Linpack on CPUs; obviously HT has nothing to help here.
    Well, technically, he showed you what you asked for, and then you dismissed it.

If you are arguing that with perfectly optimized code there will be no use for HT, then I agree with you. HT is designed to take advantage of gaps in the pipeline.

    What happens when code gets more optimized, does HT become less relevant?

As code becomes more optimized, more physical cores will give you better throughput. As code becomes more optimized, HT would, theoretically, give you less throughput (or, more likely, the same, but the main thread would do better and the HT thread worse; probably a zero-sum game).

  20. #295
    Banned
    Join Date
    Jul 2004
    Posts
    1,125
    Perfectly optimized code != no HT-exploitable gaps

    It depends on what the code needs to do (dataset size, randomness of access, interdependency of accesses, etc), how wide (in different ways) the core is, etc. The "best" way to solve most problems will hardly ever result in a completely occupied core.

On an earlier topic, BD "modules" will have the same scheduling issues that Hyper-Threading presents. Perhaps to a lesser extent, but both architectures have resources shared pairwise between threads. Either the hardware, the OS, or the application needs to handle the "moving this thread to a completely unoccupied core (HT case) or module (BD case) would be better" decision for maximum performance.

    (Consider two AVX(256) threads, or two independent threads that each would ideally have the whole module's L2 to itself, etc.)
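The placement decision described above can be sketched in a few lines. This is only an illustration; the function name and the topology are mine, not any real scheduler interface:

```python
def place_threads(n_threads, sibling_groups):
    """Assign threads to logical CPUs, filling empty physical
    cores/modules before doubling up on shared resources.

    sibling_groups: list of tuples of logical CPU ids that share
    a core (HT case) or a module (BD case)."""
    placement = []
    # First pass: one thread per group, so nothing shares resources.
    for group in sibling_groups:
        if len(placement) == n_threads:
            return placement
        placement.append(group[0])
    # Second pass: only now start doubling up on siblings.
    for group in sibling_groups:
        for cpu in group[1:]:
            if len(placement) == n_threads:
                return placement
            placement.append(cpu)
    return placement

# A quad core with two logical CPUs per physical core, i7-style:
groups = [(0, 4), (1, 5), (2, 6), (3, 7)]
print(place_threads(4, groups))  # [0, 1, 2, 3] - one per physical core
print(place_threads(6, groups))  # [0, 1, 2, 3, 4, 5] - two must share
```

The same skeleton applies whether the shared resource is an HT core's pipeline or a BD module's front end and L2; only the grouping changes.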

  21. #296
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
Even with optimizations, some code is just inherently cache-unfriendly. Take physics simulations, for example: nature is unpredictable, so temporal and spatial locality are gone, which means cache is basically useless if your program uses more than 8MB.

  22. #297
    Registered User
    Join Date
    Aug 2007
    Location
    Michigan
    Posts
    8
    mstp2009,
    What a load of CRAP. There is so much wrong with these statements I just don't know where I should begin.

    First, The thread scheduler of the OS takes care of this, you NEVER have to code your application to say what "core" you want it to run on.
First, it wouldn't hurt for you to be civil... especially since you are wrong.

    First read this: http://www.intel.com/software/produc...d_affinity.htm

    I just did a quick search for some common API sets to do this with. If you do even a small amount of research, you can find quite a bit of information on this subject.

Any decent threaded program will utilize these calls, since it is expensive to shuffle threads from core to core. If you set your critical threads to have affinity for a specific processor (or processors), you can achieve much better performance rather than relying on the OS to simply scattershot your threads across the processors.
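For what it's worth, the affinity call itself looks like this in Python on Linux (os.sched_setaffinity is a real stdlib call, but Linux-only, so this sketch just degrades gracefully where it's missing; the function name is mine):

```python
import os

def pin_to_cpu(cpu_id):
    """Pin the calling process to a single logical CPU.
    Linux exposes sched_setaffinity; on platforms without it
    (e.g. Windows, which uses SetThreadAffinityMask) this sketch
    simply reports that pinning isn't available here."""
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu_id})   # 0 = the calling process
    return os.sched_getaffinity(0)      # read the mask back

if hasattr(os, "sched_getaffinity"):
    # Pick a CPU this process is actually allowed to run on.
    target = min(os.sched_getaffinity(0))
    print(pin_to_cpu(target))           # e.g. {0}
else:
    print(pin_to_cpu(0))                # None where unsupported
```

The Intel link above covers the same idea for their threading libraries; the Win32 and pthreads equivalents take a bitmask rather than a set.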

    I run servers both with AMD Istanbuls and Intel Nehalems....
    As pointed out earlier, how about MC?

    Now don't get me wrong, I don't believe that the BS about SMT causing performance degradation either, but you have to give credit where credit is due. AMD's MC is quite a value.

    Chumbucket843,
    well, AMD basically destroyed the meaning of the word module. it generally refers to smaller units that make up a core. they have well defined functionality (i.e. a NAND/NOR/XOR gate or an adder/multiplier). an AMD "module" is a core. from the looks of CMT it is probably bigger than an SMT counterpart assuming that everything else about the cores are constant. the thing is there is more potential performance gains from CMT. it's basically hyperthreading but more aggressive, with the possibility of a better risk/reward ratio.

    in the end the nonsense arguments against hyperthreading will become the basis of intel fanboys' nonsense arguments against CMT, a full circle.
    First, surely you have something better to do than complain about AMD's use of the word "module"?

    Second,
    A module is not "a core". I can understand your objection, and have even asked a few questions myself about this architectural terminology. For instance, why doesn't Intel simply say that they have a 12 "core" product today?

The answer is in the performance scaling and the amount of shared resources. According to AMD, the second "core" in a module adds about 80% of the throughput that a second complete core adds today. The only real difference between "real" cores and BD "cores" is that BD cores share more resources than we have typically seen in existing architectures. Intel's SMT only gains 20-30% in the best situations.

    CMT is nothing like SMT other than it attempts to share resources within a processor to execute more efficiently. AMD isn't calling their approach CMT (yet). I don't think they intend to at this point in time, but IMHO, it is pretty close to the definition.

    This is a pretty good article: http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf

    I don't think that CMT is appreciably bigger than SMT considering the rather small portion of the die that a core currently takes up compared to cache. Additionally, it can afford to be somewhat larger since it scales the performance by a much bigger margin than does SMT.

Finally, scaling CMT to 3 or 4 cores should be of little difficulty for AMD in future iterations of BD. Not only would Intel gain little performance from moving from 2-way SMT to 4-way, but it would be very costly in development time and would still have the limitations and issues discussed in the article I linked to above.
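Back-of-envelope, using the scaling figures quoted above (80% for the extra CMT core, 20-30% for SMT; illustrative numbers only, not measurements):

```python
# Throughput of N modules/cores in "single full core" units.
def cmt_throughput(modules, extra_core_scaling=0.80):
    # each module = 1 full core + a second core at 80% effectiveness
    return modules * (1 + extra_core_scaling)

def smt_throughput(cores, smt_gain=0.25):
    # each SMT core = 1 full core + 20-30% from the second thread
    return cores * (1 + smt_gain)

print(cmt_throughput(4))  # 4 modules -> 7.2 "cores" of throughput
print(smt_throughput(4))  # 4 SMT cores -> 5.0
```

On those assumptions, 4 modules deliver roughly the throughput of 7 full cores versus 5 for 4 SMT cores, which is the gap the post is pointing at.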

    i dont care what AMD refers to it as. core is completely independent.
    Really? What about cache? What about cache coherency? Processors haven't been "independent" since the advent of SMP.

    i simulated a processor with SMT and CMT. they are the same in every way except for the variable.
    Not according to the research I have read. Perhaps you have other information?

    well written source code with inline assembly can be faster than pure ASM, with less bugs, development time and more portable.
    Faster? Only if the ASM is poorly written. I agree with your other points though. Most applications have little need for the speed and efficiency of ASM (with the exception of embedded applications, and even most of these can be done in C with enough efficiency to get by competitively).


    madcho,
    CMT seem very different of HT. I think AMD used the core 2 L2 cache in mind when they created the CMT.
Indeed. I also think that AMD's L2 cache design will change substantially in BD... at least I am hoping so

    i think this is a good idea. It's gonna improve perfomance of L2 with the reduce of bandwith used between cores caches.
    Agree. I also believe that the L2 latency will be decreased.

  23. #298
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by terrace215 View Post
    (Consider two AVX(256) threads, or two independent threads that each would ideally have the whole module's L2 to itself, etc.)
We'll have the ability to dispatch 8 256-bit executions per cycle; pretty sure that Sandy Bridge will as well. Not seeing the issue.

  24. #299
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Quote Originally Posted by Chumbucket843 View Post
    i showed you exactly what i intended: hyperthreading causing negative performance. nice straw man though. here are some more.
Fine. I must say, though, that Linpack wasn't exactly the app I had in mind when I asked for examples; I was thinking of commercial apps.

Regarding the "more" part, what's to see?

- In visualization tasks you get 4% on average due to HT
- In rendering tasks (which do matter, btw) you get 21% on average

That's a very significant improvement which helps productivity a lot.
    ASM optimizations are usually not that useful. the compilers designers know way more about optimizations than programmers (it's a big part of their job). they know when x87 is faster than SSE, what fp instructions have higher accuracy, what instruction have lowest latency, etc. well written source code with inline assembly can be faster than pure ASM, with less bugs, development time and more portable. MKL is still a great lib though.
??
Any respectable programmer is fully aware of instruction latencies and mix, and knows what to use for the intended target. When it comes to Intel, especially in a benchmark like Linpack (matrix multiplication is the most FP-friendly thing there is), rest assured they squeezed every ounce out of it.

    Quote Originally Posted by JF-AMD View Post
    Well, technically, he showed you what you asked for, and then you dismissed it.
His example is perfectly valid, but we cannot infer anything from it about HT's utility; in fact, his other examples clearly show that overall it brings nice gains.
If you are arguing that with perfectly optimized code there will be no use for HT, then I agree with you. HT is designed to take advantage of gaps in the pipeline.
    True, and Linpack falls under perfectly optimized code.
    What happens when code gets more optimized, does HT become less relevant?

    As code becomes more optimized, more physical cores will give you better throughput. As code becomes more optimized, HT would, theoretically, give you less throughput (or, more likely the same, but the main thread would be better and the HT thread would be worse, probably zero-sum game.)
If anything, I'd say code is becoming less optimized. Optimization means a lot of hard work and man-hours; why not simply let the compiler do what it can and just make sure it works? The drive for cross-platform portability and the increasing complexity (you care about getting it done and working reliably even if it's not well optimized) mean less and less code is truly optimized to use all the HW resources.

Secondly, there is no such thing as a "main thread" and an "HT thread". They are both in flight in the core at the same time. And your "zero-sum game" is easily disproved by an abundance of examples of increased throughput. Hell, some CPUs like Niagara use 8 threads per core, and those have nowhere near the resources of a Nehalem core. Yet, even with per-thread performance around that of a Pentium 2, they offer significant throughput and a sensible business proposition.

As for "more physical cores giving better throughput", that's an axiom. The question, however, is what the costs are:
- HT takes little die area and power, but more validation time
- an extra physical core takes significantly more die area and power; validation is the same

Secondly, HT hides the ever-growing gap between CPU speed (3-4GHz) and memory clock (266MHz for the fastest). Another core doesn't help at all; it worsens the problem.

  25. #300
    Xtreme Member
    Join Date
    Jan 2004
    Location
    uk
    Posts
    159
    Quote Originally Posted by JF-AMD View Post
    Actually for HT, you do want to program for it. Just telling it that you have more cores is a fiasco because you'll end up scheduling too many threads to one core while other cores are sitting idle.

    Remember that with HT, 2 cores do not execute at the same time, one has to wait for the other. So if you had a quad core with HT and you wanted to schedule 4 threads, you would put them all on different cores for the best speed. If you just said take the first 4 cores, you would get two threads sharing the first core, 2 sharing the second, and two idle cores.

    In a server workload, HT gives you a 10-20% increase in throughput, so do you want 2 threads sharing for that 10-20% increase, or do you want all of your threads on individual cores?
My point being: you cannot program for it.

HT shows up as extra real cores to an app.

On an Intel i7, how many CPU graphs are there in Task Manager, 4 or 8?
