Page 31 of 39
Results 751 to 775 of 954

Thread: AMD's Bobcat and Bulldozer

  1. #751
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by AliG
    No one is sure; all JF has said is that AMD is working with MS to devise the core utilization order, etc.

    I would imagine that, ideally, for multithreaded tasks you would want the same module because of the shared L2, but for separate tasks you would want different modules because of the performance loss from sharing components.
    At the same time as you take a performance loss from shared components, you get a boost from Turbo. If four threads run on one module each, you will have no turbo, since turbo management is at the module level, not the core level. If all four threads run on two modules, you will take a 10% performance hit, but turbo will make up for that and more.
    Last edited by -Boris-; 08-31-2010 at 09:27 AM.
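A quick check of "turbo making up for that and more" under -Boris-'s own assumptions: four threads either run one per module (no turbo) or two per module (10% per-thread penalty, but the two busy modules get some turbo clock uplift t). Here t is a hypothetical parameter, since no turbo figure is given in the thread, and performance is assumed to scale linearly with clock:

```latex
\begin{aligned}
T_{\text{spread}} &= 4 \times 1.0 = 4 && \text{(one thread per module, no turbo)} \\
T_{\text{packed}} &= 4 \times 0.9 \times (1 + t) && \text{(two threads per module, turbo uplift } t\text{)} \\
T_{\text{packed}} \ge T_{\text{spread}} &\iff 1 + t \ge \tfrac{1}{0.9} \iff t \ge 0.111\ldots
\end{aligned}
```

So under these assumptions, packing only wins if turbo adds more than about 11% clock. Code that does not scale with clock (memory-bound work, for example) would need an even larger uplift.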

  2. #752
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by Hornet331
    http://flamewheelspin.ytmnd.com/

    perfectly sums up this thread...

    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2GB
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 GHz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  3. #753
    Xtreme Guru
    Join Date
    May 2007
    Location
    Ace Deuce, Michigan
    Posts
    3,955
    Quote Originally Posted by Motiv
    It was answered on the blog that the shared L2 cache wouldn't really help.

    As for multitasking, I suspect it will work like Intel's HT. As far as I'm aware, that doesn't cripple just one core; it spreads the load across the other cores first and foremost.
    If that's the case, then ideally AMD would create a lineup priced such that 1 module ~ 1 Intel HT core, but we know that's probably never going to happen.
    Quote Originally Posted by Hans de Vries

    JF-AMD posting: IPC increases!!!!!!! How many times did I tell you!!!

    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    terrace215 post: IPC decreases, The more I post the more it decreases.
    .....}
    until (interrupt by Movieman)


    Regards, Hans

  4. #754
    Xtreme Enthusiast
    Join Date
    Jun 2006
    Location
    Space
    Posts
    769
    Quote Originally Posted by -Boris-
    At the same time as you take a performance loss from shared components, you get a boost from Turbo. If four threads run on one module each, you will have no turbo, since turbo management is at the module level, not the core level. If all four threads run on two modules, you will take a 10% performance hit, but turbo will make up for that and more.
    Why would turbo work when it's 2 modules with 2 cores each? Surely turbo would be better suited to when all 4 modules are only utilising 1 core?

    I thought the performance hit would be around 20% if both cores within a module are used.

  5. #755
    Xtreme Enthusiast
    Join Date
    Jun 2006
    Location
    Space
    Posts
    769
    Quote Originally Posted by AliG
    If that's the case, then ideally AMD would create a lineup priced such that 1 module ~ 1 Intel HT core, but we know that's probably never going to happen.
    I suspect we'll see AMD's lineup pitching 8 cores against 4 cores, even if that means 4 modules. The AMD cores within the modules are certainly more core-like than HT.

    At the end of the day, prices will be set based on workloads and how the chip copes with them. If a 4-module/8-core AMD chip (at, say, 2.5 GHz) can handle the same workload as a 4-core (8-thread) Intel chip (at 2 GHz), then that will be its price window (clock speeds for argument's sake, etc.).
    Last edited by Motiv; 08-31-2010 at 09:32 AM.

  6. #756
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by Motiv
    Why would turbo work when it's 2 modules with 2 cores each? Surely turbo would be better suited to when all 4 modules are only utilising 1 core?

    I thought the performance hit would be around 20% if both cores within a module are used.
    If you run one thread per module, all modules work at the same time and none of them rest, so no module can enter turbo. But if two of the modules each run two threads, the other two modules rest, and when two modules rest, the other two can enter turbo mode.
    You can't have turbo with all modules working at the same time; the fact that parts of a module are idle doesn't matter, since turbo works at the module level.

    And it's said everywhere that running a second thread in a module "only" increases performance by 80%. That is a 10% performance loss compared with a traditional dual-core approach.
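A toy model of the placement argument above, assuming, as -Boris- does (AMD did not confirm it in the thread), that turbo is managed per module and only kicks in when some module is fully idle, and that a second thread adds about 80%, i.e. 1.8 / 2.0 = 0.9 of two separate cores. The names `chip_throughput`, `best_placement` and the `turbo_uplift` value are hypothetical illustrations, not anything AMD has published:

```python
from itertools import product

# Relative throughput of one module running 0, 1 or 2 threads. The 1.8 is the
# "second thread adds ~80%" figure quoted in the thread, not a measured value.
MODULE_THROUGHPUT = {0: 0.0, 1: 1.0, 2: 1.8}


def chip_throughput(placement, turbo_uplift):
    """Total relative throughput for a placement such as (2, 2, 0, 0).

    Toy rule taken from the post (not AMD documentation): turbo is only
    available when at least one module is completely idle, and then every
    busy module runs at (1 + turbo_uplift) times the base clock.
    """
    turbo_allowed = any(threads == 0 for threads in placement)
    clock = 1.0 + turbo_uplift if turbo_allowed else 1.0
    return sum(MODULE_THROUGHPUT[threads] * clock for threads in placement)


def best_placement(threads, modules, turbo_uplift):
    """Brute-force every way of putting `threads` threads on `modules` modules."""
    candidates = [p for p in product(range(3), repeat=modules) if sum(p) == threads]
    return max(candidates, key=lambda p: chip_throughput(p, turbo_uplift))


print(chip_throughput((1, 1, 1, 1), turbo_uplift=0.2))  # 4.0, no module idle
print(chip_throughput((2, 2, 0, 0), turbo_uplift=0.2))  # about 4.32
print(best_placement(4, 4, turbo_uplift=0.0))            # (1, 1, 1, 1)
```

With `turbo_uplift=0.2` this toy rule actually prefers a permutation of (2, 1, 1, 0) over (2, 2, 0, 0), because one idle module is already enough to unlock turbo. That result comes entirely from the assumed rule, which is exactly why the real turbo policy matters.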

  7. #757
    Xtreme Guru
    Join Date
    May 2007
    Location
    Ace Deuce, Michigan
    Posts
    3,955
    Quote Originally Posted by Motiv
    At the end of the day, prices will be set based on workloads and how the chip copes with them. If a 4-module/8-core AMD chip (at, say, 2.5 GHz) can handle the same workload as a 4-core (8-thread) Intel chip (at 2 GHz), then that will be its price window (clock speeds for argument's sake, etc.).
    I doubt that would happen, though, because of manufacturing costs. I have to believe 1 module is bigger than 1 Intel core. For consumers the Intel core would make more sense, whereas for servers the module would make more sense, as you are comparing 130% to 180% of the integer performance. Since server processors are always priced with much higher margins in mind, AMD could probably line up its processors that way, so even if Intel's IPC is 10% higher, it would still win the performance battle.
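Putting AliG's numbers side by side, taking the thread's rough figures at face value (HT adds about 30% per Intel core, a second Bulldozer core adds about 80% per module) and assuming equal clocks; die area and cost, his actual objection, are not modeled here:

```latex
\underbrace{1.30 \times 1.10}_{\text{Intel core with HT, 10\% more IPC}} = 1.43
\qquad
\underbrace{1.80}_{\text{one Bulldozer module}}
\qquad
\frac{1.80}{1.43} \approx 1.26
```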

    However, I just can't see AMD being able to price their products as you described for the general consumer and still make a profit, especially when Intel is at 32nm whereas AMD is stuck at 45nm. Even if they could, if the Intel product offers anywhere from 5-20% more IPC, I would just buy an unlocked K-series processor and be happy with that. Having anything beyond 4 threads is pretty much useless for me, so single-threaded performance is what will earn my money.

  8. #758
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by AliG

    However, I just can't see AMD being able to price their products as you described for the general consumer and still make a profit, especially when Intel is at 32nm whereas AMD is stuck at 45nm.
    Bulldozer is 32nm SOI high-k/metal gate...

  9. #759
    Xtreme Guru
    Join Date
    May 2007
    Location
    Ace Deuce, Michigan
    Posts
    3,955
    Quote Originally Posted by informal
    Bulldozer is 32nm SOI high-k/metal gate...
    Is it? 45nm makes a lot more sense because it's a proven process. That seems like a bad idea considering how well their 65nm K10 transition went. Perhaps that's the root of all the delays.

  10. #760
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by AliG
    Is it? 45nm makes a lot more sense because it's a proven process. That seems like a bad idea considering how well their 65nm K10 transition went. Perhaps that's the root of all the delays.
    No, it was planned for 45nm; the delays made them switch to 32nm, which gave them more time to develop the architecture.

  11. #761
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    846
    Quote Originally Posted by -Boris-
    If you run one thread per module, all modules work at the same time and none of them rest, so no module can enter turbo. But if two of the modules each run two threads, the other two modules rest, and when two modules rest, the other two can enter turbo mode.
    You can't have turbo with all modules working at the same time; the fact that parts of a module are idle doesn't matter, since turbo works at the module level.

    And it's said everywhere that running a second thread in a module "only" increases performance by 80%. That is a 10% performance loss compared with a traditional dual-core approach.
    I would not make assumptions about how our processor works based on how our competitor has implemented technology.

    As you may (or may not) be aware, I was critical of the way that they implemented turbo. I am happy with the way that we have implemented it. I can't get into specifics, but I can assure you that when you look at the two implementations, you will see a clear difference and you'll appreciate what we have done with the technology.

    I hate to say things like that without being able to disclose any of the detail, but more than that I hate people going down the path of assuming things about our product that might not be fully accurate. It's a fine line.

    Just keep in mind that this is a brand new architecture and things are going to be approached from a different perspective. The modularity is only one small part of it; there are a lot of things that have been changed.

    People have been asking for someone to bring some real innovation to the market; I think you will see that.
    While I work for AMD, my posts are my own opinions.

    http://blogs.amd.com/work/author/jfruehe/

  12. #762
    Xtreme Guru
    Join Date
    May 2007
    Location
    Ace Deuce, Michigan
    Posts
    3,955
    Quote Originally Posted by -Boris-
    No, it was planned for 45nm; the delays made them switch to 32nm, which gave them more time to develop the architecture.
    That explains it, then.

  13. #763
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by JF-AMD
    I would not make assumptions about how our processor works based on how our competitor has implemented technology.

    As you may (or may not) be aware, I was critical of the way that they implemented turbo. I am happy with the way that we have implemented it. I can't get into specifics, but I can assure you that when you look at the two implementations, you will see a clear difference and you'll appreciate what we have done with the technology.

    I hate to say things like that without being able to disclose any of the detail, but more than that I hate people going down the path of assuming things about our product that might not be fully accurate. It's a fine line.

    Just keep in mind that this is a brand new architecture and things are going to be approached from a different perspective. The modularity is only one small part of it; there are a lot of things that have been changed.

    People have been asking for someone to bring some real innovation to the market; I think you will see that.
    OK, I've read that it was working at the module level. But I guess you are telling me that there is more to it than that?

  14. #764
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    It does work at the module level, but that is all we know. There are many things AMD hasn't revealed, for obvious reasons.

  15. #765
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    This is quite a fascinating architecture. If that RWT article is accurate then I am extremely interested in seeing some benchmarks.

    I don't buy that overall per-core IPC must necessarily decrease (relative to K10) because of fewer integer ALUs. Of course they will miss out, compared with a 3- or 4-ALU core, in cases where integer ILP is greater than 2. But in cases where the code is a mix of integer and memory ops, IPC could go up relative to K10, based on available execution resources alone. Which case is more common depends on the specific code being run. Though I'd suggest that a program with consistently high integer ILP would be more efficient using packed integers (handled by the FPU) anyway.

    If we add to that the fact that missed branches and cache misses (both significantly improved in BD) have a much greater effect on overall IPC than some missed ILP cases, it's clear that claiming lower IPC than K10 isn't really justified based on fewer ALUs alone. I doubt that BD will have lower IPC per-core than K10. In reality it's probably somewhere in the vast gulf between PII and SB.
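The point that branch and cache misses matter more than a few missed ILP cases can be illustrated with the usual CPI-stack estimate: CPI = base CPI + sum over event types of (misses per instruction × penalty). This is a minimal sketch; every rate and penalty below is a made-up illustration value, not a K10, Bulldozer or Sandy Bridge figure:

```python
def effective_ipc(base_cpi, branch_mpki, branch_penalty, cache_mpki, cache_penalty):
    """Classic CPI-stack estimate: ideal CPI plus stall cycles per instruction.

    *_mpki are misses per 1000 instructions; penalties are in cycles.
    """
    stall_cpi = branch_mpki / 1000 * branch_penalty + cache_mpki / 1000 * cache_penalty
    return 1.0 / (base_cpi + stall_cpi)


# Hypothetical numbers only: a wider core that mispredicts and misses more
# versus a narrower core with better prediction and caching.
wide = effective_ipc(base_cpi=1 / 3, branch_mpki=10, branch_penalty=15,
                     cache_mpki=5, cache_penalty=200)
narrow = effective_ipc(base_cpi=1 / 2, branch_mpki=7, branch_penalty=15,
                       cache_mpki=3, cache_penalty=200)
print(round(wide, 2), round(narrow, 2))  # 0.67 0.83
```

With these invented inputs the stall terms dwarf the difference in base width, which is the shape of the argument: issue width only sets the ceiling, and misses decide how far below it real code runs.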

    As already noted, though, IPC isn't the only factor in a processor's performance. This is obviously a high-frequency design. The memory and cache subsystems are a big leap forward for AMD. They are designed to keep a large number of cores well fed, minimizing the time that execution resources spend waiting on data and thus increasing efficiency. Intel will probably continue to lead in IPC by a significant margin. Whether AMD can increase frequency enough to make single-threaded performance competitive remains to be seen. On the multi-threaded side, BD sounds like a monster.

    If AMD can't match Intel's single-threaded performance, it looks like we will have a split market come 2011. Office users and gamers might do best with SB, while people doing encoding, folding, heavy multitasking, HPC, and servers might do best with BD.
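The split-market argument is essentially Amdahl's law weighted by per-core speed: throughput = speed / ((1 - p) + p / n) for a parallel fraction p on n cores. A minimal sketch with two hypothetical chips, 8 cores at an assumed 0.8x single-thread speed versus 4 cores at 1.0x (SMT and turbo ignored; the 0.8 is an illustration value, not a prediction for BD or SB):

```python
def throughput(parallel_fraction, cores, per_core_speed):
    """Amdahl's law, scaled by how fast each individual core is.

    parallel_fraction is the share of the work that splits perfectly across cores.
    """
    serial_fraction = 1.0 - parallel_fraction
    return per_core_speed / (serial_fraction + parallel_fraction / cores)


# Hypothetical chips: 8 cores at 0.8x single-thread speed vs 4 cores at 1.0x.
for p in (0.5, 0.95):
    many_slow = throughput(p, cores=8, per_core_speed=0.8)
    few_fast = throughput(p, cores=4, per_core_speed=1.0)
    print(f"p={p}: 8 slower cores {many_slow:.2f}, 4 faster cores {few_fast:.2f}")
# p=0.5: 8 slower cores 1.42, 4 faster cores 1.60
# p=0.95: 8 slower cores 4.74, 4 faster cores 3.48
```

Under these made-up numbers the four faster cores win when half the work is serial, and the eight slower cores win once the work is 95% parallel, which is the office/gaming versus encoding/HPC split described above.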

  16. #766
    Xtreme Enthusiast
    Join Date
    Jun 2006
    Location
    Space
    Posts
    769
    Quote Originally Posted by Solus Corvus
    snipped...

    If AMD can't match Intel's single-threaded performance, it looks like we will have a split market come 2011. Office users and gamers might do best with SB, while people doing encoding, folding, heavy multitasking, HPC, and servers might do best with BD.
    While I agree with everything else you wrote (that RWT article is a must-read for anyone who hasn't read it), I would say this last statement is wrong.

    I suspect that margins will be significantly lower for gamers/office users (although will Bobcat/Llano fill the office space?). It could be a great result for overclockers, as we'll have access to decent multicore tech that should have a bit of room to mess with.

    So unless Intel go for a price war, all AMD has to do is price-match on a performance level.

    It's only people wanting the absolute maximum who care about who has the best CPU. The mainstream gamer just wants to spend £200 on a CPU and make sure it is competitive with other CPUs around that price point.

  17. #767
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by Solus Corvus
    This is quite a fascinating architecture. If that RWT article is accurate then I am extremely interested in seeing some benchmarks.

    I don't buy that overall per-core IPC must necessarily decrease (relative to K10) because of fewer integer ALUs. Of course they will miss out, compared with a 3- or 4-ALU core, in cases where integer ILP is greater than 2. But in cases where the code is a mix of integer and memory ops, IPC could go up relative to K10, based on available execution resources alone. Which case is more common depends on the specific code being run. Though I'd suggest that a program with consistently high integer ILP would be more efficient using packed integers (handled by the FPU) anyway.

    If we add to that the fact that missed branches and cache misses (both significantly improved in BD) have a much greater effect on overall IPC than some missed ILP cases, it's clear that claiming lower IPC than K10 isn't really justified based on fewer ALUs alone. I doubt that BD will have lower IPC per-core than K10. In reality it's probably somewhere in the vast gulf between PII and SB.

    As already noted, though, IPC isn't the only factor in a processor's performance. This is obviously a high-frequency design. The memory and cache subsystems are a big leap forward for AMD. They are designed to keep a large number of cores well fed, minimizing the time that execution resources spend waiting on data and thus increasing efficiency. Intel will probably continue to lead in IPC by a significant margin. Whether AMD can increase frequency enough to make single-threaded performance competitive remains to be seen. On the multi-threaded side, BD sounds like a monster.

    If AMD can't match Intel's single-threaded performance, it looks like we will have a split market come 2011. Office users and gamers might do best with SB, while people doing encoding, folding, heavy multitasking, HPC, and servers might do best with BD.
    Why don't we look at the argument from another viewpoint?

    Show me the source code for one program that can sustain, under optimal conditions, an IPC greater than 1.8 and for which multi-threading isn't a better solution.

    For those of you smart enough to actually wonder what makes IPC greater than 1 possible [in source code], let me save you a long, winding trip and give you the answer: such a beast DOES NOT EXIST.
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern RAM makes an old overclocker miss BH-5 and the fun it was
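One standard way to look at where IPC above 1 comes from in source code: with unit latency, unlimited execution units, and perfect branch prediction and caches, the best possible IPC of a straight-line sequence is its instruction count divided by its longest dependency chain. This minimal sketch computes that ideal figure for hypothetical instruction lists (`ideal_ilp` is an illustrative name); it says nothing about what real hardware sustains, which is what the challenge is actually about:

```python
def ideal_ilp(instructions):
    """Upper-bound IPC of a straight-line instruction sequence.

    instructions: list of (destination, [sources]) in program order.
    Assumes every instruction takes 1 cycle, unlimited execution units,
    perfect branch prediction and caches, and register renaming, so only
    true read-after-write dependencies can delay an instruction.
    """
    if not instructions:
        return 0.0
    ready_at = {}  # register -> cycle when its newest value is available
    last_cycle = 0
    for dest, sources in instructions:
        start = max((ready_at.get(src, 0) for src in sources), default=0)
        ready_at[dest] = start + 1
        last_cycle = max(last_cycle, start + 1)
    return len(instructions) / last_cycle


# Hypothetical sequences: one running total vs four independent operations.
chain = [("sum", ["sum", "x0"]), ("sum", ["sum", "x1"]),
         ("sum", ["sum", "x2"]), ("sum", ["sum", "x3"])]
independent = [("a", ["x0"]), ("b", ["x1"]), ("c", ["x2"]), ("d", ["x3"])]
print(ideal_ilp(chain), ideal_ilp(independent))  # 1.0 4.0
```

Splitting the running total into two accumulators (s0 and s1, then one final add) gives 5 instructions in 3 cycles under the same assumptions, which is the kind of source-level restructuring that exposes ILP in the first place.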

  18. #768
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Let me think about that.

  19. #769
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Quote Originally Posted by -Boris-
    BD has more resources since it can use 2 ALUs and 2 AGUs every clock; Phenom II averages 1.5 ALUs and 1.5 AGUs since they share pipes. Again, if you can't use it, it isn't a resource. 2+2=4, (3+3)/2=3.
    Hans wrote for the K8:
    Each Scheduler can launch one ALU and one AGU operation per cycle. The ALU operation may come from one x86 instruction while the AGU operation may come from another.
    http://chip-architect.com/news/2003_...it_Core.html#3
    That is not 1.5, that is 3... Maybe you missed the fact that the macro-ops are split into µops at that stage?

  20. #770
    Xtreme Guru
    Join Date
    May 2007
    Location
    Ace Deuce, Michigan
    Posts
    3,955
    Correct, there are 3 full integer operations in K8 and later that can do either ALU or AGU, but as I understand it, it is more efficient, due to improved prefetchers and smaller die sizes, to use a simplified 2+2 design.

  21. #771
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Opteron146
    Hans wrote for the K8:
    http://chip-architect.com/news/2003_...it_Core.html#3
    That is not 1.5, that is 3... Maybe you missed the fact that the macro-ops are split into µops at that stage?
    Yes, but at the back end the macro-ops are retired, and K8/10h can retire 3 of those while each Bulldozer integer core can retire 4. That is a 33% difference.
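The 1.5 vs 3 and 3 vs 4 disagreements are really about which unit count limits throughput. A simple bound from unit counts alone is IPC <= min(ALUs / ALU ops per instruction, AGUs / AGU ops per instruction, retire width). In this sketch the unit counts are the readings argued in this thread (Hans's K8 description as quoted by Opteron146, -Boris-'s shared-pipe reading, and informal's 4-wide retire per Bulldozer integer core), not verified specifications, and the 0.8/0.5 instruction mix is hypothetical:

```python
from dataclasses import dataclass


@dataclass
class IntCore:
    alus: float          # ALU ops that can issue per cycle
    agus: float          # address-generation ops that can issue per cycle
    retire_width: float  # macro-ops that can retire per cycle


def resource_bound_ipc(core, alu_ops_per_instr, agu_ops_per_instr):
    """Upper bound on IPC from unit counts alone (no dependencies or misses)."""
    bounds = [core.retire_width]
    if alu_ops_per_instr > 0:
        bounds.append(core.alus / alu_ops_per_instr)
    if agu_ops_per_instr > 0:
        bounds.append(core.agus / agu_ops_per_instr)
    return min(bounds)


# Unit counts as argued in this thread, not verified specifications.
k10_per_hans = IntCore(alus=3, agus=3, retire_width=3)       # 3 ALU + 3 AGU issue
k10_per_boris = IntCore(alus=1.5, agus=1.5, retire_width=3)  # "3 shared pipes"
bd_int_core = IntCore(alus=2, agus=2, retire_width=4)

# Hypothetical mix: 0.8 ALU ops and 0.5 AGU ops per instruction.
for name, core in [("K10 (Hans)", k10_per_hans), ("K10 (Boris)", k10_per_boris),
                   ("BD core", bd_int_core)]:
    print(name, round(resource_bound_ipc(core, 0.8, 0.5), 2))
```

The retire-width gap (4 / 3, the 33% mentioned above) only becomes the binding limit when the instruction mix and the dependencies allow more than 3 instructions per cycle.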

  22. #772
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    Quote Originally Posted by JF-AMD
    I would not make assumptions about how our processor works based on how our competitor has implemented technology.

    As you may (or may not) be aware, I was critical of the way that they implemented turbo. I am happy with the way that we have implemented it. I can't get into specifics, but I can assure you that when you look at the two implementations, you will see a clear difference and you'll appreciate what we have done with the technology.

    I hate to say things like that without being able to disclose any of the detail, but more than that I hate people going down the path of assuming things about our product that might not be fully accurate. It's a fine line.

    Just keep in mind that this is a brand new architecture and things are going to be approached from a different perspective. The modularity is only one small part of it; there are a lot of things that have been changed.

    People have been asking for someone to bring some real innovation to the market; I think you will see that.
    I'd love to see that, IN MY LIFETIME!... Just joking. Anyhow, I'm not asking any more questions about BD. I'm just going to wait for a product release.
    As quoted by LowRun: "So, we are one week past AMD's worst-case scenario for BD's availability, but they don't feel like communicating about the delay. I suppose AMD must be removed from the reliable sources list for AMD's product launch dates."

  23. #773
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    Quote Originally Posted by JF-AMD
    I would not make assumptions about how our processor works based on how our competitor has implemented technology.

    As you may (or may not) be aware, I was critical of the way that they implemented turbo. I am happy with the way that we have implemented it. I can't get into specifics, but I can assure you that when you look at the two implementations, you will see a clear difference and you'll appreciate what we have done with the technology.

    I hate to say things like that without being able to disclose any of the detail, but more than that I hate people going down the path of assuming things about our product that might not be fully accurate. It's a fine line.

    Just keep in mind that this is a brand new architecture and things are going to be approached from a different perspective. The modularity is only one small part of it; there are a lot of things that have been changed.

    People have been asking for someone to bring some real innovation to the market; I think you will see that.
    This right here is the human element! It separates you from the bots, JF! You show us that you care, that you want to tell us but are unable to.

    We (most of us, anyway) understand and appreciate what you have told us so far.

  24. #774
    Registered User
    Join Date
    Apr 2008
    Posts
    17
    JF, so is each BD core faster clock for clock than the Phenom cores? Or is that just comparing the top-clocked processors of each product line?

  25. #775
    Xtreme Member
    Join Date
    Aug 2004
    Posts
    210
    Quote Originally Posted by AliG
    Correct, there are 3 full integer operations in K8 and later that can do either ALU or AGU,
    No, it is not "either", it is both... What do you not understand in the quote from Hans' article?
    but as I understand it, it is more efficient, due to improved prefetchers and smaller die sizes, to use a simplified 2+2 design
    That is correct. The IPC of typical code today is around 1; I think Nehalem achieves 1.5-1.7 in the best cases, so 2 pipes are enough.

    Quote Originally Posted by informal
    Yes, but at the back end the macro-ops are retired, and K8/10h can retire 3 of those while each Bulldozer integer core can retire 4. That is a 33% difference.
    Yes, you are right, but I never said anything against that point ;-)
    One note on that, because I read it earlier: the AGU results are not retired; they go straight into the LD/STR units so the waiting µop can get its memory data ;-) Later, once the µop's calculation is finished, that µop is retired.
    So, in short, the retire/execution-unit ratio is 1:2 for both, not 1:3: 3:6 for K10 and 4:8 for BD.
