
Thread: Fermi: What went wrong, from the mouth of Jen-Hsun Huang

  1. #1
    Xtreme Addict
    Join Date
    Mar 2009
    Posts
    1,116

    Fermi: What went wrong, from the mouth of Jen-Hsun Huang

    http://www.golem.de/1009/78179.html

    Very honest of him. It is clear now that Fermi is old news for him. He is talking about history.

  2. #2

  3. #3
    Xtreme Addict
    Join Date
    Mar 2009
    Posts
    1,116
    It sounds like they tried a complex technique that the simulations implied would work, but it came back from TSMC and didn't. That is why they had problems and AMD didn't. AMD didn't use such a complicated "fabric".

    But more importantly, if I fill in the blanks, it sounds as though the design guys over-reached, and the hardware guys could have prevented the problem if they had more design responsibility. So a management change was implemented. Design and hardware, theory and practicality, are now more tightly coupled.

  4. #4
    Xtreme Enthusiast
    Join Date
    Sep 2008
    Location
    ROMANIA
    Posts
    687
    That this could happen at all was, according to Jen-Hsun Huang, not just a technical problem. Rather, there was no dedicated development department for the fabric. "My engineers who deal with architecture and those who deal with the physics sit in two different departments," said Huang. He continued: "The management lesson we have learned: there should always be a chief pilot, because everything in our business is complicated."
    So the physics guys' demand for lots of SPs and L2 cache screwed up the engineers' work. They wanted too much from Fermi: it seemed to work on paper but didn't work in practice. The "software guys" forgot that physics, power laws, still apply.
    The two departments didn't communicate well.
    i5 2500K @ 4.5GHz
    ASRock P67 PRO3


    P55 PRO & i5 750
    http://valid.canardpc.com/show_oc.php?id=966385
    239 BCLK validation on cold air
    http://valid.canardpc.com/show_oc.php?id=966536
    Almost 5GHz, air.

  5. #5
    Xtreme Enthusiast
    Join Date
    Dec 2009
    Posts
    591
    What he is trying to say is that their G80 design is no longer scalable. It took a lot of trial and error to make it work, and in the end it did, with a surface temp similar to that of the sun.

    Lesson learned: change is good, and a smaller process cannot always save your a$$.

  6. #6
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by bamtan2
    It sounds like they tried a complex technique that the simulations implied would work, but it came back from TSMC and didn't. That is why they had problems and AMD didn't. AMD didn't use such a complicated "fabric".
    AMD, or for that matter any integrated circuit, also has a "fabric". It is just a lay term to create an idea or image of what an interconnect is like. Keep in mind this still doesn't explain A2's problems.

  7. #7
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    1,870
    That's why people invented ring buses. Nvidia is in love with crossbar networks but at some point you just end up with too many wires. Like he said, there could be other problems plaguing the next architecture but this specific issue won't be repeated obviously.

    Interesting that it had nothing to do with silicon defects or die-size like all the resident armchair engineers have been suggesting for the last year.

  8. #8
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by damha
    What he is trying to say is that their G80 design is no longer scalable. It took a lot of trial and error to make it work, and in the end it did, with a surface temp similar to that of the sun.

    Lesson learned: change is good, and a smaller process cannot always save your a$$.
    Not in the least. G80 and Fermi are very different, not even the same ISA.

    What he is saying, and keep in mind this is just for A1 silicon, is that they trusted TSMC's software tools, which gave them misleading information and led to the failure of the interconnect.

  9. #9
    Xtreme Addict
    Join Date
    May 2007
    Location
    'Zona
    Posts
    2,346
    Quote Originally Posted by Chumbucket843
    Not in the least. G80 and Fermi are very different, not even the same ISA.

    What he is saying, and keep in mind this is just for A1 silicon, is that they trusted TSMC's software tools, which gave them misleading information and led to the failure of the interconnect.
    Why should they have trusted TSMC when they were having "similar" problems with 40nm GT2xx chips?
    At some point you have to take a step back and say to yourself, we must be missing something. Nvidia management never did.
    Originally Posted by motown_steve
    Every genocide that was committed during the 20th century has been preceded by the disarmament of the target population. Once the government outlaws your guns your life becomes a luxury afforded to you by the state. You become a tool to benefit the state. Should you cease to benefit the state or even worse become an annoyance or even a hindrance to the state then your life becomes more trouble than it is worth.

    Once the government outlaws your guns your life is forfeit. You're already dead, it's just a question of when they are going to get around to you.

  10. #10
    Xtreme Addict
    Join Date
    Apr 2006
    Location
    City of Lights, The Netherlands
    Posts
    2,381
    Quote Originally Posted by xdan
    So the physics guys' demand for lots of SPs and L2 cache screwed up the engineers' work. They wanted too much from Fermi: it seemed to work on paper but didn't work in practice. The "software guys" forgot that physics, power laws, still apply.
    The two departments didn't communicate well.
    You're misunderstanding what Huang means by the physics guys and the engineers. The physics guys are responsible for the actual implementation of the architecture, while the engineers came up with the architecture. That's how Huang used the two terms here. In other words, the engineers came up with an architecture with lots of SPs and L2 cache, but that's not what caused Fermi's delay, although it is part of the reason why it runs so hot.
    "When in doubt, C-4!" -- Jamie Hyneman

    Silverstone TJ-09 Case | Seasonic X-750 PSU | Intel Core i5 750 CPU | ASUS P7P55D PRO Mobo | OCZ 4GB DDR3 RAM | ATI Radeon 5850 GPU | Intel X-25M 80GB SSD | WD 2TB HDD | Windows 7 x64 | NEC EA23WMi 23" Monitor |Auzentech X-Fi Forte Soundcard | Creative T3 2.1 Speakers | AudioTechnica AD900 Headphone |

  11. #11
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    1,870
    Quote Originally Posted by LordEC911
    Why should they have trusted TSMC when they were having "similar" problems with 40nm GT2xx chips?
    At some point you have to take a step back and say to yourself, we must be missing something. Nvidia management never did.
    Where did you see that they were having similar problems with GT2xx?

  12. #12
    I am Xtreme
    Join Date
    Oct 2004
    Location
    U.S.A.
    Posts
    4,743
    At least he is admitting that they took the wrong approach with Fermi. I believe he is wrong, though, about "Now is time for innovation, not for integration." I believe it is time for innovation and integration. From a business standpoint he should get more software developers to use CUDA to take advantage of the GPU. It's not just about the GPU doing games anymore; it's about the apps it can run. The GPU is more powerful than a CPU at certain tasks, and video encoding is one of them. If Nvidia takes too long developing the successor to Fermi this time around, I don't think they can survive in the computer GPU market any longer. They will have to switch to the mobile market as their primary source of profit, or the company will cease to exist.


    Asus Z9PE-D8 WS with 64GB of registered ECC ram.|Dell 30" LCD 3008wfp:7970 video card

    LSI series raid controller
    SSDs: Crucial C300 256GB
    Standard drives: Seagate ST32000641AS & WD 1TB black
    OSes: Linux and Windows x64

  13. #13
    Banned
    Join Date
    Mar 2010
    Posts
    88
    Quote Originally Posted by safan80
    At least he is admitting that they took the wrong approach with Fermi. I believe he is wrong, though, about "Now is time for innovation, not for integration." I believe it is time for innovation and integration. From a business standpoint he should get more software developers to use CUDA to take advantage of the GPU. It's not just about the GPU doing games anymore; it's about the apps it can run. The GPU is more powerful than a CPU at certain tasks, and video encoding is one of them. If Nvidia takes too long developing the successor to Fermi this time around, I don't think they can survive in the computer GPU market any longer. They will have to switch to the mobile market as their primary source of profit, or the company will cease to exist.

    Cool, I'll pm you next time a thread needs doomsday predictions. Lol. NV has $3B in the bank btw. Just as an FYI.

  14. #14
    Xtreme Addict
    Join Date
    Mar 2006
    Location
    Saskatchewan, Canada
    Posts
    2,207
    Quote Originally Posted by safan80
    At least he is admitting that they took the wrong approach with Fermi. I believe he is wrong, though, about "Now is time for innovation, not for integration." I believe it is time for innovation and integration. From a business standpoint he should get more software developers to use CUDA to take advantage of the GPU. It's not just about the GPU doing games anymore; it's about the apps it can run. The GPU is more powerful than a CPU at certain tasks, and video encoding is one of them. If Nvidia takes too long developing the successor to Fermi this time around, I don't think they can survive in the computer GPU market any longer. They will have to switch to the mobile market as their primary source of profit, or the company will cease to exist.
    I think they actually need to stop integrating to some extent, at least with their product line.

    To me, it's going to be hard to make a competitive CGPU device that also does great at games.

    It makes the silicon too big and makes the product a jack of all trades, master of none. They are lucky that there are no products that make their professional products look bad (FireGL products are POS because they were never supposed to be CGPU products in the first place). But at the same time, Fermi is really inefficient for gaming, and when you consider power consumption it is not a good gaming part (at least the GF100 variety).

    I think they need to research two different lines, one for computing and one for games. I think they need to specialize these two lines, because making a single product do both has diluted Fermi's performance and has prevented it from being a full-blown product, leaving all the parts it ships as salvage parts.
    Core i7 920 @ 4.66GHz (H2O)
    6GB OCZ Platinum
    4870x2 + 4890 in Trifire
    2*640 WD Blacks
    750GB Seagate.

  15. #15
    I am Xtreme
    Join Date
    Oct 2004
    Location
    U.S.A.
    Posts
    4,743
    Quote Originally Posted by Svnth
    Cool, I'll pm you next time a thread needs doomsday predictions. Lol. NV has $3B in the bank btw. Just as an FYI.
    LOL It's just all those business classes that I'm taking. Since they have $3B in the bank they should start one


    Quote Originally Posted by tajoh111
    I think they actually need to stop integrating to some extent, at least with their product line.

    To me, it's going to be hard to make a competitive CGPU device that also does great at games.

    It makes the silicon too big and makes the product a jack of all trades, master of none. They are lucky that there are no products that make their professional products look bad (FireGL products are POS because they were never supposed to be CGPU products in the first place). But at the same time, Fermi is really inefficient for gaming, and when you consider power consumption it is not a good gaming part (at least the GF100 variety).

    I think they need to research two different lines, one for computing and one for games. I think they need to specialize these two lines, because making a single product do both has diluted Fermi's performance and has prevented it from being a full-blown product, leaving all the parts it ships as salvage parts.

    What did Fermi in was the fact that they relied on computer simulation to design the card, and they should have listened more to their engineers. Architects do this in building design, and the design ends up being changed during construction because certain things just cannot be done while keeping a building standing. I think Nvidia looked at Boeing and the way they made the 777 and tried to apply that to making a super video card.
    Last edited by safan80; 09-23-2010 at 01:22 PM.



  16. #16
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    1,870
    Nvidia doesn't have a problem with integration or innovation. Just execution.

  17. #17
    Xtreme Enthusiast
    Join Date
    Oct 2008
    Posts
    678
    Quote Originally Posted by trinibwoy
    That's why people invented ring buses. Nvidia is in love with crossbar networks but at some point you just end up with too many wires. Like he said, there could be other problems plaguing the next architecture but this specific issue won't be repeated obviously.

    Interesting that it had nothing to do with silicon defects or die-size like all the resident armchair engineers have been suggesting for the last year.
    The number of wires grows much faster than linearly with the number of units and the chip size (roughly quadratically for a full crossbar), and so does the interference as the wire count goes up. A chip at 200mm² has a much simpler crossbar and wouldn't suffer from it at all. So, the crossbar simply didn't scale to the huge chips nVidia wanted to build. A ring bus might have helped, but we can't know whether it would have been enough. Since the redesigned GF100 with a new, fixed crossbar wasn't enough to make the chip entirely functional, it seems like size was an important factor.
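    A rough way to see the scaling argument: in an idealized full crossbar every source needs a crosspoint to every destination, so its cost grows with the product of the two port counts, while a ring grows linearly but makes traffic take more hops. The Python sketch below only counts those idealized quantities; the unit counts are hypothetical and it is not a model of GF100's actual interconnect.

```python
# Idealized interconnect cost: full crossbar vs. bidirectional ring.
# Assumptions (not Fermi data): one crosspoint per (source, destination) pair
# for the crossbar, one link per stop for the ring.


def crossbar_crosspoints(n_src: int, n_dst: int) -> int:
    """Crosspoints in a full crossbar: every source reaches every destination directly."""
    return n_src * n_dst


def ring_links(n_stops: int) -> int:
    """Links in a ring: each stop only connects to its neighbour, so cost is linear."""
    return n_stops


def ring_worst_case_hops(n_stops: int) -> int:
    """Worst-case hop count on a bidirectional ring (traffic can go either way round)."""
    return n_stops // 2


if __name__ == "__main__":
    for units in (4, 8, 16, 32):
        # Hypothetical: 'units' shader clusters talking to 'units' memory/L2 slices,
        # so the ring has 2 * units stops.
        print(
            f"{units:>3} units: crossbar {crossbar_crosspoints(units, units):>5} crosspoints, "
            f"ring {ring_links(2 * units):>3} links, "
            f"worst case {ring_worst_case_hops(2 * units)} hops"
        )
```

    Doubling the units on both sides quadruples the crossbar's crosspoints but only doubles the ring's links; the ring pays for that in latency, since the worst-case hop count grows with it.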


    EDIT:
    I think GF100 was nVidia's R600 or Prescott. I think their upcoming chips will be a lot more efficient per mm². With R600, ATi learned that a huge 512-bit ring bus didn't pay off. And I've heard that since the failure with Prescott, Intel only makes changes that produce at least a 2% performance increase for every 1% power increase.
    Last edited by -Boris-; 09-23-2010 at 01:36 PM.
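    The 2:1 rule of thumb mentioned above (reported second-hand in the post, not confirmed here) is easy to express as a check. A minimal Python sketch; the function name and the example numbers are hypothetical:

```python
def passes_2_for_1_rule(perf_gain_pct: float, power_gain_pct: float, ratio: float = 2.0) -> bool:
    """Return True if a design change buys at least `ratio` % performance per 1 % extra power.

    Hypothetical helper illustrating the rule of thumb quoted in the thread.
    """
    if power_gain_pct <= 0:
        # A change that costs no extra power is never rejected by this rule.
        return True
    return perf_gain_pct >= ratio * power_gain_pct


if __name__ == "__main__":
    print(passes_2_for_1_rule(10.0, 4.0))  # 10 % performance for 4 % power
    print(passes_2_for_1_rule(10.0, 8.0))  # 10 % performance for 8 % power
```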

  18. #18
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    1,870
    Quote Originally Posted by -Boris-
    The number of wires grows much faster than linearly with the number of units and the chip size (roughly quadratically for a full crossbar), and so does the interference as the wire count goes up. A chip at 200mm² has a much simpler crossbar and wouldn't suffer from it at all. So, the crossbar simply didn't scale to the huge chips nVidia wanted to build.
    The problem wasn't simply one of scale, though; design also played a part. The problem was that their simulations told them it would work, and when they got chips back they found out those simulations were wrong. If it had been caught up front, their A1 might have looked like their A3.

    A ring bus might have helped, but we can't know whether it would have been enough. Since the redesigned GF100 with a new, fixed crossbar wasn't enough to make the chip entirely functional, it seems like size was an important factor.
    Yep there definitely are still yield problems hence the disabled SM. But that's a far cry from completely non-functional chips as seems to have been the case with the A1 interconnect problems.
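    Why a disabled SM helps yields can be sketched with a simple Poisson defect model: a region of area A with defect density D is defect-free with probability exp(-D*A), and a salvage part only needs enough of its SMs to survive. The sketch assumes 16 SMs (as on the full GF100), treats the rest of the die as "uncore" that must be defect-free, and uses made-up defect density and areas; it illustrates the idea, not Nvidia's or TSMC's real numbers.

```python
import math


def poisson_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Probability that a region of the given area has zero defects (simple Poisson model)."""
    return math.exp(-defects_per_cm2 * area_cm2)


def salvage_yield(defects_per_cm2: float, sm_area_cm2: float, uncore_area_cm2: float,
                  n_sm: int, min_good_sm: int) -> float:
    """Fraction of dies with a defect-free uncore and at least min_good_sm working SMs.

    Assumes defects land independently, so every SM survives with the same probability.
    """
    p_sm = poisson_yield(defects_per_cm2, sm_area_cm2)
    p_uncore = poisson_yield(defects_per_cm2, uncore_area_cm2)
    # Binomial sum over the number of surviving SMs, from min_good_sm up to all of them.
    p_enough_sms = sum(
        math.comb(n_sm, k) * p_sm**k * (1.0 - p_sm) ** (n_sm - k)
        for k in range(min_good_sm, n_sm + 1)
    )
    return p_uncore * p_enough_sms


if __name__ == "__main__":
    # Hypothetical inputs: 0.4 defects/cm^2, 0.2 cm^2 per SM, 2.0 cm^2 of uncore.
    D, SM_AREA, UNCORE_AREA, N_SM = 0.4, 0.2, 2.0, 16
    full = salvage_yield(D, SM_AREA, UNCORE_AREA, N_SM, min_good_sm=16)
    one_off = salvage_yield(D, SM_AREA, UNCORE_AREA, N_SM, min_good_sm=15)
    print(f"all 16 SMs working:      {full:.1%}")
    print(f"at least 15 SMs working: {one_off:.1%}")
```

    Requiring every SM to work reduces to the plain Poisson yield of the whole die, so any tolerance for one dead SM can only raise the number of sellable chips; the bigger the die, the more that tolerance matters.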

  19. #19
    Xtreme Enthusiast
    Join Date
    Dec 2008
    Posts
    640
    Quote Originally Posted by safan80
    LOL It's just all those business classes that I'm taking. Since they have $3B in the bank they should start one


    Where do you see they have $3B in the bank? Nowhere I've looked at their financials shows any sort of figure like that.....which "$3B in the bank", as you put it, would refer to cash reserves. Latest statement, as of March 2010, showed $1.7B in cash reserves, not $3B.

    So, can you please link to the $3B figure? Thanks!

  20. #20
    XS_THE_MACHINE
    Join Date
    Jun 2005
    Location
    Denver
    Posts
    932
    Quote Originally Posted by Humminn55
    Where do you see they have $3B in the bank? Nowhere I've looked at their financials shows any sort of figure like that.....which "$3B in the bank", as you put it, would refer to cash reserves. Latest statement, as of March 2010, showed $1.7B in cash reserves, not $3B.

    So, can you please link to the $3B figure? Thanks!
    I think he was responding to the quote below.

    Quote Originally Posted by Svnth
    Cool, I'll pm you next time a thread needs doomsday predictions. Lol. NV has $3B in the bank btw. Just as an FYI.


    xtremespeakfreely.com

    Semper Fi

  21. #21
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    1,870
    They have 3.7b in assets, 2.2b of which is cash or equivalent.

  22. #22
    Xtreme Enthusiast
    Join Date
    Dec 2008
    Posts
    640
    Quote Originally Posted by rogueagent6
    I think he was responding to the quote below.
    Yeah, I keep seeing that figure bandied about like it's a "fact"....but the facts don't add up to anywhere near that piece of misinformation. Guess someone needs to learn to read financial reports.

  23. #23
    Xtreme Enthusiast
    Join Date
    Dec 2008
    Posts
    640
    Quote Originally Posted by trinibwoy
    They have 3.7b in assets, 2.2b of which is cash or equivalent.
    Cash and Short Term Investments.......$1,728.23 (In millions), from NV's 10-K filing on 3/2010.

    $3B in assets is NOT "in the bank". Inventory isn't exactly money in the bank and neither is their specialized equipment, property, etc. True, one can borrow against physical capital assets, but they're far from "in the bank" money.

    And NV's liabilities wipe out their assets......making their company a zero sum company.

    The sad part of NV's structure right now is that their net profit margin is only 6.36%, which is horrible. Compare that to Intel's (23.08%) or AMD's (24.92%).
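    The distinction being argued here, total assets vs. cash actually "in the bank", plus net margin as a share of revenue, comes down to two ratios. A minimal Python sketch; the helper names are hypothetical, and the inputs in the example are the figures quoted in this thread (in millions of dollars), not independently checked against the filings:

```python
def net_profit_margin_pct(net_income: float, revenue: float) -> float:
    """Net profit margin in percent: net income divided by revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return 100.0 * net_income / revenue


def cash_share_of_assets(cash_and_short_term: float, total_assets: float) -> float:
    """Share of total assets (0..1) held as cash and short-term investments."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return cash_and_short_term / total_assets


if __name__ == "__main__":
    # Figures quoted in the thread, in millions of US dollars (unverified).
    cash = 1728.23
    assets = 3700.0
    print(f"cash and short-term investments: {cash_share_of_assets(cash, assets):.0%} of total assets")
    # Hypothetical example: $50M net income on $1,000M revenue.
    print(f"net margin: {net_profit_margin_pct(50.0, 1000.0):.1f}%")
```

    On the quoted numbers a bit under half of the assets are cash or short-term investments; the rest is inventory, equipment, property and the like, which is the point being made about "in the bank" money.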

  24. #24
    Xtreme Addict
    Join Date
    Apr 2007
    Location
    canada
    Posts
    1,886
    Quote Originally Posted by Svnth
    Cool, I'll pm you next time a thread needs doomsday predictions. Lol. NV has $3B in the bank btw. Just as an FYI.

    Nope... they may have a value of $3bn, but not $3bn in the bank.
    WILL CUDDLE FOR FOOD

    Quote Originally Posted by JF-AMD
    Dual proc client systems are like sex in high school. Everyone talks about it but nobody is really doing it.

  25. #25
    Xtreme Addict
    Join Date
    Aug 2008
    Location
    Hollywierd, CA
    Posts
    1,284
    Quote Originally Posted by Humminn55
    Where do you see they have $3B in the bank? Nowhere I've looked at their financials shows any sort of figure like that.....which "$3B in the bank", as you put it, would refer to cash reserves. Latest statement, as of March 2010, showed $1.7B in cash reserves, not $3B.

    So, can you please link to the $3B figure? Thanks!
    Have you ever checked Nvidia's SEC filings?
    Last edited by 570091D; 09-23-2010 at 02:44 PM.

    I am an artist (EDM producer/DJ), pls check out mah stuff.
