
Thread: The GT300/Fermi Thread

    Quote Originally Posted by kgk View Post
    Also, when you compare die size and how many chips they get per wafer, doesn't AMD get twice as many churned out per wafer as NVidia?

    Gives them a pricing/profitability advantage doesn't it?

    I mean, assuming R&D and overhead for both were equal, which they clearly are not.
    AMD/ATi gets ~60% more dies per wafer, Cypress vs GF100.
    Then again, current yields put AMD/ATi ~3x higher.
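The dies-per-wafer and yield gap can be sketched with the standard back-of-the-envelope formulas. The die areas below (~334 mm² for Cypress, ~530 mm² for GF100) and the defect density are assumptions for illustration, not official figures:

```python
import math

WAFER_DIAMETER_MM = 300.0

def dies_per_wafer(die_area_mm2):
    """Classic approximation: wafer area / die area, minus an edge-loss term."""
    d = WAFER_DIAMETER_MM
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2, defects_per_mm2):
    """Fraction of defect-free dies under a simple Poisson defect model."""
    return math.exp(-die_area_mm2 * defects_per_mm2)

cypress_mm2 = 334.0   # assumed Cypress die area
gf100_mm2 = 530.0     # assumed GF100 die area
d0 = 0.003            # hypothetical defects per mm^2 on an immature process

for name, area in [("Cypress", cypress_mm2), ("GF100", gf100_mm2)]:
    print(name, dies_per_wafer(area), f"{poisson_yield(area, d0):.0%}")
```

With these assumed numbers Cypress lands in the ballpark of 60-70% more candidate dies per wafer, and the smaller die's defect-free fraction is substantially higher on top of that, which is the point being made above.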

    Quote Originally Posted by jaredpace View Post
    edit: btw, does anyone know how many clock domains this thing has? It's something like ROPs & core / L2 & scheduler / sampler / shader / memory. Going to be kind of weird seeing 5 different MHz readings in your GPU-Z window
    You will get to play with 3, just like last time: the core clock, which is tied to the ROPs & L2; the shader (hot) clock, which is tied to the CUDA cores and the texture units, with the TMUs/scheduler running at half the hot clock (more on this later); and the memory clock.

    FYI- GPUs have many different units running at many different frequencies.
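A minimal sketch of the three user-visible domains described above, with the scheduler/TMU domain derived from the hot clock. The clock values passed in are placeholder round numbers, not announced specs:

```python
# GF100's user-visible clock domains as described above: the core clock
# drives the ROPs/L2, the hot clock drives the CUDA cores, and the
# scheduler/texture units run at half the hot clock.

def clock_domains(core_mhz, hot_mhz, mem_mhz):
    return {
        "core (ROPs, L2)": core_mhz,
        "hot (CUDA cores)": hot_mhz,
        "scheduler/TMU (hot / 2)": hot_mhz / 2,
        "memory": mem_mhz,
    }

# Hypothetical clocks purely for illustration
for name, mhz in clock_domains(700, 1400, 1000).items():
    print(f"{name}: {mhz:.0f} MHz")
```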

    Quote Originally Posted by FischOderAal View Post
    Yepp, it does. But don't forget yields. I think it's safe to assume that Fermi has worse yields than RV870.

    Someone made a claim that NVIDIA would have to sell Fermi at a loss in the desktop market. If things go bad for NVIDIA (AMD cuts prices, very bad yields etc) I can see this being true easily.
    Much worse yields.
    Nvidia won't be selling GF100 at a loss unless something drastically changes, but GF100 will have a BOM of more than 2x Cypress, meaning the 5970 will still be cheaper to make.

    Quote Originally Posted by W1zzard View Post
    given the same defect density per wafer of silicon a bigger die will automatically have lower yield (take a piece of paper, put 5 defect dots on it with a pen, cut it into 10 pieces, now cut another piece of paper with dots into 20 pieces, count how many pieces without dot you get)

    if nvidia is smart (they probably are) they put some spares on their gpu which is basically extra pieces of hardware that can replace pieces where defects are in the silicon. for example you could imagine having a 5th GPC cluster that can replace one with defects. if you do the proper math you can compute spare designs that are statistically going to increase your per-die yield even though such measures increase the die area
    Yes, but this has been discussed before. Too much redundancy adds area that will never be enabled unless things go very poorly; too little means terrible yields if you are relying on salvage parts. A middle-of-the-road design decision has to be made, and it seems Nvidia went a little on the low side this time.
    Though I have a feeling you knew that.
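W1zzard's spare-cluster idea can be put in numbers with a toy binomial model: build N+S copies of a cluster and ship the die if at least N are defect-free. The cluster area and defect density below are hypothetical, and the model ignores that spares themselves add area (and defects) to the rest of the die:

```python
import math

def cluster_ok(area_mm2, d0):
    """Poisson model: probability a single cluster has no defect."""
    return math.exp(-area_mm2 * d0)

def die_yield(n_needed, n_spare, area_mm2, d0):
    """P(at least n_needed of n_needed + n_spare clusters are good)."""
    p = cluster_ok(area_mm2, d0)
    total = n_needed + n_spare
    return sum(math.comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(n_needed, total + 1))

# 4 GPCs needed, ~100 mm^2 each (hypothetical), 0.003 defects/mm^2
for spares in range(3):
    print(spares, f"{die_yield(4, spares, 100.0, 0.003):.1%}")
```

Each spare raises the fraction of shippable dies considerably under these assumptions, which is exactly the trade against extra area that the post above describes.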

    Quote Originally Posted by omar little View Post
    has it been established whether fermi uses hardware or software tesselation?
    Both. There are specific units that handle tessellation, see the PolyMorph Engine, but they are tied to the clusters. This is a very different implementation than AMD's, obviously. Where Nvidia has a massive advantage is being able to do 4 triangles per clock, compared to Cypress' 1 triangle/clock and Hemlock's 2 triangles/clock.
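Those per-clock rates translate into peak setup throughput once multiplied by the clock each chip's setup logic runs at. The clocks used below are assumed round numbers for illustration, not official specs:

```python
# Peak triangle setup rates implied by the figures above: GF100 at
# 4 tris/clk vs Cypress at 1 and Hemlock at 2 (across both GPUs).

def tris_per_second(tris_per_clock, core_mhz):
    return tris_per_clock * core_mhz * 1e6

chips = {
    "GF100 (4/clk @ 700 MHz)": tris_per_second(4, 700),
    "Cypress (1/clk @ 850 MHz)": tris_per_second(1, 850),
    "Hemlock (2/clk @ 725 MHz)": tris_per_second(2, 725),
}
for name, rate in chips.items():
    print(f"{name}: {rate / 1e9:.2f} Gtris/s")
```

Even with a lower core clock, the 4x per-clock rate leaves GF100 well ahead of either Cypress part on paper.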

    Quote Originally Posted by SKYMTL View Post
    Which synthetic tests? Unigine, Far Cry 2, HawX, DX11 Toolkit, etc. don't use PhysX at all.
    Ummm... the hair demo and water demo plus the other "Nvidia" supplied demos. I know for sure he said the hair demo used PhysX.


    Since some don't want to look over the articles, Rys summarized the good stuff that we didn't know before (well, most of us).
    Quote Originally Posted by Rys
    • Sampler runs at scheduler clock (half the hot clock)
    • 4 samplers per cluster (64 total)
    • Sampler will do jittered-offset for Gather4 (no idea how, the texture-space offset is constant per call)
    • 4 tris/clock setup and raster
    • Raster area per unit is now 2x4 rather than 2x16
    • PolyMorph Engine (heh), effectively pre-PS FF, one per cluster
    • ROPs now each take 24 coverage samples (up from 8)
    • Compression is improved, 4x->8x delta drop is less than GT200 clock-for-clock
    • Display engine improvements


    That's the list of the stuff I either got wrong or missed in my article at TR, concerning the graphics. Biggest thing is probably the > 1tri/clk for small triangles, and the change in the per-clock rasterisation area for each of the four units. Aggregate setup and rasterisation performance is no faster per clock than G80+ for triangles that are > 32 pixels.

    Sampler count was out by 2x, so NV will need a > 1.6 GHz hot clock to beat a GTX 285 in peak possible texture performance, and there's a distinct lack of information about the sampler hardware in the latest whitepaper. Doing more digging there, but it looks like no change to texturing IQ other the ability to jitter the texcoords per sample during an unfiltered fetch.

    NV claim that everything they list in the PolyMorph block exists as a physical block in the silicon. Obviously interesting thing there that didn't exist before is the tessellator, and it seems the fixed block there is responsible for generating the new primitives (or killing geometry too), and the units run in parallel (where possible), with most other stuff running on the SM.

    As for my clock estimates, I doubt a 1700 MHz hot clock at launch (sadly), but the base clock should be usefully higher, up past 700 MHz. They still haven't talked about GeForce productisation or clocks, but at this point it looks unlikely the fastest launch GeForce will texture faster than a GTX 285.

    That's about it, will have an article up ASAP.
    It was also brought up that the TFUs run at the full hot clock, giving a 1:2 ratio like G80, to make sure enough data is fed to the TMUs, since they are more efficient than G200's.
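Rys's ">1.6 GHz hot clock" figure can be sanity-checked from the numbers in his list: the GTX 285 samples from 80 texture units at its 648 MHz core clock, while GF100's 64 samplers run at half the hot clock:

```python
# Hot clock GF100 needs for its 64 half-hot-clock samplers to match the
# GTX 285's 80 texture units at 648 MHz.

GTX285_UNITS, GTX285_MHZ = 80, 648
GF100_UNITS = 64

gtx285_peak = GTX285_UNITS * GTX285_MHZ  # Mtexels/s
# Solve 64 * (hot / 2) = gtx285_peak for the hot clock:
hot_needed = 2 * gtx285_peak / GF100_UNITS
print(f"GF100 matches a GTX 285 at a hot clock of {hot_needed:.0f} MHz")
```

That works out to 1620 MHz, which is exactly why a launch part below a ~1.6 GHz hot clock would not out-texture a GTX 285 on paper.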

    I am very impressed with some of the large architectural changes Nvidia made with GF100. They can easily scale this architecture up in the coming years and should see good performance scaling after small tweaks. The main problem, other than the huge complexity and manufacturing issues, is scaling this design down. It is doable, but with the same clock complexities I don't see the smaller chips hitting the much higher clocks that will be needed.

    I look forward to seeing what they can do on 28nm.
    Last edited by LordEC911; 01-18-2010 at 02:53 PM.
