Page 3 of 7 FirstFirst 123456 ... LastLast
Results 51 to 75 of 157

Thread: 5870 Bottleneck Investigation (CPU and/or Memory Bandwidth)

  1. #51
    Xtreme Addict
    Join Date
    Mar 2007
    Posts
    1,489
    Quote Originally Posted by jaredpace View Post
    crysis likes cpu as much as gpu
    Yeah, I'm going to try to do some other titles.

    Quote Originally Posted by Gurr View Post
    Not sure why you're less than excited. You're 600 mhz below him and only 5-10 FPS behind him in possibly the most intense game (graphically) there is. Still.

    Nice results, are you 2 going to be doing any other tests together?
    Less than excited because I'm a hardware whore, and I don't need any excuses, no matter how weak, to buy new hardware

    Considering our min FPS are still pretty much the same, we'd both get roughly the same gameplay experience, so I still love my little-engine-that-could Phenom II

    The OP of that thread (the guy I'm benching against) seems to have disappeared, so atm no, but I'm hoping to round up someone else with a 5870 and heavily OC'ed i7
    Asus G73- i7-740QM, Mobility 5870, 6Gb DDR3-1333, OCZ Vertex II 90Gb

  2. #52
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    I can't believe you guys think it's really only 320 shaders.
    HAVE NO FEAR!
    "AMD fallen angel"
    Quote Originally Posted by Gamekiller View Post
    You didn't get the memo? 1 hour 'Fugger time' is equal to 12 hours of regular time.

  3. #53
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by demonkevy666 View Post
    I can't believe you guys think it's really only 320 shaders.
    it IS 320 shaders. there are 5 alu's per shader. they are called stream processors. the 320 shaders are then grouped into 20 thread clusters.

  4. #54
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by Chumbucket843 View Post
    it IS 320 shaders. there are 5 alu's per shader. they are called stream processors. the 320 shaders are then grouped into 20 thread clusters.
    are actually 320 rather complex 5-stage computing subunits

    which means that's the textures stream processors other stream processors are simple or general.

    The general principle of the computing section has not changed much in the RV870. It is still based on shader processors with superscalar design, each processor incorporating five ALUs four of which are general-purpose ALUs and the fifth is a special-purpose ALU capable of executing complex instructions like SIN, COS, LOG, EXP, etc. Besides the ALUs, each shader processor also contains a branch control unit and an array of general-purpose registers.
    When we are talking about 1600 stream processors in the RV870, we must keep it in mind that there are actually 320 rather complex 5-stage computing subunits. Provided sufficient code optimization, this design of the GPU’s computing section helps achieve a much higher level of performance than with Nvidia’s scalar architecture.
    which is why nvidia has a separate shader clock, while ati's shader clock is locked to the core speed.

    cpu is the bottleneck lol
    Last edited by demonkevy666; 09-27-2009 at 06:25 PM.

  5. #55
    Xtreme Member
    Join Date
    Dec 2008
    Location
    Raleigh, NC
    Posts
    318
    I wish I could break the 3.8Ghz barrier to release some more potential outta these guys. Damn C0's.

  6. #56
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by hurleybird View Post
    First of all, that's just the number of compute threads possible, and has nothing to do with limiting the shader power in games. Secondly, the 5870 can only run a maximum of 1600 / 5 = 320 threads, because you can only run one thread per group of 5 "stream processors", so in terms of DirectCompute even TriFire 5870's will not be held back by the thread limit, although QuadFire might be.
    I haven't found any reference for this being true. Nowhere does it say you can only enable 1 of 5 parts of the SIMD.


  7. #57
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by demonkevy666 View Post
    are actually 320 rather complex 5-stage computing subunits

    which means that's the textures stream processors other stream processors are simple or general.

    cpu is the bottleneck lol
    i dont see what you are disagreeing with. lets just stick with ALU because thats what they are. ATi can call them whatever they want. the 5th alu can also handle double precision btw.


    realistically they say they will get more performance than nvidia but they dont tell you that their gpu's are heavily dependent on compilers to effectively keep the gpu under full load. not saying a gt200 is faster than rv870 though.

  8. #58
    Xtreme Addict
    Join Date
    May 2007
    Posts
    2,125
    It'll be interesting to see if the DX11 changes affected performance

    Obviously, it's hard to say since DX11 supposedly gives inherent performance boosts over DX10/10.1, but given that DX11 had some changes that might affect how the compiler works and how things are executed, im wondering if its possible one setup is bottlenecked

    From what I've seen/heard too, the CPU power needed to keep up with one of these things might be a factor as well

  9. #59
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    4.2 isnt really heavily oced
    thats kinda what all 920s max out at... heavily oced means 4.4-4.6, which IS possible with good chips on water or even good air
    im quite surprised it needs that much cpu oomph...

    especially at 1920x1080.... strange...

  10. #60
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    Quote Originally Posted by hurleybird View Post
    First of all, that's just the number of compute threads possible, and has nothing to do with limiting the shader power in games. Secondly, the 5870 can only run a maximum of 1600 / 5 = 320 threads, because you can only run one thread per group of 5 "stream processors", so in terms of DirectCompute even TriFire 5870's will not be held back by the thread limit, although QuadFire might be.
    ahhhhhh, alright, thanks for clearing that up man!

    Quote Originally Posted by Chumbucket843 View Post
    they doubled registers and cache so internally it is fine.
    well thats what they say... could still be limited internally somehow...
    does anybody know how to disable parts of the gpu or get the driver to not use them like w1zz did in his 5870 review to simulate a 5850?

    Quote Originally Posted by demonkevy666 View Post
    cpu is the bottleneck lol
    i dont think so... hwcanucks benched with an i7 at 4g or even 4.2 iirc...
    and their numbers arent all that different from other reviews...

    nascasho, only 3.8? is your cpu multiplier dropping to default multiplier under load? you're probably limited by max tdp/tdc, ask evga how to disable or manipulate current feedback sensing of your cpu on that board either by bios or hardmod and you should get 4G+

    its usually a single small resistor that needs to be removed or pencilled to adjust the resistance and as a result your cpu will think current draw is low and wont throttle.

  11. #61
    Xtreme Addict
    Join Date
    Mar 2007
    Posts
    1,489
    Quote Originally Posted by saaya View Post
    4.2 isnt really heavily oced
    thats kinda what all 920s max out at... heavily oced means 4.4-4.6, which IS possible with good chips on water or even good air
    im quite surprised it needs that much cpu oomph...

    especially at 1920x1080.... strange...
    Yeah, I just meant heavily OC'ed for 24/7 on air, he does have a 965 though. I'm more used to pre-D0 OC's too

    I was pretty surprised as well with the results



    He just posted a bench of downclocked i7, avg stayed about the same, but min fps took a BIG hit... worse than my 720BE now


    i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

    i7 @ 3.2Ghz Min: 15.93 Max: 37.27 Avg: 28.46
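To put the two runs side by side, here is a minimal sketch that turns the quoted figures into percentage drops. The numbers are the ones posted above; the function and variable names are just for illustration.

```python
# Crysis benchmark figures quoted in the post above (frames per second).
runs = {
    "i7 @ 4.2GHz": {"min": 21.29, "max": 37.74, "avg": 28.65},
    "i7 @ 3.2GHz": {"min": 15.93, "max": 37.27, "avg": 28.46},
}


def percent_drop(fast: float, slow: float) -> float:
    """Relative drop from `fast` to `slow`, in percent of `fast`."""
    return (fast - slow) / fast * 100.0


fast, slow = runs["i7 @ 4.2GHz"], runs["i7 @ 3.2GHz"]
drops = {m: percent_drop(fast[m], slow[m]) for m in ("min", "max", "avg")}
clock_drop = percent_drop(4.2, 3.2)

print(f"clock: {clock_drop:.1f}% lower")
for metric, drop in drops.items():
    print(f"{metric} fps: {drop:.1f}% lower")
```

Read this way, the min fps falls by roughly the same fraction as the clock, while max and avg barely move.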
    Asus G73- i7-740QM, Mobility 5870, 6Gb DDR3-1333, OCZ Vertex II 90Gb

  12. #62
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    Quote Originally Posted by demonkevy666 View Post
    I haven't found any reference for this being true. Nowhere does it say you can only enable 1 of 5 parts of the SIMD.

    Not what I was saying. What I was saying is that one thread is assigned to each group of five ALUs, unlike Nvidia's architecture where each ALU gets its own thread. Because of this RV870 can only run a maximum of 1600 / 5 = 320 threads, which is way below the 1024-thread limit of DirectCompute11.
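The arithmetic behind that argument, as a rough sketch. It only reproduces the thread-per-group model and the 1024 figure as stated in this thread; it is not a claim about how the hardware or DirectCompute actually schedules work, and the constant names are made up for the example.

```python
# Figures as stated in the posts above; treat them as the poster's model.
STREAM_PROCESSORS = 1600          # ALUs advertised for one 5870
ALUS_PER_GROUP = 5                # one thread per group of five ALUs
DIRECTCOMPUTE_THREAD_LIMIT = 1024 # limit quoted in the thread

threads_per_card = STREAM_PROCESSORS // ALUS_PER_GROUP
share_of_limit = threads_per_card / DIRECTCOMPUTE_THREAD_LIMIT * 100

# Smallest number of cards whose combined threads exceed the limit
# (integer ceiling division, no floats involved).
cards_to_exceed = -(-DIRECTCOMPUTE_THREAD_LIMIT // threads_per_card)

print(f"threads per card: {threads_per_card}")
print(f"share of limit:   {share_of_limit:.2f}%")
print(f"cards to exceed:  {cards_to_exceed}")
```

Under this model three cards stay below the limit (960 threads) and a fourth goes over it, which matches the TriFire/QuadFire remark quoted earlier.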

  13. #63
    Xtreme Enthusiast
    Join Date
    Aug 2002
    Location
    London,Uk
    Posts
    950
    Quote Originally Posted by iandh View Post
    Yeah, I just meant heavily OC'ed for 24/7 on air, he does have a 965 though. I'm more used to pre-D0 OC's too

    I was pretty surprised as well with the results



    He just posted a bench of downclocked i7, avg stayed about the same, but min fps took a BIG hit... worse than my 720BE now


    i7 @ 4.2Ghz Min: 21.29 Max: 37.74 Avg: 28.65

    i7 @ 3.2Ghz Min: 15.93 Max: 37.27 Avg: 28.46
    Errm, as his numbers show, the minimum is the cpu limited situation and the max is gpu limited, not the other way around. His numbers prove that.

    The phenom with 4 cores should give almost identical results in most games to an i7, 99% of games aren't cpu limited. The biggest difference between your results is the max, not the minimum, meaning there's just something very different with his rig to yours: forcing AF, aa, or something, different drivers, something.

    You can't remotely investigate cpu limits on two different rigs set up differently by two people.

    As he's shown by dropping cpu speed, thats how you test cpu limits, identical system and setup, with a diff cpu speed. Which shows completely reversed results and conclusions compared to your initial "testing".


    Either way, benchmarks which aren't a great reflection of in game experience aren't very useful for testing the real limits behind a card. Most benchmarks are designed as just that, not necessarily to give you an impression of performance you'll receive in game, but something for sites to use for years to come.

    Most other games, and most in game testing should have minimum fps as gpu limited numbers, but the Crysis benchmark benches cpu just as much as gpu and the lowest numbers are normally the very heaviest physics sections.

    But while the minimum has changed drastically, the average results showed, what, a 0.2fps difference, meaning that the "minimum fps" result is rare enough that it barely registers in the benchmark. The average shows, for that benchmark at least, that the 1Ghz difference in CPU speed made a completely unnoticeable difference in average and max fps. It killed the min fps, but the min fps probably lasted for around a quarter of a second anyway, so it's pretty much irrelevant.
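A back-of-the-envelope sketch of why a short dip hardly moves the average. The 60-second run length is a made-up assumption (the benchmark's real length isn't given here), and the quarter-second dip is the guess from the paragraph above; only the two fps figures come from the posted results.

```python
def average_fps_with_dip(steady_fps: float, dip_fps: float,
                         dip_seconds: float, run_seconds: float) -> float:
    """Average fps over a run that holds steady_fps except for one dip."""
    frames = steady_fps * (run_seconds - dip_seconds) + dip_fps * dip_seconds
    return frames / run_seconds


RUN_SECONDS = 60.0   # hypothetical benchmark length
DIP_SECONDS = 0.25   # "around a quarter of a second"

steady = 28.65       # avg fps of the 4.2Ghz run
dip = 15.93          # min fps of the 3.2Ghz run

with_dip = average_fps_with_dip(steady, dip, DIP_SECONDS, RUN_SECONDS)
loss = steady - with_dip

print(f"average with dip: {with_dip:.3f} fps (loss {loss:.3f} fps)")
```

With those assumptions the dip costs only a few hundredths of a frame per second on the average, so a big change in min fps and a near-identical avg are not contradictory.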

    If you can draw any conclusions, it's that your rig has a major problem that's severely affecting performance. The Crysis benchmark isn't remotely CPU limited by a 3.2Ghz i7, moving to a 4.2Ghz i7 makes no noticeable difference, and there's no point testing further till you sort out your rig. Frankly, you should be getting very similar average fps with the same card.
    Mail Me | 3500+ , dfi sli-dr, g-skill la, 2x6800gt, 600w pcz, stacker case, air cooled

  14. #64
    Xtreme Member
    Join Date
    Oct 2007
    Posts
    107
    http://img9.imageshack.us/img9/8168/...erclocking.jpg

    Definitely diminishing returns on the memory; if all you do is pump the memory the gains are negligible. But core & mem incremental increases seem to have a real benefit. I ran each benchmark twice (each is 3 loops long) and results never differed by more than 0.2 fps (which makes the final result even more meaningless).

    I was gonna simply have the image display but it was too damn big. Link instead.

    Summary:

    Crysis VH @ 1080p

    850 core /1200 memory 37.2fps

    890 core /1220 memory 38.34fps

    900 core / 1230 memory 39.06fps

    900 core / 1300 memory 39.33fps
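For anyone who wants the scaling in percent, a small sketch using only the four results listed above, with the 850/1200 run as baseline. The `Run` class and helper names are just for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    core_mhz: int
    mem_mhz: int
    fps: float


# Crysis VH @ 1080p results from the summary above.
runs = [
    Run(850, 1200, 37.2),
    Run(890, 1220, 38.34),
    Run(900, 1230, 39.06),
    Run(900, 1300, 39.33),
]


def gain(new: float, base: float) -> float:
    """Percentage increase of `new` over `base`."""
    return (new / base - 1.0) * 100.0


base = runs[0]
# (core gain %, memory gain %, fps gain %) for each overclocked run.
gains = [
    (gain(r.core_mhz, base.core_mhz), gain(r.mem_mhz, base.mem_mhz), gain(r.fps, base.fps))
    for r in runs[1:]
]
# Last step only: memory 1230 -> 1300 with the core held at 900.
mem_only_fps_gain = gain(runs[3].fps, runs[2].fps)

for r, (core, mem, fps) in zip(runs[1:], gains):
    print(f"{r.core_mhz}/{r.mem_mhz}: core +{core:.1f}%, mem +{mem:.1f}%, fps +{fps:.1f}%")
print(f"memory-only step: fps +{mem_only_fps_gain:.2f}%")
```

The last line isolates the memory-only bump, which is where the diminishing returns show up.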
    Last edited by astrallite; 09-28-2009 at 02:18 AM.

  15. #65
    Xtreme Member
    Join Date
    Oct 2007
    Posts
    107
    With Crysis the minimum typically happens in area to area, or area to cutscene transitions, and definitely a faster CPU can help in these instances. Crysis has a long view distance, but most far objects are not rendered. When you quickly move into a new area, there's a sudden strain on the CPU to process a ton of new objects. I definitely found my min. framerates go up a bit moving from 3.2ghz to 3.6ghz on an i7.

    Check out PCGH's GTX280 CPU scaling review. On Crysis, minimum framerates scale linearly with cpu clock speed, from 2.4ghz to 3.6ghz.
    Last edited by astrallite; 09-28-2009 at 02:26 AM.

  16. #66
    Banned
    Join Date
    Oct 2006
    Posts
    963
    just out of interest i'll run this bench for you...

    i can see i'm gonna have to put some effort into oc'ing the cpu above 4.1 for this...

    i'll be back in england and in front of my pc by tomorrow evening hopefully... expect an update then...

    and thanks for an interesting thread...

  17. #67
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Uh, oh, someone merge this thread with http://www.xtremesystems.org/forums/...d.php?t=235181, thanks!
    Quote Originally Posted by jayhall0315 View Post
    If you are really extreme, you never let informed facts or the scientific method hold you back from your journey to the wrong answer.

  18. #68
    I am Xtreme
    Join Date
    Dec 2008
    Location
    France
    Posts
    9,060
    Quote Originally Posted by nascasho View Post
    I wish I could break the 3.8Ghz barrier to release some more potential outta these guys. Damn C0's.
    What's your vCore for 3,8?

  19. #69
    Royal Administrator
    Join Date
    Jul 2005
    Location
    New York City
    Posts
    3,434
    Quote Originally Posted by zalbard View Post
    Uh, oh, someone merge this thread with http://www.xtremesystems.org/forums/...d.php?t=235181, thanks!
    Done.

  20. #70
    Xtreme Member
    Join Date
    Nov 2002
    Posts
    253
    Quote Originally Posted by astrallite View Post
    http://img9.imageshack.us/img9/8168/...erclocking.jpg

    Summary:

    Crysis VH @ 1080p

    850 core /1200 memory 37.2fps

    890 core /1220 memory 38.34fps

    900 core / 1230 memory 39.06fps

    900 core / 1300 memory 39.33fps
    Interesting, so a 5-6% gain from a slight overclock! Given the game you used, that seems impressive to me.
    EVGA X58 Classified 760
    W3520 @ 4410 HT 24/7
    Lapped TRUE 2x 110 CFM Scythe p-p
    6Gb Corsair XMS3 1600 7-7-7-20-1
    XFX 5870 1Gb
    PC Power and Cooling 750 Crossfire edition
    CoolerMaster 590

    XtremeSystems: Welcome to the Prommie Land !

  21. #71
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,463
    Bring... bring the amber lamps.

  22. #72
    Xtreme Member
    Join Date
    Jun 2005
    Location
    MA, USA
    Posts
    146
    It seems like the memory system complements the GPU well - neither appears to be a complete bottleneck at the speeds tested.
    | Cooler Master 690 II Advanced | Corsair 620HX | Core i5-2500K @ 5.0GHz | Gigabyte Z68XP-UD4 | 2x4096MB G.Skill Sniper DDR3-2133 @ 2134MHz 10-11-10-30 @ 1.55V | 160GB Intel X-25 G2 | 2x 2TB Samsung EcoGreen F4 in RAID 1 | Gigabyte HD 7970 @ 1340MHz/1775MHz | Dell 30" 3007WFP-HC | H2O - XSPC RayStorm and Swiftech MCW82 on an MCP350 + XSPC Acrylic Top, XSPC RX240 and Swiftech MCR220 radiators.

  23. #73
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,463
    When they increased the memory by 100mhz they got the same boost as when they increased the core by 65mhz.

  24. #74
    Registered User
    Join Date
    Nov 2008
    Posts
    16
    This could mean nothing but I tested with 4870 1gb in COD4 at 750/530 (same compute resource - bandwidth ratio as 5870) and at 750/900 to see the impact of memory bandwidth. Settings: 1440x900, All options max, 4xAA, AF max

    750core 530mem


    750core 900mem


    crysis being much more shader intensive could mean memory overclocks having less effect on overall performance.
    Last edited by JimmyH; 09-28-2009 at 07:22 AM.

  25. #75
    Xtreme Mentor
    Join Date
    May 2008
    Location
    cleveland ohio
    Posts
    2,879
    Quote Originally Posted by hurleybird View Post
    Not what I was saying. What I was saying is that one thread is assigned to each group of five ALUs, unlike Nvidia's architecture where each ALU gets its own thread. Because of this RV870 can only run a maximum of 1600 / 5 = 320 threads, which is way below the 1024-thread limit of DirectCompute11.
    so you're saying the bottleneck is the thread dispatch, and it's only being used at about 31.25%? what if it were redesigned to use all 1024 dispatched threads at once and not have those 5 alu's grouped? 5 alu's is one SIMD. changing this to be all separate alu's shouldn't be too hard, the alu's themselves are quite small already.

    it seems to me it's more that programmers will go for whatever is easy, to cut the time it takes to code things.
    easy isn't the best possible way to do things.
