Thread: GPU Benchmarking Methods Investigated: Fact vs. Fiction

  1. #1
    Xtreme Mentor
    Join Date
    Jul 2004
    Posts
    3,247

    GPU Benchmarking Methods Investigated: Fact vs. Fiction

    Benchmarks. Every website worth their marbles uses them to varying degrees of accuracy. Meanwhile, every reader wants to recreate them in some way, shape or form in order to do exactly what their favorite publications are doing: to evaluate the performance of their hardware choices and quantify their purchase. Benchmarks can also help diagnose a problem, but more often than not websites like Hardware Canucks use these tools to determine how well a given product performs against the competition. As with all things, the number of programs we can attain results with is nearly infinite, but it is the job of publications to choose the right set of tools which will accurately convey results to the masses. Unfortunately, as we will show you in this article, choosing the right programs and sequences is extremely hard and most of the current methods are inaccurate.

    We have chosen to focus on GPU benchmarking because this really is the Wild West of the online review industry. A fortune in terms of traffic can be had if GPU reviews are published regularly, but with potential traffic increases comes the risk of cutting corners in order to complete the time-consuming benchmarking portion as quickly as possible. Naturally, some time-cutting methods will still produce accurate results while others won’t.

    In a general canvassing of over two dozen English-speaking tech websites we found a wide swath of benchmarks being used: from timedemos to stand-alone programs to in-game benchmarks to walkthroughs. What we also saw at times was a general lack of information beyond a game’s title regarding the actual type of benchmark used. For the most part it seemed many websites were using in-game benchmarking tools (mostly “rolling” demos) instead of actual gameplay and coming up with some interesting results. This, along with comments in several forums, got us wondering: is there a “right” way to benchmark a particular game? In addition, do these in-game or stand-alone benchmarking programs (like the recently released AvP DX11 test) represent in-game performance? If not, do they even provide an accurate enough analysis for a writer to formulate a conclusion about a given product? Well, we’re about to find out.
    http://www.hardwarecanucks.com/forum...s-fiction.html

  2. #2
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    Good article!

    I've always preferred a gameplay run to a timedemo. The results are pretty consistent if the reviewer uses the same route and goes in guns blazing. In my experience, despite the 100+ fps Source engine benchmarks, I still get 20 fps minimums in some areas that are very heavy on the graphics card/CPU. It's a single-threaded game, so there's not much to be done about that, huh. Plus, we get to hear the reviewer's take on the actual gaming experience.

    EDIT: http://www.hardwarecanucks.com/forum...fiction-7.html

    L4D2 and FC2 are swapped.
    Last edited by blindbox; 06-15-2010 at 09:08 PM.

  3. #3
    Xtreme Addict
    Join Date
    Nov 2004
    Posts
    1,550
    Timedemos are pretty accurate.

    From QUAKE to Counter-Strike: Source to FEAR.

    It depends on the game.

  4. #4
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    DiRT 2

    DiRT 2 is on of the only games we have in this article which uses a benchmarking sequence depecting actual gameplay.
    I see a typo :p

    nice article

    All along the watchtower the watchmen watch the eternal return.

  5. #5
    Xtreme Member
    Join Date
    Aug 2008
    Location
    Australia
    Posts
    373
    Very cool article

    I've always liked HWC reviews; they have a nice, clean, straightforward layout when it comes to graphics and figures

  6. #6
    Xtreme Addict
    Join Date
    Feb 2008
    Location
    America's Finest City
    Posts
    2,078
    We never run gaming "benchmarks"; it's more effective and realistic to run the actual game on a certain level and take the statistics from that run rather than running a canned benchmark. The only reason we even run benchmarks that are usually considered synthetic is to let consumers compare quantitatively against other synthetic results, but they in no way accurately reflect real-world performance. My point of view is that real world performance results are more important than any 3DMark score when it comes to a product review.
    Quote Originally Posted by FUGGER View Post
    I am magical.

  7. #7
    Xtreme Mentor
    Join Date
    Apr 2003
    Location
    Ankara Turkey
    Posts
    2,631
    Quote Originally Posted by Russian View Post
    My point of view is that real world performance results are more important than any 3DMark score when it comes to a product review.


    I've always thought GPUs should get two reviews: one for normal customers, written by the normal reviewers, and an overclocking review for the OC community, written by hardcore overclockers.


    When i'm being paid i always do my job through.

  8. #8
    World Champion - IRONMODS
    Join Date
    Sep 2007
    Location
    Northern Japan
    Posts
    2,029
    What's wrong with reviews that cover both aspects?
    Quote Originally Posted by Massman
    My definition of 'efficient' is 'it does not suck monkeyballs'. Yes, I set bars low.
    The post counter is not an intelligence meter!

    MAX11L - "It's like a console...with the suck turned down and the awesome turned up" -tet5uo
    Heat Team IRONMODS

  9. #9
    Xtreme Mentor
    Join Date
    Apr 2003
    Location
    Ankara Turkey
    Posts
    2,631
    Quote Originally Posted by miahallen View Post
    What's wrong with reviews that cover both aspects?
    There's nothing wrong with them, but they aren't enough for overclockers. Most of us read them, but the final decision is always made after reading OC forums like this one.


    When i'm being paid i always do my job through.

  10. #10
    Xtreme Guru
    Join Date
    Aug 2007
    Posts
    3,562
    Quote Originally Posted by Russian View Post
    We never run gaming "benchmarks"; it's more effective and realistic to run the actual game on a certain level and take the statistics from that run rather than running a canned benchmark. The only reason we even run benchmarks that are usually considered synthetic is to let consumers compare quantitatively against other synthetic results, but they in no way accurately reflect real-world performance. My point of view is that real world performance results are more important than any 3DMark score when it comes to a product review.
    I agree. One of the main problems is consistency when it comes to actual gameplay versus some of the built-in / stand-alone rolling benchmarks. The vast majority of the time they do give reviewers a quick way to benchmark a game, but for the most part they are inaccurate.

    I have also seen talk (and I'm not sure where it stems from) that timedemos don't allow for accurate results. There weren't too many games to test this out with but from what I have seen, timedemos are actually an excellent way to benchmark.

    It all comes down to knowledge of a game. Just loading up a built-in benchmark without knowing whether it is accurate is a surefire way to come up with the wrong conclusion.
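
The consistency question can be put into numbers by repeating each method a few times and looking at the spread. This is only a minimal sketch: the function name and the FPS figures are hypothetical, not results from the article, and the coefficient of variation is just one reasonable way to express run-to-run spread.

```python
from statistics import mean, stdev


def run_consistency(avg_fps_per_run):
    """Return (mean FPS, coefficient of variation in percent) across repeated runs."""
    if len(avg_fps_per_run) < 2:
        raise ValueError("need at least two runs to measure spread")
    m = mean(avg_fps_per_run)
    cv_percent = stdev(avg_fps_per_run) / m * 100.0
    return m, cv_percent


# Hypothetical average-FPS figures from three passes of each method.
manual_runs = [58.0, 61.0, 55.0]    # hand-played run along the same route
timedemo_runs = [60.0, 60.5, 59.5]  # recorded timedemo played back

for label, runs in (("manual", manual_runs), ("timedemo", timedemo_runs)):
    m, cv = run_consistency(runs)
    print(f"{label}: mean {m:.1f} fps, spread {cv:.2f}%")
```

A low spread only says a method is repeatable. Whether it matches what the game actually does still has to be checked against real gameplay.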

  11. #11
    Xtreme Addict
    Join Date
    Nov 2007
    Location
    Illinois
    Posts
    2,095
    Whoa, doesn't that mean HardOCP have been doing it right all along????
    E7200 @ 3.4 ; 7870 GHz 2 GB
    Intel's atom is a terrible chip.

  12. #12
    Xtreme Cruncher
    Join Date
    Jul 2003
    Location
    Finland, Eura
    Posts
    1,744
    Excellent article


    http://mato78.com - Finnish PC Hardware news & reviews
    BulldogPO @ Twitter


  13. #13
    Xtreme Member
    Join Date
    May 2005
    Posts
    196
    Quote Originally Posted by SKYMTL View Post
    I agree. One of the main problems is consistency when it comes to actual gameplay versus some of the built-in / stand-alone rolling benchmarks. The vast majority of the time they do give reviewers a quick way to benchmark a game, but for the most part they are inaccurate.

    I have also seen talk (and I'm not sure where it stems from) that timedemos don't allow for accurate results. There weren't too many games to test this out with but from what I have seen, timedemos are actually an excellent way to benchmark.

    It all comes down to knowledge of a game. Just loading up a built-in benchmark without knowing whether it is accurate is a surefire way to come up with the wrong conclusion.
    It comes from about a decade ago, when NVIDIA and ATI would optimize their drivers to perform better in a specific timedemo. This was back when Quake 3 Arena was the standard bench for every review on every site. If you have a repeatable pattern then it will be exploited. This prompted Tech Report to start recording their own timedemos and [H] to do elaborate analysis.

    Overall most reviews are pretty bad and lack true scientific analysis but most people don't really understand or care.
    i5 750 @ 4.2ghz
    EVGA P55 FTW
    8gig G.Skill Ripjaw @ 1055mhz
    Gigabyte 6950 modded
    Seasonic X-650
    Antec P180 modded and watercooled
    Thermochill PA160
    Apogee XT
    MCP350

  14. #14
    Xtreme Enthusiast
    Join Date
    Feb 2009
    Posts
    800
    Quote Originally Posted by cegras View Post
    Whoa, doesn't that mean HardOCP have been doing it right all along????
    I don't like how they prioritize resolution over AA. Not everyone here has a 2560x1600 monitor. HardOCP might have been using real gameplay, and that's nice, but I still don't like their best-settings methodology.

  15. #15
    Xtreme Member
    Join Date
    May 2010
    Location
    Goldsboro NC
    Posts
    111
    Excellent article.

  16. #16
    Xtreme Member
    Join Date
    May 2005
    Posts
    193
    Conclusion: the 5850 is still the king price/perf, perf/watt, heat & noise.

  17. #17
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    I'm pretty sure older cards are still the better price/perf kings, considering a 4850 is $100, nearly 1/3 the price, but I doubt it's 3x slower
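
The break-even point in that comparison can be worked out directly. This is a minimal sketch, assuming a $100 card against one at roughly $300 (taken from the "nearly 1/3 the price" figure above). The FPS numbers are made up purely for illustration and are not measurements of any card.

```python
def perf_per_dollar(avg_fps, price_usd):
    """Frames per second delivered per dollar spent."""
    return avg_fps / price_usd


def break_even_speedup(price_expensive, price_cheap):
    """How many times faster the expensive card must be to match FPS per dollar."""
    return price_expensive / price_cheap


cheap_price, expensive_price = 100.0, 300.0  # assumed prices in USD
cheap_fps, expensive_fps = 30.0, 60.0        # hypothetical results, not measured

print(break_even_speedup(expensive_price, cheap_price))  # speedup needed to tie
print(perf_per_dollar(cheap_fps, cheap_price))
print(perf_per_dollar(expensive_fps, expensive_price))
```

With these assumed numbers the pricier card is twice as fast but would need to be three times as fast to tie on FPS per dollar, so the cheaper card comes out ahead on that metric alone.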

  18. #18
    Xtreme Addict
    Join Date
    Mar 2009
    Posts
    1,116
    thank you SKYMTL for trying to introduce reason into this situation.

    it is clear that some software doesn't produce interesting results. interesting results? results that reflect what people buy the hardware for: game performance, computation performance, ... but showing off with benchmarks doesn't really count. people only care about showing off benchmarks because they think they represent the games. they are devalued if they don't represent games.

    here is a sample of the not-interesting results:
    "While the “Dark Tower” benchmark does tend to almost come close to the overall results from our in-game testing, it still doesn’t represent real-world performance"
    "In the game itself, there is a wide gap between the NVIDIA and ATI cards while the benchmarks result in very similar performance between the two solutions."

    and of course just running through the game manually introduces error. but the timedemos were interesting, and you said this:
    "One of the main issues we have with timedemos is that so few games actually support them."

    so clearly it is a minefield of USELESS and USEFUL software. remember "just because you CAN, doesn't mean you SHOULD". that goes for benchmarks.

    someone with a brain needs to filter out the useless software. so they can review hardware well.

    SKYMTL to the rescue:

    However, there are currently a small number of games like DiRT 2 and HawX which incorporate benchmark sequences that accurately recreate in-game scenarios.

    For the most part we have seen accurate results when timedemos are compared to in-game sequences

    it is imperative publications state exactly what tools they are using for their benchmarks.

    When we choose to add a game to our stable of benchmarkable titles, the first thing that’s done is a full play-through in order to determine what typical performance is like. With the help of FRAPS, a “worst case” scenario is derived from this playthrough which we will use for the basis of our benchmark run or to help determine if a built-in benchmark holds any foundation in reality.

    you will find videos of our benchmark sequences (if applicable) as well as save game files, timedemos, graphics setting screenshots and even guides to show you how to set everything up.
    thank you SKYMTL for trying to innovate and produce quality work. I wish you luck in the future as you continue to iterate through improvements of your benchmark methodology.
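
The "worst case" selection quoted above can be sketched as a sliding-window search over a playthrough log. This is only an illustration of the idea, not Hardware Canucks' actual procedure: it assumes the log has already been reduced to a list of per-second FPS samples, and the function name and sample values are hypothetical.

```python
def worst_window(fps_samples, window):
    """Return (start index, average FPS) of the lowest-averaging run of `window` samples."""
    if window <= 0 or window > len(fps_samples):
        raise ValueError("window must be between 1 and the number of samples")
    window_sum = sum(fps_samples[:window])
    worst_start, worst_sum = 0, window_sum
    for start in range(1, len(fps_samples) - window + 1):
        # Slide the window one sample: add the new sample, drop the oldest one.
        window_sum += fps_samples[start + window - 1] - fps_samples[start - 1]
        if window_sum < worst_sum:
            worst_start, worst_sum = start, window_sum
    return worst_start, worst_sum / window


# Hypothetical per-second FPS samples from a playthrough.
playthrough = [60, 58, 40, 35, 38, 55, 62]
start, avg = worst_window(playthrough, window=3)
print(f"heaviest 3-second stretch starts at second {start}, averaging {avg:.1f} fps")
```

The stretch this picks out is a candidate for the repeatable benchmark run, and its average is a reference point for judging whether a built-in benchmark lands anywhere near real gameplay.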

  19. #19
    Xtreme Mentor
    Join Date
    Jun 2008
    Location
    France - Bx
    Posts
    2,601
    Thanks Mike ! Really good work

  20. #20
    Xtreme Member
    Join Date
    Jan 2007
    Location
    Lancaster, UK
    Posts
    473
    Well done to Hardware Canucks for investing all this time and producing a very well written and interesting article. Cheers guys! Would rep you if we had such a system
    CPU: Intel 2500k (4.8ghz)
    Mobo: Asus P8P67 PRO
    GPU: HIS 6950 flashed to Asus 6970 (1000/1400) under water
    Sound: Corsair SP2500 with X-Fi
    Storage: Intel X-25M g2 160GB + 1x1TB f1
    Case: Silverstone Raven RV02
    PSU: Corsair HX850
    Cooling: Custom loop: EK Supreme HF, EK 6970
    Screens: BenQ XL2410T 120hz


    Help for Heroes

  21. #21
    Xtreme Addict
    Join Date
    Aug 2007
    Location
    Toon
    Posts
    1,570
    Quote Originally Posted by kromosto View Post


    I've always thought GPUs should get two reviews: one for normal customers, written by the normal reviewers, and an overclocking review for the OC community, written by hardcore overclockers.
    Absolutely, same goes for games. I like the performance review Guru3D was doing for games (hint, hint do some more) as an alternative to actually reviewing a game.

    This review of reviewing methods is important, as there are regular disagreements between some sites and others, and only those who know what to look for will see the difference.
    Intel i7 920 C0 @ 3.67GHz
    ASUS 6T Deluxe
    Powercolor 7970 @ 1050/1475
    12GB GSkill Ripjaws
    Antec 850W TruePower Quattro
    50" Full HD PDP
    Red Cosmos 1000

  22. #22
    Xtreme Addict
    Join Date
    Jul 2009
    Posts
    1,023
    I prefer timedemos because of their accuracy, but looking at the results, on most of them the benchmark is lower than what you get in gameplay, which could mislead people into thinking that a game might not play on a certain gfx card
    i7 920 @ 4GHz 1.25v
    GTX 470 @ 859MHz 1062mv

  23. #23
    Xtreme Member
    Join Date
    Oct 2007
    Posts
    407
    In theory, I like Kyle's methods too, but these days few games can challenge the high end cards enough to drop down below 30" monitor resolutions. So they are mainly useful to owners of 30" monitors. And that's the problem with that method. It is too dependent on what size monitor you happen to have. My monitor is 1600x1200 and that res almost never makes it into benchmarks these days.

  24. #24
    Xtreme Member
    Join Date
    Dec 2009
    Posts
    435
    Quote Originally Posted by cegras View Post
    Whoa, doesn't that mean HardOCP have been doing it right all along????
    No, because they compare cards using different in-game settings.
    i7 920 D0 / Asus Rampage II Gene / PNY GTX480 / 3x 2GB Mushkin Redline DDR3 1600 / WD RE3 1TB / Corsair HX650 / Windows 7 64-bit

  25. #25
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    The ironic thing here is that this article is perpetuating one of the most common benchmarking mistakes of today: providing minimum frame rates without qualifying them. Minimum FPS by itself is worthless, since for all you know it could be for a single frame at the start of the level, or conversely that card might be hitting that minimum frame rate all of the time. As another example, if one card hits a very low minimum frame rate once for a very short period, and another card hits a higher minimum frame rate but goes there more often, it's the first card with the lower min FPS that is providing the better gameplay experience. If you want to provide minimum frame rates, you MUST qualify them with a graph of FPS over time, or at the very least a description of the gameplay. Unfortunately this poor methodology is very widespread.
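
One way to qualify a minimum without a full graph is to report how often and for how long a card sits below a playability threshold. This is a rough sketch, assuming per-second FPS samples. The function name, the 30 fps threshold, and both sample sets are hypothetical and chosen only to mirror the two-card example above.

```python
def qualify_minimum(fps_samples, threshold):
    """Summarize how low and how often a run dips, not just its single lowest value."""
    if not fps_samples:
        raise ValueError("need at least one FPS sample")
    below = [fps < threshold for fps in fps_samples]
    longest = current = 0
    for is_below in below:
        # Track the longest unbroken streak of samples under the threshold.
        current = current + 1 if is_below else 0
        longest = max(longest, current)
    return {
        "min_fps": min(fps_samples),
        "share_below": sum(below) / len(fps_samples),
        "longest_dip_samples": longest,
    }


# Hypothetical per-second samples for two cards on the same run.
card_a = [60, 60, 18, 60, 60, 60, 60, 60]  # lower minimum, one brief dip
card_b = [60, 25, 26, 25, 60, 27, 25, 60]  # higher minimum, frequent dips

print(qualify_minimum(card_a, threshold=30))
print(qualify_minimum(card_b, threshold=30))
```

With these made-up samples, the first card has the lower minimum but spends far less of the run under the threshold, which is exactly the information a bare minimum figure hides.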
