Page 2 of 3
Results 26 to 50 of 72

Thread: GPU Benchmarking Methods Investigated: Fact vs. Fiction

  1. #26
    Xtreme Addict
    Join Date
    Feb 2008
    Location
    America's Finest City
    Posts
    2,078
    Quote Originally Posted by cegras
    Whoa, doesn't that mean HardOCP have been doing it right all along????
    Kinda, except that in their reviews they compare different cards with different settings enabled, and they usually comment on how the game's image quality differs with or without those settings. In many cases the faster card will have higher AA settings enabled, and they usually try to peg the FPS of the cards as close to each other as possible. I'd rather just crank the settings identically on both cards and compare the performance deltas than try to quantify the differences between AA settings.
    Quote Originally Posted by FUGGER
    I am magical.

  2. #27
    Xtreme Member
    Join Date
    Sep 2007
    Posts
    382
    Quote Originally Posted by hurleybird
    The ironic thing here is that this article is perpetuating one of the most common benchmarking mistakes of today: providing minimum frame rates without qualifying them. Minimum FPS by itself is worthless, since for all you know it could be a single frame at the start of the level, or conversely that card might be hitting that minimum frame rate all of the time. For example, if one card hits a very low minimum frame rate once for a very short period, and another card hits a higher minimum frame rate but hits it more often, it's the first card, with the lower min FPS, that is providing the better gameplay experience. If you want to provide minimum frame rates, you MUST qualify them with a graph of FPS over time, or at the very least a description of the gameplay. Unfortunately this poor methodology is very widespread.
    That's why you have the average frame rate.
    The closer the min is to the average, the more often that card hits the low end. I find the min to be fairly useful in determining which cards will deliver smoother gameplay. It's not accurate all the time, but it's good for when you want to breeze through the pages.
    my mini-fridge
    MoBo: GA-EP45-UD3P | CPU: Q9550 3.6GHz @ 1.216v | RAM: 4x1GB 900MHz @ 5-5-5-15 | GPU: GTX 460 900G/1800S/4400M | PSU: Corsair 750TX | HDDs: Seagate 320GB + 500GB, Samsung 1TB | Case: Antec P180

  3. #28
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by Russian
    Kinda, except that in their reviews they compare different cards with different settings enabled, and they usually comment on how the game's image quality differs with or without those settings. In many cases the faster card will have higher AA settings enabled, and they usually try to peg the FPS of the cards as close to each other as possible. I'd rather just crank the settings identically on both cards and compare the performance deltas than try to quantify the differences between AA settings.
    Then go to every other site out there; let them be unique and have a niche.

    There are many ways to do things, and many of them don't have a right or wrong answer. Testing cards at their best playable settings is really useful, since I don't care whether a 480 or a 5870 gets 200 or 250 FPS; that's not the setting I'm going to use.

    But you have to be careful. If they turn on super-high-res textures that kill 512MB cards, then when you see one card running 0xAA and 0xAF while the other runs 4xAA and 16xAF, you'd think one is 10x stronger. Drop it down to just high-res textures, and both can do 4xAA/16xAF; the battle is much closer, and you only have to decide whether you want the ultra textures or not, instead of coming away with the impression that one card is crap.

    So what I like in a review:
    Especially for new architectures, test how the card is affected by increases in AA, AF, texture quality, LOD, and resolution, and maybe a few miscellaneous effects like water or shadows, though not primarily.
    Then, using that data, you can see how it handles each major setting, determine which one quickly becomes a bottleneck, and justify your best playable settings. Some people must be able to game at 1080p; others don't have such a high-res display and must have 4xAA minimum. There is always some give and take; the 2900 was a great example, where resolution wasn't a problem but AA was.
    Then do the standard best-playable-settings comparison,
    do a decent overclocking comparison,
    then the usual: power consumption, temps, overclocking.
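The scaling tests described above (turn up one setting at a time and see which one hurts most) can be sketched as follows. This is a minimal sketch, not any site's actual tooling: `setting_costs` is a hypothetical helper, and the setting names and FPS numbers are placeholders, not measured results.

```python
def setting_costs(baseline_fps: float, fps_by_setting: dict[str, float]) -> list[tuple[str, float]]:
    """Return (setting, percent FPS lost vs. the baseline run), largest loss first.

    Each entry in fps_by_setting is assumed to be a run where only that one
    setting was raised above the baseline configuration.
    """
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    costs = [
        (name, 100.0 * (baseline_fps - fps) / baseline_fps)
        for name, fps in fps_by_setting.items()
    ]
    # The setting at the top of the list is the first bottleneck to watch.
    return sorted(costs, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    baseline = 80.0
    runs = {"4xAA": 60.0, "16xAF": 76.0, "ultra textures": 52.0, "2560x1600": 48.0}
    for name, loss in setting_costs(baseline, runs):
        print(f"{name:>15}: -{loss:.1f}%")
```

Sorting by percentage loss, rather than raw FPS, keeps cards of different speeds comparable when deciding which setting to give up first.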

  4. #29
    Xtreme Guru
    Join Date
    Aug 2007
    Posts
    3,562
    Quote Originally Posted by cegras
    Whoa, doesn't that mean HardOCP have been doing it right all along????
    Like I said in the conclusion: there is no "right" way to go about it. Kyle's team has a unique perspective, and it sets them (and sometimes their conclusions) apart from the norm. They do seem to use in-game sequences and research things properly, which is a huge step in the right direction. However, I am not actually sure whether they include action sequences in their benchmarks or just do a run-through.

    What the article is really meant to convey is that the vast majority of standard benchmarking methods (built-in benchmarks & stand-alone) are dead wrong. This is why readers should push for a transparent benchmarking process, with disclosure of exactly which methods were used.

    Quote Originally Posted by hurleybird
    The ironic thing here is that this article is perpetuating one of the most common benchmarking mistakes of today: providing minimum frame rates without qualifying them. Minimum FPS by itself is worthless, since for all you know it could be a single frame at the start of the level, or conversely that card might be hitting that minimum frame rate all of the time. For example, if one card hits a very low minimum frame rate once for a very short period, and another card hits a higher minimum frame rate but hits it more often, it's the first card, with the lower min FPS, that is providing the better gameplay experience. If you want to provide minimum frame rates, you MUST qualify them with a graph of FPS over time, or at the very least a description of the gameplay. Unfortunately this poor methodology is very widespread.
    I can't speak for other sites, but the fact that we test every run three times and average out the results eliminates any "zingers" when it comes to minimum framerates. Average FPS are also there for a reason.

  5. #30
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by ElSel10
    No, because they compare cards using different in-game settings.
    This.

  6. #31
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    Quote Originally Posted by Aleki
    That's why you have the average frame rate.
    The closer the min is to the average, the more often that card hits the low end.
    Not necessarily. If one card has a memory deficiency, it could hit very low minimums far more often than another card that hits higher minimums less often. You can't tell for sure without looking at a graph of FPS over time. Again, the minimum by itself is vague and misleading.
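The point that min and average alone can't tell you how often a card sits at the low end can be checked with a toy example. This is a minimal sketch: `summarize` is a hypothetical helper, the two logs are made-up per-second FPS readings (not measurements), and the 30 FPS threshold is an arbitrary choice for illustration.

```python
def summarize(fps_per_second, threshold):
    """Return min FPS, average FPS, and the fraction of samples below `threshold`."""
    if not fps_per_second:
        raise ValueError("empty log")
    n = len(fps_per_second)
    below = sum(1 for fps in fps_per_second if fps < threshold)
    return {
        "min": min(fps_per_second),
        "avg": sum(fps_per_second) / n,
        "frac_below": below / n,
    }


# Hypothetical per-second FPS logs, placeholders only.
card_a = [60] * 9 + [20]                           # one brief dip
card_b = [20, 20, 20, 70, 70, 70, 70, 70, 70, 80]   # dips three times as often

for name, log in (("A", card_a), ("B", card_b)):
    print(name, summarize(log, threshold=30))
```

Both logs report a 20 FPS minimum and a 56 FPS average, yet B spends three times as long below 30 FPS, which is exactly the information a bare min/avg pair hides.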


    Quote Originally Posted by SKYMTL
    I can't speak for other sites, but the fact that we test every run three times and average out the results eliminates any "zingers" when it comes to minimum framerates. Average FPS are also there for a reason.
    Yes, obviously no one would read your reviews if you didn't have average FPS.

    As far as averaging results goes, that may or may not make a difference depending on how the minimum is reached. If it is reached in a cut scene or while a scene or level is loading, you can still get a very low min FPS that occurs only once and has no meaningful impact on gameplay. Many people, however, would interpret those numbers as if there were a meaningful impact on the gameplay experience. That's sloppy. Look at Heaven 2.0: as far as I can tell, running on my 5870 it reaches its min frame rate while loading between two scenes, for such a short time that it's not really noticeable unless you are watching the FPS counter as it happens. For the most part the frame rate stays stable, I would say more so than in Heaven 1.0. Yet how often do you see mention of Fermi cards "doubling" the min FPS of the Radeons in Heaven? There is obviously a problem with methodology here. If you can't qualify a vague and easily misinterpreted number, you shouldn't be using it, period. Do it *right* or don't do it.
    Last edited by hurleybird; 06-16-2010 at 11:04 AM.
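Whether averaging three runs removes a misleading minimum depends on whether the dip repeats, which a toy example makes concrete. This is a minimal sketch under stated assumptions: `averaged_minimum` is a hypothetical helper that averages the per-run minimums (one plausible reading of "test every run three times and average out the results"), and the logs are placeholders, not real data.

```python
from statistics import mean


def averaged_minimum(runs):
    """Average of the per-run minimum FPS values (one minimum per run)."""
    return mean(min(run) for run in runs)


# Placeholder per-second logs for three runs of the same sequence.
# Case 1: a 5 FPS dip during a scene load that happens in *every* run.
repeatable = [[5, 60, 60, 60], [5, 58, 61, 59], [5, 62, 60, 60]]
# Case 2: the same dip shows up in only one run (a one-off "zinger").
one_off = [[5, 60, 60, 60], [50, 58, 61, 59], [50, 62, 60, 60]]

print(averaged_minimum(repeatable))  # stays at 5: averaging can't remove a repeatable load dip
print(averaged_minimum(one_off))     # diluted, but still pulled well below 50
```

Averaging dilutes a one-off spike but leaves a load-time dip that recurs in every run untouched, so the resulting number still says nothing about how often the dip affects gameplay.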

  7. #32
    Xtreme Guru
    Join Date
    Aug 2007
    Posts
    3,562
    You can't really equate Heaven 2.0's shortcomings with potential in-game issues. In the end, I would rather have some idea of minimums than none at all.

  8. #33
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by SKYMTL
    You can't really equate Heaven 2.0's shortcomings with potential in-game issues. In the end, I would rather have some idea of minimums than none at all.
    I wouldn't. It's way too misleading if the minimum is caused only by the hard drive, and there's no way to know that unless we know how often it occurs.

    What could help is standard deviation. Basically, if you get a 60 FPS average with a 5 FPS standard deviation, then most frames are between 55 and 65, a few are between 50 and 70, and extremely few are above or below that.

    Or just report the percentage of frames within 10%, 20%, and 50% of the average.
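Both suggestions above (standard deviation, and the share of samples within 10%, 20%, and 50% of the average) take only a few lines to compute from a per-second FPS log. This is a minimal sketch: `stability_report` is a hypothetical helper, it uses the population standard deviation because the log is treated as the whole run rather than a sample of it, and the input values are placeholders.

```python
from statistics import mean, pstdev


def stability_report(fps, bands=(0.10, 0.20, 0.50)):
    """Average FPS, population standard deviation, and the fraction of samples
    whose FPS lies within each band (expressed as a fraction of the average)."""
    avg = mean(fps)
    sd = pstdev(fps)
    within = {
        band: sum(1 for f in fps if abs(f - avg) <= band * avg) / len(fps)
        for band in bands
    }
    return avg, sd, within


# Placeholder log for illustration only.
print(stability_report([55, 60, 65, 60, 58, 62]))
```

The band percentages make no assumption about the shape of the FPS distribution, whereas reading "most frames are within one standard deviation" only holds if the log is roughly bell-shaped.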

  9. #34
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    Quote Originally Posted by SKYMTL
    You can't really equate Heaven 2.0's shortcomings with potential in-game issues. In the end, I would rather have some idea of minimums than none at all.
    Here's the thing: making a graph of FPS over time might be difficult, I don't know. I know [H] is able to, although there are parts of their methodology that a lot of people, including myself, don't care for. However, the least one can do when using a metric like minimum FPS is to qualify it with a description of the gameplay experience, and the same goes for dual-GPU options that might have micro-stuttering. Saying something like "we recorded a low minimum frame rate, but we didn't experience any choppiness in-game," or "minimum frame rates were low, and it felt like we were hitting them often in scenes with a lot of action" would go a long way. As it is right now, a low minimum could mean everything or nothing; leaving that up to the user to interpret muddies the review, and in that case minimums *are* better left out. I think I sent you a PM earlier about this, but a far better metric, if it could be extracted, would be general frame stability via standard deviation (EDIT: thanks, poster above me!), or even minimums and maximums qualified by standard deviation. (High maximums are actually a BAD thing, because they mean the frame rate is less stable, but like minimums they can mean everything or nothing depending on how often they occur.)
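One way to "qualify" a minimum or maximum by the standard deviation is to state how many standard deviations it sits from the mean (a z-score). This is a minimal sketch, not a method proposed in the thread: `extremes_in_sigmas` is a hypothetical helper, and the interpretation assumes a roughly bell-shaped FPS log, which a real benchmark may not produce.

```python
from statistics import mean, pstdev


def extremes_in_sigmas(fps):
    """Express min and max FPS as a number of standard deviations from the mean."""
    avg, sd = mean(fps), pstdev(fps)
    if sd == 0:
        return 0.0, 0.0  # perfectly flat log: min and max both equal the mean
    return (min(fps) - avg) / sd, (max(fps) - avg) / sd


# Placeholder logs: a single load-time spike vs. a frame rate that swings regularly.
spike = [60] * 99 + [10]
swingy = [40, 80] * 50
print(extremes_in_sigmas(spike))
print(extremes_in_sigmas(swingy))
```

Roughly speaking, a minimum many sigmas below the mean (as in `spike`) is an isolated event, while one only about a sigma below (as in `swingy`) is a level the card visits all the time.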

  10. #35
    Xtreme Guru
    Join Date
    Aug 2007
    Posts
    3,562
    You make very good points, and I'm actually quite happy this came up, since some of your suggestions would add a new dimension to the reviews. This is especially true considering we are advocating actual gameplay sequences, which lend themselves to a more accurate representation of minimum framerates over time.

    However, I do take issue with the statement that the article advocates an incorrect benchmarking process, because I don't feel that it does. A follow-up article will go into the uses of minimum framerates and averages, but this one concentrated on the methods used to obtain results rather than on how the results are shown and communicated in graph form.
    Last edited by SKYMTL; 06-16-2010 at 11:32 AM.

  11. #36
    Xtreme Enthusiast
    Join Date
    Jul 2004
    Posts
    535
    Quote Originally Posted by SKYMTL
    You make very good points, and I'm actually quite happy this came up, since some of your suggestions would add a new dimension to the reviews. This is especially true considering we are advocating actual gameplay sequences, which lend themselves to a more accurate representation of minimum framerates over time.

    However, I do take issue with the statement that the article advocates an incorrect benchmarking process, because I don't feel that it does. A follow-up article will go into the uses of minimum framerates and averages, but this one concentrated on the methods used to obtain results rather than on how the results are shown and communicated in graph form.


    Sounds good, can't wait for the next article.

  12. #37
    Xtreme Addict
    Join Date
    Mar 2009
    Posts
    1,116
    There is room to talk here, but I want you to decide on your own how you want to express performance: http://en.wikipedia.org/wiki/Three_sigma_rule

  13. #38
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    Quote Originally Posted by bamtan2
    There is room to talk here, but I want you to decide on your own how you want to express performance: http://en.wikipedia.org/wiki/Three_sigma_rule
    Is this in reference to using multiple benchmark runs to create an average result?

    All along the watchtower the watchmen watch the eternal return.

  14. #39
    Xtreme Guru
    Join Date
    Aug 2007
    Posts
    3,562
    Quote Originally Posted by STEvil
    Is this in reference to using multiple benchmark runs to create an average result?
    I think so. However, this early in the morning my math cap doesn't seem to sit properly on my head....

  15. #40
    Xtreme Addict
    Join Date
    Nov 2007
    Location
    Illinois
    Posts
    2,095
    Actually, SKY, I felt that a lot of benchmarks never told the real story. The most glaring case was games like TF2, which can be benched in a siiiimilar way through HL2: Ep 2 and L4D/L4D2 benches. I found that a 4850 really is inadequate, even at 1440x900.

    I actually sat down and read through your article; it was quite good. Not much to say, except that I always felt there was a lot of statistical analysis missing from reviews.
    E7200 @ 3.4 ; 7870 GHz 2 GB
    Intel's atom is a terrible chip.

  16. #41
    Xtreme Member
    Join Date
    Apr 2010
    Posts
    145
    Quote Originally Posted by Manicdan
    I wouldn't. It's way too misleading if the minimum is caused only by the hard drive, and there's no way to know that unless we know how often it occurs.

    What could help is standard deviation. Basically, if you get a 60 FPS average with a 5 FPS standard deviation, then most frames are between 55 and 65, a few are between 50 and 70, and extremely few are above or below that.

    Or just report the percentage of frames within 10%, 20%, and 50% of the average.
    I've been wondering why reviews haven't been using the standard deviation.

  17. #42
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by iMacmatician
    I've been wondering why reviews haven't been using the standard deviation.
    I think many programs are overly simplistic: they just take the latest frame rate and fold it into a running average. I don't care to get into the exact math, but the idea is that it can give you an average without having to save any data other than the current average and the number of frames. However, since most games probably don't use all of your CPU, I don't think it's hard to provide good-quality benchmarks like the one in FC2, since I doubt it's very taxing at all.
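The running average described above needs only the current average and a frame count, and the standard deviation can be tracked almost as cheaply. The sketch below swaps in Welford's online algorithm, a standard technique for this, which keeps one extra number; it is not a claim about how FC2's benchmark or any other tool actually works, and `RunningStats` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: constant memory, no per-frame history stored."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations from the mean
    minimum: float = math.inf

    def add(self, fps: float) -> None:
        self.n += 1
        delta = fps - self.mean
        self.mean += delta / self.n           # same incremental average as before
        self.m2 += delta * (fps - self.mean)  # uses both the old and new mean
        self.minimum = min(self.minimum, fps)

    def stdev(self) -> float:
        """Population standard deviation of the samples seen so far."""
        return math.sqrt(self.m2 / self.n) if self.n else 0.0


stats = RunningStats()
for fps in (30, 60, 60, 90):  # placeholder samples
    stats.add(fps)
print(stats.n, stats.mean, stats.stdev(), stats.minimum)
```

Welford's update is preferred over accumulating a sum and a sum of squares because it avoids the cancellation error that the naive formula suffers over long runs.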

  18. #43
    Xtreme Addict
    Join Date
    Mar 2009
    Posts
    1,116
    Quote Originally Posted by STEvil
    Is this in reference to using multiple benchmark runs to create an average result?
    Nah. Standard deviation is used to express how far from the mean a certain fraction of the sample set lies. You can change that fraction by changing the number of standard deviations you count away from the mean.

    So if you were concerned about the minimum frame rate reflecting just some odd one-off circumstance, like some posters are, you could use the mean frame rate minus three standard deviations instead. This would leave off the most extreme minimums while still capturing 99% of the other values.

    You could even do this with only the values below the original mean, which would leave off all the high values you don't care about. Standard deviation is normally used with normal distributions, and using it with whatever distribution a benchmark turns out to produce may not give results we can understand or predict right now.

    Just think about the requirement (expressing useful information to potential video card buyers) and find a math tool from the massive library of math tools to turn your data set into what will help you. Ideally that data set is instantaneous frame rates, like those from Fraps, because using pre-calculated FPS is like cooking with flavored sardines.
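The "mean minus three standard deviations" floor, and the variant that only looks at values below the mean, can be sketched from per-frame frame times. This is a minimal sketch under stated assumptions: the input is a list of frame times in milliseconds (no particular log format is assumed), the helper names are hypothetical, and the below-mean variant measures spread around the original mean using only the below-mean samples, which is one possible reading of the suggestion.

```python
import math
from statistics import mean, pstdev


def instantaneous_fps(frame_times_ms):
    """Convert per-frame render times (ms) into per-frame FPS."""
    return [1000.0 / t for t in frame_times_ms]


def three_sigma_floor(fps):
    """Mean FPS minus three (population) standard deviations."""
    return mean(fps) - 3 * pstdev(fps)


def lower_three_sigma_floor(fps):
    """Same idea, but the spread is measured only over samples below the mean,
    so high frame rates don't inflate it."""
    avg = mean(fps)
    below = [f for f in fps if f < avg]
    if not below:
        return avg
    downside = math.sqrt(sum((avg - f) ** 2 for f in below) / len(below))
    return avg - 3 * downside


# Placeholder frame times for illustration only.
fps = instantaneous_fps([16.7, 16.5, 17.0, 16.9, 40.0, 16.6])
print(three_sigma_floor(fps), lower_three_sigma_floor(fps))
```

As noted above, the "three sigmas covers almost everything" intuition only holds for roughly normal data; with skewed frame-time logs these floors can even come out negative, which is itself a sign the distribution isn't normal.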

  19. #44
    Xtreme Addict
    Join Date
    Aug 2007
    Location
    Toon
    Posts
    1,570
    Quote Originally Posted by gojirasan
    In theory, I like Kyle's methods too, but these days few games can challenge the high-end cards enough to drop below 30" monitor resolutions, so they are mainly useful to owners of 30" monitors. And that's the problem with that method: it is too dependent on what size monitor you happen to have. My monitor is 1600x1200, and that res almost never makes it into benchmarks these days.
    Yeah, I play at 1080p or 1600x1200, but there are plenty of games that can slow down at these resolutions.
    Intel i7 920 C0 @ 3.67GHz
    ASUS 6T Deluxe
    Powercolor 7970 @ 1050/1475
    12GB GSkill Ripjaws
    Antec 850W TruePower Quattro
    50" Full HD PDP
    Red Cosmos 1000

  20. #45
    Banned
    Join Date
    Jun 2008
    Location
    Mi
    Posts
    1,063
    Quote Originally Posted by SKYMTL
    Like I said in the conclusion: there is no "right" way to go about it. Kyle's team has a unique perspective, and it sets them (and sometimes their conclusions) apart from the norm. They do seem to use in-game sequences and research things properly, which is a huge step in the right direction. However, I am not actually sure whether they include action sequences in their benchmarks or just do a run-through.

    What the article is really meant to convey is that the vast majority of standard benchmarking methods (built-in benchmarks & stand-alone) are dead wrong. This is why readers should push for a transparent benchmarking process, with disclosure of exactly which methods were used.



    I can't speak for other sites, but the fact that we test every run three times and average out the results eliminates any "zingers" when it comes to minimum framerates. Average FPS are also there for a reason.

    Great article, thank you.

    I agree with your post above; when reading reviews, I personally weigh each review to piece together an overall picture of performance. But I place more value on histograms, because visually they convey more information to me.

    Max FPS has never (ever) been a criterion I judge by; it has always been sustained low frame rates that I am most interested in avoiding.
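A histogram like the ones preferred above can be built from the same per-second FPS log that yields the min and average. This is a minimal sketch: `fps_histogram` and `render` are hypothetical helpers, the 10 FPS bin width is an arbitrary choice, and the sample values are placeholders.

```python
from collections import Counter


def fps_histogram(fps, bin_width=10):
    """Count samples per FPS bin; keys are the lower edge of each bin."""
    counts = Counter(int(f // bin_width) * bin_width for f in fps)
    return dict(sorted(counts.items()))


def render(hist, bin_width=10):
    """Plain-text bar chart, one '#' per sample."""
    return "\n".join(
        f"{low:3d}-{low + bin_width - 1:3d} fps | {'#' * count}"
        for low, count in hist.items()
    )


# Placeholder per-second log for illustration only.
print(render(fps_histogram([42, 47, 55, 58, 59, 61, 63, 64, 66, 71])))
```

Unlike a single minimum, the low bins show both how low the frame rate went and how much of the run was spent there, which is the "sustained" part that matters.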

  21. #46
    c[_]
    Join Date
    Nov 2002
    Location
    Alberta, Canada
    Posts
    18,728
    Quote Originally Posted by bamtan2
    Nah. Standard deviation is used to express how far from the mean a certain fraction of the sample set lies. You can change that fraction by changing the number of standard deviations you count away from the mean.

    So if you were concerned about the minimum frame rate reflecting just some odd one-off circumstance, like some posters are, you could use the mean frame rate minus three standard deviations instead. This would leave off the most extreme minimums while still capturing 99% of the other values.

    You could even do this with only the values below the original mean, which would leave off all the high values you don't care about. Standard deviation is normally used with normal distributions, and using it with whatever distribution a benchmark turns out to produce may not give results we can understand or predict right now.

    Just think about the requirement (expressing useful information to potential video card buyers) and find a math tool from the massive library of math tools to turn your data set into what will help you. Ideally that data set is instantaneous frame rates, like those from Fraps, because using pre-calculated FPS is like cooking with flavored sardines.
    Ah, that makes more sense... It also sounds like something someone would use to predict performance on a part before it's mature enough for retail....

    All along the watchtower the watchmen watch the eternal return.

  22. #47
    Xtreme Addict
    Join Date
    Mar 2009
    Posts
    1,116
    SKYMTL: once you do the legwork to see how the products from each company compare in each benchmark (which you've done), you can express the results with just a couple of benchmarks. For instance, if you want to compare some cards with the new drivers, you may only need the Far Cry 2 benchmark (which favors NVIDIA the most) and the DiRT 2 benchmark (which favors NVIDIA the least, if my guess is right). Between the two benchmarks, the reader should have an idea of the worst case they'll be dealing with, whichever card they're looking at. The more benchmarks you add past that, the less useful each one is, and the more time you've wasted.

    There are two things the potential buyer should be looking for: (1) some general expression of performance that represents all games, present and future; (2) fairly specific performance information about the exact game(s) they play. You can probably hit #1 with two benchmarks. Hitting #2 requires exhaustive coverage of benchmarks, resolutions, and system configurations, so maybe 5x-10x the work for not much more usefulness.
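Picking the two benchmarks that bracket a suite, as suggested above, amounts to taking the maximum and minimum of the relative-performance ratios. This is a minimal sketch: `bracketing_benchmarks` is a hypothetical helper, and the benchmark names and ratios are placeholders, not real results for any game or card.

```python
def bracketing_benchmarks(relative_perf):
    """relative_perf maps benchmark name -> (FPS of card X) / (FPS of card Y).

    Returns the benchmark most favorable to card X and the one least favorable,
    i.e. the two that bracket every other result in the suite.
    """
    if not relative_perf:
        raise ValueError("no benchmarks")
    most = max(relative_perf, key=relative_perf.get)
    least = min(relative_perf, key=relative_perf.get)
    return most, least


# Placeholder ratios for illustration only; not real results.
suite = {"bench_1": 1.18, "bench_2": 0.97, "bench_3": 1.05, "bench_4": 0.91}
print(bracketing_benchmarks(suite))
```

The bracket has to be re-derived from the full suite whenever drivers change, since a driver update can move which benchmark sits at either extreme.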

  23. #48
    I am Xtreme
    Join Date
    Dec 2007
    Posts
    7,750
    Quote Originally Posted by bamtan2
    SKYMTL: once you do the legwork to see how the products from each company compare in each benchmark (which you've done), you can express the results with just a couple of benchmarks. For instance, if you want to compare some cards with the new drivers, you may only need the Far Cry 2 benchmark (which favors NVIDIA the most) and the DiRT 2 benchmark (which favors NVIDIA the least, if my guess is right). Between the two benchmarks, the reader should have an idea of the worst case they'll be dealing with, whichever card they're looking at. The more benchmarks you add past that, the less useful each one is, and the more time you've wasted.

    There are two things the potential buyer should be looking for: (1) some general expression of performance that represents all games, present and future; (2) fairly specific performance information about the exact game(s) they play. You can probably hit #1 with two benchmarks. Hitting #2 requires exhaustive coverage of benchmarks, resolutions, and system configurations, so maybe 5x-10x the work for not much more usefulness.
    DiRT 2 likes NVIDIA cards too, because of the extra tessellation power.

  24. #49
    Xtreme Addict
    Join Date
    Nov 2007
    Location
    Illinois
    Posts
    2,095
    bamtan: Yeah, but to get an accurate average you need at least 5 runs, and that's just for making sure you have a good linear fit. The sample size for a statistically relevant average + 3 SD rule would be LARGE.

    Extrapolating from 3 runs gives you no useful information at all.
    E7200 @ 3.4 ; 7870 GHz 2 GB
    Intel's atom is a terrible chip.
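How much extra runs tighten an averaged result can be illustrated with the standard error of the mean. This is a minimal sketch, not a settlement of the 3-vs-5-runs question: `run_summary` is a hypothetical helper, it assumes runs are independent and comparable, and the run averages are placeholders.

```python
import math
from statistics import mean, stdev


def run_summary(run_averages):
    """Mean of the per-run averages and its standard error (sample SD / sqrt(n))."""
    if len(run_averages) < 2:
        raise ValueError("need at least two runs to estimate run-to-run spread")
    return mean(run_averages), stdev(run_averages) / math.sqrt(len(run_averages))


print(run_summary([58, 60, 62]))  # placeholder run averages

# For a fixed run-to-run spread s, the standard error scales as s / sqrt(n):
for n in (3, 5, 10):
    print(n, round(1 / math.sqrt(n), 3))  # fraction of s that remains
```

Going from three to five runs shrinks the standard error from about 0.58 s to about 0.45 s; whether that is worth the extra bench time depends on how large s actually is for a given game.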

  25. #50
    Xtreme Guru
    Join Date
    Aug 2007
    Posts
    3,562
    We do three runs, and as it is, benching each card takes about 5 hours straight. Multiply that by five or six cards, and things start getting a bit nuts when upping that to 6+ runs. I know I said it shouldn't come down to time, but there comes a point where so much time is invested redoing benchmarks when drivers are released that no reviews actually get posted...
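The time budget above is simple arithmetic, sketched here under a loudly stated assumption: that the roughly 5 hours per card covers all three runs and scales linearly with the number of runs (setup and driver installs are ignored). `bench_hours` is a hypothetical helper.

```python
BASE_RUNS = 3  # runs per benchmark in the current process


def bench_hours(hours_per_card_at_base=5.0, runs=BASE_RUNS, cards=6):
    """Total bench time, assuming time per card scales linearly with run count."""
    return hours_per_card_at_base * runs / BASE_RUNS * cards


print(bench_hours(runs=3, cards=6))  # current workload
print(bench_hours(runs=6, cards=6))  # doubling the runs doubles the hours
```

Under that assumption, six cards go from about 30 hours at three runs to about 60 hours at six, before any reruns for new drivers.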
