Still we have current hwbot marking anomalies in 3dmark



zakelwe
02-22-2007, 12:06 PM
I'm puzzled that hwbot can suggest a new points system on hwbot with 40 members per team while they still cannot get right the present system for individuals. As per my previous post :-

http://www.xtremesystems.org/forums/showthread.php?t=133203

I am still waiting for a realistic explanation of why a rig I just threw together in 5 minutes got 4 times the score of hours and hours and hours and hours and hours and hours of benching FX 5600U and 5700U and 6200, and the list goes on and on and on.

I got this from massman

"Within a few years, you'll see the difference."

Sorry, but I want my proper ranking now, not in 2011 or whenever you see fit.

Who came up with this scoring convention? Who decided what should score what ? Futuremark have a white paper on each of their 3dmark tests telling you the marking scheme and why, I expect the same from hwbot if it is going to be taken seriously in 3d results.

For instance from Futuremark

http://www.futuremark.com/companyinfo/pressroom/companypdfs/3dmark03_whitepaper.pdf?m=v

Can a hwbot person post up something similar so I can compare and contrast on your marking scheme for 3dmark ?

Regards

Andy

mtzki
02-22-2007, 12:18 PM
Point systems explained here:
http://www.hwbot.org/forums/viewtopic.php?id=547

Current system + plans for changes both there. 3DMarks get no special treatment in hwbot.

HeavyH20
02-22-2007, 12:38 PM
Actually, your hardware points look just fine. I got 2 whopping points for the number 1 spot with 8800 GTS SLI. Which is acceptable, since my global points come from the 8800 GTX runs and 8800 GTS SLI results are few and far between. If you link and compare those results, there is little to no competition for those 2 points. They are basically "free" points. Just ask dm about those:

http://www.hwbot.org/user.do?userId=5432

Again, since 99% of the results will be on current enthusiast hardware (8800 series, 7900 series, X1900, etc.), the 2 points is reasonable for a hardware point award. As noted, even 8800 GTS SLI is not popular, so the points are limited and justifiably so. The "in a few years" comment is valid. Since there are plenty of runs for the current cards, running an "old" card two years from now means you will get a higher hardware point award. It just will not happen for the current antique cards like the 5600 or 5700. Makes perfect sense since there is no history on these older cards. If there were more than 5 results, the hardware points would have been higher. Right now, each person who posted ANY result with a 5700 got hardware points. Not sure what your issue is with that.

zakelwe
02-22-2007, 01:32 PM
Well, you are in effect writing off the history from before hwbot came into being, because it does not accurately represent old scores. It takes into account modern popularity. Note that's not total historical popularity in benching, because there are far more GF4 scores than 8800GTS scores in existence.

What are you measuring here, popularity on modern hardware or benching skills?

Like I said before, 3dmark has a white paper on how the benchmark runs and how it is judged. I'd like hwbot for 3dmark to do the same, or at least explain why it is valid for a #5 place 7900GS to score as much as 5 other cards at #1 without "popularity" being the deciding factor.

Let's cut to the chase: with hwbot, old stuff is no news and is something to be forgotten. With new stuff, who decides the marking results and how do we know it is representative and accurate?

Where is hwbot's white paper on how they work?

Regards

Andy

And in case anyone couldn't be bothered reading I'll say it again in bold

What are you measuring here, popularity on modern hardware or benching skills?

HeavyH20
02-22-2007, 02:05 PM
Well, that is true, there are plenty more GF4 results than 8800, but they are not yet submitted to the HWBOT result base. So, according to their results, the GF4 is the least popular card ever, and the 8800 GTX the most popular. Also, Futuremark results include the non-OC world as well. HWBOT is frequented by hardware enthusiasts, so the most popular hardware will not be the 7300 GT, but the 8800GTX, even though there are likely 30 times as many 7300 GT cards in existence. They are simply not in enthusiast systems. So, bench a 6200 and there are three others doing the same. Bench a 8800 GTX, and there are 173 others doing the same. How would it be fair to give 10 points to the best 5700 and 10 points to the best 8800 GTX when there is no competition for the 5700 result? It certainly took a heck of a lot more work to be the fastest 8800 GTX than it did for the 5700 since no one else showed up to the race. That, in turn, takes no skill to accomplish. HWBOT has NO history. It is current and forward. Now, if all those 5700 results were somehow posted on HWBOT, the points would rise. But that is highly unlikely. I, for one, deleted my 5700U and 5800U results a long time ago.

It is NOT a popularity contest, but people are awarded for their skill to overcome competition. If you rank in the top 5 amongst 200 results, bravo. If you rank No. 1 amongst 5 , I am not sure you should get any points. There should be a 20 result minimum to get any hardware points on a particular platform.

zakelwe
02-22-2007, 02:24 PM
Well, that is true, but they are not yet submitted to the HWBOT result base. So, according to their results, the GF4 is the least popular card ever, and the 8800 GTX the most popular. Also, Futuremark results include the non-OC world as well. HWBOT is frequented by hardware enthusiasts, so the most popular hardware will not be the 7300 GT, but the 8800GTX, even though there are likely 30 times as many 7300 GT cards in existence. They are simply not in enthusiast systems. So, bench a 6200 and there are three others doing the same. Bench a 8800 GTX, and there are 173 others doing the same. How would it be fair to give 10 points to the best 5700 and 10 points to the best 8800 GTX when there is no competition for the 5700 result? It certainly took a heck of a lot more work to be the fastest 8800 GTX than it did for the 5700 since no one else showed up to the race. HWBOT has NO history. It is current and forward. Now, if all those 5700 results were somehow posted on HWBOT, the points would rise. But that is highly unlikely. I, for one, deleted my 5700U and 5800U results a long time ago.

That means there is no incentive for trying to beat old scores. That means you cannot bench an old card just for fun. That means you are pretty elitist and say you have to be prepared to pay top dollar to get anywhere.

Is that really what you want ? It's not what I want.

Regards

Andy

bazx
02-22-2007, 03:24 PM
No one can sit on their past achievements

that’s the point

You must work at it day after day

Otherwise you would grow old a world champ

DDTUNG
02-22-2007, 03:38 PM
Andy,

The truth of the matter is, people benching old equipment doesn't help manufacturers sell the latest expensive stuff.

If you want recognition for true skills, post your results at forums and you might still find others who appreciate it. Post them on HWBot and you get your token 2 points at best.

DDTUNG:cool:

[XC] riptide
02-22-2007, 03:54 PM
Leave the historical records alone. You guys should have more foresight. Yes there are many x1900 / 8800 results and fewer Mach64 results (for example). But as time moves on think what happens. 5 years from now? Yes the X1900XTX and 8800 will be the new Mach64 of the bench world... but there will always be that high number who benched them. The effects you speak about are only due to the relative age of HWBOT.

HeavyH20
02-22-2007, 05:16 PM
As HWBOT matures, the results database will be filled with current hardware results that will soon mature and be considered old. With that, you will be competing with hundreds of results. I am not being an elitist, but simply a realist. It is hard to consider any 5700U result great, average, or poor with 5 submitted results. It is becoming a trend, but there are more than a few individuals in the top 20 scores simply because they have posted the one and only result for a particular antiquated CPU or video card. That is hardly fair and belittles the current efforts of many people attempting to get decent results OCing current hardware. I contend that if there are not at least 20 posted results in a given category, there should be no hardware points granted. Limiting some of the obtuse hardware results would keep the leaderboard pure and a good representation of the top OC talent. Right now, that is rapidly decaying.

Maybe a better way would be to ramp up the hardware points as the category expands, much like they do already. They may simply need to define a new minimum with .5 points to start instead of 2. So, a single unchallenged result is .5 points instead of 2. Simple and reasonable.
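
As an illustration only, the two ideas in this post (a hard 20-result minimum, or a ramp that starts a lone result at .5 points) can be sketched in Python. This is not HWBOT's actual formula: the function name `first_place_points`, the linear shape of the ramp, and the 10-point cap are assumptions chosen purely for the example; only the 20-result threshold and the .5 starting value come from the post.

```python
def first_place_points(n_results, variant="ramp", cap=10.0, start=0.5, full_at=20):
    """Hypothetical hardware points for the #1 result in a category.

    n_results: number of results submitted for that hardware.
    variant:   "cutoff" -> no points until `full_at` results exist;
               "ramp"   -> grow linearly from `start` (one result) to `cap`.
    cap, start, full_at: illustrative values, not HWBOT's real numbers.
    """
    if n_results < 1:
        raise ValueError("a category needs at least one result")
    if n_results >= full_at:
        return cap
    if variant == "cutoff":
        return 0.0
    # Linear ramp between a single unchallenged result and a full category.
    fraction = (n_results - 1) / (full_at - 1)
    return start + fraction * (cap - start)


if __name__ == "__main__":
    for n in (1, 5, 20, 174):
        print(n, first_place_points(n, "cutoff"), round(first_place_points(n, "ramp"), 2))
```

Under either variant a category with 174 results pays the full award, while a 5-result category pays either nothing (cutoff) or a small fraction (ramp), which is the trade-off being argued over here.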

Gorod
02-22-2007, 08:04 PM
It is becoming a trend, but there are more than a few individuals in the top 20 scores simply because they have posted the one and only result for a particular antiquated CPU or video card. That is hardly fair and belittles the current efforts of many people attempting to get decent results OCing current hardware. I contend that if there are not at least 20 posted results in a given category, there should be no hardware points granted. Limiting some of the obtuse hardware results would keep the leader board pure and a good representation of the top OC talent. Right now, that is rapidly decaying.

Maybe a better way would be to ramp up the hardware points as the category expands, much like they do already. They may simply need to define a new minimum with .5 points to start instead of 2. So, a single unchallenged result is .5 points instead of 2. Simple and reasonable.

I agree 100% with every single word :clap: :toast:
I personally can hardly find it fun benching old, unpopular hardware (with zero or barely any competition at all) just to get loads of cheap points, but that's the easiest way of earning points, sadly :( Does my #8 ranking represent or have anything to do with my skill? lol ... heck no :lol: , probably half of my points are earned by benching some old unpopular hardware. That obviously doesn't make me a great overclocker, it just means that by benching A LOT you earn LOTS of points ... - now what's the point in doing that, I ask myself? :confused: Except getting your team lots of points of course :) I can hardly find it fun and challenging anymore, it's becoming all about how much hardware you can bench ... sad but true :(
Please make this competition fair and more about skill than the amount of hardware somebody can bench. Something like what HeavyH20 suggested will do the trick:


I contend that if there are not at least 20 posted results in a given category, there should be no hardware points granted.


define a new minimum with .5 points to start instead of 2. So, a single unchallenged result is .5 points instead of 2. Simple and reasonable

Some funny facts about the cheap and easy point "whoring" I earned that way with stock clock hardware :) :

Stock Celeron 2.0 @ 2.0GHz : +9.2 points
Stock Willamette 1.5 @ 1.5GHz : +12.4 points
Stock NorthWood 2.4 @ 2.4GHz : +8.8 points
Stock Core Duo T2050 1.6 @ 1.6GHz : +12.0 points

I am sorry ... but that's not a fair and challenging system, IMHO

And there's lots of Pentium 1/2/3's, Durons, old socket 370 Celerons out there - I guess I should just get and bench 'em all :rolleyes: and get another +300-400 points :lol: I won't do that ... but somebody will, just wait and see ;) Give it another 4-6 months and somebody will outscore both K|ngp|n and Onepagebook ... Hipro and Overklokk just by benching huge numbers of old, stock clock hardware ... just wait and see :fact:

mrlobber
02-23-2007, 12:30 AM
Andy, to answer your question in bold - the product of hwbot popularity of a particular piece of hardware and your benchmark score (hwboints = popularity x benchmark score -> your skills) is the measure of your benching skills on hwbot.
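
A minimal sketch of the "popularity x benchmark score" idea, written in Python. The post gives only the informal product, so everything about how the two factors are measured is an assumption made for the example: popularity is taken as the category's result count relative to the largest category, the benchmark factor as the score relative to the best score for that hardware, and `scale=10.0` is an arbitrary constant. The function name `hwboints` is hypothetical and this is not HWBOT's real algorithm.

```python
def hwboints(score, best_score, n_results, n_largest, scale=10.0):
    """Hypothetical points = scale * popularity * relative score.

    popularity: results for this hardware / results in the most popular category.
    relative:   this score / best score submitted for the same hardware.
    """
    if best_score <= 0 or n_largest <= 0:
        raise ValueError("best_score and n_largest must be positive")
    popularity = n_results / n_largest
    relative = score / best_score
    return scale * popularity * relative


if __name__ == "__main__":
    # Category sizes quoted earlier in the thread: 174 people on an 8800 GTX
    # (you plus 173 others) versus 4 on a 6200 (you plus three others).
    # The benchmark scores themselves are made-up numbers.
    print(hwboints(score=20000, best_score=20000, n_results=174, n_largest=174))
    print(hwboints(score=5000, best_score=5000, n_results=4, n_largest=174))
```

With these assumptions the best 6200 result earns only a small fraction of what the best 8800 GTX result earns, which is the behaviour being debated above.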

Of course, at the moment the result is skewed because of hwbot age, however, it must be obvious that with hwbot becoming older and gaining more popularity, the skewness is going to diminish rapidly.

I bet, in a year people with new powerful cpus will be able to smash some present single 8800GTX (at that time already a mid range card) records in 3dmark03, for instance - but there will be many of them while the top guys will be benching 9900GTX Ultra or whatever already. But then the 8800GTX class will again have some solid competition, and the best of the midrange bencher crowd will get some solid point reward for their effort.

I know, you probably are the best low-end video card bencher in the world and want to see your past effort evaluated accordingly. However, if you do not like the present system, you must come up with a good proposal for improving the scoring system so that it reflects the things you want to see in it. Maybe I've missed it, but I've not seen any from you :confused: And the point is, the line must be drawn somewhere - at the moment, the line is the beginning of hwbot, when hardware results started coming in. Without such a line, one might start saying, "I got 100 mhz out of a 386 cpu and it took me a week to hardmod the motherboard with my own PLL's and so on back in 1989" - how would you be able to compare his and your effort with a unified algorithm but without the weight of popularity and time factors involved?

Personally, I'm trying to mix earning hardware points with competing in some newer hardware segments, so I find the current scoring system pretty adequate.

HeavyH20
02-23-2007, 06:58 AM
Well, a very novel approach to the hardware points issue has been posted on HWBOT's forum. Instead of arbitrary points being awarded for being in the Top 5 hardware ranking, you get a scaled percentage based on an average score. But, I would say that may skew things a tad. It is better to base the points on a median versus an average, as the median will become the effective zero point of the system. From this median, which is quite easy to determine, the points increase upward to the highest score. This would also address benching stock hardware or submitting single results with NO competition since stock IS the median and a single result represents the median. Sounds like a very good method of control AND improved reward to me. This would also address Zakelwe's issue. If he submitted a 5700U record that was 150% faster than the median 5700U result, he would get rewarded accordingly. Now, if his record was 102% faster than the median, then it would be a simple average score, and also be rewarded appropriately.
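
A rough Python sketch of the median idea, under stated assumptions: the median of a category is the zero point, and points grow with how far a score sits above it. The function name `median_points`, the constant `points_per_100_percent`, and the reading of "150%" and "102%" as 150% and 102% of the median are all assumptions made for the example; the post does not specify a formula and this is not HWBOT's implementation.

```python
from statistics import median


def median_points(score, category_scores, points_per_100_percent=10.0):
    """Hypothetical points for `score` relative to the category median.

    category_scores must include `score` itself. A score at or below the
    median earns nothing; a score at twice the median would earn
    `points_per_100_percent` (an illustrative constant).
    """
    med = median(category_scores)
    if med <= 0:
        raise ValueError("median must be positive")
    return max(0.0, score / med - 1.0) * points_per_100_percent


if __name__ == "__main__":
    # Made-up 5700U scores whose median is 10000.
    category = [9000, 10000, 10000, 10200, 15000]
    for s in category:
        print(s, round(median_points(s, category), 2))
    # A single unchallenged result is its own median, so it earns nothing.
    print(median_points(12345, [12345]))
```

Using the median rather than the mean keeps one extreme result from dragging the zero point upward, which is the skew the post is worried about.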

Eldonko
02-23-2007, 07:36 AM
Stock Celeron 2.0 @ 2.0GHz : +9.2 points
Stock Willamette 1.5 @ 1.5GHz : +12.4 points
Stock NorthWood 2.4 @ 2.4GHz : +8.8 points
Stock Core Duo T2050 1.6 @ 1.6GHz : +12.0 points

That is a perfect example. Running a bunch of old hardware @ stock and beating, in hwboints, someone that benches their ass off with newer hardware should not be the way it works. :nono: Thus, HeavyH20's idea of a minimum of 20 results per hardware to get points should fix up the system a lot.

I also agree with riptide. All of the old hardware will eventually disappear and todays hardware will become the old stuff over time. Then there will already be results, eliminating the problem totally.


Well, a very novel approach to the hardware points issue has been posted on HWBOT's forum. Instead of arbitrary points being awarded for being in the Top 5 hardware ranking, you get a scaled percentage based on an average score. But, I would say that may skew things a tad. It is better to base the points on a median versus an average, as the median will become the effective zero point of the system. From this median, which is quite easy to determine, the points increase upward to the highest score. This would also address benching stock hardware or submitting single results with NO competition since stock IS the median and a single result represents the median. Sounds like a very good method of control AND improved reward to me. This would also address Zakelwe's issue. If he submitted a 5700U record that was 150% faster than the median 5700U result, he would get rewarded accordingly. Now, if his record was 102% faster than the median, then it would be a simple average score, and also be rewarded appropriately.:clap: :fact:

mrlobber
02-23-2007, 09:24 AM
I like the idea HeavyH20 just described as well :)

Just one question: if somebody submitted a single result for a single hardware combination nobody else has benched so far, and not on stock clocks but already under DI for example, hwbot wouldn't be able to determine the difference, would it?

HeavyH20
02-23-2007, 09:29 AM
Not initially, but as the results for the same hardware came in, the merit of the result would be rewarded.

HeavyH20
02-24-2007, 08:42 AM
Your work is justified if there are others to compare against. Other than that, it is a simple shot in the dark. No one notices. And, with a median driven reward system, four stock scores and one OC'd score (like Andy's) would mean the OC'd score gets ALL the hardware points available for that category. NONE of the stock scores would get any points. That way it is a reward system for the OC, not the stock runs. It is, however, dependent on someone showing up for the race. So, if you bench a 6200 and it is the one and only result, then that would not yield any hardware points. But, if a few others show up with a 6200 result, the points begin to rise as the median is defined.
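
The four-stock-plus-one-OC case can be worked through with a small Python sketch. It assumes a fixed pool of hardware points per category that is shared in proportion to each result's margin above the median; the function name `split_pool`, the proportional split, and the 2-point pool are assumptions for illustration (the pool size simply reuses the "2 points" figure mentioned earlier in the thread), not HWBOT's real scheme.

```python
from statistics import median


def split_pool(scores, pool=2.0):
    """Share a hypothetical category pool by margin above the median.

    Results at or below the median get nothing. If nobody beats the
    median (all stock, or a single result), the pool is not awarded.
    """
    med = median(scores)
    margins = [max(0.0, s - med) for s in scores]
    total = sum(margins)
    if total == 0:
        return [0.0] * len(scores)
    return [pool * m / total for m in margins]


if __name__ == "__main__":
    # Four stock runs and one overclocked run (made-up scores).
    print(split_pool([3000, 3000, 3000, 3000, 4200]))
    # The one and only result for a card: no median to beat, no points.
    print(split_pool([3500]))
```

In the first call the median equals the stock score, so only the overclocked run has a positive margin and it takes the whole pool; in the second call the lone result is its own median and earns nothing, matching the two cases described in the post.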

mtzki
02-24-2007, 09:24 AM
It's just impossible to make the system conform absolutely to the idea of how hard some result has been to achieve. How could we even say that, theoretically speaking? We all have trouble understanding the level of others' results we see unless we have tried to bench the same hw ourselves. And even after that it's often just a vague feeling. But who knows, maybe fifty years from now we'll have an army of bots (hwbots right...this must have been Frederik's idea all the time...) relentlessly benching every single hw setup out there and then measuring the difficulty of result levels with some AI algorithm...:p:

What we can do is provide a well-working and fun competition. The subjective notion of "benching skill" often just gets too much attention. I'm afraid zakelwe's way of concentrating on taking a few midclass cards to their absolute limits just can't be the way to rule hwbot's overall competition. And the comments here about old hw getting low boints atm being mainly due to the bot's age are right of course. :)

btw that 10 boints for 7900 GS SLI included global part (it was the highest score zakelwe had in the bot then). 5th hw place could not have given that much, the max for it atm is 2 to be more precise.

wittekakker
02-24-2007, 10:09 AM
G70 in 3D Mark01 record for example. 1 Year ago we had the 50k battle between KP and Hipro5. At this very moment G70's have not much trouble to get +50k with the help of a Conroe CPU.

In 5 years, current top 8800GTX 3D Mark scores will be much easier to reach because of the influence of the total system specs. So... in fact it's a bit funny to hand out points in each category of 3D accelerators in benchmarks that are NOT true 3D accelerator benchmarks.

mtzki
02-24-2007, 10:19 AM
G70 in 3D Mark01 record for example. 1 Year ago we had the 50k battle between KP and Hipro5. At this very moment G70's have not much trouble to get +50k with the help of a Conroe CPU.

In 5 years, current top 8800GTX 3D Mark scores will be much easier to reach because of the influence of the total system specs. So... in fact it's a bit funny to hand out points in each category of 3D accelerators in benchmarks that are NOT true 3D accelerator benchmarks.
True. One reason why the popularity weights must be limited to reasonable levels. And global boints must remain relatively high compared to hw boints.

massman
02-24-2007, 10:31 AM
G70 in 3D Mark01 record for example. 1 Year ago we had the 50k battle between KP and Hipro5. At this very moment G70's have not much trouble to get +50k with the help of a Conroe CPU.

In 5 years, current top 8800GTX 3D Mark scores will be much easier to reach because of the influence of the total system specs. So... in fact it's a bit funny to hand out points in each category of 3D accelerators in benchmarks that are NOT true 3D accelerator benchmarks.

The only solution, impossible to implement, is to work with time/avg based points. So a score would gain points based on the date it was published compared to the date the card was available and the average score.

But that's just impossible ... :(

That's why, like mtzki already mentioned, we need to keep the hardware points pretty low.

wittekakker
02-24-2007, 10:47 AM
Why would you want to do that?
Compare C2D vs Northwood/Prescott, it's a joke when it comes down to performance.

mtzki
02-24-2007, 11:27 AM
Comments on new rev2 situation:
http://www.hwbot.org/forums/viewtopic.php?pid=5989#p5989

Top40 idea dumped.


separating the platforms is not an option?

Intel /AMD
There will probably be some lists later showing some rankings for these. No effect on boints tho.

mtzki
02-25-2007, 06:59 AM
LOL this guy is really too fun for words :rolleyes:

"so, all but one team is fine with the changes and you decide not to do them. Hmmm....................That doesn't add up............................"

As where it stands about the changes atm opinions-wise, it's apparent XS is strictly (very, if i may say...) against it, OCX strictly for it...and for both of these it's pretty easy to see how come.


aaargh :slapass:
You're taking words from two different guys there, lower part mine. What are you exactly trying to say?? :confused:

mtzki
02-26-2007, 08:51 AM
tbh i knew what you meant...just that i bet to most people (who have not been at the bot forum) your message must have looked like an attack to my direction...

mtzki
02-26-2007, 09:07 AM
You guys have all the right to do with it whatever you want, and i think you will notice the effect and see yourself when it's wrong eh :)
Yah, sure looks like that. :D

zakelwe
03-01-2007, 01:44 PM
So, in summary, the Orb has hwbot licked when it comes to anything apart from the top guys or the most popular hardware benched.

Until you guys do per class results it will always be playing second fiddle to the Orb for 99% of video cards benched.

Why buy a hamburger when you can get a sirloin steak? :confused:

Regards

Andy

wittekakker
03-01-2007, 02:04 PM
A reminder, ORB is nothing like hwbot.

zakelwe
03-01-2007, 02:08 PM
A reminder, ORB is nothing like hwbot.

Exactly right, that is where Hwbot is going wrong. Thanks for backing me up on this significant point :toast:

Regards

Andy

G H Z
03-01-2007, 08:02 PM
The only difference is the boints. ORB has HOF, hwbot has HOF. You don't get in the HOF at either place with a 7600GS. Like the ORB you can sort results in any card or processor category and find out who's on top. In fact, hwbot has more complete and accurate processor listings than the ORB.

jmke
03-02-2007, 04:20 AM
very correct G H Z:) HWBot encourages competition no matter what hardware you have, overclock it, post results, get points, more points if you bench with popular hardware, less if you bench with unknown hardware.

read through mrlobber's post very attentively because that is how the HWBot is designed to work, collect results from different hardware and offer rewards for getting high scores in popular categories as well as global awards when you look overall.

Witte is completely right, ORB is nothing like HWBot, and thank god for that :)

imho HWBot > ORB.

I benched ORB 5 years ago and it lost its appeal over time, HWBot made me bench again. A different approach which will be more rewarding over time.

jmke
03-02-2007, 04:35 AM
last time I checked, XS = English.

HWbot can become the most user friendly app in the world for hardcore benchers as it is made and supported by them. There are plenty of ways to do this but it needs investment as nothing comes for free; Richbastard's spare time is not for us to abuse and he's not getting paid a single dime from all this, he's running in RED every month for hosting costs and covering for the dedicated server hardware.

He has the capabilities to write a HWBot bench program, you link benchmarks to the HWbot program (which can potentially check for cheating tools) , you start the benchy through HWBot proggie, after benchmark the results are shown in the HWbot program and you can upload them with a single click to the website.

WPrime32 is but a very early sample of what's possible. but it takes time, and time=money. Without money... no time:)

jmke
03-02-2007, 07:16 AM
then why is sherlocky writing in Dutch ;)

server hardware €2000+
hosting costs/month €200-300

massman
03-02-2007, 09:02 AM
So, in summary, the Orb has hwbot licked when it comes to anything apart from the top guys or the most popular hardware benched.

Until you guys do per class results it will always be playing second fiddle to the Orb for 99% of video cards benched.

Why buy a hamburger when you can get a sirloin steak? :confused:

Regards

Andy

Rather disappointing reply, Andy, not really what I'm used to seeing from you. I guess you're just fed up with the lack of points you get for your old benchmark results?

luihed
03-03-2007, 08:28 PM
Just joined Hwbot today and I think its pretty cool..... Cheers to the people behind it :toast:

The only thing so far that I see wrong is how people can submit a 3dmark score without a valid link.... I've probably seen more screenies as validation than actual compare links in the categories that I was in..... on one of them it was just a score with a Generic VGA in the description..... I checked the orb and they weren't even submitted....

My suggestion is to make the "compare url" to futuremark as the only way to validate a score.... I just find it very weird for somebody to get a score worth saving as a screenie but not submit it....

HeavyH20
03-03-2007, 09:57 PM
Well, with the free version, you can only have one published result. Makes it hard for those who do not have the professional version.

jmke
03-04-2007, 12:17 AM
exactly HeavyH20, if FM changes that policy we can "force" link validation

luihed
03-04-2007, 04:46 PM
Well, with the free version, you can only have one published result. Makes it hard for those who do not have the professional version.

Lol.... so much for that idea... I have around 60 pages of 3dmark01 benches in my project manager, I think I benched 3dm03 around 5 times, 3dm05 3 times and 06 twice.... I'm still living in the past and just bench 3dm01 lol....

jmke
03-08-2007, 12:54 PM
JFYI I'm not spamming this thread, posts were deleted between my replies;)

S_A_V
03-09-2007, 08:24 AM
My suggestion is to make the "compare url" to futuremark as the only way to validate a score....
Totally agreed.


Well, with the free version, you can only have one published result. Makes it hard for those who do not have the professional version.
You absolutely don't need the professional version to publish as many scores as you wish. Anyone can create as many free ORB accounts as needed and use one account per benched videocard. Today I have ~25 free ORB accounts and can keep almost all of my scores published online at the same time.


if FM changes that policy we can "force" link validation
Forcing link validation at hwbot would be a great improvement - with or without any changes in futuremark policy.

jmke
03-09-2007, 08:30 AM
I don't think it was FM's goal for people to create Nth accounts ;)