New scoring system being beta tested



RickH
11-01-2006, 04:29 PM
According to this announcement (http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=9545) on the WCG forums, WCG has begun beta testing a new scoring algorithm which is intended to improve scoring fairness and punish attempted cheating.

Instead of throwing out the high and low scores of the 3-result quorum and granting the middle score to all 3 members, the new method will discard any scores which appear to be statistical outliers (method for determining "outliers" not described), and average the remaining scores.

Now, those still using the optimized clients, pay attention: any claimed score which is significantly higher than the awarded credit (again, "significantly higher" not defined) will be granted only half of the computed credit award.
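
As a rough sketch of what the granting step might look like (purely illustrative; WCG hasn't said how "outlier" or "significantly higher" will be defined, so both of those are placeholders here):

def grant_credit(claims, is_outlier, too_high_factor=1.5):
    """Sketch of the announced rules, not WCG's actual code.
    claims: credits claimed by each quorum member.
    is_outlier: placeholder callable deciding if a claim is a statistical outlier.
    too_high_factor: placeholder for 'significantly higher than the award'."""
    kept = [c for c in claims if not is_outlier(c, claims)] or claims
    award = sum(kept) / len(kept)            # average of the non-outlier claims
    granted = []
    for c in claims:
        if c > too_high_factor * award:      # looks like an inflated claim
            granted.append(award / 2)        # cheat penalty: half credit
        else:
            granted.append(award)
    return granted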

This is all anyone knows at this point, so if you have any further questions, they should probably go to the WCG forum.

I expect that the exact details of how "outliers" are determined, and how attempted cheats are defined, will be fine-tuned during the beta testing. Beta testers should pay close attention to how these new beta units are scored, and be ready to point out any problems they see in the WCG beta tester forum to make sure this new system works correctly. The test has already started; I've got a couple of the beta WUs queued up now.

[XC] flat-four
11-01-2006, 04:43 PM
I'm sure they will pick a random sample of, say, 25 different WUs (or run the same WU on 200 different machines), and then figure out the mean and standard deviation of the different results. If a result is, say, more than 2 standard deviations away from the mean, the system will assume that the WU was done by cheating. There will always be a certain amount of variation, but statistically it should fall within a certain range.
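
Something like this, if they really do go with a 2-standard-deviation cutoff (just my guess at how the check would look):

import statistics

def flag_suspect_claims(claims, cutoff=2.0):
    """Guess at a simple check: flag any claim more than `cutoff`
    sample standard deviations away from the mean of the sample."""
    mean = statistics.mean(claims)
    sd = statistics.stdev(claims)
    return [c for c in claims if sd and abs(c - mean) / sd > cutoff]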

[XC]Atomicpineapple
11-01-2006, 04:49 PM
I dont see this working well with our high specced, overclocked to the max systems.

[XC] flat-four
11-01-2006, 05:07 PM
I think it should work out pretty well. The given points between a highly overclocked c2d or a64 should be within the standard deviation limit. The older axp's and p4's will be on the lower limit and we'll be on the upper limit. If the new points system is similar to what I've said, then we'll actually get the points that we claim.

[XC] mysticmerlin
11-01-2006, 05:13 PM
It would be nice to get credit for what we do.

Yoxxy
11-01-2006, 05:34 PM
Seems fair, but I don't really understand the whole program. I have tons and tons of pending validation links and such. I understand why; I just think the whole quorum and scoring validation needs to be reworked.

RickH
11-01-2006, 06:00 PM
The problem with all these theoretical statistical measures in WCG is that all of the WCG projects so far have had very non-linear, unpredictable completion times. It's not simple like SETI; the time required to crunch varies a lot depending on exactly what the science app finds as it's crunching that particular WU. You can't just say that because these other 100 HDC units took about 2 hours, this next one should too. My BoincView history currently shows HDC units taking anywhere from 26 minutes to 3 hours. Earlier HDC units were generally around 1 hour, with a few shorter or longer ones mixed in, but recent WUs have been much longer, in the 2-3 hour range. FAAH WUs seem more consistent than that, but back when we were doing HPF, they varied widely as well.

I don't really see how WCG is planning on reliably determining outliers in this situation, with a quorum of only 3 results for the WU in question, and limited ability to compare against other WUs. I'm crunching a couple of the beta WUs now and will be watching closely how they end up being scored.

[XC] flat-four
11-01-2006, 06:02 PM
Is it right that every WU is only sent to 3 hosts to complete?

RickH
11-01-2006, 06:11 PM
Yep, unless there's a problem. They used to send a WU to 4 hosts and score it once the first 3 results came in (the 4th copy was just "insurance"). To reduce wasted crunching and improve overall throughput, a couple months ago they changed it to initially send to the minimum required 3 hosts and only send to a 4th (or 5th...) as needed to make up for failures.

rob725
11-01-2006, 06:31 PM
The score is determined by multiplying your benchmark factor by the time it takes, with the goal that different machines will score about the same. So as long as their system is loose enough to account for the shortcomings in their benchmarking, what they propose should be fine.

RickH
11-01-2006, 11:20 PM
Yes, thanks, I know that. My point was that even on the same machine, there's no way to predict in advance how long a particular WU should take, and therefore, no way to predict what the "correct" score is, and what should be considered outliers. My HDC WUs, for example, have legitimately earned anywhere from 10 to 70 BOINC points, and WCG doesn't know when it sends them out, which it will be. You can't just say that anything from 0-90 points, or whatever, is valid and the rest are outliers; that's useless, and won't catch any true bad results except those on the very longest WUs.

The only way I know of to judge the correct score for a WU is to crunch it and see what you get. With a quorum of only 3, it's hard to tell which are the legitimate scores and which are outliers, if any. If a WU quorum returns claims of 10, 20, and 22 points, it could be that the 10 is a bad score (maybe a Linux box) and the correct average is 21. Or it could also be that 10 is the correct score, and the 20 and 22 scores come from optimized clients with inflated claims. How can you tell? What do you do if the claims are 10, 20, and 30? What if two Linux boxes end up in the same quorum and you get claims of 9, 11, and 21? Does the correct 21 claimer get penalized for cheating and receive only 5 credits (half of the average 10 of the two non-outliers), while simultaneously shortchanging the Linux boxen just because two of them ended up together?

I just don't see how they can reliably identify outliers in a sample population of 3, or how they can usefully compare different WU results to try to get a larger sample population to derive stats from. But the beta is just starting, so hopefully we'll find out soon.
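
For what it's worth, a plain 2-standard-deviation cutoff like the one speculated above can't even fire on a quorum of 3: with only three samples, no value can ever sit more than about 1.15 sample standard deviations from the mean. A quick check with the claims from my examples:

import statistics

for quorum in ([10, 20, 22], [10, 20, 30], [9, 11, 21]):
    mean = statistics.mean(quorum)
    sd = statistics.stdev(quorum)
    worst = max(abs(c - mean) / sd for c in quorum)
    print(quorum, "largest deviation:", round(worst, 2), "standard deviations")
# Each quorum tops out around 1.0-1.15, so a 2-sigma rule would never flag
# anything in a 3-result quorum; whatever WCG ends up using, it can't be that.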

[XC]melymel
11-01-2006, 11:22 PM
How will this affect P4s? Optimized clients will cause it to be classed as a cheat, but if you're granted what the BOINC benchmarks dictate then P4s will get a thorough beating. That was the whole reason we/I used opti clients in the first place; the quorum would actually be fairer :confused:.

Can anyone shed some light? :toast:

RickH
11-01-2006, 11:32 PM
If the optimized client just raises your claims up into the same range as other hosts, it won't register as cheating and things will continue as before. If it raises your claims significantly above other hosts, then it really is cheating and will likely be caught and penalized as such.

If you quit using the optimized client and your claims are abnormally low, it should have pretty much the same effect as the previous system. Your lowball claim would be considered an outlier and discarded, and then everyone would get the average of the remaining good claims.

Unless two hosts with abnormally low claims end up in the same quorum, in which case the poor third host with the correct claim might get tagged for cheating, while the two lowball claims get averaged together.

Which is why so much depends on how WCG identifies "outliers" in this new system.

Martijn
11-01-2006, 11:50 PM
If this is what's going to happen, I'm out. I hate 30-point claims for FAAH WUs taking 5 hours :mad:. This will be a complete waste of the D820. :slapass:

I have some more useful stuff to do :rolleyes:

RickH
11-02-2006, 12:06 AM
It shouldn't really be any different than before. Just like before, if 2 lowball claims end up in the same quorum, everybody gets shorted, but in the great majority of cases the lowball claim just gets thrown out and all 3 get normal credit.

The only real impact should be on those trying to use the optimized clients to cheat and claim significantly higher credit than the WU is really worth. Before, the inflated high claims would usually be thrown out and everyone would get normal credit, but sometimes the cheater would get lucky and end up in a quorum with another cheater, and then everyone would get inflated credit.

With the new system, cheaters would usually get half credit, and then sometimes get lucky and get inflated credit, but probably not enough to make up for getting half credit most of the time.

I'm generally in favor of cheaters getting spanked, as long as honest users don't get smacked around at the same time.

Movieman
11-02-2006, 12:31 AM
I'd suggest that we look at this for at least a few days to a week to see what the effects really are. If everyone is using the stock client, then there shouldn't be an issue. If you're not, then I'd really appreciate it if you were on the stock client, as not only is it overclaiming, it affects any internal competitions that we have here between us.
Personally, I don't care what method they use as long as it's fair to all.
I use the stock 5.4.11 and don't notice any problems.

I dont see this working well with our high specced, overclocked to the max systems.
Am I missing something? Our machinery should be computing faster as I see it, not claiming more per work unit. It's the fact that we can do more work units in a given time than the average Joe that gives us our advantage.
IE: Joe Dell does four 6-hour work units a day, claims 400 points each, gets 1600 points a day.
RAMMIE's frozen FX-57 does twelve 6-hour work units a day, claims the same 400 points per unit, gets 4800 points a day.
Is my logic correct?
(can't resist this one:D )
Victor takes his box of new parts, calls Jin117, has a 3-stage shipped in, freezes that 6700 to -200C, cranks it up to 6000MHz, does 60 WU a day, claims the same 400 points per WU and gets 24,000 points a day!:ROTF:
Sorry, had to do that!

[XC] Adywebb
11-02-2006, 07:33 AM
Movieman has it figured exactly right - no-one using the standard boinc client will see any difference in their points :thumbsup:

The good news is that Linux users should see a nice increase to Windows parity :cool:


EDIT: just had a report in on the beta testing from one of PCReviews members:

BTW the new system, looking at the beta units, has yielded more points per unit than under the old system. So far based on 5 units.

I am running an Intel 805 o/c to 3.5 and an Intel 915 stock.

Will review tonight what the difference is on 15 units, but so far nothing to complain about.

[XC]Atomicpineapple
11-02-2006, 07:48 AM
The point I was trying to make is that it depends how tight they set their outlier tolerances. If they get it wrong, it's possible that our high-benchmarking machines may get singled out as cheats when they are merely guilty of stupendous speed!

Got Chow
11-02-2006, 08:14 AM
Before everyone goes burning down WCG, remember that they are pretty darn good at working out issues that teams or individuals have. When the day comes and people are having issues, just e-mail them and they will surely tweak things around to make things right.

A good example: a member had some sort of supercomputer, and in a few hours it crunched through his daily quota of WUs, so WCG stopped sending that PC anything to crunch. WCG recognized that his results were good and that he had an especially fast machine, then manually raised his quota to 120 WUs per day.

Plus, it's just points. :P;)

[XC]Atomicpineapple
11-02-2006, 08:39 AM
True, true, but after recent events it's important that the 'c word' never even comes close to XS. After recent incidents you never know what the BS masters would cook up if we contacted WCG and said 'um look, change your system because our legit machines are identified as cheats'. Perhaps they'd concoct some make-believe fairy tale where the WCG staff have a secret pact to bring down Easynews by fiddling the points system. Of course no-one would take them seriously, but just having to see that c**p happening would be a drag.

Got Chow
11-02-2006, 09:18 AM
No I understand... I'm not part of the XS team, but I've seen enough on here to realize a lot of crap that you guys get. I typically don't put that much faith in large projects or support staff and such, but WCG is one of those companies where I really believe they do take care of their supporters, which is why I'd say you guys shouldn't worry about it. :)

Jose
11-02-2006, 09:55 AM
Please remember that WCG does a good job of identifying the hardware that is behind the crunching and the OS. So the outliers will basically be defined not as a whole but as compared to similar configs. And yes, more than 2 standard deviations from the mean is probably the standard they will use.

[XC]Atomicpineapple
11-02-2006, 10:06 AM
Jose, so you're saying they'll have an E6600 @3GHz predicted score range, and then, say, an Opty 170 @2.8GHz predicted range. Look at your hardware to see which category you're in, then judge your scores and apply outliers accordingly? If that's the case my worries are over and all is well.

RickH
11-02-2006, 10:21 AM
If they get it wrong, it's possible that our high-benchmarking machines may get singled out as cheats when they are merely guilty of stupendous speed!
Just being fast won't cause any problems. All they're looking at is how many credits you try to claim for each WU, not how fast you crunched them. If the BOINC benchmarks are working right, then no matter how fast you crunched, you should still end up trying to claim about the same number of points as your other result quorum members.


Please remember that WCG does a good job of identifying the hardware that is behind the crunching and the OS. So the outliers will basically be defined not as a whole but as compared to similar configs. And yes, more than 2 standard deviations from the mean is probably the standard they will use.
As I said above, each WU is unique, and can't be usefully compared with results for other WUs. All HDC WUs, for example, start with the same "amount" of specified work, and yet are legitimately worth anywhere from 10 to 70 points. So they're stuck trying to compute stats from only 3 data points (the results for this particular WU), not stats from all WUs.

The most they might be able to do is compute the claimed credits per hour for that particular config to see if they look bogus, but even then, they only ID the CPU type, not the actual (overclocked) speed. You can't compare results from my 3.5GHz E6400 with those from stock E6400s, or all the other possible speeds between and above.

uOpt
11-02-2006, 12:08 PM
I don't understand. They just mix three machines and average them out? What happens to my 3.6 GHz Conroe?

[XC] Malkec
11-02-2006, 12:14 PM
I don't understand. They just mix three machines and average them out? What happens to my 3.6 GHz Conroe?

It's increasing the average :D

an0nym0us
11-02-2006, 12:38 PM
I dont see this working well with our high specced, overclocked to the max systems.
cause we're XTREME!!!!!!!!!!!!!!! someone pass me a mountain dew and hop in my jacked up Wrangler! :banana:

sierra_bound
11-02-2006, 12:39 PM
I will support any new scoring system that doesn't punish Linux users.:)

Fr3ak
11-02-2006, 12:54 PM
On a side note, how does WCG know what CPU someone is using anyway?
I can't even tell what PCs are on my account and what benches they get.
The current credit system is far from being fair for PCs that are clocked way over average, and I have the feeling the new system is not really any better...

All those science project staffs seem to know a hell of a lot when it comes to their field, but at computer science they simply suck.

[XC]Atomicpineapple
11-02-2006, 01:00 PM
Really the points system that would suit us best is to take the arithmetic CPU bench from Sandra (as they're generally recognised as being correct), chuck that into BOINC, and then say that higher-benchmarking systems get more points. Alternatively just say that all HDC WUs are worth X points, all FAAH are worth Y points, and then the machine that chucks results out the fastest gets the most points. Simple and fair, but it won't happen.

[XC] Adywebb
11-02-2006, 01:34 PM
Alternatively just say that all HDC WUs are worth X points, all FAAH are worth Y points, and then the machine that chucks results out the fastest gets the most points. Simple and fair, but it won't happen.
Unfortunately you can't do that because they vary in length even amongst the same type.

Martijn
11-02-2006, 01:51 PM
Unfortunately you can't do that because they vary in length even amongst the same type.
Everything can be solved with formulas :fact:

;)

Fr3ak
11-02-2006, 02:21 PM
Without thinking too much about it, maybe 2 minutes, why don't they do something like this?

Instead of giving scores based on whatever, the BOINC benchmark and the like, they could use a dummy WU.
That dummy WU is computed by an ordinary PC, i.e. an AXP 2000+, and say it needs 15 mins to compute it and it would get x credits for it.
Now a C2D only needs 7 mins 30 secs, so it gets 2x credits.
A P3 500 needs an hour for it, so it only gets 1/4x credits.

That kind of benchmark would already be better than the built-in BOINC one.
Now in case they want to use a quorum of 3, whatever a quorum is, they could add all 3 credits and divide by 3, so everyone gets the average credits.

I don't see how anyone could cheat that way, but in case someone is able to, they would need to know at least what CPU someone has and what MHz it runs at to compare it to the database.

I am aware of WUs not being comparable in complexity, but I am sure with a benchmark like the one mentioned above, there is a way to grant credits according to CPU time used.
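
Roughly what I have in mind, as a sketch (the 15 minutes, the 7 mins 30 secs and the 'x' are just the made-up numbers from above):

REF_MINUTES = 15.0    # the AXP 2000+ reference time for the dummy WU
X_CREDITS = 10.0      # "x": what the reference box earns for the dummy WU (made up)

def dummy_wu_credit(my_minutes):
    """Credit for the dummy WU scales with speed: 2x for a C2D that
    finishes in 7.5 minutes, 1/4x for a P3-500 that needs an hour."""
    return X_CREDITS * (REF_MINUTES / my_minutes)

def claimed_credit(my_dummy_minutes, cpu_minutes_on_real_wu):
    """Reuse the same speed ratio as a benchmark and grant credit for real
    WUs according to the CPU time used, as suggested above."""
    speed = REF_MINUTES / my_dummy_minutes
    return (X_CREDITS / REF_MINUTES) * speed * cpu_minutes_on_real_wu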

[XC] Adywebb
11-02-2006, 02:32 PM
Everything can be solved with formulas :fact:

;)
Very true - but that wasn't the issue I was answering, I was merely stating why you can't have every HDC or FAAH unit worth the same amount of points ;)

Jose
11-02-2006, 02:53 PM
People, let us chill.


Please allow the people at WCG to do their beta testing. And please let's not start trying to outguess what they will do and are doing.

We are running the risk of overheating and being ripe for overreactions that will help no one. We are getting close to the hysteria that preceded the Rosetta debacle.

That said:

There is sufficient data collected to do cluster analysis. And that includes a good idea of the configurations of the machines we run, the types of WUs we have done, etc. When we sign up, we ID our computers to their servers.

Cluster analysis can yield a good estimate of the credits that have been granted to relatively similar machines doing relatively similar work. It is from those estimates that the allotment of credits (a range) can be selected.

That said: why don't we allow WCG to inform us of what they are doing, contact them and ask questions. They have shown they are accessible, and they have also shown they won't do stupid things like the people in the other project did.

The WCG people saw what happened at Rosetta: I don't think they will risk a brouhaha like that one.

Movieman
11-02-2006, 02:56 PM
After reading, I still think that no matter what system they use, we will be on top UNLESS they see what we claim as being "outliers", and if that does happen, we talk to them, show them what we are running, get a sample WU from them and run it to death to prove our point. This is also why it is so damned important that everyone is on the stock BOINC client.
I can't stress that strongly enough.
I can make a case for anything we do but only if on a stock client.

[XC] Adywebb
11-02-2006, 03:07 PM
Jose is right - we are getting in a tiz without even knowing what the outcome is likely to be, and I'm pretty sure it won't be to the detriment of XS.

And as MM has just said, everyone needs to make sure they are using the stock client, so that if the time comes it at least gives him a chance to make a case for change if need be :up:

uOpt
11-02-2006, 03:10 PM
While it is true that they cannot take the actual WU time computation, they could take one particular WU and make that into a benchmark, using the real software. No more tinkering with the client since its benchmark is not used.

Granted, it wastes some resources, because that WU will now be computed 80,000 times.

But sure as hell it will be better than that nonsense of taking Dhrystones and Whetstones as an indicator of the speed of a modern CPU (that's what BOINC does and why fiddling with the compiler is so effective) :rolleyes:

I have a farm ready but I want a working scoring system before I commit a couple hundred bucks worth of electricity.

Jose
11-02-2006, 03:14 PM
As long as we hold to stock client settings we will not be the outliers. Please remember that there is another issue in the credits granted in WCG and that is the quorum. That is the great equalizer.

And finally: the advantage we have in the WCG war for credits is in the speed of our machines. WE do more work per hour. Thus we get more credit. That is the reason we have been able to rise so fast in the standings and gain on monster sized teams like clubic, gay and even EasyNews. So please let us NOT panic. It is not worth it.

MAn I feel dumb

[XC] Adywebb
11-02-2006, 03:18 PM
Shouldn't that be 'please let us NOT panic' Jose :p:

Jose
11-02-2006, 03:21 PM
While it is true that they cannot take the actual WU time computation, they could take one particular WU and make that into a benchmark, using the real software. No more tinkering with the client since its benchmark is not used.

Granted, it wastes some resources, because that WU will now be computed 80,000 times.

But sure as hell it will be better than that nonsense of taking Dhrystones and Whetstones as an indicator of the speed of a modern CPU (that's what BOINC does and why fiddling with the compiler is so effective) :rolleyes:

I have a farm ready but I want a working scoring system before I commit a couple hundred bucks worth of electricity.


The credit granting system at WCG is a tad more complex than the typical BOINC benchmark.

Using a single WU's benchmark is indeed a waste of resources, and believe me, the people at WCG are tied enough by the fact that they are required to run a quorum (they were able to reduce the number but still it is more than one); they won't go that way. That is why I think they would go the cluster analysis way.

That said:

Hey YOU !!!!!:slapass: :slapass: :slap: :slapass: :slap: Get that farm into gear: XS has a lot of teams that need your help, especially this one that is facing what have to be the monster teams (in sheer size) of DC.


Crunch or your overgrown chicken will be served at Thanksgiving dinner.

Jose
11-02-2006, 03:26 PM
Shouldn't that be 'please let us NOT panic' Jose :p:


Byte me!!! :slap:

I do not painc. LOL LOL LOL

But lets say I was burned once.

Movieman
11-02-2006, 03:32 PM
While it is true that they cannot take the actual WU time computation, they could take one particular WU and make that into a benchmark, using the real software. No more tinkering with the client since its benchmark is not used.

Granted, it wastes some resources, because that WU will now be computed 80,000 times.

But sure as hell it will be better than that nonsense of taking Dhrystones and Whetstones as an indicator of the speed of a modern CPU (that's what BOINC does and why fiddling with the compiler is so effective) :rolleyes:

I have a farm ready but I want a working scoring system before I commit a couple hundred bucks worth of electricity.
I agree on this. I hate the BOINC method. Something work-based and I'll put my machine up against most (OK, a little ego left in my old age:D )
I think they are trying just what you suggested with sending out that beta work unit 800 times. What does concern me is that with a 1/4 million users it might never hit a machine at XS. IF that happens, I will ask them to send us one that we can run here on all the machines. That shouldn't be a big problem.

Byte me!!! :slap:

I do not painc. LOL LOL LOL

But lets say I was burned once.

Jose-->:flame: <--Baker...............:ROTF:
Sorry, My Friend, couldn't resist.
Just remember, this isn't Rosetta and we learned a lot there.;)

uOpt
11-02-2006, 04:11 PM
Just so that I get that right: I am not supposed to use an "optimized" BOINC?

Movieman
11-02-2006, 04:27 PM
Just so that I get that right: I am not supposed to use an "optimized" BOINC?
Correct. Use the stock 5.4.11 or 5.4.9, and I think there is a newer 5.6 that's a stock client. Keeps us all on a level playing field.

joshd
11-02-2006, 04:27 PM
Just so that I get that right: I am not supposed to use an "optimized" BOINC?

nay, MM wants us all on stock, so there is no way people can call us cheaters.

EDIT: ^^^ exactly, MM

Movieman
11-02-2006, 04:50 PM
nay, MM wants us all on stock, so there is no way people can call us cheaters.

EDIT: ^^^ exactly, MM
Amen to that! I'd rather give all my points to anyone than be called a cheat.
That gets my Irish up.:D

phicks
11-02-2006, 05:07 PM
I'm not even on the team and I switched to stock when Movieman requested you guys to. :D

-Acid-
11-02-2006, 05:10 PM
we are keeping it simple so the haters have nothing to cry about and have to eat humble pie when we take 1st

leviathon-nz
11-02-2006, 05:19 PM
Ahh pie, so tasty:D but back to the point. I don't see this as being anything to worry about really; as long as things are kept fair across the board they can change as much or as little as they want.

[XC] mysticmerlin
11-02-2006, 05:25 PM
nay, MM wants us all on stock, so there is no way people can call us cheaters.

We as a team, not just MM the :cheer:, need to do it so when/if the feces hits the air conditioner we have a valid leg to stand on.

Movieman
11-02-2006, 05:27 PM
I'm not even on the team and I switched to stock when Movieman requested you guys to. :D
And here I was thinking it was my irresistible charm, devilish looks and poetic prose that convinced you! :ROTF:
Oh well, I'll try harder next time.;)

We as a team, not just MM the :cheer:, need to do it so when/if the feces hits the air conditioner we have a valid leg to stand on.
I'll add some to that: Other benefits: I just plain FEEL better doing it this way.
The points are fun, but thats all they are.
Honor,Integrity, and Trust mean so much more to me.
Think on this: if you're trying to catch me in WCG you can be damned sure that you and I are on equal footing and all is fair. Forget the rest of the DC world, this has to do with honor amongst ourselves.
I'd hate to think I'm trying to catch someone on this team and was handicapped because they were using a modified version of BOINC.
I have enough of a damned handicap just being on the Xeons with you guys on your overclocked conroes!
But not for long!:D

-Acid-
11-02-2006, 05:30 PM
lol your wern't trying lolololol :P

phicks
11-02-2006, 05:32 PM
And here I was thinking it was my irresistible charm, devilish looks and poetic prose that convinced you! :ROTF:
Oh well, I'll try harder next time.;)

Well maybe it was. :D

Movieman
11-02-2006, 05:40 PM
Well maybe it was. :D
Aww shucks Missus, now I'm all flushed and golly gee embarrassed!:D

phicks
11-02-2006, 05:48 PM
Aww shucks Missus, now I'm all flushed and golly gee embarrassed!:D

:D Well that was easy. :p:

[XC] leviathan18
11-02-2006, 05:53 PM
toby go back to the kitchen bad toby

uOpt
11-02-2006, 06:01 PM
nay, MM wants us all on stock, so there is no way people can call us cheaters.


Cheaters?

It is WCG (and Rosetta) who chose to do the braindead thing of using a different application's benchmark (actually a non-benchmark) instead of doing their own measurements on their own code. And now we are cheaters?

I will under no circumstances work on a project that discriminates against Linux (or FreeBSD for that matter). Does anybody have the numbers ready? What are these clients' 32- and 64-bit performance numbers on the different OSes?

Fr3ak
11-02-2006, 09:49 PM
Don't be too mad about the client issue.
There is no real 64-bit client to begin with.
Furthermore a higher benchmark is somewhat useless, as the highest and lowest credits are put in the trash and everyone gets the middle credits with that quorum system. So unless you are lucky and the WU you just computed is also being computed by 2 other guys with opti clients and high-end rigs, you won't benefit at all from it.
The only reason to run a faster-than-average PC is that it can compute more WUs in the same time, so it gets more credits.
But I still don't like that kind of scoring.
I would be running a lot more PCs, but I don't feel like it's worth the money at this point.

Movieman
11-02-2006, 09:55 PM
Don't be too mad about the client issue.
There is no real 64-bit client to begin with.
Furthermore a higher benchmark is somewhat useless, as the highest and lowest credits are put in the trash and everyone gets the middle credits with that quorum system. So unless you are lucky and the WU you just computed is also being computed by 2 other guys with opti clients and high-end rigs, you won't benefit at all from it.
The only reason to run a faster-than-average PC is that it can compute more WUs in the same time, so it gets more credits.
But I still don't like that kind of scoring.
I would be running a lot more PCs, but I don't feel like it's worth the money at this point.
The scoring system isn't exactly my cup of tea either.
Some day one of these decent projects will invest the time and money and make an Intel-optimised app, one for AMD, one for Windows, Linux, etc., where you choose the one that fits what you own, and they (if they have a good app) will have people running at them to sign up.
Now we wake up and look at reality, choose the best app that does the best research, and we live with the scoring system they have.
This isn't perfect, but it can't be too bad, cause look at where we are after what? 2 months? Against teams that have been here how long?
I don't see an issue.

Fr3ak
11-03-2006, 05:41 AM
Apart from complaining, one can't do anything against it anyway.
Sure we have to live with it, but I like things to be perfect.
Of course it will never be perfect.

The scoring system is one thing; the implementation of the latest SSE instructions is another thing, which would help compute more WUs as well, so it would be good for science.
There is a lot of computing power going down the drain that could be used to find a cure sooner.

rob725
11-03-2006, 06:18 AM
Agreed; I would much rather see their programming resources being used for getting the most out of the machines they have crunching, than for scoring-tweaks and screensavers.

brot
11-03-2006, 06:38 AM
The problem is that, for example, the FAAH work units are processed with AutoDock (see http://autodock.scripps.edu/ ), so the guys @ WCG can't really modify the files. But I read that there will be a new AutoDock version soon. hooray ;)

uOpt
11-03-2006, 07:23 AM
So it only leaves you with the standard 32-bit client, which in Linux gives substantially lower benchmark numbers than Windows, if I understand that correctly.

I still didn't get that three results thing: do they take the average or the middle result?

Fr3ak
11-03-2006, 09:21 AM
From what I understood, the middle result.
So the only advantage of running a faster-than-average PC is that it needs less time to compute a WU.

Movieman
11-03-2006, 09:37 AM
So it only leaves you with the standard 32-bit client, which in Linux gives substantially lower benchmark numbers than Windows, if I understand that correctly.

I still didn't get that three results thing: do they take the average or the middle result?
Correct 100%. The current version of BOINC is 5.6. With the version 5.8 release they will fix the "hit" that Linux takes. Just got that from their website yesterday.

uOpt
11-03-2006, 11:23 AM
So the only advantage of running a faster-than-average PC is that it needs less time to compute a WU.

I don't think this is correct.

AFAIK the score is the benchmark score multiplied by the CPU minutes used. I am pretty sure about that. Can somebody clarify?

So the only reason to have a faster PC is that you think there might be somebody even faster.

Movieman
11-03-2006, 11:56 AM
I don't think this is correct.

AFAIK the score is the benchmark score multiplied by the CPU minutes used. I am pretty sure about that. Can somebody clarify?

So the only reason to have a faster PC is that you think there might be somebody even faster.
Look at it this way: your monster takes 2 hours to do the unit and claims 100 points; the other 2 in the quorum are P3-600's that take 12 hours, and one claims 90 points, one claims 95 points. Everyone gets the middle score of 95 points, BUT you're doing 6 WU for every one they do, so in 12 hours they get 95 points and you finish 6 WU in that time and get 6x95=570 points...

rob725
11-03-2006, 11:59 AM
Isn't that how it should be? If my cpu is twice as fast, and so processes twice as many wu's in the same amount of time, shouldn't my only reward be twice as many points in the same amount of time?

Movieman
11-03-2006, 12:04 PM
Isn't that how it should be? If my cpu is twice as fast, and so processes twice as many wu's in the same amount of time, shouldn't my only reward be twice as many points in the same amount of time?
Yup!:D That simple. Of course it isn't; even identical machines will claim small differences, but in theory it should be that simple.
To be honest, I set it, run it and forget it. I average 40-45 points an hour due to HT and having a couple old DX2000/512/400's on my account, plus using anything that hits the house for work for a day or 2.
I check the numbers on my faster machines and I see no problems; I'm in the ballpark for claimed points versus the others, just doing the WU in a much smaller timeframe.

Fr3ak
11-03-2006, 12:29 PM
Here is what I found on the WCG page. Doesn't go into much detail, unfortunately.
Sources:
http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=winpointscalc

http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=linpointscalc



How are points calculated?
Points are calculated and awarded each time a work unit is completed and a result is successfully returned to World Community Grid Servers. Points are totaled across all machines aggregated under a specific World Community Grid Member.

Points are based upon the strength of your machine(s), measured against World Community Grid Comparison Device. First, the strength of your participating machine(s) is calculated by measuring the following parameters of your machine against World Community Grid Comparison Device:

* CPU Power: The software periodically runs diagnostic tests to establish the processing power of your hardware configuration. These values are averaged and then divided by the CPU-Power value of World Community Grid Comparison Device. The averaged value is then multiplied by the run time used to complete the work unit and return the results to World Community Grid Servers.
* Random Access Memory (RAM): The software recognizes the amount of RAM in your hardware configuration. Each time the software starts, it detects any changes to the amount of installed RAM. This value is divided by the RAM value of World Community Grid Comparison Device. The result of this calculation is then multiplied by the run time used to complete the work unit and return the results to World Community Grid Servers.
* Hard Disk Storage: On your preferences page, you set the megabytes of hard disk space allocated and available to World Community Grid projects. The lesser of the amount of hard drive space allocated and the amount of total space available on your hard drive partition, is divided by the Hard Disk Storage value of World Community Grid Comparison Device. The result of this calculation is then multiplied by the run time used to complete the work unit and return the results to World Community Grid Servers.
* Effective Upstream Throughput: The software runs a diagnostic test on a regular basis that measures the upstream throughput of your hardware configuration, when communicating with World Community Grid Servers. These values are averaged, and the result is divided by the Effective Upstream Throughput value of World Community Grid Comparison Device. The result of this calculation is then multiplied by the run time used to complete the work unit and return the results to World Community Grid Servers.

The final values for all four parameters are weighted, totaled, and factored to generate a whole number of points greater than or equal to 1 for each result returned. While any individual parameter can overachieve the corresponding parameter for World Community Grid Comparison Device by any level, no work unit completed by any machine will earn more than twice the total number of points World Community Grid Comparison Device would earn for that same work unit. Note: The slightest variance in any of the five parameters coupled with the inherent differences across multiple applications and work units within one project will result in different point values being assigned per work unit completion.



How are points calculated for the Linux/Mac agent?
The Linux and Mac agents use BOINC. BOINC points are calculated in a two-step process. First, the points (also called credit) claimed by a host are determined. BOINC points are calculated based on a benchmark that is run periodically by the BOINC client. This benchmark is then run through a calculation that determines how much credit per second of run time that device should earn. More information about that formula is available at the following sites: http://boinc.berkeley.edu/credit.php and http://en.wikipedia.org/wiki/BOINC_Credit_System#Cobblestones

Second, once validation has been completed, BOINC gives the same credit for a result to every device that worked on the same work unit. BOINC calculates how much credit this should be by taking the claimed credit for each result that was determined to be valid, eliminating the low and high values and then averaging the rest.

This process eliminates the ability for malicious users to artificially claim higher points for their work.

Why does my Linux/Mac agent show different points than the web site?
The Linux/Mac agent, which is using BOINC, and the Windows agent, which is using the UD software, compute points differently. In particular, BOINC points are much lower than UD points. As a result, World Community Grid multiplies the points granted to a user for a result by 7 when the statistics are imported into World Community Grid’s web site. The BOINC client is not aware of this multiplication, and it thus reports the points that were granted by BOINC.
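
Read literally, the Linux/Mac part boils down to something like the following (a rough sketch, not WCG's actual code; the real benchmark constants are in the BOINC credit pages linked above):

def claimed_credit(cpu_seconds, whetstone_mflops, dhrystone_mips):
    # Roughly the "cobblestone" idea: about 100 credits per day on a
    # reference machine benching 1000 MFLOPS / 1000 MIPS.
    per_second = ((whetstone_mflops + dhrystone_mips) / 2.0) / 1000.0 * 100.0 / 86400.0
    return cpu_seconds * per_second

def granted_credit(valid_claims):
    # Drop the high and low claims and average what's left; with a quorum
    # of 3 that is simply the middle claim. Everyone gets this value.
    kept = sorted(valid_claims)[1:-1] or valid_claims
    return sum(kept) / len(kept)

def wcg_site_points(granted_boinc_credit):
    # WCG multiplies granted BOINC credit by 7 when importing to the web site.
    return granted_boinc_credit * 7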


Edit: I have something to add. I am still trying to understand the meaning of it, but it sounds to me like the opposite of what Dave just said.


What are points?
Your PC contribution is shown in three measures-points, total run time and results returned. The term points is simply used as a way of measuring the amount of computation your PC has contributed. If your PC works for three days on one work unit, or in those same three days completes 5 work units, you will accumulate the same number of points assuming that your PC worked at about the same level of effort in each scenario.

There is more to read, I am still at it.
Source: http://www.worldcommunitygrid.org/help/viewTopic.do?shortName=points#26

Movieman
11-03-2006, 01:04 PM
"What are points?
Your PC contribution is shown in three measures-points, total run time and results returned. The term points is simply used as a way of measuring the amount of computation your PC has contributed. If your PC works for three days on one work unit, or in those same three days completes 5 work units, you will accumulate the same number of points assuming that your PC worked at about the same level of effort in each scenario"
The confusion here is that that paragraph is talking about the "same" machine.
IE: your machine, not in comparison to another.
My point stands on comparing slow PCs to what we run: we do more WU in the same given time, we earn more points.
All you have to do is look at the WCG average across the project of 28 points per hour and then look at someone on the XS team and compare.
Off the top of my head, my machines are approx 42 an hour and DDT's are like 66 an hour, and someone with a few real fast machines like RAMMIE is over 100 I believe. The logic does hold true. Trust me! I really have looked at this.:D

Fr3ak
11-03-2006, 01:09 PM
OK, thanks for pointing that out. Was a misunderstanding on my side then.

Movieman
11-03-2006, 01:17 PM
OK, thanks for pointing that out. Was a misunderstanding on my side then.
No problem. Hey, the way that WCG has turned what should be simple into complex is what happens when you get damned bookkeepers involved.
Weighing upload bandwidth as part of the equation? Gimme a break!
What in God's name does it matter if you send the WU back at 10Mbit or on a 128k line? Nothing, and it shouldn't be a factor at all.
Hard drive space? Allot a gig and that's it. No weighing it as a factor. Ridiculous.:p:
I just checked, BOINC is using 167MB on my drive and I have 4 WU running, so why does 1 gig or 20 gig matter?
Bottom line is I look and my DX3600 gets substantially more than my DX2400, so I know something in the formula is correct.
I'm not going to pick it apart as to me, it isn't worth the energy to do so.;)
I'd rather spend the time trying to find a deal on some clo..oops..hardware..

rob725
11-03-2006, 01:24 PM
I believe the first points explanation applies to the wcg client only, but not boinc.

Movieman
11-03-2006, 01:32 PM
I believe the first points explanation applies to the wcg client only, but not boinc.
Hey Rob! Perfect timing, what's your points per hour with the dual Woodcrest showing, just for comparison?
My DX3600 shows this using the last 12 days compiled:
daily average:4612 PPD
divide that by 24 and you get 192.19 points per hour

Fr3ak
11-03-2006, 01:33 PM
Yeah, the formula is plain stupid.
When I read that free hard drive space is a factor, I was shocked.
IBM has smart guys. Why can't one of them correct the scoring system? Doesn't even take 10 mins for a genius to come up with a good formula :P

Movieman
11-03-2006, 01:36 PM
Yeah, the formula is plain stupid.
When I read that free hard drive space is a factor, I was shocked.
IBM has smart guys. Why can't one of them correct the scoring system? Doesn't even take 10 mins for a genius to come up with a good formula :P
Maybe cause all the geniuses are here?
More truth than joke to that statement my friend. ;)

[XC] mysticmerlin
11-03-2006, 01:49 PM
I'd rather spend the time trying to find a deal on some clo..oops..hardware..

Yep I saw that :slapass: :banana:

Fr3ak
11-03-2006, 01:54 PM
They forgot to take the amount of cold cathodes into the formula :hrhr:

Or the amount of fresh unpolluted air the PC may breathe. :rofl:

Movieman
11-03-2006, 01:57 PM
They forgot to take the amount of cold cathodes into the formula :hrhr:

Or the amount of fresh unpolluted air the PC may breathe. :rofl:
Damn, that's why I'm not hitting 200PPH! I'm a smoker!:ROTF:

Yep I saw that :slapass: :banana:
Saw what?:rolleyes:

Fr3ak
11-03-2006, 02:22 PM
You better close the windows and lock the door Dave ;)

[XC] mysticmerlin
11-03-2006, 04:21 PM
Damn, that's why I'm not hitting 200PPH! I'm a smoker!:ROTF:

Well, smoking lights doesn't help. I am only @ 64.3PPH :slap:

rob725
11-03-2006, 04:25 PM
Hey Rob! Perfect timing, what's your points per hour with the dual Woodcrest showing, just for comparison?
My DX3600 shows this using the last 12 days compiled:
daily average:4612 PPD
divide that by 24 and you get 192.19 points per hour

While that is a practical way to look at it (especially the longer the timespan used), it suffers from the wacky inconsistency of the daily crunching total caused by the quorum system. I have several machines that do absolutely nothing but crunch, but some days they are given credit for doing half a day's work and sometimes for a day and a half and everything in between. And this does not always average out in the short or mid term. So what I like to use for comparison is points/runtime.

So, looking at the woody chart below, the total runtime for all 4 cores is roughly 12 days, or about 288 core-hours. The points produced are 23,996, so points/core-hour are 83.32, giving 333.28 per machine-hour. So, on those days its runtime equals 4 core-days, I would expect to see 7999 points. Again, it will be a little higher because I processed some WUs before upping the speed to 2.33.

Also, I'm running this one on W2k3 Server. It benchmarks well below the same setup running XP, so is always under-claiming, so I expect the XP woody to score a little higher because it won't always be dragging down the quorum. It will be interesting to see how much.

Edit: Add Chart, doh.

STEvil
11-04-2006, 12:02 PM
Here is what I found on the WCG page. -snip-


oh my god. :slap: :slapass: :slap: :slapass: :slap: :slapass: :slap: :slapass: :slap: :slapass: :slap: :slapass: :nono: :mad:

Fr3ak
11-04-2006, 12:06 PM
What's wrong with me posting about it?

Haltech
11-04-2006, 01:49 PM
I'm losing approx 2000 pts a day with this new scoring...

Movieman
11-04-2006, 02:18 PM
I'm losing approx 2000 pts a day with this new scoring...
I wasn't aware it was in place yet. Just that they'd sent one WU out 800 times to get a database of the different claims to analyse.
When did you start seeing the loss?

What's wrong with me posting about it?
I don't think Stevil was talking about you, but them for using those factors to determine scores.

STEvil
11-04-2006, 03:29 PM
yes, the factors.

rob725
11-04-2006, 05:24 PM
They are still doing it the old way.

Haltech
11-05-2006, 12:06 AM
Well, my points have dropped a few thousand. I guess I'll change to only cancer WUs?

Jose
11-05-2006, 05:00 AM
Well, my points have dropped a few thousand. I guess I'll change to only cancer WUs?

Your points dropping is divine punishment for not whoring bandwidth you know where ....

First your points will drop, then your you know what will die :eek: :eek: :eek:

















PS








Don't shoot me !!!!! :rofl:

[XC] Malkec
11-05-2006, 08:30 AM
OC-lab's production dropped significantly:

3 days ago we did like 52k points with 146 results returned.
2 days ago we did only 39k points with 140 results.

What is going on? :stick:

rob725
11-05-2006, 09:11 AM
OC-lab's production dropped significantly:

3 days ago we did like 52k points with 146 results returned.
2 days ago we did only 39k points with 140 results.

What is going on? :stick:

Nothing to worry about; just the oddities of WCG and the quorum system. Global "production" usually drops off on the weekend; not sure why. Maybe a bunch of corporate machines are shut down for the weekend, so fewer quorums are completed, or maybe it is just something to do with WCG server maintenance.

[XC] hipno650
11-05-2006, 10:40 AM
My points have been about 5000 per day all the time, and just two days ago it was around 7500, and once they come out with the new BOINC for Linux my score should go up more than 500 per day.

uOpt
11-06-2006, 05:01 AM
Look at it this way: your monster takes 2 hours to do the unit and claims 100 points; the other 2 in the quorum are P3-600's that take 12 hours, and one claims 90 points, one claims 95 points. Everyone gets the middle score of 95 points, BUT you're doing 6 WU for every one they do, so in 12 hours they get 95 points and you finish 6 WU in that time and get 6x95=570 points...

I'm sorry, this is not correct. You don't get score by WU. They cannot give score by WU, since each WU is different and they cannot know in advance what a fair score is.

It is true that you get points for a WU after you return a WU.

But the amount of points does not correspond to the number of WUs returned. What BOINC does is run a benchmark to establish CPU speed, and then the score you get is the benchmark multiplied by the CPU seconds spent. The time your score gets added is when you return a WU. But the amount of points has nothing to do with the number of WUs, only CPU time spent.

So your 3.6 GHz Conroe, taking 255 Watts, is useless unless somebody comes along with an even faster box in your group of three.

rob725
11-06-2006, 05:07 AM
For the most part it is true, because your 3.6GHz Conroe will bench higher and process WUs faster, so you'll get about the same points per WU and process more WUs, thus scoring more points.

Movieman
11-06-2006, 05:08 AM
I'm sorry, this is not correct. You don't get score by WU. They cannot give score by WU, since each WU is different and they cannot know in advance what a fair score is.

It is true that you get points for a WU after you return a WU.

But the amount of points does not correspond to the number of WUs returned. What BOINC does is run a benchmark to establish CPU speed, and then the score you get is the benchmark multiplied by the CPU seconds spent. The time your score gets added is when you return a WU. But the amount of points has nothing to do with the number of WUs, only CPU time spent.

So your 3.6 GHz Conroe, taking 255 Watts, is useless unless somebody comes along with an even faster box in your group of three.
Yes, you're right. I just used the same number for each WU as an example to show that the faster machine will get more points in a given time frame.
The points on each individual WU will be different, but the logic of what I said is true.
On the last part of your statement, you're right and wrong I think. Granted, you won't see the full credit claimed unless you're matched up against 2 other machines identical to yours, but your machine is working the WU in a shorter timeframe and that is where you get your advantage: in the time needed to process the WU. You do more of them in a given timeframe, so you have to get more points per hour of time worked than a slower machine.
On wattage: you're using X amount for Y time to process the work unit. The guy with the older machine will be using less wattage per minute but will need more minutes of that given wattage to process, so that may be a wash. I have not compared the electricity use of say a C2D to an older P4-2000 on the same WU to see which is more efficient. My guess is the C2D should be much more efficient.

uOpt
11-06-2006, 05:42 AM
For the most part it is true, because your 3.6GHz Conroe will bench higher and process WUs faster, so you'll get about the same points per WU and process more WUs, thus scoring more points.

Sorry, sorry, sorry, but you guys don't understand how the scoring works.

You get awarded (benchmark * CPU time), and that is averaged out with other people. Nothing else.

There is no points benefit from finishing WUs faster as such, as you imply. You get your points awarded after the WU finishes, but that only has to do with the timing of the score, not the amount of score.

Your high benchmark is flattened out by the averaging and there is no benefit from submitting more finished WUs, since all that counts is your CPU time contributed. But that CPU time contributed is not multiplied by your benchmark, it is multiplied by the benchmark of some air wimp. So your CPU time is calculated down and you have no benefit at all from your faster computer (unless you happen to be in a group where somebody has an even faster one).

It was also news to me that Linux/Unix are using an entirely different scheme. That is actually a bigger deal than the averaging.

Movieman
11-06-2006, 05:57 AM
Sorry, sorry, sorry, but you guys don't understand how the scoring works.

You get awarded (benchmark * CPU time), and that is averaged out with other people. Nothing else.

(1)There is no points benefit from finishing WUs faster as such, as you imply. You get your points awarded after the WU finishes, but that only has to do with the timing of the score, not the amount of score.

(2)Your high benchmark is flattened out by the averaging and there is no benefit from submitting more finished WUs, since all that counts is your CPU time contributed. But that CPU time contributed is not multiplied by your benchmark, it is multiplied by the benchmark of some air wimp. So your CPU time is calculated down and you have no benefit at all from your faster computer (unless you happen to be in a group where somebody has an even faster one).

(3)It was also news to me that Linux/Unix are using an entirely different scheme. That is actually a bigger deal than the averaging.
(1)I think we're pretty much in agreement, just that we're saying it differently.
The theory is that every guy is entitled to the same points for doing a WU, as each person has done the same "amount" of work in order to finish that WU.
They simplify this by awarding the middle of the 3 claims.
(2) Cut to the chase: you have a faster machine, you will get more points in a day as your machine will do more work.
That's why XS with 170 guys is doing 1/3 of the points per day of Easy News, which has 4000+ members. That's not exact, as you have to factor in guys here with multiple machines; I doubt Easy News has a DDTUNG on the team.:D
Look at the points per hour. This proves my point. We get approx 80 I believe, where the WCG average is 28 points per hour of runtime.
(3) Totally agree. The Linux users have been taking an unfair hit here. That is being corrected with version 5.8, which is being released soon (not sure how soon).

[XC] riptide
11-06-2006, 06:35 AM
So anyone think that an HT-enabled machine would benefit? Logical cores = 2x real CPU time :hehe:

Movieman
11-06-2006, 06:49 AM
So anyone think that an HT-enabled machine would benefit? Logical cores = 2x real CPU time :hehe:
Actually with HT it gives you more total points, as you're working 2 WU per core, but drops your points per hour, as WCG sees one core working 24 hours as one day, so HT shows as 2 days per core.
That's why all my DX's show only 42 points per hour of worktime averaged across the group of them.
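
For example (made-up numbers): a core crunching without HT might earn 1,000 points in a day and log 24 hours of runtime, about 42 PPH; turn HT on and the same core runs 2 WUs at once, logs 48 hours of runtime in that day, and even if it earns a bit more, say 1,200 points, the site only shows 25 PPH.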

uOpt
11-06-2006, 09:30 AM
(2) Cut to the chase: you have a faster machine, you will get more points in a day as your machine will do more work.


No, since you get rewarded for the CPU time spent, and that doesn't take the CPU speed into account. That is how Rosetta at the time worked, too, along with all the BOINC projects.



That's why XS with 170 guys is doing 1/3 of the points per day of Easy News, which has 4000+ members. That's not exact, as you have to factor in guys here with multiple machines; I doubt Easy News has a DDTUNG on the team.:D
Look at the points per hour. This proves my point. We get approx 80 I believe, where the WCG average is 28 points per hour of runtime.


Do you have a few links? If we really get more points per running hour, that would prove it, but I don't think it's the case. Maybe we get more points since the faster machines actually do draw up the average in that group.

uOpt
11-06-2006, 09:32 AM
So anyone think that an HT-enabled machine would benefit? Logical cores = 2x real CPU time :hehe:

No, because the base is "CPU time spent", as reported by getrusage() in the OS (or the Windoze equivalent). Those functions in the OS properly report the actual CPU time spent in a process, so this aspect is fair.

rob725
11-06-2006, 09:56 AM
Machine A benchmarks at X and takes T time to finish the wu.
Machine B benchmarks at 2X and takes .5T to finish.

Both machines score TX points for the wu, but Machine B finishes 2 wu's in the same amount of time as Machine A finishes 1, so B scores twice as many points in the same amount of time.

This holds true for all of my machines and c2d's clocked faster than other c2d's score more points.

uOpt
11-06-2006, 10:36 AM
Machine A benchmarks at X and takes T time to finish the wu.
Machine B benchmarks at 2X and takes .5T to finish.

Both machines score TX points for the wu, but Machine B finishes 2 wu's in the same amount of time as Machine A finishes 1, so B scores twice as many points in the same amount of time.

This holds true for all of my machines and c2d's clocked faster than other c2d's score more points.

Hmmmmmm






The Linux and Mac agents use BOINC. BOINC points are calculated in a two-step process. First, the points (also called credit) claimed by a host are determined. BOINC points are calculated based on a benchmark that is run periodically by the BOINC client. This benchmark is then run through a calculation that determines how much credit per second of run time that device should earn.
[...]
Second, once validation has been completed, BOINC gives the same credit for a result to every device that worked on the same work unit. BOINC calculates how much credit this should be by taking the claimed credit for each result that was determined to be valid, eliminating the low and high values and then averaging the rest.


Let's say 4 machines work on the same wu:
- A: 1 GHz, benchmark 10000
- B: 2 GHz, benchmark 20000
- C: 3 GHz, benchmark 30000
- D: 4 GHz, benchmark 40000

Let's say they take this amount of CPU time for that WU:
- A: 4 hours
- B: 2 hours
- C: 1.333 hours
- D: 1 hour

Then under normal BOINC the score would be:
- A: 10000 * 4 = 40000
- B: 20000 * 2 = 40000
- C: 30000 * 1.333 = 40000
- D: 40000 * 1 = 40000

Get the idea? Same score for same actual work done.
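
If you want it in code form, here is a rough sketch of that claim step (my own illustration of the math above, not actual BOINC code; the benchmark numbers are the made-up ones from the example):

# Hypothetical illustration of the BOINC claimed-credit step:
# claim = benchmark rate * CPU hours. Numbers are the made-up ones above.
hosts = {
    "A": {"benchmark": 10000, "cpu_hours": 4.0},
    "B": {"benchmark": 20000, "cpu_hours": 2.0},
    "C": {"benchmark": 30000, "cpu_hours": 4.0 / 3},
    "D": {"benchmark": 40000, "cpu_hours": 1.0},
}

def claimed_credit(benchmark, cpu_hours):
    # The claim scales with the host's benchmark and the CPU time it spent.
    return benchmark * cpu_hours

for name, h in hosts.items():
    print(name, round(claimed_credit(h["benchmark"], h["cpu_hours"])))
# Every host claims 40000 -- same claim for the same actual work done.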

%%

Here is what I thought WCG is doing:
- eliminate top benchmark
- eliminate lowest benchmark
- take average
==> use that as benchmark.

So our benchmark would be 25000.

So our scores would be:
- A: 4 * 25000 = 100000
- B: 2 * 25000 = 50000
- C: 1.333 * 25000 = 33333
- D: 1 * 25000 = 25000

%%

However, I was wrong. Here is what WCG actually does:
- eliminate top scorer in points
- eliminate lowest scorer in points
- take average of points:

So the average in points is 40000.

So we score:
- A: 40000
- B: 40000
- C: 40000
- D: 40000


So indeed, you are right. Since the 4 GHz machine took only 1/4th the time you can score 4 times the points in a day by owning the 4 GHz box.

%%

Let's look at what happens when there is a "real" cheater, for example somebody who messes with his system clock during the benchmark, or clocks the machine higher only while benchmarking, etc.:

So the benchmarks are:
- A: 1 GHz, benchmark 10000
- B: 2 GHz, benchmark 20000
- C: 3 GHz, benchmark 50000 (cheater)
- D: 4 GHz, benchmark 40000

But the CPU times in production, after benchmarking, are:
- A: 4 hours
- B: 2 hours
- C: 1.333 hours
- D: 1 hour

So in normal BOINC the points would be:
- A: 10000 * 4 = 40000
- B: 20000 * 2 = 40000
- C: 50000 * 1.333 = 66666
- D: 40000 * 1 = 40000

So the cheater would get the direct reward for his actions.

In the WCG scheme this looks like:
- eliminate top (66666) and bottom scorer (40000)
- average the rest (both 40000)
==> score is 40000

So points allocated are:
- A: 40000
- B: 40000
- C: 40000
- D: 40000

For the same actual work.

So that looks like it should.
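
And the granting step as a rough sketch (again my illustration, not WCG's actual validator): drop the single highest and single lowest claim, average the rest, and give that to everyone in the quorum.

def granted_credit(claims):
    # Drop the single highest and single lowest claim, average what's left,
    # and grant that amount to every result in the quorum.
    claims = sorted(claims)
    middle = claims[1:-1] if len(claims) > 2 else claims
    return sum(middle) / len(middle)

# Honest quorum from the first example: everyone gets 40000.
print(granted_credit([40000, 40000, 40000, 40000]))   # 40000.0

# Quorum with the benchmark cheater (claims 66666): the 66666 is dropped
# along with one of the 40000s, so everyone -- cheater included -- gets 40000.
print(granted_credit([40000, 40000, 66666, 40000]))   # 40000.0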

%%

I don't quite get why they leave out the lowest score, but I guess if only a minority of hosts are Linux then it counteracts the unfair benchmarking on Linux: with the default Linux client the benchmark comes out lower, even though the box doesn't return any less actual work during the WU computation.

%%

This looks pretty fine to me now, except that I'd like to see what they plan to do about the Linux discrimination. Obviously I cannot participate in a project that discriminates against Linux, or FreeBSD for that matter, as that would go directly against my real-life interests.

Anybody have a link to the Linux plans?

Movieman
11-06-2006, 10:41 AM
Only thing I can give you is a comment made by one of the admins at the WCG forum that they were aware that Linux was taking a bath and that the guy(s) that do BOINC would have it adjusted in ver 5.8

uOpt
11-06-2006, 10:55 AM
Only thing I can give you is a comment made by one of the admins at the WCG forum that they were aware that Linux was taking a bath and that the guy(s) that do BOINC would have it adjusted in ver 5.8

The question is:

why don't we just pick an existing Linux client that gives us the same score as a Windows client on the same machine?

That sounds fair and is quickly done.

The actual WCG binary: is it available for 64-bit Linux, or is it always a 32-bit binary even if you have a 64-bit BOINC?

Fr3ak
11-06-2006, 11:02 AM
How about we post something or send an email to one of the guys that knows about all this, to make sure the scoring system is as fair as it can be?
And if not, we could make some suggestions, as we have the fastest stuff out there.
As we have nothing to lose, it's worth a try in my eyes.

Fr3ak
11-06-2006, 11:05 AM
The question is:

why don't we just pick an existing Linux client that gives us the same score as a Windows client on the same machine?

That sounds fair and is quickly done.

The actual WCG binary: is it available for 64-bit Linux, or is it always a 32-bit binary even if you have a 64-bit BOINC?


I am not sure about it, but I don't think the WCG binary is available in 64-bit. Neither is BOINC; the 64-bit client was a development of Crunch3r, if I remember that correctly.

I am not even sure the WCG source code is available. The R@H code wasn't, and I didn't find the WCG code either, so they might not be open source.

Movieman
11-06-2006, 11:39 AM
The question is:

why don't we just pick an existing Linux client that gives us the same score as a Windows client on the same machine?

That sounds fair and is quickly done.

The actual WCG binary: is it available for 64-bit Linux, or is it always a 32-bit binary even if you have a 64-bit BOINC?
We're trying very hard to stick with a stock BOINC client for a lot of reasons.
The BS that went on at Rosetta is #1 on that list. I'm not prepared to go through another round of being called a cheat by anyone. Since the BOINC client doesn't do any work, but merely serves as a host for the project and a benchmark, using anything that inflates the bench without affecting the science app leads to that kind of BS. I don't care if another client will give me more points, I'm using the stock client.
Linux is a different issue. My suggestion, if you're not happy with the points you'd get with the current BOINC (and I don't blame you a bit for not being happy with that), is to wait for ver 5.8. My guess is it will be out in 2-6 weeks.

How about we post something or send an email to one of the guys that knows about all this, to make sure the scoring system is as fair as it can be?
And if not, we could make some suggestions, as we have the fastest stuff out there.
As we have nothing to lose, it's worth a try in my eyes.
Been there, done that. Posted on this until I was blue in the face. They know all the issues. I get the sense that changing the science app is a big deal and something they are looking into, but it's not done yet.
There has been talk of 64-bit science apps, but that's in the talking stage from what I see.
It may also be a money issue. What would the cost be to bring in a team of software designers to totally overhaul the science apps? I am clueless on that.
I also think there is a bit of "it's working, let's not break it" thinking, and IBM foots the bill for the whole thing and doesn't get any return from it at all. Strictly a cost center.

[XC] Teroedni
11-06-2006, 11:48 AM
Yeah, hopefully the 5.8 version fixes this BIG difference.

By the way,
have any of you Linux guys tried to run the Windows binary in Linux through Wine?
I wonder if that would work :P
I've been tempted to remove my 64-bit install and put on a normal 32-bit + Wine + Windows BOINC :D

uOpt
11-06-2006, 12:02 PM
We're trying very hard to stick with a stock BOINC client for a lot of reasons.
The BS that went on at Rosetta is #1 on that list. I'm not prepared to go through another round of being called a cheat by anyone. Since the BOINC client doesn't do any work, but merely serves as a host for the project and a benchmark, using anything that inflates the bench without affecting the science app leads to that kind of BS. I don't care if another client will give me more points, I'm using the stock client.

Linux is a different issue.

That's what I meant: no fiddling with the Windows client at all. But for Linux we just pick a binary that scores the same benchmarks as the stock(!) Win32 client instead of the braindead stock Linux binary.

That should be fair for everybody, no?

rob725
11-06-2006, 12:02 PM
My understanding is that WCG and BOINC are aware of the Linux disparity and are addressing it.

@riptide: HT is great tech for interactivity, but for crunching I think it will carry a small penalty. You still only have one CPU crunching, so it is only working on one WU at a time, just switching quickly between them. The overhead of the switching has to cost something, so it will take longer to process the same two WUs in parallel rather than sequentially.

[XC]Atomicpineapple
11-06-2006, 12:10 PM
That's what I meant: no fiddling with the Windows client at all. But for Linux we just pick a binary that scores the same benchmarks as the stock(!) Win32 client instead of the braindead stock Linux binary.

That should be fair for everybody, no?

Sounds fair, but you know that any manipulation of anything by XS will be used to fuel accusations of cheating, so it's not worth it, especially when a legitimate, authorised fix for the issue is on its way.

meshmesh
11-06-2006, 12:42 PM
Hello everyone. I leave for two weeks and come back to this thread. May I ask why we are still bothered about the scoring system? I thought everyone was OK with it. My bad.

OK: the scoring system that involves RAM, HD space, bandwidth, etc. is the United Devices system and has NOTHING to do with the BOINC system. The UD system awards lower scores for the same amount of work, a fact that is well known.
BOINC uses the quorum system, as pointed out correctly in the detailed example by uOpt. It is fair and does not penalise the faster machines in any way (OK, technically there is a small catch, but a very small one).

And yes, it is necessary to remove both the high and the low scores in a quorum system, so that the machines (Linux?) which normally claim low do not drag the quorum down. It is good for them, as they get awarded the median score (typically that of a stock Windows machine), which is fair. It also eliminates the need for Linux participants to use Trux and other special clients; the quorum will award them higher. End of story.

Now, here is my hunch: WCG will try to fix the problem with the UD client, since most of the participants are using it.
They will also try to make the quorum more consistent, rather than depend on three results per WU.

But the issue is (as pointed out above) that WUs take varying lengths of time to complete. However, if we look at the CPU time, it is pretty consistent, +/- 20%, among the WUs of any ONE project batch. It is not 1 hour vs 5 hours as stated above: the one-hour units are cancer project WUs, while FAAH WUs are the ones that take 4-5 hours to complete. So here is the dilemma:
1) use a quorum of three and accept some variability in claimed vs granted on each WU, depending on who is with you in the quorum. Evens out over time, but does not help the UD client participants and does not solve the Linux/Mac problem once and for all.
2) use a quorum of everybody and get a more consistent result per WU, but with a variable length that evens out over time.

The second approach is probably what they are aiming for. The way to do it may be like this (my guess; rough sketch below):
Put out the WUs to be crunched by both the UD and the BOINC clients.
Get the majority of the BOINC WUs back after being crunched (don't need to wait for all of them, 75% will do just fine).
Eliminate the top and bottom 25% of the returned BOINC WUs and average the claimed credit of the middle 50% (or take the median).
Apply this credit to every WU in the batch, including the UD participant results as well.
Check the top claimed credits (maybe 3 standard deviations) and flag them for manual review. Slash their credits if needed.

Would work for me. The end result will be the same as now, probably a bit better. Why? Because there will no longer be cases where underperforming machines occasionally rob me of some credit (minor, but it happens).
Side effect: we will see variable-length WUs get the same credit. No problem, it evens out very quickly, within a dozen WUs or so.
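
A rough sketch of that guess, if it helps (entirely my speculation, like the steps above; the 25% trim and the 3-sigma flag are just the numbers from my list, nothing official):

import statistics

def batch_credit(boinc_claims, trim=0.25, flag_sigmas=3.0):
    # Speculative batch-wide scheme: trim the top and bottom 25% of the BOINC
    # claims, average the middle 50%, and apply that credit to every result
    # in the batch (UD results included). Claims far above the mean would get
    # flagged for manual review rather than auto-slashed.
    claims = sorted(boinc_claims)
    cut = int(len(claims) * trim)
    middle = claims[cut:len(claims) - cut] or claims
    credit = sum(middle) / len(middle)
    mean = statistics.mean(claims)
    sigma = statistics.stdev(claims)
    flagged = [c for c in claims if c > mean + flag_sigmas * sigma]
    return credit, flagged

# Made-up batch of claims: credit works out to 61.5 for everyone, and nothing
# in this particular batch sits 3 sigma above the mean, so nothing is flagged.
print(batch_credit([54, 58, 60, 61, 62, 63, 64, 65, 66, 95, 30, 59]))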

Now, they have not explained yet what they are doing. But I am willing to bet that these guys are not stupid. They will put their minds to it and give it a thorough look. It will not be implemented unless it is correct. I have faith.

So why don't we just hold back for now and let them do their thing? Forget about posting or emailing to inquire. If someone is participating in the beta test, please keep us informed regarding how it is panning out. That's all.

uOpt
11-06-2006, 01:25 PM
I think the RAM and disk thing is insane. I have never seen anything BOINCish take more than a couple dozen megabytes. And hard drive space? Why give people more points for space that BOINC and WCG don't even use?

Why are they happily computing everything 3 times, anyway? Just for consistency checking?

Movieman
11-06-2006, 01:32 PM
I think the RAM and disk thing is insane. I have never seen anything BOINCish take more than a couple dozen megabytes. And hard drive space? Why give people more points for space that BOINC and WCG don't even use?

Why are they happily computing everything 3 times, anyway? Just for consistency checking?
No arguments from me that there are things that could be improved, but it is what it is, and it gives everyone (on Windows) a fair playing field, so it's not a big issue to me.
When they make the adjustment in BOINC 5.8, Linux will get a fair shake too. Until then we wait. Not really worth arguing.
Help me figure out where to get another 1.5 mil points a day.
That's my big concern now. :D

[XC]melymel
11-06-2006, 02:01 PM
I think the RAM and disk thing is insane. I have never seen anything BOINCish take more than a couple dozen megabytes. And hard drive space? Why give people more points for space that BOINC and WCG don't even use?


The UD client that this was a problem on is a very old client from the dawn of DC :p: so it has many issues, and it would probably have been retired by now if it wasn't still being used on such a grand scale.

Basically you guys all know that I don't like the stock BOINC client's benchmarks (never knew an A64 had a 100%+ clock-for-clock advantage over NetBurst :rolleyes:), but I really can't see much point in stirring up trouble over it; the quorum system is as fair as we can realistically hope for, unless you are unfortunate enough to be paired with two 486s (my nightmare :p:). The Linux issue I understand, and as has been said they are working on it and should have a fix soon, so I think it's best to sit and wait for them to put out their new release.

Out of interest, what do CPUs score on Linux relative to Windows? If it is a large difference then it may be worth putting the optis on until the fix; otherwise you'll most likely be the middle man in the quorum and unfairly drag your own and the other quorum members' marks down.

Anyway, chill all, let's not get worked up over nothing :toast:

rob725
11-06-2006, 02:05 PM
Btw, the protein folding project, which is online for the UD client, will be available to the BOINC client in a week or two. For that project, my understanding is there will be no quorum, as they use techniques similar to Rosetta for determining bad results and don't need the triple verification.

mike047
11-06-2006, 04:02 PM
I just found this, it may be old news to some;

http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=9545

example;
Workunit Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
faah0881_bdb130_mx2bpy_06 | Valid | 11/04/2006 18:42:55 | 11/06/2006 22:53:04 | 4.74 | 64 / 59
faah0881_bdb130_mx2bpy_06 | Valid | 11/04/2006 18:42:51 | 11/05/2006 13:46:58 | 6.94 | 54 / 59
faah0881_bdb130_mx2bpy_06 | Invalid | 11/04/2006 18:40:19 | 11/06/2006 20:39:28 | 5.56 | 178 / 30

meshmesh
11-06-2006, 04:36 PM
OK. So the system is now implemented and out of beta. It looks good. And the most important and beneficial aspect of this is that it will put an end to all the filth of dragina$$ and the like, which for me is great.

Now we check our results pages and see if it is consistent. Even some discrepancy is OK as long as it saves us the headache of the :banana: :banana: guys.

Jose
11-06-2006, 05:04 PM
After the Rosetta debacle I promised myself not to get involved in any credit issues. I also promised not to develop any attachment to any project in particular.

I am glad I did.

STEvil
11-06-2006, 07:36 PM
I dislike the methods used to derive the machine benchmark scores.

I believe any "patches" or updates which do not fix this fundamental flaw in the basic infrastructure shared by all the DC programs under BOINC are useless, and are only released because they do not wish to fix the real problem.

Still sticking to WCG for now of course, just venting some frustration :(

[XC] hipno650
11-06-2006, 08:19 PM
I just found this, it may be old news to some;

http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=9545

example;
Workunit Name | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
faah0881_bdb130_mx2bpy_06 | Valid | 11/04/2006 18:42:55 | 11/06/2006 22:53:04 | 4.74 | 64 / 59
faah0881_bdb130_mx2bpy_06 | Valid | 11/04/2006 18:42:51 | 11/05/2006 13:46:58 | 6.94 | 54 / 59
faah0881_bdb130_mx2bpy_06 | Invalid | 11/04/2006 18:40:19 | 11/06/2006 20:39:28 | 5.56 | 178 / 30

So, seeing this, if we are still accused of cheating (which we are not), that would mean we are winning with a fraction of the points per WU, which means we are still the best :D

STEvil
11-06-2006, 08:46 PM
Noticed the granted points are 1/2 of what was granted for a valid result in your quote. Seems to me that if you're using an optimization you're only getting 1/2 credits, if that's what that last one is?

rob725
11-06-2006, 09:18 PM
Exactly. From the few results I've seen, I like the new scoring as low scores are kind of ignored and the two higher ones averaged.

ShootStraight
11-06-2006, 09:23 PM
I dislike the methods used to derive the machine benchmark scores.

I believe any "patches" or updates which do not fix this fundamental flaw in the basic infrastructure shared by all the DC programs under BOINC are useless, and are only released because they do not wish to fix the real problem.

Still sticking to WCG for now of course, just venting some frustration :(

Rest assured you aren't alone in your sentiments.

I think this is just window dressing to give the disgruntled masses the illusion that they're doing something about cheating, when they really aren't, as there are no changes to the quorum.

It would seem to me a very good way to get rid of any use of an obviously optimized client while driving the cheating to even darker places. The fundamental BOINC flaws are still there, and so long as they remain, there will be cheating and it will still be lucrative, just a little less noticeable/objectionable. It is just a matter of time before someone compiles a version with a calibrated optimization to maximize received credits while flying under the "Outlier Radar". It will work just as often as an optimized client would in the quorum, and it would become more lucrative over time as it propagates. Or they could do it the old-fashioned way and edit the XMLs.

Much ado about nothing as far as cheating is concerned. A little smoke, a mirror or two, and a red silk hanky would be more pragmatic than a new scoring algorithm in the war on cheating. :slap: :slapass: :p:

-SS

rob725
11-07-2006, 05:26 AM
Plus, as I pointed out to them on their forum, they were fixing something that wasn't broken, while the main problem was the benchmarking inconsistencies. I looked at my last sixty validated results, and out of the 180 claims only 3 were excessively high, and in each case the high claim was ignored in favour of the "normal" middle value, so it had no real effect. So all of this was an unnecessary appeasement of misguided complaints.

uOpt
11-07-2006, 06:48 AM
Why can't they just have a list?

AMD64 = 1.0 * clockspeed
Core2 4MB cache = 1.2 * clockspeed
Pentium4 = 0.6 * clockspeed

There are really not that many CPUs out there.
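
Something like this, I mean (the multipliers are just the made-up ones above, and the base rate is an arbitrary scale factor, nothing official):

# Hypothetical fixed credit-rate table, per the suggestion above.
CPU_FACTOR = {
    "AMD64": 1.0,
    "Core2_4MB": 1.2,
    "Pentium4": 0.6,
}

def credit_per_hour(cpu_family, clock_ghz, base_rate=20.0):
    # base_rate is an arbitrary constant so the numbers look like points/hour.
    return CPU_FACTOR[cpu_family] * clock_ghz * base_rate

print(credit_per_hour("Core2_4MB", 3.0))   # 72.0
print(credit_per_hour("Pentium4", 3.0))    # 36.0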

Movieman
11-07-2006, 06:54 AM
Why can't they just have a list?

AMD64 = 1.0 * clockspeed
Core2 4MB cache = 1.2 * clockspeed
Pentium4 = 0.6 * clockspeed

There are really not that many CPUs out there.
A real simple guess: they don't because either they never looked at it that way, they aren't smart enough to see it that simply, or there are factors we're not aware of that make "correcting" the issues not that simple to do.

Martijn
11-07-2006, 07:05 AM
A real simple guess: they don't because either they never looked at it that way, they aren't smart enough to see it that simply, or there are factors we're not aware of that make "correcting" the issues not that simple to do.
They can't do that because BOINC tends to report the wrong CPU every now and then... That's just what happened with Conroe/Kentsfield... Now of course new versions will be out, but this also happens to people with 'ordinary' CPUs.

Fr3ak
11-07-2006, 08:40 AM
And there we are, facing one of the main problems again, in my eyes.
There is no way to verify what someone is running, neither the CPU speed nor the CPU itself.
Of course BOINC reports the CPU, but it looks like WCG isn't using that information.

But I don't even care much about that; cheaters will always find a way to cheat. What makes me think is the lack of optimisation in the WCG apps. It would be much better for the science to make use of the latest hardware extensions.

Martijn
11-07-2006, 08:50 AM
But I don't even care much about that; cheaters will always find a way to cheat. What makes me think is the lack of optimisation in the WCG apps. It would be much better for the science to make use of the latest hardware extensions.
Yeah, imagine how much more work can be done when using 'only' SSE2... It doubles the benches. That could mean twice as much work done in the same time :fact:

Movieman
11-07-2006, 08:53 AM
And there we are, facing one of the main problems again, in my eyes.
There is no way to verify what someone is running, neither the CPU speed nor the CPU itself.
Of course BOINC reports the CPU, but it looks like WCG isn't using that information.

But I don't even care much about that; cheaters will always find a way to cheat. What makes me think is the lack of optimisation in the WCG apps. It would be much better for the science to make use of the latest hardware extensions.
Agreed 100%. I think the issue, at least from what I can get from them, is the time involved in rewriting everything, and the cost.
That's what I got from a conversation a few weeks ago, but then this past week I heard that they were seriously looking at redoing it. A little confusing.
I THINK what happened is that due to the dagomouth flame wars a guy high up at IBM came in and started reading, and maybe, just maybe, what we were saying sunk in. I'd mentioned some of the equipment we were bringing online in the next few weeks. I think that may have caused someone to look at what we do and realise that we are offering them huge potential. Maybe we are the seed that will make this happen.
Just thinking out loud here, but it does make sense. The tone there changed 2 days after this IBM guy came to the WCG forum.

uOpt
11-07-2006, 09:04 AM
Yeah, imagine how much more work can be done when using 'only' SSE2... It doubles the benches. That could mean twice as much work done in the same time :fact:

Yeah, if instead of keeping the source to themselves they would allow people to recompile it (the WCG app, not BOINC), then people could do all kinds of optimizations like that. 64-bit just for starters.

It wouldn't double the throughput, though. The only reason the benchmarks in BOINC double is that BOINC uses the worst CPU benchmarks ever created by humankind, Dhrystone and Whetstone. I remember vividly how I dismissed them as utterly useless benchmarks when testing a SPARC 10 against my SPARC 2. That was in 1992.

Use of Dhrystone must violate a U.N. human rights charter or two.

Jose
11-07-2006, 09:34 AM
So, seeing this, if we are still accused of cheating (which we are not), that would mean we are winning with a fraction of the points per WU, which means we are still the best :D

More than the best!!!!

And we are in the process of beating them under their rules. That is even more awesome.

Fr3ak
11-07-2006, 11:50 AM
Well, I am no hardcore programmer, but to my understanding all you need to make an app support SSE2 or 3 is to compile the source code with an SSE2-capable compiler and the right flags.
There is no real work that has to be done; you type in the command and the computer compiles it.

uOpt
11-07-2006, 12:21 PM
Well, I am no hardcore programmer, but to my understanding all you need to make an app support SSE2 or 3 is to compile the source code with an SSE2-capable compiler and the right flags.
There is no real work that has to be done; you type in the command and the computer compiles it.

It can be both, but just recompiling I never got more than 10% out of it.

The problem is that SSE is not just a faster general-purpose computation unit in the CPU. It is a parallel (SIMD) unit whose base speed for linear execution is not much better than the main FPU's.

Existing compilers are notoriously bad at proving that a given piece of code is actually safe to execute in parallel, and the C/C++ languages make it much worse since pointer aliasing leaves the compiler too little to work with. So they usually bail out after the very obvious and trivial cases.

To get the big speedup out of SSE you have to split things up manually and code them in assembler or an intrinsics/macro frontend for SSE. Pretty ugly business.

Fr3ak
11-07-2006, 02:08 PM
I know, but I like assembler.
I never did any bigger stuff in C/C++. I am a uni theory whore.

So it's a little more complicated, but 10% for almost no effort would already be a good start.

uOpt
11-07-2006, 03:38 PM
So it's a little more complicated, but 10% for almost no effort would already be a good start.

There is no question that trying a couple dozen compiler settings (and compilers such as icc) would give the total (real) throughput a major boost.

I figure as long as they compute every WU 3 times they can't be too serious about throughput anyway, or maybe there is something I don't quite understand.

Movieman
11-07-2006, 03:48 PM
There is no question that trying a couple dozen compiler settings (and compilers such as icc) would give the total (real) throughput a major boost.

I figure as long as they compute every WU 3 times they can't be too serious about throughput anyway, or maybe there is something I don't quite understand.
My understanding on the quorum of 3 is that the chances of 3 machines making the same error are so slim that it gives them the confidence to know that all the work is done correctly. Whether that's true or not I don't know, but I'll give them the benefit of the doubt on that point.
Competition may be fun, but not at the risk of sending any lab bad data.

meshmesh
11-07-2006, 04:39 PM
My understanding on the quorum of 3 is that the chances of 3 machines making the same error are so slim that it gives them the confidence to know that all the work is done correctly. Whether that's true or not I don't know, but I'll give them the benefit of the doubt on that point.
Competition may be fun, but not at the risk of sending any lab bad data.
I had a discussion about this with the admins of the malaria BOINC project a while back, and that is what they said too. They are worried about variations in the results between different OS versions and hardware platforms. The guy assured me that there are some differences in the way the math is handled and that he is not in a position to verify 100% that all results are correct every time he introduces a change in the code. He simply cannot do it without extensive testing every time.

So they take that route. It is not a question of whether they are seeing a lot of variation; they are worried that if they go to a quorum of one, they have no way of knowing if something creeps in.

And this is a different case than R@H, which searches a result domain and then verifies the lowest result themselves.

I see where they are coming from. They would rather get more results, but it would mean zip if they cannot guarantee that the outcome is correct. To them verification is dead serious, and I understand them being a bit careful.

rob725
11-07-2006, 04:45 PM
Some of the same reasons they're concerned about compiler optis: they're scientists and need to feel absolutely confident in the integrity of their data, so they would want a lot of testing before they'd be comfortable releasing new code.

uOpt
11-08-2006, 09:29 AM
Some of the same reasons they're concerned about compiler optis: they're scientists and need to feel absolutely confident in the integrity of their data, so they would want a lot of testing before they'd be comfortable releasing new code.

Variances between machines shouldn't matter. Granted, a PowerPC and a PC don't come up with the same answer, but both will be inside the IEEE spec for floating point. That's the most they can expect either way; even Intel and AMD i386 chips come up slightly different.

I think the 3-time re-run of everything specifically targets unstable hardware. Not overclocked, just unstable: most hardware out in the wild of U.S. households is not totally stable when you put 100% CPU load on it for extended periods of time. ECC RAM is also pretty rare, and the confidence in the results ends right there.

Not using better compilers or compiler flags, on the other hand, is not likely to be anything other than a waste of CPU time, unless you get too aggressive.

RAMMIE
11-08-2006, 05:26 PM
Exactly. From the few results I've seen, I like the new scoring as low scores are kind of ignored and the two higher ones averaged.

I see all 3 in the quorum averaged.

meshmesh
11-09-2006, 12:34 PM
I see all 3 in the quorum averaged
Oh man. :eek: This way the occasional low claims, like 50, 44, ..., which used to be ignored, will drag the other two down with them. What this will accomplish is that bad people may try to slowly push the scores up over time. It also rewards the overclaimers by guaranteeing that their "slight" overclaim produces a 33% reward, rather than being ignored as in the old system. Bad, bad, bad.

And even the poor Linux users, who used to get the middle value (usually the typical Windows claim), will now get less, as their underclaimed credit will be reflected in their score as well as in the other two guys'.

If 50% of the time I get one of those 15-points-lower guys, then I expect maybe a 5% reduction overall in the daily scores. It also means honest crunchers would no longer mind seeing one of those "slight" overclaimers, you know, the 80-85 claims, to balance things up (they don't show up often, but from here on people may start wishing that they did show up 50% of the time too!). This is not good.

What a strange idea. I cannot believe that this could be what they came up with! Maybe those few results you saw just looked this way; I haven't checked myself. I truly hope not.

For me, I saved a snapshot of the list of valid results in a PDF at midnight on 7 Nov, just before implementation. All those scores were done with the old quorum system. I will wait till 17 Nov (enough for the whole validated set to be replaced by the new system), then take another snapshot.

I have been planning to tabulate totals for hours run, claimed credit and granted credit for both the old and the new system in a spreadsheet and post it here for comparison.

In the meantime, I hope this is not really what they are doing.

Martijn
11-09-2006, 01:08 PM
Oh man. :eek: This way the occasional low claims, like 50, 44, ..., which used to be ignored, will drag the other two down with them. What this will accomplish is that bad people may try to slowly push the scores up over time. It also rewards the overclaimers by guaranteeing that their "slight" overclaim produces a 33% reward, rather than being ignored as in the old system. Bad, bad, bad.

And even the poor Linux users, who used to get the middle value (usually the typical Windows claim), will now get less, as their underclaimed credit will be reflected in their score as well as in the other two guys'.

If 50% of the time I get one of those 15-points-lower guys, then I expect maybe a 5% reduction overall in the daily scores. It also means honest crunchers would no longer mind seeing one of those "slight" overclaimers, you know, the 80-85 claims, to balance things up (they don't show up often, but from here on people may start wishing that they did show up 50% of the time too!). This is not good.

What a strange idea. I cannot believe that this could be what they came up with! Maybe those few results you saw just looked this way; I haven't checked myself. I truly hope not.

For me, I saved a snapshot of the list of valid results in a PDF at midnight on 7 Nov, just before implementation. All those scores were done with the old quorum system. I will wait till 17 Nov (enough for the whole validated set to be replaced by the new system), then take another snapshot.

I have been planning to tabulate totals for hours run, claimed credit and granted credit for both the old and the new system in a spreadsheet and post it here for comparison.

In the meantime, I hope this is not really what they are doing.
I am seeing exactly the same. The average of the 3 scores is used; when the average has one or more decimal places, it is rounded up. Have a look at that :).

uOpt
11-09-2006, 01:19 PM
Average is also bad because now the cheater gets rewarded again. He just needs to cheat more to get the same result.

The original plan to take the middle one is obviously correct.

Martijn
11-09-2006, 01:31 PM
Average is also bad because now the cheater gets rewarded again. He just needs to cheat more to get the same result.

The original plan to take the middle one is obviously correct.
Nope, I was still using the optis before the change, and I will now make a screenshot of what happens if you claim too much :D

Martijn
11-09-2006, 01:31 PM
Average is also bad because now the cheater gets rewarded again. He just needs to cheat more to get the same result.

The original plan to take the middle one is obviously correct.
Nope, I was still using the optis before the change, and I will now make a screenshot of what happens if you claim too much :D


EDIT: Sorry for double post :slap:

meshmesh
11-09-2006, 01:50 PM
Average is also bad because now the cheater gets rewarded again. He just needs to cheat more to get the same result.

Agree.

...What this will accomplish is that bad people may try to slowly push the scores up over time. It also rewards the overclaimers by guaranteeing that their "slight" overclaim produces a 33% reward, rather than being ignored as in the old system. Bad, bad, bad.



The original plan to take the middle one is obviously correct.

The middle (aka the median) is always the right approach. I was hoping that they would take the median of the whole population. That way, the claims of optimised clients as well as the slight (read: manual) overclaimers would be ignored. The crippled NetBurst machines and the Linux underclaimers would be ignored as well, because all of those would be at the ends of the spectrum of the population. Then the median would actually be your typical Windows machine, probably a bit lower than what an XS top-spec setup may claim, but that is OK. Maybe a few percent drop, but not important if it will stop the whiners.

But if they are taking averages over a quorum of three, that would be really bad, and as you said it is inviting trouble. What would prevent some people from now joining the "slight" overclaimers club, on the justification of adjusting for the low NetBurst and Linux claims in order to bring things back to the scoring of the "typical" machine?

We have a saying (I'll try to translate): "...wanted to put on eyeliner, ended up blinding her...". :)

meshmesh
11-09-2006, 01:55 PM
Nope, I was still using the optis before the change, and I will now make a screenshot of what happens if you claim too much :D

Well, OK. Those who still use the opti :slap: will get their score slashed in half. But the concern is those who "slightly" overclaim, you know, the 80-85 claims vs. what I would claim, 62-67. These guys will now have no reason not to keep screwing the system up.

Martijn
11-09-2006, 01:55 PM
Meshmesh, I guess I replied while you were writing yours, but I think when looking at the screenshot, all questions are answered. No cheaters anymore :D

EDIT: I think we need to know where the 'overclaim' limit actually is... Personally I don't care if someone has 80 point-claims at all...

meshmesh
11-09-2006, 02:39 PM
OK, I just looked at my scoring. A bit odd. I see some that look like an average, but some don't. I wonder whether we are still in some kind of transitional phase, between scoring those WUs that went out before the new system but were scored after it, and those which were purely distributed and scored after.

Here are two examples that are NOT the average. What is common in these two particular ones is that all the WUs were dispatched after the new system. The rest of my results, which look to be scored according to the quorum average, appear to have had some of their units dispatched before the new system, even though they were scored after.

First: average 58.2 ==> granted 53
This result's average was over-inflated by one of those "slight" manipulators, but it was granted lower.
http://www.xtremesystems.org/forums/attachment.php?attachmentid=52736&stc=1&d=1163111508

Second: average 46.6 ==> granted 52
This result's average was dragged down by a very bad (Linux?) setup, but it was granted higher.
http://www.xtremesystems.org/forums/attachment.php?attachmentid=52737&stc=1&d=1163111508

My guess is that the new system must be averaging the whole population after removing the opties. Which is not bad. I would rather they had taken the median, so that the ridiculously low claims (27, see above) :mad: do not drag people down. But at least it is much better than averaging the quorum. :)

I would suggest we wait until all the old units get validated and we are looking at WUs that were fully distributed after the new scoring system was implemented, before quantifying the impact. It may take 10 days to have a valid sample. In the meantime, maybe we wait a bit.

Movieman
11-09-2006, 02:41 PM
Said to no one in particular:
I want you all to listen to me for just a minute:
I love to compete. I enjoy winning as much as the next guy, BUT if you have to cheat to win then you've won nothing. It is meaningless.
Like trying to drink a glass of air when you're thirsty, my friends.
I would rather come in dead last and know I'd done my best, but fairly.
We have the best machinery. We have the best people. The hell with the scoring system. I just spent 10 minutes looking at my results, and all I came away with is that they have now effectively penalized the cheaters: they take the 2 good results of the 3, average those 2, give the same points to those 2 guys, and then take the guy with the crazy point claim and give him 1/2 of what they gave the other 2 guys. Pardon my French, but about god damned time! The first effective way to stop the crap.
I've campaigned for everyone to use the stock clients, and if my rants didn't get through to some of you, this should.
Winning by cheating means nothing.
Do it clean and the win means everything.
(end of rant, old guy goes and bangs his head against the wall hoping this sank in)


Last point, reading a comment above: I thought WUs were sent to machines with the same OS, i.e. a quorum of 3 is all Windows users or all Linux users. I don't think they mix, unless I missed something.

meshmesh
11-09-2006, 03:24 PM
Last point, reading a comment above: I thought WUs were sent to machines with the same OS, i.e. a quorum of 3 is all Windows users or all Linux users. I don't think they mix, unless I missed something.

I don't know, MM, maybe they do split them up. But the important point is that, in the second example, this odd machine ran a WU for 4.x hours (same as mine), which means it is not an old clunker but a new machine, and then stupidly claimed 29 points at the end; to me that says it is a software/OS benchmark issue. Whether Linux, an old BOINC version, or some strange Windows setup, I don't know. Maybe BOINC benched the guy's machine while he was playing a game and the benchmark got heavily interrupted by background activity.

The point is, this new scoring system seems not to be taking the plain average of the quorum, and thus he will not drag me down. Similarly, the "slight" overclaimer in the first example did not appear to influence the granted credit. If this is going to be the way they do it, I am happy. We just wait and see.

By the way, looking at these two examples alone: while I usually got, on a weekly average, 85% of what I claimed under the old system, this one, which may be driving the scoring from the average of all claimed scores, appears to be granting me 80% of my claim. That is totally understandable given that my machine is an AMD X2 while the majority of the population would be Intel P4s. So never mind, still OK.

Movieman
11-09-2006, 03:50 PM
I don't know, MM, maybe they do split them up. But the important point is that, in the second example, this odd machine ran a WU for 4.x hours (same as mine), which means it is not an old clunker but a new machine, and then stupidly claimed 29 points at the end; to me that says it is a software/OS benchmark issue. Whether Linux, an old BOINC version, or some strange Windows setup, I don't know. Maybe BOINC benched the guy's machine while he was playing a game and the benchmark got heavily interrupted by background activity.

The point is, this new scoring system seems not to be taking the plain average of the quorum, and thus he will not drag me down. Similarly, the "slight" overclaimer in the first example did not appear to influence the granted credit. If this is going to be the way they do it, I am happy. We just wait and see.

By the way, looking at these two examples alone: while I usually got, on a weekly average, 85% of what I claimed under the old system, this one, which may be driving the scoring from the average of all claimed scores, appears to be granting me 80% of my claim. That is totally understandable given that my machine is an AMD X2 while the majority of the population would be Intel P4s. So never mind, still OK.
Yes, it's possible that 4-hour machine could have been doing something else when it benched. I just think this is a positive thing. All I've ever wanted is a level playing field, and this is a step in the right direction. When they patch/fix whatever they are going to do for Linux, we'll be a lot closer to what we want.
Not perfect, but then what in life is?

mike047
11-09-2006, 04:12 PM
Yes, it's possible that 4-hour machine could have been doing something else when it benched. I just think this is a positive thing. All I've ever wanted is a level playing field, and this is a step in the right direction. When they patch/fix whatever they are going to do for Linux, we'll be a lot closer to what we want.
Not perfect, but then what in life is?


My CAT :p:

[XC] 4X4N
11-09-2006, 04:37 PM
Maybe BOINC benched the guy's machine while he was playing a game and the benchmark got heavily interrupted by background activity.

This is the main thing I don't like about BOINC. My main rig is an HTPC setup, and I seem to have the bad luck that the benchmark often runs while I'm recording. I wish there were a better program. As for the new scoring, I'm almost always the highest in the quorum, and the scores seem about the same for me.

meshmesh
11-09-2006, 04:59 PM
This is the main thing I don't like about BOINC. My main rig is an HTPC setup, and I seem to have the bad luck that the benchmark often runs while I'm recording. I wish there were a better program. As for the new scoring, I'm almost always the highest in the quorum, and the scores seem about the same for me.

If you leave your machine on 24/7, there is a way to make sure it benches according to its real potential. Remember, the majority of the time it will be running uninterrupted. Do this:

First thing in the morning, say 6:00 AM, invoke the "run benchmarks" command from the BOINC manager, then leave it running 24/7 afterwards. Because it automatically re-benches exactly every five days, and because the time it first benched is typically a time the machine is not in use, it will always bench correctly afterwards.

Regarding the scoring: what is your rig? AMD?

[XC] 4X4N
11-09-2006, 05:17 PM
That's a good idea. I leave for work at 5:30 every morning, so I can do that. I run an Opty 170 @ 2.9GHz with 2GB of Ballistix. The one in my sig is now my daughter's. I'm in the process of a C2D build; I'm going to list some stuff for sale for team members here in the next few days to fund it. I've always been an AMD guy, so this will be a change for me.

Movieman
11-09-2006, 05:18 PM
That's a good idea. I leave for work at 5:30 every morning, so I can do that. I run an Opty 170 @ 2.9GHz with 2GB of Ballistix. The one in my sig is now my daughter's. I'm in the process of a C2D build; I'm going to list some stuff for sale for team members here in the next few days to fund it. I've always been an AMD guy, so this will be a change for me.
So that's what I have to look forward to! You and I have been neck and neck the past few days! :D

DDTUNG
11-09-2006, 05:29 PM
My daily points total has been down significantly for the past week, though that's partly due to some network problems at the office. Since I am running non-optimized 5.4.9 or 5.4.11 on all my machines (might have missed a couple of office P4s somewhere), I suspect the new system works against high-powered rigs.

DDTUNG:cool:

meshmesh
11-09-2006, 05:48 PM
As far as I can tell, the new system started to be implemented on the 7th of November, midnight UTC.
If you have missed a few office machines, their scores will be halved.
If I may ask, what percentage of your production is done by AMDs?

DDTUNG
11-09-2006, 06:13 PM
As far as I can tell, the new system started to be implemented on the 7th of November, midnight UTC.
If you have missed a few office machines, their scores will be halved.
If I may ask, what percentage of your production is done by AMDs?

26 AMD dual core rigs.
31 Dual Xeons.
1 Conroe.
The rest are P4 HTs.

DDTUNG

rob725
11-09-2006, 06:24 PM
My daily points total has been down significantly for the past week, though that's partly due to some network problems at the office. Since I am running non-optimized 5.4.9 or 5.4.11 on all my machines (might have missed a couple of office P4s somewhere), I suspect the new system works against high-powered rigs.

DDTUNG:cool:

Maybe, but I've seen some weird point fluctuations even before the new changes. The extreme lag due to the long time allowed to return results creates statistical anomalies that I would have thought would even out given the large population, but often don't. I have looked at a fair number of "new" result sets and haven't been able to pick up a penalty.

Often point drops are simply a result of lower runtime for the day. This runtime has nothing to do with the day in question but is the aggregate of the runtime for the quorums that were completed that day.

To see if the point system is having an effect, check the device stats for a machine. Calculate points/runtime before and after the change.

Edit:
Another thing I was wondering is whether the change in cancer WUs is having a temporary depressing effect. I noticed about two weeks ago that they got longer in general, about 1.5-2x. I believe this might have caused a statistical wave slowing down finished quorums, especially for machines that process a large number of cancer WUs. If so, there will be a corresponding "catch-up", I believe.

meshmesh
11-09-2006, 06:31 PM
26 AMD dual core rigs.
31 Dual Xeons.
1 Conroe.
The rest are P4 HTs.

DDTUNG

I have a feeling that under the new system the AMD X2s will take a bit more of a hit. If you look at the two pictures in my post above, you will get my drift.

I have usually noticed that my rig claims on the higher side compared to the other two computers in the quorum most of the time. This meant that when another similarly high-claiming PC happened to be in the quorum with me, the whole quorum benefited.

I don't know if this is an AMD thing, but I am guessing. Now, under the new system, which uses the average, this will always be diluted by those claiming 4x.

So my PC occasionally used to be granted the claimed 6x, either because a similar high-quality PC happened to be with me, or because a higher (optimised or overclaiming) claim happened to come along and my PC was the middle PC whose claim was applied to all three.

But now, under the new system, which uses the average, this will never happen again. The granted credit will always be diluted by the lower 4x and the occasional 5x PCs.

Initial hunch: a 5% to 10% reduction in weekly granted credit. I will be able to give an exact figure in ten days or so.

Of course the 4x PCs will end up doing a bit better too, but since they are the majority (say 2/3), their benefit will only be half as much.

So if your firepower (i.e. percentage of returned WUs) is mostly low-end, you benefit a bit as a whole. If your main firepower is mostly on the high end, you will lose a lot more. Just my opinion.

DDTUNG
11-09-2006, 07:29 PM
Before I joined this project I was given the impression that the project admins didn't care much about the comments of the whiners. Now they go and overhaul the points system in response, and put my high-end rigs at an even bigger disadvantage. They can be sure that I will not be adding or upgrading to Conroes or Woodcrests so long as this system remains in place. Once we give the new points system some time to settle down, and if we can show them that it disadvantages XS as a whole even though we are already running the standard clients, we should make a stand and demand a more level playing field.

In fact, any more stupid moves from the Admins and I'll be ready to pack up.:mad:

DDTUNG:cool:

rob725
11-09-2006, 07:39 PM
Looking at your charts above, I believe what they did is throw out the unpenalized outliers and then average the rest. (I am not sure they are still doing this, as I noticed it on earlier new WUs but then saw pure averages on later ones with similar data.) I also think they are either rounding off or truncating precision for the chart display. So, in the first one they threw out the 94 and in the second they ignored the 29.

Still, your hunch of 5-10% may be right. I looked at 4 machines, the 3 days prior compared to the last 3 days: 1 C2D, 2 A64s, 1 X2. Here are the points/core-minute: before : after : delta

c2d:......2.096 : 2.042 : .054
a641:....1.722 : 1.467 : .255
a642:....1.430 : 1.401 : .029
X2:.......1.473 : 1.384 : .089

Might not yet have enough days to nail down the drop %, but since all 4 dropped it looks like there will be some effect. I would guess closer to 5% than 10%.
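
In percentage terms (same figures, just divided out):

# Per-machine drop in points/core-minute, computed from the figures above.
before_after = {"c2d": (2.096, 2.042), "a641": (1.722, 1.467),
                "a642": (1.430, 1.401), "X2": (1.473, 1.384)}
for name, (before, after) in before_after.items():
    print(f"{name}: {(before - after) / before * 100:.1f}%")
# c2d: 2.6%, a641: 14.8%, a642: 2.0%, X2: 6.0%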

meshmesh
11-09-2006, 07:52 PM
Before I joined this project I was given the impression that the project admins didn't care much about the comments of the whiners. Now they go and overhaul the points system in response, and put my high-end rigs at an even bigger disadvantage. They can be sure that I will not be adding or upgrading to Conroes or Woodcrests so long as this system remains in place.
In fact, any more stupid moves from the Admins and I'll be ready to pack up.:mad:

DDTUNG:cool:
Cool down, big guy. Wait ten days; then we get exact numbers. Now, should there be a significant overall hit, we will surely feel it, since many rigs here are high-end. But we need to have solid numbers reflected in our stats first.

Yes, I agree that the quorum system was fine. All they needed to do was halve the abusive scores and otherwise leave the system as it was. But they took this extra averaging step, as a way to level the field maybe. :rolleyes:

The only benefit I see from it is that no one will be able to say anything any more, since it is an average system. The way I look at it, even if there is some loss I can live with it, because silencing the whiners has a major benefit to my blood pressure.

What do we do now? My opinion: crunch, collect data, then see what we have.

Movieman
11-09-2006, 07:52 PM
Before I joined this project I was given the impression that the project admins didn't care much about the comments of the whiners. Now they go and overhaul the points system in response, and put my high-end rigs at an even bigger disadvantage. They can be sure that I will not be adding or upgrading to Conroes or Woodcrests so long as this system remains in place. Once we give the new points system some time to settle down, and if we can show them that it disadvantages XS as a whole even though we are already running the standard clients, we should make a stand and demand a more level playing field.

In fact, any more stupid moves from the Admins and I'll be ready to pack up.:mad:

DDTUNG:cool:
I understand how you feel, but I still come back to the timeframes involved.
If one of your Xeons (per core) is doing 8 WUs a day and getting 400 points, and Mary's P4 2000 is doing 4 work units a day and getting 200 points, there is our advantage.
Averaging or using a median number is not very different from what they were doing before, giving everyone the points claimed by the middle of the 3 in the quorum, and we still keep the speed (time) factor. There may be a small downside in the points, but that same percentage should apply equally to all the players.

uOpt
11-09-2006, 07:56 PM
Sooo, it is possible they just goofed a little and took the average, not the middle one of the 3 claims? That would explain why DDTUNG gets screwed.

rob725
11-09-2006, 07:58 PM
I believe the penalty (assuming there is one) will hit everyone, because everyone in the quorum gets the same score, so the one who drags down the average also would have scored more in the previous system.

I think the only real disadvantage to us from this is that we are having to make up points on teams that had a longer run using the median system.

DDTUNG
11-09-2006, 08:12 PM
You guys are missing the point here. There is no perfect point system. What I am mad about is the Admins messing with it to pacify the whiners. What they should be doing is developing crunching software that utilizes the extra power of modern processors, so that more science can be done. Only then will they realize how much work the XS firepower can accomplish, and we vindicate ourselves. Most of us here are driving Ferraris in a race where they limit us to 2nd gear.:(

DDTUNG:cool:

Movieman
11-09-2006, 08:17 PM
Sooo, it is possible they just goofed a little and took the average, not the middle one of the 3 claims? That would explain why DDTUNG gets screwed.
If you look here you see that 3 legit people ran the unit; they averaged the points and all got the average. That's as it is supposed to be: you do the same work, you get the same points. This is my DX3600 doing 4 units at a time. Assuming that the other 2 machines are not also doing 4 WUs at a time, you see the advantage I have.
On the second pic you see that someone got "creative" with their claim; WCG averaged the scores of the remaining 2 in the quorum and then gave 1/2 that amount to the creative guy.
Damn well discourages people from being "creative".

littleowl
11-09-2006, 08:18 PM
You guys are missing the point here. There is no perfect point system. What I am mad about is the Admins messing with it to pacify the whiners. What they should be doing is developing crunching software that utilizes the extra power of modern processors, so that more science can be done. Only then will they realize how much work the XS firepower can accomplish, and we vindicate ourselves. Most of us here are driving Ferraris in a race where they limit us to 2nd gear.:(

DDTUNG:cool:


Very well worded!!!

meshmesh
11-09-2006, 08:29 PM
Looking at your charts above, I believe what they did is throw out the unpenalized outliers and then average the rest. (I am not sure they are still doing this, as I noticed it on earlier new WUs but then saw pure averages on later ones with similar data.) I also think they are either rounding off or truncating precision for the chart display. So, in the first one they threw out the 94 and in the second they ignored the 29.

Still, your hunch of 5-10% may be right. I looked at 4 machines, the 3 days prior compared to the last 3 days: 1 C2D, 2 A64s, 1 X2. Here are the points/core-minute: before : after : delta

c2d:......2.096 : 2.042 : .054
a641:....1.722 : 1.467 : .255
a642:....1.430 : 1.401 : .029
X2:.......1.473 : 1.384 : .089

Might not yet have enough days to nail down the drop %, but since all 4 dropped it looks like there will be some effect. I would guess closer to 5% than 10%.
Good spot, Rob. You are absolutely right. OK, to clarify: they are taking the average per quorum, but they apply the eliminations both to the optimized and "slight" overclaimers and to the very poor claimers. Then they average the rest, and everyone gets that, including the "slight" overclaimers and the poor performers. In addition, as a "vindictive" measure, they halve the score of the opties (a somewhat unnecessary, childish move, but that's OK).

So applying this to my two above examples:
First example: 43, 64, 94 claimed. The system rightfully spots the 94 as a "slight" overclaimer, so it is eliminated from the average. The average is then (43+64)/2 = 53.5 ===> 53 granted to all machines

Second example: 47, 29, 0, 48, 44, 65 claimed. The system rightfully spots the 29 as too low, so it is eliminated from the average. Good move. The average is then (47+48+44+65)/4 = 51 ===> 52 granted to all machines
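Just to pin down the logic I think we are seeing, here is a rough Python sketch. The low/high cutoffs are pure guesses on my part, picked only so that the two examples above come out right; WCG has not published the actual outlier test:

def score_quorum(claims, low=0.65, high=1.40):
    # Inferred behaviour: drop apparent outliers, average the rest,
    # grant that average to everyone, and halve it for anyone whose
    # claim looks like a big overclaim. The cutoffs are placeholders.
    valid = [c for c in claims if c > 0]            # errored/zero results get nothing
    med = sorted(valid)[len(valid) // 2]            # rough central value of the quorum
    kept = [c for c in valid if low * med <= c <= high * med]
    granted = sum(kept) / len(kept)                 # WCG seems to round this for display
    return {c: (granted / 2 if c > high * med else granted) for c in valid}

print(score_quorum([43, 64, 94]))             # ~53 for the 43 and 64 claimers, half that for the 94
print(score_quorum([47, 29, 0, 48, 44, 65]))  # ~51 for everyone with a valid result

Obviously the real cutoffs could be absolute, percentage-based, or standard-deviation-based; this only shows the shape of the calculation.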

Note:
1) I think the differences of +/- 1 between the calculated average and the granted credit are probably because the fractions are not listed in the claimed credit but are taken into account when doing the average math.

2) It is good that the "slight" overclaimers are not penalized, so that no high-performance machine gets wrongly caught.

3) The irony is that, although the slight overclaimers do not directly benefit from their overclaim, they inadvertently reduce the averaging to two values, thus reducing the dilution just a bit. :)

4) In the first example, under the old system, I would have been the governing value and thus everyone would have got the true credit of 64. But under the new system, we all got 53. Had the overclaimer not been there but rather another 43 claimer, we would have all got 50. So his presence actually helped just a bit !

5) With regards to the amount of hit, I agree that high performance machines will see a hit. Maybe 5% maybe 10%, probably somewhere in between. I will wait till all the six pages of "Verified Results" are based on the new system (probably 10 days) then compare with the other file I saved on the 7th.

6) The amount of hit will depend on the percentage of machines that claim the 40s and 50s vs those who claim the 60s, and whether your machine is one of those who claim the 40s vs the 60s.

@Rob: My AMD X2 usually claims in the 60s for FAAH. Is this the case also with the C2Ds and the other newer Intels that you guys have?

rob725
11-09-2006, 08:56 PM
You guys are missing the point here. There is no perfect point system. What I am mad about is the Admins messing with it to pacify the whiners. What they should be doing is developing crunching software that utilizes the extra power of modern processors, so that more science can be done. Only then will they realize how much work the XS firepower can accomplish, and we vindicate ourselves. Most of us here are driving Ferraris in a race where they limit us to 2nd gear.:(

DDTUNG:cool:

Agreed. I tried to make your first point on their forum here (http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=9549).

But, it was just a day before they went live, so even if it made sense to anyone there, I think they were already committed to going forward.

STEvil
11-09-2006, 09:32 PM
A rather simple formula can be applied to fix this whole benchmark fiasco.

A default work unit is sent out to all PCs and is calculated on each PC. This work unit should be re-calculated once a week or so.

The time taken to calculate this work unit creates a performance multiplier, which is applied to CPU/GPU run-time per day to generate credits per hour regardless of WU length.

For example, a Core 2 Duo at 3GHz may run the benchmark in 1 minute. This would result in a score of "100." A P4 at 3GHz may take 1 minute and 20 seconds and score "70." Each would earn that many (respective) points per hour of WU computation.

Notes: Benching WU time to completion should not include duration when the CPU is not under load (WU pre-empted). If the bench WU is pre-empted for too long, the benchmark should restart to reduce chances of error in time calculation.

Simple, effective, no complex algorithms to screw around with.

Note the numbers are pulled out of the air, just for examples.
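To make it concrete, here's roughly what that would look like in Python. All the numbers are still made up, and I'm ignoring the pre-emption handling from the note above:

REFERENCE_SECONDS = 60.0   # time the weekly benchmark WU takes on some nominal box
REFERENCE_RATE = 100.0     # that nominal box earns 100 points per hour crunched

def performance_multiplier(bench_seconds):
    # Faster benchmark run => higher points-per-hour rate for that machine.
    return REFERENCE_RATE * (REFERENCE_SECONDS / bench_seconds)

def credit_for_wu(bench_seconds, wu_hours):
    # Credit depends only on the machine's rate and the hours spent,
    # regardless of how long the particular WU happens to be.
    return performance_multiplier(bench_seconds) * wu_hours

print(credit_for_wu(bench_seconds=60, wu_hours=2))   # fast box: 200.0 points
print(credit_for_wu(bench_seconds=80, wu_hours=2))   # slower box: 150.0 points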

rob725
11-09-2006, 09:35 PM
It sure would be nice to get such a sample wu so we could tune our machines for best performance and get immediate feedback.

meshmesh
11-09-2006, 11:18 PM
Tell you what, guys, I was thinking about this whole thing a bit more, and I don't think we will see a real impact anyway. Maybe not even the 5%.

Just wait and see.

Martijn
11-09-2006, 11:21 PM
Tell you what, guys, I was thinking about this whole thing a bit more, and I don't think we will see a real impact anyway. Maybe not even the 5%.

Just wait and see.
Agreed on that. It'll even out eventually, when the cheaters get back to the standard client, just like me :D

meshmesh
11-10-2006, 12:04 AM
STEvil: What you are suggesting is to find the CPU time needed to calculate a fixed set of operations. But in the end, it still needs to be calibrated against a standard machine to derive my granted credit per hour. OK, after that it is easy to figure out the granted credit for all the other work units I crunch during the week, based on their CPU time in proportion to the small "benching" WU.

Now here is the question: that benching WU will eventually need to be run at WCG on some "trusted" machine, and its claimed credit used to estimate mine and everyone else's granted credit thereafter.

Will it be a NetBurst machine, Intel, AMD, Windows, Linux, etc.? The claimed credit on this machine will vary depending on its specs, exactly like all the variations you see in the claimed credits for THE SAME WU we crunch in a quorum. Back to square one.

STEvil
11-11-2006, 07:17 PM
It will not be calibrated against a standard machine. It is there to calibrate your machine against a known workload to produce a productivity multiplier. This multiplier is relative to the performance of your machine and does not need to be calculated on a trusted platform. This is exactly the same as "DBENCH", but it actually affects the points you produce rather than just showing you how your machine performs on the given workload the DC project is supplying.

The higher-end the machine, the higher the multiplier you're going to get. If a machine is penalized due to architecture or setup (NetBurst, AMD XP, AMD64, shared L2, Windows, Linux, etc.), that will be due to poor coding of the DC app or a limitation of your OS or CPU. Those can only be fixed by the DC project's program coders.

meshmesh
11-11-2006, 08:42 PM
OK, by the numbers. Say a fast C2D and a slow AMD attach to FAAH. They download and run a small WU that represents the different types of operations in FAAH (not easy to get a WU to represent the mix of different operations, but OK for now).

C2D finishes in 10 minutes, AMD in 20. C2D is twice as fast.
They download different WUs. C2D finishes in 2 hours, AMD in 4.

The benchmark is accurate and representative of machine setup.
The awarded BOINC credits of the C2D will now be equal to the AMD's. Consistency achieved. Twice as fast, in half the time, same work. So far so good.

Question: what is the exact value of the BOINC credit that will be awarded to the C2D?

Haltech
11-11-2006, 10:27 PM
Why not a consolidated CPU map? Each WU is tested on each available cpu arch..

K6 P1/PII multiplier of 5
P3 & K7 Multiplier of 6
P4W & K7 Socket muliplier of 7
P4N & AXP muliplier of 8
P4 Last Gen & A64 multiplier of 9
C2D & K8 multiplier of 10

It doesn't matter how many cores you have, since 2-4 WUs will be completed at a time.

The latest technology always gets the highest multiplier. Take the time in minutes that each arch takes to finish a work unit and multiply it by the multiplier. All a DC project has to do is update the CPU tables when the newest thing arrives.
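In code the whole thing would be nothing more than a lookup table that the project updates when new CPUs ship. A quick Python sketch, using the multipliers from the list above (arch names abbreviated):

ARCH_MULTIPLIER = {
    "K6/P1/P2": 5,
    "P3/K7": 6,
    "P4W/K7-socket": 7,
    "P4N/AXP": 8,
    "P4-last-gen/A64": 9,
    "C2D/K8": 10,
}

def credit(arch, minutes_to_finish_wu):
    # Time in minutes to finish the WU, times the architecture multiplier.
    return minutes_to_finish_wu * ARCH_MULTIPLIER[arch]

print(credit("C2D/K8", 120))   # 1200
print(credit("P3/K7", 240))    # 1440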

STEvil
11-11-2006, 11:27 PM
Mesh - that would be an arbitrary value based on the multiplier. The multiplier is not an exact number and will vary per machine within architectures, because no two machines perform exactly the same. WUs have a set value and do not influence the final score; run time is what gets scored. 4 hours at 40 points/hour = 160 points; 4 hours at 20 points/hour = 80 points (C2D and AMD respectively, random numbers). The multiplier is the points per hour earned.

Haltech - because the latest technology doesn't always perform better, and the code doesn't always run fastest on the newest architecture.

meshmesh
11-12-2006, 01:51 AM
OK, sounds good so far. We are on the same wavelength.

But isn't this what BOINC does in an indirect way?

C2D is fast, benchmark = 4 bananas
AMD is slow, benchmark = 2 bananas

So WU credit = benchmark * Constant * CPU time

Set the constant to an arbitrary number, say 10.

C2D WU credit = 4 * 10 * 2 hrs = 80 credits
AMD WU credit = 2 * 10 * 4 hrs = 80 credits

Consistency and fairness guaranteed.

After running for 4 hours:

C2D: 2 WUs finished, each at 80 ==> 160 credits in 4 hours
AMD: 1 WU finished at 80 =======> 80 credits in 4 hours.

That is correct and reflects machine strength. OK.
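Same thing in a couple of lines of Python, with the same arbitrary bananas and constant:

CONSTANT = 10  # arbitrary project-wide scaling factor

def wu_credit(benchmark_bananas, cpu_hours):
    # credit = benchmark * constant * CPU time
    return benchmark_bananas * CONSTANT * cpu_hours

print(wu_credit(4, 2))   # C2D: 4 * 10 * 2 = 80 credits per WU
print(wu_credit(2, 4))   # AMD: 2 * 10 * 4 = 80 credits per WU
# Over 4 hours the C2D finishes two WUs (160 credits); the AMD finishes one (80).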

At the end of the day it is the same outcome; it is just a difference in terminology. That said, your idea has an important strength not found in BOINC: you use a WU as the benching test rather than flops and iops. This is excellent because it actually benches the machine using the specific set of instructions relevant to the project application. The result will be total consistency, i.e. every machine will earn (no need to claim any more) exactly the same amount of credit if they crunch the same WU, irrespective of OS, CPU, etc., and the discrepancy we see will disappear.

There are two small problems:
1) Making a sample benching WU truly representative is not very easy. Some WUs will behave differently than others (not in time spent, but in operations mix), which may introduce some small discrepancy, although still much better than BOINC. As long as they keep sending new benchmarking WUs every time they change the application code, it will stay accurate.

2) You completely lose inter-project comparability. Every project runs its own WU benchmark and assigns its own arbitrary scaling value. There is nothing wrong with that. BOINC could have been envisioned purely as a tool for WU distribution and should not have been used for credit or benchmarking in the first place. This means that 1000 credits in WCG have nothing to do with 1000 in QMC. In fact a project may elect to save themselves the hassle, run no WU bench at all, and give no credit. Fine.

We end up with a situation where each project has its own ranking chart, and that's it. No BOINC combined statistics. Like the WCG points chart, or F@H, which has nothing to do with BOINC. When I think about it, that actually is a good thing: people join projects they like on merit, compete in them, then move on. Unfortunately this is not what is happening now.

Seeing that the project leaders and Berkeley are insisting on a combined, interoperable scoring system, the BOINC built-in bench will be used to score all projects, even though we agree it does not measure the strength of the machines accurately. So a machine may bench high and take more time to complete the WU, thus ending up claiming higher than it is supposed to, and vice versa. Thus the mess we have.

STEvil
11-12-2006, 06:06 PM
will reply to this when I get home.. going to pick up chinese food.

gearhead364
11-12-2006, 07:52 PM
will reply to this when I get home.. going to pick up chinese food.

Mmm, General Tso and fried rice. :slobber:

STEvil
11-12-2006, 08:32 PM
We don't need cross-project comparability of scores.

We didn't need it before BOINC, we don't get accurate cross-project scores with BOINC, and no two projects will ever perform similarly enough for their scores to be comparable.

If we wanted cross-project comparability we would have to bench every user's rig on every single project, for every single work unit completed, because all rigs perform differently on each project. This would then require a performance multiplier or divider, cross-referenced per project and machine, to make one 7-hour QMC WU earn the same as a 7-hour FAAH unit. But then what about a cancer unit? SETI@home units?

It's just not worth it. If point whores move over to another project because they can score more points there, they were never in it for the science to begin with.