Check systems please, reports of high failure rate with some new 4.97 HBLR_x.x WU's :mad:
Rosetta Link
Another new version of rosetta already? I thought 4.83 was only released like last week lol...
Looks like I have about 40 of those coming up soon - I hope to god you're wrong :(
Yikes !!
My X2 box is loaded with them starting in about 8 units :(
hmm, I'll see what the first one does
Quote:
Originally Posted by gpcola
Since most people had some 4.83 WU's waiting in line, most errors start popping up right now when they start crunching the new ones... and it doesn't look like they'll be able to fix the problem on short notice :(
Code:
April 7, 2006
The rosetta application was updated to include some new scientific code.
The application version numbers have been changed to be consistent with Ralph@home.
Since error rates have decreased significantly, we decided to increase the default cpu run time to 4 hours.
The previous default was 2 hours.
Woke up and had not gained any credits.
Looked at jobs, 8 had failed, using my CPU all night. #""!!!"#?`?"#¤#"arrrrrrrrrrghhh ffffffffffffffffffffffffffffffffffffffffffffffff
I had claimed 300 credits, didn't get one ####=?)==)(/&/%&%&#%#¤&#¤#¤/aaaaaaaaaaaaaaaaaaargggggggh.
Looked at the run times: they were up to 4 hours each, all lost.
Looked for errors. arrrghhhhhhh
I'm really upset here fffffffff########"""""##
Ok so they just screwed it.
I'm calm, not to worry.
Easy now, yes easy now....... ok i'm totally calm........
Just checked this laptops messages, looks like every 4.97 WU failed :(
My Linux rig seemed to only have 1 4.97 WU fail though
every box has em here. and every box has been erroring since they started.
this is bs! i run a lot of machines and pay a lot of money for electricity. i wish these people would wake up to the fact that when they waste people's resources they waste their tolerance and patience.
i aborted all on 4 boxes. hit "retry communications" and just get a bunch more of the same ones.
jeez, this sh*t pisses me off.
I had just begun to feel the hunger for roasted beef, and now i get sick cows.
DAAAAAAAMMMNNNNN
Hmmm, seems I've crunched two of these already without realising - one on my X2 and the other on a P4 - both completed without fault.
[edit] actually maybe not - crunching an HBLR_1.1 right now on the P4 but it's a v4.83 not 4.97.
On a more serious note, I have stopped it and am awaiting some confirmation of error-free crunching.
Damn, I lost all overnight WUs to errors and thought it was my RAM dying - but SP2004 / SuperPi checked out ok this morning. Yes, they're HBLR_* units... http://boinc.bakerlab.org/rosetta/re...?hostid=190981
Got 2 more running on that box now, fingers crossed.
Wow, I've still got about 20 units till I get to the HBLR_1.* units, should I abort them just in case? It'll take this slow pos a few days/weeks to get to them, hopefully there will be a fix.
serlv, you're right.
Quote:
Originally Posted by serlv
They should do some error check, before they send new stuff out, we are using electricity here.
Two boxes down and off.
HBLRs aborted, queue empty ( aborted all HBLRs queued ) and i don't feel like DLing more garbage work that will error.
On the other two, I aborted all queued work, suspended network communications and if the two that are running error, then they get shut down, too.
Now on to the other nine *&(^$$#!!!!
Somebody PM me when they get this fixed.
Looks like I'm probably going to be shutting down all machines but two. I would have two on even if I didn't run DC projects.
I still have some 4.83 WU's on two rigs... but I'm not shutting down when I get to the 4.96 units... I am however switching all rigs over to 1 hour runs, some 4.96 WU's will surely finish with a limited runtime of 1 hour.
They shouldn't have said that :p:
Quote:
The rosetta application was updated to include some new scientific code. The application version numbers have been changed to be consistent with Ralph@home. Since error rates have decreased significantly, we decided to increase the default cpu run time to 4 hours. The previous default was 2 hours.
I have one box with some 4.83s left. The one that failed lots of 4.97s last night just failed another after an hour, I'm aborting and abandoning 4.97s on all boxes until this gets fixed. (I do 4 hour runs, not gonna change that). One failed after 61 seconds. O_O
Bummer. :mad:
Thus far my first two failed, but after that I haven't had any failures... hopefully I don't have to deal with a lot of problems before that gets fixed.
Ok, running a 4.97 one more time. If this one fails, I'll spare the electricity bill until further notice.
I have the same problems, was wondering what the hell was going on with my opty. Nothing but errors now while yesterday it was clocked higher at same vcore (testing what rosetta could take) and all was well.
Built a new opty 165 box yesterday and here I thought it was the box: all the 4.97 WU's errored out. Whew, it wasn't just mine, I was going to have to troubleshoot.
Lucky you, I did troubleshoot, for no reason at all.
Quote:
Originally Posted by odb
That makes you twice as mad. :mad: :D
From the Rosetta forum; it doesn't sound to me like it'll be fixed right away.
Quote:
Originally Posted by David Baker
Well, I can go on for only 24 hours on both of my rigs, time for a pause :p
so far i have had 3 4.97's finish fine and 1 fail.
1 that could have shown the cows respect. :D
Quote:
Originally Posted by Jarrod1937
I'm trying one out, 68% no fail, but i'm shaking after 8 down.
If this one fails, i'm going to shoot those cows without any work done.
Maybe it's a secret DPC plot to load XS team members with bad WU's... :stick:
EDIT: The last few WU's I completed were 4.97 HBLR's and they gave me the usual credit.
If you still have some 4.83 WUs to crunch then maybe setting your 'Target CPU run time' AS HIGH AS POSSIBLE would be a good idea right now - that way each good WU you have will run for 24hrs and by the time you run out they will have, hopefully, sorted out the mess... ;)
WUs have been failing constantly for the last 3 hours and I have absolutely no idea what is going on..
I'm not showing any problems here at all. I'm still on 4.83 with another 20-30 queued till it hits 4.97..
Thanks for the heads up, will watch for this..;)
I have to add this: Don't let this hiccup get you down. The goal is to be way ahead of the DPC by the end of April. That's ALL I want to see, and if I have to keep an eye on the machines over the weekend, that's not so big a deal. I know if you have remote units that's impossible to do.
I guess what I'm trying to say is look at the bigger picture and don't let this get you down.
98% done FAILED that's nr 10
Another 4 hours work gone.
I can't find words AAAARRRRGGGGGGGHHHHHHH
Ahh that helped.
I just started last evening and had all WU's fail!:( I'm going to abort all units and wait for the fix!
Wow, sorry to hear that... super bad timing to join into the project :(
Quote:
Originally Posted by 426hemi
Hopefully this won't be too discouraging.
Looks like my rigs have started into 4.97. Far too many to watch, so I'll just wait for dips in output... We were on course to do about 56,000 today, too. :(
In case anyone wonders why I wrote 4 hours:
The 4.97 units run for up to 4 hours, funnily enough to prevent errors.
HAhahhhaaahhhahhaaahahhahahahaaha That's not funny
You can go into your preferences and change back to 2 hours, which under the circumstances might be a good idea.
Quote:
Originally Posted by Frisch
Has anyone tried changing it to 8 hours or something higher to see if that removes the erroring? If they went to 4 to alleviate errors, going higher may actually do so.
Good idea, but I have to say that it's the last try before I wait for an update on this project.
Quote:
Originally Posted by Movieman
I wouldn't dare. I mean, imagine 98% on that one, and failing. Ouch.
Quote:
Originally Posted by Bloody_Sorcerer
I'm still baffled by the incredibly bad timing of releasing a new version. Why would they do something so stupid?
A) it was before a weekend.
B) it was on a weekend that David Baker was going on a family outing.
Were they that confident in this obvious massive failure of a release? I was perfectly happy with 4.83 myself.
I'm nearly at my wit's end with this project.
Well, i have set it to 1 hour.
I understand your frustration. Believe me when I tell you that this pisses me off more than you can believe. I've spent the last 3 weeks trying to bring in new members, and to see this happen when the guys are coming in gets my Irish temper boiling! :mad:
Quote:
Originally Posted by Frisch
All I can ask is to stick with this and hopefully it will fix by Monday afternoon.
Just saw this: gpcola had a great idea! Anyone with 4.83 units left, change your preferences to 10 or more hours. That will maximise the points you get on them and make them last the weekend.:D
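A rough way to see how far a raised run time stretches the remaining good units is sketched below. It assumes every unit runs for the full target CPU run time and that each core crunches one unit at a time; the function name and the example numbers are made up for illustration.

```python
def hours_queue_lasts(queued_units, target_hours, cores=1):
    """Estimate wall-clock hours until the queue runs dry.

    Assumption: every unit runs for the full target CPU run time and
    each core works on one unit at a time, with no failures or idle gaps.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return queued_units * target_hours / cores


# Hypothetical dual-core box with 20 good units left in the queue.
print(hours_queue_lasts(20, 2, cores=2))   # 2-hour runs
print(hours_queue_lasts(20, 10, cores=2))  # 10-hour runs
```

With these made-up numbers, 2-hour runs keep the box busy for under a day, while 10-hour runs cover about four days.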
I'm going to try to ride this problem out..
Even if I only complete ONE WU.. it is one more WU to help the group...
I have a bottle of red wine next to me; it is nearly empty in half the time I usually take to drink one. (Maybe I have set the timer down on that one too.)
Quote:
Originally Posted by Movieman
The only thing I wonder is why the heck the new version is not 4.84...
Btw, in case that matters, I haven't had a single WU get stuck or fail on Linux from what I can tell...
The next WU I am calculating is a new one, so hopefully it works.
Freak, set your time down to 1 hour with the first 4.97, the failure rate is big.
Quote:
Originally Posted by Fr3ak
I have read at the Rosetta site that the linux users don't appear to be having the same problem.
I've set it to 4 hours and I am still getting a failure.. :mad: Died in the last 7 minutes :mad:
Quote:
Originally Posted by Frisch
Poor you :( :D
Quote:
Originally Posted by nn_step
The 4.97 units default to 4 hours; set it to one. I'm almost finished with a one-hour unit, and that's my first error-free one in almost a day (except 2).
PS
I died on a 98% 4 hours too. ouuuuucchhh.
Anyone hear of a fix yet?
They replaced 4.97 with 4.98. 4.98 is just the old 4.83 client with a newer version number so it will auto update. Everything should be back to business as usual.
Quote:
Originally Posted by 426hemi
They got some more work to do before they try the code in 4.97 again.
Is that version available for download at the rosetta site?
Just 'reset' to clear any bugged 4.97 WUs and download the new app and WUs. Rosetta updates itself automatically so you never have to download/install anything manually.
If you've been doing work and letting BOINC connect, you should already have 4.98. Open Task Manager and see if Rosetta_4.97 or Rosetta_4.98 is running.
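If you have many boxes and would rather not eyeball each Task Manager, the same check can be sketched in a few lines. This is only an illustration: it assumes the process name contains "rosetta_" followed by the version number, and the process list below is a made-up example you would paste in yourself.

```python
import re

# Assumption: the process name contains "rosetta_<major>.<minor>".
_VERSION_RE = re.compile(r"rosetta_(\d+\.\d+)", re.IGNORECASE)


def rosetta_versions(process_names):
    """Return the sorted, de-duplicated Rosetta versions found in the names."""
    found = set()
    for name in process_names:
        match = _VERSION_RE.search(name)
        if match:
            found.add(match.group(1))
    return sorted(found)


# Hypothetical process list, copied by hand from Task Manager.
names = ["explorer.exe", "boinc.exe", "Rosetta_4.98_windows_intelx86.exe"]
print(rosetta_versions(names))
```

If 4.97 still shows up in the result, that box has not picked up the replacement yet.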
Thanks, it's downloading 4.98 now.
I lost more than 3 days worth of work. Argggggggggggggggggggggggggggggggggg
I aborted all the weird-looking work units and now my computer is happy. I hope the WU it is working on doesn't go into its erroneous ways.
I've also had trouble since then with WUs named "7424_largescale_large" (not the actual name, but the closest I can recall) getting stuck around 50%. Aborted all instances to be safe.