
Beware of Stuck Downloads



DAK1640
07-26-2009, 06:15 AM
I have observed a number of my rigs with stuck downloads. Now I just got my 5th stuck download. I simply abort the stuck download & re-update which seems to cure it, but what a PITA...:mad:

shadowwind
07-26-2009, 06:34 AM
Yup, it is a pain. I noticed it last night; I was thinking they were having second thoughts about sending me WUs because of all the errors I was putting out.

Chumbucket843
07-26-2009, 10:07 AM
Right now I have 8 WUs retrying. I guess their servers crashed.

Naja002
07-26-2009, 04:27 PM
Yeah, this seems to be something new.

My quote from the other thread:


You are in the transfers tab. I have seen it, but it's only lasted for seconds...and I've only seen it very recently, because my network is down and I am having to manually UL/DL 2 rigs. My guess is that the server is just busy for a minute...maybe a bit longer. You should not need a reboot. Clicking update should do it.

I'm not experiencing it ATM, but with the network down...I only have one rig to check. The others I do manually, but I've not had this long term "retry". Maybe a server issue related to the 6.6.38 release. :shrug:

Plus it is the weekend....


EDIT: I just manually ULed and DLed 2 WUs from 1 rig w/o any issues or delays. So it may have been a temporary issue. If things go wrong on the weekend--it oftentimes takes them longer to get it fixed....sometimes until monday morning!

Paladin
07-26-2009, 09:18 PM
I had it on two rigs. Aborting once on one rig got a successful WU d/l, other rig had to abort two back-to-back before one got through. Probably a weekend issue (no admin monitoring) with excess Goose feathers in the server gears.

INFRNL
07-26-2009, 09:38 PM
I do not think it is related to the weekend. As far as I can remember, this has been an everyday issue on one of my rigs all week.....sorry, memory not as good as it once was. As far as I am aware, this issue only happens on my Vista rig; my Win7 rigs seem to hold up just fine.

I don't know why you can't manually reload it, but if you abort the stuck downloads, it gets new ones in their place just fine...there's no sense in this, and it's pissing me off. No telling how much work we lose out on with this issue.

123bob
07-26-2009, 10:18 PM
Thx for the heads up. I'm seeing it too on a file starting with "91-KASHIF_HIVPR". Will try to jiggle it loose....

That rig runs 64 bit ultimate with BOINC 6.6.36. I have not seen this yet on the other two running BOINC 6.6.20. 260-216s in all three. I can check vid drivers if need be....

Bob

123bob
07-26-2009, 11:46 PM
OK, on farm-11 I had a stuck WU as reported above. Reset machine, no help. Aborted stuck task. I got two WUs in reply. One is p195000-IBUCH...., the other is 34-GIANNI_DOPb.... SO, these aren't the same as reported above.

Is it a project problem?

MikeB12
07-27-2009, 12:04 AM
It's gotta be a server thing or a corrupt download packet. I have not seen one in the last couple of weeks, but have seen them before.

You probably don't need the reboot. I posted that in my sequence just cuz I did it, in case it helped. You can probably just abort the problem DL line and update manually from the projects tab to fix it.

.02

MikeB12
07-27-2009, 12:18 AM
A couple of GPUGRID forum threads were posted recently on this error. Looks like it's a problematic HIV WU and they cancelled them yesterday, but there are remnants left, so we have to abort them manually until the remnants are gone.

http://www.gpugrid.net/forum_thread.php?id=1231&nowrap=true#11292
FROM GPUGRID users:

I have had two issues today where an HIV workunit (635688 and 582818) stopped downloading with an HTTP error, on all files in the package AFAIK. Even after letting it run its course it did not download. Eventually I had to cancel the workunits to keep going. Is this a workunit-related issue? The connection with the server was fine; other packets right before it and after it downloaded fine. What is the best course of action in cases like these?

Two more cases today. I have noticed that the issue seemingly is caused by THREE download threads being started simultaneously whereas normally only TWO threads are allowed. Hope this provides some insight into the issue.

26/07/2009 6:55:32 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-pdb_file: HTTP error
26/07/2009 6:55:32 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-pdb_file
26/07/2009 6:55:32 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-par_file has wrong size: expected 8402771, got 0
26/07/2009 6:55:32 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-par_file
26/07/2009 6:55:33 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-par_file: HTTP error
26/07/2009 6:55:33 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-par_file
26/07/2009 6:55:33 PM GPUGRID [error] File 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc has wrong size: expected 872, got 0
26/07/2009 6:55:33 PM GPUGRID Started download of 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc
26/07/2009 6:55:34 PM GPUGRID Temporarily failed download of 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc: HTTP error
26/07/2009 6:55:34 PM GPUGRID Backing off 1 min 0 sec on download of 76-KASHIF_HIVPR_dim_ba5-24-myfile.enc
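
For anyone who wants to check a saved message log for the same pattern, here is a minimal Python sketch. It assumes the log lines look exactly like the ones quoted above ("Temporarily failed download of ..." and "[error] File ... has wrong size: expected N, got M"); the function name `summarize_transfers` is made up for illustration and is not part of BOINC.

```python
import re
from collections import Counter

# Patterns are based only on the log lines quoted in this thread;
# other BOINC versions may word these messages differently.
FAILED = re.compile(r"Temporarily failed download of (\S+): (.+)$")
WRONG_SIZE = re.compile(
    r"\[error\] File (\S+) has wrong size: expected (\d+), got (\d+)"
)


def summarize_transfers(log_lines):
    """Return (failures, wrong_size) for an iterable of log lines.

    failures:   Counter mapping file name -> number of failed download attempts
    wrong_size: dict mapping file name -> (expected_bytes, got_bytes)
    """
    failures = Counter()
    wrong_size = {}
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = WRONG_SIZE.search(line)
        if match:
            wrong_size[match.group(1)] = (
                int(match.group(2)),
                int(match.group(3)),
            )
    return failures, wrong_size
```

Feeding it the lines of a saved log (for example, an open text file) gives a per-file count of failed attempts. A file that keeps failing and shows "got 0" is a likely candidate for one of the stuck remnants.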

FROM TG (Forum moderator, Project administrator, Project developer, Project scientist):

Posted 26 Jul 2009 18:20:59 UTC - in response to Message 11344.
We stopped some HIV WUs two days ago, but they left behind remnants. Please abort them at will.
We'll try to cancel them server-side asap, thanks for your patience.

MikeB12
07-27-2009, 02:02 AM
ok, I just got one of these.

1. abort task from "task tab" (highlight the task and click abort)
2. update project from "project tab" (highlight gpugrid and click update)

and it goes away. no reboot needed.
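
The two steps above can also be sketched for the command line. This is only a rough Python sketch, assuming that `boinccmd` is on the PATH, that your client version supports the `--task ... abort` and `--project ... update` operations, and that the project URL below matches the one your client is attached with. The task name is a placeholder, not a real WU.

```python
import subprocess

# Assumption: the URL the client is attached to GPUGRID with.
# Check the exact URL your own client reports before using this.
PROJECT_URL = "http://www.gpugrid.net/"


def abort_and_update_commands(task_name, project_url=PROJECT_URL,
                              boinccmd="boinccmd"):
    """Build the two commands: abort the stuck task, then update the project."""
    return [
        [boinccmd, "--task", project_url, task_name, "abort"],
        [boinccmd, "--project", project_url, "update"],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run(abort_and_update_commands("EXAMPLE_TASK_NAME"))` only prints the two commands. Pass `dry_run=False` once the task name and URL have been checked against what the manager shows.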

DAK1640
07-27-2009, 02:18 AM
MikeB12 is da man!!!!:up: Sorry MTM:rofl:

Naja002
07-27-2009, 09:10 PM
A couple of GPUGRID forum threads were posted recently on this error. Looks like it's a problematic HIV WU and they cancelled them yesterday, but there are remnants left, so we have to abort them manually until the remnants are gone.


Glad You've located the issue, Mike! :up:
My network is back up now, but prior I UL/DLed 5 WUs with zero issues. Hopefully the remnants will be cleared before too much longer. :up:

This stuff happens, Folks. Not this stuck DL issue, but "stuff" generally. Gpugrid is still considered Beta, but none of the projects are perfect--even WCG, as well as it runs, has unannounced, unexpected issues (rare, but they are there.) Keep in mind that the future WUs are a direct result of what we are processing now. I know that they do test them, but how much--I'm not sure.

With the CUDA upgrade coming (Thurs?)--EXPECT problems. Maybe there won't be any or just a few...we'll all just have to wait and see. I have no idea how this upgrade is going to affect individual hosts, but keep in mind that there may be project problems, or problems on your end. So, put your seatbelt on, folks--there may be a lot of "fun" coming real soon.....maybe not. Where's the "fingers-crossed" smilie? :p:

http://i50.photobucket.com/albums/f347/Naja002/Smilies/fingerscrossed-1-1.jpg