
Thread: WCG tasks die "Task xxx exited with zero status but no 'finished' file"

  1. #1
    Xtreme Member
    Join Date
    May 2008
    Location
    Sydney, Australia
    Posts
    242

    WCG tasks die "Task xxx exited with zero status but no 'finished' file"

    Every few hours, all of the WCG science tasks running in BOINC die simultaneously with the message in the title, and then restart automatically. This has been happening every few hours for some days. Not a single task (WU) has returned an error or invalid result.
    The system almost freezes during the turmoil.
    It has been happening only since starting to run DDDT2 tasks (4 of), but also happened at least once running 4 x "c4cw" Beta WUs. The machine has run a mix of FAAH/HFCC for months, without one hiccup.
    DDDT2 tasks reserve 500MB of pagefile each (VM size in task mangler), and I thought that might be the problem, but it's got 4GB pagefile allowance, and you get a different effect if you bump into that limit when suspending a task. (Been there, done that. Fun. The error logs have the right diagnostic messages too.)
    Physical RAM should not be an issue.

    I thought it might be my overclock settings, as I suspect that the DDDT2 program (CHARMM) is more critical than the other WCG science programs. I've upped Vcore and Vmch. Decreasing FSB speed 1 MHz at a time does not seem to alter the frequency at which this is happening, though I haven't backed right off. It's only happening on 1 machine - the others are OK.
    Prime 95 (1 x small, 1 x InPlace, 1 x Custom 513MB, 1 x Custom 97MB) is stable last time I tried it.

    Has anyone else seen this?

    [Am posting here, not WCG, because of the overclocking]
    -----------
    Q9650, 448x9 = 4.03GHz, Vcore = 1.24,
    Asus P5Q dlx, Vmch = 1.34, Vtt = 1.24
    2 x 1GB Corsair PC2-8500C5, 5:6 (1074MHz), 5-5-5-13 PL=8, 2.06V
    500GB WD green, no video.
    XP-64 SP2, all updates applied. BOINC 6.2.19, 32-bit.

  2. #2
    Xtreme Cruncher
    Join Date
    Mar 2009
    Location
    kingston.ma
    Posts
    2,139
    Do you have the BOINC folders *excluded* from your AV?

  3. #3
    Xtreme Member
    Join Date
    May 2008
    Location
    Sydney, Australia
    Posts
    242
    > Snow Crash: Do you have the BOINC folders *excluded* from your AV?
    Nil antivirus. The only stuff that comes into this machine is from WCG, or it has come in via other machines in my network and has been checked there, so I don't think I need one. The machine normally runs headless, though it's got an X300 vid card in it at the moment, and I maintain some non-WCG software on it.

    The only things that I've changed in the system recently are:
    • Installed all of MS's OS patches from August Black Tuesday.
    • Installed Spybot S&D 1.6.2 + latest malware database and ran it manually. System was clean.
    • Updated VLC media player from v1.1.1 to 1.1.2
    • Updated Java runtime from Platform SE 6 U20 to U21
    • Today installed and ran ftp://time-a.nist.gov/pub/daytime/nistime-32bit.exe. Today discovered that Windows time autoupdate doesn't work on this machine and the diagnostics are useless. Before I installed nistime, I synchronised time manually. It was 24sec slow. In view of other posts re these BOINC errors, this is suspicious. Suggestions appreciated re. this, too ...
    • Today disabled Windows autoupdate of system time by clearing the tickbox in "Adjust Date/Time" Internet Time tab.
    • Today disabled Windows Time service using Admin Tools >> Services.
    • Today disabled Drive Indexing on local HDD partitions.
    • Today stopped and restarted BOINC service & manager.
    • Killed aacenter.exe (Asus power management daemon)
    • Tried running 2 x HFCC with 2 x DDDT2 tasks
    • Ran a virus scan of HDDs using Avast! 4.8 with current virus database
    • After seeing a suggestion in the forum thread "Re: temporary short freezes after installing new hdd", I set Power Scheme = Always On in Control Panel Power Options, which sets Turn off hard disks = Never.
    • Ran the HDD performance diagnostic program HDDScan on the HDD, but there were no unreadable sectors or sectors that were readable only after considerable delay. Slow sectors could cause Windows to hang, waiting for the HDD driver to time out.
    • Shut down to mains power off & restarted. WCG task exits seem to be less frequent - about 1 per 2hrs vs about every hr before. I also observed Windows going ga-ga (GUI very slow, CPU usage % erratic, System Idle Process % high), but with the WCG tasks surviving. This may have been happening ever since Windows installation, but with me not noticing it.
    • Uninstalled BOINC 6.2.19 32-bit protected application, then installed BOINC 6.2.19 64-bit as an ordinary user program. (Not without some drama). Running with boinc.exe assigned "above normal" priority instead of its default of "normal". WCG tasks have exited 1 time in 8hrs, indicating a probable improvement. I also noticed one Windows ga-ga event which did not cause the WCG WUs to crash.


    For a firewall, I rely on the one in my Netgear DG834v3 router.

    Thanks for the AV suggestion. Please keep thinking ...
    Last edited by BlindFreddie; 08-20-2010 at 10:53 AM.

  4. #4
    Xtreme Cruncher
    Join Date
    Jun 2007
    Location
    SK, Canada
    Posts
    836
    Here's a good explanation of what it means and things to try: http://boincfaq.mundayweb.com/index....116&language=1 Since you say it's happening every few hours, I'd try turning time sync off first.
    i7 3970X @ 4500MHz 1.28v
    Asus Rampage IV Extreme
    4x4GB Corsair Dominator GT 2133MHz 9-11-10-27
    Gigabyte Windforce 7970 OC 3-way Crossfire
    Windows 7 Ultimate x64
    HK 3.0-MCP655-Phobya 400mm rad
    Corsair AX1200i
    Sandisk Extreme 240GB
    3x2TB WD Greens for storage
    TT Armor VA8003SWA





  5. #5
    Xtreme Member
    Join Date
    May 2008
    Location
    Sydney, Australia
    Posts
    242
    > Fallwind: I'd try turning time sync off first.
    Done**. Also dropped FSB to 447MHz, RAM to 5-5-5-14. Killed Nero Media daemons & have disabled them, using Chameleon Startup Manager 3. Still happening.
    ** Not done, just tickbox cleared in "Adjust Date/Time". @ 12:20am discovered in the Services tool that "Windows Time" still had "started" status. Now killed & its startup disabled.
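    One way to check whether something is still stepping the system clock (the 24 sec correction mentioned earlier, or a time service that is still running) is to compare the wall clock against a monotonic clock. A minimal sketch in Python, assuming Python is available on the machine; the function names and the 1-second threshold are illustrative, not anything from BOINC or Windows:

```python
import time


def clock_offset():
    """Wall-clock time minus monotonic time, in seconds.

    The monotonic clock is not adjusted when the system time is set,
    so this offset only changes when the wall clock is stepped.
    """
    return time.time() - time.monotonic()


def find_clock_steps(offsets, threshold=1.0):
    """Return (index, jump) for each consecutive pair of offsets that
    differs by more than `threshold` seconds (threshold is an assumption)."""
    steps = []
    for i in range(1, len(offsets)):
        jump = offsets[i] - offsets[i - 1]
        if abs(jump) > threshold:
            steps.append((i, jump))
    return steps


def watch(poll_seconds=5.0, threshold=1.0):
    """Poll until interrupted and print a line whenever the clock is stepped."""
    previous = clock_offset()
    while True:
        time.sleep(poll_seconds)
        current = clock_offset()
        if abs(current - previous) > threshold:
            stamp = time.strftime("%H:%M:%S")
            print(stamp, "wall clock stepped by %.1f s" % (current - previous))
        previous = current
```

    Calling watch() and leaving it running would give timestamps that can be lined up against the "exited with zero status" entries in the BOINC messages tab. If no steps are logged around the restarts, time sync is probably not the trigger.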

    More info:
    Each Task-exited message is followed by the helpful suggestion:
    If this happens repeatedly you may need to reset the project.
    This message is mentioned in Fallwind's linked boincfaq.
    Most recent occurrences (local time), from BOINC messages tab:
    Day 1 - 4:37am, 5:20am, 6:20am, 7:34am, 8:59am, 11:57am, 1:39pm, 3:15pm, 5:03pm, 5:55pm, 6:42pm, 7:56pm, 8:59pm, 9:47pm, 10:33pm,
    Day 2 - 12:03am, 12:49, 12:50, 1:45am, 2:35, 3:17, 5:15, 9:27, 10:20, 11:53am, 12:47pm, 1:44, 2:43, 3:43, 4:37, 5:31, 6:47, 7:18, ...
    At Day 1 - 9:47pm & 10:33pm, we were running 3 x DDDT2 + 1 x HCMD WUs.
    Day 2 - 12:49 and 12:50am were after re-starting BOINC @ 12:39am.
    There is no apparent association with BOINC file transfer activity in the problem machine or in the machine that piggybacks the connection via the bridged 2nd Ethernet port.
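    The Day 1 times listed above can be turned into gaps between restarts to see how regular they are. A minimal sketch in Python; the timestamps are copied from the list, while the helper names are hypothetical. Only Day 1 is used because several Day 2 entries lack an am/pm suffix:

```python
from statistics import mean, median

# Day 1 restart times, copied from the BOINC messages list above.
DAY1 = ["4:37am", "5:20am", "6:20am", "7:34am", "8:59am", "11:57am",
        "1:39pm", "3:15pm", "5:03pm", "5:55pm", "6:42pm", "7:56pm",
        "8:59pm", "9:47pm", "10:33pm"]


def to_minutes(stamp):
    """Convert a string such as '4:37am' or '1:39pm' to minutes after midnight."""
    clock, suffix = stamp[:-2], stamp[-2:].lower()
    hours, minutes = (int(part) for part in clock.split(":"))
    hours = hours % 12 + (12 if suffix == "pm" else 0)
    return hours * 60 + minutes


def gaps_in_minutes(stamps):
    """Minutes between consecutive timestamps on the same day."""
    minutes = [to_minutes(s) for s in stamps]
    return [later - earlier for earlier, later in zip(minutes, minutes[1:])]


GAPS = gaps_in_minutes(DAY1)
print("gaps (min):", GAPS)
print("shortest:", min(GAPS), "longest:", max(GAPS))
print("mean: %.1f  median: %.1f" % (mean(GAPS), median(GAPS)))
```

    For these 15 entries the gaps run from 43 to 178 minutes, with a mean of about 77 minutes. That spread looks irregular rather than like a fixed-period scheduled job, though one day of data is not much to go on.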

    [Edit]: The boincfaq Item 4 suggests disabling Drive Indexing. Done. Another suggestion there is to upgrade chipset drivers, especially IDE & SATA controllers. I may try this, but they were OK before.
    I'd try Linux, but don't know about networking with it. I have the 2nd ethernet port on the m/b bridged to the 1st port, so that another machine piggybacks onto the network connection and I can run 5 machines on a 4-port router.
    [Edit 2]: Just found an entry in Sekerob's Fabulous FAQ on the subject: "Zero Status" & "If this happens repeatedly...." Messages. The main suggestion is to disable system time updates. (Done, but not helping).
    Failing that, shut down BOINC & restart it. Will try. Done.
    Failing that, Reset the project using the button in the BOINC Projects tab. Won't do that, as all 3 days' of DDDT2 tasks in the cache would be lost (!) I may disallow new tasks and reset when the work is done.
    Glad I don't have any DDDT2 Type A's as they checkpoint less often than the restarts are happening.

    [Edit 3]: At 12:45am, BOINC Manager GUI would not scroll the messages, so I switched to Windows Task Manager. The WCG science programs were still running, but the CPU usage in all cores was flickering down & up. It recovered to 100% for about 1 sec, then the WCG programs crashed & restarted. WTF??!!

    [Edit 4]: I found a thread dealing with this error in the WCG forum and posted a link to this thread there. CA Sekerob has responded already & states "anything that virtually monopolizes the CPU time for > 30 seconds can cause this". Something other than BOINC is trying to hang the system, showing up like the delayed scrolling mentioned in my Edit 3, and I don't know what it is. A driver, maybe?

    [Edit 5]: Sorted Task Mgr Processes tab on CPU usage (reverse order) so I can see if a strange process starts hogging CPU time. WCG tasks (2 x DDDT2, 2 x HFCC) are sitting happily @ 25% each. A strange one, aacenter.exe had used 38 sec CPU time, but on 0% then. It seems that it can give CPU-hogging problems, so I killed it for now using Task Manager.

    [Edit 6]: The problem is not aacenter.exe and the WU crashes are still happening. One crash happened during the 2 x DDDT2 + 2 x HFCC run. What is new is that I have watched several crashes in Windows Task Manager and have captured screen shots leading into a crash here (crash 1) and pagefile usage thru a crash (crash 2) here. The Processes tab during crashes shows that the System Idle Process is assigned the difference between 100% and the total CPU usage. The CPUs seem to be waiting for something, but what?
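    Rather than watching Task Manager and hoping to be there when it happens, the stalls could be logged unattended. A minimal sketch in Python, assuming a stall also delays ordinary user processes (it would miss one that does not); the 1-second interval, the 2-second slack and the function names are illustrative choices:

```python
import time


def find_stalls(ticks, interval=1.0, slack=2.0):
    """Return (start, gap) for each pair of consecutive ticks that are
    further apart than interval + slack seconds."""
    stalls = []
    for earlier, later in zip(ticks, ticks[1:]):
        gap = later - earlier
        if gap > interval + slack:
            stalls.append((earlier, gap))
    return stalls


def log_stalls(interval=1.0, slack=2.0):
    """Sleep in a loop until interrupted; print when a wake-up is late."""
    last = time.monotonic()
    while True:
        time.sleep(interval)
        now = time.monotonic()
        late_by = now - last - interval
        if late_by > slack:
            print(time.strftime("%H:%M:%S"), "woke %.1f s late" % late_by)
        last = now
```

    Running log_stalls() with its output redirected to a file would give a list of times to compare with the BOINC messages tab and with the Windows event log.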

    Will keep you posted.

    Keep those cards & letters coming in, folks ...
    Last edited by BlindFreddie; 08-18-2010 at 02:46 AM.

  6. #6
    Xtreme Cruncher
    Join Date
    Oct 2007
    Posts
    1,638
    I would try uninstalling BOINC and deleting the BOINC data directory and then reinstalling; start from scratch and see if that helps. You will lose your WUs in progress and your computer will show up as a new host.
    XTREMESupercomputer: Phase 2
    Live up to your name - November 1 - 8
    Crunch with us, the XS WCG team

  7. #7
    Xtreme Member
    Join Date
    May 2008
    Location
    Sydney, Australia
    Posts
    242
    Thanks, trn, but since it's taken about 5 months to download 3 days of DDDT2, I think I'll run it off before resetting the cache.

    I may swap to another version of BOINC, eg the 64-bit one, but keep the same data by copying it out before the install & copying it back in after, but I don't think that BOINC is the problem - more likely hardware or OS. After all, the people in Redmond WA still consider interrupts and multitasking sinful, I'm sure. You should have 8+3 uppercase filenames, and poll the keyboard, otherwise the nearby Mt St Whatsit might explode. Look what interrupts did to it a few years back.

    BTW, I think that if you reinstall BOINC from scratch, WCG recognises the computer by its MAC address and/or OS installation (or something like that) and will reassign the same device name to it. WCG also recognises changes in a device's name.

    I have shut down BOINC (in admin Tools >> Services) & re-started it.
    I also thought I had killed off the Windows Time service earlier, but discovered in Services tool that it was still bright-eyed, bushy-tailed & awake. So I nailed it & have disabled its startup. I'll try nistime-32 instead.

    [Will edit & update my earlier post, for new people coming to this thread].

  8. #8
    Xtreme Cruncher
    Join Date
    Jun 2007
    Location
    SK, Canada
    Posts
    836
    If you re-install BOINC, WCG will assign your machine a new device ID unless you copy over your old BOINC data folder from the previous installation. That's where all the information identifying your machine is kept. MAC address or OS installation have no bearing on this matter. With all the issues you have been having, I would hit the "no new tasks" button now, let your current WU cache crunch through, hit update to make sure they all get reported, uninstall BOINC through add/remove programs, go in and delete the 2 BOINC folders then re-install a new version.

  9. #9
    V3 Xeons coming soon!
    Join Date
    Nov 2005
    Location
    New Hampshire
    Posts
    36,363
    > fallwind: If you re-install BOINC, WCG will assign your machine a new device ID unless you copy over your old BOINC data folder from the previous installation. That's where all the information identifying your machine is kept. MAC address or OS installation have no bearing on this matter. With all the issues you have been having, I would hit the "no new tasks" button now, let your current WU cache crunch through, hit update to make sure they all get reported, uninstall BOINC through add/remove programs, go in and delete the 2 BOINC folders then re-install a new version.
    I got a few of these myself this past weekend..
    Got credit for all..
    Crunch with us, the XS WCG team
    The XS WCG team needs your support.
    A good project with good goals.
    Come join us,get that warm fuzzy feeling that you've done something good for mankind.

    > Frisch: If you have lost faith in humanity, then hold a newborn in your hands.

  10. #10
    Xtreme Cruncher
    Join Date
    Jun 2007
    Location
    SK, Canada
    Posts
    836
    It's not an issue if it only occurs rarely. BlindFreddie's machine is doing it many times per day and it wastes a lot of crunching time.

  11. #11
    Xtreme Member
    Join Date
    May 2008
    Location
    Sydney, Australia
    Posts
    242
    See updates edited into my post #5 above, including screenies of Task Manager Performance tab during the events in question.
    I think that Sekerob has described the mechanism of the WU restarts, ie that BOINC is killing them after they do not communicate after about 30sec, but the reason for the drops in CPU activity remains unexplained.
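    The mechanism as described comes down to a heartbeat timeout. A minimal sketch in Python of that general idea, not BOINC's actual code: the class name is hypothetical, and the 30-second figure is taken from Sekerob's remark.

```python
class HeartbeatWatchdog:
    """Tracks the last time a peer was heard from.

    Illustrative only: a generic timeout check, with the 30 s default
    taken from the figure quoted in the thread.
    """

    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.last_beat = None

    def beat(self, now):
        """Record that a heartbeat arrived at time `now` (seconds)."""
        self.last_beat = now

    def expired(self, now):
        """True if a heartbeat has been seen and the last one is older
        than the timeout."""
        if self.last_beat is None:
            return False
        return now - self.last_beat > self.timeout


# A peer that normally reports every second, then goes quiet.
dog = HeartbeatWatchdog()
for second in range(10):
    dog.beat(float(second))
print(dog.expired(20.0))   # within 30 s of the last beat at t=9
print(dog.expired(45.0))   # more than 30 s after the last beat
```

    The point of the sketch is that neither side has to misbehave: if the whole system stalls for longer than the timeout, heartbeats stop arriving and every task trips the check at about the same moment, which would match all four WUs restarting together.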

    I have disallowed new tasks in BOINC manager.
    I ran a virus scan of HDDs with Avast! 4.8. Forgot to suspend BOINC, but it was unaffected.
    I may try:
    - checking the BIOS IDE/SATA recognition parameters, which may have been altered after overclocking failures. Done. It was the IDE/AHCI setting for SATA HDDs that I had in mind. Best to be IDE, and was.
    - replacing BOINC 6.2.19 32-bit with the 64-bit one, while keeping the data. I would install as an ordinary user process.
    - rebooting Windows into safe mode, with networking
    - in normal windows mode, disabling drivers that are not needed
    - if I knew I could easily set up networking, including bridging the ethernet ports and getting remote access, I'd try Linux.
    Last edited by BlindFreddie; 08-18-2010 at 05:33 PM.

  12. #12
    Xtreme Member
    Join Date
    May 2008
    Location
    Sydney, Australia
    Posts
    242
    Progress report. See edits to list in my post #3 above, last 3 entries.

    The startup of the new BOINC installation was a little dramatic. I copied the snapshot of the BOINC data directory that I made after shutting down the 32-bit protected-mode BOINC, into the data directory for the user-mode 64-bit BOINC, and fired it up. Big mistake. Every job in the work cache tried to start up but crashed immediately, showing Computation Error in the tasks tab. They went down like a row of dominoes.
    To easily find what was in these tasks' error logs, I uploaded & reported them to WCG. All said "no child process" (not the exact words).
    New tasks fetched then also failed (!)
    I Detached and re-Attached to WCG, and the new tasks are running happily. I think that the WCG science programs were downloaded again.

    I assume that the problem was that as an ordinary user's task, the new BOINC did not have execute permission to run the WCG science program that I'd copied from the previous protected-mode setup. Comments, anyone?

    I have also removed all files from the 2nd partition (D) of the HDD.

    Next, I think a bit of fdisk-ing, mkswap-ping and mkfs -t ext3 -ing, but your suggestions for the XP-64 setup are still most welcome ...
    Oh, and b-a-r-m-p ! ...
    Last edited by BlindFreddie; 08-20-2010 at 11:01 AM.

  13. #13
    Xtreme Cruncher
    Join Date
    Jun 2007
    Location
    SK, Canada
    Posts
    836
    IIRC you can't copy and re-use your BOINC data folder when switching between 32 and 64 bit. I know this is true when switching Windows versions, but apparently it's true when changing BOINC versions as well.
