
Thread: WCG running on Ubuntu. All tasks stop at 5 seconds of run time

  1. #1
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53

    WCG running on Ubuntu. All tasks stop at 5 seconds of run time

    The title pretty much sums it up. I installed Ubuntu Server (12.04.3) onto a few VMs today and followed this guide to install boinc and such.
    After editing the remote_hosts.cfg I configured it through the boinc manager on my windows machine.

    The boxes downloaded a few WUs and began working, or so it seemed. After a bit I noticed all the tasks were still at the bottom of the list in BoincTasks. That is when I noticed that the tasks that did finish downloading are stuck at 5 seconds of run time.

    BOINC Manager says they are 'Running', but the elapsed time isn't going up and there don't seem to be any errors in the log. My googlefu didn't find anything related to this and I am stumped.


    Screenshots coming soon
    CRUNCHERS
    Main 920@3.99GHz 12GB DDR3 NVIDIA 480GTX Back-UPS NS 1250
    Dell C6100 XS23-TY3 4 x NODES 2 x 2.4GHz E5530 24GBper node,Total 8 x 2.4GHz E5530 96GB
    Dell C6100 XS23-TY3 4 x NODES 2 x 2.13GHz L5630 16GBper node,Total 8 x 2.13GHz L5630 64GB
    DellServer 2x L5420 (4core) 8GB ECC DDR2
    Server 2x 2GHz Sossaman's 4GB ECC DDR2 (RAID 6 4x2TB Samsung F4 HD204UI 2x2TB WD Red) - under loonym
    SkillsServer 2x AMD 6-core 6GB ECC DDR2 XenServer || Zipples AMD 940 4core
    Sammy 3 2x 2GHz Sossaman's/4GB || Sammy 4 2x 2GHz Sossaman's 1GB ECC DDR2
    PART TIME CRUNCHERS
    Cruncher 4 950@3.06GHz 3GB DDR3 EVGA 275GTX ASUS 260GTX EVGA260GTX || Sammy 5 Supermicro 2x Dual-Core Xeon 2GHz/4GB/80 Server 6014L-M
    Athlon II x2 3 GHz 4GB DDR3 320GB RAID1 + 320GB Backup
    CSHS_CISCO CRUNCHERS http://stats.free-dc.org/stats.php?p...cg&name=828782
    5x (AMD FX 8120 (8core) 8GB DDR3 Running through XenServer ||AMD FX 8120 (8core) 4GB DDR3 2xRadion Cards Intel i7 950 4GB DDR3
    http://folding.extremeoverclocking.c...e.php?u=458094 http://allprojectstats.com/su1429018x5--1-2.png

  2. #2
    Xtreme Addict Evantaur's Avatar
    Join Date
    Jul 2011
    Location
    Finland
    Posts
    1,043
    Sounds like boinc doesn't have write permission to the "slots" folder
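
    A minimal sketch of one way to check this. It assumes the Ubuntu package keeps its data directory at /var/lib/boinc-client and runs the client as the user "boinc"; neither is confirmed in this thread, so set BOINC_DIR if your install differs. The helper name dir_writable is just illustrative.

```shell
#!/bin/sh
# Assumed default data directory for Ubuntu's boinc-client package; override if needed.
BOINC_DIR="${BOINC_DIR:-/var/lib/boinc-client}"

# dir_writable DIR: succeeds only if DIR exists and the current user may write to it.
dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

if dir_writable "$BOINC_DIR/slots"; then
    echo "slots is writable by $(id -un)"
else
    echo "slots is missing or not writable by $(id -un)"
fi
```

    The check only means something when it runs as the account the client uses, for example with sudo -u boinc sh check_slots.sh (the script name is hypothetical).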

    I like large posteriors and I cannot prevaricate

  3. #3
    Xtreme crazy bastid
    Join Date
    Apr 2007
    Location
    On mah murder-sickle!
    Posts
    5,878
    Make sure the machine isn't suspending or something silly like that. Also be sure you have all the required libraries installed. You might be able to find more information in the BOINC event logs accessible through the BOINC manager Advanced tab.


  4. #4
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53
    Not suspended or anything like that and from what I can tell the boinc user does have r/w rights to the slots folder. And the message log doesn't show any errors.

    Another odd thing I noticed in testing.
    I took a new Ubuntu VM and installed BOINC on it, changed the remote_hosts file, and restarted the client via sudo /etc/init.d/boinc-client restart. After it restarted I connected via BOINC Manager and attached the WCG project to it. It downloaded a Clean Energy WU and ran properly for a good 5 minutes. Then I restarted the VM, and once I reconnected BOINC Manager, that WU had reset to 0% and ran for 6 seconds until it hung.
    Last edited by Nirvash; 09-19-2013 at 09:39 PM.

  5. #5
    Xtreme crazy bastid
    Join Date
    Apr 2007
    Location
    On mah murder-sickle!
    Posts
    5,878
    Clean Energy work units are funny like that. They REALLY don't like being suspended and cleared from memory.

    It might help if you could open a terminal on the BOINC machine and run top. It will show if anything is hung and how much memory is being used.

    Ubuntu are also getting more and more self-important and fiddling with things that don't need to be changed. In an open terminal enter the commands "ldd boinc" and "ldd boinccmd" and make sure all the dependencies are met properly.


  6. #6
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53
    The ldd commands just return not founds.
    Code:
     ldd boinc
    ldd: ./boinc: No such file or directory
     ldd boinccmd
    ldd: ./boinccmd: No such file or directory
    top reads out as follows, so it does look like the task is 'running'
    2436 boinc 39 19 331m 95m 8784 R 99.3 9.7 9:16.49 wcgrid_cep2_qch

    But it obviously isn't working properly, as I'm pretty sure this should show a checkpoint or fraction done.

    Code:
    boinccmd --get_tasks
    
    ======== Tasks ========
    1) -----------
       name: E215643_751_C.40.C34H18N2OS2Si.02098189.0.set1d06_2
       WU name: E215643_751_C.40.C34H18N2OS2Si.02098189.0.set1d06
       project URL: http://www.worldcommunitygrid.org/
       report deadline: Mon Sep 30 00:20:03 2013
       ready to report: no
       got server ack: no
       final CPU time: 0.000000
       state: 2
       scheduler state: 2
       exit_status: 0
       signal: 0
       suspended via GUI: no
       active_task_state: 1
       app version num: 640
       checkpoint CPU time: 0.000000
       current CPU time: 0.000000
       fraction done: 0.000000
       swap size: 324718592.000000
    Should I try an older version of Ubuntu? Or a different flavor of Linux?
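
    The mismatch above (top shows minutes of CPU time, boinccmd reports zero) can be spotted with a small filter over the boinccmd --get_tasks output. This is only a sketch: it relies on the field names shown in the listing above, it assumes active_task_state 1 means the client considers the task to be executing, and stalled_tasks is a made-up helper name. A task that has only just started would also match, so a single hit is a hint rather than proof.

```shell
#!/bin/sh
# stalled_tasks: read `boinccmd --get_tasks` output on stdin and print the name of
# every task whose active_task_state is 1 but whose current CPU time is still zero.
stalled_tasks() {
    awk '
        /^ *name:/              { name = $2; state = "" }
        /^ *active_task_state:/ { state = $2 }
        /^ *current CPU time:/  { if (state == 1 && $4 + 0 == 0) print name }
    '
}

# Only query the client when boinccmd is actually installed.
if command -v boinccmd >/dev/null 2>&1; then
    boinccmd --get_tasks | stalled_tasks
fi
```

    Running it a few minutes apart and seeing the same task name each time would match the symptom described in this thread.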
    Last edited by Nirvash; 09-19-2013 at 10:12 PM.

  7. #7
    Xtreme crazy bastid
    Join Date
    Apr 2007
    Location
    On mah murder-sickle!
    Posts
    5,878
    I didn't say to enter ldd ./boinc, I said to use ldd boinc
    Code:
    ldd boinc
    with no dot slash

    The dot slash prefix is used to execute a script. You're checking the binary dependencies of files that are in your path. Specifically, they are in your /usr/bin directory.

    Ok, I just re-read your post. Try using the full path:
    Code:
    ldd /usr/bin/boinc
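
    For reference, ldd takes a file path and does not search PATH, which is why the bare name was looked up as ./boinc in the earlier output. A small sketch that resolves the name first and then prints only unresolved libraries; check_deps and missing_libs are illustrative helper names, not part of BOINC or ldd.

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print only the unresolved libraries.
missing_libs() {
    grep 'not found' || true
}

# check_deps NAME: resolve NAME through PATH, then run ldd on the full path.
check_deps() {
    path=$(command -v "$1") || { echo "$1: not in PATH"; return 1; }
    echo "checking $path"
    ldd "$path" | missing_libs
}

for prog in boinc boinccmd; do
    check_deps "$prog" || true
done
```

    If nothing is printed after a "checking" line, every dependency of that binary resolved.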
    Last edited by D_A; 09-19-2013 at 10:57 PM.


  8. #8
    Xtreme crazy bastid
    Join Date
    Apr 2007
    Location
    On mah murder-sickle!
    Posts
    5,878
    The output to that command will look something like this:
    michael@Slackboxen:~/BOINC$ ldd boinc
    ./boinc: /lib64/libssl.so.1.0.0: no version information available (required by ./boinc)
    ./boinc: /usr/lib64/libcurl.so.4: no version information available (required by ./boinc)
    ./boinc: /lib64/libcrypto.so.1.0.0: no version information available (required by ./boinc)
    linux-vdso.so.1 (0x00007fff13bff000)
    libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x00007f18d5376000)
    libssl.so.1.0.0 => /lib64/libssl.so.1.0.0 (0x00007f18d510c000)
    libcrypto.so.1.0.0 => /lib64/libcrypto.so.1.0.0 (0x00007f18d4d2f000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f18d4b2b000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f18d4916000)
    libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007f18d45dd000)
    libXss.so.1 => /usr/lib64/libXss.so.1 (0x00007f18d43da000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f18d41be000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f18d3ebc000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f18d3bc1000)
    libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007f18d39ac000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f18d35eb000)
    libidn.so.11 => /usr/lib64/libidn.so.11 (0x00007f18d33b9000)
    liblber-2.4.so.2 => /usr/lib64/liblber-2.4.so.2 (0x00007f18d31ab000)
    libldap-2.4.so.2 => /usr/lib64/libldap-2.4.so.2 (0x00007f18d2f62000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f18d2d5a000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f18d55fe000)
    libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00007f18d2b3c000)
    libXau.so.6 => /usr/lib64/libXau.so.6 (0x00007f18d2938000)
    libXdmcp.so.6 => /usr/lib64/libXdmcp.so.6 (0x00007f18d2733000)
    libXext.so.6 => /usr/lib64/libXext.so.6 (0x00007f18d2522000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f18d2307000)
    libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007f18d20ed000)
    Ignore the first three lines, they're from me doing something creative to get everything running happily.


  9. #9
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53
    Code:
    vmcruncher@VMCruncherC6100N2:~$ ldd /usr/bin/boinc
    linux-vdso.so.1 =>  (0x00007fffbd5fe000)
    libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007ff3d4074000)
    libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007ff3d3e16000)
    libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007ff3d3a3a000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff3d3836000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007ff3d361f000)
    libXss.so.1 => /usr/lib/x86_64-linux-gnu/libXss.so.1 (0x00007ff3d341a000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff3d31fd000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ff3d2efd000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff3d2c00000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff3d29ea000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff3d262b000)
    libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007ff3d22f5000)
    libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007ff3d20c2000)
    liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007ff3d1eb4000)
    libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007ff3d1c64000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff3d1a5c000)
    libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007ff3d181e000)
    librtmp.so.0 => /usr/lib/x86_64-linux-gnu/librtmp.so.0 (0x00007ff3d1603000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff3d42e0000)
    libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007ff3d13f2000)
    libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007ff3d11d3000)
    libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007ff3d0fb7000)
    libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007ff3d0d9c000)
    libgssapi.so.3 => /usr/lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007ff3d0b5d000)
    libgnutls.so.26 => /usr/lib/x86_64-linux-gnu/libgnutls.so.26 (0x00007ff3d08a1000)
    libgcrypt.so.11 => /lib/x86_64-linux-gnu/libgcrypt.so.11 (0x00007ff3d0623000)
    libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007ff3d0354000)
    libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007ff3d012c000)
    libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007ff3cff28000)
    libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007ff3cfd1f000)
    libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007ff3cfb1c000)
    libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007ff3cf916000)
    libheimntlm.so.0 => /usr/lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007ff3cf70e000)
    libkrb5.so.26 => /usr/lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007ff3cf488000)
    libasn1.so.8 => /usr/lib/x86_64-linux-gnu/libasn1.so.8 (0x00007ff3cf1e8000)
    libhcrypto.so.4 => /usr/lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007ff3cefb3000)
    libroken.so.18 => /usr/lib/x86_64-linux-gnu/libroken.so.18 (0x00007ff3ced9e000)
    libtasn1.so.3 => /usr/lib/x86_64-linux-gnu/libtasn1.so.3 (0x00007ff3ceb8d000)
    libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007ff3ce97a000)
    libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007ff3ce776000)
    libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007ff3ce572000)
    libwind.so.0 => /usr/lib/x86_64-linux-gnu/libwind.so.0 (0x00007ff3ce348000)
    libheimbase.so.1 => /usr/lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007ff3ce139000)
    libhx509.so.5 => /usr/lib/x86_64-linux-gnu/libhx509.so.5 (0x00007ff3cdeee000)
    libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007ff3cdc4b000)
    libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007ff3cda12000)
    Just a tad longer. And what's infuriating is that I managed to get one VM working, but I can't reproduce it. The only thing I can think of that I did differently is that, in the working one, I left the hostname as ubuntu because I was being lazy.
    Last edited by Nirvash; 09-20-2013 at 12:50 AM.

  10. #10
    Xtreme crazy bastid
    Join Date
    Apr 2007
    Location
    On mah murder-sickle!
    Posts
    5,878
    I don't see how changing the host name should make any difference, especially if you reboot it afterwards. That would normally just change the name of the device in your device list. What output did you get from top?


  11. #11
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53
    [Insert dancing bananas here] I figured it out. It was apparently the setting 'assume that the BIOS clock is set to UTC time' being set to off. With that set to on, stuff magically works. (Or at least it does right now.)

  12. #12
    Xtreme crazy bastid
    Join Date
    Apr 2007
    Location
    On mah murder-sickle!
    Posts
    5,878
    Yes, I tend to forget about the time thing. BOINC is VERY time sensitive. If your clock is too far out it will refuse to work, abort units, and all sorts of other odd misbehaviour.
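
    A rough sketch for checking the clock from inside the guest. The thread does not say whether the UTC setting lives in the hypervisor or in Ubuntu itself; on the Ubuntu side, 12.04 is assumed here to record it as a UTC=yes or UTC=no line in /etc/default/rcS. The helper name rtc_utc_setting is illustrative.

```shell
#!/bin/sh
# Assumed location of the hardware-clock setting on Ubuntu 12.04; override if needed.
RCS_FILE="${RCS_FILE:-/etc/default/rcS}"

# rtc_utc_setting FILE: print the value of the last UTC= line in FILE (yes or no).
rtc_utc_setting() {
    sed -n 's/^UTC=//p' "$1" | tail -n 1
}

if [ -r "$RCS_FILE" ]; then
    echo "UTC setting in $RCS_FILE: $(rtc_utc_setting "$RCS_FILE")"
fi

# System time in UTC and in local time; compare these against a known-good clock.
date -u
date
# The hardware clock itself can be read with: sudo hwclock --show
```

    If the two date lines are off from real time by exactly your timezone offset, the guest and the host probably disagree about whether the hardware clock is in UTC.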


  13. #13
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53
    The odd thing is, the BIOS clock is not set to UTC, it's local time. But thank you for your help. Now to see how the new node fares over the next few days.

  14. #14
    Xtreme Legend
    Join Date
    Mar 2008
    Location
    Plymouth (UK)
    Posts
    5,279
    That's cheating.... not showing total of threads


    My Biggest Fear Is When I die, My Wife Sells All My Stuff For What I Told Her I Paid For It.
    79 SB threads and 32 IB Threads across 4 rigs 111 threads Crunching!!

  15. #15
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53
    Originally posted by OldChap:
    That's cheating.... not showing total of threads
    But it doesn't even all fit on one screen! Currently running 100ish threads

  16. #16
    Xtreme Addict Evantaur's Avatar
    Join Date
    Jul 2011
    Location
    Finland
    Posts
    1,043
    Originally posted by Nirvash:
    But it doesn't even all fit on one screen! Currently running 100ish threads
    what the are you running

    I like large posteriors and I cannot prevaricate

  17. #17
    Xtreme Member
    Join Date
    Oct 2012
    Posts
    448
    Originally posted by Evantaur:
    what the are you running
    A buttload of virtual machines?
    Desktop rigs:
    Oysterhead- Intel i5-2320 CPU@3.0Ghz, Zalman 9500AT2, 8Gb Patriot 1333Mhz DDR3 RAM, 120Gb Kingston V200+ SSD, 1Tb Seagate HD, Linux Mint 17 Cinnamon 64 bit, LG 330W PSU

    Flying Frog Brigade-Intel Xeon W3520@2.66Ghz, 6Gb Hynix 1066Mhz DDR3 RAM, 640Gb Hitachi HD, 512Mb GDDR5 AMD HD4870, Mac OSX 10.6.8/Linux Mint 14 Cinnamon dual boot

    Laptop:
    Colonel Claypool-Intel T6600 Core 2 Duo, 4Gb 1066Mhz DDR3 RAM, 1Gb GDDR3 Nvidia 230M,240Gb Edge SATA6 SSD, Windows 7 Home 64 bit




  18. #18
    Xtreme Cruncher
    Join Date
    Oct 2008
    Location
    Chicago-ish
    Posts
    53
    Originally posted by Evantaur:
    what the are you running
    Originally posted by yojimbo197:
    A buttload of virtual machines?
    You are right, I do run a lot of VMs, but the threads the VMs run do not exceed the host machine's thread count.
    In total I have,
    3x Sammys (Two soon to be retired) 12 threads
    An AMD server board with 2x 6 cores 12 threads
    2 Dell servers, one with 6 threads, other with 64. 70 threads

    So that's 94, I'm sure I'm missing something. And come winter I probably will turn back on an i7 950 machine too.
