Linux not using 100% CPU usage



trn
09-04-2010, 06:24 AM
Well, last night I installed Debian 5 on a rig; I plan to set up many Linux machines as WCG crunchers. I ran into my first odd issue: CPU usage is bouncing around a lot and not reaching 100%. I remembered Jcool had these issues with Ubuntu, so I went and checked his thread (http://www.xtremesystems.org/forums/showthread.php?t=252766, see posts #9 and #11). In Jcool's case the problem just magically worked itself out :shrug: And yeah, BOINC is set up to use all cores at 100%. I suspect other tasks are taking priority over BOINC and causing this. It's not just a faulty system monitor, either: my Kill-A-Watt shows the wattage bouncing around in step with the CPU usage, which confirms the problem.

Here's my screenshot of the issue: :help:
http://i399.photobucket.com/albums/pp80/trn_photobucket/deb1.png

Also, I installed Debian because it's supposed to be very stable, BOINC is supposed to work on it, and I think I can do a basic net install (to keep it unbloated). But the latest version of BOINC available through apt-get was 6.2.xx!!! Quite old; I'd be happy with anything in the 6.10.xx range :(

Also, this system was running stable with a Windows HD; now Linux crashes in 20 or 30 min (I'm guessing Linux needs a bump in Vcore vs Win7 :eek:). Off to a rocky Linux crunching start so far :D

Anyways; anyone got any clue why I can't get a solid 100% CPU usage with boinc?

Kurz
09-04-2010, 06:38 AM
Are there any log messages?

trn
09-04-2010, 06:46 AM
Are there any log messages?

Oh yeah... I got errors :rolleyes:
And this is a hard drive install; currently it's working on all CEP2 WUs with a few DDDT2 in the queue.

I think some of these eth3 errors might be due to my installing Debian on a Gigabyte X58A-UD3R rev 2.0 board and then running it on a rev 1.0 board; it seems they made some minor Ethernet changes between the two revisions. I had the same CPU usage problems on the rev 2.0 board too, though they looked a little less pronounced; the L5640 there was running at stock, while this one is at 3.7GHz.


Sep 4 06:29:57 deb1 dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth3 for sub-path eth3.dbus.get.reason
Sep 4 06:30:00 deb1 dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth3 for sub-path eth3.dbus.get.host_name
Sep 4 06:30:00 deb1 dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth3 for sub-path eth3.dbus.get.domain_search
Sep 4 06:30:00 deb1 dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth3 for sub-path eth3.dbus.get.nis_domain
Sep 4 06:30:00 deb1 dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth3 for sub-path eth3.dbus.get.nis_servers
Sep 4 06:30:00 deb1 dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth3 for sub-path eth3.dbus.get.interface_mtu
Sep 4 06:30:16 deb1 kernel: [ 72.138502] mtrr: type mismatch for c0000000,10000000 old: write-back new: write-combining
Sep 4 06:35:26 deb1 kernel: [ 622.489739] wcgrid_cep2_6.1[3880]: segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 622.564130] wcgrid_cep2_6.1[3881]: segfault at ff1fbff4 ip 80f1320 sp ff1fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 622.670974] wcgrid_cep2_6.1[3870]: segfault at ff5fbff4 ip 80f1320 sp ff5fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 622.692862] wcgrid_cep2_6.1[3866]: segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 622.708751] wcgrid_cep2_6.1[3873]: segfault at ff5fbff4 ip 80f1320 sp ff5fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 622.761767] wcgrid_cep2_6.1[3877]: segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 622.925665] wcgrid_cep2_6.1[3863]: segfault at ff1fbff4 ip 80f1320 sp ff1fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 622.960103] wcgrid_cep2_6.1[3883]: segfault at ff7fbff4 ip 80f1320 sp ff7fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 623.028101] wcgrid_cep2_6.1[3867]: segfault at ff5fbff4 ip 80f1320 sp ff5fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:35:27 deb1 kernel: [ 623.093664] wcgrid_cep2_6.1[3871]: segfault at ff7fbff4 ip 80f1320 sp ff7fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1248.398667] __ratelimit: 2 messages suppressed
Sep 4 06:40:27 deb1 kernel: [ 1248.398667] wcgrid_cep2_6.1[4365]: segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1249.757705] wcgrid_cep2_6.1[4334]: segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1250.028788] wcgrid_cep2_6.1[4397]: segfault at ff5fbff4 ip 80f1320 sp ff5fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1250.418728] wcgrid_cep2_6.1[4391]: segfault at ff1fbff4 ip 80f1320 sp ff1fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1250.511084] wcgrid_cep2_6.1[4356]: segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1250.608256] wcgrid_cep2_6.1[4402]: segfault at ff1fbff4 ip 80f1320 sp ff1fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1250.658423] wcgrid_cep2_6.1[4394]: segfault at ff5fbff4 ip 80f1320 sp ff5fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:27 deb1 kernel: [ 1250.860330] wcgrid_cep2_6.1[4379]: segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:28 deb1 kernel: [ 1251.163360] wcgrid_cep2_6.1[4349]: segfault at ff1fbff4 ip 80f1320 sp ff1fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
Sep 4 06:40:28 deb1 kernel: [ 1251.352750] wcgrid_cep2_6.1[4373]: segfault at ff7fbff4 ip 80f1320 sp ff7fbed0 error 6 in wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]
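
With this many near-identical lines, a quick tally can be easier to read than the raw log. Below is a minimal Python sketch that groups kernel segfault lines by binary and instruction pointer (`ip`). The regular expression is written only against the line format pasted above, and `summarize_segfaults` is a made-up helper name, not part of any tool. If every crash lands on the same `ip`, as seems to be the case in this excerpt, that is at least worth keeping in mind when weighing a software cause against flaky memory, though it does not prove either.

```python
import re
from collections import Counter

# Matches kernel lines of the form seen in the excerpt above, e.g.
# "name[pid]: segfault at ADDR ip IP sp SP error N in BINARY[base+size]"
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<binary>[^\[\s]+)\[)?"
)


def summarize_segfaults(lines):
    """Count segfault lines per (binary, instruction pointer) pair."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            # Fall back to the short process name if no binary is reported.
            name = match.group("binary") or match.group("comm")
            counts[(name, match.group("ip"))] += 1
    return counts


if __name__ == "__main__":
    # One line copied from the log above, used as a built-in demo.
    demo = [
        "Sep 4 06:35:26 deb1 kernel: [ 622.489739] wcgrid_cep2_6.1[3880]: "
        "segfault at ff3fbff4 ip 80f1320 sp ff3fbed0 error 6 in "
        "wcgrid_cep2_6.19_i686-pc-linux-gnu[8048000+135000]",
    ]
    for (binary, ip), n in summarize_segfaults(demo).most_common():
        print(n, binary, "ip=" + ip)
```

To run it over a real log, pass the open file object to `summarize_segfaults` instead of the demo list; which file the kernel messages end up in depends on the syslog setup.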

Havis
09-04-2010, 07:05 AM
Hi,

Do you have schedtool installed?
I would say the graph you posted suggests that your BOINC is not set to use 100% CPU. (Maybe 80%? Is that why it's jumping up and down?)
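
One way to rule this out is to look at the CPU limits in the client's preference files rather than trusting the GUI. The sketch below pulls the two CPU-related settings out of a BOINC preferences document. It assumes the tags `max_ncpus_pct` (share of processors to use) and `cpu_usage_limit` (share of CPU time to use) are what the installed client writes, which may not hold for every 6.2.xx build, and the Debian data directory mentioned in the comment is also an assumption.

```python
import xml.etree.ElementTree as ET

# Tag names assumed from BOINC's global preferences format; check them
# against the files your own client actually writes.
CPU_TAGS = ("max_ncpus_pct", "cpu_usage_limit")


def read_cpu_prefs(xml_text):
    """Return the CPU-related preferences found in a BOINC prefs document."""
    root = ET.fromstring(xml_text)
    prefs = {}
    for tag in CPU_TAGS:
        node = root.find(".//" + tag)
        if node is not None and node.text and node.text.strip():
            prefs[tag] = float(node.text)
    return prefs


if __name__ == "__main__":
    # Illustrative content only. On Debian the real file is assumed to be
    # something like /var/lib/boinc-client/global_prefs_override.xml.
    sample = """
    <global_preferences>
        <max_ncpus_pct>100.000000</max_ncpus_pct>
        <cpu_usage_limit>80.000000</cpu_usage_limit>
    </global_preferences>
    """
    print(read_cpu_prefs(sample))
```

Anything below 100 for either value would be one plausible explanation for a sawtooth load graph, since a CPU time limit is typically enforced by repeatedly pausing and resuming the work.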

trn
09-04-2010, 07:17 AM
I don't know what schedtool is, but I know I don't have it installed. The first thing I did when I got Linux up and running was turn off that GUI login crap, so when I see the BOINC daemon start it says something about schedtool not being installed or set up. I hadn't mentioned that yet because I figured that making BOINC start up as a daemon, with no login and no X required, should be easy enough to work out after reading up on some documentation.

The usage seems to be normalizing without me doing anything; WCG is still throwing "segfault" errors though. I'm doing something else stupid also: I'm running Linux on a system with only 3GB of RAM. I had planned to run Linux on 4GB machines and leave 3GB for Windows machines (that just sounds backwards :ROTF:). The RAM issue and what Jcool said make me think that maybe it just takes WCG on Linux some time to load up the RAM to the amount needed for 100% CPU crunching without hitting the swap file all the time; that just sounds strange... but I dunno?
http://i399.photobucket.com/albums/pp80/trn_photobucket/deb1-2.png

shoota
09-04-2010, 07:27 AM
until DA gets on here to fix this i would just let it go and see if it evens out. whenever i start up my 16-core arima rig it does this same thing. i'd say it takes mine a few hours to max out all the cores

SiGfever
09-04-2010, 07:30 AM
until DA gets on here to fix this i would just let it go and see if it evens out. whenever i start up my 16-core arima rig it does this same thing. i'd say it takes mine a few hours to max out all the cores

He is "Da Man". :up:

trn
09-04-2010, 07:39 AM
Yeah... you guys are right I need DA to save the day here :D And Linux just crashed on me again!!! just hardlocked!! even after I gave it a vcore +2 bump over win7 stable :confused:

Until then I got other issues to work on (like why the #%$#^%$ can't I overclock anything on these new UD3R rev 2.0 boards!!!!)

shoota
09-04-2010, 08:06 AM
well i have no words of wisdom, but if everything worked right away then it'd be over so fast and what's the fun in that? lol

Havis
09-04-2010, 08:32 AM
If WCG is segfaulting then I think you have memory issues, maybe downclock it a bit and see if it still segfaults... (maybe try stock clocks for a while? :D )

bearcatrp
09-04-2010, 08:34 AM
Had the same problem with Ubuntu. Check your settings to make sure your CPUs are locked at the processors' top speed. I think most Linux distros scale the clocks down when things get hot.
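
For anyone wanting to check this, the kernel's cpufreq interface exposes the active frequency governor per core under sysfs. Here is a minimal Python sketch that lists them, assuming the standard `/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor` layout is present (it will not be if no cpufreq driver is loaded). `read_governors` is a made-up helper name.

```python
import glob
import os


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (cpu0, cpu1, ...) to its cpufreq governor."""
    governors = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for path in sorted(glob.glob(pattern)):
        # Path ends in .../cpuN/cpufreq/scaling_governor, so cpuN is third from the end.
        cpu = path.split(os.sep)[-3]
        with open(path) as handle:
            governors[cpu] = handle.read().strip()
    return governors


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("no cpufreq information found (driver not loaded?)")
    for cpu, governor in found.items():
        print(cpu, governor)
```

A governor such as `performance` keeps the clocks at the top speed, while load-based governors such as `ondemand` move them around, so this is a cheap first look before blaming temperatures.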

[XC] mysticmerlin
09-04-2010, 08:39 AM
(maybe try stock clocks for a while? :D )

:slap: :slapass:

But no, really, I would go to stock on everything. Get it worked out, then go back to the settings you are at now and see if it stays stable that way.

shoota
09-04-2010, 08:41 AM
i can assure you both of our rigs aren't running hot.

Serra
09-04-2010, 08:45 AM
I had been running Debian 5 hosts on ESXi for a long time without those issues you're reporting. One difference that might be the reason is that I don't think I used apt-get to install it, I think I did it manually and thus may have gotten a newer client. Debian is a very stable release, but to be so requires rigorous qualification of software and so it's often behind the times.

Perhaps try going through and doing a manual install of a newer client if available?

[XC] mysticmerlin
09-04-2010, 08:50 AM
i can assure you both of our rigs aren't running hot.

But that doesn't sound like a heat problem.

trn
09-04-2010, 09:05 AM
Not a heat issue :D Also, the host was doing this on a stock L5640, and I've been running this OC with Windows and WCG for 24 hours or so. My RAM is underclocked too; maybe I should try clocking it up. Even 4GB doesn't seem to be enough RAM for a hexacore running Linux and WCG!

After leaving the computer idle for an hour or so, I looked at the task manager and everything is at 100% with no drops, but RAM usage is still ~80-85%; I might have to drop a 6GB kit in here and see what it does.

Havis
09-04-2010, 09:26 AM
Do you have a swap partition? Is the box thrashing because of low RAM?
If not, then you have enough RAM :up:
On my system each WCG project takes around 120MB to 300MB of RAM.
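
A rough way to answer the swap question is to read `SwapTotal` and `SwapFree` from `/proc/meminfo`. The Python sketch below does that on text passed in, so it can be tried on a pasted sample; `swap_usage_kb` is a made-up helper name. Note that swap being in use is only a hint: thrashing means pages moving in and out continuously, which a single snapshot cannot show.

```python
import os


def swap_usage_kb(meminfo_text):
    """Return (swap_total_kb, swap_used_kb) parsed from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "SwapTotal:  2000000 kB"; keep the numeric part.
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            total_kb, used_kb = swap_usage_kb(handle.read())
        print("swap total:", total_kb, "kB, in use:", used_kb, "kB")
    else:
        print("/proc/meminfo not available on this system")
```

Running it a few times while BOINC ramps up and watching whether the in-use figure keeps changing gives a slightly better picture than one reading.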

shoota
09-04-2010, 09:59 AM
4gb is more than enough for my 16-cores and linux :shrug:

trn
09-04-2010, 11:41 AM
4gb is more than enough for my 16-cores and linux :shrug:

I was running 12 CEP2 WUs, which might be memory hogs. Before, the system was using 85% of the available 4GB. I swapped the Debian hard drive back from the rev 1.0 board to a Gigabyte X58A-UD3R rev 2.0 and installed 6GB of RAM; the L5640 is running at stock clocks and the system is now only using 2.3GB of RAM. It looks like the overclocked CPU needed more RAM per WU or something odd. Also, when BOINC first starts, the CPU usage still bounces all over the place while the RAM usage slowly climbs to its max; then CPU usage is pinned at 100% :shrug:

Good enough for now; I'm just going to leave this machine as it is and let it go for a while. Linux was crashing on me with my overclocked machine that was Windows stable :shrug:

D_A
09-04-2010, 02:56 PM
I don't know about "fix it" but I can give a little insight on what's happening.
When Linux initially loads, it will only load stuff into RAM as it's required. As the code progresses, the app accesses more and more code and data from the HDD, which is of course paged into RAM. The difference here is that Windows likes to load everything and its dog in on startup rather than wait until it's called for. For the Windows machine it's a nice way of making the system SEEM faster, since when stuff is already loaded it opens REAL fast compared to when you have to load it in. It's all a trade-off, with Linux going one way and Windows the other.
Another possibility is that of thermal throttling. I'm not overly familiar with Debian, so I don't know how aggressive it is in that distro, but it could be that it's messing with your CPU clocks trying to slow the CPU temperature rise. Possible, but I don't know.
Next item to check is the "stop work if CPU usage is over ... " setting in the preferences ... if you have that in the 6.2.xx series.
As for the system shutting down/locking up, Linux uses memory differently than Windows. If you've got a section of RAM that's not quite up to the rest, but is right at the end of the allocation then Windows might not ever get to it but Linux could. Linux is sometimes less fault tolerant than Windows, though it will usually run on older hardware. Go figure.

I'll admit the behaviour you're getting is something I just don't see on my machines. Quite frankly, nothing I have will chew through data nearly as fast as that rig.
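
On the throttling possibility: one simple check is to compare the per-core clocks the kernel reports in `/proc/cpuinfo` against the speed the chip is supposed to be running at. The Python sketch below just extracts the `cpu MHz` values; `core_clocks_mhz` is a made-up helper name, and how closely that field tracks the real clock under throttling depends on the kernel and driver, so treat the numbers as a hint only.

```python
import os


def core_clocks_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of every logical CPU listed in /proc/cpuinfo text."""
    clocks = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cpu MHz":
            clocks.append(float(value))
    return clocks


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as handle:
            clocks = core_clocks_mhz(handle.read())
        if clocks:
            print("cores:", len(clocks), "min MHz:", min(clocks), "max MHz:", max(clocks))
        else:
            print("no 'cpu MHz' lines found")
    else:
        print("/proc/cpuinfo not available on this system")
```

If the minimum sits well below the expected clock while all cores are supposedly loaded, frequency scaling or throttling becomes a more likely suspect than BOINC's own settings.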

xVeinx
09-04-2010, 04:17 PM
I've had segfault issues on Ubuntu and Fedora, and it generally is the memory/IMC. If they get out of sync, you'll end up with the same issues you're getting, along with filesystem corruption, etc. I'm not sure how your system is clocked, but backing down the memory clock and/or upping the voltage on the IMC should help. Linux can be really sensitive to memory settings; when you dial them in though, it will perform really well!

trn
09-04-2010, 04:29 PM
Thanks for the insight guys; Aurgh... I hate messing with memory settings :rolleyes: With Windows crunchers I've been trying to underclock, de-time, and undervolt the memory as much as possible because it's not supposed to help at all for WCG. I'm running the Debian drive on a stock L5640 now with auto memory settings; it's running OK, still has segfault errors, but CPU usage is probably over 95% or so, which is good enough for now while I work on some other issues.

jcool
09-05-2010, 02:43 AM
Hm.. no issues with errors on mine. And it's running 100% load, I just checked.
I am still not sure whether it just worked itself out magically or whether the CPU was throttling (because I was initially running 3.8GHz, which was a little too much for the air-cooled µATX rig it's running in). My L5640s start throttling at 80C already, which I think could have been reached during the summer months with ambients of 35C.

Basically I just ran prime and LinX on Win7 (it has a dual boot) and switched over to Ubuntu when it was reported stable...

Oh yeah, only 2GB of RAM in mine, which is plenty for HCC only. It's using 730MB of the 2GB, including the OS, for 12 threads of HCC.
If you don't have enough memory, run HCC :D
Also, some projects are prone to errors.. DDDT2 it was, I think? never ran anything besides cancer projects :shrug:

Havis
09-05-2010, 04:18 AM
Hi, I just noticed that on Debian you need to have firmware-linux (which depends on both firmware-linux-free and firmware-linux-nonfree).

The problem is that without firmware-linux-nonfree, you won't have an accelerated X server!
It contains firmware for various graphics cards (mainly for Radeons).

Here is a short list of the Radeon firmware (the package contains much more, e.g. for Intel integrated graphics, Broadcom server-class NICs, etc.):
* Radeon HD 5400-family ME microcode (radeon/CEDAR_me.bin)
* Radeon HD 5400-family PFP microcode (radeon/CEDAR_pfp.bin)
* Radeon HD 5400-family RLC microcode (radeon/CEDAR_rlc.bin)
* Radeon HD 5800/5900-family ME microcode (radeon/CYPRESS_me.bin)
* Radeon HD 5800/5900-family PFP microcode (radeon/CYPRESS_pfp.bin)
* Radeon HD 5800/5900-family RLC microcode (radeon/CYPRESS_rlc.bin)
* Radeon HD 5700-family ME microcode (radeon/JUNIPER_me.bin)
* Radeon HD 5700-family PFP microcode (radeon/JUNIPER_pfp.bin)
* Radeon HD 5700-family RLC microcode (radeon/JUNIPER_rlc.bin)
* Radeon R100-family CP microcode (radeon/R100_cp.bin)
* Radeon R200-family CP microcode (radeon/R200_cp.bin)
* Radeon R300-family CP microcode (radeon/R300_cp.bin)
* Radeon R400-family CP microcode (radeon/R420_cp.bin)
* Radeon R500-family CP microcode (radeon/R520_cp.bin)
* Radeon R600 ME microcode (radeon/R600_me.bin)
* Radeon R600 PFP microcode (radeon/R600_pfp.bin)
* Radeon R600-family RLC microcode (radeon/R600_rlc.bin)
* Radeon R700-family RLC microcode (radeon/R700_rlc.bin)
* Radeon HD 5500/5600-family ME microcode (radeon/REDWOOD_me.bin)
* Radeon HD 5500/5600-family PFP microcode (radeon/REDWOOD_pfp.bin)
* Radeon HD 5500/5600-family RLC microcode (radeon/REDWOOD_rlc.bin)
* Radeon RS600 CP microcode (radeon/RS600_cp.bin)
* Radeon RS690 CP microcode (radeon/RS690_cp.bin)
* Radeon RS780 ME microcode (radeon/RS780_me.bin)
* Radeon RS780 PFP microcode (radeon/RS780_pfp.bin)
* Radeon RV610 ME microcode (radeon/RV610_me.bin)
* Radeon RV610 PFP microcode (radeon/RV610_pfp.bin)
* Radeon RV620 ME microcode (radeon/RV620_me.bin)
* Radeon RV620 PFP microcode (radeon/RV620_pfp.bin)
* Radeon RV630 ME microcode (radeon/RV630_me.bin)
* Radeon RV630 PFP microcode (radeon/RV630_pfp.bin)
* Radeon RV635 ME microcode (radeon/RV635_me.bin)
* Radeon RV635 PFP microcode (radeon/RV635_pfp.bin)
* Radeon RV670 ME microcode (radeon/RV670_me.bin)
* Radeon RV670 PFP microcode (radeon/RV670_pfp.bin)
* Radeon RV710 ME microcode (radeon/RV710_me.bin)
* Radeon RV710 PFP microcode (radeon/RV710_pfp.bin)
* Radeon RV730 ME microcode (radeon/RV730_me.bin)
* Radeon RV730 PFP microcode (radeon/RV730_pfp.bin)
* Radeon RV770 ME microcode (radeon/RV770_me.bin)
* Radeon RV770 PFP microcode (radeon/RV770_pfp.bin)

Maybe you've been on software rendering, and that's why it was jumping up and down? :)

Brother Esau
09-06-2010, 05:50 AM
I would not worry much....Penguins are easily House Broken and don't consume much;)

SiGfever
09-06-2010, 06:43 AM
I would not worry much....Penguins are easily House Broken and don't consume much;)

Glad to see you posting here Brother. :up: