
Message boards : Technical Support : Virtualized tasks monopolize all cores in BOINC but only use around 40% of the CPU

Author Message
Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 20956 - Posted: 24 Mar 2016, 3:00:51 UTC
Last modified: 24 Mar 2016, 3:59:53 UTC

Why does a Planck task use only around 36% to 40% of the CPU when it tells BOINC to reserve all of the cores? I could understand keeping cores idle if the task saturated the memory system rather than the CPU cores by moving large amounts of data in and out of memory, so that adding more cores would only make them fight over memory bandwidth. But if this is a bug and you meant to use 100% of the CPU cores, please fix it so that we can crunch more tasks at once.

EDIT: I have an Intel Xeon E5-2690 v3 with 12 physical cores and 24 virtual cores. 10 virtual cores are nearly maxed out and the rest are idle when I suspend GPU computing in BOINC and close all other user applications except Task Manager, which I use to measure CPU utilization.

EDIT 2: I noticed that a similar problem affects camb_boinc2docker tasks as well.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 20957 - Posted: 24 Mar 2016, 9:51:59 UTC - in response to Message 20956.
Last modified: 24 Mar 2016, 9:53:20 UTC

Hi Jesse,

Both the planck_param_sims and camb_boinc2docker tasks should use 100% of the CPUs you allocate to them, so something must be wrong. Looking through the logs for your job, I see the line "Setting CPU Count for VM. (24)", so at least that much seems to be getting recognized correctly.

It's possible this is a bug related to how VirtualBox maps your physical and virtual cores to the cores the VM sees. Can you try the solution here and lower the number of CPUs to, say, 10? Does the number of cores used then stay at 10, or is it again lower?
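The linked solution isn't reproduced in this post, but later messages in this thread show it amounts to an app_config.xml in the project directory. A minimal sketch of that file with the cap set to 10, assuming the camb_boinc2docker application and its vbox64_mt plan class as used elsewhere in this thread:

```xml
<!-- app_config.xml, placed in the Cosmology@Home project directory.
     Sketch only: caps the CPUs advertised to this app's VM jobs at 10. -->
<app_config>
    <app_version>
        <app_name>camb_boinc2docker</app_name>
        <plan_class>vbox64_mt</plan_class>
        <avg_ncpus>10</avg_ncpus>
    </app_version>
</app_config>
```

BOINC only re-reads this file on client restart or "Read config files", and it applies to newly fetched tasks.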

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 20958 - Posted: 24 Mar 2016, 16:19:54 UTC - in response to Message 20957.

I have just made the modifications, limiting the number of CPUs to 10 for both of the applications that require virtualization. I will have to wait until BOINC pulls more tasks to check this.

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 20959 - Posted: 24 Mar 2016, 23:27:07 UTC - in response to Message 20957.
Last modified: 24 Mar 2016, 23:51:47 UTC

After making the modifications and getting new tasks, each Planck task still uses only 9-10 virtual cores, and each boinc2docker task uses only 8-9 virtual cores.

I am still trying to interpret what I found, so my data is not final. It appears that your programs can currently only use up to some maximum number of threads. The core counts above are just rough estimates.

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 20960 - Posted: 25 Mar 2016, 0:11:14 UTC - in response to Message 20957.

I found out that my previous methodology was flawed. A program that I was using at the same time that I expected not to hog a virtual core was instead hogging one. A few other things were also using another core. I shut them all down. I then suspended all BOINC tasks except for one of the virtualized Cosmology@home tasks under study. I then viewed the graphs in the task manager and counted the number of virtual CPU cores that were hogged.

The results are now clear to me. Both the Planck and the boinc2docker tasks can hog a maximum of 8 virtual cores, and any more virtual cores assigned to them are idle and therefore wasted.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21020 - Posted: 11 Apr 2016, 19:12:28 UTC - in response to Message 20960.
Last modified: 11 Apr 2016, 19:20:31 UTC

> I found out that my previous methodology was flawed. A program that I was using at the same time that I expected not to hog a virtual core was instead hogging one. A few other things were also using another core. I shut them all down. I then suspended all BOINC tasks except for one of the virtualized Cosmology@home tasks under study. I then viewed the graphs in the task manager and counted the number of virtual CPU cores that were hogged.
>
> The results are now clear to me. Both the Planck and the boinc2docker tasks can hog a maximum of 8 virtual cores, and any more virtual cores assigned to them are idle and therefore wasted.

Hi Jesse, thanks for digging into this. I think it's possible this is related to a BOINC/vboxwrapper issue. We should be assigning VirtualBox CPUs equal to the number of cores, not hyperthreads, on a given system. Looking at your machine, I see 12 cores and 24 hyperthreads, so 12 should have been assigned (per job).

Nevertheless, I'm still somewhat surprised that it's not until you manually decrease the CPUs to 8 that you see 100% usage; I would have thought that once you're at 12 it would be fine. If you have time, I'd be curious what the exact CPU usage is when you limit the job to 12 virtual CPUs. Is it possible there's anything else going on? Thanks.

25000ghz [Lombardia]
Joined: 18 Aug 15
Posts: 5
Credit: 3,503,450
RAC: 0
Message 21021 - Posted: 11 Apr 2016, 19:23:07 UTC - in response to Message 21019.

To reduce the number of CPUs I used the CPU-percentage setting in BOINC. To see the workload I used Task Manager on Windows 8.1.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21022 - Posted: 11 Apr 2016, 19:37:27 UTC - in response to Message 21021.
Last modified: 11 Apr 2016, 19:38:08 UTC

> To reduce the number of cpu I used the percentage of Boinc setting. To see the work load I used "task manager" of windows 8.1

How many physical cores do you have? Is it 20 cores, 40 hyperthreads? If so, I believe the correct number of CPUs to give your job is 20.

So if you lower your BOINC CPU usage to 50%, the job should say "20 CPUs" and you should be seeing 50% CPU usage in Task Manager. Can you try this and see if it works?

(Sorry about these issues, hopefully we figure out a fix soon and this is all automated in the future.)

George Buzsaki
Joined: 26 Mar 16
Posts: 2
Credit: 773,074
RAC: 0
Message 21025 - Posted: 12 Apr 2016, 21:39:04 UTC

Just another data point on this topic. I have two PCs running C@H VM jobs:

- Machine 1 has a Quad Core with Hyper-threading (so 8 logical cores) and gets 100% utilization.

- Machine 2 has a Hex Core with Hyper-threading (so 12 logical cores) and gets about 70% utilization.

8 cores of load / 12 logical cores = 67% utilization.

So it does seem that the C@H VM work units run a maximum of 8 threads of processing no matter how many logical cores you have. The VM appears correctly configured in VirtualBox with 12 cores, and the load does spread its 8 threads of work across the 12 cores (meaning I see all 12 cores at partial utilization, as opposed to 8 cores maxed out and 4 logical cores idle). I suspect it is a limitation of the job algorithm itself rather than something wrong with VirtualBox. I have not looked at the code, though; no time for that right now.
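George's arithmetic generalizes: if a job never runs more than a fixed number of worker threads, its expected overall utilization is just that cap divided by the logical core count. A small Python sketch (a hypothetical helper for this thread, not project code) reproduces both of his data points:

```python
def expected_utilization(max_threads: int, logical_cores: int) -> float:
    """Fraction of total CPU a job can use if it never has more than
    max_threads runnable threads at once."""
    return min(max_threads, logical_cores) / logical_cores

# Machine 1: quad core + HT = 8 logical cores, 8-thread cap -> fully busy
print(round(expected_utilization(8, 8) * 100))   # 100
# Machine 2: hex core + HT = 12 logical cores -> matches the ~70% observed
print(round(expected_utilization(8, 12) * 100))  # 67
```

The 67% figure is the same "8 cores of load / 12 logical cores" calculation from the post above.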

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21028 - Posted: 15 Apr 2016, 9:22:16 UTC - in response to Message 21025.

Thanks, this is a helpful data point to have. The software we run doesn't have any hard maximum of 8 threads, but clearly something strange is going on in these cases. We're looking into it.

George Buzsaki
Joined: 26 Mar 16
Posts: 2
Credit: 773,074
RAC: 0
Message 21030 - Posted: 15 Apr 2016, 20:16:47 UTC

Another possibility: the VM task really does have ready-to-run threads for all cores, but the Linux thread scheduler is sub-optimal at keeping all cores busy. See the recently posted paper on this topic (just saw it on reddit /r/programming):

http://www.ece.ubc.ca/~sasha/papers/eurosys16-final29.pdf

Or there could be some exclusive locking around a global memory structure that prevents the work from scaling linearly beyond 8 threads.

Either way, might not be fixable on your side, but worth understanding if possible.

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 21098 - Posted: 21 Jun 2016, 3:14:22 UTC - in response to Message 21022.
Last modified: 21 Jun 2016, 3:15:46 UTC

I am sorry that I was unable to respond. I had to quit BOINC until the fall. My parents do not allow me to run it during the spring or summer because my computer running at full load will significantly heat up the room, and they only want the heat in the fall and winter. This includes GPU tasks.

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 21193 - Posted: 25 Oct 2016, 0:20:07 UTC - in response to Message 21028.

ATLAS@Home has the same problem that Cosmology@home does when dealing with more than 8 vCPUs per virtual machine. Please see http://atlasathome.cern.ch/forum_thread.php?id=562, http://atlasathome.cern.ch/forum_thread.php?id=568, and http://atlasathome.cern.ch/forum_thread.php?id=573 for its experiments with limiting the number of vCPUs in each VM.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21194 - Posted: 25 Oct 2016, 21:01:25 UTC - in response to Message 21193.

Thanks for pointing me to that discussion; it's useful to read through.

Btw, I may be forgetting something, but I don't recall C@H having any problem with 8 CPUs in general. The problem is that by default vboxwrapper runs the BOINC job with NCPUs = (# of hyperthreads) as opposed to (# of physical CPUs), the latter being the correct thing to do because of limitations of VirtualBox. The discussion of this is at https://github.com/BOINC/boinc/issues/1501, but the fix is still a work in progress.
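The distinction Marius describes is easy to compute on the host side: the physical core count is the logical CPU count divided by the SMT threads per core (2 on the Hyper-Threaded machines in this thread). A hedged Python sketch of that calculation (the real fix belongs in vboxwrapper, per the linked issue; the function name here is made up for illustration):

```python
import os

def vm_cpu_count(logical_cpus: int, threads_per_core: int) -> int:
    """Physical cores = logical CPUs / SMT threads per core; per this
    thread, this is the number vboxwrapper should hand to VirtualBox."""
    return logical_cpus // threads_per_core

# Jesse's Xeon E5-2690 v3: 24 hyperthreads, 2 threads per core -> 12
print(vm_cpu_count(24, 2))  # 12
# Logical CPUs on whatever host runs this (value varies by machine)
print(os.cpu_count())
```

Note that Python's `os.cpu_count()` reports logical CPUs; the threads-per-core figure has to come from elsewhere (e.g. `lscpu` on Linux).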

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 21195 - Posted: 26 Oct 2016, 5:43:22 UTC - in response to Message 21194.
Last modified: 26 Oct 2016, 5:45:42 UTC

I found that limiting each task to 8 vCPUs while not limiting the number of tasks run at once maximizes my speed and CPU utilization. (24 logical cores / 8 vCPUs per task gives 3 tasks running at once.) Hyper-Threading is not the problem I am seeing. Based on my evidence, the ATLAS@Home evidence, and earlier posts by other users in this thread, I think something in either VirtualBox or the Linux scheduler inside each virtual machine prevents it from maxing out more than 8 vCPUs per VM. Any time I have given a Linux VM more than 8 vCPUs, it has not maxed them out.
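Jesse's rule of thumb above can be written out explicitly: with a per-task vCPU cap, the number of concurrent VM tasks that fills the machine is the floor of logical cores over the cap. A throwaway sketch (hypothetical helper, not BOINC code):

```python
def saturating_task_count(logical_cores: int, vcpus_per_task: int) -> int:
    """How many vCPU-capped VM tasks fit before logical cores run out."""
    return logical_cores // vcpus_per_task

# 24 logical cores / 8 vCPUs per task -> 3 tasks running at once
print(saturating_task_count(24, 8))  # 3
```

Any remainder cores (e.g. 24 % 9 = 6 with a 9-vCPU cap) are left for single-core tasks, which is the configuration Jesse describes later in the thread.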

Here is the app_config.xml that I have used that maxes out my CPU:

<app_config>
    <app>
        <name>camb_boinc2docker</name>
    </app>
    <app_version>
        <app_name>camb_boinc2docker</app_name>
        <plan_class>vbox64_mt</plan_class>
        <avg_ncpus>8</avg_ncpus>
    </app_version>
</app_config>

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 21203 - Posted: 5 Nov 2016, 15:39:20 UTC

I have cleaned out some unnecessary lines in my app_config.xml. A better one is below:

<app_config>
    <app_version>
        <app_name>camb_boinc2docker</app_name>
        <plan_class>vbox64_mt</plan_class>
        <avg_ncpus>8</avg_ncpus>
    </app_version>
</app_config>

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 21260 - Posted: 22 Dec 2016, 3:32:22 UTC

I have done some experimenting and found the following: using all 24 vCPUs left more than half of the virtual cores idle when the task did not hang. (The first task I try to execute with 24 vCPUs always hangs while the virtual machine boots and has to be aborted; the rest go through fine.) Switching to 9 vCPUs per task let my CPU be fully utilized when the other logical cores were filled with single-core tasks, but there was hardly any speedup over 8 vCPUs per task. Perhaps the single-core tasks competed for my CPU's memory controllers enough to slow down the multicore tasks. Giving tasks 10 vCPUs caused them to consistently hang during the virtual machine's boot sequence, forcing me to abort them. I haven't had time to try 11 or 12 vCPUs per task yet.

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 21266 - Posted: 31 Dec 2016, 17:07:43 UTC

Experiments where I used the app_config.xml to assign 12 vCPUs to each task failed and had to be aborted.

Jesse Viviano
Joined: 29 Nov 14
Posts: 28
Credit: 421,093
RAC: 658
Message 21497 - Posted: 18 Aug 2017, 13:49:31 UTC

I was experimenting again and found that 9 vCPUs would now cause tasks to crash. I suggest the project configure its VMs to use a maximum of 8 vCPUs to keep tasks from crashing, instead of relying on users to set this up with an app_config.xml file.

shu
Joined: 4 Aug 17
Posts: 2
Credit: 26,016
RAC: 0
Message 21500 - Posted: 20 Aug 2017, 16:28:26 UTC
Last modified: 20 Aug 2017, 16:43:46 UTC

I'm running a Ryzen 1800x here and have also come across some weird problems.

When I run a task with 16 CPU cores, it takes around 3:30 to finish one WU (CPU usage is at around 50%).

When I follow the prescribed fix of running 2 tasks with a maximum of 8 CPU cores each, each task takes around twice as long to finish and my CPU usage is at 100%.

If I run only 1 task with 8 CPU cores, it takes the same amount of time as with 16 CPU cores, with the same 50% utilization...

When I run other projects' WUs on the remaining cores, the processing time of the 8-core docker WU also goes up by about a minute, to 4:40-ish...

Would be nice if someone could figure this one out :)

