Forums : Technical Support : Postponed

Ardis

Joined: 30 Nov 14
Posts: 4
Credit: 184,819
RAC: 0
Message 22484 - Posted: 30 Apr 2020, 5:11:22 UTC

In my task list there are currently 41 partially completed (and overdue) camb_boinc2docker 2.05 tasks that all say "Postponed: Communication with VM Hypervisor failed." Suggestions?
ID: 22484
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22485 - Posted: 30 Apr 2020, 5:36:00 UTC - in response to Message 22484.  

You didn't have any tasks returned so I couldn't look at the error log. What resources are assigned to the virtual machine for a running task? How many cpus? Are you running any other projects?
ID: 22485
Ardis

Joined: 30 Nov 14
Posts: 4
Credit: 184,819
RAC: 0
Message 22486 - Posted: 30 Apr 2020, 6:48:16 UTC - in response to Message 22485.  
Last modified: 30 Apr 2020, 7:15:54 UTC

Hi Jonathan,

All four CPUs can run VM tasks. Several other projects are installed, but Rosetta@Home and World Community Grid are the only other ones that are doing anything right now.

C@H seemed to be running fine until lately. I'd see several 7+ minute tasks in the queue, they would all run overnight, and then there would be an uptick in the statistics.

I took a look at VirtualBox and updated it to 6.1.6. Now the status of those 41 tasks is "waiting to run" and completion has been reset to 0%. In fact, one of them is running right now. They are, however, all overdue. Should I abort them, or reset the project, or just wait for it to sort itself out?

Edit: on further inspection (or they might have changed since I first looked) four of the tasks are still suspended. These four were due 4/12. The others waiting to run were due 4/14.
ID: 22486
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22487 - Posted: 30 Apr 2020, 17:56:36 UTC - in response to Message 22486.  

I would just abort them. That explains why no tasks show for your computer.
ID: 22487
Ardis

Joined: 30 Nov 14
Posts: 4
Credit: 184,819
RAC: 0
Message 22507 - Posted: 15 May 2020, 22:27:19 UTC

Well, it started again. There are currently 18 tasks, partially completed, with the message: "Postponed: Communication with VM Hypervisor failed." The last time this happened, a VM update fixed it. VM is still up to date (6.1.6) and running, but C@H doesn't like it. Don't know why.
ID: 22507
Melvin

Joined: 10 Nov 18
Posts: 3
Credit: 1,494,434
RAC: 487
Message 22508 - Posted: 16 May 2020, 15:53:16 UTC

I have a similar issue. I noticed an accumulation of 25 units, each with only a minute or so elapsed and about 1 hr 40 min remaining, but stuck with the status "Postponed: VM job unmanageable, restarting later (2 CPUs)" or "Postponed: VM Hypervisor failed to enter an online state in a timely fashion (2 CPUs)", plus 2 units saying "Postponed: Hypervisor was unable to allocate enough memory to start VM (2 CPUs)". All units but one show less than 1% progress, and suspending all other projects does not help get any of them restarted. (Rosetta has 2 COVID-19 projects due in a couple of days, which I am reluctant to hold up much longer.)

The new computer (420649) has the 2 CPUs needed and 8 GB of memory, with usage set to 100% (plus 90% of 9 GB of swap if needed; I have tried with and without tasks left in memory while suspended). I read about AMD-V, where the issues seem to be mainly about not receiving jobs because it is not enabled, but here the computer details say it is enabled and the machine is receiving the jobs. It is just not progressing them beyond a short initial start to each.
Please advise.

AMD A9-9425 RADEON R5, 5 COMPUTE CORES 2C+3G [Family 21 Model 112 Stepping 0]
Virtualbox (6.0.14) installed, CPU has hardware virtualization support and it is enabled
Operating System Microsoft Windows 10 Professional x64 Edition, (10.00.18363.00)
BOINC version 7.16.5
ID: 22508
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22509 - Posted: 16 May 2020, 22:03:50 UTC - in response to Message 22507.  

Ardis, set no new tasks for Cosmology, abort all your Cosmology tasks then go to your Cosmology project preferences and set Max # of jobs to 1 and max # of cpus to 1. You can either exit Boinc and all tasks or just try updating the Cosmology project from the Boinc manager.
I am hoping you can get one VM task to run to verify your VirtualBox is working.
ID: 22509
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22510 - Posted: 16 May 2020, 22:11:07 UTC - in response to Message 22508.  

Melvin, you can try to set the same project preferences as above to get one work unit running. You might only be able to run single cpu work units on that computer, 420649. If the single cpu work units work, you could increase the Max # of jobs later.
ID: 22510
Melvin

Joined: 10 Nov 18
Posts: 3
Credit: 1,494,434
RAC: 487
Message 22511 - Posted: 17 May 2020, 11:11:11 UTC - in response to Message 22510.  

Thank you for your suggestion, Jonathan. Meanwhile I had started looking at VirtualBox. I had never opened its lid before, but found the individual jobs marked as not powered or aborted, though I seemed to be able to select each one to start it briefly (it showed as running in VB but with no change to the displayed BOINC status). I then noticed each had a warning message about the display memory being set too low (needing 9 MB minimum), so I increased each from 8 to 20, thinking that this probably only applies to viewing within VB and would make no difference to BOINC, as I would normally only view jobs from BOINC. I confirmed it made no difference, so I closed VB, rebooted, and saw each job in turn run for a while (from 0%, as I seemed to have lost any previous progress) before getting the postponed messages in BOINC and going into a powered-off state in VB. I wondered about the BOINC log mention of "No WSL found", looked that up, and decided that it was not an alternative to VB and that VB presumably would not have proceeded if it were something it needed, so I decided not to risk getting involved with that. Instead, I downgraded to the latest VB 5.2, having read in another thread how this worked for a user with issues, but basically still got similar behaviour. (It is not easy to quickly see which task is which when the site lists by task number, the BOINC task list only identifies by name, and the VB list only shows a different boinc_code_number.)

Then I read your post and changed the BOINC setting to use only 50% of the 2 CPUs, though I noticed the BOINC log said it was to use 1 CPU while VB showed 2 CPUs (due to multi-threading?). I suspended all jobs but one, then resumed another after that one had gone through. This did seem to work: the previously empty task list on the Cosmology site for this computer later showed a few completed and verified tasks, and a couple more with errors (and the BOINC log showed a couple of tasks "exited with zero status but no 'finished' file"). I'm now about halfway through the list, having manually resumed each task in turn, and have now marked the rest to resume so I don't have to wait on each (trusting that the issue was about the number of CPUs and won't be affected by any stop/start due to task scheduling).

By comparison, my other W10 computer (372173) has been running VB 5.2.8 (BOINC 7.16.5), set to use 3 cores, for some while now with no problems, though I notice that if I open VB there, all the tasks show as "inaccessible". That one has many mentions of "Vbox app stderr indicates CPU VM extensions disabled" in the BOINC log, but all the jobs have been running fine on it so far.
As BOINC seems to set VB to emulate a Linux system to run the tasks, I am wondering whether it is more efficient to just let the Cosmology units run on my other native Linux computers. This new one dual-boots W10/Mint 19.3, though I tend not to run BOINC on any irregular or lesser-used OS of a machine, to ensure mostly 24/7 availability. When these units are done I will probably drop Cosmology from this computer, as I don't think setting the number of CPUs to 1 can be done on a per-project basis, so as not to limit resources for other projects. At least I now know a few things to try if I encounter this on my other Windows machine. Thanks again for your help.
ID: 22511
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22512 - Posted: 17 May 2020, 23:03:39 UTC - in response to Message 22511.  

372173 is currently showing VM extensions disabled and is failing all camb_boinc2docker work units. Check that hardware virtualization is on in the BIOS. My AMD gets disabled with every BIOS update. If it is enabled in the BIOS, check the FAQ section for a flag that may be set and still be causing the error.

You can set an individual computer to run single cpu work units using the app_config.xml method, which is explained a little in the FAQ section. I am attaching the code below if you wish to try it. It is set for a single concurrent work unit and a single cpu, and it is written to control only the camb_boinc2docker task.
Set no new tasks for Cosmology on that computer. Abort and/or return all work units, then exit Boinc Manager and all running work units. You can create the app_config.xml file per the FAQ and place it in the correct directory. It should grab new work units once you start Boinc and allow new tasks.

<app_config>
    <app>
        <name>camb_boinc2docker</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app_version>
        <app_name>camb_boinc2docker</app_name>
        <plan_class>vbox64_mt</plan_class>
        <avg_ncpus>1</avg_ncpus>
    </app_version>
</app_config>
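
Before restarting Boinc it can help to confirm that the file is well-formed XML and says what you intended. Here is a minimal Python sketch of that check, using only the standard library. The function name summarize_app_config is made up for illustration, and the sketch checks the text of the file only; it does not know where your project directory is and does not talk to Boinc.

```python
import xml.etree.ElementTree as ET

# The same settings as posted above, pasted in as text for the check.
APP_CONFIG = """\
<app_config>
    <app>
        <name>camb_boinc2docker</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app_version>
        <app_name>camb_boinc2docker</app_name>
        <plan_class>vbox64_mt</plan_class>
        <avg_ncpus>1</avg_ncpus>
    </app_version>
</app_config>
"""


def summarize_app_config(xml_text):
    """Return the per-app limits found in app_config.xml text.

    Hypothetical helper: raises xml.etree.ElementTree.ParseError if the
    text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "app_config":
        raise ValueError("root element must be <app_config>")

    summary = {}
    for app in root.findall("app"):
        entry = summary.setdefault(app.findtext("name"), {})
        max_concurrent = app.findtext("max_concurrent")
        if max_concurrent is not None:
            entry["max_concurrent"] = int(max_concurrent)
    for version in root.findall("app_version"):
        entry = summary.setdefault(version.findtext("app_name"), {})
        entry["plan_class"] = version.findtext("plan_class")
        avg_ncpus = version.findtext("avg_ncpus")
        if avg_ncpus is not None:
            entry["avg_ncpus"] = float(avg_ncpus)
    return summary


if __name__ == "__main__":
    print(summarize_app_config(APP_CONFIG))
```

A missing closing tag or a stray character makes the parse fail straight away, which is easier to spot here than in the Boinc event log.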
ID: 22512
Melvin

Joined: 10 Nov 18
Posts: 3
Credit: 1,494,434
RAC: 487
Message 22513 - Posted: 18 May 2020, 15:26:44 UTC - in response to Message 22512.  

Thanks Jonathan, I didn't realise the number of cores could be project specific.
I have inserted the app_config.xml, restarted BOINC and enabled new tasks.
BOINC logged "Found app_config.xml", downloaded several units and ran 1 camb_legacy and 1 camb_boinc2docker unit. When I temporarily suspended all other projects and the legacy unit, BOINC didn't try to start a second docker unit, so the app_config seems to have worked: the docker unit completed and another one is running, so all seems OK now.
(Though I see one unit logged "17/05/2020 08:38:27 | Cosmology@Home | Aborting task camb_boinc2docker_799503_1589116819.913040_0: exceeded disk limit: 487.97MB > 476.84MB" while the log also showed "17/05/2020 07:49:58 | | max disk usage: 76.30 GB" - task 28077884? Puzzling, as the BOINC disk parameter is set to use up to 90% and leave at least 0.1 GB, C: properties says there is 71.6 GB free, and disk clean-up could only find 21 MB to gain, as this machine is less than 2 weeks old and so is far from cluttered yet.)
I obviously never paid much attention to what type of units were passing through on my other Windows machine, so I didn't realise VB must never have been utilised. I have set no new tasks there and will try to enable its VM and add the app_config if needed.
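
On the disk limit message: one possible reading (an assumption, not something confirmed in this thread) is that the 476.84MB figure is a per-task limit set by the project rather than the client-wide "max disk usage". The arithmetic at least fits, since 500,000,000 bytes comes to 476.84 MB when a MB is counted as 1024 x 1024 bytes. A small Python sketch of that calculation, with made-up helper names and the 500,000,000 byte bound treated as a guess:

```python
MIB = 1024 * 1024  # bytes per MB as the log appears to count them

# Assumed per-task bound; a guess that happens to reproduce the logged figure.
ASSUMED_TASK_LIMIT_BYTES = 500_000_000


def bytes_to_mib(n_bytes):
    """Convert a byte count to MB of 1024 * 1024 bytes."""
    return n_bytes / MIB


def exceeds_task_limit(used_mib, limit_bytes):
    """True if a task's disk usage (in MB) is above the per-task limit."""
    return used_mib > bytes_to_mib(limit_bytes)


if __name__ == "__main__":
    print(f"limit: {bytes_to_mib(ASSUMED_TASK_LIMIT_BYTES):.2f} MB")
    print("487.97 MB over limit:",
          exceeds_task_limit(487.97, ASSUMED_TASK_LIMIT_BYTES))
```

If that reading is right, free space on C: and the 76.30 GB client allowance would not matter here, because the task was compared against its own smaller bound.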
ID: 22513
Ardis

Joined: 30 Nov 14
Posts: 4
Credit: 184,819
RAC: 0
Message 22516 - Posted: 21 May 2020, 13:09:43 UTC

No luck. Set C@H to 1 task, 1 core, suspended everything else. Twice. C@H task stopped both times at about 6:30 with the same error message: "Postponed: Communication with VM Hypervisor failed."
ID: 22516
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22517 - Posted: 21 May 2020, 19:08:35 UTC - in response to Message 22516.  

Suspend your other projects and see if Cosmology runs to completion. I think your computer is just too 'busy'. The VM tasks run at a higher priority than the normal Boinc tasks. The Virtual Box wrapper used by the projects runs at a lower priority, can't communicate with Boinc quickly enough, and causes that error. I am not sure if running on an SSD or NVMe would help because they are faster with communications. You could also try setting processor usage to 75% but then you would only use 3 out of 4 cores for Boinc projects.
ID: 22517
Grindylow

Joined: 22 Apr 09
Posts: 2
Credit: 221,823
RAC: 78
Message 22524 - Posted: 25 May 2020, 23:01:25 UTC

Hi. I am currently unable to finish any "Cosmology@HOME" tasks. I am running Windows 10 Home version on a laptop purchased in 2019, so the computer isn't too ancient. I am running BOINC Manager 7.16.5, and Virtual Box version 6.0.14.

I currently have 6 "Cosmology@Home" tasks that all say: "Postponed: VM job unmanageable, restarting later. (2 CPUs)". If I restart BOINC, these projects will run normally for a while, complete about another 1%, then generate this above message again. I have allowed a little over a day for them to "restart later", but that's never happened. There is no restarting (in that time frame, anyway). This seems similar to problems others are having in this thread.

Note that both the local preferences on my machine and my web preferences are set to use 50%. I only have 2 CPUs, so I'm not sure why running 50% of them (1) generates the "(2 CPUs)" message above. I have also tried running 2 CPUs, but to no avail.

I don't know whether this might have a common cause, but other projects running on my computer frequently spontaneously stop, with the task saying "task suspended by user." This happens without any input from me suspending the task. When I resume running the task, it may finish, or it may stop with the same message after a few hours. It seems random.

Anybody have any thoughts on the subject? Thanks for your time.
ID: 22524
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22526 - Posted: 26 May 2020, 2:59:01 UTC - in response to Message 22524.  

Set no new tasks and then abort any tasks you have from here.
Easiest way to change to single cpu tasks is change your Cosmology@home preferences for this project.
Set Max # jobs to 1 and Max # CPUs to 1. You should get the new preferences when you allow new tasks and request an update on the project.
You could also use the app_config.xml method listed in the FAQ section.

Your postponed jobs will start back up on their own after about 24 hours but, most likely, will get stuck again. You probably can run one, single cpu VM task and a regular BOINC task. Your tasks starting and stopping is due to your general preferences on ram or cpu usage.
ID: 22526
Grindylow

Joined: 22 Apr 09
Posts: 2
Credit: 221,823
RAC: 78
Message 22528 - Posted: 26 May 2020, 18:18:44 UTC - in response to Message 22526.  

Thank you. For brevity (and through an oversight), I left out that the first thing I did was suspend all running tasks except for one "Cosmology" task, and I said to not allow new tasks on any project (including "Cosmology"), then I told all to "update". I waited a day and a half, and got the same "Postponed" message. Is setting max CPUs to 1 different from what I did when I set it to use a maximum of 50% of my (two) CPUs?

To be clear, are you saying that even if I am able to get "Cosmology" running again, that I can only run one other task from one other project I'm subscribed to at a time? When BOINC Manager downloads 6 "Cosmology" tasks, I need to manually abort 5 of them? I have 9 other projects on the hook. I can only run 1 task from one other project at a time?

I'm not sure what you mean about when my tasks start and stop. Doesn't the "Switch tasks every X minutes" setting (e.g. 120 minutes) cause tasks to start and stop?
ID: 22528
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22529 - Posted: 26 May 2020, 22:29:50 UTC - in response to Message 22528.  

The use 50% of cpus setting is for all projects under Boinc. If you have two cores, Boinc would only run on one of them. Setting the Cosmology project preferences to use one max cpu and max one task is setting it to run a task that uses one cpu in the virtual machine and only runs a single task concurrently. It will still download multiple tasks but only one should run at one time. This can also be controlled by using the 'app_config.xml' method explained in the FAQ section.
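
The global percentage setting can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and rounding down with a minimum of one core is an assumption about how the client applies the percentage, chosen because it matches the two cases mentioned in this thread (50% of 2 cores, 75% of 4 cores).

```python
def usable_cpus(total_cpus, max_pct):
    """Cores Boinc may use under the global 'use at most X% of CPUs' setting.

    Assumption: the result is rounded down and never drops below one.
    """
    return max(1, int(total_cpus * max_pct / 100))


if __name__ == "__main__":
    # Global limit only: how many cores all projects together may occupy.
    print("2 cores at 50%:", usable_cpus(2, 50))
    print("4 cores at 75%:", usable_cpus(4, 75))
```

This number says nothing about how many cpus a single VM is created with. That comes from the project preferences or app_config.xml, which is why a task can still be labelled "(2 CPUs)" while the global setting is 50%.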

When you set 'no new tasks' and abort all the Cosmology@Home tasks you have, you are clearing out the ones that were set to use two cpus per virtual machine. You have your computers hidden so I can't look at your computer details and I can't look at the logs for returned tasks.

You can run other projects at the same time and you don't have to abort any work units or tasks manually.

The camb_boinc2docker task needs to run to completion as it doesn't checkpoint. This isn't a problem as they run so quickly. If you exit Boinc Manager and tell it to exit all tasks, when it starts up again you will notice the camb_boinc2docker task, that was postponed, will have to start computing at the beginning again.

I was hoping to get your Cosmology task running using a single cpu in a single task and then try adding the other projects back into the mix. Are your other projects regular Boinc or do they use Virtual Box also? I don't think you will be able to run more than one VM related project at a time due to your computer and Virtual Box and the Boinc virtual box wrapper being fickle.

I just set my AMD computer back to run camb_boinc2docker tasks here. You should be able to click my name in the forum and browse my computers and the tasks to see what info is shared when they aren't hidden.
ID: 22529
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22530 - Posted: 28 May 2020, 1:30:32 UTC - in response to Message 22529.  

Grindylow, thanks for showing your computer and tasks.
The three tasks you returned and that got validated were using 2 cpus in the VM. I can tell by looking at the task details run time vs the cpu time listed. You just haven't had a single cpu VM run and complete yet. I also think you are getting hit by a Boinc logging issue as it looks like part of your logs aren't reported on the completed task.
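
The run time vs cpu time comparison can be sketched as follows in Python. The numbers are invented for illustration and are not taken from any task in this thread, and the function name is hypothetical. The idea is simply that a VM keeping N cpus busy accumulates roughly N seconds of cpu time per second of run time.

```python
def estimated_vm_cpus(cpu_time_s, run_time_s):
    """Rough guess at how many cpus a VM task used.

    A ratio near 1 suggests a single cpu VM; a ratio near 2 suggests two.
    The estimate is only as good as the task kept its cpus busy.
    """
    if run_time_s <= 0:
        raise ValueError("run time must be positive")
    return max(1, round(cpu_time_s / run_time_s))


if __name__ == "__main__":
    # Invented example values, in seconds.
    print(estimated_vm_cpus(cpu_time_s=1150, run_time_s=600))
    print(estimated_vm_cpus(cpu_time_s=580, run_time_s=600))
```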

In Boinc Manager, go to Options, then Event Log Options. Make note of the default items that are selected. Check or uncheck an item, hit Apply and then you can uncheck or check the item again and hit apply. I hope this causes your log to populate more of the info for the tasks. My logs start with the section I am posting below. It shows the number of cpus assigned to the VM, the RAM, etc. It is helpful in seeing how you have your settings for the tasks.

Your computer is quite low on RAM and each of the camb_boinc2docker tasks uses 2 GB, no matter how many cpus are assigned to a VM. I don't think you will ever be able to run more than one at a time and you may not be able to run other projects concurrently due to memory.

My logs start like this for a task.
<core_client_version>7.16.5</core_client_version>
<![CDATA[
<stderr_txt>
2020-05-27 18:49:19 (14908): vboxwrapper (7.9.26200): starting
2020-05-27 18:49:19 (14908): Feature: Checkpoint interval offset (292 seconds)
2020-05-27 18:49:19 (14908): Detected: VirtualBox VboxManage Interface (Version: 6.1.6)
2020-05-27 18:49:19 (14908): Detected: Minimum checkpoint interval (600.000000 seconds)
2020-05-27 18:49:20 (14908): Create VM. (boinc_19c258507036a2ec, slot#2)
2020-05-27 18:49:20 (14908): Updating drive controller type and model for desired configuration.
2020-05-27 18:49:20 (14908): Setting Memory Size for VM. (2048MB)
2020-05-27 18:49:21 (14908): Setting CPU Count for VM. (4)
2020-05-27 18:49:21 (14908): Setting Chipset Options for VM.
2020-05-27 18:49:21 (14908): Setting Boot Options for VM.
2020-05-27 18:49:21 (14908): Setting Network Configuration for NAT.
2020-05-27 18:49:22 (14908): Enabling VM Network Access.
2020-05-27 18:49:22 (14908): Disabling USB Support for VM.
2020-05-27 18:49:22 (14908): Disabling COM Port Support for VM.
2020-05-27 18:49:23 (14908): Disabling LPT Port Support for VM.
2020-05-27 18:49:23 (14908): Disabling Audio Support for VM.
2020-05-27 18:49:23 (14908): Disabling Clipboard Support for VM.
2020-05-27 18:49:23 (14908): Disabling Drag and Drop Support for VM.
2020-05-27 18:49:24 (14908): Adding storage controller(s) to VM.
2020-05-27 18:49:24 (14908): Adding virtual ISO 9660 disk drive to VM. (vm_isocontext.iso)
2020-05-27 18:49:24 (14908): Adding VirtualBox Guest Additions to VM.
2020-05-27 18:49:24 (14908): Adding network bandwidth throttle group to VM. (Defaulting to 1024GB)
2020-05-27 18:49:25 (14908): Enabling shared directory for VM.
2020-05-27 18:49:25 (14908): Starting VM using VboxManage interface. (boinc_19c258507036a2ec, slot#2)
2020-05-27 18:49:29 (14908): Successfully started VM. (PID = '15372')
2020-05-27 18:49:29 (14908): Reporting VM Process ID to BOINC.
2020-05-27 18:49:34 (14908): Guest Log: BIOS: VirtualBox 6.1.6
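
If you want to pull the interesting settings out of a log like this without reading every line, a short Python sketch along these lines would do it. The patterns are written against the exact wording of the lines shown above, so they are an assumption about the log format and may need adjusting for other vboxwrapper versions. The function name is made up.

```python
import re

# Three lines copied from the log above, enough to exercise the patterns.
LOG = """\
2020-05-27 18:49:19 (14908): Detected: VirtualBox VboxManage Interface (Version: 6.1.6)
2020-05-27 18:49:20 (14908): Setting Memory Size for VM. (2048MB)
2020-05-27 18:49:21 (14908): Setting CPU Count for VM. (4)
"""

PATTERNS = {
    "virtualbox_version": re.compile(
        r"VirtualBox VboxManage Interface \(Version: ([\d.]+)\)"),
    "memory_mb": re.compile(r"Setting Memory Size for VM\. \((\d+)MB\)"),
    "cpu_count": re.compile(r"Setting CPU Count for VM\. \((\d+)\)"),
}


def vm_settings(log_text):
    """Return the VirtualBox version, VM memory and VM cpu count, if logged."""
    found = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            found[key] = match.group(1)
    # Memory and cpu count are numbers; the version stays a string.
    for key in ("memory_mb", "cpu_count"):
        if key in found:
            found[key] = int(found[key])
    return found


if __name__ == "__main__":
    print(vm_settings(LOG))
```

A key that is missing from the result means the matching line was not in the log, which is itself a hint that part of the log was not reported.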
ID: 22530
