Forums : Technical Support : VM job unmanageable, restarting later

Markus Dippner

Joined: 20 Jun 09
Posts: 1
Credit: 29,783
RAC: 0
Message 22185 - Posted: 3 Jul 2019, 15:22:35 UTC

Hello,
I have the problem that none of the Cosmology jobs run on my computer; every time, I get the status in the title.

stderr.txt contains this:
ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.

In the vbox_trace.txt I found this:
Command: VBoxManage -q showhdinfo "C:\ProgramData\BOINC\slots\1/vm_image.vdi"
Exit Code: -2135228412

I searched for solutions to this error but still don't know what to do. What information do you need to help me?
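
The exit code above is easier to search for once it is written as a hexadecimal HRESULT, which is how Windows COM errors are usually reported. A minimal Python sketch of that conversion follows; the function name `to_hresult` is only illustrative. If the VirtualBox SDK error list is read correctly, 0x80BB0004 corresponds to `VBOX_E_FILE_ERROR`, which would point at a problem opening the disk image rather than at CPU load, but treat that mapping as an assumption to verify against the documentation for your VirtualBox version.

```python
def to_hresult(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex HRESULT."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement 32-bit representation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Exit code reported in vbox_trace.txt for the showhdinfo command
print(to_hresult(-2135228412))  # 0x80BB0004
```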
ID: 22185
Rantanplan

Joined: 11 Jun 19
Posts: 1
Credit: 112,582
RAC: 3
Message 22186 - Posted: 3 Jul 2019, 18:49:16 UTC

Bad news: You have to install Hyper-V in Windows, and no 64-bit VM works anymore.
Good news: Then it will work.

I switched to the legacy WUs on Windows.
ID: 22186
Matthias Lehmkuhl

Joined: 29 Aug 07
Posts: 19
Credit: 1,199,011
RAC: 313
Message 22239 - Posted: 2 Sep 2019, 14:28:47 UTC

I have the same issue.
Sometimes I need to restart, and the result starts immediately after BOINC is started, finishes successfully, and I get credit for it.
Sometimes two restarts are necessary to finish the result.
Every time, it loses most of the CPU time and progress already made.
Rescheduling stops the result for 24 hours before it is started again, and this really reduces the number of results I can work on with these computers.
Additionally, if you have more than one result in the pipeline, most (or all) of them get the same problem.

https://www.cosmologyathome.org/result.php?resultid=15967536
Here is the relevant part of the result's stderr.txt:
2019-09-01 23:11:07 (6272): Guest Log: tau_recomb/Mpc = 266.84 tau_now/Mpc = 13558.5

2019-09-01 23:12:19 (6272): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.
2019-09-01 23:12:19 (6272): Powering off VM.
2019-09-01 23:12:21 (6272): Successfully stopped VM.
2019-09-02 13:54:10 (9912): vboxwrapper (7.9.26200): starting
2019-09-02 13:54:12 (9912): Feature: Checkpoint interval offset (593 seconds)
Matthias
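
To see how long a task sits idle after each of these events, the timestamps in stderr.txt can be compared. The following is a rough Python sketch, assuming every log line has the `YYYY-MM-DD HH:MM:SS (pid): message` shape shown in the excerpt above; the function name `find_reschedule_gaps` is made up for illustration and is not part of BOINC or vboxwrapper.

```python
import re
from datetime import datetime

# Timestamp, process id in parentheses, then the message text.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def find_reschedule_gaps(log_text):
    """Return (lost_at, restarted_at, gap) for each lost-communication event."""
    gaps = []
    lost_at = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(3)
        if "lost communication with VirtualBox" in message:
            lost_at = stamp
        elif (lost_at is not None
              and message.startswith("vboxwrapper")
              and message.endswith("starting")):
            gaps.append((lost_at, stamp, stamp - lost_at))
            lost_at = None
    return gaps


SAMPLE = """\
2019-09-01 23:12:19 (6272): ERROR: Vboxwrapper lost communication with VirtualBox, rescheduling task for a later time.
2019-09-01 23:12:19 (6272): Powering off VM.
2019-09-01 23:12:21 (6272): Successfully stopped VM.
2019-09-02 13:54:10 (9912): vboxwrapper (7.9.26200): starting
"""

for lost, restarted, gap in find_reschedule_gaps(SAMPLE):
    print(f"lost at {lost}, restarted at {restarted}, idle for {gap}")
```

For the excerpt above, the gap between the lost-communication error and the next wrapper start is roughly 14 hours 42 minutes.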
ID: 22239
Jonathan

Joined: 27 Sep 17
Posts: 161
Credit: 7,580,022
RAC: 683
Message 22241 - Posted: 3 Sep 2019, 6:21:21 UTC - in response to Message 22239.  

It usually means your computer is too 'busy'. The wrapper runs at a lower priority, becomes starved, and loses communication. You can browse the forums at LHC@home, as there are more posts about it there.

I only run one project at a time if it uses VirtualBox. I can run a single eight-core work unit without problems. When I try to run two four-core work units, I start seeing problems with lost communication. The newer versions of VirtualBox seem to have more problems, but it happens with both versions 5 and 6.

Are you running any other projects on the computer with problems? Are you running both the camb_legacy work units and the camb_boinc2docker units?

Link to LHC@home
http://lhcathome.cern.ch/lhcathome/
ID: 22241
Matthias Lehmkuhl

Joined: 29 Aug 07
Posts: 19
Credit: 1,199,011
RAC: 313
Message 22242 - Posted: 3 Sep 2019, 16:52:46 UTC - in response to Message 22241.  

Hello Jonathan,
thanks for the explanation; this matches what I've also found over time.
So I decided to allow only 1 camb_boinc2docker result to be sent to my computers.
Same for the LHC VirtualBox projects.
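
A related client-side option, which is not necessarily how the limit above was set, is a BOINC `app_config.xml` file with a `max_concurrent` entry placed in the project's directory. The Python sketch below only generates the file contents; the helper name `build_app_config` is hypothetical, and the app name `camb_boinc2docker` is assumed to match the name used in this thread, so check it against the client's own listing before relying on it.

```python
from xml.sax.saxutils import escape


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text limiting how many tasks of one app run at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    # One element per line keeps the file easy to read and edit by hand.
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{escape(app_name)}</name>\n"
        f"    <max_concurrent>{int(max_concurrent)}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


# Assumed app name, taken from the work unit type mentioned in this thread.
print(build_app_config("camb_boinc2docker", 1))
```

Note that `max_concurrent` limits how many tasks run at the same time, not how many are downloaded, and the client has to re-read its config files or be restarted before the change takes effect.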

But that is not all. On my computers I only use 3 of 4 cores for BOINC, to keep a separate core for normal work.
I've seen this issue several times while the CPU usage was not at 100% on all 4 cores.
It also happens while no one is working and no other processes use the remaining CPU core.
On the other hand, I've also seen camb_boinc2docker results finish without any problem while I was using more than the remaining 1 core for my own work.
Matthias
ID: 22242
