
Message boards : Technical Support : VM Hypervisor

Rapture
Joined: 27 Oct 07
Posts: 85
Credit: 646,081
RAC: 272
Message 21240 - Posted: 17 Dec 2016, 13:30:12 UTC

While running my first docker work unit, the message below appeared. What does it mean? It looks like the task will resume later.

Postponed: VM Hypervisor failed to enter an online state in a timely fashion.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21242 - Posted: 17 Dec 2016, 15:23:04 UTC - in response to Message 21240.

Hi,

Can you abort some of these jobs and update so the log is sent back to the server so I can see the whole thing?

I see you've got VT-x enabled on that machine so that's not the issue, but can you double check you've not been bitten by this bug?

Other things to try are 1) fully uninstalling VirtualBox, then rebooting and reinstalling it, 2) detaching and reattaching the project, and 3) just waiting; sometimes the message seems to disappear.

Rapture
Joined: 27 Oct 07
Posts: 85
Credit: 646,081
RAC: 272
Message 21246 - Posted: 17 Dec 2016, 16:38:28 UTC - in response to Message 21242.

The first job has been aborted and the update sent to the log. I also detached and reattached to the project. Let me know what you find in the log.

Bill

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21249 - Posted: 17 Dec 2016, 17:26:19 UTC - in response to Message 21246.
Last modified: 17 Dec 2016, 17:26:31 UTC

Can you try updating to the latest VirtualBox from https://www.virtualbox.org/wiki/Downloads? Their changelog mentions a number of Windows 10 bug fixes; maybe yours is one of them.

Rapture
Joined: 27 Oct 07
Posts: 85
Credit: 646,081
RAC: 272
Message 21250 - Posted: 17 Dec 2016, 17:47:52 UTC - in response to Message 21249.

The BOINC website indicates Virtual Box 5.0.18 as part of the package with BOINC Manager. Is this the latest and correct version to use?

EeqMC252
Joined: 23 Apr 09
Posts: 2
Credit: 1,750,174
RAC: 0
Message 21251 - Posted: 17 Dec 2016, 22:32:05 UTC - in response to Message 21249.

I am having the same issue. Every WU in Cosmology@Home has been giving me the "Postponed: VM Hypervisor failed to enter an online state in a timely fashion." message since the Paris migration and my update to BOINC 7.6.33. ATLAS@Home is doing the same, so I reverted to BOINC 7.6.22, but the condition has not gone away.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21252 - Posted: 18 Dec 2016, 22:19:26 UTC - in response to Message 21251.
Last modified: 18 Dec 2016, 22:20:18 UTC

Hi guys, spent a while today trying to understand this issue. A lot of what I found is summarized at https://github.com/BOINC/boinc/issues/1737, although I don't have a solution yet.

What I can say to you all here is that looking at our C@H database, for ~50% of the jobs where this error appears at all, the job later finishes successfully, so a workaround right now might be to just let them wait, assuming they're not screwing anything else up.

Also, this error affects <1% of all of our jobs, which I suppose is unfortunately no consolation for the unlucky ones here! Even for you, according to the DB it only affects on average 25% of your jobs, so again, even if you see this once, try just letting the job wait, and on average subsequent jobs won't see the error.

A few questions for people with this issue; the answers would help:

1. Do you run other BOINC projects with VM apps?
2. Are your settings such that two VM apps ever run at the same time?
3. Do you run other virtual machines on your computer (i.e. other VirtualBox, VMware, or Hyper-V based VMs)?

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21253 - Posted: 18 Dec 2016, 22:23:11 UTC - in response to Message 21250.

The BOINC website indicates Virtual Box 5.0.18 as part of the package with BOINC Manager. Is this the latest and correct version to use?

It's safe to try any of the 5.1.X versions directly from the VirtualBox website.

Sir Thomas W. Kilburn
Joined: 24 Apr 15
Posts: 1
Credit: 89,176
RAC: 41
Message 21254 - Posted: 19 Dec 2016, 9:32:01 UTC

When running, the task stops and I get this message: "VM Hypervisor failed to enter an online state in a timely fashion" (8 CPUs). I get the same message on my other 2 computers when running Cosmology, ATLAS, LHC, or other tasks that use multiple processors at the same time. Could it be the new BOINC software? It was OK on the old software; I had just upgraded to the new version.

Rapture
Joined: 27 Oct 07
Posts: 85
Credit: 646,081
RAC: 272
Message 21256 - Posted: 19 Dec 2016, 21:24:38 UTC - in response to Message 21253.

Thanks for the update. I will continue to wait and see what happens with the work unit I currently have. So far, the error keeps appearing every time this same work unit runs. Perhaps a 5.1.x version will solve this problem.

Bill

HPETITE
Joined: 6 Apr 10
Posts: 7
Credit: 278,646
RAC: 0
Message 21263 - Posted: 22 Dec 2016, 6:41:07 UTC - in response to Message 21256.

Hi, I thought I was able to get my cosmology tasks to work.

I followed https://www.cosmologyathome.org/forum_thread.php?id=7444, which has a link to the VirtualBox wiki downloads page. I installed the latest Windows version of VirtualBox (5.1.12 r112440) from there and restarted my system. Once I was back up, I shut BOINC Manager down and clicked on the Oracle VM VirtualBox icon on my desktop. This time it came up with no problem, which it hadn't done for at least 6 months. There was a list of machines with the "access denied" error for "path not found", so I highlighted them and right-clicked to remove them so that I was starting fresh. Then I went into BOINC Manager, selected the Cosmology project, and allowed it to accept new tasks. Since then things worked until the tasks got to just above 63%.

BTW, the install that I just did used a different interface than the one you get when you select VB from the BOINC home page. The former shows a list of the VB components and lets you choose which drive to install them on, while the latter does not. Perhaps the BOINC VB installer is broken?

When I rechecked BOINC Manager later, I saw that things were not actually working. I did wait more than 5 minutes before posting my previous info, but I just checked VB and, although I can start it from the desktop, the Cosmology tasks were aborted and BOINC Manager is reporting the "postponed" error message for each aborted task.

The BOINC log shows:
2016-12-22 01:14:40 | Cosmology@Home | task postponed 86400.000000 sec: VM Hypervisor failed to enter an online state in a timely fashion.
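
For anyone wondering how long that postponement is, here is a minimal Python sketch. It assumes the log line layout is exactly as in the one line quoted above; the regular expression and the function name parse_postponed are illustrative and not part of BOINC. It also assumes the delay is counted from the log timestamp, which may not be how the client actually schedules the retry.

```python
import re
from datetime import datetime, timedelta

# Pattern assumed from the single log line quoted above; other BOINC
# messages may be laid out differently.
POSTPONED = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \| "
    r"(?P<project>[^|]+?) \| "
    r"task postponed (?P<seconds>\d+(?:\.\d+)?) sec: (?P<reason>.+)$"
)


def parse_postponed(line):
    """Return (project, retry_time, reason), or None if the line does not match."""
    m = POSTPONED.match(line.strip())
    if m is None:
        return None
    logged = datetime.strptime(m.group("when"), "%Y-%m-%d %H:%M:%S")
    delay = timedelta(seconds=float(m.group("seconds")))
    # Assumption: the delay starts at the moment the line was logged.
    return m.group("project"), logged + delay, m.group("reason")


line = (
    "2016-12-22 01:14:40 | Cosmology@Home | task postponed 86400.000000 sec: "
    "VM Hypervisor failed to enter an online state in a timely fashion."
)
project, retry_at, reason = parse_postponed(line)
print(project, retry_at, reason)
```

86400 seconds is 24 hours, so under these assumptions the task would not be tried again until the same time the next day.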

At the bottom of the log for the first virtual machine in VB itself I have this:
2358.154c: supR3HardenedVmProcessInit: Opening vboxdrv stub...
2358.154c: supR3HardenedWinReadErrorInfoDevice: 'RTLdrOpenWithReader failed: -626 (Image='\Device\HarddiskVolume2\Windows\System32\ntdll.dll').'
2358.154c: Error -626 in supR3HardenedWinReSpawn! (enmWhat=3)
2358.154c: NtCreateFile(\Device\VBoxDrvStub) failed: Unknown Status -626 (0xfffffd8e) (rcNt=0xe986fd8e)
VBoxDrvStub error: RTLdrOpenWithReader failed: -626 (Image='\Device\HarddiskVolume2\Windows\System32\ntdll.dll').
11a0.2a14: supR3HardenedWinCheckChild: enmRequest=2 rc=-626 enmWhat=3 supR3HardenedWinReSpawn: NtCreateFile(\Device\VBoxDrvStub) failed: Unknown Status -626 (0xfffffd8e) (rcNt=0xe986fd8e)
VBoxDrvStub error: RTLdrOpenWithReader failed: -626 (Image='\Device\HarddiskVolume2\Windows\System32\ntdll.dll').
11a0.2a14: Error -626 in supR3HardenedWinReSpawn! (enmWhat=3)
11a0.2a14: NtCreateFile(\Device\VBoxDrvStub) failed: Unknown Status -626 (0xfffffd8e) (rcNt=0xe986fd8e)
VBoxDrvStub error: RTLdrOpenWithReader failed: -626 (Image='\Device\HarddiskVolume2\Windows\System32\ntdll.dll').
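
As a sanity check on those numbers, the hex value in parentheses is simply -626 written as an unsigned 32-bit integer. A small Python sketch shows this; the helper name to_signed32 is made up for illustration, and the code says nothing about what status -626 means inside VirtualBox.

```python
import re


def to_signed32(value):
    """Interpret an unsigned 32-bit integer as a two's-complement signed value."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


# One line copied from the VirtualBox log above.
line = (
    r"2358.154c: NtCreateFile(\Device\VBoxDrvStub) failed: "
    r"Unknown Status -626 (0xfffffd8e) (rcNt=0xe986fd8e)"
)

# Pull out the decimal status and the hex value printed next to it.
m = re.search(r"Unknown Status (-?\d+) \((0x[0-9a-fA-F]+)\)", line)
status_dec = int(m.group(1))
status_hex = int(m.group(2), 16)

print(status_dec, to_signed32(status_hex))
```

Both values come out the same, so the log is reporting a single status code in two notations rather than two different errors.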

So, I guess we are not making much progress here.
____________
HPetite

Rapture
Joined: 27 Oct 07
Posts: 85
Credit: 646,081
RAC: 272
Message 21265 - Posted: 28 Dec 2016, 18:42:16 UTC - in response to Message 21253.

What is the status regarding this issue? Have you found anything more?

Bill

Stanley A Bourdon
Volunteer tester
Joined: 7 Jun 07
Posts: 11
Credit: 82,047
RAC: 16
Message 21267 - Posted: 3 Jan 2017, 15:53:55 UTC

I have 15 units that timed out due to this issue. What is the status, please?
____________
Stanley


Boinc Wikipedia - the FAQ in active change

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21268 - Posted: 4 Jan 2017, 10:21:26 UTC - in response to Message 21267.

I have 15 units that timed out due to this issue. What is the status, please?

Could you please hit update on your client so I can take a look at your jobs as well?

I still haven't figured out anything new besides what I summarized in the Github link, but I will take a look at it further now that I'm back from the holidays.

Stanley A Bourdon
Volunteer tester
Joined: 7 Jun 07
Posts: 11
Credit: 82,047
RAC: 16
Message 21269 - Posted: 5 Jan 2017, 18:11:18 UTC
Last modified: 5 Jan 2017, 18:17:17 UTC

I aborted them when they restarted.

I do not know why, but they had sat at 55% with the message about being postponed until they went past due. I let them sit there until something caused them to restart from zero. Then I aborted them, not wanting to use resources on units already marked as past due.

I did change 2 things: I updated to the latest release about 2 months ago, and I gave BOINC 50% of the available CPU, so it now uses 4 CPUs instead of the 1 or 2 it used when it was warmer.

Most of the other apps that use a VM that I run also have this problem; it might be all of them. I also aborted several tasks for other projects that went past due.

Did you look to see if there is an increase in units going past due, being aborted by users, or in users reducing their share or setting no new tasks, etc.?
____________
Stanley



Stanley A Bourdon
Volunteer tester
Joined: 7 Jun 07
Posts: 11
Credit: 82,047
RAC: 16
Message 21270 - Posted: 7 Jan 2017, 7:57:26 UTC

Figured out what I did to get them to restart without the message: I rebooted the machine and they went back to the 0 state. Then they run for a bit, then show the message again.
____________
Stanley



Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 427
Credit: 4,276
RAC: 0
Message 21271 - Posted: 7 Jan 2017, 19:51:08 UTC - in response to Message 21270.
Last modified: 7 Jan 2017, 19:51:24 UTC

Figured out what I did to get them to restart without the message: I rebooted the machine and they went back to the 0 state. Then they run for a bit, then show the message again.

I'm still not seeing your jobs show up on the server, is your client not uploading the failed jobs for some reason?

In any case, so I understand the symptoms for you, is it true to say that your jobs are always running to ~55% then stopping with this error message, then never completing?

Stanley A Bourdon
Volunteer tester
Joined: 7 Jun 07
Posts: 11
Credit: 82,047
RAC: 16
Message 21272 - Posted: 7 Jan 2017, 23:41:20 UTC

Yes, that is correct.

When I noticed that they were over 7 days past due, I looked at my account and they were marked past due (or whatever the correct words are from the server for past due).

I had decided to leave them anyway to see what happened.
What happened was that on a reboot they went back to 0 completed and started running again. I am now sure that has happened several times. I had thought that some had completed, but now I am almost certain that they went to 0, crunched to 55% completed, and then got the "Postponed: VM Hypervisor failed to enter an online state in a timely fashion." message.

With them over a week past due, I aborted them, between 5 and 10 units, and did an update. I do not see any results on my account page now.
____________
Stanley



robertmiles
Joined: 26 Oct 11
Posts: 48
Credit: 291,699
RAC: 165
Message 21282 - Posted: 14 Jan 2017, 20:24:52 UTC - in response to Message 21272.

A problem with my camb_boinc2docker 2.04 (vbox_mt) tasks:

Each arrives in state Ready to start (7 CPUs)

Every time one starts running, it runs a little over 10 minutes, then goes into a state of:

Postponed: VM Hypervisor failed to enter an online state in a timely fashion.

No checkpoint was written first. Windows 10 requires fairly frequent reboots for updates, and after every such reboot, the elapsed time drops to zero.

The only other BOINC project I subscribe to that has VM workunits is showing the same problem.

Initial runtime estimate is 00:06:05, the remaining time estimate after this happens is 00:02:21.

I don't think I've ever had any successful VM tasks before, so could anyone suggest what could be the problem?

The normal ways of checking the version number don't work with VirtualBox, so is another way available?
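
One option that should work is asking the VBoxManage command-line tool, which ships with VirtualBox, for its version with `VBoxManage --version`. A hedged Python sketch follows. The function names are illustrative, and the version layout is assumed from strings such as 5.1.12r112440. It also assumes VBoxManage is on the PATH; on Windows it often is not, so you may need to pass the full path to VBoxManage.exe in your VirtualBox install folder instead.

```python
import re
import shutil
import subprocess


def parse_vbox_version(text):
    """Split a string such as '5.1.12r112440' into ((5, 1, 12), 112440)."""
    m = re.match(r"^(\d+)\.(\d+)\.(\d+)\S*?r(\d+)$", text.strip())
    if m is None:
        return None
    major, minor, patch, revision = (int(g) for g in m.groups())
    return (major, minor, patch), revision


def installed_vbox_version(exe="VBoxManage"):
    """Run '<exe> --version' and parse the result; None if the tool is not found."""
    path = shutil.which(exe)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Use the last non-empty line in case warnings are printed first.
    lines = [ln for ln in result.stdout.splitlines() if ln.strip()]
    return parse_vbox_version(lines[-1]) if lines else None


print(parse_vbox_version("5.1.12r112440"))
print(installed_vbox_version())
```

If the second line prints None, the tool was not found on the PATH or its output did not match the assumed layout; running the same command by hand in a command prompt from the VirtualBox install folder is the simpler check.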

robertmiles
Joined: 26 Oct 11
Posts: 48
Credit: 291,699
RAC: 165
Message 21284 - Posted: 16 Jan 2017, 2:03:11 UTC - in response to Message 21282.

Since writing the above, I shut down BOINC and installed VirtualBox 5.1.12, then restarted Windows 10 and BOINC. This MAY have fixed the problem for the other BOINC project with VM tasks, but it will take weeks to be sure. No Cosmology@Home tasks have restarted yet, so there's no information on whether this is also a fix for them.
