1) Message boards : Technical Support : postponed VM hypervisor failed to enter an online state in a timely fashion (Message 21653)
Posted 7 Mar 2018 by xii5ku
The problem with camb_boinc2docker and MindModeling@Home described in my previous message turned out to be unrelated to the use of two boinc client instances. If MindModeling runs in the same client instance as camb_boinc2docker, the latter also gets stuck in "postponed: VM Hypervisor failed to enter an online state in a timely fashion".
2) Message boards : Technical Support : vbox64_mt task ends in computation error on linux (Message 21650)
Posted 4 Mar 2018 by xii5ku
Indeed, your failed jobs did not use any CPU time (apart from those 13 seconds).

From the FAQ:
The first thing the jobs do is download the necessary Docker container from the Docker servers. During this download, you will see the job progress frozen at 0.100%. Once the download is complete, the progress bar should continue normally and your CPU usage will jump up as the computation begins.

Maybe this download took too long, or it didn't even start.
3) Message boards : Technical Support : vbox64_mt task ends in computation error on linux (Message 21648)
Posted 4 Mar 2018 by xii5ku
Both of these tasks ended with "exceeded elapsed time limit 1386.68 (86400.00G/62.31G)".

If I remember correctly, I had this problem too when I came back to Cosmology this year after a pause of several months. Luckily I had run into this problem at another project before and knew a workaround for it. (That was LHC's sixtrack application. There it was a recurring problem, and I had to apply the following workaround periodically. Here at Cosmo I only needed to apply it once.)

Presumed cause of the problem:
- Either the GFLOPS of your host were severely over-estimated,
- or the FPOPS of the workunits that you received were severely under-estimated.
The boinc client derives the task's maximum runtime from these two numbers. It terminated the task while it was still running because the task overran that estimated maximum runtime.
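For illustration, the limit in the error message above appears to be the workunit's <rsc_fpops_bound> divided by the client's FLOPS estimate for the app version (the 86400.00G and 62.31G in the message). A worked check, which also shows the effect of the factor-of-1000 workaround below:

```latex
\begin{aligned}
t_{\max}  &= \frac{\texttt{rsc\_fpops\_bound}}{\text{estimated FLOPS}}
           = \frac{86400.00 \times 10^{9}\,\text{FLOP}}{62.31 \times 10^{9}\,\text{FLOP/s}}
           \approx 1386.6\,\text{s} \approx 23\,\text{min} \\
t_{\max}' &= \frac{1000 \times \texttt{rsc\_fpops\_bound}}{\text{estimated FLOPS}}
           \approx 1.39 \times 10^{6}\,\text{s} \approx 16\,\text{days}
\end{aligned}
```

The log prints 1386.68 rather than 1386.6 presumably because the 62.31G shown there is itself rounded.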

Workaround:

  • Download a few tasks.
  • Set "no new tasks".
  • Shut down the boinc client.
  • Make a backup of /var/lib/boinc/client_state.xml.
  • Open /var/lib/boinc/client_state.xml in a text editor.
  • Search for "cosmologyathome".
  • Search for "workunit".
  • Increase the value within <rsc_fpops_bound>...</rsc_fpops_bound> by a factor of one thousand, i.e. insert three more "0"s before the decimal point, or move the decimal point three digits to the right.
  • Repeat for all remaining workunits in the file.
  • Save the file.
  • Start the boinc client.
  • Let the tasks run, hopefully complete correctly, upload, and report.
  • Set "allow new tasks" if this went well. Your host's GFLOPS (shown in the application details, which are linked from the host details page) should now have been adjusted at the server, and newly downloaded tasks should complete properly too.
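The editing steps above can also be scripted. Here is a minimal sketch in Python, assuming (just as the manual "search for cosmologyathome, then for workunit" procedure does) that in client_state.xml each project's <workunit> entries follow its <project> block, that each <rsc_fpops_bound> sits on its own line, and that finding "cosmologyathome" in the <master_url> is enough to identify the project. The function name scale_fpops_bound and the ".bak" backup name are my own, hypothetical choices. Shut down the boinc client before running it, as in the manual steps.

```python
import re
import shutil
import sys

# Matches the project URL inside a <project> block.
MASTER_URL_RE = re.compile(r"<master_url>(.*?)</master_url>")
# Matches a workunit's bound, e.g. <rsc_fpops_bound>86400000000000.000000</rsc_fpops_bound>
BOUND_RE = re.compile(r"<rsc_fpops_bound>\s*([0-9.eE+-]+)\s*</rsc_fpops_bound>")


def scale_fpops_bound(xml_text, project_substring="cosmologyathome", factor=1000.0):
    """Multiply every <rsc_fpops_bound> of the matching project by factor.

    Assumption: a project's workunits come after its <project> block and
    before the next project's block, so the most recently seen <master_url>
    tells us which project the current line belongs to.
    """
    out = []
    in_target = False
    for line in xml_text.splitlines(keepends=True):
        url = MASTER_URL_RE.search(line)
        if url:
            in_target = project_substring in url.group(1)
        if in_target:
            line = BOUND_RE.sub(
                lambda m: f"<rsc_fpops_bound>{float(m.group(1)) * factor:f}</rsc_fpops_bound>",
                line,
            )
        out.append(line)
    return "".join(out)


def main(path):
    shutil.copy2(path, path + ".bak")  # backup, as in the manual steps
    # latin-1 round-trips arbitrary bytes, so untouched lines stay byte-identical.
    with open(path, encoding="latin-1") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1") as f:
        f.write(scale_fpops_bound(text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scale_bound.py /var/lib/boinc/client_state.xml
    main(sys.argv[1])
```

Only lines that follow a matching <master_url> are rewritten; workunits of other projects in the same file keep their original bound.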

4) Message boards : Technical Support : postponed VM hypervisor failed to enter an online state in a timely fashion (Message 21645)
Posted 3 Mar 2018 by xii5ku
I have been running "camb_boinc2docker 2.05 (vbox64_mt)" with good success on host 309722 during the last 19 days. About 2 tasks per day got stuck in the state "postponed: VM Hypervisor failed to enter an online state in a timely fashion". I aborted these automatically with a script that periodically looks for such tasks and weeds them out.
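A minimal sketch of such a weed-out script in Python, under stated assumptions: how you obtain each task's status text is left open (the Task record and its status field are hypothetical), and only the abort step uses a real command, boinccmd --task <project URL> <task name> abort, optionally with --host hostname:port to address a second client instance on another GUI RPC port. Depending on your setup, boinccmd may also need --passwd. With dry_run=True (the default) it only returns the commands it would run.

```python
import subprocess
from dataclasses import dataclass

# The status text the stuck tasks showed (copied from the post).
STUCK_MARKER = "VM Hypervisor failed to enter an online state in a timely fashion"


@dataclass
class Task:
    project_url: str
    name: str
    status: str  # hypothetical: the task's status text, however you obtain it


def stuck_tasks(tasks, marker=STUCK_MARKER):
    """Return the tasks whose status text contains the stuck marker."""
    return [t for t in tasks if marker in t.status]


def abort_command(task, host=None):
    """Build a boinccmd call that aborts one task.

    host: optional "hostname:port" to target a specific client instance.
    """
    cmd = ["boinccmd"]
    if host:
        cmd += ["--host", host]
    cmd += ["--task", task.project_url, task.name, "abort"]
    return cmd


def abort_stuck(tasks, host=None, dry_run=True):
    """Abort all stuck tasks; with dry_run=True only return the commands."""
    cmds = [abort_command(t, host) for t in stuck_tasks(tasks)]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

Run periodically (e.g. from cron) with dry_run=False once the printed commands look right.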

Unfortunately, since 17:00 UTC today I have been getting a very high rate of such failures. Here is the list of failing tasks since the high failure rate began. (225 tasks, and apparently more such failures to come among the tasks I have already downloaded. Sorry for the long post. I would have enclosed the list in SPOILER tags if this board had them. Edit: List removed.) Some tasks still succeed, but they seem to be the minority now.

Edit: Solved!

Or at least worked around. Here is what really happened:

I noticed that the high failure rate began at about the same time as a second boinc-client instance on the same host started running MindModeling@Home tasks. I cured the problem by shutting down both boinc-client instances and then restarting only the instance that runs Cosmology@Home.

Apparently vboxwrapper has a serious issue with multiple boinc-client instances on the same host.

(That's a shame. I like using separate client instances to run projects in parallel whose work buffers I want to control independently, e.g. to run one project with a shallow queue and another with a deep queue, or to prevent a project with a steady task supply from blocking task requests to a project with an intermittent task supply, like Cosmo vs. MindModeling.)
5) Message boards : News : C@H a BOINC Pentathlon project! (Message 21444)
Posted 21 May 2017 by xii5ku
Thanks for being such a good host of the BOINC Pentathlon Marathon.
Best wishes to the Cosmology@Home team!