1) Forums : General Topics : Formula BOINC sprint at Cosmology@Home, July 2018 (Message 21829)
Posted 5 hours ago by xii5ku
Post:
PS,
perhaps the camb_boinc2docker work generator shouldn't even work faster than it does. Given that the tasks have a run time of just a few minutes but produce result files of 2.6 MB each, I wonder whether Cosmology@Home's internet bandwidth is another limit that prevents a higher production rate here.

(In other words, camb_boinc2docker tasks are not very efficient from a distributed computing point of view, because their run time is too short and their network transfers are too large.)
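
To put a rough number on the bandwidth concern, here is a minimal sketch of the arithmetic. Only the 2.6 MB result size comes from the post; the task rates in the loop are made-up illustration values, not measurements of Cosmology@Home.

```python
RESULT_SIZE_MB = 2.6  # result file size per task, as quoted in the post


def upload_mbit_per_s(tasks_per_hour: float,
                      result_size_mb: float = RESULT_SIZE_MB) -> float:
    """Average inbound bandwidth at the server, in Mbit/s, for a given task rate."""
    megabytes_per_second = tasks_per_hour * result_size_mb / 3600.0
    return megabytes_per_second * 8.0


# Hypothetical task rates, chosen only to show how the number scales.
for rate in (10_000, 50_000, 100_000):
    print(f"{rate:>7} tasks/h -> {upload_mbit_per_s(rate):6.1f} Mbit/s")
```

The required bandwidth grows linearly with the task rate, so a work generator that is several times faster would need a correspondingly faster uplink for result uploads alone.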
2) Forums : General Topics : Formula BOINC sprint at Cosmology@Home, July 2018 (Message 21828)
Posted 6 hours ago by xii5ku
Post:
@scole of TSBT, camb_boinc2dockers can be bunkered.

But (a) certainly not many people bunkered camb_boinc2dockers, and those who did bunkered little, and (b) whatever bunkering activity was going on on Thursday, it had no bearing on the further decline of work availability that we saw over the course of Friday, especially towards Friday night.

Contributors' computing capacity during this sprint simply exceeds the pace of Cosmology@Home's work generator, by far. (My initial predictions in this thread that work would become available again some time after the start of the sprint were, unfortunately, far off.)
3) Forums : Technical Support : Hostname problems (Message 21823)
Posted 19 hours ago by xii5ku
Post:
I wrote:
I suspect the project scheduler sent broken replies for a brief while, which messed up client state. It happened around 12:48 UTC, if I recall correctly.

From the same moment on, Gridcoin's points-per-hour took a turn.
Either Marius did something, or Gridcoin. I guess Marius.

4) Forums : Technical Support : Hostname problems (Message 21822)
Posted 21 hours ago by xii5ku
Post:
I suspect the project scheduler sent broken replies for a brief while, which messed up client state. It happened around 12:48 UTC, if I recall correctly.

A possible fix which worked for me was to shut down the client, correct all corrupted download URLs in client_state.xml using a text editor or a tool like sed, then start the client again.
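
For those who prefer a script over a text editor or sed, the replacement step could look roughly like this. It is only a sketch: the post does not say what the corrupted URLs looked like, so the `bad` and `good` strings are placeholders you have to fill in yourself, and the `download_url` tag name is my assumption about the layout of client_state.xml. Shut down the client and back up the file first.

```python
import re


def fix_download_urls(state_text: str, bad: str, good: str,
                      tag: str = "download_url") -> str:
    """Replace `bad` with `good`, but only inside <tag>...</tag> elements."""
    pattern = re.compile(
        rf"(<{re.escape(tag)}>)(.*?)(</{re.escape(tag)}>)", re.DOTALL
    )

    def repl(match: re.Match) -> str:
        # Keep the opening and closing tags, patch only the URL text.
        return match.group(1) + match.group(2).replace(bad, good) + match.group(3)

    return pattern.sub(repl, state_text)


# In-memory demo with hypothetical host names (not the real ones).
sample = (
    "<master_url>http://broken.example.invalid/</master_url>\n"
    "<download_url>http://broken.example.invalid/download/a.bin</download_url>\n"
)
print(fix_download_urls(sample, "broken.example.invalid", "www.example.org"))
```

Restricting the replacement to one tag avoids touching other fields that happen to contain the same string.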
5) Forums : General Topics : Formula BOINC sprint at Cosmology@Home, July 2018 (Message 21807)
Posted 1 day ago by xii5ku
Post:
On Thursday, I wrote:
As far as I can tell, the Cosmology@Home server is generating enough work for everyone who wants to join the sprint.

Of course those who are impatient may be dissatisfied with the pace at which new work is generated. So, be patient. :-)

We are now 7+ hours into the sprint, and unfortunately, the rate of work generation is not just low, it is too low.

There is more participation than available work. :-/
6) Forums : General Topics : Formula BOINC sprint at Cosmology@Home, July 2018 (Message 21804)
Posted 1 day ago by xii5ku
Post:
bcavnaugh wrote:
Also by "equal ground" being they all get 100 Gallons of Fuel not some getting 100 Gallons and some only 50 Gallons of Fuel.

There is no inequality. Everybody has the same good luck or bad luck getting tasks.

Earlier today there were hours during which about 75 % of the requests for work got you work. Right now, only a few hours before the actual start of the sprint, unfortunately only about 10 % of all requests do.

It will get better again when the initial rush is over. Nothing out of the ordinary.
7) Forums : General Topics : Formula BOINC sprint at Cosmology@Home, July 2018 (Message 21802)
Posted 1 day ago by xii5ku
Post:
UBT - Timbo wrote:
The question though is will the validator, validate all the tasks being uploaded?

Currently the Server Status page shows nearly 160,000 tasks awaiting validation...and that queue will no doubt increase due to the Sprint.

So, a lot of people may be disappointed that they receive little or no credit over the duration of the Sprint (as only credits awarded between tonight at 10pm UTC and Sunday at 10pm UTC will count towards the Sprint statistics), UNLESS the validator can be given a nudge to clear the backlog?

I am pretty sure that camb_boinc2docker work will be validated within seconds or minutes after upload, as usual, whereas there is a great risk that camb_legacy work will not be validated before the end of the sprint. See the thread "camb_legacy tasks temporarily suspended".


UBT - Timbo wrote:
PS: It is almost IMPOSSIBLE to edit a post as the background to the post and the colour of the text entered are both WHITE... :-(

Workaround: Click on the Preview button, and you will get a white-on-black text input again, instead of white-on-white.
8) Forums : News : camb_legacy tasks temporarily suspended (Message 21800)
Posted 2 days ago by xii5ku
Post:
Marius,

the server status page currently shows that new camb_legacy work is generated, but the corresponding script_validator is not running. Also, 158071 tasks are shown waiting for validation, certainly mainly camb_legacy tasks.

Can you clarify why the work is generated even though the validator is stopped? Is validation planned for the whole active batch of camb_legacy work to occur at a scheduled time?
9) Forums : General Topics : Formula BOINC sprint at Cosmology@Home, July 2018 (Message 21799)
Posted 2 days ago by xii5ku
Post:
As far as I can tell, the Cosmology@Home server is generating enough work for everyone who wants to join the sprint.

Of course those who are impatient may be dissatisfied with the pace at which new work is generated. So, be patient. :-)

mmonnin wrote:
I started up this project again the other day and it seems like only 500 of each app were in the queue. Now its dry.

The server is going to have a bad day.

No, it only had two or three bad hours. (And may have another one or two bad hours later today.)
10) Forums : General Topics : Formula BOINC sprint at Cosmology@Home, July 2018 (Message 21788)
Posted 2 days ago by xii5ku
Post:
Hi,

just a heads up that there will be increased traffic during the next few days. Formula BOINC scheduled a sprint at Cosmology@Home from Thursday, July 19, 22:00 UTC to Sunday, July 22, 22:00 UTC.
http://formula-boinc.org/sprint.py?sprint=11&year=2018

Recent Formula BOINC sprints at other projects generated at least double the daily production of normal days.
11) Forums : Technical Support : postponed VM hypervisor failed to enter and online state in a timely fashion (Message 21653)
Posted 7 Mar 2018 by xii5ku
Post:
The problem between camb_boinc2docker and MindModeling@Home from my previous message turned out to be unrelated to the use of two boinc client instances. If MindModeling runs in the same client instance as camb_boinc2docker, the latter gets stuck in "postponed: VM Hypervisor failed to enter an online state in a timely fashion" too.
12) Forums : Technical Support : vbox64_mt task ends in computation error on linux (Message 21650)
Posted 4 Mar 2018 by xii5ku
Post:
Indeed, your failed jobs did not use CPU time (apart from those 13 seconds).

From the FAQ:
The first thing the jobs do is download the necessary Docker container from the Docker servers. During this download, you will see the job progress frozen at 0.100%. Once the download is complete, the progress bar should continue normally and your CPU usage will jump up as the computation begins.

Maybe this download took too long, or it didn't even start.
13) Forums : Technical Support : vbox64_mt task ends in computation error on linux (Message 21648)
Posted 4 Mar 2018 by xii5ku
Post:
Both of these tasks ended with "exceeded elapsed time limit 1386.68 (86400.00G/62.31G)".

If I remember correctly, I had this problem too when I came back to Cosmology this year after a pause of several months. Luckily I had seen this problem at another project before and knew a workaround for it. (That was LHC's sixtrack application. There it was a recurring problem to which I had to apply the following workaround periodically. Here at Cosmo I only needed to apply it once.)

Presumed cause of the problem:
- Either the GFLOPS of your host were severely over-estimated,
- or the FPOPS of the workunits that you received were severely under-estimated.
The boinc client terminated the task while it was still running because it observed the task overrunning its estimated maximum run time.
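
The numbers in the error message fit this explanation. As far as I understand it (an assumption on my part, not something confirmed by the project), the limit is the workunit's rsc_fpops_bound divided by the host's estimated speed, with "G" meaning 10^9. A quick check:

```python
def elapsed_time_limit_s(rsc_fpops_bound: float, host_flops: float) -> float:
    """Run time in seconds after which the client gives up on a task (assumed formula)."""
    return rsc_fpops_bound / host_flops


# Values taken from the error message: (86400.00G/62.31G)
limit = elapsed_time_limit_s(86400.00e9, 62.31e9)
print(f"limit: {limit:.1f} s, i.e. about {limit / 60:.0f} minutes")

# With the bound raised by a factor of a thousand, the limit scales the same way.
raised = elapsed_time_limit_s(86400.00e9 * 1000, 62.31e9)
print(f"raised limit: about {raised / 86400:.0f} days")
```

The result differs from the logged 1386.68 only in the last digits, which is expected because the message prints rounded values.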

Workaround:

  • Download a few tasks.
  • Set "no new tasks".
  • Shut down the boinc client.
  • Make a backup of /var/lib/boinc/client_state.xml.
  • Open /var/lib/boinc/client_state.xml in a text editor.
  • Search for "cosmologyathome".
  • Search for "workunit".
  • Increase the value within <rsc_fpops_bound>...</rsc_fpops_bound> by a factor of a thousand, i.e. insert three more "0"s before the decimal point or move the point to the right by three digits.
  • Repeat for all remaining workunits in the file.
  • Save the file.
  • Start boinc client.
  • Let the tasks run, hopefully complete correctly, upload, and report.
  • Set "allow new tasks" if this went well. Your host's GFLOPS (which you can see at application details via the host details page) should be adjusted now at the server, and the newly downloaded tasks should complete properly too.
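
The editing step in the list above can also be scripted. The following is a minimal sketch, not a tested tool: it raises every <rsc_fpops_bound> in the text it is given, without checking which project a workunit belongs to (a simplification compared to the manual steps), and it assumes the value is a plain decimal number. Run it only while the client is shut down and after making the backup.

```python
import re

# Matches <rsc_fpops_bound>VALUE</rsc_fpops_bound> and captures the three parts.
BOUND_RE = re.compile(r"(<rsc_fpops_bound>)([^<]+)(</rsc_fpops_bound>)")


def raise_fpops_bound(state_text: str, factor: float = 1000.0) -> str:
    """Multiply every rsc_fpops_bound value in the given text by `factor`."""

    def repl(match: re.Match) -> str:
        new_value = float(match.group(2)) * factor
        return f"{match.group(1)}{new_value:f}{match.group(3)}"

    return BOUND_RE.sub(repl, state_text)


# In-memory demo on a made-up workunit fragment.
fragment = (
    "<workunit>\n"
    "    <name>example_wu</name>\n"
    "    <rsc_fpops_bound>86400000000000.000000</rsc_fpops_bound>\n"
    "</workunit>\n"
)
print(raise_fpops_bound(fragment))
```

To apply it to the real file, read /var/lib/boinc/client_state.xml into a string, pass it through raise_fpops_bound, and write the result back.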

14) Forums : Technical Support : postponed VM hypervisor failed to enter and online state in a timely fashion (Message 21645)
Posted 3 Mar 2018 by xii5ku
Post:
I have been running "camb_boinc2docker 2.05 (vbox64_mt)" with good success on host 309722 during the last 19 days. I had about 2 tasks per day which got stuck in state "postponed: VM Hypervisor failed to enter an online state in a timely fashion". I aborted these tasks automatically by means of a script which periodically looks for such tasks and weeds them out.

Unfortunately, since today at 17:00 UTC I have been getting a very high rate of such failures. Here is the list of failing tasks since the beginning of the high failure rate. (225 tasks, and apparently more such failures to come from what I downloaded already. Sorry for the long post. I would have enclosed the list in SPOILER tags if this board had them. Edit: List removed.) I still have some tasks succeeding, but they seem to be the minority now.

Edit: Solved!

Or worked around at least. Here is what really happened:

I have now noticed that the high rate of failures began at about the same time as a second boinc-client instance on the same host began running MindModeling@Home tasks. I cured the problem by shutting down both boinc-client instances, and then restarting only the instance which runs Cosmology@Home.

Apparently vboxwrapper has got some big issue with multiple boinc-client instances on the same host.

(That's a shame. I like using different client instances to run projects in parallel when I want to control their work buffers independently. E.g. run one project with a shallow queue and another with a deep queue. Or keep a project with steady task supply from blocking work requests to a project with intermittent task supply, like Cosmo vs. MindModeling.)
15) Forums : News : C@H a BOINC Pentathlon project! (Message 21444)
Posted 21 May 2017 by xii5ku
Post:
Thanks for being such a good host of the BOINC Pentathlon Marathon.
Best wishes to the Cosmology@Home team!