1) Forums : Technical Support : Forced localization (Message 22089)
Posted 10 Feb 2019 by Crystal Pellet
Post:
Any other idea ?

What about this link: http://www.cosmologyathome.org/language_select.php
2) Forums : Technical Support : Postponed: VM job unmanageable, restarting later (Message 22084)
Posted 9 Feb 2019 by Crystal Pellet
Post:
Hi Jonathan,

I saw your post at LHC@home.
It seems VirtualBox 6.0.4 is more sensitive to heartbeat issues with our vboxwrapper.
I downgraded my 6.0.4 version to v5.2.26 and restarted a postponed cosmology task
that had already been postponed twice with version 6.0.4. On the 3rd attempt, with v5.2.26, it succeeded.

http://www.cosmologyathome.org/result.php?resultid=5588234

Downgrading while Cosmo tasks are suspended works. LHC tasks will error out, because snapshots saved with 6.0.4 are incompatible with 5.2.26.
3) Forums : Technical Support : ArchLinux 64-bit "VirtualBox is not installed" (Message 20808)
Posted 2 Feb 2016 by Crystal Pellet
Post:
I'm getting Workunits now, probably the "restart" (off/on) did the trick. Everything working fine so far, though only 80% of the CPU is used.

Good that it's working, thanks for reporting back. I've never seen this 80% CPU thing though. Any other hints as to what might be going on?

From your Stderr output:

2016-02-02 20:05:07 (10130): Setting CPU throttle for VM. (70%)

You reduced your CPU usage to 70% in BOINC preferences.
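
A minimal sketch of how that throttle value could be pulled out of a task's stderr text. The function name `find_cpu_throttle` and the regular expression are illustrative assumptions, modelled only on the single log line quoted above and not on the vboxwrapper source.

```python
import re

# Pattern modelled on the one log line quoted above (an assumption, not taken
# from the vboxwrapper source).
THROTTLE_RE = re.compile(r"Setting CPU throttle for VM\. \((\d+)%\)")


def find_cpu_throttle(stderr_text):
    """Return the last CPU throttle percentage found in the text, or None."""
    matches = THROTTLE_RE.findall(stderr_text)
    return int(matches[-1]) if matches else None


sample = "2016-02-02 20:05:07 (10130): Setting CPU throttle for VM. (70%)"
print(find_cpu_throttle(sample))
```

If no throttle line is present the function returns None, which would suggest looking elsewhere for the cause of reduced CPU use.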
4) Forums : News : New beta app for analyzing Planck data (Message 20764)
Posted 31 Jan 2016 by Crystal Pellet
Post:
I returned 3 tasks so far. 2 were invalid and 1 was valid.
This valid one turned into invalid too now.
I'll hold back the 'In Progress' tasks waiting for a better fix.

Ooh that's fast. Now all three invalids turned to Completed and validated.
I'll start another task.
5) Forums : News : New beta app for analyzing Planck data (Message 20763)
Posted 31 Jan 2016 by Crystal Pellet
Post:
Issue should be fixed now. I'll go back and revalidate your first couple of jobs that I see were marked invalid.

I returned 3 tasks so far. 2 were invalid and 1 was valid.
This valid one turned into invalid too now.
I'll hold back the 'In Progress' tasks waiting for a better fix.
6) Forums : News : New beta app for analyzing Planck data (Message 20761)
Posted 31 Jan 2016 by Crystal Pellet
Post:
Thanks, and don't hesitate to give us any feedback below!

First task - validate error http://www.cosmologyathome.org/result.php?resultid=36244200
7) Forums : Technical Support : slow gpu (Message 20668)
Posted 29 Dec 2015 by Crystal Pellet
Post:
I had to reduce the CPU priority using Prio to fix that (in addition to reserving the two cores).

I ran into unexpected trouble using Prio - Process Priority Control (v2.0.0.2960) for lowering the CPU priority.

When running several VMs from several BOINC projects (I had 1 CMS-dev, 2 ATLAS and 1 Cosmo), most VMs stop and the task in BOINC is postponed for 24 hours, to be retried after that period.

2015-12-29 19:27:25 (676): VM state change detected. (old = 'running', new = 'gurumeditation')
2015-12-29 19:27:35 (676): NOTE: VirtualBox has failed to allocate enough memory to continue.
2015-12-29 19:27:35 (676): This might be a temporary problem and so this job will be rescheduled for another time.
2015-12-29 19:27:35 (676): Powering off VM.


Free memory at that moment was 13GB.
This error was reproducible and disappeared after uninstalling 'PRIO' from my Win7 machine.
When lowering the CPU priority by hand or with eFMer's program priority, I didn't have that issue (I could run up to 8 VMs).
The disadvantage of the last-mentioned program is that all programs can only get the same priority.
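
A small sketch for spotting this failure in a stderr log, assuming the state-change lines always look like the one quoted above. The function names and the regular expression are hypothetical helpers, not part of BOINC or vboxwrapper.

```python
import re

# Pattern modelled on the state-change line quoted above (an assumption).
STATE_RE = re.compile(
    r"VM state change detected\. \(old = '(\w+)', new = '(\w+)'\)"
)


def state_changes(log_text):
    """Yield (old_state, new_state) for each state-change line in the text."""
    for match in STATE_RE.finditer(log_text):
        yield match.group(1), match.group(2)


def hit_guru_meditation(log_text):
    """Return True if any state change ended in 'gurumeditation'."""
    return any(new == "gurumeditation" for _, new in state_changes(log_text))


sample_log = (
    "2015-12-29 19:27:25 (676): VM state change detected. "
    "(old = 'running', new = 'gurumeditation')\n"
    "2015-12-29 19:27:35 (676): Powering off VM."
)
print(hit_guru_meditation(sample_log))
```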
8) Forums : Technical Support : Still getting camb_legacy (Message 20633)
Posted 12 Dec 2015 by Crystal Pellet
Post:
I have VirtualBox version 4.3.12 installed. At first I did get the boinc2docker jobs, but that stopped, and all I get now are legacy jobs. My laptop meets all the given requirements. How can I fix this? I did try the newer version of VirtualBox but it made no difference. Here is an excerpt of my event log; I hope it has useful info.

From your stderr output:
2015-10-23 19:05:57 (16820): Hardware acceleration failed with previous execution. Disabling VirtualBox hardware acceleration support.
2015-10-23 19:05:57 (16820): ERROR: Invalid configuration. VM type requires acceleration but the current configuration cannot support it.


It looks like you did not enable VT-x (hardware virtualization acceleration) in your BIOS.
9) Forums : Announcements : Beta testing the new C@H (Message 20582)
Posted 10 Nov 2015 by Crystal Pellet
Post:
Hi Phil1966, hmm thanks for pointing me to this, these jobs hanging after the computation is over ...

That's the problem ..... the computation is not over, but the presence of the VM completion file is detected.
The VM can't be cleaned up, because it's still in use for the calculation.

Normally this completion file should come from your machine.
Is it created too early or coming from elsewhere?
10) Forums : General Topics : The new name of the project in BOINC Manager (Message 20548)
Posted 3 Nov 2015 by Crystal Pellet
Post:
Where do you see Cosmohome in BOINC Manager?

Hello Jord,

It was shown for some hours after the beta server was switched off.
In BOINC Manager I even saw cosmohome twice. The beta project and the original Cosmology@Home project had the same name.
11) Forums : News : New Server is Live (Message 20539)
Posted 1 Nov 2015 by Crystal Pellet
Post:
But it seems they do not accept reducing the CPU usage. I turned it to 66% to be able to play, but BOINC still uses 100%, even if a new WU is started.

It seems!

The execution cap for the VM is reduced to 66% and VBoxHeadless.exe is using less cpu.
But the VM processes are running at normal priority, so the user may experience some sluggishness.
12) Forums : News : New Server is Live (Message 20467)
Posted 21 Oct 2015 by Crystal Pellet
Post:
I'm a bit surprised by the stuck job; hopefully it's an exception. The validate error has been happening sporadically to everyone on the beta, as you know.


Is it not caused by this failure:

Error while pulling image: Get https://index.docker.io/v1/repositories/marius311/camb_boinc2docker/images: dial tcp: lookup index.docker.io: no DNS servers
13) Forums : Announcements : Beta testing the new C@H (Message 20365)
Posted 19 Oct 2015 by Crystal Pellet
Post:
So do I understand correctly, you edited the vbox_job.xml file to add <enable_vmsavestate/>? I'm confused though because in http://beta.cosmologyathome.org/result.php?resultid=41172 I see no mention of saving state in the log? In any case, I lowered the disk bound since without checkpointing it wasn't necessary.

I added the <enable_vmsavestate/> to camb_boinc2docker_0.04_vbox_job.xml.
The stderr of the results never mentions saving the state, only "Stopping VM."
If the save is successful, that line is followed by "Successfully stopped VM."
If that line is missing, the VM state has turned into 'Stopped'.
During the save, a file like "2015-10-19T14-01-31-048797600Z.sav" is written into the slot-sub/Snapshots directory.
Sometimes this is a very big file, causing the disk bound to be exceeded.
When the "Stopped" state occurs, that sav-file does not seem to be deleted after the resume. That's why BOINC is getting an error.
After a good 'Save state', that file is deleted after the resume.
At least doubling the disk bound should be enough to reduce that kind of error, I think.
14) Forums : Announcements : Beta testing the new C@H (Message 20362)
Posted 19 Oct 2015 by Crystal Pellet
Post:
I played with that option, but suspend/resume still seemed very unstable. Plenty of times it seemed I got into a state where the task just hung indefinitely, restarted anyway, etc... Conversely the current setup has seemed very robust based on results I'm seeing on the beta server. The only drawback is, as you say, having to start over. At least the jobs are very short so you're not losing too much work.

For now I'm going to launch with the current setup. Eventually I think <enable_vm_savestate_usage/> is definitely the way to go, but after a little more debugging.

With those short tasks it's no big issue to restart from the beginning.
I've run several tasks with the mentioned tag, suspending and resuming them.
Most of the time the VM goes into the desired saved state, but sometimes the VM doesn't save properly and ends up in a stopped state.
After resuming such a task, the VM restarts (boots from the beginning) and then the task ends in an error.

This is because you reduced the <rsc_disk_bound> to too low a value, so the task errored out with EXIT_DISK_LIMIT_EXCEEDED.
Example: http://beta.cosmologyathome.org/result.php?resultid=41172

I've increased the disk_bound myself and waited for a VM in the 'stopped' state.
After resume it did not error out: http://beta.cosmologyathome.org/result.php?resultid=41188
Note the peak disk usage is 967.30 MB.
15) Forums : Announcements : Beta testing the new C@H (Message 20358)
Posted 18 Oct 2015 by Crystal Pellet
Post:
A few of the updates which I have pushed recently:

  • I did away with check-pointing entirely for now. I would like to have it, but for now it seemed more trouble than it's worth. This should solve many memory / disk space / stuck job problems some were seeing.
  • I shortened the jobs (~20min on my laptop) so they're shorter and there's less need for check-pointing anyway.
  • The server status page has a link to the exact version of the code which the server is currently running, for those curious.
  • No more camb_legacy jobs should be sneaking in if your host can run camb_boinc2docker.



Maybe you could add <enable_vm_savestate_usage/> to your camb_boinc2docker_0.04_vbox_job.xml file.
That would save the state of the VM instead of powering it off when a task is suspended with "Leave Application in Memory" unticked, or when BOINC is stopped/restarted.
As it is now, after a resume a task has to start from the very beginning, with the VM booting from scratch.

CP
16) Forums : Announcements : Beta testing the new C@H (Message 20302)
Posted 28 Aug 2015 by Crystal Pellet
Post:
STEVE: That camb_boinc2docker_boinc_app file (http://beta.cosmologyathome.org/download/2b0/camb_boinc2docker_boinc_app) is and should have been present for at least the last four hours.
...

Shouldn't that file be one directory higher: in the download dir itself and not in /2b0/ ?
It looks like it is deleted after every task from the user's machine.

Crystal Pellet: Very useful, thanks. To make sure I understand, the difference between this, and say, just lowering the "Use at most" CPU time option is that this targets camb_boinc2docker specifically, leaving other apps to use that last 8th core?

That's correct!
This last core could be left free for GPU-task support or another single-core CPU-task could use it.
That app_config.xml should be placed in the Cosmology project directory on the user's machine (now of course the beta directory).
17) Forums : Announcements : Beta testing the new C@H (Message 20300)
Posted 27 Aug 2015 by Crystal Pellet
Post:
Crystal Pellet: Yea, I noticed the file was gone and re-added it. It might have gotten deleted again at some other points too; I'll look into why the file deleter is getting it. Btw, your suggestion with the vm_save_state looks really great, I'm testing it now. Thanks!

Hi Marius,

That camb_boinc2docker_boinc_app-file is gone again. At least it's not in the download-dir.

I've successfully tested an option that lets users reduce the number of cores for the virtual machine themselves.
You don't have to do anything; the user just places the following file, named app_config.xml, in the project directory:

<app_config>
 <project_max_concurrent>1</project_max_concurrent>
 <app>
  <name>camb_boinc2docker</name>
  <max_concurrent>1</max_concurrent>
 </app>
 <app_version>
  <app_name>camb_boinc2docker</app_name>
  <plan_class>vbox64_mt</plan_class>
  <avg_ncpus>7.000000</avg_ncpus>
  <max_ncpus>7.000000</max_ncpus>
 </app_version>
</app_config>


In the example I've reduced the number of cores to 7 on my 8-threaded machines. The VM is created and running with 7 cores.
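
For other thread counts, the same file can be generated instead of edited by hand. This is only a sketch: the helper `build_app_config` and its `reserved` parameter are hypothetical, while the element names and values mirror the file shown above.

```python
import xml.etree.ElementTree as ET


def build_app_config(total_threads, reserved=1):
    """Build an app_config.xml string that leaves `reserved` threads free."""
    ncpus = max(1, total_threads - reserved)
    root = ET.Element("app_config")
    ET.SubElement(root, "project_max_concurrent").text = "1"
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "camb_boinc2docker"
    ET.SubElement(app, "max_concurrent").text = "1"
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = "camb_boinc2docker"
    ET.SubElement(version, "plan_class").text = "vbox64_mt"
    # Same six-decimal formatting as in the hand-written file.
    ET.SubElement(version, "avg_ncpus").text = f"{ncpus:.6f}"
    ET.SubElement(version, "max_ncpus").text = f"{ncpus:.6f}"
    return ET.tostring(root, encoding="unicode")


# 8 threads with 1 reserved gives the 7-core setup described above.
print(build_app_config(8))
```

The output is written on a single line without indentation. Save it as app_config.xml in the project directory.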

Results with 6 cores:

http://beta.cosmologyathome.org/result.php?resultid=1951
http://beta.cosmologyathome.org/result.php?resultid=1939

Result with 7 cores:

http://beta.cosmologyathome.org/result.php?resultid=1897
18) Forums : Announcements : Beta testing the new C@H (Message 20296)
Posted 27 Aug 2015 by Crystal Pellet
Post:
You removed a needed file from the download directory:

cosmohome 27 Aug 11:34:07 Giving up on download of camb_boinc2docker_boinc_app: permanent HTTP error
19) Forums : Announcements : Beta testing the new C@H (Message 20289)
Posted 25 Aug 2015 by Crystal Pellet
Post:
* Multi threaded: By default BOINC is going to allocate all free CPUs to the job. If you have 4 CPUs and in your computing preferences you tell BOINC to use 50% CPU time, it'll run it as a 2-CPU job. Is this a solution to what you guys are talking about, or am I misunderstanding?

1. The problem is that VBoxHeadless.exe is running at 'normal' priority, whereas normal BOINC tasks run at the lowest ('idle') priority.
So your task is competing with the user himself.
Setting CPUs to e.g. 50% is only a partial solution, because most crunchers want to use all cores, but at the lowest priority for BOINC.
There is a command-line parameter --nthreads. Maybe you could use that, passing ncpus - 1 as --nthreads.

2. When your mt-task starts, it pushes all other already running BOINC tasks into a waiting state. They may even lose a lot of computing time when 'Leave application in memory' (LAIM) is not set, or get swapped to disk when LAIM is set but the system is low on memory. Your VM needs about 1.5GB RAM.

* Crystal Pellet: Thanks good catch, there's an unnecessary vbox_job.xml in there. Btw, what is the <enable_vm_savestate_usage> tag, I'm not seeing that in the docs for vboxwrapper?

If you set that tag in your *job.xml file, together with the also undocumented disable_automatic_checkpoint tag, the VM will save its state immediately when a user suspends the task (LAIM off) or BOINC stops.
The VM is saved and not powered off (although of course it is not running anymore).
After a resume nothing is lost, because it restores from the very last point where the user suspended it. In your setup the whole task could be lost when no checkpoint was made, or at least the time since the last checkpoint.
That is also why my setup checkpoints every 60 seconds: no checkpoints are needed, but the checkpoint file is updated more regularly this way.
That file is also used for restoring the CPU seconds after a task resume.
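
A hedged sketch of adding the two tags to a job file with a script rather than by hand. The tag names are the ones mentioned in this post. The `<vbox_job>` root element, the sample content and the helper name `add_empty_tags` are assumptions for illustration, and whether the wrapper honours the tags depends on its version.

```python
import xml.etree.ElementTree as ET

# Tag names as given in the post above.
SAVESTATE_TAGS = ("enable_vm_savestate_usage", "disable_automatic_checkpoint")


def add_empty_tags(job_xml, tags=SAVESTATE_TAGS):
    """Return job_xml with each tag appended as an empty element if missing."""
    root = ET.fromstring(job_xml)
    for tag in tags:
        if root.find(tag) is None:
            ET.SubElement(root, tag)
    return ET.tostring(root, encoding="unicode")


# Stand-in content: a real *job.xml will contain more settings, and the
# <vbox_job> root element name is an assumption.
sample = "<vbox_job><enable_vm_savestate_usage/></vbox_job>"
print(add_empty_tags(sample))
```

Tags that are already present are left alone, so running the helper twice does not duplicate them.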
20) Forums : Announcements : Beta testing the new C@H (Message 20287)
Posted 25 Aug 2015 by Crystal Pellet
Post:
MT-tasks with 8 threads: Elapsed time avg: 947.76 sec - CPU used 6700.41 seconds on average.

MT-tasks with 7 threads: Elapsed time avg: 1013.74 sec - CPU used 6371.32 seconds on average and on the 8th thread a camb_legacy v2.16 was running.
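
To compare the two runs, one can divide the CPU time by the thread-time that was available (elapsed time multiplied by threads). The sketch below does that with the averages quoted above. The function name `core_utilisation` is a hypothetical helper, and the ratio is only a rough indicator, not an official BOINC metric.

```python
def core_utilisation(cpu_seconds, elapsed_seconds, threads):
    """Fraction of the allotted thread-time that was spent as CPU time."""
    return cpu_seconds / (elapsed_seconds * threads)


# threads -> (average CPU seconds, average elapsed seconds), from the post.
runs = {
    8: (6700.41, 947.76),
    7: (6371.32, 1013.74),
}

for threads, (cpu, elapsed) in runs.items():
    ratio = core_utilisation(cpu, elapsed, threads)
    print(f"{threads} threads: {ratio:.1%}")
```

With these numbers the ratio is about 88% for 8 threads and about 90% for 7 threads, while the 7-thread tasks take roughly 7% longer in elapsed time.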

