Message boards : Announcements : Beta testing the new C@H

planetclown
Joined: 16 Feb 12
Posts: 2
Credit: 1,400,827
RAC: 236
Message 20336 - Posted: 18 Sep 2015, 17:40:21 UTC - in response to Message 20334.

Just reporting back that I've run 500+ camb_boinc2docker tasks successfully in the last four days. This is using BOINC 7.6.9, Vbox 4.3.30 on MS Windows 10 running on 7 out of 8 threads of an i7-2600K.

No errors, but I did pick up one camb_legacy task during this time. Maybe there were no available boinc2docker tasks at the time.

Thank you!

Jim1348
Joined: 17 Nov 14
Posts: 52
Credit: 2,415,981
RAC: 2,662
Message 20337 - Posted: 19 Sep 2015, 13:53:32 UTC - in response to Message 20336.

No errors, but I did pick up one camb_legacy task during this time. Maybe there were no available boinc2docker tasks at the time.

I get the camb_legacy tasks on a regular basis. They are almost 1/2 of the total in the buffer at the moment. I wonder why they are necessary at all? Are the old legacy jobs going to continue?

At any rate, I would also like the option to do only the new camb_boinc2docker (vbox64_mt) tasks. There is no point in installing the VBox otherwise.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20338 - Posted: 20 Sep 2015, 10:57:10 UTC

kararom: I think I see the problem. I need to check with the BOINC people, but it looks like the maximum job time is calculated from a specified maximum job *flops*, combined with the assumption that your computer will use all of its cores for the job. Hence if you use fewer cores (which should certainly be your right), the job may go over time. A workaround is for me to just increase the max flops, which I will do soon.
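
To make the arithmetic in the paragraph above concrete, here is a minimal sketch with made-up numbers. It is not the BOINC scheduler's actual code: the function names, the per-core speed, the job size, and the perfect-scaling assumption are all hypothetical and only illustrate why a limit computed for all cores can be exceeded when fewer cores are used.

```python
# Illustrative arithmetic only: hypothetical numbers and names,
# not the real BOINC scheduler logic.

def time_limit_seconds(max_flops: float, flops_per_core: float, assumed_cores: int) -> float:
    """Time limit derived from a flops bound, assuming the job uses `assumed_cores` cores."""
    return max_flops / (flops_per_core * assumed_cores)


def actual_runtime_seconds(job_flops: float, flops_per_core: float, used_cores: int) -> float:
    """Runtime when the job really runs on `used_cores` cores (perfect scaling assumed)."""
    return job_flops / (flops_per_core * used_cores)


FLOPS_PER_CORE = 2e9        # hypothetical per-core speed
JOB_FLOPS = 1.2e13          # hypothetical amount of work in one job
MAX_FLOPS = 4 * JOB_FLOPS   # hypothetical bound with a 4x safety margin

# The limit is computed as if all 8 cores were used...
limit = time_limit_seconds(MAX_FLOPS, FLOPS_PER_CORE, assumed_cores=8)
# ...which is comfortable when the job really does get 8 cores...
runtime_all_cores = actual_runtime_seconds(JOB_FLOPS, FLOPS_PER_CORE, used_cores=8)
# ...but not when the user restricts the job to a single core.
runtime_one_core = actual_runtime_seconds(JOB_FLOPS, FLOPS_PER_CORE, used_cores=1)

print(f"limit={limit:.0f}s, 8 cores={runtime_all_cores:.0f}s, 1 core={runtime_one_core:.0f}s")
```

With these numbers the single-core run takes longer than the limit, and raising the flops bound, as the workaround proposes, raises the limit in proportion.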

planetclown/Jim1348: Thanks guys. Yeah, I had seen similar behavior before and asked about it on the BOINC lists. I thought I had a fix, but now that you mention this I think I screwed it up. It has to do with the fact that the feeder caches to-be-sent jobs 200 at a time, so if the number of available boinc2docker jobs drops below 200, some legacy ones sneak in and never leave. Will fix this soon.
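
The caching behavior described above can be modeled roughly as follows. This is only a toy model, not the BOINC feeder's code: the `fill_cache` function, the "preferred first, then top up" ordering, and the queue sizes are assumptions made for illustration. Only the cache size of 200 comes from the post.

```python
from collections import deque

CACHE_SLOTS = 200  # cache size quoted in the post


def fill_cache(preferred: deque, fallback: deque, slots: int = CACHE_SLOTS) -> list:
    """Fill the cache with preferred jobs first, topping up from fallback if short."""
    cache = []
    while len(cache) < slots and preferred:
        cache.append(preferred.popleft())
    while len(cache) < slots and fallback:
        cache.append(fallback.popleft())
    return cache


# Hypothetical situation: only 150 boinc2docker jobs are available.
docker_jobs = deque(("camb_boinc2docker", i) for i in range(150))
legacy_jobs = deque(("camb_legacy", i) for i in range(1000))

cache = fill_cache(docker_jobs, legacy_jobs)
legacy_in_cache = sum(1 for app, _ in cache if app == "camb_legacy")
print(f"{legacy_in_cache} of {len(cache)} cached jobs are camb_legacy")
```

In this model, any shortfall below 200 boinc2docker jobs is filled with legacy jobs, which are then handed out like everything else in the cache.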

I have several updates on the backlog, just waiting for an update to the vboxwrapper version and they should go live soon. Thanks!

kararom
Joined: 9 Jan 09
Posts: 69
Credit: 29,506,700
RAC: 0
Message 20339 - Posted: 29 Sep 2015, 17:58:10 UTC

How long will the testing last?

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20340 - Posted: 29 Sep 2015, 18:36:48 UTC

Good question. Right now I'm a little stuck waiting on some BOINC people to compile the latest version of vboxwrapper, but after that I don't think there's anything major left to do. I can't give a date yet, but we'll switch over within the next month at the very latest, and likely (and hopefully) sooner than that.

By the way, we have *yet* to have a successful result computed on a Mac. I'm kind of excited to see the first OS X C@H result ever! We're likely not on most Mac users' radar, having never supported it. Does anyone here have ideas for how to get the word out a bit?

Coleslaw
Joined: 6 Aug 08
Posts: 16
Credit: 1,634,352
RAC: 45
Message 20341 - Posted: 3 Oct 2015, 3:18:17 UTC - in response to Message 20340.
Last modified: 3 Oct 2015, 3:22:44 UTC

Try contacting Zombie67. I believe he has a ton of them. http://wuprop.boinc-af.org/show_user.php?userid=797 or here http://www.cosmologyathome.org/show_user.php?userid=193

You may even want to go to some of the teams' official forums and post a request. Some of the larger teams tend to have a lot of resources to toss around. You may also want to go to the BOINC forums and put out a post requesting testers. https://boinc.berkeley.edu/dev/

Perhaps put a news post on the front page of the project requesting it as well. There is also the notices function in BOINC Manager, but it can be used only if your server software is new enough to support it.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20342 - Posted: 12 Oct 2015, 16:16:50 UTC

I'm running some upgrades on the beta server, so it may be a little unstable over the next 24 hours. In particular, I'm seeing some jobs freeze indefinitely at 0.100%. If that's happening to you, I'm looking into it.

M0CZY
Joined: 27 Oct 07
Posts: 20
Credit: 35,414
RAC: 0
Message 20343 - Posted: 13 Oct 2015, 16:50:22 UTC

My Linux computer has downloaded vboxwrapper_26175_x86_64-pc-linux-gnu and vm_isocontext_v0.4.iso, and now all the work units that I have done since have run properly and validated correctly.

I am using BOINC version 7.2.42 with VirtualBox version 5.0.6.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20344 - Posted: 13 Oct 2015, 19:57:53 UTC

Thanks for the update M0CZY, and everyone still testing. In fact, if you're reading this and could do so, it'd be helpful if you could hop back on to the beta server and run a few jobs to help verify all these updates are OK. If you want, you can keep your clients on the beta server: there is only about a week of beta testing left, and your credits will transfer when we go live.

A few of the updates which I have pushed recently:


  • I did away with check-pointing entirely for now. I would like to have it, but for now it seemed more trouble than it's worth. This should solve many of the memory / disk space / stuck job problems some were seeing.
  • I shortened the jobs (~20 min on my laptop), so there's less need for check-pointing anyway.
  • The server status page has a link to the exact version of the code which the server is currently running, for those curious.
  • No more camb_legacy jobs should be sneaking in if your host can run camb_boinc2docker.


Jim1348
Joined: 17 Nov 14
Posts: 52
Credit: 2,415,981
RAC: 2,662
Message 20345 - Posted: 14 Oct 2015, 17:22:33 UTC - in response to Message 20300.

I've successfully tested an option to reduce the number of cores for the Virtual Machine by the user himself.
You don't have to do anything, when the user places following file with the name app_config.xml in his project directory:

Thanks. I was about to give up hope after finding that camb_boinc2docker did not honor the app_config I use to reserve cores for my GPUs.

That post really should be made a sticky, or else there should be some way to set the number of cores in "Cosmology@Home preferences".

kararom
Joined: 9 Jan 09
Posts: 69
Credit: 29,506,700
RAC: 0
Message 20346 - Posted: 14 Oct 2015, 21:13:44 UTC

It really would be convenient to be able to set the number of processor cores used in the Cosmology@Home preferences.

If I use more than two cores, my PC is really slow.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20347 - Posted: 14 Oct 2015, 22:33:15 UTC - in response to Message 20346.
Last modified: 14 Oct 2015, 22:35:08 UTC

It really would be convenient to be able to set the number of processor cores used in the Cosmology@Home preferences.

If I use more than two cores, my PC is really slow.


I'm not sure such a setting belongs in the C@H-wide preferences. It makes more sense to me at a per-host level, such as is possible with Crystal Pellet's app_config solution. Unfortunately there's no GUI for this, so it is rather inconvenient, as you say.



Thanks. I was about to give up hope after finding that camb_boinc2docker did not honor the app_config I use to reserve cores for my GPUs.

That post really should be made a sticky, or else there should be some way to set the number of cores in "Cosmology@Home preferences".


I will definitely keep that solution on a main page. What do you mean by camb_boinc2docker not honoring the app_config, though? Does Crystal Pellet's app_config.xml not work for you?
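
For readers who have not seen the post being discussed: a per-host core limit of this kind is set through an app_config.xml file in the project directory. The sketch below generates one such file with Python. It is not a copy of Crystal Pellet's file, which is not reproduced on this page. The choice of 2 cores and the `build_app_config` helper are assumptions for illustration; the app name and plan class are the ones mentioned earlier in the thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, ncpus: int) -> str:
    """Return the text of an app_config.xml limiting one app version to `ncpus` CPUs."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Pretty-print when available (ElementTree.indent exists from Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical choice: let camb_boinc2docker use 2 cores on this host.
xml_text = build_app_config("camb_boinc2docker", "vbox64_mt", 2)
print(xml_text)
```

The printed text would be saved as app_config.xml in the Cosmology@Home project directory; the client then needs to re-read its config files (or be restarted) for the change to take effect.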

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20348 - Posted: 14 Oct 2015, 22:38:48 UTC

In general, I don't completely understand the experience of people who run the camb_boinc2docker app alongside other projects, since I myself am testing only with C@H. Maybe some of you could explain a bit more how you would like things to run vs. how they do run?

Jim1348
Joined: 17 Nov 14
Posts: 52
Credit: 2,415,981
RAC: 2,662
Message 20349 - Posted: 14 Oct 2015, 23:20:33 UTC - in response to Message 20347.

Thanks. I was about to give up hope after finding that camb_boinc2docker did not honor the app_config I use to reserve cores for my GPUs.

That post really should be made a sticky, or else there should be some way to set the number of cores in "Cosmology@Home preferences".


I will definitely keep that solution on a main page. What do you mean by camb_boinc2docker not honoring the app_config, though? Does Crystal Pellet's app_config.xml not work for you?

Yes, his app_config works fine. But I have a GPU on POEM. If I put the following app_config in the POEM project folder, it will reserve one CPU core for the GPU when I am running POEM and most CPU projects, but not when I am running POEM and camb_boinc2docker.

<app_config>
  <app>
    <name>poemcl</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

I suppose it is because camb_boinc2docker is multi-threaded, and the other CPU projects that I run are not. But it is not because of VBox, because the exclusion works OK when I run ATLAS. That is, a CPU core is then properly reserved for my GPU.

Jim1348
Joined: 17 Nov 14
Posts: 52
Credit: 2,415,981
RAC: 2,662
Message 20350 - Posted: 15 Oct 2015, 18:55:57 UTC
Last modified: 15 Oct 2015, 19:14:06 UTC

Both Valid and Invalid (?)

I have been puzzling over the invalids that I get, which invariably run only about 35 seconds on my machine. They validate OK on other machines, but I can see no rhyme or reason for it in terms of operating system, CPU type or any other variable. And the other machines that validate mine also have just about the same percentage of invalids on their other work units.

That is the same situation with ATLAS by the way, and it has been explained that there is a certain indeterminism in the calculations, though how that occurs is not clear to me.

However, the interesting result is this one:
http://beta.cosmologyathome.org/workunit.php?wuid=24552

I have completed a work unit that was invalid on my same machine a few hours earlier. That is interesting.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20351 - Posted: 15 Oct 2015, 21:09:14 UTC - in response to Message 20350.
Last modified: 15 Oct 2015, 21:09:58 UTC

Re: the app_config, my understanding is that giving poemcl 1.0 doesn't guarantee it 1 CPU; it limits it to 1 CPU. So it has no role in how camb_boinc2docker runs, which will simply use as many CPUs as BOINC allows it to.

As for the invalid jobs, yeah, I've been looking at it. It's happening sporadically for you and pretty much everyone else. I don't have a solution at the moment, but fortunately it's not causing the jobs to hang or anything; they just quickly die and you move on to the next. I'll make a note of it in the FAQ I plan to release along with the server upgrade.

It's definitely a Docker problem, so it's unrelated to whatever ATLAS is seeing. Docker is just failing to pull the latest camb_boinc2docker image (e.g. here's the log from a recent invalid one from your computer). I saw on GitHub that they might have fixed it in Docker 1.9.0, which is due out very soon, but for now I think we will launch with camb_boinc2docker v0.04, which you guys have been running and which seems fairly stable, and upgrade to 1.9.0 afterwards.

[VENETO] boboviz
Joined: 28 Nov 07
Posts: 12
Credit: 26,360
RAC: 0
Message 20353 - Posted: 17 Oct 2015, 16:32:23 UTC

My PC crashes and turns off with this new app. :-(
The only two WUs I completed are marked "validation error".

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20354 - Posted: 17 Oct 2015, 17:31:29 UTC - in response to Message 20353.

boboviz, did you use a different username on the beta server? I can't find your jobs, but if they're able to crash your computer I'd like to take a look at them right away!

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20355 - Posted: 17 Oct 2015, 17:32:34 UTC

I introduced a bug yesterday which is causing everyone's results to be marked invalid. I'll fix it shortly, then will try to revalidate the jobs.

Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 430
Credit: 4,276
RAC: 0
Message 20356 - Posted: 17 Oct 2015, 22:51:30 UTC - in response to Message 20353.
Last modified: 17 Oct 2015, 22:53:48 UTC

boboviz, actually I see your results now, don't know how I missed it before.

You completed several jobs successfully which were unfortunately marked invalid incorrectly due to the bug I just mentioned, e.g. this one. That'll be fixed soon and you'll get credit for those jobs you already did.

Since I see you're running jobs alright, I wonder if your computer turning off could be an overheating issue? Do you still see it happen if you lower your BOINC CPU usage to, say, 50%?
