Message boards : News : New Server is Live

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20463 - Posted: 21 Oct 2015, 15:12:58 UTC
Last modified: 21 Oct 2015, 15:16:10 UTC

Upgrade completed successfully.

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20464 - Posted: 21 Oct 2015, 15:14:06 UTC
Last modified: 21 Oct 2015, 21:26:07 UTC

Today with Kevin's help we completed the upgrade of the C@H server.

Everything appears to have gone well, and we don't anticipate any disruption to the jobs you were running. However, if you find any issues related to the upgrade, please comment here; we'll use this thread as tech support. Please also make sure you read the new requirements and FAQ.

This upgrade is a big milestone for us. It's the first time in several years that the server or the application has been upgraded. It sets us up to deploy and update future applications far more easily than before, and I'm really excited about what we can and will do at C@H in the future!

So what exactly is new?


  • A new app, "camb_boinc2docker", based on the very latest version of CAMB. It runs in an entirely new way, using software I developed for BOINC based on VirtualBox and Docker, and is what will make future development much more efficient.
  • Mac OS X support
  • Multi-threaded support
  • An accurate progress bar
  • The new default "third" BOINC credit system
  • A very recent version of the BOINC server software, which includes a number of changes, e.g. to the forum functionality.
  • A visual redesign of the site.
  • For 32-bit users or users who don't have VirtualBox installed, the existing camb app, now called "camb_legacy", is still supported.
  • The server code is (almost) entirely public on GitHub.



For now I have marked the new app, camb_boinc2docker, as "beta", which means that if you would like to run it, you need to check "Run test applications" under your Cosmology@Home preferences in your account. We'll remove the beta tag shortly after we are sure everything checks out.

Thank you also to those who ran the beta server over the last month. It is now shut down permanently, and I'll be transferring over the credits you earned in the next few days.

Thanks everyone and feel free to leave comments / questions below,

Marius & C@H team

Jim1348
Joined: 17 Nov 14
Posts: 47
Credit: 2,358,299
RAC: 0
Message 20465 - Posted: 21 Oct 2015, 17:36:03 UTC - in response to Message 20464.

After a shaky start, things are going well. The first one got stuck at 0.100 percent, so I aborted it after 10 minutes (it was by then running High Priority). The second one finished in 34 seconds, so it clearly will be a validate error. But the next four finished OK in about 5 1/2 minutes each on six cores of an i7-4770 (Win7 64-bit, VBox 5.0.6). I think it will work.

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20466 - Posted: 21 Oct 2015, 19:50:22 UTC - in response to Message 20465.
Last modified: 21 Oct 2015, 19:50:32 UTC

After a shaky start, things are going well. The first one got stuck at 0.100 percent, so I aborted it after 10 minutes (it was by then running High Priority). The second one finished in 34 seconds, so it clearly will be a validate error. But the next four finished OK in about 5 1/2 minutes each on six cores of an i7-4770 (Win7 64-bit, VBox 5.0.6). I think it will work.

Thanks for the update. I'm a bit surprised by the stuck job; hopefully it's an exception. The validate error has been happening sporadically to everyone on the beta, as you know. At least it dies very quickly, so there's not much wasted effort. I think both are caused by a bug in Docker (this, if you're really following along), which I believe should be fixed in version 1.9.0, due out any day now. As soon as it is, I'll update camb_boinc2docker.

Crystal Pellet
Joined: 12 Feb 13
Posts: 21
Credit: 351,882
RAC: 1
Message 20467 - Posted: 21 Oct 2015, 20:44:46 UTC - in response to Message 20466.

I'm a bit surprised by the stuck job; hopefully it's an exception. The validate error has been happening sporadically to everyone on the beta, as you know.


Is it not caused by this failure:

Error while pulling image: Get https://index.docker.io/v1/repositories/marius311/camb_boinc2docker/images: dial tcp: lookup index.docker.io: no DNS servers

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20468 - Posted: 21 Oct 2015, 21:40:10 UTC - in response to Message 20467.


Is it not caused by this failure:

Error while pulling image: Get https://index.docker.io/v1/repositories/marius311/camb_boinc2docker/images: dial tcp: lookup index.docker.io: no DNS servers


That's right, this is what I think is fixed in 1.9.0, though I'm not 100% sure. We should find out in a few days, and if it's not, there are some other options.

Jim1348
Joined: 17 Nov 14
Posts: 47
Credit: 2,358,299
RAC: 0
Message 20469 - Posted: 21 Oct 2015, 22:57:45 UTC

Another one got stuck, but at 99+%. After 45 minutes I aborted it; the CPU usage was down to practically zero.
camb_boinc2docker_1826_1445437510.429958_0

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20470 - Posted: 21 Oct 2015, 23:18:41 UTC
Last modified: 21 Oct 2015, 23:18:57 UTC

Note to Mac users: I'm aware of a bug affecting Mac that might be causing your job to finish after ~30 seconds with no error, but produce an invalid result. I'll look into fixing it as soon as I can.

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20471 - Posted: 21 Oct 2015, 23:22:42 UTC - in response to Message 20469.

Another one got stuck, but at 99+%. After 45 minutes I aborted it; the CPU usage was down to practically zero.
camb_boinc2docker_1826_1445437510.429958_0

That's weird. The log looks like the calculation actually ran, so this is unlike the other stuck jobs I've seen so far, which get stuck pulling the Docker image at the beginning. Correct me if I'm wrong, you didn't see any stuck jobs on the beta server, right?

Jim1348
Joined: 17 Nov 14
Posts: 47
Credit: 2,358,299
RAC: 0
Message 20472 - Posted: 22 Oct 2015, 0:22:02 UTC - in response to Message 20471.
Last modified: 22 Oct 2015, 0:23:28 UTC

Correct me if I'm wrong, you didn't see any stuck jobs on the beta server, right?

I think that there were a small number there also; probably one every other day or so. I don't recall whether they stuck at the beginning or the end of a job (more likely the end), and I have detached from that server so the logs are no longer available at my end.

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20477 - Posted: 22 Oct 2015, 13:29:33 UTC
Last modified: 22 Oct 2015, 14:37:38 UTC

I just pushed two updates which should fix:


  • Win/Linux users seeing sporadic invalid jobs (a few jobs may still get stuck the very first time you run camb_boinc2docker; hopefully not many, and this should be fixed by Docker 1.9.0, coming out in the next few days)
  • Mac users having all jobs invalid.


Note: I got rid of all the old jobs which didn't have these updates and weren't in progress for anyone, but it may take a bit to flush out the ones that were.

Profile planetclown
Joined: 16 Feb 12
Posts: 2
Credit: 1,153,776
RAC: 512
Message 20489 - Posted: 22 Oct 2015, 18:28:43 UTC - in response to Message 20471.

I've had three out of a couple hundred get stuck. Usual runtimes on 6 threads of an i7-2600K are between 5 and 7 minutes, whereas these ran 45+ minutes before I aborted them. All three include the message "Image already exists".
log #1
log #2
log #3

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20490 - Posted: 22 Oct 2015, 19:50:11 UTC - in response to Message 20489.
Last modified: 22 Oct 2015, 19:54:03 UTC

Thanks very much for sorting through and sending these logs, it's really helpful. According to them, the jobs are all getting stuck after the calculation is complete and the VM is shut down, so this has nothing to do with Docker. Unfortunately the log doesn't offer many hints.

One thing that would help, though I know it's asking a lot, so don't feel obliged: the next time you or anyone else sees a stuck job, before aborting it, go into your BOINC folder, open the subfolder slots/X (where X is whatever number this job happens to be), and send me the contents of the various text files you find in there (you can send them via PM).
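For anyone who would rather script the collection step, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names (`collect_slot_text`, `format_report`) are hypothetical, the list of "text file" extensions is a guess, and it is not a BOINC tool. It reads the text-like files that sit directly inside one slot folder and joins them into a single report that can be pasted into a PM.

```python
from pathlib import Path

# Assumption: which extensions count as "text files" in a slot folder.
TEXT_SUFFIXES = {".txt", ".log", ".xml", ".ini"}


def collect_slot_text(slot_dir, max_bytes=200_000):
    """Return {filename: contents} for text-like files directly inside slot_dir.

    Only the last max_bytes of each file are kept so that a very large
    log does not produce an unmanageable report.
    """
    collected = {}
    for path in sorted(Path(slot_dir).iterdir()):
        # Skip subfolders and anything that does not look like a text file
        # (for example a virtual disk image).
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        tail = path.read_bytes()[-max_bytes:]
        collected[path.name] = tail.decode("utf-8", errors="replace")
    return collected


def format_report(collected):
    """Join the collected files into one block of text with a header per file."""
    parts = []
    for name, text in collected.items():
        parts.append(f"===== {name} =====\n{text.rstrip()}\n")
    return "\n".join(parts)
```

To use it, call `print(format_report(collect_slot_text(path_to_slot)))`, where `path_to_slot` is the slots/X folder of the stuck job inside your own BOINC data directory, and run it before aborting the job.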

I'll keep looking into this.

newman
Joined: 25 Oct 08
Posts: 3
Credit: 153,670
RAC: 1
Message 20508 - Posted: 24 Oct 2015, 21:22:15 UTC

My new WUs are also all stuck. In the log I find the following:

Guest Log: progress_template
2015-10-24 23:07:44 (7640): Guest Log: params_00.ini
2015-10-24 23:07:44 (7640): Guest Log: params_01.ini
2015-10-24 23:07:44 (7640): Guest Log: params_02.ini
2015-10-24 23:07:44 (7640): Guest Log: params_03.ini
2015-10-24 23:07:44 (7640): Guest Log: params_04.ini
2015-10-24 23:07:44 (7640): Guest Log: Error: No such image or container: marius311/camb_boinc2docker:0.02

newman
Joined: 25 Oct 08
Posts: 3
Credit: 153,670
RAC: 1
Message 20509 - Posted: 24 Oct 2015, 21:49:34 UTC

00:00:34.466021 VMMDev: Guest Log: b3d362b23ec1: Download complete
00:00:35.137288 VMMDev: Guest Log: time="2015-10-24T21:45:17.667661453Z" level=debug msg="Downloaded b3d362b23ec1a7ba1694e6607b44c5e3fb63d68e5ae01f339c6abe8b0c995601 to tempfile /var/lib/docker/tmp/GetImageBlob710106124"
00:00:50.600772 VMMDev: Guest Log: a7e6eea8e649: Verifying Checksum
00:00:51.200290 VMMDev: Guest Log: time="2015-10-24T21:45:33.802853465Z" level=error msg="filesystem layer verification failed for digest sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
00:00:53.591200 VMMDev: Guest Log: 757de7f408a1: Verifying Checksum
00:00:54.201561 VMMDev: Guest Log: time="2015-10-24T21:45:36.793200800Z" level=error msg="filesystem layer verification failed for digest sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20510 - Posted: 24 Oct 2015, 21:52:57 UTC - in response to Message 20508.

My new WUs are also all stuck. In the log I find the following:

2015-10-24 23:07:44 (7640): Guest Log: Error: No such image or container: marius311/camb_boinc2docker:0.02

That error is actually expected; it just means this is your first time running camb_boinc2docker and the image needs to be downloaded. The problem is that this download fails, which is what the log lines in your second post show. This is the problem that I believe will be solved in Docker 1.9.0, which is due in a couple of days. Alternatively, if you're eager to get it working now: the failure is pretty sporadic, so you might just try aborting jobs that get stuck and running new ones. Once your client gets the image downloaded, it won't have to do it again for subsequent jobs.
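To make the distinction above concrete, here is a rough Python sketch that sorts log lines into the expected first-run message and the actual download failures. The marker strings are copied from the logs posted in this thread; the category labels and function names are hypothetical, and this is not an official BOINC or Docker tool, so treat it as an illustration only.

```python
# Marker strings taken from the logs quoted in this thread.
EXPECTED_FIRST_RUN = ("No such image or container",)
PULL_FAILURES = (
    "filesystem layer verification failed",
    "Error while pulling image",
    "no DNS servers",
)


def classify_line(line):
    """Label one log line (labels are hypothetical, for illustration only)."""
    if any(marker in line for marker in PULL_FAILURES):
        return "pull_failure"
    if any(marker in line for marker in EXPECTED_FIRST_RUN):
        return "expected_first_run"
    return "other"


def summarize(log_text):
    """Count how many lines of a log fall into each category."""
    counts = {"expected_first_run": 0, "pull_failure": 0, "other": 0}
    for line in log_text.splitlines():
        counts[classify_line(line)] += 1
    return counts
```

A log whose only flagged line is in the "expected_first_run" category points to a normal first run. Any "pull_failure" line suggests the image download itself went wrong, which is the case where aborting the job and trying a new one may help.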

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20522 - Posted: 26 Oct 2015, 23:19:37 UTC
Last modified: 26 Oct 2015, 23:28:00 UTC

Just pushed two updates:


  • I had fixed the problem of no camb_legacy jobs being sent out, but that introduced the problem that users were now getting camb_legacy jobs instead of camb_boinc2docker ones. Anyway, this is all fixed now; it took a bug fix in BOINC's scheduler, thanks to David Anderson.
  • I had accidentally deleted the log out button. It's back now on the "Your Account" page. (But why would you want to leave? :)


Next on the TODO list is to fix errors running jobs for Mac users. Hang in there, OS X guys, sorry it's taken this long!

kararom
Joined: 9 Jan 09
Posts: 69
Credit: 29,506,700
RAC: 0
Message 20532 - Posted: 29 Oct 2015, 16:39:29 UTC
Last modified: 29 Oct 2015, 16:39:50 UTC

Is camb_boinc2docker beta test now?

http://www.cosmologyathome.org/apps.php

P.S.: Button B i u k Quote Code List List= Img URL - not working

Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 422
Credit: 4,276
RAC: 0
Message 20533 - Posted: 29 Oct 2015, 16:45:32 UTC - in response to Message 20532.
Last modified: 29 Oct 2015, 16:45:41 UTC

Is camb_boinc2docker beta test now?

http://www.cosmologyathome.org/apps.php

Yeah, it still is. There's still a Mac issue to fix, and I'd like to upgrade to Docker 1.9.0 before removing the tag, which will likely take another week or so.

P.S.: Button B i u k Quote Code List List= Img URL - not working

Thanks, I did notice that too, will look into it.

newman
Joined: 25 Oct 08
Posts: 3
Credit: 153,670
RAC: 1
Message 20538 - Posted: 30 Oct 2015, 20:50:44 UTC - in response to Message 20533.

Now it also works for me to crunch camb_boinc2docker WUs. But it seems they do not respect a reduced CPU usage limit. I set it to 66% to be able to play, but BOINC still uses 100%, even when a new WU is started.
