New Server is Live
Author | Message |
---|---|
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Upgrade completed successfully. |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Today, with Kevin's help, we completed the upgrade of the C@H server. Everything appears to have gone well, and we don't anticipate that it will disrupt the jobs you were running. However, please comment here with any issues you find related to the upgrade; we'll use this thread as tech support. Please also make sure you read the new requirements and FAQ. This upgrade is a big milestone for us. It's the first time in several years that the server or the application has been upgraded. It sets us up to deploy and update future applications far more easily than before, and I'm really excited about what we can and will do at C@H in the future! So what exactly is new?
|
Jim1348 Joined: 17 Nov 14 Posts: 92 Credit: 4,031,155 RAC: 148 |
After a shaky start, things are going well. The first one got stuck at 0.100 percent, so I aborted it after 10 minutes (it was by then running High Priority). The second one finished in 34 seconds, so it clearly will be a validate error. But the next four finished OK in about 5 1/2 minutes each on six cores of an i7-4770 (Win7 64-bit, VBox 5.0.6). I think it will work. |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
After a shaky start, things are going well. The first one got stuck at 0.100 percent, so I aborted it after 10 minutes (it was by then running High Priority). The second one finished in 34 seconds, so it clearly will be a validate error. But the next four finished OK in about 5 1/2 minutes each on six cores of an i7-4770 (Win7 64-bit, VBox 5.0.6). I think it will work. Thanks for the update. I'm a bit surprised by the stuck job; hopefully it's an exception. The validate error has been happening sporadically to everyone on the beta, as you know. At least it dies very quickly, so there's not much wasted effort. I think both are caused by a bug in Docker (this, if you're really following along), which I believe should be fixed in version 1.9.0, which should be out literally any day now. As soon as it is, I'll update camb_boinc2docker. |
Crystal Pellet Joined: 12 Feb 13 Posts: 23 Credit: 361,123 RAC: 0 |
I'm a bit surprised by the stuck job; hopefully it's an exception. The validate error has been happening sporadically to everyone on the beta, as you know. Is it not caused by this failure: `Error while pulling image: Get https://index.docker.io/v1/repositories/marius311/camb_boinc2docker/images: dial tcp: lookup index.docker.io: no DNS servers` |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
That's right, this is what I think is fixed in 1.9.0, though I'm not 100% sure. We should find out in a few days, and if it's not, there are some other options. |
Jim1348 Joined: 17 Nov 14 Posts: 92 Credit: 4,031,155 RAC: 148 |
Another one got stuck, but at 99+%. After 45 minutes I aborted it; the CPU usage was down to practically zero. camb_boinc2docker_1826_1445437510.429958_0 |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Note to Mac users: I'm aware of a bug affecting Mac that might be causing your job to finish after ~30 seconds with no error, but produce an invalid result. I'll look into fixing it as soon as I can. |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Another one got stuck, but at 99+%. After 45 minutes I aborted it; the CPU usage was down to practically zero. That's weird. The log looks like the calculation actually ran, so this is unlike any other stuck job I've seen so far, where it gets stuck pulling the Docker image at the beginning. Correct me if I'm wrong, you didn't see any stuck jobs on the beta server, right? |
Jim1348 Joined: 17 Nov 14 Posts: 92 Credit: 4,031,155 RAC: 148 |
Correct me if I'm wrong, you didn't see any stuck jobs on the beta server, right? I think that there were a small number there also; probably one every other day or so. I don't recall whether they stuck at the beginning or the end of a job (more likely the end), and I have detached from that server so the logs are no longer available at my end. |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
I just pushed two updates which should fix:
|
Joined: 16 Feb 12 Posts: 2 Credit: 2,004,545 RAC: 393 |
|
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Thanks very much for sorting through and sending these logs; it's really helpful. According to this, they're all getting stuck after the calculation is complete and the VM is shut down, so this has nothing to do with Docker. Unfortunately the log doesn't offer many hints. One thing would help, though I know it's asking a lot, so don't feel obliged: the next time you or anyone else reading this gets a stuck job, before you abort it, go into your BOINC folder, into the subfolder slots/X where X is whatever number this job happens to be, and send me the contents of the various text files you find in there (you can send them via PM). I'll keep looking into this. |
newman Joined: 25 Oct 08 Posts: 3 Credit: 159,318 RAC: 0 |
My new WUs are also all stuck. In the log I find the following: `Guest Log: progress_template 2015-10-24 23:07:44 (7640): Guest Log: params_00.ini 2015-10-24 23:07:44 (7640): Guest Log: params_01.ini 2015-10-24 23:07:44 (7640): Guest Log: params_02.ini 2015-10-24 23:07:44 (7640): Guest Log: params_03.ini 2015-10-24 23:07:44 (7640): Guest Log: params_04.ini 2015-10-24 23:07:44 (7640): Guest Log: Error: No such image or container: marius311/camb_boinc2docker:0.02` |
newman Joined: 25 Oct 08 Posts: 3 Credit: 159,318 RAC: 0 |
0:00:34.466021 VMMDev: Guest Log: b3d362b23ec1: Download complete 00:00:35.137288 VMMDev: Guest Log: time="2015-10-24T21:45:17.667661453Z" level=debug msg="Downloaded b3d362b23ec1a7ba1694e6607b44c5e3fb63d68e5ae01f339c6abe8b0c995601 to tempfile /var/lib/docker/tmp/GetImageBlob710106124" 00:00:50.600772 VMMDev: Guest Log: a7e6eea8e649: Verifying Checksum 00:00:51.200290 VMMDev: Guest Log: time="2015-10-24T21:45:33.802853465Z" level=error msg="filesystem layer verification failed for digest sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" 00:00:53.591200 VMMDev: Guest Log: 757de7f408a1: Verifying Checksum 00:00:54.201561 VMMDev: Guest Log: time="2015-10-24T21:45:36.793200800Z" level=error msg="filesystem layer verification failed for digest sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4" |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
My new WUs are also all stuck. In the log I find the following: That error is actually expected; it just means this is your first time running camb_boinc2docker and the image needs to be downloaded. The problem is that this download fails, which is what the several lines below that show. This is the problem that I believe will be solved in Docker 1.9.0, which is due in a couple of days. Alternatively, if you're eager to get it working now, the failure is pretty sporadic, so you might just try aborting jobs that get stuck and running new ones; once your client gets the image downloaded, it won't have to do it again for subsequent jobs. |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Just pushed two updates:
|
kararom Joined: 9 Jan 09 Posts: 69 Credit: 29,506,700 RAC: 0 |
Is camb_boinc2docker in beta test now? http://www.cosmologyathome.org/apps.php P.S.: The buttons B, i, u, k, Quote, Code, List, List=, Img, and URL are not working. |
Project administrator, Project developer, Project scientist Joined: 29 Jun 15 Posts: 470 Credit: 4,276 RAC: 0 |
Is camb_boinc2docker in beta test now? Yeah, it still is. There's still a Mac issue to fix, and I'd like to upgrade to Docker 1.9.0 before removing the tag, which will likely be another week or so. P.S.: The buttons B, i, u, k, Quote, Code, List, List=, Img, and URL are not working. Thanks, I did notice that too; I will look into it. |
newman Joined: 25 Oct 08 Posts: 3 Credit: 159,318 RAC: 0 |
Now it also works for me to crunch camb_boinc2docker WUs. But it seems they do not respect a reduced CPU usage setting. I set it to 66% to be able to play, but BOINC still uses 100%, even when a new WU is started.
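
For anyone who wants to follow the request earlier in this thread to send the contents of the text files from a stuck job's slots/X folder, here is a minimal sketch in Python. It is not a project tool: the function names `collect_slot_text` and `format_report` are made up for illustration, and it assumes that "text file" can be approximated as "a file with no NUL bytes near the start", which should skip VM disk images but may not match every case. Only the first 200,000 bytes of each file are read, an arbitrary cap chosen so large files are not loaded whole.

```python
from pathlib import Path


def collect_slot_text(slot_dir, max_bytes=200_000):
    """Return {filename: text} for text-like files directly inside slot_dir.

    slot_dir is the path to one slots/X folder in your BOINC data folder.
    The NUL-byte check is a heuristic (an assumption of this sketch).
    """
    collected = {}
    for path in sorted(Path(slot_dir).iterdir()):
        if not path.is_file():
            continue  # ignore subfolders
        with path.open("rb") as fh:
            data = fh.read(max_bytes)  # read only the start of each file
        if b"\x00" in data:
            continue  # looks binary (e.g. a VM disk image), so skip it
        collected[path.name] = data.decode("utf-8", errors="replace")
    return collected


def format_report(collected):
    """Join the collected files into one block of text to paste into a PM."""
    parts = []
    for name, text in collected.items():
        parts.append(f"===== {name} =====\n{text.rstrip()}\n")
    return "\n".join(parts)
```

A possible use would be `print(format_report(collect_slot_text("slots/0")))`, run from the BOINC data folder, where `slots/0` stands for whichever slot the stuck job occupies. Run it before aborting the job, since the slot folder is cleared afterwards.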