CAMB 2.02
Author | Message |
---|---|
Joined: 17 Jul 07 Posts: 302 Credit: 5,006,319 RAC: 0 |
Linux progress indicator appears to be working again. PS: It still counts up to 15-20% real quick (< 3 minutes), then slows down. Boinc Button Abuser In Training >My Shrubbers< |
Volunteer moderator Project administrator Project developer Joined: 1 Apr 07 Posts: 662 Credit: 13,742 RAC: 0 |
Linux progress indicator appears to be working again. I don't think I'm ever going to be able to get it to update the progress bar at a constant interval, sorry. Some integrations just take longer to complete than others. Scott Kruger Project Administrator, Cosmology@Home |
Volunteer moderator Volunteer tester Joined: 25 Jun 07 Posts: 508 Credit: 2,282,158 RAC: 0 |
Linux progress indicator appears to be working again. Scott, my observations so far show that the progress moves fast from 0-25%, slowly from 25-75%, and then fast again over the last 25%. Hope that helps quantify it... don't give up yet! |
Volunteer moderator Volunteer tester Joined: 25 Jun 07 Posts: 508 Credit: 2,282,158 RAC: 0 |
I would like to add that at 95% it pauses for a minute and then uploads, every time I have watched it. It also pauses at exactly 70% and then moves up fast. |
ronald.s.larsen Joined: 28 Oct 07 Posts: 2 Credit: 105,900 RAC: 0 |
I think I am having a problem with the 2.02 version. It downloaded about 01.45 UMT and since then the workunits restart about every four minutes (according to the message log). I am running two units: the first runs quickly up to 14.0% and then restarts, the second hits 11.9% before it restarts. When the units restart, they increment by 0.700% at intervals of about 16 seconds. When they hit their top level (14.000% and 11.900% respectively), the CPU timer restarts. At around 01:16, the progress indicator resets and begins the 0.700% increments again. This pattern has been repeating since the 2.02 version was downloaded. (Units under 2.01 did not exhibit this behavior.) |
Joined: 1 Jul 07 Posts: 37 Credit: 208,284 RAC: 0 |
I too have noticed a strange behaviour in Camb 2.02. Here are the logs for the first 2 WUs, trimmed to show only the start, stop and checkpointing. I have coloured them so that you can see which is which.
29/10/2007 01:10:25|Cosmology@Home|Starting wu_102807_140108_0_0
29/10/2007 01:10:25|Cosmology@Home|Starting task wu_102807_140108_0_0 using camb version 202
29/10/2007 01:10:26|Cosmology@Home|Starting wu_102707_190121_4_0
29/10/2007 01:10:26|Cosmology@Home|Starting task wu_102707_190121_4_0 using camb version 202
29/10/2007 01:11:11|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:19|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:39|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:10|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:56|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 03:22:16|Cosmology@Home|Computation for task wu_102807_140108_0_0 finished
29/10/2007 03:23:05|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:14|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:33|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:04|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:52|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 05:37:30|Cosmology@Home|Computation for task wu_102707_190121_4_0 finished
The first point to note is that there are only 5 checkpoints and they all occur in the space of about 1 minute. The other point is that the first WU checkpoints almost immediately whilst the second WU checkpoints only after the first one finishes.
Is this a coincidence or are the two instances of Camb 2.02 running on a dual core interfering with each other? This might also give a clue to rslarsen's problems. I'm away for a day or so but I will leave the system running to see if it gives any further clues. Dave. Edit: Oops, running Win XP Pro on an AMD X2 4200. |
Volunteer moderator Volunteer tester Joined: 15 Jun 07 Posts: 345 Credit: 50,500 RAC: 0 |
Hmm, for me it doesn't even say it's checkpointing, just the restarting is logged.
29-Oct-07 10:38:02|Cosmology@Home|Starting wu_102807_140622_0_1
29-Oct-07 10:38:02|Cosmology@Home|Starting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:43:00|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:48:32|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:54:04|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 11:04:35|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 11:10:08|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 11:15:43|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
I do have the checkpoint_debug flag on and other projects show they checkpoint:
29-Oct-07 10:15:31|N-Queens@home|[checkpoint_debug] result Nq24_08_01_19_10_1 checkpointed
29-Oct-07 10:21:18|N-Queens@home|[checkpoint_debug] result Nq24_08_01_19_10_1 checkpointed
29-Oct-07 10:26:39|N-Queens@home|[checkpoint_debug] result Nq24_08_01_19_10_1 checkpointed
CAMB 2.01 showed it was checkpointing:
29-Oct-07 3:03:18|Cosmology@Home|[checkpoint_debug] result wu_102607_190047_5_1 checkpointed
29-Oct-07 3:03:36|Cosmology@Home|[checkpoint_debug] result wu_102607_190047_5_1 checkpointed
29-Oct-07 3:03:56|Cosmology@Home|[checkpoint_debug] result wu_102607_190047_5_1 checkpointed
29-Oct-07 3:05:40|Cosmology@Home|Computation for task wu_102607_190047_5_1 finished
I wonder if this is a weird side-effect of BOINC 5.10.27 in combination with CAMB 2.02 |
Joined: 1 Jul 07 Posts: 37 Credit: 208,284 RAC: 0 |
Hmm, for me it doesn't even say it's checkpointing, just the restarting is logged. I wonder if this is a weird side-effect of BOINC 5.10.27 in combination with CAMB 2.02
Jord, I'm using 5.10.26. I too have now had a WU go through without any checkpoints and a fourth where the checkpointing only occurred twice:
29/10/2007 05:45:47|Cosmology@Home|[checkpoint_debug] result wu_102707_190020_1_0 checkpointed
29/10/2007 05:46:19|Cosmology@Home|[checkpoint_debug] result wu_102707_190020_1_0 checkpointed
Very strange. Dave. |
Volunteer moderator Volunteer tester Joined: 15 Jun 07 Posts: 345 Credit: 50,500 RAC: 0 |
Ah thanks for that. Side effect of CAMB 2.02 then. :-) I notice with each restart that both the CPU time and the percentage done start from zero. It takes 2 minutes to reach 10%, then CPU time switches to --- for the remainder of the time, before it restarts again. Does it even do anything? My stderr.txt in the slot it's running in has not been written to for over an hour. It's still at 0KB. I'm going out on a limb here: I'll put Cosmo on NNT (No New Tasks), abort this task and update the project. I feel it'll never finish this task, what with all the restarting and not saving its state. |
Joined: 1 Jul 07 Posts: 37 Credit: 208,284 RAC: 0 |
I notice with each restart that both the CPU time and the percentage done start from zero. It takes 2 minutes to reach 10%, then CPU time switches to --- for the remainder of the time, before it restarts again. Does it even do anything? My stderr.txt in the slot it's running in has not been written to for over an hour. It's still at 0KB.
Jord, IF BOINC is telling the truth about Camb 2.02 checkpointing, then restarting is going to lose a lot of work and could easily lead to a WU never finishing. None of the versions of Camb have ever written to stderr.txt in the slot for me. It is always 0KB. The strange thing is that I have seen results for a few users with stderr.txt in their result but I have never worked out why. Must away now, Dave. |
Volunteer moderator Volunteer tester Joined: 15 Jun 07 Posts: 345 Credit: 50,500 RAC: 0 |
It's possible it never writes to stderr.txt, I must say I never checked that. But there weren't any .cp files written either. Before my computer is stuck retrying the same task over and over for the day, I feel it's better it puts its resources on the other projects I crunch for, at least until Scott has had a chance to peek in and give an explanation. :-) |
zettabyte Joined: 25 Oct 07 Posts: 2 Credit: 43,700 RAC: 0 |
Same 2.02 restart problem here: a restart from scratch about every 5 minutes, but only on single-core systems, on both AMD/Linux and Intel/WinXP. This does not happen on multicore AMD/Intel computers; everything is OK there so far. |
Joined: 17 Jul 07 Posts: 302 Credit: 5,006,319 RAC: 0 |
Ah thanks for that. Side effect of CAMB 2.02 then. :-)
Note 1- When I restarted 2 of my machines after turning on checkpoint_debug, my Linux machine started the restart/exit/restart thing. A project reset corrected it. Also note that all tasks you had assigned prior to the reset are sent back to you, so you only lose the processing time on the running WUs.
Note 2- Windows and Linux look like they are behaving differently. If checkpoint_debug shows different things, I will post the results to different threads so Scott doesn't have to dig through this one for different info.
Note 3- It is odd that checkpoint_debug did not produce any output until a new WU started after the restarted WU finished. Boinc bug?
Boinc Button Abuser In Training >My Shrubbers< |
Joined: 3 Aug 07 Posts: 35 Credit: 153,234 RAC: 0 |
Something strange going on here. Quad core / Vista.
2 units of Cosmology are running (have not seen any restarting issues).
1 unit of malariacontrol is running.
1 unit of LHC@Home claims to be running but CPU time is stuck.
1 unit of Cosmology claims that it is waiting to run, however both CPU time and progress bar are counting up.
All Cosmology units are CAMB 2.02. While I was typing the above, the LHC unit's CPU time started counting up again. |
ronald.s.larsen Joined: 28 Oct 07 Posts: 2 Credit: 105,900 RAC: 0 |
I first tried aborting the running workunits, thinking that there might be something wrong with the workunits themselves. Then I reset the project. The project dutifully pulled down two new units. This time, the units worked up to 13.300% before resetting, also incrementing by 0.700% each step. I have suspended the project for the time being. I don't know that much about the Boinc software, or the CAMB application, meaning I have no idea where to look or what to test. |
Joined: 11 Aug 07 Posts: 63 Credit: 1,843,380 RAC: 0 |
We have 5 machines: four Core 2 Duos and one P4. Four run Vista and one runs XP, and all of them are running CAMB 2.02 and finishing just fine. Boinc 5.10.20 |
Joined: 3 Aug 07 Posts: 35 Credit: 153,234 RAC: 0 |
The unit that ran while claiming to be in "waiting to run" status finished and seems to have been granted credit. |
Joined: 28 Jun 07 Posts: 12 Credit: 47,000 RAC: 0 |
Just noticed that this result that I reported a short time ago has claimed 2,026 credits. Obviously this is wrong, and with fixed credits here it doesn't matter, but it might be worth further investigation? We'll see what the wingman laurenu2 claims. Other results are making expected claims; just that one is way out. |
Volunteer tester Joined: 8 Jun 07 Posts: 175 Credit: 446,074 RAC: 0 |
Not sure which CAMB 2.02 thread this should be under, but ever since I started running 2.02, I have noticed a lot of restarts by the computer. I almost never saw that on the older WUs.
10/30/2007 8:29:34 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:29:34 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:32:21 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:32:21 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:32:53 PM|Cosmology@Home|Sending scheduler request: Requested by user
10/30/2007 8:32:53 PM|Cosmology@Home|Reporting 2 tasks
10/30/2007 8:32:58 PM|Cosmology@Home|Scheduler RPC succeeded [server version 601]
10/30/2007 8:32:58 PM|Cosmology@Home|Deferring communication for 7 sec
10/30/2007 8:32:58 PM|Cosmology@Home|Reason: requested by project
10/30/2007 8:34:17 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:34:36 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:37:48 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:37:48 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:39:42 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:39:42 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:43:11 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:43:11 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:44:28 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:44:47 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:48:36 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:48:36 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:49:14 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:49:33 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202 |
Volunteer moderator Project administrator Project developer Joined: 1 Apr 07 Posts: 662 Credit: 13,742 RAC: 0 |
I've released CAMB 2.03, which rolls back the last changes I made. I'll release an update when I figure out why this behavior is happening. Scott Kruger Project Administrator, Cosmology@Home |
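
For anyone who wants to quantify the restart pattern reported in this thread, here is a minimal sketch that tallies "Restarting task" and "[checkpoint_debug]" lines per task and measures the gap between consecutive restarts. It assumes the `timestamp|project|message` layout seen in the logs quoted above; the names `summarize` and `restart_gaps` are made up for this example and are not part of BOINC, and the default timestamp format only fits the `29-Oct-07 10:43:00` style, so other clients would need a different `time_format`.

```python
import re
from collections import defaultdict
from datetime import datetime

# Patterns matching the message text seen in the logs quoted in this thread.
RESTART = re.compile(r"Restarting task (\S+) using")
CHECKPOINT = re.compile(r"\[checkpoint_debug\] result (\S+) checkpointed")


def summarize(lines, time_format="%d-%b-%y %H:%M:%S"):
    """Return {task: {"restarts": [datetime, ...], "checkpoints": int}}."""
    tasks = defaultdict(lambda: {"restarts": [], "checkpoints": 0})
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        stamp, _project, message = parts
        match = RESTART.search(message)
        if match:
            when = datetime.strptime(stamp, time_format)
            tasks[match.group(1)]["restarts"].append(when)
            continue
        match = CHECKPOINT.search(message)
        if match:
            tasks[match.group(1)]["checkpoints"] += 1
    return dict(tasks)


def restart_gaps(restarts):
    """Seconds between consecutive restarts of one task."""
    return [(b - a).total_seconds() for a, b in zip(restarts, restarts[1:])]


# Sample lines copied from one of the posts above.
sample = [
    "29-Oct-07 10:38:02|Cosmology@Home|Starting task wu_102807_140622_0_1 using camb version 202",
    "29-Oct-07 10:43:00|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202",
    "29-Oct-07 10:48:32|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202",
    "29-Oct-07 10:54:04|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202",
]
summary = summarize(sample)
task = summary["wu_102807_140622_0_1"]
print(len(task["restarts"]), task["checkpoints"], restart_gaps(task["restarts"]))
# prints: 3 0 [332.0, 332.0]
```

On that sample the task restarts every 332 seconds and never checkpoints, which matches the "restart about every 5 minutes" reports. A task with many restarts and zero checkpoints is the case where each restart would throw away all work done so far.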