Forums : Technical Support : CAMB 2.02
Profile ohiomike
Joined: 17 Jul 07
Posts: 302
Credit: 5,006,319
RAC: 0
Message 3650 - Posted: 28 Oct 2007, 21:37:32 UTC
Last modified: 28 Oct 2007, 22:12:02 UTC

Linux progress indicator appears to be working again.
PS- It still counts up to 15-20% real quick (< 3 minutes), then slows down.

Boinc Button Abuser In Training >My Shrubbers<
Profile Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 3651 - Posted: 28 Oct 2007, 22:18:07 UTC - in response to Message 3650.  

Linux progress indicator appears to be working again.
PS- It still counts up to 15-20% real quick (< 3 minutes), then slows down.

I don't think I'm ever going to be able to get it to update the progress bar at a constant interval, sorry. Some integrations just take longer to complete than others.
Scott Kruger
Project Administrator, Cosmology@Home
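
One way to picture why the bar moves unevenly: if progress is reported as the fraction of integration steps completed, it speeds up and slows down whenever the steps differ in cost. The following is only a minimal sketch under that assumption. The function names and the per-step costs are hypothetical and are not taken from CAMB.

```python
def progress_by_count(done, total):
    """Fraction of steps finished, regardless of how long each takes."""
    return done / total


def progress_by_cost(costs, done):
    """Fraction of the estimated total cost finished after `done` steps."""
    return sum(costs[:done]) / sum(costs)


# Hypothetical relative costs: two cheap steps, two expensive ones, one cheap.
costs = [1, 1, 10, 10, 1]
for done in range(len(costs) + 1):
    by_count = progress_by_count(done, len(costs))
    by_cost = round(progress_by_cost(costs, done), 3)
    print(done, by_count, by_cost)
```

Counting steps jumps to 40% after the two cheap steps even though little of the work is done. Weighting by cost is only as good as the cost estimates, which may be why a constant update rate is hard to achieve.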
Profile Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 3652 - Posted: 28 Oct 2007, 22:21:12 UTC - in response to Message 3651.  
Last modified: 28 Oct 2007, 22:21:39 UTC

Linux progress indicator appears to be working again.
PS- It still counts up to 15-20% real quick (< 3 minutes), then slows down.

I don't think I'm ever going to be able to get it to update the progress bar at a constant interval, sorry. Some integrations just take longer to complete than others.



Scott, my observations so far show that the progress moves fast from 0-25%, slowly from 25-75%, and then fast again for the last 25%. Hope that helps quantify it... don't give up yet!
Profile Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 3655 - Posted: 29 Oct 2007, 1:16:29 UTC
Last modified: 29 Oct 2007, 1:43:40 UTC

I would like to add that, every time I have watched it, it pauses for a minute at 95% and then uploads, and it also pauses at exactly 70% and then moves up fast.
ronald.s.larsen

Joined: 28 Oct 07
Posts: 2
Credit: 105,900
RAC: 0
Message 3659 - Posted: 29 Oct 2007, 7:26:18 UTC
Last modified: 29 Oct 2007, 7:27:45 UTC

I think I am having a problem with the 2.02 version.
It downloaded about 01.45 UMT and since then the workunits restart about every four minutes (according to the message log).
I am running two units: the first runs quickly up to 14.0% and then restarts, the second hits 11.9% before it restarts.
When the units restart, they increment by 0.700% at intervals of about 16 seconds. When they hit their top level (14.000% and 11.900% respectively), the CPU timer restarts. At around 01:16, the progress indicator resets and begins the 0.700% increments again.

This pattern has been repeating since the 2.02 version was downloaded. (Units under 2.01 did not exhibit this behavior.)
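
The figures above can be cross-checked with a little arithmetic. This is a rough sketch that treats the 0.700% step and the 16-second interval as exact, although both are approximate readings. The constant and function names are made up for the illustration.

```python
STEP_PERCENT = 0.7   # progress gained per update, as reported above
STEP_SECONDS = 16    # approximate time between updates, as reported above


def seconds_to_reach(percent):
    """Estimate the wall time needed to climb from 0% to `percent`."""
    steps = round(percent / STEP_PERCENT)
    return steps * STEP_SECONDS


for top in (14.0, 11.9):
    print(top, seconds_to_reach(top))
```

With the reported figures this comes to 320 s for the unit that tops out at 14.0% and 272 s for the one that tops out at 11.9%, which is in the same range as the restart interval seen in the message log.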
Profile Sou'westerly

Joined: 1 Jul 07
Posts: 37
Credit: 208,284
RAC: 0
Message 3660 - Posted: 29 Oct 2007, 10:08:07 UTC
Last modified: 29 Oct 2007, 10:59:26 UTC

I too have noticed a strange behaviour in Camb 2.02. Here are the logs for the first 2 WUs cleaned up only to show the start, stop and checkpointing. I have coloured them so that you can see which is which.

29/10/2007 01:10:25|Cosmology@Home|Starting wu_102807_140108_0_0
29/10/2007 01:10:25|Cosmology@Home|Starting task wu_102807_140108_0_0 using camb version 202

29/10/2007 01:10:26|Cosmology@Home|Starting wu_102707_190121_4_0
29/10/2007 01:10:26|Cosmology@Home|Starting task wu_102707_190121_4_0 using camb version 202

29/10/2007 01:11:11|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:19|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:39|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:10|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:56|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 03:22:16|Cosmology@Home|Computation for task wu_102807_140108_0_0 finished

29/10/2007 03:23:05|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:14|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:33|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:04|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:52|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 05:37:30|Cosmology@Home|Computation for task wu_102707_190121_4_0 finished


The first point to note is that there are only 5 checkpoints and they all occur in the space of about 1 minute.
The other point is that the first WU checkpoints almost immediately whilst the second WU checkpoints only after the first one finishes. Is this a coincidence or are the two instances of Camb 2.02 running on a dual core interfering with each other? This might also give a clue to rslarsen's problems.
I'm away for a day or so but I will leave the system running to see if it gives any further clues.
Dave.

Edit: Oops, running Win XP Pro on an AMD X2 4200.
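
For anyone who wants to repeat this on their own log, counting the checkpoint lines per task can be scripted. This is a minimal sketch that assumes the pipe-separated message format shown above. The function name is hypothetical and the sample is a shortened excerpt of the log in this post.

```python
from collections import Counter

LOG = """\
29/10/2007 01:10:25|Cosmology@Home|Starting task wu_102807_140108_0_0 using camb version 202
29/10/2007 01:11:11|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:19|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:56|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 03:22:16|Cosmology@Home|Computation for task wu_102807_140108_0_0 finished
"""


def checkpoints_per_task(log_text):
    """Count '[checkpoint_debug] result <task> checkpointed' lines per task."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp, project, message
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        message = parts[2]
        if message.startswith("[checkpoint_debug] result ") and message.endswith(" checkpointed"):
            counts[message.split()[2]] += 1  # third word is the task name
    return counts


print(checkpoints_per_task(LOG))
```

A task that runs for two hours but shows only a handful of checkpoints, all in the first few minutes, stands out immediately in the resulting counts.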
Profile Jord
Volunteer moderator
Volunteer tester
Joined: 15 Jun 07
Posts: 345
Credit: 50,500
RAC: 0
Message 3662 - Posted: 29 Oct 2007, 10:27:39 UTC
Last modified: 29 Oct 2007, 10:30:15 UTC

Hmm, for me it doesn't even say it's checkpointing, just the restarting is logged.

29-Oct-07 10:38:02|Cosmology@Home|Starting wu_102807_140622_0_1
29-Oct-07 10:38:02|Cosmology@Home|Starting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:43:00|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:48:32|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:54:04|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 11:04:35|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 11:10:08|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 11:15:43|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202

I do have the checkpoint_debug flag on and other projects show they checkpoint:
29-Oct-07 10:15:31|N-Queens@home|[checkpoint_debug] result Nq24_08_01_19_10_1 checkpointed
29-Oct-07 10:21:18|N-Queens@home|[checkpoint_debug] result Nq24_08_01_19_10_1 checkpointed
29-Oct-07 10:26:39|N-Queens@home|[checkpoint_debug] result Nq24_08_01_19_10_1 checkpointed

CAMB 2.01 showed it was checkpointing:
29-Oct-07 3:03:18|Cosmology@Home|[checkpoint_debug] result wu_102607_190047_5_1 checkpointed
29-Oct-07 3:03:36|Cosmology@Home|[checkpoint_debug] result wu_102607_190047_5_1 checkpointed
29-Oct-07 3:03:56|Cosmology@Home|[checkpoint_debug] result wu_102607_190047_5_1 checkpointed
29-Oct-07 3:05:40|Cosmology@Home|Computation for task wu_102607_190047_5_1 finished


I wonder if this is a weird side-effect of BOINC 5.10.27 in combination with CAMB 2.02
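
To put numbers on the restart rhythm, the gaps between the "Starting"/"Restarting" lines can be computed from the timestamps. This is a small sketch that assumes the date format shown in this post and an English locale for the month abbreviation. The function name is hypothetical and the sample is the first four lines of the log above.

```python
from datetime import datetime

LOG = """\
29-Oct-07 10:38:02|Cosmology@Home|Starting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:43:00|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:48:32|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
29-Oct-07 10:54:04|Cosmology@Home|Restarting task wu_102807_140622_0_1 using camb version 202
"""


def start_gaps(log_text, fmt="%d-%b-%y %H:%M:%S"):
    """Return the seconds between consecutive task starts or restarts."""
    times = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp, project, message
        if len(parts) == 3 and parts[2].startswith(("Starting task", "Restarting task")):
            times.append(datetime.strptime(parts[0], fmt))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(start_gaps(LOG))
```

For these four lines the gaps are 298, 332 and 332 seconds, so the task is being restarted roughly every five and a half minutes.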
Profile Sou'westerly

Joined: 1 Jul 07
Posts: 37
Credit: 208,284
RAC: 0
Message 3663 - Posted: 29 Oct 2007, 10:53:15 UTC - in response to Message 3662.  

Hmm, for me it doesn't even say it's checkpointing, just the restarting is logged.

I wonder if this is a weird side-effect of BOINC 5.10.27 in combination with CAMB 2.02


Jord, I'm using 5.10.26. I too have now had a WU go through without any checkpoints and a fourth where the checkpointing only occurred twice:

29/10/2007 05:45:47|Cosmology@Home|[checkpoint_debug] result wu_102707_190020_1_0 checkpointed
29/10/2007 05:46:19|Cosmology@Home|[checkpoint_debug] result wu_102707_190020_1_0 checkpointed

Very strange. Dave.
Profile Jord
Volunteer moderator
Volunteer tester
Joined: 15 Jun 07
Posts: 345
Credit: 50,500
RAC: 0
Message 3664 - Posted: 29 Oct 2007, 11:00:06 UTC
Last modified: 29 Oct 2007, 11:07:39 UTC

Ah thanks for that. Side effect of CAMB 2.02 then. :-)

I notice with each restart that both the CPU time and the percentage done start from zero. It takes 2 minutes to reach 10%, then CPU time switches to --- for the remainder of the time, before it restarts again. Does it even do anything? My stderr.txt in the slot it's running in has not been written to for over an hour. It's still at 0KB.

I'm going out on a limb here: I'll put Cosmo on NNT (No New Tasks), abort this task and update the project, for I feel it'll never finish this task, what with all the restarting and not saving its state.
Profile Sou'westerly

Joined: 1 Jul 07
Posts: 37
Credit: 208,284
RAC: 0
Message 3665 - Posted: 29 Oct 2007, 11:15:49 UTC - in response to Message 3664.  

I notice with each restart that both the CPU time and the percentage done start from zero. It takes 2 minutes to reach 10%, then CPU time switches to --- for the remainder of the time, before it restarts again. Does it even do anything? My stderr.txt in the slot it's running in has not been written to for over an hour. It's still at 0KB.


Jord, if BOINC is telling the truth about Camb 2.02 checkpointing, then restarting is going to lose a lot of work and could easily lead to a WU never finishing.
None of the versions of Camb have ever written to stderr.txt in the slot for me. It is always 0KB. The strange thing is that I have seen results for a few users with stderr.txt in their result but I have never worked out why.
Must away now, Dave.

Profile Jord
Volunteer moderator
Volunteer tester
Joined: 15 Jun 07
Posts: 345
Credit: 50,500
RAC: 0
Message 3666 - Posted: 29 Oct 2007, 11:27:29 UTC - in response to Message 3665.  

It's possible it never writes to stderr.txt, I must say I never checked that. But there weren't any .cp files written either. Before my computer is stuck retrying the same task over and over for the day, I feel it's better it puts its resources on the other projects I crunch for, at least until Scott has had a chance to peek in and give an explanation. :-)
zettabyte

Joined: 25 Oct 07
Posts: 2
Credit: 43,700
RAC: 0
Message 3668 - Posted: 29 Oct 2007, 12:56:14 UTC

Same 2.02 restart problem here: the task restarts from scratch about every 5 minutes, but only on single-core systems, on both AMD/Linux and Intel/WinXP. This does not happen on multicore AMD/Intel computers; everything is OK there so far.
Profile ohiomike
Joined: 17 Jul 07
Posts: 302
Credit: 5,006,319
RAC: 0
Message 3672 - Posted: 29 Oct 2007, 14:40:27 UTC - in response to Message 3664.  
Last modified: 29 Oct 2007, 14:52:49 UTC

Ah thanks for that. Side effect of CAMB 2.02 then. :-)

I notice with each restart that both the CPU time and the percentage done start from zero. It takes 2 minutes to reach 10%, then CPU time switches to --- for the remainder of the time, before it restarts again. Does it even do anything? My stderr.txt in the slot it's running in has not been written to for over an hour. It's still at 0KB.

I'm going out on a limb here: I'll put Cosmo on NNT (No New Tasks), abort this task and update the project, for I feel it'll never finish this task, what with all the restarting and not saving its state.

Note 1- When I restarted 2 of my machines after turning on checkpoint_debug, my Linux machine started the restart/exit/restart thing. A project reset corrected it. Also note that all tasks you had assigned prior to the reset are sent back to you, so you only lose the processing time on the running WUs.

Note 2- Windows and Linux look like they are behaving differently. If checkpoint_debug shows different things, I will post the results to different threads so Scott doesn't have to dig through this one for different info.

Note 3- It is odd that checkpoint_debug did not produce any output until a new WU started after the restarted WU finished. Boinc bug?
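
For readers who have not switched the flag on yet: checkpoint_debug is one of the BOINC client's log flags and is set in cc_config.xml in the BOINC data directory. The snippet below is only a sketch that builds such a file's contents. The helper name is hypothetical, and where the file lives and whether the client must be restarted to pick it up depend on your BOINC version.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config.xml tree that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"  # 1 enables the flag
    return root


xml_text = ET.tostring(build_cc_config(["checkpoint_debug"]), encoding="unicode")
print(xml_text)
```

Merge the log_flags section into any existing cc_config.xml by hand rather than overwriting the file, since it may already hold other options.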


Boinc Button Abuser In Training >My Shrubbers<
Profile Campion

Joined: 3 Aug 07
Posts: 35
Credit: 153,234
RAC: 0
Message 3682 - Posted: 29 Oct 2007, 21:45:25 UTC

Something strange going on here.

Quad core / Vista

2 units of cosmology are running (have not seen any restarting issues).

1 unit of malariacontrol is running

1 unit of LHC@Home claims to be running but CPU time is stuck

1 unit of Cosmology claims that it is waiting to run, however both CPU time and progress bar are counting up.

All Cosmology units are CAMB 2.02

While I was typing the above the LHC unit's CPU time is now counting up again.






ronald.s.larsen

Joined: 28 Oct 07
Posts: 2
Credit: 105,900
RAC: 0
Message 3684 - Posted: 29 Oct 2007, 22:15:32 UTC

I tried first to abort the workunits running, thinking that there might be something with the workunits. Then I re-set the project. The project dutifully pulled down two new units. This time, the units worked up to 13.300% before resetting, also incrementing up by 0.700% each step.

I have suspended the project for the meantime. I don't know that much about the Boinc software, or the CAMB application -- meaning I have no idea where to look or what to test.
Profile Beezlebub

Joined: 11 Aug 07
Posts: 63
Credit: 1,843,380
RAC: 0
Message 3687 - Posted: 29 Oct 2007, 23:34:04 UTC
Last modified: 29 Oct 2007, 23:39:26 UTC

We have 5 machines: four Core 2 Duos and one P4. Four run Vista and one runs XP, and all of them are running CAMB 2.02 and finishing just fine.

Boinc 5.10.20
Profile Campion

Joined: 3 Aug 07
Posts: 35
Credit: 153,234
RAC: 0
Message 3693 - Posted: 30 Oct 2007, 5:23:23 UTC

The unit that ran while claiming to be in "waiting to run" status finished and seems to have been granted credit.




Profile Ray Murray
Joined: 28 Jun 07
Posts: 12
Credit: 47,000
RAC: 0
Message 3716 - Posted: 30 Oct 2007, 23:33:17 UTC
Last modified: 30 Oct 2007, 23:38:00 UTC

Just noticed that this result that I reported a short time ago has claimed 2,026 credits. Obviously this is wrong, and with fixed credits here it doesn't matter, but it might be worth further investigation? We'll see what the wingman laurenu2 claims. Other results are making the expected claims; it's just that one that is way out.
Profile [B^S] Acmefrog
Volunteer tester
Joined: 8 Jun 07
Posts: 175
Credit: 446,074
RAC: 0
Message 3718 - Posted: 31 Oct 2007, 0:50:25 UTC

Not sure which CAMB 2.02 thread this should be under, but ever since I started running 2.02 I have noticed a lot of task restarts. I almost never saw that on the older WUs.

10/30/2007 8:29:34 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:29:34 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:32:21 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:32:21 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:32:53 PM|Cosmology@Home|Sending scheduler request: Requested by user
10/30/2007 8:32:53 PM|Cosmology@Home|Reporting 2 tasks
10/30/2007 8:32:58 PM|Cosmology@Home|Scheduler RPC succeeded [server version 601]
10/30/2007 8:32:58 PM|Cosmology@Home|Deferring communication for 7 sec
10/30/2007 8:32:58 PM|Cosmology@Home|Reason: requested by project
10/30/2007 8:34:17 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:34:36 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:37:48 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:37:48 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:39:42 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:39:42 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:43:11 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:43:11 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:44:28 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:44:47 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202
10/30/2007 8:48:36 PM|Cosmology@Home|Restarting task wu_102907_020244_3_0 using camb version 202
10/30/2007 8:48:36 PM|Cosmology@Home|Restarting task wu_102907_150106_0_0 using camb version 202
10/30/2007 8:49:14 PM|Cosmology@Home|Restarting task wu_102907_020424_3_1 using camb version 202
10/30/2007 8:49:33 PM|Cosmology@Home|Restarting task wu_102807_190320_1_0 using camb version 202

Profile Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 3719 - Posted: 31 Oct 2007, 0:56:09 UTC

I've released CAMB 2.03 which rolls back the last changes I made. I'll release an update when I figure out why this behavior is happening.
Scott Kruger
Project Administrator, Cosmology@Home