21) Forums : Technical Support : CAMB 2.02 (Message 3663)
Posted 29 Oct 2007 by Profile Sou'westerly
Post:
Hmm, for me it doesn't even say it's checkpointing, just the restarting is logged.

I wonder if this is a weird side-effect of BOINC 5.10.27 in combination with CAMB 2.02


Jord, I'm using 5.10.26. I too have now had a WU go through without any checkpoints and a fourth where the checkpointing only occurred twice:

29/10/2007 05:45:47|Cosmology@Home|[checkpoint_debug] result wu_102707_190020_1_0 checkpointed
29/10/2007 05:46:19|Cosmology@Home|[checkpoint_debug] result wu_102707_190020_1_0 checkpointed

Very strange. Dave.
22) Forums : Technical Support : CAMB 2.02 (Message 3660)
Posted 29 Oct 2007 by Profile Sou'westerly
Post:
I too have noticed a strange behaviour in Camb 2.02. Here are the logs for the first 2 WUs cleaned up only to show the start, stop and checkpointing. I have coloured them so that you can see which is which.

29/10/2007 01:10:25|Cosmology@Home|Starting wu_102807_140108_0_0
29/10/2007 01:10:25|Cosmology@Home|Starting task wu_102807_140108_0_0 using camb version 202

29/10/2007 01:10:26|Cosmology@Home|Starting wu_102707_190121_4_0
29/10/2007 01:10:26|Cosmology@Home|Starting task wu_102707_190121_4_0 using camb version 202

29/10/2007 01:11:11|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:19|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:39|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:10|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:56|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 03:22:16|Cosmology@Home|Computation for task wu_102807_140108_0_0 finished

29/10/2007 03:23:05|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:14|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:33|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:04|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:52|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 05:37:30|Cosmology@Home|Computation for task wu_102707_190121_4_0 finished


The first point to note is that there are only 5 checkpoints and they all occur in the space of less than 2 minutes.
The other point is that the first WU checkpoints almost immediately whilst the second WU checkpoints only after the first one finishes. Is this a coincidence or are the two instances of Camb 2.02 running on a dual core interfering with each other? This might also give a clue to rslarsen's problems.
I'm away for a day or so but I will leave the system running to see if it gives any further clues.
Dave.

Edit: Oops, running Win XP Pro on an AMD X2 4200.
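
For anyone wanting to repeat the comparison on their own message log, here is a rough Python sketch that counts the [checkpoint_debug] lines per task and measures the time between the first and last one. It assumes the log format shown in the post above (DD/MM/YYYY timestamps, fields separated by "|"); the function name summarise_checkpoints is made up for illustration and is not part of BOINC.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches message-log lines of the form shown in the post, e.g.
# 29/10/2007 01:11:11|Cosmology@Home|[checkpoint_debug] result <task> checkpointed
CHECKPOINT_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\|[^|]*\|"
    r"\[checkpoint_debug\] result (\S+) checkpointed$"
)


def summarise_checkpoints(lines):
    """Return {task: (checkpoint count, seconds from first to last checkpoint)}."""
    times = defaultdict(list)
    for line in lines:
        match = CHECKPOINT_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%d/%m/%Y %H:%M:%S")
            times[match.group(2)].append(stamp)
    return {
        task: (len(stamps), (max(stamps) - min(stamps)).total_seconds())
        for task, stamps in times.items()
    }


# Checkpoint lines copied from the log excerpt above.
log = """\
29/10/2007 01:11:11|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:19|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:11:39|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:10|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 01:12:56|Cosmology@Home|[checkpoint_debug] result wu_102807_140108_0_0 checkpointed
29/10/2007 03:23:05|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:14|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:23:33|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:04|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
29/10/2007 03:24:52|Cosmology@Home|[checkpoint_debug] result wu_102507_170022_3_0 checkpointed
""".splitlines()

for task, (count, span) in summarise_checkpoints(log).items():
    print(f"{task}: {count} checkpoints over {span:.0f} s")
```

Lines that do not match the pattern (start, finish, scheduler messages) are simply skipped, so a whole message log can be fed in unfiltered.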
23) Forums : Technical Support : IMPORTANT - Testing for Stand-Alone CAMB Application Soon (Message 3619)
Posted 28 Oct 2007 by Profile Sou'westerly
Post:

Might be a Linux problem then?

It's working here on my Windows 2000 box. Ran 32 minutes and the progress bar is showing 50.400%. BOINC 5.10.27


I think you are right Jord. I have now had one 2.01 run right through on Win XP and the progress bar worked perfectly as far as I could see. I also have
28/10/2007 11:35:17|Cosmology@Home|[checkpoint_debug] result wu_102607_120232_2_1 checkpointed
so we should see project switching with the BOINC 5.10 series. Well done Scott.
Dave.
24) Forums : Technical Support : there was work but it was committed to other platforms (Message 3603)
Posted 27 Oct 2007 by Profile Sou'westerly
Post:
I get this message on a Macbook pro (Intel dual core 2.2 GHz) running OSX, but I do not get the message (and get work units) on an older Windows XP box with an old Intel 2.2 GHz processor (pre-hyperthreading)

You aren't getting anywhere near the work you could be getting from me if there is no work for my newer machine.


Steve, OSX not yet supported, see here, Dave.
25) Forums : Technical Support : How much Pending "pile-up" is to be expected? (Message 3590)
Posted 27 Oct 2007 by Profile Sou'westerly
Post:
Chris, looking at a dip sample of your farm's results (very impressive) the vast majority of your pendings are currently awaiting your wingmen reporting their results. I only saw a couple being sent out a third time with possible validation problems for you or your wingman and none that were stuck awaiting the validator. With a 10 day deadline I'm hoping that the pendings should start to stabilise soon. Dave.
26) Forums : Technical Support : Someone Please Kick The Server! (Message 3587)
Posted 27 Oct 2007 by Profile Sou'westerly
Post:
Someone please kick the server! I'm getting no work again!!!



I’ve just got work for my AMD4200 which is in the same class as your 4400

27/10/2007 14:56:35|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 284 seconds of work, reporting 0 completed tasks
27/10/2007 14:56:40|Cosmology@Home|Scheduler request succeeded: got 1 new tasks.

What did your log say?
Dave.
27) Forums : Technical Support : active_frac too low (Message 3584)
Posted 27 Oct 2007 by Profile Sou'westerly
Post:
Hi guys -

Sorry to have yet another issue after yesterday's minor attachment SNAFU, but this has been an ongoing problem for many months, and is irritating the heck out of me.

In my client_state file the active_frac value is ridiculously low; something on the order of 0.00048! This has been keeping me from getting more than one WU per core on this specific machine (my other machines are fine), and also forcing one or two of the four WUs to always be running in "high priority" mode (the machine is a Q6600 C2Q). I have no idea how this came about.

To make matters more difficult, I've even manually edited the active_frac value using an HTML text editor, only to have it always revert back to the same extremely low value whenever the BOINC MGR contacts the project (including on initial start-up)! I'm trying to ascertain if the state reverts only during comm or if during other actions of the BOINC MGR, but I'm not fully knowledgeable on when comm is taking place in the normal launching and running cycles of the BOINC MGR.

Is the active_frac value set by the project server upon comm? Or possibly in some (XP) registry key (I checked and didn't see it in the usual BOINC reg keys)?

The only BOINC project currently running on this machine is C@H. To add further insult, I even removed and re-installed the BOINC MGR, to no avail.

I'm really at a loss on this one.

If this issue is beyond the scope of the C@H project forum, I'll head over to the BOINC forums. I hate to bother you guys with something "generic" BOINC if you believe this doesn't fall under the realm of C@H.

Thanks.

Eddie


Eddie, I wonder if your problem is that BOINC is reverting to the client state backup file. Two possible reasons for this could be:
1, You have the file locked open in another program when BOINC wants to read it.
2, Your xml editor is doing something to the file which BOINC does not like. When it reads it to do the comm it fails and reverts to the backup. I use Notepad to edit this file and have just used <active_frac>0.990000</active_frac>. I then saved it under all files with a .xml extension. It has worked fine even after a couple of project updates and has not reverted to the old value. Dave.
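
A quick way to check the second possibility is to see whether the edited file still parses and what value it actually holds. The following is only a sketch in Python: the SAMPLE text is a hypothetical, heavily trimmed stand-in for client_state.xml (the enclosing element names are assumptions, only the active_frac tag comes from the post), and the helper names are made up. A strict XML parser may also be pickier or more lenient than BOINC itself, so treat the result as a rough guide.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for client_state.xml.
# A real file contains many more elements than this.
SAMPLE = """<client_state>
  <time_stats>
    <active_frac>0.000480</active_frac>
  </time_stats>
</client_state>"""


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def read_active_frac(xml_text):
    """Return the first <active_frac> value as a float, or None if absent."""
    root = ET.fromstring(xml_text)
    node = next(root.iter("active_frac"), None)
    return float(node.text) if node is not None else None


print(is_well_formed(SAMPLE), read_active_frac(SAMPLE))
```

To try it on a real file, read the file contents into a string first, ideally with nothing else holding the file open, as in point 1.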

28) Forums : Technical Support : Does resume work? (Message 3526)
Posted 26 Oct 2007 by Profile Sou'westerly
Post:
I've resumed cosmology@home 2 times and both times it lost the work done before.
First time work done at 16%, second time done at 21%.

Thank you.



Luke, as far as I can tell these WUs are resuming from a checkpoint. Scott is currently working on a problem where the WU is not talking properly to BOINC. This means that when you resume the CPU time recorded by BOINC resets to zero and the progress can also appear to drop back. Dave.
29) Forums : Technical Support : IMPORTANT - Testing for Stand-Alone CAMB Application Soon (Message 3481)
Posted 25 Oct 2007 by Profile Sou'westerly
Post:
Back to testing! I’ve turned on checkpoint debugging on a couple of my 5.10 clients and found that Camb 2.00 is not communicating the fact that it has checkpointed to the core client. Therefore it will always run to completion on 5.10 clients. I then tried stopping BOINC with a WU at 44%. On restart the CPU time reset to zero but the progress indicator carried on from 44%. When I checked it 16 minutes later the progress showed only 2.8%. The progress then steadily increased to 96% at which point the WU completed. From the time taken I would guess that it did resume from the checkpoint but I could find no debugging information in Stderr_txt to confirm this. Dave.
30) Forums : Technical Support : IMPORTANT - Testing for Stand-Alone CAMB Application Soon (Message 3470)
Posted 24 Oct 2007 by Profile Sou'westerly
Post:
Looks like I shall have to retire my laptop from the project. It ran 1.27 OK but now the memory requirement seems to be too much for it.

24/10/2007 20:34:41|Cosmology@Home|Starting task wu_102307_201014_2_0 using camb version 200
24/10/2007 20:35:46|Cosmology@Home|Aborting task wu_102307_201014_2_0: exceeded memory limit 273.26MB > 254.21MB
24/10/2007 20:35:46|Cosmology@Home|Reason: Unrecoverable error for result wu_102307_201014_2_0 (Maximum memory exceeded)

A second WU seems to be running though? I'll leave it attached just to see what happens to it first.
Dave
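
To see how far over the limit a task went, the abort line can be picked apart with a small regular expression. This is a sketch in Python that assumes the exact wording of the "exceeded memory limit" message shown above; the function name memory_overrun is invented for the example.

```python
import re

# Pattern based on the abort message quoted in the post above.
MEMORY_RE = re.compile(
    r"Aborting task (\S+): exceeded memory limit ([\d.]+)MB > ([\d.]+)MB"
)


def memory_overrun(line):
    """Return (task, used_mb, limit_mb, excess_mb), or None if the line does not match."""
    match = MEMORY_RE.search(line)
    if match is None:
        return None
    used, limit = float(match.group(2)), float(match.group(3))
    return match.group(1), used, limit, round(used - limit, 2)


line = (
    "24/10/2007 20:35:46|Cosmology@Home|Aborting task wu_102307_201014_2_0: "
    "exceeded memory limit 273.26MB > 254.21MB"
)
print(memory_overrun(line))
```

For the line above the overrun works out at roughly 19 MB, so the task was only a little over what the laptop's preferences allowed.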
31) Forums : Technical Support : Errors (Message 3295)
Posted 17 Oct 2007 by Profile Sou'westerly
Post:
Can you give me an idea of how often you see this error?


I've had a couple, this one, and this one. Both stopped due to parameter error.
Dave.
32) Forums : Technical Support : Warning: Disk Usage On New WU's (Message 1243)
Posted 20 Jul 2007 by Profile Sou'westerly
Post:
Check your first several results of the new WU's. I had 2 systems that had results error out because of disk space problems. The checkpointing looks like it might use up more space than most projects do for that function. (PS- I also run QMC on those machines so it could be the combination of the 2).
At any rate, I had settings of 50% of 10 GB (stock setting), but only had 5 GB available on the partition, so Boinc was limited to 2.5 GB- Not enough.


How much disc usage is required? My preferences are :-
19/07/2007 09:43:21||General prefs: from Einstein@Home (last modified 2007-05-09 23:43:50)
19/07/2007 09:43:21||Host location: home
19/07/2007 09:43:21||General prefs: using separate prefs for home
19/07/2007 09:43:21||Preferences limit memory usage when active to 1023.74MB
19/07/2007 09:43:21||Preferences limit memory usage when idle to 1842.73MB
19/07/2007 09:43:21||Preferences limit disk usage to 93.13GB
but the unit errored out with this:-
20/07/2007 23:27:35|Cosmology@Home|Reason: Unrecoverable error for result wu_072007_140402_0_1 (Maximum disk usage exceeded)
20/07/2007 23:27:42|Cosmology@Home|[error] Can't rename output file wu_072007_140402_0_1_0
20/07/2007 23:27:48|Cosmology@Home|[error] Can't rename output file wu_072007_140402_0_1_1
20/07/2007 23:27:54|Cosmology@Home|[error] Can't rename output file wu_072007_140402_0_1_2
20/07/2007 23:27:59|Cosmology@Home|[error] Can't rename output file wu_072007_140402_0_1_3
20/07/2007 23:28:05|Cosmology@Home|[error] Can't rename output file wu_072007_140402_0_1_4
20/07/2007 23:28:05|Cosmology@Home|Computation for task wu_072007_140402_0_1 finished.
I checked shortly before it finished and it was only using 87.86MB, total BOINC usage was 270MB. Surely the output files aren't bigger than 92GB?
Am I missing something? Dave.
33) Forums : Technical Support : Comments on Fixed Credit System (Message 939)
Posted 11 Jul 2007 by Profile Sou'westerly
Post:
The new version is somehow broken. I'm working on it right now to try to get it fixed.


Scott, could it be something has changed in the validator rather than 1.20 being broken? I’m still crunching through my cache with 1.19 so the new app hasn’t yet affected my results. However at some point between 12:42:51 and 15:31:02 I went from about 99% validation to 100% checked and no consensus. Dave.
34) Forums : Technical Support : Invalid results (Message 866)
Posted 7 Jul 2007 by Profile Sou'westerly
Post:
I am going to try using coarse HR (i.e. one OS per WU) to see if that lowers the invalidation rate.


I’m not sure if this is how you intended it but I get the impression that each work unit is sent to two different OS. In my case it is usually 2 to Windows & 2 to Linux. If there is a disagreement then it’s a straight race as to which system gets the two results back first. In this case BOK and I lost out to the Linux men who were much quicker off the mark! Dave.



35) Forums : Technical Support : Linux/Windows speed difference? (Message 852)
Posted 6 Jul 2007 by Profile Sou'westerly
Post:
Anybody else not seeing any difference?



Scott. Sorry to be a damp squib after all your hard work but any speed up on my AMD 4200+ using XP pro is less than the variation in times between the different work units. I still love the project though :) Dave.
36) Forums : Technical Support : 6 min vs. 13 min runtime (Message 832)
Posted 4 Jul 2007 by Profile Sou'westerly
Post:
Well, I've just happened to watch a cosmo wu finish.
It was at approx. 12:43 min when it went to 100%, but then showed only 6:32 min in the boinc manager.
Sysfried


Sysfried, I’m not seeing this on Windows XP. Is there any chance that this was one of your Linux boxes? In which case it could explain some of the speed differences between Linux and Windows.
Dave.

