Forums : Technical Support : Problem switching projects

EigenState
Joined: 18 Nov 07
Posts: 21
Credit: 111,451
RAC: 0
Message 4013 - Posted: 20 Nov 2007, 18:43:07 UTC

Greetings,

I have just joined Cosmology@Home and encountered a problem on my second work unit: Task ID 1602186. I run Windows XP, SP2, and the C@H application was CAMB 2.04.

The C@H work unit successfully switched over to another project, but upon returning to C@H, the work unit appears to have begun calculation all over again from the very beginning. The observation was that a progress indicator showing approximately 50% completion dropped to 1%, and the time to completion went up to the sum of time already completed plus the original estimated time to completion. I aborted the task. While I can certainly run calculations by manually controlling the distribution of time committed to each project, this is hardly an ideal situation. If there is a solution already in place, please do tell me about it. I do have BOINC set to leave applications in memory while suspended.

Two other observations, both made within the Advanced View of BOINC Manager, might also be of interest.

The %Progress indicator is highly nonlinear, particularly at the beginning of a work unit where it will advance to as much as 25% in a matter of minutes for a work unit taking about 5 hours to actually complete.

The disk usage indicated for C@H is not stable in that it varies wildly with time, and does not come close to reflecting the actual size of the C@H file within the BOINC program file folder.

Best regards,
EigenState
ID: 4013
Benjamin Wandelt
Volunteer moderator
Project administrator
Project scientist
Joined: 24 Jun 07
Posts: 192
Credit: 15,273
RAC: 0
Message 4014 - Posted: 20 Nov 2007, 19:52:21 UTC - in response to Message 4013.  

Hi

No work is lost in these cases. See this thread for a similar problem. It's just that the progress indicator may not be accurate after resuming work, but the work unit does start from the place where it left off. Scott has had a few goes at getting the progress indicator working, but it still seems to be buggy. I will meet with him next week to understand what the problem is and solve it.

This leads people either to think work was lost, because the progress bar begins at 0 again, or to be surprised when a work unit appears to have taken much less time than expected (since the work unit was already almost done before it resumed).

All the best,
Ben

Creator of Cosmology@Home
ID: 4014
EigenState
Joined: 18 Nov 07
Posts: 21
Credit: 111,451
RAC: 0
Message 4015 - Posted: 20 Nov 2007, 20:04:03 UTC - in response to Message 4014.  

Greetings Professor Wandelt,

Thank you for the prompt reply, and I am gratified to know that the research is not being adversely affected. I can certainly live with a progress indicator that is less than ideal. Being an experimentalist rather than a theorist, I am rather used to babysitting apparatus, but very glad not to have to do so now.

I might also say that I do hope that this project will provide a real learning opportunity for the participants and even visitors. Cosmology is a fascinating field. Best of luck with the research!

Best regards,
EigenState
ID: 4015
Sou'westerly
Joined: 1 Jul 07
Posts: 37
Credit: 208,284
RAC: 0
Message 4016 - Posted: 20 Nov 2007, 20:12:38 UTC - in response to Message 4013.  

"I do have BOINC set to leave applications in memory while suspended."

EigenState, what you describe indicates that this work unit was not kept in memory but resumed from a checkpoint. It may be that you either rebooted your system or exited BOINC before the unit restarted. In that case everything will have been removed from memory. If this isn't the case, then you need to check that BOINC is indeed leaving tasks in memory. Dave.
ID: 4016
EigenState
Joined: 18 Nov 07
Posts: 21
Credit: 111,451
RAC: 0
Message 4017 - Posted: 20 Nov 2007, 20:37:25 UTC

Greetings Dave,

Thank you. All indications are that BOINC is indeed set to retain tasks in memory, and I have never experienced a similar problem with work units for other projects. Also, I am quite certain that I did not reboot my system, nor did I exit BOINC. I would be surprised as well to think that no checkpoint had been written since the first moments of 2 hours of calculations.

Given Professor Wandelt's comments above, I will again try allowing the projects to switch on their own. Seems a reasonable experiment.

Best regards,
EigenState
ID: 4017
Jord
Volunteer moderator
Volunteer tester
Joined: 15 Jun 07
Posts: 345
Credit: 50,500
RAC: 0
Message 4018 - Posted: 20 Nov 2007, 21:00:47 UTC - in response to Message 4017.  

"I would be surprised as well to think that no checkpoint had been written since the first moments of 2 hours of calculations."

You can check on such checkpointing progress by enabling the <checkpoint_debug> tag in cc_config.xml.
In the messages log BOINC will then tell you when Cosmo (and other projects) checkpoint.

See How to set up cc_config.xml for more information on (enabling) the various core client messages.
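
For anyone who wants to try this, a minimal cc_config.xml that turns on only the checkpoint messages might look like the sketch below. It assumes the file does not exist yet and that it goes in the BOINC data directory; if you already have a cc_config.xml, add just the <checkpoint_debug> line inside your existing <log_flags> section instead of replacing the file.

```xml
<!-- Minimal sketch: enables checkpoint logging only. -->
<!-- Assumes no other options or log flags are needed. -->
<cc_config>
  <log_flags>
    <!-- 1 = log a message each time a task checkpoints; 0 = off -->
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
</cc_config>
```

The client needs to re-read its configuration (restarting BOINC should do it) before the checkpoint messages start showing up in the Messages tab.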
ID: 4018