Forums : Technical Support : Long running WU aborted by project - no credit

Ruud van der Kroef

Joined: 25 Aug 07
Posts: 12
Credit: 3,298,533
RAC: 0
Message 7327 - Posted: 30 Sep 2008, 12:52:42 UTC

I have a number of systems running unattended; one of them is synstar04.

This morning I noticed a C@H WU running at high priority. It was at approx. 65% and had already been running for about 350 hours. The estimated time to completion was another 250 hours, and the due date was something like 19-09-2008.

I checked the Tasks Status Page for this system, but no tasks are displayed.
I decided to do an Update Project on the Projects page of BOINCManager. Then I looked at the Work page of BOINCManager and found that the WU had been aborted by the project. The Messages page of BOINCManager shows the following:

30-09-2008 12:10:54|Cosmology@Home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
30-09-2008 12:10:59|Cosmology@Home|Scheduler request succeeded: got 0 new tasks
30-09-2008 12:11:04|Cosmology@Home|Computation for task wu_072508_040229_1_4 finished


The task has disappeared: it does not show on the Task Status Page, and no credit has been granted. 350 hours of computation time wasted.

Has this problem been seen before? I cannot imagine that I am the first one to run into this.
I have searched the forum and found other complaints about not getting credit, but nothing about this particular incident.

Regards,
Ruud
ID: 7327
Phoneman1

Joined: 5 Nov 07
Posts: 113
Credit: 3,100,327
RAC: 0
Message 7329 - Posted: 30 Sep 2008, 19:48:03 UTC - in response to Message 7327.  

Has this problem been seen before? I cannot imagine that I am the first one to run into this.


I've not seen any Cosmo task go much over 5.25 hours since my old P4 was retired with a failed power supply a few months ago. It could be a rogue set of variables, but I think it might have to do with checkpointing and how often BOINC switches between tasks if you run multiple projects on your machine. If the switch interval is shorter than the interval between checkpoints, a task can lose its uncheckpointed work each time BOINC switches away from it, so it makes no progress unless BOINC doesn't need to do a switch. That could explain how you got to 65% in 350 hours of processing. I have heard of something similar on climate prediction.
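
The checkpoint argument above can be illustrated with a toy model. This is only a sketch under stated assumptions, not BOINC's actual scheduler: it assumes a task that is switched out is removed from memory and restarts from its last checkpoint, and the function name and parameters are hypothetical.

```python
def retained_minutes(cpu_minutes, switch_interval, checkpoint_interval):
    """Toy model: minutes of work that survive when a task is preempted
    every `switch_interval` minutes and only checkpointed work is kept."""
    full_slices, remainder = divmod(cpu_minutes, switch_interval)
    # Within one slice, only whole checkpoint intervals are preserved;
    # anything computed after the last checkpoint is lost at the switch.
    kept_per_slice = (switch_interval // checkpoint_interval) * checkpoint_interval
    kept_in_tail = (remainder // checkpoint_interval) * checkpoint_interval
    return full_slices * kept_per_slice + kept_in_tail


# Ten hours of CPU time with a 60-minute switch interval.
for checkpoint in (45, 60, 90):
    print(checkpoint, retained_minutes(600, 60, checkpoint))
```

In this model a 90-minute checkpoint interval combined with a 60-minute switch interval retains nothing, which is the "no progress" case described above. A task that stays in memory when preempted, or that is never switched out, would not behave this way.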

As for the task disappearing, that can probably be explained by the fact that the task would have been re-issued to someone else on the 19th; if they completed it that day, it would have been archived within 10 days (provided your original wingman had completed on time too). The project would have issued a cancel-task request on the 19th, but you didn't pick it up, presumably because no update was done.

I'd start by checking the general preferences on your account here and, if you use the home / work / school options, checking those too. Also check the local settings on the machine in question. You should be safe leaving the switch interval at 60 minutes or more (not less).

Because a project can cancel a task but the cancellation doesn't reach your machine until an update is made, it may be worth installing boinccmd and setting up a scheduled task to run an update via that software every 24 hours. Choose an odd time, not on the hour or half hour. If that had been in place, your long-running task would have been canceled on the 19th or 20th, not the 30th.
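
A minimal sketch of such a scheduled update, written as a small Python wrapper around the `boinccmd --project URL update` operation. The project URL below is an assumption and must match the URL the client is actually attached to; the executable name is a parameter because it may differ between client versions; both function names are hypothetical.

```python
import subprocess

# Assumption: the master URL under which the project is attached in the client.
PROJECT_URL = "http://www.cosmologyathome.org/"


def build_update_command(project_url=PROJECT_URL, boinccmd="boinccmd"):
    """Return the argument list that asks the local client to contact the project."""
    return [boinccmd, "--project", project_url, "update"]


def request_update(project_url=PROJECT_URL, boinccmd="boinccmd"):
    """Run the update; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_update_command(project_url, boinccmd), check=True)


# request_update()  # uncomment on a machine where the BOINC client is running
```

The script itself would be run once a day from the operating system's scheduler (cron or Windows Scheduled Tasks), at an off-peak minute such as 03:17 as suggested above.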

Phoneman1

ID: 7329
caspr

Joined: 8 Aug 07
Posts: 54
Credit: 527,780
RAC: 0
Message 7332 - Posted: 1 Oct 2008, 2:46:14 UTC


I've got a 266G p4 xpsp3 and have NEVER run into a wu that long! The wu's have never gone over about 11hrs on that box!
A clear conscience is usually the sign of a bad memory
ID: 7332
Ruud van der Kroef

Joined: 25 Aug 07
Posts: 12
Credit: 3,298,533
RAC: 0
Message 7610 - Posted: 9 Nov 2008, 17:38:10 UTC - in response to Message 7329.  

It has been a while since I reported this problem, but I would like to thank the respondents.

I'd start by checking the general preferences on your account here and if you use the home / work / school options, check them. Also the local settings on the machine in question. You should be safe leaving the switch interval at 60 minutes or more (not less).

I have checked the preferences, and the switch interval has been left unchanged in all profiles: 60 min.

Because a project can cancel a task but that doesn't get passed to your machine until an update is made it may be worth installing Boinccmd and setting up a scheduled task to run an update via that software every 24 hours - choose an odd time not on the hour or half hour etc. If that had been in place your long-running task would have been canceled on the 19th or 20th not the 30th.

Phoneman1

As I mentioned in my problem description, the tasks were running at 'high priority'.
From experience (I have not checked any documentation on this), that means that task switching is disabled for that task.
I also think that updates are disabled in these cases.

In the meantime I have found more of these tasks, but I will discuss them in a separate message.

Thanks and regards,
Ruud
ID: 7610
Ruud van der Kroef

Joined: 25 Aug 07
Posts: 12
Credit: 3,298,533
RAC: 0
Message 7612 - Posted: 9 Nov 2008, 19:00:16 UTC - in response to Message 7610.  
Last modified: 9 Nov 2008, 19:04:17 UTC

Last Wednesday (05/11) I discovered that on this same host, synstar04, there are 3 C@H tasks running at high priority:

Task ID    Name                     Processor time   Progress   Remaining time
11328240   wu_101908_001039_0_2_0   263h             43%        217h
11332781   wu_101908_003211_0_2_0   226h             36.5%      226h
11350047   wu_101908_001559_1_1_1   248h             36%        241h
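
For comparison, the figures in this table can be checked against a naive linear extrapolation (total time = processor time / progress). This is only a sketch: it assumes progress grows linearly with CPU time, which is not necessarily how the client computes its own remaining-time estimate, and the function name is hypothetical.

```python
def linear_remaining_hours(elapsed_hours, progress_fraction):
    """Remaining hours if progress continued at the average rate so far."""
    return elapsed_hours / progress_fraction - elapsed_hours


# (name, processor time in hours, progress, reported remaining hours)
tasks = [
    ("wu_101908_001039_0_2_0", 263, 0.43, 217),
    ("wu_101908_003211_0_2_0", 226, 0.365, 226),
    ("wu_101908_001559_1_1_1", 248, 0.36, 241),
]

for name, elapsed, progress, reported in tasks:
    estimate = linear_remaining_hours(elapsed, progress)
    print(f"{name}: linear estimate {estimate:.0f}h, reported {reported}h")
```

Under that assumption the three tasks would need roughly 350, 390 and 440 more hours, noticeably more than the remaining times shown by the client.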

I aborted task wu_101908_001039_0_2_0 just to see what would happen.
Checking the task webpage for that computer shows that the task has been aborted as expected, and of course without any credit granted. But look at the low claimed credit: only 560.80 for 816,696.90 CPU seconds.
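
To put the claimed credit in perspective, the two numbers above can be converted to a rate. The snippet only does the arithmetic on the figures quoted in this post; the variable names are illustrative.

```python
claimed_credit = 560.80      # claimed credit shown on the task page
cpu_seconds = 816_696.90     # CPU time shown on the task page

cpu_hours = cpu_seconds / 3600
credit_per_hour = claimed_credit / cpu_hours

print(f"{cpu_hours:.1f} CPU hours, {credit_per_hour:.2f} claimed credits per CPU hour")
```

That works out to about 227 CPU hours and roughly 2.5 claimed credits per CPU hour.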

Another question: should I leave the remaining tasks (or maybe just one) running to see what will happen?
Or should I kill them both, as they seem like a waste of computer time?

Any suggestions?
(Personally I think I will keep one running just to see what will happen.)

Thanks,
Ruud
ID: 7612
sygopet

Joined: 2 Aug 08
Posts: 27
Credit: 204,771
RAC: 0
Message 7616 - Posted: 10 Nov 2008, 10:16:48 UTC - in response to Message 7612.  
Last modified: 10 Nov 2008, 10:27:25 UTC

.....look at the low claimed credit: only 560.80 for 816,696.90 CPU seconds.


Something is wrong! With your setup you should definitely be processing units within a few hours. I wouldn't have thought continuing with the others would be worthwhile either.
Could be worth trying a project reset.
It may be significant that two of the units (and possibly the third) you mention are ones where others have had downloading problems, so you've probably just got rubbish units.
You might have a claimed credit of 560, but the standard award is just 140 at present.
ID: 7616
web03

Joined: 29 Aug 07
Posts: 4
Credit: 314,240
RAC: 0
Message 7621 - Posted: 10 Nov 2008, 19:39:43 UTC
Last modified: 10 Nov 2008, 19:41:36 UTC

OK, some things to think about.

It looks like you are running version 5.10.45. Any reason why you haven't upgraded to the current version of BOINC, 6.2.19?

Second, what about your PC statistics? These can be found on the Computer Summary page (but are not viewable by us). Here's mine as an example.

% of time BOINC client is running 99.8956 %
While BOINC running, % of time host has an Internet connection 98.6076 %
While BOINC running, % of time work is allowed 99.974 %
Average CPU efficiency 0.979113
Task duration correction factor 1.160849

Third, have you recently rerun the benchmarks? Maybe something is wrong there.

Wendy
ID: 7621
Ruud van der Kroef

Joined: 25 Aug 07
Posts: 12
Credit: 3,298,533
RAC: 0
Message 7634 - Posted: 13 Nov 2008, 12:56:13 UTC - in response to Message 7621.  
Last modified: 13 Nov 2008, 12:57:58 UTC

OK, some answers:

sygopet:
...
I wouldn\'t have thought continuing with the others would have any worth either.
...


As you advised, I have killed both tasks, also because progress looks very slow, so they might take quite some time to finish, if ever.


web03:
ok - some things to think about.

It looks like you are running version 5.10.45. Any reason why you haven't upgraded to the current version of BOINC - 6.2.19?

Nothing special, other than that it is a lot of work upgrading 50+ boxes.

Second - what about your pc statistics? This can be found on the Computer Summary page (but not viewable by us). here's mine as an example.

% of time BOINC client is running 99.8956 %
While BOINC running, % of time host has an Internet connection 98.6076 %
While BOINC running, % of time work is allowed 99.974 %
Average CPU efficiency 0.979113
Task duration correction factor 1.160849

My statistics for this client are:

% of time BOINC client is running 100 %
While BOINC running, % of time work is allowed 99.9914 %
Average CPU efficiency 0.999058
Task duration correction factor 16.420957

Third - have you recently re-ran benchmarks? Maybe something is wrong there.

Wendy

I think they run automatically. Looking at the Messages tab of BOINCManager, I found that they run every 5 days or so.

Thanks,
Ruud
ID: 7634
