Forums : Technical Support : Various tasks or WU's that appear to have problems
Profile Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 7020 - Posted: 14 Aug 2008, 18:45:30 UTC

While I am a bit dubious as to whether the project is "functional" or not, I am committed to doing whatever small part I can to help. Towards that end, I have compiled a list of oddities that I feel indicate that not all is well.

1. WU 4978112 indicates initial replication 3, but only 2 have ever been sent. Task 11010313 has remained in an "unsent" state since 5 Aug 2008 6:28:24 UTC when it was created.

2. WU 4955098 had the initial 2 tasks cancelled with the state "Didn't need", yet 2 more tasks were created. They appear to be processing normally though, so this may not be a problem.

3. WU 4980033 contains task 10934511 that has remained in an "unsent" status since 31 Jul 2008 10:40:41 UTC.

4. WU 4979913 contains task 11003103 that has remained "unsent" since 4 Aug 2008 17:58:26 UTC.

5. I'm finding many, many WU's like 4883478 that were previously cancelled with "Didn't need" that now have new tasks created today. Some are sent... many are not. Is this an effort to finish WU's that never would have completed? Was this purposefully done or just the WU generator doing its "own thing"?

6. I've avoided listing the many WU's that I have pending that were cancelled and therefore will probably not create another task in order to validate. Hopefully they will eventually be purged, but many, like 4858921, have been cluttering up the result database for quite some time and may or may not ever purge (I do not possess sufficient knowledge of BOINC to be sure).

Perhaps I'm looking for "functioning correctly" and that's a higher standard than "functional", but in any case, I'd say these are not normal issues for other BOINC projects.

If any others have similar issues, I would invite them to add to this list. Perhaps it can help prevent these problems in the future.
ID: 7020 · Report as offensive     Reply Quote
Profile ChertseyAl
Joined: 23 Aug 07
Posts: 21
Credit: 175,420
RAC: 0
Message 7021 - Posted: 14 Aug 2008, 19:01:08 UTC - in response to Message 7020.  

5. I'm finding many, many WU's like 4883478 that were previously cancelled with "Didn't need" that now have new tasks created today. Some are sent... many are not. Is this an effort to finish WU's that never would have completed? Was this purposefully done or just the WU generator doing its "own thing"?


My only remaining pending WU that escaped the Zero Credit Axe is this one:

http://www.cosmologyathome.org/workunit.php?wuid=4880325

Note that its partner in crime was flagged as "Didn't need" and then a new WU was created 13 seconds later! I live in hope that my last surviving WU escapes the death penalty ;)

Al.
ID: 7021 · Report as offensive     Reply Quote
Phoneman1

Joined: 5 Nov 07
Posts: 113
Credit: 3,100,327
RAC: 0
Message 7025 - Posted: 14 Aug 2008, 21:08:21 UTC
Last modified: 14 Aug 2008, 21:14:50 UTC

Having seen Ben's post elsewhere earlier I thought I'd have another bash at the Cosmo work... Good idea to start this thread, Thunder. I think you could be lucky with some of these work units. It seems the axe job was not as thorough as it could have been, as BOINC is more resilient than I, and I think most people, thought - I'll explain why I think that in a bit.

ChertseyAl - if your new wingman completes the task in the next few days you may get some credit even though you used 2.14 and he's presumably now using 2.15. My understanding is it is the same underlying code but certain Linux flavours of the 2.14 executable were empty. It will be interesting to see if you do get a credit for this one.

Thunder - Looking at your WU's:

WU 4978112 - Is a very strange case. Your wingman used 2.12 and you used 2.14 despite the tasks being sent out to both of you over a week after 2.12 was superseded. Of course these results will never match - your wingman won't get any credit and neither will you unless the admins can force this unit to be sent out again.

WU 4955098 - agreed this one could finish normally after all.

WU 4980033 - not sure about the original unsent third task. It may have happened when the system parameters were changed by the project admin for initial replication. Again I don't think this result will move forward without a kick by project admin.

WU 4979913 - Also needs a kick to get another task out.

WU 4883478 and similar ones are being monitored by the system: as their last deadline date expires, a new task is created and assigned (usually in a matter of a few seconds or minutes). Most of my pending list are now in this state and I hope the units will run through to their normal conclusion now.

WU 4858921 - If I'm right a new task will be created automatically by the system just after 6:49:32 UTC on August 17th.

Looking at my own pending list they also fall into 4 distinct categories:

1) The majority have a new task created since the axe fell on August 9th. The new task is shown as now in progress and stands a chance of completing normally.

2) I've 4 WU's that I'm waiting for an automatic creation of a new task - 3 are due to be created before 10:00:00 UTC tomorrow so not long to wait. The remaining one is not due to be created until August 19th.

3) I've 5 WU's with no visible sign they'll get a further task created and sent without a kick by the project admin, they are:

4950188
4979436
4979689
4979497
4979629


4) Another strange case, this time where a new task was created but there was no corresponding expiry of a task to trigger it:
4925230

[edit] There is also a fifth category, those like Dr Iggy mentioned in the No Credit thread. The results got cancelled as they were being reported (because BOINC can only tell clients about units cancelled on the server when the clients make contact). [/edit]

BTW If functional means you will get "no work sent" and "committed to other platform" type messages when you request work most times then yes this project is now functional again.



Phoneman1
ID: 7025 · Report as offensive     Reply Quote
Profile Benjamin Wandelt
Volunteer moderator
Project administrator
Project scientist
Joined: 24 Jun 07
Posts: 192
Credit: 15,273
RAC: 0
Message 7073 - Posted: 19 Aug 2008, 1:07:47 UTC

Hi -

thank you for these posts. They are useful in figuring out which bits of our scheduler are sick.

I anticipate that we will get a chance to make a push on these issues next week, after Scott's qual exam. So it might get a bit worse before it gets better.

At that time we will also adjust the HR classes and validator to make sending out work easier. Actually, I just heard back from Scott and he may take a crack at it tonight, though he will have a very limited amount of time to spend on it.

All the best,
Ben

Creator of Cosmology@Home
ID: 7073 · Report as offensive     Reply Quote
rbpeake

Joined: 27 Jun 07
Posts: 118
Credit: 61,883
RAC: 0
Message 7074 - Posted: 19 Aug 2008, 1:15:22 UTC - in response to Message 7073.  

Hi -

thank you for these posts. They are useful in figuring out which bits of our scheduler are sick.

I anticipate that we will get a chance to make a push on these issues next week, after Scott's qual exam. So it might get a bit worse before it gets better.

At that time we will also adjust the HR classes and validator to make sending out work easier. Actually, I just heard back from Scott and he may take a crack at it tonight, though he will have a very limited amount of time to spend on it.

All the best,
Ben

Thanks for the update! Am currently pairing CAH with another project, so even with spotty CAH work there is always something to do.
ID: 7074 · Report as offensive     Reply Quote
Profile Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 7075 - Posted: 19 Aug 2008, 1:19:49 UTC

Thank you Dr. Wandelt. We all appreciate it when we're kept informed. I know Scott and yourself have quite an uphill struggle to get things working and despite my occasional frustration, I DO wish you both the best of luck in your efforts.
ID: 7075 · Report as offensive     Reply Quote
sygopet

Joined: 2 Aug 08
Posts: 27
Credit: 204,771
RAC: 0
Message 7078 - Posted: 19 Aug 2008, 8:30:15 UTC - in response to Message 7073.  

. . .
I just heard back from Scott and he may take a crack at it tonight
. . .
Don't know if Scott was able to do anything, or if I was just lucky, but I am pleased to report something positive: I managed to download one unit overnight. This is now 40% completed and we will see if it completes and uploads later.
Let's hope that the problems are gradually being overcome and maybe, one day, we will even get equitable credit!
ID: 7078 · Report as offensive     Reply Quote
Profile Glenn Rogers
Joined: 7 Jul 08
Posts: 5
Credit: 12,610
RAC: 0
Message 7079 - Posted: 19 Aug 2008, 9:19:44 UTC

G'day guys,
I'm new at this so please bear with me... I'm having problems getting new work from the project. The scheduler is sending requests and the server keeps saying "no work sent". Can you shed any light on this please? Very frustrating. I'm using BOINC client ver 6.2.16.

Thanks Glenn.
Aust..
ID: 7079 · Report as offensive     Reply Quote
Phoneman1

Joined: 5 Nov 07
Posts: 113
Credit: 3,100,327
RAC: 0
Message 7083 - Posted: 19 Aug 2008, 10:58:05 UTC - in response to Message 7025.  

An update to my earlier post in this thread:
2) I've 4 WU's that I'm waiting for an automatic creation of a new task - 3 are due to be created before 10:00:00 UTC tomorrow so not long to wait. The remaining one is not due to be created until August 19th.


All in the above category have had new tasks issued, sent, been completed and credit awarded!

3) I've 5 WU's with no visible sign they'll get a further task created and sent without a kick by the project admin, they are:

4950188
4979436
4979689
4979497
4979629


All 5 of the above are in the same state they were, and doubtless there are many others like these in other people's pending lists which will need project admin to get another task generated and sent out for each unit.

4) Another strange case, this time where a new task was created but there was no corresponding expiry of a task to trigger it:
4925230


On reflection I shouldn't have identified this as a separate category - it was a curiosity and should have been lumped with category 1) - that is to say it has another task currently in progress.

So to try and summarise things, it is now nearly ten days since the axe fell on in-progress tasks; the vast majority now have a new task generated and in progress, or the unit is completed. There remain a number (like the 5 mentioned above) without another task in progress. Usually these have an unsent task already created. There are also a few tasks which were rejected on reporting (like those mentioned by Dr Iggy in the No Credit thread) which will need investigating.

However, I suspect that most would agree the most urgent thing to investigate is why so little work is forthcoming. As I have suggested in another thread it does seem as if re-sends are being prioritised ahead of new work to such an extent that they have to be matched to a requesting machine of the correct type before machines of other types can get work. The effect is cumulative of course, so if for example a type A machine aborts a task and then a type B aborts a task, no type C machine will get any work until first a type B requests a task and then a type A machine requests a task (in that order). Add a few dozen machines to that list and you'll begin to see why it is so difficult to get work from this project at the moment.
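
To make the cumulative effect concrete, here is a rough Python sketch of the behaviour I am guessing at. It is only a model of my hypothesis, not the real BOINC scheduler code: the function name, the "type A/B/C" labels and the rule that only the most recent re-send may go out next are all assumptions for illustration.

```python
def simulate_resend_priority(pending_resends, requests):
    """Model the hypothesised rule: only the newest pending re-send may be sent.

    pending_resends: machine types needing a re-send, in the order the
                     original tasks were aborted (oldest first).
    requests:        machine types asking for work, in arrival order.
    Returns a list of (machine_type, outcome) tuples.
    """
    stack = list(pending_resends)  # last element = most recent re-send
    outcomes = []
    for machine_type in requests:
        if not stack:
            # No re-sends are blocking the feeder, so new work can flow.
            outcomes.append((machine_type, "new work"))
        elif stack[-1] == machine_type:
            # The requester matches the re-send at the head, so it is sent.
            stack.pop()
            outcomes.append((machine_type, "resend"))
        else:
            # Wrong machine type for the head re-send: nothing is sent.
            outcomes.append((machine_type, "no work sent"))
    return outcomes


if __name__ == "__main__":
    # Type A aborts, then type B aborts; type C machines keep asking.
    for row in simulate_resend_priority(["A", "B"], ["C", "A", "C", "B", "A", "C"]):
        print(row)
```

In this toy run the type C machine is turned away twice, and even the type A machine is turned away once, until a type B and then a type A have collected their re-sends.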

Phoneman1
ID: 7083 · Report as offensive     Reply Quote
Brian Silvers

Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 7084 - Posted: 19 Aug 2008, 11:14:08 UTC - in response to Message 7083.  

However, I suspect that most would agree the most urgent thing to investigate is why so little work is forthcoming.


Agreed... My AMD has about 20 hours of work left between here and Einstein. Since I'm hoping that S5R3 resends are still happening over at Einstein, I'm not switching to S5R4 processing there, so if no new work appears in either project by then, my system will be taking a break...
ID: 7084 · Report as offensive     Reply Quote
Nothing But Idle Time

Joined: 27 Aug 07
Posts: 84
Credit: 148,380
RAC: 0
Message 7086 - Posted: 19 Aug 2008, 16:40:52 UTC - in response to Message 7084.  

However, I suspect that most would agree the most urgent thing to investigate is why so little work is forthcoming.


Agreed... My AMD has about 20 hours of work left between here and Einstein. Since I'm hoping that S5R3 resends are still happening over at Einstein, I'm not switching to S5R4 processing there, so if no new work appears in either project by then, my system will be taking a break...

Project scientist(s) must be content to get so little output. Being cynical for a moment... perhaps they never look at the output anyway so it isn't missed; perhaps this whole enterprise is just a ruse/vehicle to get funding and we are unknowing participants. I can't explain what seems like a total lack of interest in achievement or goals.
ID: 7086 · Report as offensive     Reply Quote
Profile Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 7087 - Posted: 19 Aug 2008, 17:03:07 UTC - in response to Message 7086.  


Project scientist(s) must be content to get so little output. Being cynical for a moment... perhaps they never look at the output anyway so it isn't missed; perhaps this whole enterprise is just a ruse/vehicle to get funding and we are unknowing participants. I can't explain what seems like a total lack of interest in achievement or goals.


It's just major back-end, behind-the-scenes problems which will be solved in time. Please just have patience and bear with us.
ID: 7087 · Report as offensive     Reply Quote
Nothing But Idle Time

Joined: 27 Aug 07
Posts: 84
Credit: 148,380
RAC: 0
Message 7089 - Posted: 19 Aug 2008, 21:54:17 UTC - in response to Message 7087.  

It's just major back-end, behind-the-scenes problems which will be solved in time. Please just have patience and bear with us.

Au contraire, a large number of people have persevered, been very patient for a very long time and are now thoroughly exasperated. We aren't giving up on Cosmo but some REAL progress would be a welcome and tranquilizing (if not novel) feeling.
ID: 7089 · Report as offensive     Reply Quote
Profile Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 7104 - Posted: 21 Aug 2008, 22:20:33 UTC

Okay, here's another one that just plain defies logic:

WU 4941639 indicates it was set for:

max # of error/total/success tasks 3, 6, 3

yet it generated a total of 7 tasks and despite having 2 successfully returned and:

minimum quorum 2

I assume it's not counting any of them as a canonical result because both of the successful returns show 0 credit and the WU errors out with:

errors Too many total results
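
For anyone following along, this is roughly the kind of limit check I would expect to be producing that message. It is a minimal Python sketch under my own assumptions, not the project's actual server code: the `Limits` class, the `workunit_errors` function and the outcome labels are made up for illustration, and only the "Too many total results" string is taken from the WU page.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """The 'max # of error/total/success tasks' values shown on a WU page."""
    max_error: int
    max_total: int
    max_success: int


def workunit_errors(outcomes, limits):
    """Return the error labels a WU would carry for the given task outcomes.

    outcomes: one string per task ever created for the WU, e.g. "success",
              "error" or "other" (cancelled, unsent, in progress, ...).
    """
    errors = []
    if outcomes.count("error") > limits.max_error:
        errors.append("Too many error results")    # assumed label
    if len(outcomes) > limits.max_total:
        errors.append("Too many total results")    # label seen on the WU page
    if outcomes.count("success") > limits.max_success:
        errors.append("Too many success results")  # assumed label
    return errors


if __name__ == "__main__":
    # WU 4941639 as described: limits 3, 6, 3; 7 tasks, 2 of them successful.
    tasks = ["success", "success"] + ["other"] * 5
    print(workunit_errors(tasks, Limits(3, 6, 3)))
```

If the check works anything like this, the seventh task alone is enough to fail the whole WU, regardless of the two successful returns already meeting the quorum of 2.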
ID: 7104 · Report as offensive     Reply Quote
Brian Silvers

Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 7109 - Posted: 22 Aug 2008, 1:24:22 UTC - in response to Message 7104.  
Last modified: 22 Aug 2008, 1:35:06 UTC


errors Too many total results


Hmmm... Not good... I had been watching for something like that to happen when the total number of tasks started going beyond 6. I didn't have time to check this morning before I left, but I notice that one of my own tasks got caught up in that same scenario...

WU 4951154

Since a few of the tasks that I have are on their 5th and 6th entry, I'm not willing to gamble, so I'm aborting any tasks that I'm dependent upon someone else to complete... Apologies in advance...

Edit: Well, that turned out to only be 2 tasks because I looked at 2 others that have wingmen that haven't reported already and they appear to be reporting fairly consistently... Since I'm trying to run through the tasks I picked up as a test from Milkyway, I won't be getting around to them for a couple of days, so if either of those two hosts fail in their respective WUs, I'll abort those as well, since it would end up having a 7th task and thus I would not be doing any "science" that the project would use...
ID: 7109 · Report as offensive     Reply Quote
Phoneman1

Joined: 5 Nov 07
Posts: 113
Credit: 3,100,327
RAC: 0
Message 7110 - Posted: 22 Aug 2008, 6:31:58 UTC - in response to Message 7109.  


errors Too many total results


Since a few of the tasks that I have are on their 5th and 6th entry, I'm not willing to gamble, so I'm aborting any tasks that I'm dependent upon someone else to complete... Apologies in advance...


I too have noticed this sort of thing and have aborted any task that ended in a _6 or an _5, in the latter case only where an _6 already exists. Apologies to the wingmen involved but neither of us would have got any credit for the work and I suspect therefore the result would not have contributed to the project.

It looks as if a change to the server code is needed. There needs to be some logic in the work generator like this:

If task-id ends with greater than "_5" then
    don't generate task
    issue abort notice to other tasks in unit.

I don't know how difficult it would be to include something along these lines. Of course, the "_5" should not be hard-coded but should follow the maximum number of tasks currently set.

In each of the cases where I have aborted a task recently I have counted more than what I would call 3 errors. Perhaps what counts as an error also needs to be re-examined.
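
Spelled out a little further, the idea might look something like the Python sketch below. Everything in it is hypothetical: the function names, the task-name format ("something_N" with N counting from 0) and the "in progress" state label are my assumptions, and I have no idea how the real work generator is structured.

```python
def task_index(task_name):
    """Return the numeric suffix after the last underscore, e.g. 'wu_x_6' -> 6."""
    return int(task_name.rsplit("_", 1)[1])


def plan_next_task(tasks, max_total_tasks):
    """Decide whether one more task may be generated for a work unit.

    tasks:           dict mapping task name -> state for the tasks so far.
    max_total_tasks: the WU's configured maximum number of total tasks.
    Returns a dict with 'generate' (bool) and 'abort' (sorted task names).
    """
    highest = max((task_index(name) for name in tasks), default=-1)
    next_index = highest + 1
    if next_index >= max_total_tasks:
        # One more task would exceed the limit: do not create it, and tell
        # any hosts still crunching this unit to stop wasting their time.
        in_progress = sorted(n for n, s in tasks.items() if s == "in progress")
        return {"generate": False, "abort": in_progress}
    return {"generate": True, "abort": []}


if __name__ == "__main__":
    tasks = {f"wu_example_{i}": "over" for i in range(6)}
    tasks["wu_example_5"] = "in progress"
    print(plan_next_task(tasks, max_total_tasks=6))
```

With a maximum of 6 tasks the suffixes _0 to _5 are allowed, so the sketch refuses to create an _6 and flags the one task still in progress for an abort notice.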

Phoneman1
ID: 7110 · Report as offensive     Reply Quote
