Forums : Technical Support : Unit Verified But Status is Still Pending

arcturus

Joined: 28 Aug 07
Posts: 35
Credit: 666,900
RAC: 0
Message 4035 - Posted: 22 Nov 2007, 3:24:44 UTC

http://www.cosmologyathome.org/workunit.php?wuid=732100
http://www.cosmologyathome.org/workunit.php?wuid=724700
http://www.cosmologyathome.org/workunit.php?wuid=723981
http://www.cosmologyathome.org/workunit.php?wuid=723887
http://www.cosmologyathome.org/workunit.php?wuid=722973
http://www.cosmologyathome.org/workunit.php?wuid=722910
http://www.cosmologyathome.org/workunit.php?wuid=722891
http://www.cosmologyathome.org/workunit.php?wuid=719822
http://www.cosmologyathome.org/workunit.php?wuid=717999
http://www.cosmologyathome.org/workunit.php?wuid=717236
http://www.cosmologyathome.org/workunit.php?wuid=716501
http://www.cosmologyathome.org/workunit.php?wuid=715030
http://www.cosmologyathome.org/workunit.php?wuid=713611
http://www.cosmologyathome.org/workunit.php?wuid=711137
http://www.cosmologyathome.org/workunit.php?wuid=708313
http://www.cosmologyathome.org/workunit.php?wuid=707925
http://www.cosmologyathome.org/workunit.php?wuid=670314
ID: 4035

Jim Weisert
Joined: 1 Oct 07
Posts: 3
Credit: 194,536
RAC: 0
Message 4036 - Posted: 22 Nov 2007, 5:19:37 UTC - in response to Message 4035.  

I didn't check all of these WUs, but the ones I looked at are probably cases in which either your result or your wingman's result failed to upload properly, so the results did not validate. Eventually, a third result will be used for validation.

There appears to be a bug in the BOINC client that caused uploads to be aborted sometime during the Cosmology@Home server failure. See this thread
ID: 4036

Jord
Volunteer moderator
Volunteer tester
Joined: 15 Jun 07
Posts: 345
Credit: 50,500
RAC: 0
Message 4039 - Posted: 22 Nov 2007, 10:29:27 UTC - in response to Message 4036.  

It's not that simple. The problem lies with one of the file_upload_handlers on the project, combined with the client timing out and giving up on the upload after a couple of retries. The client side has since been adjusted so that any upload or download problem is handled as a transient error instead of eventually timing out. That still leaves it up to the project to make sure it has sufficient file_upload_handlers in place.
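As a rough illustration of the "transient error" idea described in this post, here is a minimal Python sketch. It is not the BOINC client's actual code (the real client is not written in Python); the function names, the 60-second base delay, and the 4-hour cap are all assumptions made up for the example. The point it tries to show is that a failed upload only schedules another attempt with a growing delay and never causes the output file to be abandoned.

```python
import random


def next_retry_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random.random):
    """Return the delay in seconds before retry number `attempt` (1-based).

    Exponential backoff with jitter; `base` and `cap` are hypothetical values.
    """
    delay = min(cap, base * 2 ** (attempt - 1))
    # Jitter keeps many clients from retrying at the same instant.
    return delay * (0.5 + 0.5 * rng())


def upload_with_retries(try_upload, max_attempts=None, sleep=lambda seconds: None):
    """Call `try_upload()` until it returns True.

    Every failure is treated as transient: with max_attempts=None the loop
    never gives up, so the output file is never dropped because of retries.
    Returns the number of attempts used, or None if max_attempts ran out.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        if try_upload():
            return attempt
        sleep(next_retry_delay(attempt))
    return None
```

In real use, `sleep` would be `time.sleep` and `try_upload` would wrap whatever actually sends the file; both are left as parameters here so the sketch stays self-contained.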
ID: 4039

arcturus
Joined: 28 Aug 07
Posts: 35
Credit: 666,900
RAC: 0
Message 4041 - Posted: 22 Nov 2007, 19:14:03 UTC

OK, fine. When can we expect points to be granted, if at all?
ID: 4041

Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 4044 - Posted: 22 Nov 2007, 20:15:57 UTC

You have an Athlon XP CPU, right? Are you overclocking it (or anything on your machine, for that matter)?

The reason I ask is that a lot of your results are considered "inconclusive" right now, meaning that your result and the other one did not agree to the desired accuracy. This doesn't necessarily mean that it was your machine's fault (nor does it necessarily mean that you won't get credit when the next result is compared), but it can be if you're overclocking, have increased your RAM voltage, or something along those lines.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 4044

Hefto99
Joined: 24 Jun 07
Posts: 7
Credit: 6,391,524
RAC: 0
Message 4050 - Posted: 23 Nov 2007, 2:27:11 UTC - in response to Message 4044.  

Scott, I don't think it is related to overclocking or anything like that.
I have the same problem on 2 machines: all results uploaded during the recent server problems and reported on Nov 15 are invalid.
ID: 4050

arcturus
Joined: 28 Aug 07
Posts: 35
Credit: 666,900
RAC: 0
Message 4051 - Posted: 23 Nov 2007, 3:30:29 UTC - in response to Message 4044.  
Last modified: 23 Nov 2007, 3:36:58 UTC

Yes, it's an overclocked XP, but why *only* those units submitted at 1:39:26 UTC on the 16th? All other units submitted before and after have received points, and the PC remains untouched, so it seems unlikely that the overclock is relevant in this case.


ID: 4051

Conan
Joined: 28 Aug 07
Posts: 169
Credit: 2,093,665
RAC: 3,105
Message 4052 - Posted: 23 Nov 2007, 13:55:56 UTC

I have the same problem with work units uploaded on the 15th, and none of my machines are overclocked; all are standard.
It is related to the server problems that were fixed on the 15th. The constant upload retries seem to have caused some irregularities in the returned results, possibly because the client gave up on the upload, so that not all files were included when the WU was finally uploaded?
I have over 35 work units affected by this.
All work before and since has had no problems and is being validated OK.

This is about the third thread with people reporting this problem, so it is not isolated to one user.

If you check your database, you will see lots and lots of work units uploaded on the 15th/16th, after the server came back online, that have not been verified.
ID: 4052

cwhyl
Joined: 26 Jul 07
Posts: 3
Credit: 296,700
RAC: 0
Message 4053 - Posted: 23 Nov 2007, 16:45:33 UTC - in response to Message 4051.  

Yep, my lost results have nothing to do with overclocking.
I run BOINC with a command window plus the manager and saw exactly the same thing that Matthias Lehmkuhl reported in the other thread: "giving up on uploading...files not found", and then, after numerous tries, the WU state switched from "Uploading" to "Ready to report" in the manager.
ID: 4053

Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 4054 - Posted: 23 Nov 2007, 19:21:05 UTC

Just a thought =)

I'm turning on some debugging features of the validator to see which files aren't validating and why. Hopefully I'll have some answers soon.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 4054

Sou'westerly
Joined: 1 Jul 07
Posts: 37
Credit: 208,284
RAC: 0
Message 4057 - Posted: 24 Nov 2007, 0:42:20 UTC - in response to Message 4054.  

Scott, if it helps: the problem will be that these results have some or all of their output files missing. The clients affected will probably be 5.10.21 and later, and the work will have been downloaded prior to the outage and reported back to the scheduler after the outage.
I suspect that the project connected a temporary server on the 12th to inform users what the problem was. This server would not have had a file_upload_handler program installed, which caused these later clients to think that they had contacted the server and that the files were no longer needed, so they auto-deleted any output file they tried to upload. The code has now been changed, and the fix should be in the next release after 5.10.30. Dave.
ID: 4057

Sou'westerly
Joined: 1 Jul 07
Posts: 37
Credit: 208,284
RAC: 0
Message 4059 - Posted: 24 Nov 2007, 1:06:30 UTC

Scott, a final thought before I go to bed. What happens if two clients have reported the same WU but have both failed to upload the output files? Will the validator spot the lack of output files and invalidate them both? If not, then you could have a real headache trying to sort them out of your database! Dave.
ID: 4059

Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 4067 - Posted: 25 Nov 2007, 3:21:50 UTC - in response to Message 4059.  

If there aren't exactly 6 output files, the result is invalidated. If both results fail to have exactly 6 output files, then both are invalidated.
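The rule just stated can be sketched in a few lines of Python. This is only an illustration of the counting rule, not the project's validator; the function names and the idea of passing file names as plain lists are assumptions made for the example. Only the number 6 comes from the post.

```python
EXPECTED_OUTPUT_FILES = 6  # figure quoted in the post above


def has_all_outputs(output_files, expected=EXPECTED_OUTPUT_FILES):
    """Return True only if the result came back with exactly `expected` files."""
    return len(output_files) == expected


def invalid_results(results):
    """Given {result_name: [file names]}, return the names to invalidate.

    Each result is judged on its own, so two incomplete results for the
    same work unit are both invalidated.
    """
    return sorted(name for name, files in results.items()
                  if not has_all_outputs(files))
```

With this rule, a work unit whose two returned results are both missing files ends up with both marked invalid, which is the case asked about in the previous message.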


Anyway, the validator shows that some of the results are failing because CAMB outputs "NaN" (not a number) instead of real values in a number of the output files, meaning that the integrations have failed. This seems to be an issue with the compiler: we compiled CAMB with ifort 10 instead of ifort 9 as usual. We might try going back to ifort 9, but that might cause more problems than it solves.
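For anyone who wants to check their own output for this symptom, here is a small Python sketch that counts NaN entries in a block of text. It assumes the output consists of whitespace-separated numeric columns, which is a guess about the file layout and is not stated in this thread; the function name is made up for the example.

```python
import math


def count_nan_values(text):
    """Count whitespace-separated tokens that parse as a floating-point NaN."""
    count = 0
    for token in text.split():
        try:
            value = float(token)  # float() accepts "NaN", "nan", "-nan", ...
        except ValueError:
            continue  # not a number at all (e.g. a header word); ignore it
        if math.isnan(value):
            count += 1
    return count
```

A result with a nonzero count would correspond to the failed integrations described above.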

As it stands, only about 1% of the results end up being invalidated (for the last couple of weeks, we had around 250 out of 25000 invalid results, which is pretty good, considering).
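The 1% figure follows directly from the counts quoted above; a one-line check in Python (the function name is just for illustration):

```python
def invalid_rate_percent(invalid, total):
    """Share of invalid results, expressed as a percentage."""
    return 100.0 * invalid / total


print(invalid_rate_percent(250, 25000))  # prints 1.0
```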

I'll keep you posted, though.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 4067

arcturus
Joined: 28 Aug 07
Posts: 35
Credit: 666,900
RAC: 0
Message 4136 - Posted: 3 Dec 2007, 15:49:55 UTC

Any update?
ID: 4136

Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 4137 - Posted: 3 Dec 2007, 21:41:37 UTC - in response to Message 4136.  

I've been going through the invalidated results and I can't find any real correlation between them, other than that certain hosts seem to send back invalid results most of the time. However, the invalidation rate has gone down to about 0.4% in the last couple of weeks, so I think I'd rather not fix what isn't that broken.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 4137

arcturus
Joined: 28 Aug 07
Posts: 35
Credit: 666,900
RAC: 0
Message 4138 - Posted: 3 Dec 2007, 22:07:04 UTC

How is this thread different from the thread HERE, where that guy seems to have a similar issue and you mention a fix?
ID: 4138

Conan
Joined: 28 Aug 07
Posts: 169
Credit: 2,093,665
RAC: 3,105
Message 4139 - Posted: 3 Dec 2007, 22:12:08 UTC - in response to Message 4137.  

G'Day Scott,
The validation problem I understood this thread to be most concerned about is with all the results from the 15th (possibly the 16th for some) of November.
This is when we were able to upload our results after the server went down.
Very few of those results have validated for me; even ones I had completed before the crash are still sitting there because another computer returned its result on the 15th.

So currently the validator is going great, but back on the 15th/16th it was sick.
I have not noticed any of my results from the 15th having been resent to a 3rd or 4th host (and a couple will be 5th and 6th) as yet.

(I suppose I am trying not to lose 3,900 cobblestones (39 results x 100 granted credits)).

Thanks Conan,
Keep smiling, it makes others wonder what you have been up to.

ID: 4139

Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 4144 - Posted: 4 Dec 2007, 3:52:33 UTC - in response to Message 4139.  

Ahh... OK. For whatever reason, I assumed this was a general problem with validation, not just with the results from the 15th/16th.

I can get that fixed fairly easily. I will make those results valid, grant them credit, and then cancel them so that they don't have to be sent out again. This will have to wait until tomorrow evening, though, since I'm a bit busy with school work right now.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 4144

Conan
Joined: 28 Aug 07
Posts: 169
Credit: 2,093,665
RAC: 3,105
Message 4145 - Posted: 4 Dec 2007, 4:27:49 UTC - in response to Message 4144.  

That will be fine, thanks Scott, much appreciated.
ID: 4145