Forums : Technical Support : cant get any new work???

Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 6980 - Posted: 10 Aug 2008, 13:07:51 UTC

Workunits created over two weeks ago have been piling up, so we've made a small change to the feeder code and cancelled the offending WUs.


Lovely. :-P

That was nearly every task that every one of my computers was currently working on or had worked on for the last day or so. You just trashed 90% of the work done by 15 computers for the last day or so, as well as making sure that I can never see the vast majority of the work sitting in 'pending' ever complete.

Oh, and now nearly every computer is getting "no work sent" messages instead of just 25% of them, so nothing was really fixed. :-P

Might I suggest taking a fire axe to the server next time... it would have been about as elegant a solution and would have produced approximately equal results.

Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 6981 - Posted: 10 Aug 2008, 13:58:57 UTC

After the general experience I've had since joining this project, I really should just quit it completely, but frankly, despite the problems, I value the science.

Instead, I'm going to leave 1 computer from each of the HR classes active and set all of them to a minimal resource share of around 5% for the purpose of testing this VERY alpha stage project.

If I ever see a point at which 2 weeks can go by without a lack of work, bad applications, bad HR WU allocation, database issues, etc., then I'll consider joining at a normal level.

APoch
Joined: 12 Feb 08
Posts: 21
Credit: 245,710
RAC: 0
Message 6984 - Posted: 10 Aug 2008, 15:51:33 UTC

I need to correct one thing I brought up. The server does know I detached, but the old web page does not reflect this. I just checked the new web page, and it reflects it accurately. I have been using the old page because I have trouble reading the new one. My bad, and my bad eyes. From now on I will use the new page and just put up with not being able to see it as easily.

peace

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 6988 - Posted: 10 Aug 2008, 17:19:07 UTC - in response to Message 6976.  
Last modified: 10 Aug 2008, 17:30:51 UTC

In my opinion, you might consider at least half-credit. This could help retain participants.

I'll remember this post as something to point at when the complaints come in about this half-credit. :-)


Shut it... I said "at least" :-P

Edit: And after looking around at how badly the last SQL script was thought out, I'm no longer endorsing "half-credit" at all... It would behoove the project greatly to:

  • Grant the full 70 credits to any task that was returned with "success" for the past 3 weeks.
  • Seriously think through the consequences of their SQL scripts.
  • State the motto somewhere on the front page, if the project is now going to take a stance of "it's 'beta', so if there are problems, expect 100% loss of your time and energy".


Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 6989 - Posted: 10 Aug 2008, 17:25:14 UTC - in response to Message 6980.  

Workunits created over two weeks ago have been piling up, so we've made a small change to the feeder code and cancelled the offending WUs.


Lovely. :-P

That was nearly every task that every one of my computers was currently working on or had worked on for the last day or so. You just trashed 90% of the work done by 15 computers for the last day or so, as well as making sure that I can never see the vast majority of the work sitting in 'pending' ever complete.

Oh, and now nearly every computer is getting "no work sent" messages instead of just 25% of them, so nothing was really fixed. :-P

Might I suggest taking a fire axe to the server next time... it would have been about as elegant a solution and would have produced approximately equal results.


Yeah, like I said in another thread, I was suspecting that a sledgehammer was used... :sigh:

In theory, just theory mind you, but in theory stuff going forward should be ok...

However, in light of how poorly designed / thought out the SQL script was, Jord will be happy to note that I'm changing my "at least half-credit" to "the project should grant full credit to all affected tasks during the past 3 weeks". This may not get everything for everyone, but it is at least something...

Bill & Patsy
Joined: 27 Jul 08
Posts: 25
Credit: 1,045,640
RAC: 0
Message 7009 - Posted: 11 Aug 2008, 23:41:59 UTC

Still not getting work.
--Bill

caspr
Joined: 8 Aug 07
Posts: 54
Credit: 527,780
RAC: 0
Message 7029 - Posted: 15 Aug 2008, 13:49:00 UTC

Well, I put 1 box back on over here yesterday to see if I could get a WU, run it, and get it to validate. Surprise, all went smoothly, but now I can't get another WU even though there's plenty to be had. You've still got some work to do, Ben. Scott didn't have everything working, huh?
A clear conscience is usually the sign of a bad memory

Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 7030 - Posted: 15 Aug 2008, 18:13:45 UTC

As of about 40 minutes ago I started getting "No work sent" messages on my P4 HR class machine.

Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 7034 - Posted: 15 Aug 2008, 22:56:40 UTC - in response to Message 7030.  

As of about 40 minutes ago I started getting "No work sent" messages on my P4 HR class machine.


Now Athlon XP, P4D (might be same HR class as P4(ht)) and Xeon 51xx (same as Core2) HR classes aren't getting any work either.

In other words, pretty much exactly the same problems as before the recent "fixes".

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 7035 - Posted: 15 Aug 2008, 23:14:10 UTC - in response to Message 7034.  

As of about 40 minutes ago I started getting "No work sent" messages on my P4 HR class machine.


Now Athlon XP, P4D (might be same HR class as P4(ht)) and Xeon 51xx (same as Core2) HR classes aren't getting any work either.

In other words, pretty much exactly the same problems as before the recent "fixes".


Athlon 64 picked up stuff on the first try again..

Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 7044 - Posted: 16 Aug 2008, 22:01:30 UTC

Wow... this is just plain amazing... I knew it would happen eventually, but it finally has.

Every single HR class of machine I own is asking for work and getting the message "No work sent".

I was curious about the whole business of AMDs getting work when nothing else did, so I resumed my AMD machines and got nothing. So between 11 computers I have 5 tasks left that are nearing completion, and then I'll have nothing left working for Cosmo.

I'm guessing they'd be better off wiping the WU/result database and starting over, starting small and seeing if they can get a few hundred WUs to process without screwing up, then a couple thousand, etc.

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 7045 - Posted: 17 Aug 2008, 1:08:36 UTC - in response to Message 7044.  


I'm guessing they'd be better off wiping the WU/result database and starting over, starting small and seeing if they can get a few hundred WUs to process without screwing up, then a couple thousand, etc.


You know, I never had really looked at the systems you have, until just now.

It is really bizarre that you have an X2 3800+ system that apparently isn't picking up anything, while my 3700+ system had a whole bunch of X2 3800+ systems as wingmen...

I know it has been a couple of hours since your post, but once I saw it, I tried again and my 3700+ system got 3 tasks on the first try...

One thing I do notice, though, is that I'm running XP Pro SP2, whereas you're running SP3. Like I said, "anecdotal evidence seems to indicate that there may be too many HR classes"...

Nothing But Idle Time
Joined: 27 Aug 07
Posts: 84
Credit: 148,380
RAC: 0
Message 7046 - Posted: 17 Aug 2008, 1:11:32 UTC

I don't presume to know anything about BOINC and its database queries, but those queries centered around HR functionality must not be working... just thrashing about. Paraphrasing Thunder: "the Cosmo implementation is screwed up". Maybe starting over -- figuratively if not literally -- might be necessary. Something needs to be done, and I hope the new hire is up to the challenge and has the knowledge to lift us out of Oblivion. I would prefer that someone from BOINC development do the heavy lifting and then present a working project to the new hire, with a handbook.

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 7047 - Posted: 17 Aug 2008, 1:15:07 UTC - in response to Message 7046.  

I don't presume to know anything about BOINC and its database queries, but those queries centered around HR functionality must not be working... just thrashing about. Paraphrasing Thunder: "the Cosmo implementation is screwed up". Maybe starting over -- figuratively if not literally -- might be necessary. Something needs to be done, and I hope the new hire is up to the challenge and has the knowledge to lift us out of Oblivion. I would prefer that someone from BOINC development do the heavy lifting and then present a working project to the new hire, with a handbook.


Obviously, since I'm apparently a member of a functional HR class, I'm not in favor of the aborts that they did before, especially since my version of BOINC (5.8.16) won't support being told that what I have has been cancelled...

It looks like to me that they are considering Windows XP SP3 as a different entity from Windows XP SP2. I think if they merge those classes across all processors, things will probably get substantially better.

David Guymer
Joined: 28 Jan 08
Posts: 4
Credit: 1,164,038
RAC: 447
Message 7049 - Posted: 17 Aug 2008, 2:54:15 UTC - in response to Message 7047.  
Last modified: 17 Aug 2008, 2:55:53 UTC

Windows XP Pro SP2

Once again getting no work

17/08/2008 8:03:31 AM|Cosmology@Home|Message from server: No work sent
17/08/2008 8:04:31 AM|Cosmology@Home|Fetching scheduler list
17/08/2008 8:04:37 AM|Cosmology@Home|Master file download succeeded
17/08/2008 8:04:42 AM|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 65716 seconds of work, reporting 0 completed tasks
17/08/2008 8:04:47 AM|Cosmology@Home|Scheduler request completed: got 0 new tasks
17/08/2008 8:04:47 AM|Cosmology@Home|Message from server: No work sent
17/08/2008 8:05:47 AM|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 65736 seconds of work, reporting 0 completed tasks
17/08/2008 8:05:53 AM|Cosmology@Home|Scheduler request completed: got 0 new tasks
17/08/2008 8:05:53 AM|Cosmology@Home|Message from server: No work sent
17/08/2008 8:06:53 AM|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 65755 seconds of work, reporting 0 completed tasks
17/08/2008 8:06:59 AM|Cosmology@Home|Scheduler request completed: got 0 new tasks
17/08/2008 8:06:59 AM|Cosmology@Home|Message from server: No work sent
17/08/2008 8:07:59 AM|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 65774 seconds of work, reporting 0 completed tasks
17/08/2008 8:08:05 AM|Cosmology@Home|Scheduler request completed: got 0 new tasks
17/08/2008 8:08:05 AM|Cosmology@Home|Message from server: No work sent
17/08/2008 8:09:05 AM|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 65793 seconds of work, reporting 0 completed tasks
17/08/2008 8:09:11 AM|Cosmology@Home|Scheduler request completed: got 0 new tasks
17/08/2008 8:09:11 AM|Cosmology@Home|Message from server: No work sent
17/08/2008 8:10:21 AM|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 65815 seconds of work, reporting 0 completed tasks
17/08/2008 8:10:27 AM|Cosmology@Home|Scheduler request completed: got 0 new tasks
17/08/2008 8:10:27 AM|Cosmology@Home|Message from server: No work sent

This has been an intermittent problem since I joined Cosmo.

Located in Australia
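
A log like the one above is easier to compare between hosts once it is reduced to counts. The following is only a rough sketch, assuming the pipe-separated `timestamp|project|message` layout and the day-first date format seen in this post; `SAMPLE_LOG`, `parse_line` and `summarize` are made-up names for illustration, not part of BOINC.

```python
from collections import Counter
from datetime import datetime

# A few lines copied from the log in this post.
SAMPLE_LOG = """\
17/08/2008 8:03:31 AM|Cosmology@Home|Message from server: No work sent
17/08/2008 8:04:42 AM|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 65716 seconds of work, reporting 0 completed tasks
17/08/2008 8:04:47 AM|Cosmology@Home|Scheduler request completed: got 0 new tasks
17/08/2008 8:04:47 AM|Cosmology@Home|Message from server: No work sent
"""


def parse_line(line):
    """Split one 'timestamp|project|message' line into its three fields."""
    stamp, project, message = line.split("|", 2)
    # Assumption: dates are day/month/year with a 12-hour clock, as in the post.
    when = datetime.strptime(stamp, "%d/%m/%Y %I:%M:%S %p")
    return when, project, message


def summarize(log_text):
    """Count work requests and 'No work sent' replies per project."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        _, project, message = parse_line(line)
        if message.startswith("Sending scheduler request"):
            counts[(project, "requests")] += 1
        elif message == "Message from server: No work sent":
            counts[(project, "no_work")] += 1
    return counts


if __name__ == "__main__":
    for key, value in sorted(summarize(SAMPLE_LOG).items()):
        print(key, value)
```

Running the same summary on logs from hosts in different HR classes would show whether some classes are refused far more often than others.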

rbpeake
Joined: 27 Jun 07
Posts: 118
Credit: 61,883
RAC: 0
Message 7050 - Posted: 17 Aug 2008, 3:20:46 UTC - in response to Message 7047.  

...It looks like to me that they are considering Windows XP SP3 as a different entity from Windows XP SP2. I think if they merge those classes across all processors, things will probably get substantially better.

I have Vista SP-1 and am getting no work for my Core2 Duo.....

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 7051 - Posted: 17 Aug 2008, 5:33:20 UTC - in response to Message 7050.  

...It looks like to me that they are considering Windows XP SP3 as a different entity from Windows XP SP2. I think if they merge those classes across all processors, things will probably get substantially better.

I have Vista SP-1 and am getting no work for my Core2 Duo.....


Something to bear in mind is that Intel processors attached to this project outnumber AMD processors by around a 2:1 margin. Because of this and the overly picky HR classifications, I would guess that Intel systems will have a more difficult time than AMD systems.

An added difficulty is the fact that nobody appears to be getting any "brand new" tasks. I don't know if the workunit generator is actually running or if the tasks being reported as ready to send on the status page are really all due to tasks timing out or others reporting back detaches / aborts. I personally think it's the latter...


Finally, bear in mind that if we're only getting resends, then that means that everyone is competing for a limited pool of work that has already been designated for a specific CPU/OS combination.

Phoneman1
Joined: 5 Nov 07
Posts: 113
Credit: 3,100,327
RAC: 0
Message 7052 - Posted: 17 Aug 2008, 6:38:04 UTC - in response to Message 7051.  

...It looks like to me that they are considering Windows XP SP3 as a different entity from Windows XP SP2. I think if they merge those classes across all processors, things will probably get substantially better.


My Intel Core 2 Quad, running Windows Vista SP1, is regularly paired with XP SP2 and XP SP3 systems using all sorts of Intel chips, from Xeon, Core 2 Quad, Extreme, Core 2 Duo and P4 to even the odd P3!

I have Vista SP-1 and am getting no work for my Core2 Duo.....


Same here, in the last 24 hours I was lucky to pick up 6 units. 4 were resends created on Aug 13th and 2 from July 23rd!

An added difficulty is the fact that nobody appears to be getting any "brand new" tasks. I don't know if the workunit generator is actually running or if the tasks being reported as ready to send on the status page are really all due to tasks timing out or others reporting back detaches / aborts. I personally think it's the latter...


I think there is another reason: the order in which work gets issued. I think what is happening is that there is a pool of work to be sent, and part of it is split up and pre-allocated to the Intel / AMD / Windows and Linux groups. In theory any user should get some work from these groups, BUT resends are being treated as a higher priority. If, for example, an AMD Windows user detaches from the project or aborts one or more tasks, no work can be issued until those tasks are resent. So there could be dozens of requests for work that go unanswered until someone comes along and requests work for a matching machine.

What should be happening is that the aborted tasks are put to the top of their own pool of work for sending and not at the top of ALL pools of work for sending.
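
To make the difference concrete, here is a toy model of the two orderings. It is only a sketch of the hypothesis in this post, not BOINC's actual feeder or scheduler code: a task is represented by the class it is locked to (`None` for work not yet assigned to any class), and the class names and both function names are invented for the example. It also ignores the fact that fresh work becomes locked to a class once it is first sent.

```python
from collections import deque


def serve_single_queue(queue, requests):
    """Resends sit at the head of ONE shared queue; only the head may be issued."""
    queue = deque(queue)
    served = []
    for host_class in requests:
        # The head task goes out only if it is unassigned or matches the host.
        if queue and queue[0] in (host_class, None):
            queue.popleft()
            served.append(True)
        else:
            served.append(False)  # host sees "No work sent"
    return served


def serve_per_class(queue, requests):
    """Resends wait at the head of their OWN class pool; fresh work stays available."""
    pools = {}
    fresh = deque()
    for task_class in queue:
        if task_class is None:
            fresh.append(task_class)
        else:
            pools.setdefault(task_class, deque()).append(task_class)
    served = []
    for host_class in requests:
        pool = pools.get(host_class)
        if pool:
            pool.popleft()      # a resend for this class takes priority
            served.append(True)
        elif fresh:
            fresh.popleft()     # otherwise hand out unassigned work
            served.append(True)
        else:
            served.append(False)
    return served


if __name__ == "__main__":
    # One aborted AMD/Windows task was pushed ahead of three fresh tasks.
    work = ["amd_windows", None, None, None]
    hosts = ["intel_windows", "intel_linux", "amd_windows", "intel_windows"]
    print(serve_single_queue(work, hosts))  # [False, False, True, True]
    print(serve_per_class(work, hosts))     # [True, True, True, True]
```

In the shared-queue version the two Intel requests are refused until an AMD/Windows host happens to ask for work, which matches the "dozens of requests" symptom described above.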

Finally, bear in mind that if we're only getting resends, then that means that everyone is competing for a limited pool of work that has already been designated for a specific CPU/OS combination.


There is a chink of light here - if both users abort a task or it times out it can be re-issued to a different class of users. My Intel Vista machines have crunched units where the original recipients were a pair of AMD users running Linux.

Phoneman1

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 7053 - Posted: 17 Aug 2008, 7:40:15 UTC - in response to Message 7052.  


I think there is another reason: the order in which work gets issued. I think what is happening is that there is a pool of work to be sent, and part of it is split up and pre-allocated to the Intel / AMD / Windows and Linux groups. In theory any user should get some work from these groups, BUT resends are being treated as a higher priority. If, for example, an AMD Windows user detaches from the project or aborts one or more tasks, no work can be issued until those tasks are resent. So there could be dozens of requests for work that go unanswered until someone comes along and requests work for a matching machine.


Perhaps. HR is supposed to pre-allocate. Since we're running through resends, unless what you mentioned happens (both initial replications time out / error out / are aborted), then the only people who can pick up further replications are the same HR class systems...

I've seen so much weirdness that I have no idea what is up and what is down anymore... I still think XP SP2 and SP3 are being treated differently, but it could be my imagination...

Thunder
Joined: 15 Apr 08
Posts: 101
Credit: 4,535,998
RAC: 0
Message 7059 - Posted: 17 Aug 2008, 15:20:32 UTC
Last modified: 17 Aug 2008, 15:24:39 UTC

Brian, to the best of my knowledge, any Windows 32-bit OS should be in the same HR class with any other, so long as they have the same family of processors (for example, Core2s and Xeon 5110+ have the same 'insides', Athlon 64s and Athlon 64 X2s are the same, etc.)

From the BOINC documentation:

1
A fine-grained classification with 80 classes (4 OS and 20 CPU types).

(describing setting "1" for this)

I'm fairly sure that with only 4 OSes (I know obviously that Linux, Windows and MacOS are 3 of them...) all Windows versions are "Windows". :)
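
As a rough illustration of why 4 OS families and 20 CPU types give 80 classes, and why service packs would not matter under such a scheme, here is a hypothetical mapping. The numbering, the string matching and the function names are assumptions made up for this sketch; they are not taken from the BOINC source, and the fourth OS family is left as an unnamed placeholder.

```python
N_OS = 4    # OS families, per the documentation quoted above
N_CPU = 20  # CPU types, per the documentation quoted above


def os_family(os_name):
    """Collapse an OS string to a coarse family index; service packs are ignored."""
    name = os_name.lower()
    if "windows" in name:
        return 0
    if "linux" in name:
        return 1
    if "darwin" in name or "mac" in name:
        return 2
    return 3  # placeholder for whatever the fourth family is


def hr_class(os_name, cpu_family):
    """Combine OS family and CPU family index into one class number (0..79)."""
    if not 0 <= cpu_family < N_CPU:
        raise ValueError("cpu_family must be in range(20)")
    return os_family(os_name) * N_CPU + cpu_family


if __name__ == "__main__":
    sp2 = hr_class("Microsoft Windows XP Professional SP2", 5)
    sp3 = hr_class("Microsoft Windows XP Professional SP3", 5)
    print(sp2 == sp3)     # True: same OS family, same CPU family
    print(N_OS * N_CPU)   # 80
```

If the project's classes really worked this way, XP SP2 and SP3 hosts with the same CPU family would always be able to act as wingmen for each other.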

There has been quite a bit of work on eliminating the need for HR by using compiler libraries that produce mathematically identical results across OSes and processors (LHC managed it successfully, perhaps others...). I'd have to know how CAMB is compiled to guess whether it's possible for Cosmo, but I have to imagine they'd be better off trying it than trying to fix the hopelessly broken HR system they have.