Forums : Technical Support : URGENT Problems Discussion Thread

STE\/E
Volunteer tester

Joined: 12 Jun 07
Posts: 375
Credit: 16,522,388
RAC: 0
Message 5825 - Posted: 5 Apr 2008, 23:31:13 UTC
Last modified: 5 Apr 2008, 23:41:55 UTC

For now, I'm going to increase the number of available WUs for crunching, a temporary solution.


That doesn't seem to be working, Scott. Most of my PCs are completely out, and the few that have any WUs have only 1 or 2 at best before they are out too. I just keep getting the message > Reason: No work from Project.

Westsail and *Pyxey*
Joined: 19 Dec 07
Posts: 24
Credit: 889,050
RAC: 0
Message 5826 - Posted: 5 Apr 2008, 23:32:32 UTC - in response to Message 5805.  
Last modified: 5 Apr 2008, 23:51:00 UTC

Update on server problems:

At first, we thought the problem lay just with Debian's start-up table integrity check for mysql. However, even though removing the check improved the situation somewhat, problems obviously still persisted. Mysql was using a bunch of memory with only 1% CPU or so. After further investigation, we discovered that mysql spent nearly all of its time waiting for the disk. After talking with some support people and playing around with the mysql config file, we were able to significantly lower the IO delays.

The result is that now the web page seems to be loading a lot faster than before, the server is no longer chugging, and the average mysql query time is much, much lower. In addition, processes no longer seem to be piling up in the mysql process list, which means that queries from the project daemons are being answered properly. I want to say that this is the end of the database problems, but I will wait for another couple of days before deciding that all is well with that part.

I am continuing to investigate why work doesn't seem to be flowing very well at the moment. For now, I'm going to increase the number of available WUs for crunching, a temporary solution.

Also: I will keep credit the way it is for now to make up for all of the credit lost. After a couple of days, I may knock it back down to around 50 or so.


OMG, that sounds like a ridiculous amount of work. All you had to do was give us a heads up. Heck, we thought all the team had gone on holiday. Thanks for all the hard work. We post "here" because the urgent problem was that there was no communication from the project. Had everyone known you were working on it and to be patient there would have been NO bashing. Just saying.

edit to add: Yea, getting "no work from project" across platforms here too.

Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 5827 - Posted: 5 Apr 2008, 23:45:51 UTC
Last modified: 6 Apr 2008, 0:52:26 UTC

Got some work earlier when you added more work, but I have been getting the "no work from project" message for a while now.

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 5828 - Posted: 6 Apr 2008, 0:21:25 UTC - in response to Message 5827.  
Last modified: 6 Apr 2008, 0:24:59 UTC

Got some work earlier when you added more work but getting no new work for a while now.


Are the messages being returned

2008-04-05 14:21:17 [Cosmology@Home] Requesting 144402 seconds of new work
2008-04-05 14:21:22 [Cosmology@Home] Scheduler RPC succeeded [server version 601]
2008-04-05 14:21:22 [Cosmology@Home] Deferring communication for 6 sec
2008-04-05 14:21:22 [Cosmology@Home] Reason: requested by project
2008-04-05 14:21:22 [Cosmology@Home] Deferring communication for 5 min 21 sec
2008-04-05 14:21:22 [Cosmology@Home] Reason: no work from project


or are they

2008-04-05 15:50:24 [Cosmology@Home] Requesting 88278 seconds of new work
2008-04-05 15:50:29 [Cosmology@Home] Scheduler RPC succeeded [server version 601]
2008-04-05 15:50:29 [Cosmology@Home] Message from server: No work sent
2008-04-05 15:50:29 [Cosmology@Home] Message from server: (there was work but it was committed to other platforms)
2008-04-05 15:50:29 [Cosmology@Home] Deferring communication for 6 sec
2008-04-05 15:50:29 [Cosmology@Home] Reason: requested by project
2008-04-05 15:50:29 [Cosmology@Home] Deferring communication for 1 min 0 sec
2008-04-05 15:50:29 [Cosmology@Home] Reason: no work from project


The distinction between these two is very important. The first likely just means that there is no work or the feeder has run out. In the second case though, the situation is HR-induced.
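For anyone who wants to check their own message log for which case they are in, here is a minimal sketch. It assumes the only thing that matters is whether the "committed to other platforms" line appears in the exchange; the function name `classify_exchange` and the labels it returns are made up for illustration and are not part of BOINC.

```python
# Marker strings taken from the two log excerpts above (compared case-insensitively).
HR_MARKER = "committed to other platforms"
NO_WORK_MARKER = "no work from project"


def classify_exchange(log_lines):
    """Classify one scheduler exchange as 'hr', 'no_work', or 'unknown'.

    'hr' means the server said work existed but was committed to other
    platforms; 'no_work' means only the plain "no work from project" reason.
    """
    text = "\n".join(log_lines).lower()
    # Check the HR marker first: the second excerpt contains both markers.
    if HR_MARKER in text:
        return "hr"
    if NO_WORK_MARKER in text:
        return "no_work"
    return "unknown"


first = [
    "2008-04-05 14:21:22 [Cosmology@Home] Deferring communication for 5 min 21 sec",
    "2008-04-05 14:21:22 [Cosmology@Home] Reason: no work from project",
]
second = [
    "2008-04-05 15:50:29 [Cosmology@Home] Message from server: (there was work but it was committed to other platforms)",
    "2008-04-05 15:50:29 [Cosmology@Home] Reason: no work from project",
]
print(classify_exchange(first), classify_exchange(second))
```

The order of the checks matters, because the HR case also ends with "no work from project".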

I speculated earlier that the problem should resolve itself as hungry systems refill their caches. The hungrier systems will be the Intel Core2 systems, so once the "big guns" tank up, the work should flow better for everyone else unless the generator / feeder has now broken... A properly functioning status page would help in determining the problem... An expanded status page would be even better, with breakdown categories like what SETI provides...

STE\/E
Volunteer tester
Joined: 12 Jun 07
Posts: 375
Credit: 16,522,388
RAC: 0
Message 5829 - Posted: 6 Apr 2008, 0:37:31 UTC
Last modified: 6 Apr 2008, 0:48:01 UTC

All my systems are Intel Quads; 1 of them has 2 WUs on it, the rest have none. That's a long way from having them filled up ... I get the 1st message, Reason: no work from project, on all of them, without the (there was work but it was committed to other platforms) message.

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 5830 - Posted: 6 Apr 2008, 1:36:55 UTC - in response to Message 5829.  

All my systems are Intel Quads; 1 of them has 2 WUs on it, the rest have none. That's a long way from having them filled up ... I get the 1st message, Reason: no work from project, on all of them, without the (there was work but it was committed to other platforms) message.


Same here now... Clearly the status page is badly broken, as it states:

Results ready to send 83,763

Klimax
Joined: 24 Oct 07
Posts: 22
Credit: 648,291
RAC: 0
Message 5832 - Posted: 6 Apr 2008, 5:09:02 UTC - in response to Message 5828.  

Got some work earlier when you added more work but getting no new work for a while now.


Are the messages being returned

2008-04-05 14:21:17 [Cosmology@Home] Requesting 144402 seconds of new work
2008-04-05 14:21:22 [Cosmology@Home] Scheduler RPC succeeded [server version 601]
2008-04-05 14:21:22 [Cosmology@Home] Deferring communication for 6 sec
2008-04-05 14:21:22 [Cosmology@Home] Reason: requested by project
2008-04-05 14:21:22 [Cosmology@Home] Deferring communication for 5 min 21 sec
2008-04-05 14:21:22 [Cosmology@Home] Reason: no work from project


or are they

2008-04-05 15:50:24 [Cosmology@Home] Requesting 88278 seconds of new work
2008-04-05 15:50:29 [Cosmology@Home] Scheduler RPC succeeded [server version 601]
2008-04-05 15:50:29 [Cosmology@Home] Message from server: No work sent
2008-04-05 15:50:29 [Cosmology@Home] Message from server: (there was work but it was committed to other platforms)
2008-04-05 15:50:29 [Cosmology@Home] Deferring communication for 6 sec
2008-04-05 15:50:29 [Cosmology@Home] Reason: requested by project
2008-04-05 15:50:29 [Cosmology@Home] Deferring communication for 1 min 0 sec
2008-04-05 15:50:29 [Cosmology@Home] Reason: no work from project


The distinction between these two is very important. The first likely just means that there is no work or the feeder has run out. In the second case though, the situation is HR-induced.

I speculated earlier that the problem should resolve itself as hungry systems refill their caches. The hungrier systems will be the Intel Core2 systems, so once the "big guns" tank up, the work should flow better for everyone else unless the generator / feeder has now broken... A properly functioning status page would help in determining the problem... An expanded status page would be even better, with breakdown categories like what SETI provides...


And it is the first message. Just tried it. At 21:00 CET (summer time) it worked correctly. It looks like the feeder still has big problems...

And the database is apparently a bit slower as well (again).

Conan
Joined: 28 Aug 07
Posts: 169
Credit: 1,256,874
RAC: 0
Message 5841 - Posted: 6 Apr 2008, 13:38:00 UTC

I am getting a few different messages which seem to depend on which computer has asked for the work.
I get "no work from project"
Or I get "got 0 new tasks"

I don't seem to be getting the "work available for other platforms" message.

Also the project keeps downloading the "Master File"; I believe this is because no work is coming through.

I did get some work on the 5/4/08 (Saturday), but not a great deal on the Linux machines and now they have run out.
My Windows machine managed to download quite a few and is still working through its lot of WUs.

I only have AMD Opteron processors on this project.

Fred
Joined: 17 Jan 08
Posts: 40
Credit: 228,230
RAC: 0
Message 5843 - Posted: 6 Apr 2008, 15:03:39 UTC - in response to Message 5841.  

I am getting a few different messages which seem to depend on which computer has asked for the work.
I get "no work from project"
Or I get "got 0 new tasks"

I don't seem to be getting the "work available for other platforms" message.

Also the project keeps downloading the "Master File"; I believe this is because no work is coming through.

I did get some work on the 5/4/08 (Saturday), but not a great deal on the Linux machines and now they have run out.
My Windows machine managed to download quite a few and is still working through its lot of WUs.

I only have AMD Opteron processors on this project.

Likewise on my Intel box (Q6600/XP Home) I am getting only:

06/04/2008 15:14:52|Cosmology@Home|Sending scheduler request: To fetch work. Requesting 380027 seconds of work, reporting 1 completed tasks
06/04/2008 15:14:58|Cosmology@Home|Scheduler request succeeded: got 0 new tasks

F.

JardaM
Joined: 19 Dec 07
Posts: 11
Credit: 663,490
RAC: 0
Message 5848 - Posted: 6 Apr 2008, 17:53:50 UTC

Guess who the winner is:
wu_030908_190805_1:
outcome          CPU time
didn't need      0,00
didn't need      0,00
didn't need      0,00
didn't need      20 000 (+-)

Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 5850 - Posted: 6 Apr 2008, 20:17:36 UTC

It looks like the feeder just keeps quitting on me. I didn't get any work on my machine (C2D Linux 32-bit), checked the feeder, and found that it wasn't writing to the log. I restarted and now it looks like it's handing out some work again (at least to my machine).
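The "not writing to the log" symptom is something that could be checked automatically. Below is a rough sketch, assuming a daemon that is alive touches its log file at least every few minutes; the function name `log_is_stale`, the 300-second threshold, and any log path passed to it are illustrative assumptions, not the project's actual setup.

```python
import os
import time


def log_is_stale(path, max_age_seconds=300, now=None):
    """Return True if the file at `path` was last modified more than
    `max_age_seconds` ago. `now` can be passed in to make testing easier."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > max_age_seconds
```

A cron job could call this on the feeder's log and send a warning when it returns True. A quiet log only suggests a problem; it does not prove the process has died.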

Can anybody with a different architecture confirm that they can get work too?
Scott Kruger
Project Administrator, Cosmology@Home

-ShEm-
Joined: 28 Nov 07
Posts: 17
Credit: 410,400
RAC: 0
Message 5852 - Posted: 6 Apr 2008, 20:30:49 UTC - in response to Message 5850.  
Last modified: 6 Apr 2008, 20:33:42 UTC

Can anybody with a different architecture confirm that they can get work too?

Got some on Pentium M WinXP. Tried a few minutes later on P4 WinXP and got none, tried again and got 1 :) (it wants more, but gets the 'committed to other platforms' message now)

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 5853 - Posted: 6 Apr 2008, 20:30:53 UTC - in response to Message 5850.  

It looks like the feeder just keeps quitting on me. I didn't get any work on my machine (C2D Linux 32-bit), checked the feeder, and found that it wasn't writing to the log. I restarted and now it looks like it's handing out some work again (at least to my machine).

Can anybody with a different architecture confirm that they can get work too?


No work from project, then 2 tasks, then committed to other platforms...

STE\/E
Volunteer tester
Joined: 12 Jun 07
Posts: 375
Credit: 16,522,388
RAC: 0
Message 5854 - Posted: 6 Apr 2008, 20:40:19 UTC - in response to Message 5853.  
Last modified: 6 Apr 2008, 20:59:17 UTC

It looks like the feeder just keeps quitting on me. I didn't get any work on my machine (C2D Linux 32-bit), checked the feeder, and found that it wasn't writing to the log. I restarted and now it looks like it's handing out some work again (at least to my machine).

Can anybody with a different architecture confirm that they can get work too?


No work from project, then 2 tasks, then committed to other platforms...


Across my 12 Intel Quad PCs, some received about 8-10 WUs each and others none, and the ones that did get a few have received no more since. Only the work committed to other platforms message since then ... !!!

PS: I think part of the problem in getting any amount of work is that for every 100 WUs sent to me, 50 or more are nothing but download errors and 40 are cancelled by the server as not needed any more or for some other reason, so I only end up with 10 WUs. Those 50 and 40 figures are just a guess, but as I look at my tasks it sure looks that way to me ... :)

Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 5855 - Posted: 6 Apr 2008, 20:56:27 UTC

More info:

The feeder may keep on dying due to the timeout settings in mysql being set too low. I've changed them around and restarted the database, so we'll see what happens.
Scott Kruger
Project Administrator, Cosmology@Home

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 5856 - Posted: 6 Apr 2008, 21:11:44 UTC - in response to Message 5855.  

More info:

The feeder may keep on dying due to the timeout settings in mysql being set too low. I've changed them around and restarted the database, so we'll see what happens.


Since you went to UWM last week, if what you have done does not fix the issue, I would suggest contacting either Bernd Machenschalk or Bruce Allen at Einstein@Home to see if they can help...

Cluster Physik
Joined: 4 Feb 08
Posts: 1
Credit: 588,130
RAC: 0
Message 5857 - Posted: 6 Apr 2008, 21:13:52 UTC - in response to Message 5855.  
Last modified: 6 Apr 2008, 21:24:56 UTC

More info:

The feeder may keep on dying due to the timeout settings in mysql being set too low. I've changed them around and restarted the database, so we'll see what happens.

Is that change already in effect? Just tried on 2 machines, a C2Q and an X2, and didn't get any work.

Edit:
Just as I wrote that, the Quad got one WU and another X2 got two. But it doesn't appear the problems are solved, just a little bit alleviated. I also got two MD5 checksum errors.
And now it\'s back to:
06-Apr-2008 23:23:52 [Cosmology@Home] Message from server: No work sent
06-Apr-2008 23:23:52 [Cosmology@Home] Message from server: (there was work but it was committed to other platforms)

Brian Silvers
Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 5858 - Posted: 6 Apr 2008, 21:24:11 UTC - in response to Message 5854.  


PS: I think some of the problem in getting any amount of work is for every 100 WUs sent to me 50 of them or more are nothing but Download Errors


I was about to comment on how it was odd that I had never seen these download errors (MD5 problems), but then I have a lovely sea of red in my messages tab right now...

The INI files are highly compressible, so downloading is fast. I would think that the communication infrastructure in BOINC uses TCP and not UDP, so delivery of all packets should be guaranteed... Hmmm.... Malformed workunit, but the md5 signature is created properly?
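To make the checksum question concrete, here is a minimal sketch of what an MD5 comparison amounts to, using Python's standard `hashlib`. This is only an illustration and not how the BOINC client actually performs the check; the function name `md5_matches` is made up.

```python
import hashlib


def md5_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the MD5 digest of `data` equals `expected_hex`."""
    return hashlib.md5(data).hexdigest() == expected_hex.strip().lower()


# The digest is computed over whatever bytes were available at signing time,
# so it is recorded here from the "server side" copy of the file.
server_copy = b"[params]\nomega_b = 0.02\n"
recorded_md5 = hashlib.md5(server_copy).hexdigest()

# A copy that differs by even one byte no longer matches the recorded digest.
damaged_copy = server_copy[:-1]
print(md5_matches(server_copy, recorded_md5), md5_matches(damaged_copy, recorded_md5))
```

A mismatch only says that the bytes received differ from the bytes the digest was computed over. On its own it does not say whether the file changed after signing, the digest is out of date, or the download was damaged.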

Phoneman1
Joined: 5 Nov 07
Posts: 113
Credit: 3,100,327
RAC: 0
Message 5859 - Posted: 6 Apr 2008, 21:32:36 UTC

Thanks Scott.

Since you changed the timeout, both my M$ Vista Intel Quads are now fully loaded with work, and a queue of 0.1 days is slowly being built one or two WUs at a time, approximately every fifth automatic report. On the other four out of five times I am getting "committed to other platform" messages.

By the way in the process two MD5 checksum messages have appeared in the last 40 minutes.

I don't know anything about MySQL, but databases generally can easily have their performance knocked by rapid growth in the number of database records. If you know a MySQL specialist, or are one yourself, it might be an idea to check whether any indexes need re-sizing or any other housekeeping is needed on them. Note that on most databases this normally means some downtime.

Adjusting the timeout, as you have just done, is a bit like retarding the ignition on an old car when running with poor quality petrol: it will keep you moving but performance will suffer. That is to say, the database is running but presumably slower than you intended.

Phoneman1

STE\/E
Volunteer tester
Joined: 12 Jun 07
Posts: 375
Credit: 16,522,388
RAC: 0
Message 5861 - Posted: 6 Apr 2008, 22:10:17 UTC
Last modified: 6 Apr 2008, 22:34:17 UTC

I just had to Detach the Cosmo Project from 1 Intel Quad PC, BOINC simply would not run with the Project Attached. BOINC kept locking up every time I started it & I suspected Cosmo was the reason for it.

I suspect it was a network problem that was locking up BOINC as it was trying to connect to Cosmology but couldn't for some reason. As soon as I detached Cosmo (had to do it manually by deleting the files from the BOINC directory) and restarted BOINC, it ran fine.

As an example of the download errors: that PC had 20 Cosmo WUs on it, and 15 of them were download errors waiting to return. I've re-attached that PC to Cosmo again but haven't received any new work yet, so I don't know if it will lock up again when it does ... !!!

PS: The PC received 3 WUs but all 3 were download errors, and now it's back to the Work for other Projects message again ... !!!