Forums : Technical Support : No Validation
caspr
Joined: 8 Aug 07
Posts: 54
Credit: 527,780
RAC: 0
Message 6455 - Posted: 9 Jul 2008, 16:25:42 UTC

It seems that while it's green across the board on the server status page, the validator is still losing ground. Is it just me, or does anyone else see something I don't?
A clear conscience is usually the sign of a bad memory
ID: 6455
Siegfried Niklas
Joined: 21 Mar 08
Posts: 2
Credit: 420,550
RAC: 0
Message 6456 - Posted: 9 Jul 2008, 16:51:04 UTC

Very mysterious!

New page design: Status all green ("Running")

Old page design: Status mostly red ("Not Running")


Member of Crunching Family
http://crunching-family.at/
ID: 6456
caspr
Joined: 8 Aug 07
Posts: 54
Credit: 527,780
RAC: 0
Message 6457 - Posted: 9 Jul 2008, 16:56:03 UTC - in response to Message 6456.  

WOW! Thanks, I hadn't even thought about looking at the old page. Kinda puts a new spin on things, huh?
A clear conscience is usually the sign of a bad memory
ID: 6457
Jord
Volunteer moderator
Volunteer tester
Joined: 15 Jun 07
Posts: 345
Credit: 50,500
RAC: 0
Message 6458 - Posted: 9 Jul 2008, 17:14:23 UTC - in response to Message 6456.  

That has a very easy explanation: just look at the link you're on.
The old web pages run cached through the test site. Nothing needs to be running there other than the web pages, as that's the only thing that is on there.
ID: 6458
caspr
Joined: 8 Aug 07
Posts: 54
Credit: 527,780
RAC: 0
Message 6459 - Posted: 9 Jul 2008, 17:36:25 UTC - in response to Message 6458.  

OK, so the old server stats page is NOT the one to watch, just the one on the new page? And if so, and all IS green, any idea why the validator is still losing ground?
A clear conscience is usually the sign of a bad memory
ID: 6459
Brian Silvers

Joined: 11 Dec 07
Posts: 420
Credit: 270,580
RAC: 0
Message 6461 - Posted: 9 Jul 2008, 23:12:46 UTC - in response to Message 6458.  

Gonna have to disagree with you here. The "old" design has a more current UTC timestamp.

However, it doesn't really matter either way, because over time I've found that either page is utterly and completely worthless. I wouldn't depend on the page(s) to be accurate at all...
ID: 6461
Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 6462 - Posted: 10 Jul 2008, 0:29:31 UTC - in response to Message 6461.  
Last modified: 10 Jul 2008, 0:31:27 UTC

I agree with Brian... the reliability of the Server Status page has always been questionable.

However, if the difference is just the website, why the different numbers, timestamps, and status? There is something amiss here... if it were set up right, they should always be exactly the same!
ID: 6462
caspr
Joined: 8 Aug 07
Posts: 54
Credit: 527,780
RAC: 0
Message 6476 - Posted: 11 Jul 2008, 4:21:38 UTC
Last modified: 11 Jul 2008, 4:22:47 UTC

Actually, Jeff, this is what Scott said about the difference between the old and new sites:

The old page can't see the pid files for the daemons, but it can read the database correctly.

Thought you might like to know.
Rick
A clear conscience is usually the sign of a bad memory
ID: 6476
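
As an illustration of Scott's point about pid files, here is a minimal Python sketch of how a status page might decide whether a daemon is up. It is not the project's actual status-page code: the helper name `daemon_status` and the pid file layout (one integer per file) are assumptions made for the example, and the signal-0 check assumes a POSIX system. It shows why a page that cannot read the pid files would report "Not Running" even while the daemons are fine.

```python
import os


def daemon_status(pid_file):
    """Return "Running" or "Not Running" for the daemon behind pid_file.

    Hypothetical helper: assumes the pid file holds a single integer PID.
    """
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed pid file: the page has no way
        # to see the daemon, so it reports it as down.
        return "Not Running"

    if pid <= 0:
        return "Not Running"

    try:
        # Signal 0 only performs the existence/permission check;
        # nothing is actually delivered to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return "Not Running"
    except PermissionError:
        # The process exists but is owned by another user.
        return "Running"
    return "Running"
```

A page served from a different site or user account falls into the first `except` branch for every daemon, which would match the mostly red "old" status page described above.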
STE\/E
Volunteer tester

Joined: 12 Jun 07
Posts: 375
Credit: 16,539,257
RAC: 0
Message 6477 - Posted: 11 Jul 2008, 10:35:04 UTC
Last modified: 11 Jul 2008, 10:36:26 UTC

LOL, my boxes are so choked up with WUs from projects that I can't return work to, it's a wonder they don't start upchucking their hardware ... :)

PBT-04 64B MGR PBOYZTOY4 71956 Riesel Sieve Project 7/11/2008 6:22:57 AM Temporarily failed upload of riesel-sieve_12031192_0_fact.out: system connect
PBT-04 64B MGR PBOYZTOY4 71953 Cosmology@Home 7/11/2008 6:22:48 AM Message from server: Project is temporarily shut down for maintenance
PBT-04 64B MGR PBOYZTOY4 71944 Milkyway@home 7/11/2008 6:22:38 AM Scheduler request failed: Couldn't connect to server
ID: 6477
Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 6481 - Posted: 11 Jul 2008, 22:24:14 UTC

I posted on another thread somewhere about this... but I'll post it here.

The validator is working fine and is validating results. However, this seems to be happening extraordinarily slowly due to overwhelming file I/O traffic on the server. We're looking at purchasing a RAID controller and new drives for a serious database setup, but that will probably take a bit. In the meantime, I want to keep the scheduler off to let the validator grind away.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 6481
Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 6482 - Posted: 11 Jul 2008, 22:47:02 UTC - in response to Message 6481.  
Last modified: 11 Jul 2008, 22:54:11 UTC

Scott, at the current rate of validation you would have to keep the scheduler off for almost a week. In the past the server has validated this amount of backlog in less than a few hours... With the scheduler off, where is all this load coming from?

Since it will be a while until you can get the hardware, you could help yourself and us by going back to the longer tasks we used to get when the lensing parameter was included. The longer the tasks, the less the server load.
ID: 6482
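
The task-length argument can be put into rough arithmetic. The Python sketch below assumes that each returned result costs the server a roughly fixed amount of work (uploads, a report, a validation pass), so the load scales with results per day. The function names and every number are hypothetical illustrations, not measurements from the project.

```python
SECONDS_PER_DAY = 86_400


def results_per_day(active_cores, task_runtime_s):
    """Results returned per day if every core crunches tasks back to back."""
    return active_cores * SECONDS_PER_DAY / task_runtime_s


def backlog_drain_days(backlog, validated_per_day, incoming_per_day=0.0):
    """Days needed to clear a validation backlog, or None if it never shrinks."""
    net_rate = validated_per_day - incoming_per_day
    if net_rate <= 0:
        return None
    return backlog / net_rate


# Illustrative numbers only: the same 5,000 cores on 30-minute vs 4-hour tasks.
short_tasks = results_per_day(active_cores=5_000, task_runtime_s=1_800)
long_tasks = results_per_day(active_cores=5_000, task_runtime_s=14_400)

# Tasks that run 8 times longer produce 8 times fewer results to handle.
ratio = short_tasks / long_tasks
```

The `incoming_per_day` parameter covers the case where the scheduler is switched back on: if results arrive faster than they are validated, the backlog never drains, however long one waits.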
Honza
Volunteer tester

Joined: 21 May 07
Posts: 26
Credit: 5,222,146
RAC: 0
Message 6485 - Posted: 12 Jul 2008, 9:56:28 UTC - in response to Message 6482.  

And once the validation queue is empty, it will be flooded again by results in progress...
If we made the dark-matter HW visible, what would we see? What's the HW specification of the server?
BOINC Project specifications and hardware requirements
ID: 6485
Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 6486 - Posted: 12 Jul 2008, 18:57:32 UTC

Another suggestion for cutting down the I/O is to consolidate the output files from the current 5 to fewer... ideally 1, which would lighten the server load tremendously.
ID: 6486
Honza
Volunteer tester

Joined: 21 May 07
Posts: 26
Credit: 5,222,146
RAC: 0
Message 6487 - Posted: 12 Jul 2008, 19:11:18 UTC - in response to Message 6486.  

Yes, this is the classic approach: make a single upload file and use compression.
BOINC Project specifications and hardware requirements
ID: 6487
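
A minimal sketch of the "single compressed upload file" idea, in Python for illustration only. It is not how the Cosmology@Home application packages its results; the helper name `bundle_outputs` and the file names in the usage note are made up for the example.

```python
import tarfile
from pathlib import Path


def bundle_outputs(output_paths, archive_path):
    """Pack several result files into one gzip-compressed tar archive."""
    with tarfile.open(str(archive_path), "w:gz") as archive:
        for path in output_paths:
            path = Path(path)
            # Store only the file name so the archive unpacks flat.
            archive.add(str(path), arcname=path.name)
    return Path(archive_path)
```

Called as `bundle_outputs(["out_0.dat", "out_1.dat"], "result.tar.gz")` (hypothetical names), this turns several uploads per result into one, so the server handles one file per result instead of five, at the cost of an unpack step on the validator side.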
Stevea

Joined: 14 Oct 07
Posts: 22
Credit: 1,084,550
RAC: 0
Message 6488 - Posted: 13 Jul 2008, 0:06:42 UTC - in response to Message 6487.  

I have files from 5 days ago still trying to download... it's hammering my network and wasting valuable CPU cycles, let alone what it does to the Cosmo server...
ID: 6488
Honza
Volunteer tester

Joined: 21 May 07
Posts: 26
Credit: 5,222,146
RAC: 0
Message 6493 - Posted: 13 Jul 2008, 7:01:51 UTC

I've hundreds to be reported and thousands pending... I can't even get to the pending list, there are so many.
Leaving for a week+, so hopefully Cosmo will be operational when I come back.

I wonder if there is anything we can do to help...
BOINC Project specifications and hardware requirements
ID: 6493
STE\/E
Volunteer tester

Joined: 12 Jun 07
Posts: 375
Credit: 16,539,257
RAC: 0
Message 6495 - Posted: 13 Jul 2008, 10:16:17 UTC

I don't know if it was just a window of opportunity or what, but I managed to get all the WUs I had waiting to report back to the server.

I still can't upload several thousand WUs to the server at the moment, but at least I got rid of 1500 or so of the ones already uploaded & waiting to report ... :)
ID: 6495
keithhenry

Joined: 29 Aug 07
Posts: 1
Credit: 298,715
RAC: 0
Message 6503 - Posted: 14 Jul 2008, 1:45:20 UTC

Well, according to the server status page, the number to validate seems to be dropping steadily. It may be the sheer volume that is causing or contributing to the problem, enough so that the server is memory constrained and having to do a lot of paging and thus lots of I/O. There may be some sort of index that is maintained by the file system that impacts this too, either from size or volume of updates from all the files.
ID: 6503
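
One way an admin could check the paging hypothesis above is to see how much swap is in use. The sketch below assumes the server runs Linux and exposes `/proc/meminfo`; nothing in the thread confirms the server's operating system, and the helper names are made up for the example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few counters carry no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(meminfo):
    """Swap currently in use, in kB, from a parsed meminfo mapping."""
    return meminfo["SwapTotal"] - meminfo["SwapFree"]
```

On a Linux host, `swap_used_kb(parse_meminfo(open("/proc/meminfo").read()))` staying large and growing while `MemFree` stays low would be consistent with a memory-constrained server that pages heavily; it would not prove that paging is the cause of the slow validation.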
Ananas
Joined: 19 Jan 08
Posts: 180
Credit: 2,500,290
RAC: 0
Message 6508 - Posted: 14 Jul 2008, 14:00:35 UTC

The server's overall performance is currently very poor; even URLs with no database access and no heavy CPU or memory requirements load very slowly.

I do not think that the validator alone can cause something like that, especially as the validation rate is not very high.

When all BOINC services return, it will only be a matter of hours until we're in the same situation again.

It is necessary to figure out what really makes it slow, and then let the validator catch up with better performance.

RAM shortage, filesystem inode count, even a loose network plug or a damaged network card could cause those things, but what happens now is just working on the symptoms, not fixing the problem.

Well, this is just my opinion and I might be wrong ... we'll see.
ID: 6508
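
Of the candidate causes listed above, the inode count is easy to check from a script. This is a minimal sketch assuming a POSIX system where `os.statvfs` is available; the helper name `inode_usage` is made up for the example, and the upload directory path in the usage note is hypothetical.

```python
import os


def inode_usage(path="/"):
    """Return (used, total) inode counts for the filesystem holding path."""
    stats = os.statvfs(path)
    total = stats.f_files          # total inodes on the filesystem
    used = total - stats.f_ffree   # total minus free inodes
    return used, total
```

Running `inode_usage("/path/to/upload/dir")` on the server would show whether the many small result files are close to exhausting the filesystem's inodes. Some filesystems report a total of 0 because they allocate inodes dynamically, in which case this check says nothing.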