Forums : Technical Support : Tasks will not complete

Scott S Leach
Joined: 8 Apr 12
Posts: 2
Credit: 0
RAC: 0
Message 10717 - Posted: 12 Apr 2012, 13:22:10 UTC

I started running C@H 4 days ago. The program downloaded 3 tasks and began running one of them. The first task completed after almost 2 days, but when it tried to upload to your server it generated a "failed to download" error. No credit for me. The next task was supposed to take only 7 hours, but 1.5 days later the estimate has grown: it now says that 17 hours have elapsed and 10 hours remain. How can 17 hours elapse and 10 hours remain on a 7 hour task? S@H does not do this, so I do not think that it is a problem with my system. Anyway, since it seems like this task will never finish, I decided to abort it, and since this was 2 failed tasks out of 2 total tasks I have reluctantly decided not to waste any more time on C@H. I do not know if my effort or time was beneficial to C@H "which is the issue", but I hope I did some good.

Good Luck to you all!
ID: 10717
cykodennis
Joined: 31 May 10
Posts: 234
Credit: 4,896,378
RAC: 0
Message 10718 - Posted: 12 Apr 2012, 14:02:29 UTC

I try to be patient.

You have obviously checked neither the message board for explanations nor the message log of your BOINC Manager. You do not know how the BOINC client works, and you obviously had no real interest in C@H.

I suggest you stay with S@H.

Thanks in advance
ID: 10718
Scott S Leach
Joined: 8 Apr 12
Posts: 2
Credit: 0
RAC: 0
Message 10764 - Posted: 15 Apr 2012, 17:40:57 UTC

Ah, ok. My post was not intended for a self-inflated yahoo with an anti-social attitude like yourself, but apparently nobody else from a better appointed position has an answer yet, so you must have felt obligated, I'm sure.
ID: 10764
cykodennis
Joined: 31 May 10
Posts: 234
Credit: 4,896,378
RAC: 0
Message 10765 - Posted: 15 Apr 2012, 18:25:44 UTC
Last modified: 15 Apr 2012, 18:27:49 UTC

The better answers are in the threads of the last two months on this message board. You didn't notice them because you didn't look for them.
You just posted here for... whatever.
Additionally, you seem to have a problem understanding the rules of the message boards. You can find them on the left of your message frame.

btw, have you not decided to not "waste" any more of your precious time with us?
ID: 10765
Conan
Joined: 28 Aug 07
Posts: 169
Credit: 1,280,875
RAC: 0
Message 11279 - Posted: 20 Jun 2012, 0:06:18 UTC

Running along the same lines are a few work units I have been getting on my 64 bit Fedora 16 Linux machine.
My Windows XP 32 bit machines do not seem to have this problem at all.

I have been getting a number of work units that run far longer than most work units usually do.
Instead of, say, 14 to 16 hours, which are my usual run times, they have run for as long as 32 hours or more. Plus they still get the standard 420 points for the effort.

I have just aborted two work units that appear to have restarted during their normal running time.
One had run for 40 hours: after 30 hours it was at 77% with 7 hours to go, and this morning it was at 40 hours, 66%, with 17 hours to go.
The other stalled after 10 hours with 3 hours to go, sat at 77.496% for a number of minutes, then restarted at 60% with 7 hours to go, so I aborted it as well.

This is only affecting my 64 bit Linux machine, very curious.
Just something else to think about.

I am posting this for information only, as I have stopped work for the time being, so I won't be doing any more work units.

Conan
ID: 11279
Nuadormrac
Joined: 8 Sep 08
Posts: 3
Credit: 136,770
RAC: 0
Message 11347 - Posted: 1 Jul 2012, 18:32:00 UTC - in response to Message 11279.  
Last modified: 1 Jul 2012, 18:39:38 UTC

I ran into what seems to be a similar problem with a task I just aborted on coming home. When I left for lunch it was at 96+% after 29 hours of running; when I got home it was back at, well, 60%... It seems to get itself into an infinite loop, though in my case this is on Windows 7 Professional x64... I've noticed this with a handful of tasks, which over the past month, when my team was running this project as PotM, would amount to perhaps 1%... It's not a significant percentage of tasks, and is much smaller than the share that complete without incident once the tasks have downloaded successfully, but there are a couple. From past experience, they just get themselves stuck and go back and re-process what was done previously.

It's certainly an issue worth documenting, though if there's back and forth on whether it exists, eh, I'll stay out of any politics that might be going on, since some of the replies above suggest some angst between participants...
ID: 11347
cykodennis
Joined: 31 May 10
Posts: 234
Credit: 4,896,378
RAC: 0
Message 11348 - Posted: 1 Jul 2012, 22:57:59 UTC - in response to Message 11347.  

As far as I know, 60% is one of the few checkpoints of a Cosmo WU.
I ask because I had similar issues (or better: surprises) with Windows 7 64-bit:
Are any power-saving modes active, Nuadormrac?

Maybe you've got some simple Win 7 problem instead of the problem that Conan describes.
ID: 11348
Benjamin Wandelt
Volunteer moderator
Project administrator
Project scientist
Joined: 24 Jun 07
Posts: 192
Credit: 15,273
RAC: 0
Message 11410 - Posted: 13 Jul 2012, 21:59:01 UTC

The code controlling the progress bar is one of the least tested parts of our BOINC instrumentation, so it is quite possible for it to have a bug that makes progress appear to jump backwards at various points (essentially right after checkpoints). This may explain some of this behavior.
Also, some work units compute cosmology in open or closed universes which makes them take a lot longer than the ones where the universe is exactly flat. To simplify allocation of credit we decided to award a constant amount of credit for all work packages. This credit amount is somewhat high on purpose to make up for these special work units.
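
A rough illustration of why the remaining-time figure grows when the reported progress drops, using the readings posted earlier in this thread (77% after 30 hours, then 66% after 40 hours). This is only a sketch that assumes a naive linear extrapolation from elapsed time and fraction done. It is not BOINC's actual estimator, and the function name is made up for the example:

```python
def naive_remaining_hours(elapsed_hours, fraction_done):
    """Linear extrapolation (an assumption, not BOINC's real formula):
    if fraction_done of the work took elapsed_hours, the rest is assumed
    to proceed at the same rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours * (1 - fraction_done) / fraction_done


# Readings reported earlier in this thread for one work unit.
print(round(naive_remaining_hours(30, 0.77), 1))  # 9.0
print(round(naive_remaining_hours(40, 0.66), 1))  # 20.6
```

The client displayed 7 and 17 hours for those two readings, so its real estimate is clearly computed differently. The point is only the direction: more elapsed time combined with a lower reported fraction pushes the estimate up.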

Thank you for highlighting this problem again. We will look at it in our pending update to the new kernel.

Best regards,
Ben
Creator of Cosmology@Home
ID: 11410
Gary Wilson
Joined: 30 Nov 08
Posts: 8
Credit: 5,000,011
RAC: 0
Message 11414 - Posted: 14 Jul 2012, 17:36:14 UTC - in response to Message 11410.  

Actually, the main problem is the checkpointing. It doesn't checkpoint very often, and you probably have the default setting of switching between tasks every 60 minutes. Sometimes it doesn't checkpoint for 2 hours or more. So what happens is, if you are running other projects, it switches to the other one for 60 minutes, then comes back and picks up cosmology from the last checkpoint. If it doesn't checkpoint in that 60 minute period, it will appear to keep restarting from the same point, over and over. You might get lucky and it checkpoints in that interval, so it appears to make a little progress.

The only way to run multiple projects with cosmology is to check the "leave applications in memory while suspended" check box on the disk and memory usage tab. The main problem with that is the cosmo tasks take a HUGE amount of memory compared to other projects.

So in reality, it's hard to run cosmo on machines running multiple projects.
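
The effect described above can be sketched with a toy simulation. It assumes, purely for illustration, that checkpoints fall at fixed intervals of completed work and that a task removed from memory resumes from its last checkpoint. The function and parameter names are invented here, and the real application's checkpoint spacing is irregular:

```python
def simulate(task_min, checkpoint_min, slice_min, keep_in_memory, max_slices=200):
    """Return the CPU minutes needed to finish, or None if the task never
    finishes within max_slices scheduling slices."""
    progress = 0  # minutes of useful work completed so far
    cpu_used = 0  # minutes of CPU time actually spent on the task
    for _ in range(max_slices):
        run = min(slice_min, task_min - progress)
        progress += run
        cpu_used += run
        if progress >= task_min:
            return cpu_used
        if not keep_in_memory:
            # Switched out and removed from memory: fall back to the
            # last checkpoint that was reached.
            progress = (progress // checkpoint_min) * checkpoint_min
    return None


# 10 hour task, checkpoint every 2 hours of work, task switch every 60 minutes.
print(simulate(600, 120, 60, keep_in_memory=False))   # None: never gets past 0
print(simulate(600, 120, 60, keep_in_memory=True))    # 600
print(simulate(600, 120, 150, keep_in_memory=False))  # 720: some work is redone
```

With the switch interval shorter than the checkpoint interval and apps not kept in memory, the toy task restarts from the same point forever, which matches the behavior reported in this thread.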
ID: 11414
Benjamin Wandelt
Volunteer moderator
Project administrator
Project scientist
Joined: 24 Jun 07
Posts: 192
Credit: 15,273
RAC: 0
Message 11416 - Posted: 14 Jul 2012, 19:18:14 UTC - in response to Message 11414.  

Thank you - that makes sense. There weren't very many points where checkpointing was easy without a lot of additional coding. If you want to share your machine between different projects, it seems the best way to do that for now is to run cosmology@home exclusively for a while and then switch over to the other projects.

But we'll keep this in mind for future versions of the kernel.

Best,
Ben
Creator of Cosmology@Home
ID: 11416
cykodennis
Joined: 31 May 10
Posts: 234
Credit: 4,896,378
RAC: 0
Message 11419 - Posted: 15 Jul 2012, 20:13:29 UTC

I don't want to say anything wrong, but afaik you can run a mixed client with Cosmo.
Project priorities have to be equal, and the number of projects has to be less than or equal to the number of active CPU cores. The only thing that has to be avoided is the Cosmo WUs being interrupted.
ID: 11419
Gary Wilson
Joined: 30 Nov 08
Posts: 8
Credit: 5,000,011
RAC: 0
Message 11431 - Posted: 16 Jul 2012, 16:18:10 UTC - in response to Message 11419.  

Yes, you certainly can run multiple projects with Cosmo. Just need to be aware of the need to keep apps in memory while suspended and make sure you have plenty of RAM. With the tasks peaking in the 750MB area, you need about 1.5GB for a dual-core just for the Cosmo apps. Then add any for your other projects on top since they will all be kept in memory when they suspend as well. I run a couple projects on one machine including Cosmo, but I made sure the other project uses a lot less RAM per task.
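
The arithmetic above can be written out as a small helper. The 750 MB peak is the figure quoted in this post. The function name and the 100 MB figure for other projects are placeholders for illustration only:

```python
def ram_needed_mb(cosmo_tasks, cosmo_peak_mb=750, other_tasks_mb=()):
    """Rough RAM budget when suspended apps are kept in memory: every
    Cosmo task at its peak plus every other task's footprint."""
    return cosmo_tasks * cosmo_peak_mb + sum(other_tasks_mb)


# Dual-core machine running one Cosmo task per core.
print(ram_needed_mb(2))  # 1500 MB, i.e. about 1.5 GB

# Same machine with two tasks from another project (hypothetical 100 MB each).
print(ram_needed_mb(2, other_tasks_mb=[100, 100]))  # 1700
```

This only counts the BOINC tasks themselves; the operating system and anything else running on the machine need memory on top of that.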
ID: 11431
k_suresh
Joined: 25 Sep 12
Posts: 3
Credit: 2,940
RAC: 0
Message 12195 - Posted: 29 Nov 2012, 16:23:21 UTC

I too faced the same problem, where C@H tasks seem to run for days on end. Once a task crosses 50%, it seems to fall back to 42% and start over again. A task initially estimated at 9 hours or so ran close to 36 hours (4x). Sensing a problem, I had to abort the tasks. This is the second time I have done so. I'm looking for more fine-grained tasks that are shorter but effective.
ID: 12195