Forums : General Topics : 100+ Hour Workunits
Chris

Joined: 18 Nov 11
Posts: 1
Credit: 58,380
RAC: 0
Message 9856 - Posted: 17 Jan 2012, 0:53:08 UTC

I'm not posting this to gripe. I'm posting this as feedback. I recently (as in less than 1 hour ago) aborted all C@H work units, disabled new work, and disconnected C@H from my BOINC account manager.

The reason is that over the past few weeks, I have consistently received work units that take more than 48 hours to complete. Worse still, at the time of the abort, I had a machine claiming 70-something percent completion on a work unit that had logged 120+ hours.

I searched and found another thread where "from the administrator's point of view it's better for long work units to reduce the load on the server." That administrator interest has to be balanced against the user's interest in knowing that their machine is actually accomplishing something. At 120 hours, what assurance does the user have that the job *will complete* and is not stuck in an infinite loop caused by a checkpoint problem?

I am also aware of the "Leave application in memory while suspended" option. I do not want to use it. None of the other projects I crunch for require that option to work properly. As a compromise, I increased the time each task runs before BOINC switches applications from 1 hour to 3 hours, to give each task more time to reach a checkpoint.

Before disconnecting from C@H, I was actively crunching for 5 projects:
1. Cosmology (~31% resource share)
2. Milkyway (~31% resource share)
3. Einstein (~15% resource share)
4. SAT (~15% resource share)
5. Rosetta (~8% resource share)

The computer in question: Intel Core2 Quad @ 2.4GHz with 4GiB RAM.

When compared to other projects that (on average) complete a job in 4-8 hours, 48+ hours takes some faith. 120+ hours takes *blind* faith.

I'm not saying that C@H needs to behave like other projects. C@H can do whatever it wants. The consequence is that there will be users like me who disagree. So I'm leaving this feedback so the admins can decide whether this is an "acceptable loss" or whether something needs to change.

Again, this message is not meant to gripe but to explain why I disconnected from the project. I'm not a credit-chaser, but I do care that the computer time I donate is not wasted. If the C@H application(s) cannot reach a checkpoint within 3 hours' worth of processing time on the above processor, then I disagree with the program author's decisions regarding the processing-to-checkpointing ratio. And because this is a recurring problem (a 140+ hour work unit on a quad-core AMD processor a few weeks back), I choose to divert my processor cycles to projects where this doesn't happen, where there is no uncertainty that my useful-work-to-processor-cycle ratio is close to 1.

I wish the best of luck to the C@H project.
cykodennis

Joined: 31 May 10
Posts: 234
Credit: 4,896,378
RAC: 0
Message 9857 - Posted: 17 Jan 2012, 6:21:47 UTC

Sorry to hear about that.
Looking at your data, I see that you did indeed have some troublesome WUs.
But I think you have already given the explanation for that yourself.

See, if you configure your system like this:
- 31 percent resource share for Cosmo
- not leaving the app in memory while suspended
- switching applications every three hours

then, in my opinion, the performance you get under these circumstances is not very surprising for Cosmo.

I guess the reason for the checkpointing issue lies not only in administrator interest. Maybe Ben would be so kind as to explain this further.
But the Cosmo guys would be real fools if they configured Cosmo like that just for fun. Maybe C@H is running on an IT infrastructure that does not belong to C@H exclusively.

Anyway, it's a pity that you have left the project.
On the other hand, it's a good example for other users of how not to configure their BOINC clients. :)

I wish you the very best.
.clair.

Joined: 4 Nov 07
Posts: 604
Credit: 10,881,302
RAC: 0
Message 9870 - Posted: 19 Jan 2012, 20:22:35 UTC - in response to Message 9857.  

> no leaving the app in memory while suspended
>
> <snip>
>
> On the other hand it's a good example for other users, how to not configure their boinc-clients. :)


Not leaving the app in memory while suspended is the main problem.
All that option does is use some swap space / virtual RAM.