Forums : General Topics : 100+ Hour Workunits (Message 9856)
Posted 17 Jan 2012 by Chris
I'm not posting this to gripe. I'm posting this as feedback. Less than an hour ago, I aborted all C@H work units, disabled new work, and disconnected C@H from my BOINC account manager.

The reason is that over the past few weeks, I have consistently received work units that take more than 48 hours to complete. Worse, at the time of the abort, I had a machine reporting 70-something percent completion on a work unit that had already logged 120+ hours.

I searched and found another thread where "from the administrator's point of view it's better for long work units to reduce the load on the server." That administrative interest has to be balanced against the user's interest in their machine actually accomplishing something. At 120 hours, what assurance does the user have that the job *will complete* and is not stuck in an infinite loop caused by a checkpointing problem?

I am also aware of the "Leave application in memory while suspended" option. I do not want to use it. None of the other projects I crunch for require that option to work properly. As a compromise, I increased the interval between application switches from 1 hour to 3 hours to give each task more time to reach a checkpoint.

Before disconnecting from C@H, I was actively crunching for 5 projects:
1. Cosmology (~31% resource share)
2. Milkyway (~31% resource share)
3. Einstein (~15% resource share)
4. SAT (~15% resource share)
5. Rosetta (~8% resource share)

The computer in question: Intel Core2 Quad @ 2.4GHz with 4GiB RAM.

Compared to other projects, which on average complete a job in 4-8 hours, a 48+ hour work unit takes some faith. A 120+ hour work unit takes *blind* faith.

I'm not saying that C@H needs to behave like other projects. C@H can do whatever it wants. The consequence is that there will be users like me who disagree. So I'm leaving this feedback so the admins can decide whether this is an "acceptable loss" or whether something needs to change.

Again, this message is not meant as a gripe but as an explanation of why I disconnected from the project. I'm not a credit-chaser, but I do care that the computer time I donate is not wasted. If the C@H application(s) cannot reach a checkpoint within 3 hours' worth of processing time on the above processor, then I disagree with the program author's choice of processing-to-checkpointing ratio. And because this is a recurring problem (a few weeks back I had a 140+ hour work unit on a quad-core AMD processor), I choose to divert my processor cycles to projects where this doesn't happen, where I can be confident that my ratio of useful work to processor cycles is closer to 1.

I wish the best of luck to the C@H project.