Forums : Wish list : Thoughts on longer, check-pointed WUs

Profile Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 575 - Posted: 29 Jun 2007, 15:40:52 UTC

This is all hypothetical, of course.

Let's say, hypothetically, if we were to make longer WUs (say, at least 1.5 hours) that were check-pointed every 15 mins or so, would there be more interest in these than the current WUs?
Scott Kruger
Project Administrator, Cosmology@Home
ID: 575 · Report as offensive
Nvgnte
Joined: 24 Jun 07
Posts: 49
Credit: 550,216
RAC: 2,796
Message 577 - Posted: 29 Jun 2007, 15:56:21 UTC

Hmm... Hypothetically? :)

We are (pre)alpha testers here - my (our) interest is to help you sort this project out ;)

Check-pointing is necessary with longer WUs because it's a way to back up the time spent on a WU in case of a crash, power failure and so on - but at this stage of the project, what's important is to test every possible option; sometimes you'll need longer WUs, sometimes shorter ones

In the future, when all goes well, I would prefer longer WUs - the traffic load on the server would also be reduced
The land of a God who could not accept / its false right to freedom - Tierra Santa
Download my first eBook, Amaneceres
ID: 577 · Report as offensive
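
A minimal sketch of the checkpoint-and-resume idea described in the post above. It is not the project's application: the JSON state file, the step counts, and the helpers `save_checkpoint`, `load_checkpoint` and `run_work_unit` are all hypothetical, and the "science" is a stand-in sum.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # a crash never leaves a half-written file


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "total": 0}


def run_work_unit(path, total_steps, checkpoint_every, stop_after=None):
    """Run (or resume) a toy computation, checkpointing periodically.

    stop_after simulates a crash: the loop exits once that many steps
    have been done in this session, without writing a final checkpoint.
    """
    state = load_checkpoint(path)
    done_this_session = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_session >= stop_after:
            return None  # simulated power failure
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        done_this_session += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    if os.path.exists(path):
        os.remove(path)  # finished: the checkpoint is no longer needed
    return state["total"]
```

Calling `run_work_unit` again with the same path after a simulated crash picks up from the last saved step, so only the work done since that checkpoint is repeated.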
rbpeake

Joined: 27 Jun 07
Posts: 118
Credit: 61,883
RAC: 0
Message 579 - Posted: 29 Jun 2007, 16:02:11 UTC - in response to Message 575.  

This is all hypothetical, of course.

Let's say, hypothetically, if we were to make longer WUs (say, at least 1.5 hours) that were check-pointed every 15 mins or so, would there be more interest in these than the current WUs?


That's fine! Either way is OK. Load on the servers would of course be reduced with longer work units, but that of course is your call! We are here to test whatever you like!

In general, looking at other projects, an increase is fine, maybe up to a max of one or two units per day. But some projects run for days per unit, so it is really up to you in terms of what is most efficient for the project, as long as checkpointing is enabled.
ID: 579 · Report as offensive
Profile Saenger
Volunteer tester
Joined: 22 May 07
Posts: 110
Credit: 282,157
RAC: 0
Message 580 - Posted: 29 Jun 2007, 16:02:31 UTC

I agree with Alcaudon, for the same reasons.

One advantage of short WUs, though, is that it would be easy to give a fixed amount of credit per WU, as the credit will average out a lot faster over a bunch of short ones than over a few long ones.

But on the other hand: 1.5h is not long ;) 24h or more is long!
Greetings from Sänger
ID: 580 · Report as offensive
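
A rough sketch of why a fixed credit per WU "averages faster" over many short WUs, as the post above argues. It assumes each WU's runtime varies independently with the same relative spread, which is an idealisation; the 20% spread and the helper `relative_error_of_mean` are made up for illustration.

```python
import math


def relative_error_of_mean(per_wu_relative_spread, wu_count):
    """Standard error of the mean runtime, as a fraction of the mean.

    Assumes independent WUs with identical relative spread, so the
    error of the average shrinks with the square root of the count.
    """
    if wu_count < 1:
        raise ValueError("wu_count must be at least 1")
    return per_wu_relative_spread / math.sqrt(wu_count)


# One day of crunching on one core, with an assumed 20% spread per WU.
for label, count in (("24 h WUs", 1), ("1.5 h WUs", 16), ("15 min WUs", 96)):
    error = relative_error_of_mean(0.20, count)
    print(f"{label:>10}: {count:3d} per day, average off by about {error:.1%}")
```

Under these assumptions, a day of 15-minute WUs evens out roughly ten times better than a single 24-hour WU, since the error falls with the square root of the number of WUs.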
Profile Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 583 - Posted: 29 Jun 2007, 16:23:51 UTC - in response to Message 580.  

But on the other hand: 1.5h is not long ;) 24h or more is long!

My mistake =)

Keep in mind that I have no frame of reference for this sort of thing. The only experience I've had with BOINC was using Seti@Home in high school, many years ago.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 583 · Report as offensive
Profile Jayargh
Volunteer moderator
Volunteer tester
Joined: 25 Jun 07
Posts: 508
Credit: 2,282,158
RAC: 0
Message 584 - Posted: 29 Jun 2007, 16:34:59 UTC

I'm in total agreement with all 3 posters so far... Saenger brings up the good point that if a credit system granting uniform credit isn't in place, then the longer the WUs, the greater the impact. If you had, say, 4-hour WUs now with a quorum of 2 and the low claim granted, some Linux clients might only claim 10-12 credits, which would be a killer, so the 2 subjects need to be considered together.

No apologies needed Scott :) Most everyone is here to help you get your project off the ground.

Current Einstein units take 10-40 hours... Nano-hive had units taking 80-100 hours, and I have heard QMC has monsters too... CPDN models take 2-5k hours... so 1.5 hrs is pretty short in relative terms.
ID: 584 · Report as offensive
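
A small sketch of the "quorum of 2, low claim granted" scenario from the post above. The policy (every host in the quorum receives the lowest claim) is taken from the post's wording; the helper names and the 40-credit claim are hypothetical, and only the 10-12 credit range comes from the post.

```python
def granted_credit(claims):
    """Grant every host in the quorum the lowest claim (assumed policy)."""
    if len(claims) < 2:
        raise ValueError("a quorum of 2 needs at least two claims")
    return min(claims)


def credit_given_up(claims):
    """Credit each host loses relative to what it claimed itself."""
    granted = granted_credit(claims)
    return [claim - granted for claim in claims]


# Hypothetical 4-hour WU: one host claims 40, a low-claiming host claims 11.
claims = [40.0, 11.0]
print("granted to both:", granted_credit(claims))
print("given up per host:", credit_given_up(claims))
```

The gap between claim and grant grows with the length of the WU, which is why the post suggests looking at WU length and the credit system together.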
Profile Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 591 - Posted: 29 Jun 2007, 19:37:13 UTC - in response to Message 584.  

I'm in total agreement with all 3 posters so far... Saenger brings up the good point that if a credit system granting uniform credit isn't in place, then the longer the WUs, the greater the impact. If you had, say, 4-hour WUs now with a quorum of 2 and the low claim granted, some Linux clients might only claim 10-12 credits, which would be a killer, so the 2 subjects need to be considered together.

No apologies needed Scott :) Most everyone is here to help you get your project off the ground.

Current Einstein units take 10-40 hours... Nano-hive had units taking 80-100 hours, and I have heard QMC has monsters too... CPDN models take 2-5k hours... so 1.5 hrs is pretty short in relative terms.

Well that helps put everything into perspective. I thought that several hours might be too long for a computation, but I suppose not.

Actually, we want to push the accuracy to a higher level in order to get meaningful results. This might mean that we'll eventually have WUs taking something like 5-10 hours to complete.

We'll have to see, though.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 591 · Report as offensive
rbpeake

Joined: 27 Jun 07
Posts: 118
Credit: 61,883
RAC: 0
Message 594 - Posted: 29 Jun 2007, 20:20:18 UTC - in response to Message 591.  

...Well that helps put everything into perspective. I thought that several hours might be too long for a computation, but I suppose not.

Actually, we want to push the accuracy to a higher level in order to get meaningful results. This might mean that we'll eventually have WUs taking something like 5-10 hours to complete.

We'll have to see, though.


5-10 hours is no problem. Even slower machines could handle those in a day or so, I would guess, which is fine! And meaningful results are what it's all about, so we are certainly behind you on that! :)

Just a thought: you may be surprised how many people eventually sign up for this project, so please keep your server capabilities in mind. The servers will hopefully have a very heavy load, which longer work units could help alleviate! ;)

ID: 594 · Report as offensive
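
To put rough numbers on the server-load point above, here is a back-of-the-envelope sketch. It assumes every core crunches around the clock and that each finished WU costs the server one round trip (real clients may batch their requests); the host counts are made up, and `server_contacts_per_day` is a hypothetical helper, not part of BOINC.

```python
def server_contacts_per_day(hosts, cores_per_host, wu_hours):
    """Estimate how many finished WUs reach the server per day.

    Assumes 24/7 crunching on every core and one server round trip
    per finished WU, which is a simplification.
    """
    if wu_hours <= 0:
        raise ValueError("wu_hours must be positive")
    return hosts * cores_per_host * 24.0 / wu_hours


# Hypothetical user base of 1000 dual-core hosts.
for wu_hours in (0.25, 1.5, 10.0):
    contacts = server_contacts_per_day(1000, 2, wu_hours)
    print(f"{wu_hours:5.2f} h WUs: about {contacts:,.0f} contacts per day")
```

Under these assumptions the load is inversely proportional to WU length, so going from 15-minute WUs to 1.5-hour WUs cuts it by a factor of six.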
Profile Scott
Volunteer moderator
Project administrator
Project developer
Joined: 1 Apr 07
Posts: 662
Credit: 13,742
RAC: 0
Message 596 - Posted: 29 Jun 2007, 20:36:27 UTC - in response to Message 594.  

5-10 hours is no problem. Even slower machines could handle those in a day or so, I would guess, which is fine! And meaningful results are what it's all about, so we are certainly behind you on that! :)

Just a thought: you may be surprised how many people eventually sign up for this project, so please keep your server capabilities in mind. The servers will hopefully have a very heavy load, which longer work units could help alleviate! ;)

Yes, this is a worrying thought. Right now, C@H runs on a single (albeit powerful) server, which also runs our group website. I've noticed that some projects have several servers to handle project tasks, which leads me to believe that we may be under-equipped.

We'll just have to play it by ear, I suppose.
Scott Kruger
Project Administrator, Cosmology@Home
ID: 596 · Report as offensive
Profile [B^S] Acmefrog
Volunteer tester
Joined: 8 Jun 07
Posts: 175
Credit: 446,074
RAC: 0
Message 599 - Posted: 29 Jun 2007, 21:53:05 UTC - in response to Message 575.  

This is all hypothetical, of course.

Let's say, hypothetically, if we were to make longer WUs (say, at least 1.5 hours) that were check-pointed every 15 mins or so, would there be more interest in these than the current WUs?


I'm fine either way. Short ones are nice because I can see my PC burning through them quickly, but longer ones are fine as well if they checkpoint. As JRenkar mentioned, NanoHive's WUs drove me nuts. It was a crap shoot for me. Sometimes, after several hours, one would crash and have to start over. That was frustrating. Docking's WUs took around 2 hours to complete and checkpointed. I had no problems with them. As was mentioned, we are here to help you.
ID: 599 · Report as offensive
Profile Ageless
Volunteer moderator
Volunteer tester
Joined: 15 Jun 07
Posts: 345
Credit: 50,500
RAC: 0
Message 604 - Posted: 29 Jun 2007, 22:16:31 UTC

The problem with sending out results that 'take 1:30.00' is that it is impossible to do so. Your own computer may take 90 minutes to do them, but fast computers will do them in 30 minutes, while slow computers will take anywhere from 3 to 24 hours, or more if Astro puts his Pentium 60 on here to test. ;)

Checkpoints aren't bad, no matter how much time passes between them in the unit, measured in CPU time (not wall-clock time).

Why not test them out on us?
Jord.
ID: 604 · Report as offensive
Profile KSMarksPsych
Volunteer tester
Joined: 22 May 07
Posts: 92
Credit: 57,682
RAC: 0
Message 619 - Posted: 30 Jun 2007, 11:25:21 UTC

Jord brings up a good point. I'd say an hour and a half on an "average" machine (what is average these days??) is doable even without checkpoints. As more people phase in the latest version of BOINC (>5.8), it becomes somewhat of a non-issue as apps without checkpoints will run to completion.

And for those of us who've successfully run CPDN models, 1.5 hours is a quick one :)

But to give you an idea:

Rosetta has user-configurable run times from 1 to 24 hours. At one point there were some QMC and RCN units that took hundreds of hours. Predictor can run into the dozens of hours, same with Einstein. PrimeGrid has some short ones (about 5 minutes on a P4 2.8). Can't remember others off the top of my head.
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.
ID: 619 · Report as offensive
Profile Saenger
Volunteer tester
Joined: 22 May 07
Posts: 110
Credit: 282,157
RAC: 0
Message 620 - Posted: 30 Jun 2007, 11:44:30 UTC

I'd expect these 1.5h WUs to take about 2.5h on my old machine, but that's not long imho. But checkpoints would be appreciated. Non-checkpointed WUs will always lose their progress if the puter is stopped before they are finished. The longer the WU, the greater the possible loss. Most of us ATAs here probably have our puters on 24/7, so we will not notice it, but the "normal" cruncher will crunch some futile rounds.

The length will probably also differ between different OS/CPU setups, despite the same benchmark, as the application will suit one setup more than another. Atm it seems that Linux is better suited for crunching here, if I read the other thread correctly.

I'd say the current WU size doesn't require checkpoints. It takes about 15min on average on my machine, and that's fine with me.
Greetings from Sänger
ID: 620 · Report as offensive
Profile [B^S] Gamma^Ray
Volunteer tester
Joined: 7 Jun 07
Posts: 47
Credit: 70,587
RAC: 0
Message 689 - Posted: 2 Jul 2007, 16:37:10 UTC

I personally would rather see work units that are more uniform with each other, i.e. at or around an hour each as the ceiling. This is basically just a personal preference, as acme said. I tend to be a micro-manager with BOINC, and so I like to actually see something happening with the work units, as opposed to the longer ones that just sit there for hours on end. I think it really depends on the users. Some users have PCs that are checked maybe once a day or so, just to make sure there is work on them for the next 24 hours; their preference would be longer work units. IMO, for the main PC at home that they use and crunch on, users would rather have smaller ones for the opposite reason: they check its status often, and thus probably like seeing quicker turnarounds and results.

What I really dislike is projects that send out work units that can be anywhere from 30 minutes to 30 hours without you knowing beforehand which one you will get. I generally stay away from those projects. Being in the minority, I do turn my PC off at night occasionally (thus I also love checkpointing), and I always feel like I am "stuck" with a long-running monster that ties up one of my cores for a lot longer than expected, especially when you don't know how long it will run at all, as with some of those Nanos.

Regardless, checkpointing is a useful tool no matter how long the work units are, especially when you factor in that the BOINC client will usually switch over to another project's work unit after a predetermined amount of time, based on the user's own preference for the switch interval and the resource share the user has set up for each project. The default is usually an hour, I believe, for the client to switch over, so if there is no checkpointing at that point, then the work unit is lost.

Again, this is just my own personal feeling about BOINC and the projects I tend to crunch on. As of now my PC farm is only one dual core, as my other, slower P4 is unavailable, and it is really not much help when it is online, as it runs ME. :( But as has been said, this is your ballpark to run as you see fit, in whatever way works best for you and your results, whether it's a 3-inning game or a 9-inning one. hehe But for me the 20/30-minute ones are great. :)

Regards
G^R
Windows-XP-Pro, AMD 3800X2, 5.10.28
ID: 689 · Report as offensive
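
A tiny sketch of the loss described in the post above when a task is stopped or switched out. It assumes a task without checkpoints restarts from zero when it is stopped (that is, it was not kept in memory), and that a checkpointing task resumes from its last checkpoint; `minutes_lost` is a hypothetical helper.

```python
def minutes_lost(elapsed_minutes, checkpoint_interval=None):
    """CPU minutes thrown away if a task is stopped at elapsed_minutes.

    checkpoint_interval=None models an application with no checkpointing
    that restarts from scratch (an assumption). Otherwise only the time
    since the most recent checkpoint is lost.
    """
    if checkpoint_interval is None:
        return elapsed_minutes
    return elapsed_minutes % checkpoint_interval


# Task stopped 70 minutes in, with and without 15-minute checkpoints.
print("no checkpoints:", minutes_lost(70))
print("15 min checkpoints:", minutes_lost(70, 15))
```

With 15-minute checkpoints the worst case is just under 15 minutes of repeated work, regardless of how long the WU is.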
Profile [AF>Futura Sciences>Linux] Thrr-Gilag
Volunteer tester
Joined: 22 May 07
Posts: 32
Credit: 145,576
RAC: 0
Message 811 - Posted: 4 Jul 2007, 11:46:03 UTC

If checkpoints are available, it doesn't matter how long the WU takes.

With longer WUs you will reduce the load on your server.

But it will drastically raise the deadline needed to let old computers complete one WU... especially if they're working only a few hours a day.
------
Thrr-Gilag Kee'rr

L'Alliance Francophone
ID: 811 · Report as offensive
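
A quick sketch of the deadline arithmetic implied by the post above. It assumes the WU checkpoints, so progress carries over from one day to the next; the figures in the example and the helper `days_to_finish` are hypothetical.

```python
import math


def days_to_finish(wu_cpu_hours, crunch_hours_per_day):
    """Whole calendar days a host needs to finish one WU.

    Assumes checkpointing, so each day's progress is kept.
    """
    if crunch_hours_per_day <= 0:
        raise ValueError("crunch_hours_per_day must be positive")
    return math.ceil(wu_cpu_hours / crunch_hours_per_day)


# A WU needing 24 CPU hours on a slow host that only runs 3 hours a day.
print(days_to_finish(24, 3), "days")
```

Any deadline would need to be at least this long for the slowest hosts the project wants to keep, plus some margin for days when the machine is not switched on at all.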
zombie67 [MM]
Joined: 25 Jun 07
Posts: 77
Credit: 4,853,984
RAC: 0
Message 900 - Posted: 10 Jul 2007, 2:25:52 UTC

IMO, all results should either complete *or* checkpoint in a very short time (15-30 min), to guard against power (or HW, or whatever) failure.

Also, as long as there is checkpointing, the run length makes no difference to me, to an extent. Consistent run times are more important, so that slow machines don't over-download results. That can lead to missed due dates.

However, even *with* checkpoints, things go wrong and work is lost. Look at Climate Prediction. I have had tasks run for months, only to crap out due to a power failure. In their case, they issue credits every percent completed (or something like that) via "trickle" credits. So the to-date credits aren't lost, but the completed work is. Climate Prediction says that they can use partially run simulations, so not all is lost, I guess. I am not sure if partially completed tasks are helpful here.

Not that "trickle" credits are needed here. Just understand that if you make the tasks *too* long, and something goes wrong, some people will get upset that they lost out on all that credit. And the project will lose out on the work too. So I suggest crunch times of a day or less (on a P4) to minimize loss exposure.
Dublin, CA
Team SETI.USA
ID: 900 · Report as offensive
Profile Webmaster Yoda

Joined: 28 Jun 07
Posts: 21
Credit: 1,632,327
RAC: 0
Message 901 - Posted: 10 Jul 2007, 2:43:40 UTC
Last modified: 10 Jul 2007, 2:44:31 UTC

IMHO Checkpointing should be implemented if result run length goes beyond the default "save to disk at most every" setting on an average Pentium 4 or equivalent.

As for run length, I don't particularly like work units that run for longer than a couple of hours. Things can go wrong and work units crash. Losing an hour's work occasionally is not a big issue but I hate it when my machines have been crunching for many hours, only to have a work unit crash and burn. Those results are (in most cases) no use to the project either.

Depending on how your application works, running longer might also mean higher memory requirements?
ID: 901 · Report as offensive
Profile KSMarksPsych
Volunteer tester
Joined: 22 May 07
Posts: 92
Credit: 57,682
RAC: 0
Message 904 - Posted: 10 Jul 2007, 11:23:41 UTC - in response to Message 901.  

IMHO Checkpointing should be implemented if result run length goes beyond the default "save to disk at most every" setting on an average Pentium 4 or equivalent.

...



Do you mean the project switch interval? Default on the write to disk is 60 seconds.

Or am I misreading/missing something?
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.
ID: 904 · Report as offensive
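
A sketch of how a "write to disk at most every N seconds" limit like the one mentioned above could be honoured by an application. This is an illustration in Python, not BOINC code: the class `CheckpointThrottle` and its methods are hypothetical, loosely named after the pattern of asking whether it is time to checkpoint and then reporting that the checkpoint is done.

```python
import time


class CheckpointThrottle:
    """Allow a checkpoint only if min_interval seconds have passed."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the logic can be tested
        self.last = clock()

    def time_to_checkpoint(self):
        """True once at least min_interval seconds have elapsed."""
        return self.clock() - self.last >= self.min_interval

    def checkpoint_completed(self):
        """Record that a checkpoint was just written."""
        self.last = self.clock()


# Usage inside a compute loop (write_state is whatever saves progress):
#     if throttle.time_to_checkpoint():
#         write_state()
#         throttle.checkpoint_completed()
```

The application can reach a safe save point as often as it likes; the throttle simply declines to write until the user's minimum interval has elapsed.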
Profile Webmaster Yoda

Joined: 28 Jun 07
Posts: 21
Credit: 1,632,327
RAC: 0
Message 905 - Posted: 10 Jul 2007, 11:30:44 UTC - in response to Message 904.  

Do you mean the project switch interval? Default on the write to disk is 60 seconds.

Or am I misreading/missing something?


Ah, I got confused... Thought the default was 10 minutes (600 seconds).

Still, checkpointing every 10 or so minutes wouldn't hurt. Computers crash, get switched off, rebooted etc.
ID: 905 · Report as offensive
Profile sysfried

Joined: 24 Jun 07
Posts: 114
Credit: 5,296,905
RAC: 14
Message 982 - Posted: 13 Jul 2007, 8:37:43 UTC - in response to Message 905.  

Do you mean the project switch interval? Default on the write to disk is 60 seconds.

Or am I misreading/missing something?


Ah, I got confused... Thought the default was 10 minutes (600 seconds).

Still, checkpointing every 10 or so minutes wouldn't hurt. Computers crash, get switched off, rebooted etc.


Agreed. But that would only make sense if the work units ran longer than 15 min on average.

What I find quite interesting is that my P4 dual core 2.4 GHz has only about 1/3 of the cosmo crunching power of my 1.6 GHz Xeon (Woodcrest).

greetings,

sysfried

Happy member of Team: Planet 3D Now!

ID: 982 · Report as offensive