Forums : News : New beta app for analyzing Planck data

Profile DoctorNow
Volunteer tester
Joined: 22 May 07
Posts: 24
Credit: 203,321
RAC: 0
Message 20801 - Posted: 2 Feb 2016, 10:45:08 UTC
Last modified: 2 Feb 2016, 10:46:57 UTC

Resetting or reattaching didn't help either; the whole thing still seems to be running with only one core. I aborted the tasks and am giving up for the time being, having wasted too much time on this.
When I aborted an older running task this morning, a log was created; aborting the current one, however, didn't create a log of this sort.

Still wondering why this machine is getting work, while my other one doesn't get any at all... there are no different settings etc. on my side that could cause this.
Life is Science, and Science rules. To the universe and beyond
Member of BOINC@Heidelberg
My BOINC-Stats
ID: 20801
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20802 - Posted: 2 Feb 2016, 11:36:20 UTC - in response to Message 20801.  

Resetting or reattaching didn't help either; the whole thing still seems to be running with only one core. I aborted the tasks and am giving up for the time being, having wasted too much time on this.

Yeah, sorry, this may be the same compute error I'm seeing a lot. I've made some progress on a possible fix and may be able to test it out today.

When I aborted an older running task this morning, a log was created; aborting the current one, however, didn't create a log of this sort.

There's a bug on the website I noticed yesterday where some logs aren't showing up right. This should be fixed soon.

Still wondering why this machine is getting work, while my other one doesn't get any at all... there are no different settings etc. on my side that could cause this.

I glanced at the logs for this machine; it looks like the server is correctly seeing it as 64-bit, with VBox installed and VT-x capable. Do you have VT-x enabled? If so, my guess is the machine is a victim of this bug.
ID: 20802
Profile DoctorNow
Volunteer tester
Joined: 22 May 07
Posts: 24
Credit: 203,321
RAC: 0
Message 20804 - Posted: 2 Feb 2016, 17:04:03 UTC - in response to Message 20802.  
Last modified: 2 Feb 2016, 17:31:34 UTC

Do you have VT-x enabled? If so, my guess is the machine is a victim of this bug.

Looks like this was indeed the problem; deleting the entry in client_state.xml did the trick, and the machine now gets work.
However, I've now run into the "Postponed: VM Hypervisor failed" problem mentioned in one of the other threads. The progress percentage climbed nicely for 7 minutes before it stopped, but according to the task manager the VM was using no CPU at all.

Edit:
Looks like my Vista machine finally got it working; there's a unit crunching with full CPU usage now! I'll upgrade my Win 7 host to 5.0.14 now, since that seems to work.

Edit 2:
Yep, it helped. Both comps are now crunching correctly! Yay.
Life is Science, and Science rules. To the universe and beyond
Member of BOINC@Heidelberg
My BOINC-Stats
ID: 20804
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20805 - Posted: 2 Feb 2016, 20:49:44 UTC

OK, I think I've got the compute errors largely fixed; the success rate is looking very good right now (it might take a little while to flush the old jobs out of everyone's system).

The empty logs should also be showing up now.
ID: 20805
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20807 - Posted: 2 Feb 2016, 20:57:58 UTC - in response to Message 20805.  

Fun fact, it might have taken me significantly longer to fix this (pretty nasty) compute error had I not run into this thread and had that guy not answered his own question three years after posting it! Thanks, wherever you are!
ID: 20807
Profile Steve Hawker*

Joined: 10 Feb 13
Posts: 9
Credit: 244,929
RAC: 0
Message 20809 - Posted: 2 Feb 2016, 22:18:27 UTC - in response to Message 20807.  

Fun fact, it might have taken me significantly longer to fix this (pretty nasty) compute error had I not run into this thread and had that guy not answered his own question three years after posting it! Thanks, wherever you are!


OK. Terrific!

So having read this I thought I'd try a brand new Windows machine.

The experience was like an episode of Twilight Zone.

I downloaded 24 tasks. So far, so good.
I suspended all the other CPU projects because I am impatient like that. One C@H task started with all 4 CPU cores. So far, so good.
Now things start getting weird...
First all the tasks are marked "Postponed: Please upgrade BOINC to the latest version". You what?? I'm on 7.6.22!!
Additional weirdness is all the tasks in BOINC have between 5 and 7 seconds elapsed time.
Then I figure I'd look at all my tasks here. Apparently all 24 are in progress. Hmm. What about that task that was in progress (at least 3 minutes). I check BOINC but I have only 23!!
I wait a few minutes, still 23 on BOINC, 24 in progress. Weirder and weirder.
Eventually #24 shows up as Error while computing (apparently 10 minutes of that)
Here it is: http://www.cosmologyathome.org/result.php?resultid=36322953
I'm still bamboozled as to the Latest Version of BOINC.

FWIW: System is Win 10 + BOINC 7.6.22 + whatever Vbox version comes with it from the download page.
ID: 20809
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20810 - Posted: 2 Feb 2016, 22:55:36 UTC - in response to Message 20809.  

Hi Steve, based on that description and log I'm thinking you're also hit by this?
ID: 20810
Profile Steve Hawker*

Joined: 10 Feb 13
Posts: 9
Credit: 244,929
RAC: 0
Message 20811 - Posted: 3 Feb 2016, 1:20:03 UTC - in response to Message 20810.  

Hi Steve, based on that description and log I'm thinking you're also hit by this?


Well no. What I was hit by was a large concrete block of stupidity on my part. I thought I'd enabled virtualization, but I had not. Pays to double check eh?

No need to detach and reattach, WUs sitting there quite happily while one of them crunches on. So far, so good.

Thanks Marius!

S.
ID: 20811
Profile R.J.Bingham

Joined: 3 Jan 16
Posts: 4
Credit: 87,991
RAC: 0
Message 20820 - Posted: 6 Feb 2016, 9:21:12 UTC
Last modified: 6 Feb 2016, 9:23:27 UTC

Hi,
Running BOINC on my MacBook Pro. I have completed and validated several Plank_Param jobs, but the most recent one has now run for 13 hours. The estimated remaining time is 3 days 17 hours (and this is constantly increasing).
Should I kill this job? plank_param_sims_30_2150_141_2

Thanks
Richard
ID: 20820
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20821 - Posted: 6 Feb 2016, 21:44:29 UTC - in response to Message 20820.  
Last modified: 6 Feb 2016, 21:45:06 UTC

Hi,
Running BOINC on my MacBook Pro. I have completed and validated several Plank_Param jobs, but the most recent one has now run for 13 hours. The estimated remaining time is 3 days 17 hours (and this is constantly increasing).
Should I kill this job? plank_param_sims_30_2150_141_2

Thanks
Richard

Go ahead and abort; that'll send us the log so I can take a look at what's going on.

Also, while it has been running, has your CPU been in use or not?
ID: 20821
Profile R.J.Bingham

Joined: 3 Jan 16
Posts: 4
Credit: 87,991
RAC: 0
Message 20822 - Posted: 7 Feb 2016, 0:36:35 UTC

ok, I aborted it.
CPU was being used.

Thanks,
Rich, UK.
ID: 20822
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20826 - Posted: 10 Feb 2016, 11:22:45 UTC - in response to Message 20822.  
Last modified: 10 Feb 2016, 11:45:53 UTC

ok, I aborted it.
CPU was being used.

Thanks,
Rich, UK.

Sorry for the long delay in responding. The error seems to be related to this message, which a few others are seeing as well: "VM is paused due to host power management". Basically, the job was suspended (either by you by hand or because of your BOINC settings), and when it tried to resume, your computer wouldn't let it because of some power-saving feature. That shouldn't be a problem in itself, but something goes wrong and the job gets stuck in some weird state where it never actually finishes.

Any chance tinkering with your power usage settings fixes it, while we look into a solution?
ID: 20826
Henrik

Joined: 7 Dec 15
Posts: 7
Credit: 164,248
RAC: 0
Message 20840 - Posted: 11 Feb 2016, 18:14:17 UTC

Hi Marius,
Since v1.02 I have been getting some tasks aborted with the error "202 (0xca) EXIT_ABORTED_BY_PROJECT"; here is one example task: http://www.cosmologyathome.org/result.php?resultid=36605717. I finished over 1200 tasks over the last weeks and never had this problem with v1.00 or 1.01; it started happening with v1.02 and appears every few hours for some tasks.
ID: 20840
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20841 - Posted: 11 Feb 2016, 19:12:43 UTC - in response to Message 20840.  

Hi Marius,
Since v1.02 I have been getting some tasks aborted with the error "202 (0xca) EXIT_ABORTED_BY_PROJECT"; here is one example task: http://www.cosmologyathome.org/result.php?resultid=36605717. I finished over 1200 tasks over the last weeks and never had this problem with v1.00 or 1.01; it started happening with v1.02 and appears every few hours for some tasks.

Yep, see here. Sorry about that! :)
ID: 20841
ku4jb

Joined: 6 Oct 12
Posts: 2
Credit: 207,514
RAC: 0
Message 20866 - Posted: 19 Feb 2016, 14:55:25 UTC

FWIW, I've struggled for some time now trying to get tasks to run on an older W10 preview machine with 4GB of memory. I tried every combination of vbox versions and BM versions available with no success until now. After I changed the VM's memory setting in lsplitsims_1.02_vbox_job.xml from 2048 to 3193, they started running. This is with vbox 5.0.10 and BM 7.6.23; I've not tried any other combination, as I finally got it to run. :)
ID: 20866
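
For anyone who wants to try the memory change described in the post above, here is a rough Python sketch of the edit. It assumes the job file stores the VM memory in a `<memory_size_mb>` element, as BOINC's vboxwrapper job files generally do; the thread does not show the actual file, so the sample XML and the helper name `set_vm_memory` are made up for illustration. Back up the real file before touching it.

```python
import xml.etree.ElementTree as ET


def set_vm_memory(job_xml: str, new_mb: int) -> str:
    """Return the job XML with the VM memory size replaced by new_mb.

    Assumption: the memory setting lives in a <memory_size_mb> element
    somewhere in the document (not confirmed by the thread).
    """
    root = ET.fromstring(job_xml)
    node = next(root.iter("memory_size_mb"), None)
    if node is None:
        raise ValueError("no <memory_size_mb> element found")
    node.text = str(new_mb)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, minimal stand-in for the real job file contents.
sample = (
    "<vbox_job>"
    "<os_name>Linux26_64</os_name>"
    "<memory_size_mb>2048</memory_size_mb>"
    "</vbox_job>"
)

updated = set_vm_memory(sample, 3193)
print(updated)
```

To apply this to a real file, read its text, pass it through `set_vm_memory`, and write the result back while BOINC is not running. Whether the client keeps a hand-edited job file is not something this thread settles.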
Profile Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 20867 - Posted: 19 Feb 2016, 15:10:34 UTC - in response to Message 20866.  
Last modified: 19 Feb 2016, 15:10:49 UTC

FWIW, I've struggled for some time now trying to get tasks to run on an older W10 preview machine with 4GB of memory. I tried every combination of vbox versions and BM versions available with no success until now. After I changed the VM's memory setting in lsplitsims_1.02_vbox_job.xml from 2048 to 3193, they started running. This is with vbox 5.0.10 and BM 7.6.23; I've not tried any other combination, as I finally got it to run. :)

Interesting, thanks for tinkering and for reporting back. Can you tell me the hostid of this machine, so I can take a look at the logs for some of the failed attempts? I can't see why increasing the memory would have any effect.
ID: 20867
ku4jb

Joined: 6 Oct 12
Posts: 2
Credit: 207,514
RAC: 0
Message 20868 - Posted: 19 Feb 2016, 23:01:18 UTC

Sorry for not including hostid originally.

hostID: 280955
ID: 20868