Message boards : Technical Support : Stalled tasks

Author Message
David
Joined: 8 Oct 11
Posts: 2
Credit: 1,711,760
RAC: 0
Message 21417 - Posted: 6 May 2017, 14:11:12 UTC

I have two legacy 2.17 tasks running but going nowhere. Percent complete is stalled, but the status is 'running'.
It's been like this for a few weeks. My points average has gone from over 330 to 50!
____________

Paulie
Joined: 1 Dec 11
Posts: 6
Credit: 607,181
RAC: 315
Message 21458 - Posted: 30 May 2017, 12:19:54 UTC - in response to Message 21417.

I also have the 2.17 tasks running, with the remaining time slowly increasing. Should I abort the two?
____________

UnionJack
Joined: 8 Jan 10
Posts: 5
Credit: 1,879,787
RAC: 0
Message 21459 - Posted: 31 May 2017, 12:26:38 UTC

I've had no C@H jobs finish for nearly a month now. The log has always said "Not fetching jobs: don't need" when I looked. Seven other projects continue to run normally, some of those also using VBox.

Twice in the last week I've reset the project, and today I disconnected it and reconnected. The one job I received finished with a computation error.

I noticed that, although the manager said it was running on all 12 cores, % CPU was down to single figures. I wrote a local app_config.xml to limit it to one core. That was before I dis- and re-connected today.
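For anyone wanting to try the same one-core limit, here is a rough sketch of generating such an app_config.xml. The `<app_config>`, `<app_version>`, `<app_name>`, `<plan_class>` and `<avg_ncpus>` elements are the standard BOINC ones, but the plan class `vbox64_mt` is an assumption (check the actual plan class in your client's event log or task properties), and `build_app_config` is just a hypothetical helper name. The app name `camb_boinc2docker` is the one mentioned elsewhere in this thread.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, ncpus: int) -> str:
    """Return the text of an app_config.xml that sets avg_ncpus for one app version."""
    root = ET.Element("app_config")
    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = app_name
    ET.SubElement(app_version, "plan_class").text = plan_class
    ET.SubElement(app_version, "avg_ncpus").text = str(ncpus)
    # ET.indent only exists in Python 3.9+; the XML is valid either way.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "vbox64_mt" is an assumed plan class, not confirmed for this project.
    print(build_app_config("camb_boinc2docker", "vbox64_mt", 1))
```

The output would be saved as app_config.xml in the project's folder inside the BOINC data directory, then picked up via Options > Read config files in the Manager or by restarting the client. Whether `avg_ncpus` alone is enough to restrict a VirtualBox task to one core may depend on the application.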

Something isn't right, but what? Beats me...
____________
Rgds
Peter.

Uzzy Booboo
Joined: 8 Apr 08
Posts: 1
Credit: 374,245
RAC: 1
Message 21517 - Posted: 20 Sep 2017, 21:06:49 UTC

I've been in the same boat for a couple weeks now. 'Time Remaining' goes up a second or two every couple of minutes and if I pause it while on battery, it jumps 8-9 hours for a bit then goes back to the 23-25hr mark.

Cancelled the tasks and even reinstalled BOINC without a change.

What's odd is that one of the three tasks that came down last week finished.

negrada
Joined: 12 Aug 17
Posts: 1
Credit: 121,559
RAC: 259
Message 21518 - Posted: 21 Sep 2017, 10:40:12 UTC

I'd advise forgetting about running 2.17 tasks on Windows. I have hosts running Windows and Linux, and I've noticed that I only get stable behaviour if I run only camb_boinc2docker v2.04 on Windows and only camb_legacy v2.17 on Linux.

Graham Gill
Joined: 13 Nov 17
Posts: 1
Credit: 24,970
RAC: 910
Message 21590 - Posted: 9 Dec 2017, 16:52:06 UTC

I have camb_legacy 2.17 tasks running and finishing under Linux and Windows. I have camb_boinc2docker tasks running and finishing under Linux and Windows with VirtualBox. (Sometimes I get "VM unmanageable, trying again later" warnings, but the tasks do complete "later".)

However, I have two camb_legacy 2.17 tasks on Windows that display the behaviour described in this thread. Time elapsed increases normally, % done crawls up very slowly, and estimated time remaining decreases by only a few minutes in 24 hours of run time. (One task is always around 9h 11m remaining, the other 8h 50m remaining.) If I hibernate my machine, or exit BOINC Manager and stop tasks, then when I restart these two tasks will have lost any % done gains since the last time: one always returns to 91% done and the other to 88.2% done. On top of this, the time elapsed also resets to a lower figure, losing many hours. The time remaining estimate returns to the numbers above: 9h 11m or 8h 50m remaining, respectively.
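To put a number on "crawls up very slowly", one option is to note elapsed time and % done at two points and extrapolate. The sketch below does only that arithmetic; the function names, the 72-hour threshold and the sample figures in the usage note are all made up for illustration (only the 91% starting point comes from the task described above), and nothing here talks to the BOINC client.

```python
from typing import List, Optional, Tuple

# One observation: (elapsed_hours, fraction_done), with fraction_done in 0..1.
Sample = Tuple[float, float]


def estimate_remaining_hours(samples: List[Sample]) -> Optional[float]:
    """Extrapolate hours to 100% from the first and last sample.

    Returns None when there is no measurable forward progress, for example
    when % done fell back after a restart.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, f0), (t1, f1) = samples[0], samples[-1]
    dt, df = t1 - t0, f1 - f0
    if dt <= 0 or df <= 0:
        return None
    rate = df / dt  # fraction completed per hour
    return (1.0 - f1) / rate


def looks_stalled(samples: List[Sample], max_remaining_hours: float = 72.0) -> bool:
    """Treat a task as stalled if it makes no progress or needs implausibly long."""
    remaining = estimate_remaining_hours(samples)
    return remaining is None or remaining > max_remaining_hours
```

With made-up readings of 91.0% and then 91.1% a day later, `looks_stalled([(0.0, 0.910), (24.0, 0.911)])` comes out true, since the extrapolated finish is thousands of hours away. A task that went from 10% to 50% in two hours would not be flagged.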

I aborted one of the two tasks a few days after the deadline. The other is still running:
task name: wu_112717_103414_0_0_0
task id: 61007043
work unit: 46175852

Likely I'll abort this one too.

Jim1348
Joined: 17 Nov 14
Posts: 51
Credit: 2,415,981
RAC: 2,662
Message 21591 - Posted: 9 Dec 2017, 17:05:39 UTC - in response to Message 21459.
Last modified: 9 Dec 2017, 17:57:29 UTC

I've had no C@H jobs finish for nearly a month now. The log has always said "Not fetching jobs: don't need" when I looked. Seven other projects continue to run normally, some of those also using VBox.

I ran boinc2docker 2.04 on five cores (as set by an app_config), and LHC/Atlas on two cores (as set in the project settings) for a couple of weeks with no problems. That was on an i7-4790 running Ubuntu 16.04 (VBox 5.1.30 and BOINC 7.8.3). The other core was reserved to support a GPU. I had set the resource shares accordingly so that neither project would run out of tasks.

Then, I just stopped getting boinc2docker with the message that "no jobs are available", even though the server status showed 500 jobs. This happened over a period of several days, and so I finally detached from Cosmology. Something is wrong, though it may be due to how VirtualBox interfaces with BOINC.
