Forums : Technical Support : One boinc2docker v2.00 job stuck on 0.1% for 10 minutes

Phoneman1

Joined: 5 Nov 07
Posts: 113
Credit: 3,100,327
RAC: 0
Message 21091 - Posted: 6 Jun 2016, 12:19:35 UTC

Marius asked for some feedback on the v2.00 jobs first issued last week. They seem to be very stable and consistent, and my machines have processed over two hundred of these boinc2docker jobs without any noticeable drama. I was therefore surprised to see one task stuck at 0.1% progress for over ten minutes this morning. I suspect it would have hit the 100 minute job limit if I had not intervened. I suspended the input queue, waited for a short-running GPU job from another project to finish, then suspended this task, quit BOINC and rebooted. As I half expected, it then went on to process normally.

I've isolated two excerpts from the job log which I think are quite telling. From the first attempt (note the time of the D-Bus message and of the line after it):


2016-06-06 10:04:34 (4508): Guest Log: Running boinc_app...
2016-06-06 10:04:34 (4508): Guest Log: Importing Docker image from BOINC...
2016-06-06 10:04:44 (4508): Guest Log: 00:00:10.049290 vminfo Error: Unable to connect to system D-Bus (3/3): D-Bus not installed
2016-06-06 10:14:55 (4508): VM state change detected. (old = 'running', new = 'paused')
2016-06-06 10:15:15 (4508): Powering off VM.
2016-06-06 10:15:16 (4508): Successfully stopped VM.


And at the same point in the re-started part of the log (again note the time of the D-Bus message and the next):


2016-06-06 10:18:34 (84): Guest Log: Running boinc_app...
2016-06-06 10:18:34 (84): Guest Log: Importing Docker image from BOINC...
2016-06-06 10:18:34 (84): Guest Log: 00:00:10.048631 vminfo Error: Unable to connect to system D-Bus (3/3): D-Bus not installed
2016-06-06 10:18:44 (84): Guest Log: Prerun diagnostics...
2016-06-06 10:18:44 (84): Guest Log: REPOSITORY TAG IMAGE ID CREATED SIZE
2016-06-06 10:18:44 (84): Guest Log: marius311/camb_boinc2docker 1.0.0-slim f383d587a9c8


So whatever was meant to happen in the 10 seconds after the D-Bus message didn't happen on time in the first run.
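For anyone who wants to spot this kind of stall without reading the log by eye, here is a minimal Python sketch that reports large gaps between consecutive log lines. The function name find_gaps, the regular expression and the 60 second threshold are my own assumptions for illustration, not part of BOINC or boinc2docker; the sample lines are copied from the first excerpt above.

```python
import re
from datetime import datetime

# Matches the host-side prefix seen in the excerpts, e.g.
# "2016-06-06 10:04:44 (4508): Guest Log: ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def find_gaps(lines, threshold_seconds=60):
    """Return (gap_seconds, earlier_message, later_message) for each pair of
    consecutive log lines that are more than threshold_seconds apart.
    The threshold is an arbitrary illustrative value."""
    gaps = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a timestamped log line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(3)
        if previous is not None:
            gap = (stamp - previous[0]).total_seconds()
            if gap > threshold_seconds:
                gaps.append((gap, previous[1], message))
        previous = (stamp, message)
    return gaps


# Lines copied from the first attempt in the log excerpt.
first_attempt = [
    "2016-06-06 10:04:34 (4508): Guest Log: Running boinc_app...",
    "2016-06-06 10:04:34 (4508): Guest Log: Importing Docker image from BOINC...",
    "2016-06-06 10:04:44 (4508): Guest Log: 00:00:10.049290 vminfo Error: "
    "Unable to connect to system D-Bus (3/3): D-Bus not installed",
    "2016-06-06 10:14:55 (4508): VM state change detected. (old = 'running', new = 'paused')",
    "2016-06-06 10:15:15 (4508): Powering off VM.",
    "2016-06-06 10:15:16 (4508): Successfully stopped VM.",
]

for gap, before, after in find_gaps(first_attempt):
    print(f"{gap:.0f}s between {before!r} and {after!r}")
```

On the first excerpt this flags only the gap between the D-Bus line and the VM state change, which is where the task sat doing nothing visible.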

All this caused me to check the error jobs again and I've two more instances of jobs timing out but the problem seems to be in different places in the run. The tasks are 40881287 and 40839379. This last one was suspended at checkpoint and resumed the next day, my normal routine.

I also have one example of the known "finish file present too long" problem: task 40881241.
Marius
Project administrator
Project developer
Project scientist
Joined: 29 Jun 15
Posts: 470
Credit: 4,276
RAC: 0
Message 21092 - Posted: 6 Jun 2016, 13:34:44 UTC - in response to Message 21091.  

Very helpful info, thanks. What's happening right at that "Importing Docker image from BOINC..." step is that a few tar files are extracted inside the VM, concatenated into a single tar, then fed into Docker. Something in that sequence must have hung. I think the D-Bus thing is just a red herring; that line is always present. I will have to think about how best to get a reasonable log of what's going on...
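To make the "several tars into one" step concrete, here is a rough Python sketch using only the standard tarfile module. It is not the boinc2docker code: the helper names merge_tars and make_tar and the file names are hypothetical, and it merges archive members one by one rather than concatenating raw bytes, so treat it purely as an illustration of the idea.

```python
import io
import tarfile


def make_tar(files):
    """Build a small in-memory tar from a {name: bytes} dict (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


def merge_tars(sources, destination):
    """Copy every member of each source tar into one new tar.

    sources is an iterable of readable binary file objects and
    destination is a writable binary file object."""
    with tarfile.open(fileobj=destination, mode="w") as out:
        for source in sources:
            with tarfile.open(fileobj=source, mode="r") as part:
                for member in part:
                    # Only regular files carry data; directories and links do not.
                    data = part.extractfile(member) if member.isfile() else None
                    out.addfile(member, data)


# Two made-up pieces standing in for the split image files.
parts = [
    make_tar({"layer_a/data.txt": b"alpha"}),
    make_tar({"layer_b/data.txt": b"beta"}),
]
combined = io.BytesIO()
merge_tars(parts, combined)

# Read the merged archive back to confirm both pieces are present.
combined.seek(0)
with tarfile.open(fileobj=combined, mode="r") as merged:
    merged_names = merged.getnames()
print(merged_names)
```

In a real setup the merged archive would then be handed to Docker's image import. Timing each source inside the loop would be one way to see which piece stalls.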

The other errors look unrelated to me, but generally similar to the types of errors that are on my radar.
maeax

Joined: 21 Dec 17
Posts: 18
Credit: 1,737,784
RAC: 4,410
Message 22043 - Posted: 8 Jan 2019, 11:12:50 UTC

I now have boinc2docker running on Windows (Win10 Pro, VirtualBox 5.2.22).
This D-Bus message always appears, even in a successfully finished task:
2019-01-08 11:18:27 (4948): Guest Log: Running boinc_app...
2019-01-08 11:18:27 (4948): Guest Log: Importing Docker image from BOINC...
2019-01-08 11:18:33 (4948): Guest Log: 00:00:10.027536 vminfo Error: Unable to connect to system D-Bus (3/3): D-Bus not installed
Is this normal behaviour?