
Forums : Technical Support : active_frac too low
StratCat

Joined: 20 Jul 07
Posts: 26
Credit: 263,710
RAC: 0
Message 3565 - Posted: 27 Oct 2007, 8:41:02 UTC
Last modified: 27 Oct 2007, 9:02:38 UTC

Hi guys -

Sorry to have yet another issue after yesterday's minor attachment SNAFU, but this has been an ongoing problem for many months, and is irritating the heck out of me.

In my client_state file the active_frac value is ridiculously low; something on the order of 0.00048! This has been keeping me from getting more than one WU per core on this specific machine (my other machines are fine), and also forcing one or two of the four WUs to always be running in "high priority" mode (the machine is a Q6600 C2Q). I have no idea how this came about.

To make matters more difficult, I've even manually edited the active_frac value using an HTML text editor, only to have it always revert to the same extremely low value whenever the BOINC MGR contacts the project (including on initial start-up)! I'm trying to ascertain whether the state reverts only during comm or also during other actions of the BOINC MGR, but I'm not fully knowledgeable on when comm takes place in the normal launching and running cycles of the BOINC MGR.

Is the active_frac value set by the project server upon comm? Or possibly in some (XP) registry key (I checked and didn't see it in the usual BOINC reg keys)?

The only BOINC project currently running on this machine is C@H. To add further insult, I even removed and re-installed the BOINC MGR, to no avail.

I'm really at a loss on this one.

If this issue is beyond the scope of the C@H project forum, I'll head over to the BOINC forums. I hate to bother you guys with something "generic" BOINC if you believe this doesn't fall under the realm of C@H.

Thanks.

Eddie

Team Ars Technica
The Dogs of War - Chicago Chapter
ID: 3565
ohiomike
Joined: 17 Jul 07
Posts: 302
Credit: 5,006,319
RAC: 0
Message 3567 - Posted: 27 Oct 2007, 9:17:57 UTC

Is it 1679 that is the problem? It looks like you are over-clocking the snot out of it. My Q6600 (running @2.8 GHz) benchmarks at 2700; yours is 3600 or so. That may or may not be part of the issue, but I am worried when I see an average turn-around time of 0.07 (just over an hour).
Are you stopping BOINC (including the service, if running as a service) before doing the edit on client_state.xml? You can't change anything while BOINC is active.


Boinc Button Abuser In Training >My Shrubbers<
ID: 3567
StratCat

Joined: 20 Jul 07
Posts: 26
Credit: 263,710
RAC: 0
Message 3568 - Posted: 27 Oct 2007, 9:32:09 UTC - in response to Message 3567.  
Last modified: 27 Oct 2007, 9:33:27 UTC

Good estimate - 3.6 she is, under water. I validate all my machines for a minimum of 12 - 24 hrs under Prime for all cores. I did this machine for >14 hrs using Quad Prime just recently. She's run thousands of DC WUs, including months of C@H over the past year or so, w/o any error issues, and I just re-validated her a few weeks ago. I'll see if I can host a screenie of the stress test completion, but I'm very confident of stability.

I did the HTML editing several times, including with BOINC exited. I do admit I'm no HTML pro, tho. I'm considering another R & R of the BOINC MGR, but this time checking if the BOINC folder is removed. IIRC, the BOINC directory remains after an uninstall... well, it worked that way a while ago; not sure if it still does.
ID: 3568
StratCat

Joined: 20 Jul 07
Posts: 26
Credit: 263,710
RAC: 0
Message 3570 - Posted: 27 Oct 2007, 9:39:33 UTC

Nah -

Just tried again, with BOINC fully exited. I could launch BOINC while suspended, and the active_frac value was still the 0.99 I entered. As soon as I took the project off suspend, it reverted to the low value.
ID: 3570
ohiomike
Joined: 17 Jul 07
Posts: 302
Credit: 5,006,319
RAC: 0
Message 3571 - Posted: 27 Oct 2007, 9:46:40 UTC

What does the time stats section of client_state.xml look like? For instance mine is:
<time_stats>
<on_frac>0.981773</on_frac>
<connected_frac>-1.000000</connected_frac>
<active_frac>0.999890</active_frac>
<cpu_efficiency>0.980228</cpu_efficiency>
<last_update>1193476410.968750</last_update>
</time_stats>
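
For anyone who wants to compare these numbers across several machines without opening the file by hand, here is a minimal Python sketch that reads a time_stats block like the one above. It assumes the fragment parses as well-formed XML, which may not hold for every client_state.xml; `read_time_stats` is a made-up helper name, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# The fragment posted above, copied verbatim.
SAMPLE = """<time_stats>
<on_frac>0.981773</on_frac>
<connected_frac>-1.000000</connected_frac>
<active_frac>0.999890</active_frac>
<cpu_efficiency>0.980228</cpu_efficiency>
<last_update>1193476410.968750</last_update>
</time_stats>"""


def read_time_stats(xml_text):
    """Return the children of <time_stats> as a dict of floats.

    Hypothetical helper: assumes xml_text is well-formed XML.
    """
    root = ET.fromstring(xml_text)
    # Accept either a bare <time_stats> fragment or a document containing one.
    node = root if root.tag == "time_stats" else root.find(".//time_stats")
    if node is None:
        raise ValueError("no <time_stats> element found")
    return {child.tag: float(child.text) for child in node}


stats = read_time_stats(SAMPLE)
print(stats["active_frac"])
```

To use it on a real machine, pass in the text of client_state.xml instead of SAMPLE, ideally working from a copy of the file rather than the live one.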


ID: 3571
StratCat

Joined: 20 Jul 07
Posts: 26
Credit: 263,710
RAC: 0
Message 3572 - Posted: 27 Oct 2007, 9:52:24 UTC

Here's the issue:

<time_stats>
<on_frac>1.000000</on_frac>
<connected_frac>1.000000</connected_frac>
<active_frac>0.004791</active_frac>
<cpu_efficiency>0.881525</cpu_efficiency>
<last_update>1198426454.078125</last_update>
</time_stats>


active_frac is way too low!
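
To put a number on why such a small value would matter, here is a rough arithmetic sketch in Python. It rests on an assumption this thread does not confirm: that the client estimates wall-clock run time by dividing CPU time by on_frac * active_frac * cpu_efficiency. The function names are invented for illustration, and the inputs are the two time_stats blocks posted in this thread.

```python
def availability(on_frac, active_frac, cpu_efficiency):
    """Fraction of wall-clock time assumed to go to crunching.

    ASSUMPTION: the product of the three time_stats fractions; the real
    client logic is not shown in this thread.
    """
    return on_frac * active_frac * cpu_efficiency


def estimated_wall_hours(cpu_hours, on_frac, active_frac, cpu_efficiency):
    """Wall-clock hours a task needing cpu_hours would appear to take."""
    return cpu_hours / availability(on_frac, active_frac, cpu_efficiency)


# A task needing one hour of CPU time, under each posted time_stats block.
healthy = estimated_wall_hours(1.0, 0.981773, 0.999890, 0.980228)
affected = estimated_wall_hours(1.0, 1.000000, 0.004791, 0.881525)
print(f"healthy host: {healthy:.2f} h, affected host: {affected:.1f} h")
```

Under that assumption a one-hour task would look like it needs well over 200 hours on the affected host. That would be consistent with the client refusing to queue extra WUs and pushing the ones it has into "high priority", though it does not explain why the value keeps reverting.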
ID: 3572
ohiomike
Joined: 17 Jul 07
Posts: 302
Credit: 5,006,319
RAC: 0
Message 3573 - Posted: 27 Oct 2007, 10:01:48 UTC

Outside of the active_frac it looks normal. I have heard of this problem on other projects; one fix was to do a project reset. That will wipe out any active WUs, but will reset all vars and start clean. Rather than kill the current WUs, you could set "no new tasks" and let the queue run out, then do the reset. (There may be something inside BOINC we can't see. We would need one of the BOINC gurus to tell. Ageless??)

ID: 3573
StratCat

Joined: 20 Jul 07
Posts: 26
Credit: 263,710
RAC: 0
Message 3574 - Posted: 27 Oct 2007, 10:12:56 UTC
Last modified: 27 Oct 2007, 10:36:01 UTC

Here ya' go Mike:

Q6600 Stress Test @ 3.6 Running

Q6600 Stress Test @ 3.6 Stopped

This was a >14 hr stress test done just a week or two ago. The initial validation period was approx 24 hrs Quad Prime95, 8 hrs Memtest86+, and 2 or 3 hours looping 3DMark2001, before initial deployment. I then re-validate (varying time frames) every few months. All machines are on APC UPSs with USB enabled s/w shutdown.

Trust me, I'm glad you're concerned about stability. I'm somewhat of a stability freak myself. To be any less, especially on an alpha or beta project, is unconscionably irresponsible, IMHO.

Thanks for bringing it up, tho.
ID: 3574
StratCat

Joined: 20 Jul 07
Posts: 26
Credit: 263,710
RAC: 0
Message 3576 - Posted: 27 Oct 2007, 10:22:16 UTC - in response to Message 3573.  
Last modified: 27 Oct 2007, 10:47:27 UTC

Outside of the active_frac it looks normal. I have heard of this problem on other projects, one fix was to do a project reset. That will wipe out any active WUs, but will reset all vars and start clean.

Interesting you mention "reset" rather than attach/detach. I did R & R the BOINC MGR, but didn't try a reset.


Rather than kill the current WUs, you could set "no new tasks" and let the queue run out, then do the reset.

Absolutely! There's too much at stake here for all involved, and I expect the same consideration from my fellow wingmen. I am a little teed off, though, that on one of my attempts at modifying the active_frac value I got a download error. I haven't had any errors since the faulty WUs of several months back, so I suspect my fiddling around had something to do with it.

Thanks for your suggestions.

<edit>

I take that back: Seems 638479 has errored out on 3 other peeps w/i minutes of mine.

</edit>
ID: 3576
ohiomike
Joined: 17 Jul 07
Posts: 302
Credit: 5,006,319
RAC: 0
Message 3580 - Posted: 27 Oct 2007, 11:27:00 UTC - in response to Message 3574.  
Last modified: 27 Oct 2007, 11:31:45 UTC

Here ya' go Mike:
Q6600 Stress Test @ 3.6 Running

Q6600 Stress Test @ 3.6 Stopped
<snip>

I'm surprised, 60 deg @ 3.6 GHz. I can only get to 2.8 on air before I break 60. What cooler are you using?
PS- I generally use OCCT for burn-in. It's simpler than starting x cases of prime95. ocbase- OCCT.


ID: 3580
Sou'westerly
Joined: 1 Jul 07
Posts: 37
Credit: 208,284
RAC: 0
Message 3584 - Posted: 27 Oct 2007, 13:10:32 UTC - in response to Message 3565.  

Eddie, I wonder if your problem is that BOINC is reverting to the client state backup file. Two possible reasons for this could be:
1. You have the file locked open in another program when BOINC wants to read it.
2. Your XML editor is doing something to the file that BOINC does not like. When BOINC reads the file to do the comm, it fails and reverts to the backup.
I use Notepad to edit this file and have just used <active_frac>0.990000</active_frac>. I then saved it with "All Files" selected and a .xml extension. It has worked fine even after a couple of project updates and has not reverted to the old value.
Dave.
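
If the editor is the suspect, one way to rule it out is to make the change with a script that touches nothing except the value itself. The Python sketch below works on raw bytes so that line endings and encoding stay exactly as BOINC wrote them. It is only an illustration: `set_active_frac` and the `.bak` backup name are made up here, the path to client_state.xml is whatever it is on your machine, and BOINC (including the service) should be fully stopped first, as mentioned earlier in the thread.

```python
import re
import shutil
from pathlib import Path

# Matches the first <active_frac>...</active_frac> element, as raw bytes.
ACTIVE_FRAC = re.compile(rb"(<active_frac>)[^<]*(</active_frac>)")


def set_active_frac(state_path, new_value):
    """Back up the state file, then rewrite only the active_frac value.

    Hypothetical helper. Everything outside the matched element is
    written back byte for byte.
    """
    path = Path(state_path)
    backup = path.with_name(path.name + ".bak")  # made-up backup name
    shutil.copy2(path, backup)

    data = path.read_bytes()
    value = f"{new_value:.6f}".encode("ascii")  # same 6-decimal style as the file
    new_data, count = ACTIVE_FRAC.subn(
        lambda m: m.group(1) + value + m.group(2), data, count=1
    )
    if count != 1:
        raise ValueError("no <active_frac> element found")
    path.write_bytes(new_data)
    return backup
```

A call would look like `set_active_frac(r"<path to>\client_state.xml", 0.99)`. If the value still reverts after an edit made this way, the editor is probably not the culprit.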

ID: 3584
