1) Forums : Technical Support : Lots and lots of Invalid tasks (Message 20908)
Posted 28 Feb 2016 by fractal
Post:
All problems fixed! Thank you. No more invalid tasks.

No more i686 camb_legacy units today, and I am getting planck vbox64_mt units on the machine that I fixed by following the instructions in the FAQ.

Thanks for the quick response.
2) Forums : Technical Support : Lots and lots of Invalid tasks (Message 20905)
Posted 27 Feb 2016 by fractal
Post:
Thank you for the reply, Marius.

I got pretty good at recognizing the i686 work units by their run times, without having to verify them on the site. I aborted a hundred or so this morning that would not have validated, but the project kept giving me i686 work units as fast as I aborted them. I am down to only x86_64 for now.

That FAQ was very helpful. I found <p_vm_extensions_disabled> in my client_state.xml and deleting that line got the machine to accept vbox work units. I will be installing vbox on more machines over time when I upgrade the memory in them.
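
For anyone else who hits this: on my hosts the flag was sitting in the <host_info> block of client_state.xml, looking something like the snippet below (treat it as an illustration, the exact form may differ a little from client to client):

<host_info>
  ...
  <p_vm_extensions_disabled/>  <!-- this is the line to delete -->
  ...
</host_info>

Make sure the BOINC client is shut down before editing, otherwise it just writes the old state back out when it exits.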
3) Forums : Technical Support : Lots and lots of Invalid tasks (Message 20898)
Posted 27 Feb 2016 by fractal
Post:
I just put a bunch of machines on this project and am seeing a large number of Invalid tasks with Validate errors. Checking the forum I see that this appears to be a known issue when an x86_64 linux machine is given i686 work.

A bit of googling says I can add

<no_alt_platform>0|1</no_alt_platform>

to my cc_config.xml file, but that would affect all projects, not just Cosmology@home. Cosmology@home really shouldn't be sending work to my machines when it knows it won't be able to validate the result.
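
For completeness, the option would go in the options section of cc_config.xml in the BOINC data directory, something like this (1 tells the client to report only its primary platform, x86_64-pc-linux-gnu here, so no project would send it i686 work; 0 is the normal behavior):

<cc_config>
  <options>
    <no_alt_platform>1</no_alt_platform>
  </options>
</cc_config>

But again, that switch is client-wide, which is exactly why I would rather see this fixed on the server side.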

I have VirtualBox installed on several of the machines and I allow test applications, but I am not getting any vbox work.

Work units take about 10 hours to run whether they are going to validate or not, and my luck since I came back is 15 invalid tasks to 33 valid.

Is there any way to stop getting i686 work, or is there any way for it to validate properly?
4) Forums : Technical Support : Pending Because Work Unit Remains "Unsent" (Message 3039)
Posted 1 Oct 2007 by fractal
Post:
1) Generic CPU classifications (e.g. "Intel", "AMD", "AMDAthlon") had a bunch of unsent results. That's mostly due to the failure of the system to identify certain processors correctly. Therefore, I combined these generic classifications with their more common counterparts (e.g. AthlonXP, Pentium4, etc.). This will probably slightly increase the invalidation rate, but it shouldn't create too big a problem.

I ran my machine with its 13k backlog through the HR classification algorithm and found that it was indeed classified as "intel", but then again, so were another 4 or 5 machines. The good news is that it is down to a 12k backlog. The bad news is that it is still only getting NEW work, and old work is not being validated.

It is almost as if the only work that gets validated is work that happens to be assigned to two machines during the brief interval the WU sits in shared memory. That isn't much of a problem for machines in common HR classes, since their backlog builds slowly, but it is a bigger problem for machines in uncommon HR classes. Changing the HR class won't convince the scheduler to give out old work before it gives out new work...
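
For anyone curious what that merge amounts to, here is a toy classifier showing the idea. It is my own simplification, not the real hr.C, and the vendor/model string tests are only illustrative:

#include <iostream>
#include <string>

// Toy CPU classifier, just to show the idea of the merge -- this is my own
// simplification, NOT the real hr.C.  The point is that the generic
// fallbacks ("intel", "amd", "AMDAthlon") now land in the same bucket as
// their common counterparts instead of being tiny HR classes of their own.
std::string hr_cpu_class(const std::string& vendor, const std::string& model) {
    if (vendor == "GenuineIntel") {
        if (model.find("Pentium(R) 4") != std::string::npos) return "Pentium4";
        if (model.find("Pentium(R) III") != std::string::npos) return "Pentium3";
        return "Pentium4";   // anything unrecognized folds into the common class
    }
    if (vendor == "AuthenticAMD") {
        if (model.find("Athlon(tm) XP") != std::string::npos) return "AthlonXP";
        if (model.find("Sempron") != std::string::npos) return "Sempron";
        return "AthlonXP";   // generic AMD folds in as well
    }
    return "unknown";
}

int main() {
    // A Pentium D matches none of the specific tests, so before the merge it
    // came out as generic "intel"; now it folds into the common class.
    std::cout << hr_cpu_class("GenuineIntel", "Intel(R) Pentium(R) D CPU 3.00GHz") << "\n";
    std::cout << hr_cpu_class("AuthenticAMD", "Dual Core AMD Opteron(tm) Processor 175") << "\n";
}

The only point is that the generic fallbacks now land in the same bucket as the common classes instead of being near-empty HR classes of their own.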
5) Forums : Technical Support : Pending Because Work Unit Remains "Unsent" (Message 2957)
Posted 26 Sep 2007 by fractal
Post:
It seems things are getting better. This is the first time my number of "unsents" has not increased.
... snip ...
However, I still have 6 that have been hanging around almost a month now.
All are Intel P4/Linux, which shouldn't be that rare.

None of mine from August have been touched either, neither the p3/linux ones nor the p4/linux ones. The only credit I received today was from http://www.cosmologyathome.org/workunit.php?wuid=458978, where my p4 verified the work of someone else's p4 from almost exactly 2 weeks ago. I am still getting almost exclusively new work (generated yesterday or today). The month-old stuff (like http://www.cosmologyathome.org/workunit.php?wuid=407621) is still waiting.


6) Forums : Technical Support : Pending Because Work Unit Remains "Unsent" (Message 2941)
Posted 25 Sep 2007 by fractal
Post:
It's not a permanent fix, but I'm going to add the order clause to the enumerate_all function to see if that helps with the problem.

That might work, but I would still be concerned about the "and r1.id>start_id" clause in enumerate_all. I don't have enough exposure to the BOINC code to know whether it is safe to just call enumerate instead of enumerate_all even when HR is enabled, but that would be my initial hunch for taking you back to the behavior you had before the June patch.
7) Forums : Technical Support : Pending Because Work Unit Remains "Unsent" (Message 2926)
Posted 24 Sep 2007 by fractal
Post:
I'm experimenting with the feeder options right now. I'm using "priority_order_create_time", but that doesn't seem to be doing the job. I'll try just priority_order for now and see if I can't get the high-priority results to be sent.

If you are talking about feeder command-line options, neither "-priority_order" nor "-priority_order_create_time" has any effect when homogeneous_redundancy is enabled, if I read feeder.C correctly.

What confuses me even more is that when homogeneous_redundancy is set, feeder.C invokes DB_WORK_ITEM::enumerate_all instead of DB_WORK_ITEM::enumerate (which would take the order_clause). DB_WORK_ITEM::enumerate_all seems to walk forward through the job list, keeping track of the last unit fetched, which gives old WUs little chance of being picked up unless the feeder makes a complete pass through the whole list.
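
To make the effect concrete, here is a little standalone mock-up of the two scan styles as I read them. It is entirely my own code, not anything out of BOINC, and Result, ordered_scan and cursor_scan are just names I made up:

#include <cstdio>
#include <vector>

// Mock-up of the two feeder scan styles as I read them -- my own code, NOT
// the real DB_WORK_ITEM.  'results' stands in for the unsent results, with
// ids roughly in creation order and new ids arriving at the high end.
struct Result { int id; };

// Ordered scan (enumerate with an order clause): always start from the
// oldest unsent result.
std::vector<int> ordered_scan(const std::vector<Result>& results, int n) {
    std::vector<int> picked;
    for (const Result& r : results) {
        if ((int)picked.size() >= n) break;
        picked.push_back(r.id);
    }
    return picked;
}

// Forward cursor (enumerate_all style): only look at ids greater than the
// last one fetched, so older ids get skipped until the cursor wraps.
std::vector<int> cursor_scan(const std::vector<Result>& results, int n, int& start_id) {
    std::vector<int> picked;
    for (const Result& r : results) {
        if ((int)picked.size() >= n) break;
        if (r.id > start_id) {
            picked.push_back(r.id);
            start_id = r.id;
        }
    }
    return picked;
}

int main() {
    // Three old, still-unsent results (ids 1-3) plus a stream of new ones.
    std::vector<Result> results = {{1}, {2}, {3}, {100}, {101}, {102}, {103}};
    int start_id = 99;   // cursor is already past the old ids from earlier passes

    for (int id : ordered_scan(results, 3)) printf("ordered scan picks %d\n", id);
    for (int id : cursor_scan(results, 3, start_id)) printf("cursor scan picks %d\n", id);
    // ordered scan picks 1, 2, 3 (the old work); cursor scan picks 100, 101, 102.
}

With the cursor already past the old ids, the only time they get offered again is when the feeder runs out of new ids and starts over, which (if I am reading it right) matches the weeks-old unsent results we are seeing.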

Anyway, I am no BOINC expert, but it seems to me that https://boinc.berkeley.edu/trac/changeset/12988 with HR enabled would cause... well, exactly what we are seeing. The comment in the changeset,
"With extremely minimal testing, the new HR stuff seems to work."
doesn't instill confidence either ;)
8) Forums : Technical Support : Pending Because Work Unit Remains "Unsent" (Message 2857)
Posted 19 Sep 2007 by fractal
Post:
Have now set Boinc Manager to "No New Tasks". Will wait and see what happens.


I'm joining Conan on this. I recruited several of my teammates who have identical CPU/OS and my Pendings are still increasing, even though I had only a couple of cores running this project.


I have only been participating in this project for three weeks, and while I have more pending credit than granted and some of the first work I submitted three weeks ago is still pending, I don't see this as an unsolvable issue. So my cores crunch away, because if all that pending credit gets granted in one day, wowzer, what a day my stats will have ;)
9) Forums : Technical Support : Sempron HR Classification (Message 2763)
Posted 16 Sep 2007 by fractal
Post:
Still see about 46 unsent that were issued prior to roughly Sept 5th. That is about the same time you changed the HR on Semprons, I believe.


Strange, but I see almost the same thing with my Sempron.
Some older WUs are not sent out for the 2nd time, and are still looking for a 2nd CPU. As for the HR class, I have never seen another CPU than a Sempron so far.
host id 2855

I don't think it is an HR issue. I have 2-3 week old WUs looking for a second opinion. When I look at the machines that offered those second opinions, they are getting work, NEW work. Based on that it is fairly clear that a) HR in and of itself is not the issue and b) the scheduler is not working correctly. That is, if you believe, as I do, that the scheduler should first issue the oldest WU in the current HR class that already has a result on it, and only then the oldest brand-new WU in the current HR class.
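
To spell out the ordering I mean, here is a small sketch of the policy, entirely my own invention (struct, field names and sample values included), not anything from the BOINC source:

#include <algorithm>
#include <cstdio>
#include <vector>

// Sketch of the dispatch order I would expect within one HR class: results
// whose workunit already has a returned result (waiting on a second opinion)
// go out first, oldest first; brand-new work only after that, also oldest
// first.  The struct and field names are mine, not BOINC's.
struct Unsent {
    int wu_id;
    int create_time;        // seconds since the epoch; lower = older
    bool has_prior_result;  // someone already crunched this WU once
};

bool dispatch_before(const Unsent& a, const Unsent& b) {
    if (a.has_prior_result != b.has_prior_result)
        return a.has_prior_result;          // reissues jump the queue
    return a.create_time < b.create_time;   // then oldest first
}

int main() {
    std::vector<Unsent> queue = {
        {500, 1190000000, false},     // new work generated today
        {407621, 1187000000, true},   // month-old WU still waiting on a wingman
        {480, 1189000000, false},
    };
    std::sort(queue.begin(), queue.end(), dispatch_before);
    for (const Unsent& u : queue)
        printf("send wu %d (prior result: %s)\n", u.wu_id, u.has_prior_result ? "yes" : "no");
}

Reissues first, oldest first within each group, and brand-new work only after that.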

What APPEARS to be happening instead is that it is giving out the NEWEST WUs, period, and only by luck does a WU happen to get assigned to two people at the same time. Sometimes new work runs out and old work manages to get sent out.

My p3 demonstrates this clearly: (currently) one completed WU, its wingman off doing first-run work with a bunch of pendings of its own, four WUs over 2 weeks old awaiting a second opinion, and the COS HR table showing over half a dozen Pentium 3s running Linux that have done work in the past week.
10) Forums : General Topics : Pending credits (Message 2559)
Posted 7 Sep 2007 by fractal
Post:
This thread seems to be wandering and while I am not replying to it, it is the same subject, so ...

Is there any plan to figure out what the heck is going on with pending WUs? Half the work I have done is stuck "pending". A machine I haven't used for a week still has half its work "pending". There are machines in the same HR class doing work; I can tell because when I look at the WUs that have been completed and at the machines that did the other half, those machines are doing work and, like me, half of their results are "pending".

I fully understand that this is an alpha project, and suspect that this is a known issue, but thought I would ask all the same. I did a little investigation in an earlier thread and convinced myself that it is not a specific result of HR as there ARE other machines in the same HR class as mine and many of them are getting new work instead of verification work.

Is this a known, under investigation issue? If so, I will go hide back under my rock...
11) Forums : General Topics : CPU and O/S list as of 18 Aug, 2007 (Message 2472)
Posted 1 Sep 2007 by fractal
Post:
Just out of curiosity, are you counting each core as a processor? The reason I ask is that on your AMD page you show 2 Dual Core AMD Opteron(tm) Processor 175s, but I'm pretty sure that all those points were (are) mine.

http://cosmos.astro.uiuc.edu/cosmohome/show_host_detail.php?hostid=1464

-jim

I don't know how he counted, but http://www.cosmologyathome.org/show_host_detail.php?hostid=567, http://www.cosmologyathome.org/show_host_detail.php?hostid=1464 and http://www.cosmologyathome.org/show_host_detail.php?hostid=2702 are all dual Opteron 175s, with the last one created after his post.

12) Forums : General Topics : CPU and O/S list as of 18 Aug, 2007 (Message 2447)
Posted 30 Aug 2007 by fractal
Post:
I did write the little PHP script I described above. It takes a while to parse the XML file, so I generated a static snapshot here. I guess that explains why I have such a long pending list: there are only 3 Linux Pentium D machines out there, one of which is mine. It uses the stock BOINC classifier to determine OS / processor, and yes, my generated HTML looks crappy ;)
13) Forums : General Topics : CPU and O/S list as of 18 Aug, 2007 (Message 2441)
Posted 30 Aug 2007 by fractal
Post:
Have you taken http://www.cosmologyathome.org/stats/host.gz and applied the classification from http://boinc.berkeley.edu/trac/browser/trunk/boinc/sched/hr.C? I was going to do that until I realized I couldn't really define what "active" meant. The best guess I had was to set a threshold on expavg_credit. I may still write a php script to do it, since with only 2 days on the project I already have twice as many pending WUs as resolved ones, on fairly common hardware.
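
The counting step I have in mind would look roughly like this. It is sketched in C++ just to show the logic, the host fields are assumed to be already extracted from the host.gz XML, and hr_cpu_class plus the cutoff are placeholders of mine:

#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Sketch of the counting step: given host records already pulled out of
// host.gz, bucket them by (OS, CPU class) and count how many look "active"
// by an arbitrary expavg_credit cutoff.  hr_cpu_class() is just a
// placeholder for whatever classification hr.C actually applies.
struct Host {
    std::string os_name;
    std::string p_vendor;
    std::string p_model;
    double expavg_credit;
};

std::string hr_cpu_class(const Host& h) {
    if (h.p_model.find("Pentium(R) D") != std::string::npos) return "PentiumD";
    if (h.p_model.find("Sempron") != std::string::npos) return "Sempron";
    return "other";
}

int main() {
    const double active_cutoff = 1.0;   // the arbitrary "active" threshold
    std::vector<Host> hosts = {
        {"Linux", "GenuineIntel", "Intel(R) Pentium(R) D CPU 3.00GHz", 152.3},
        {"Linux", "AuthenticAMD", "AMD Sempron(tm) Processor 3000+", 0.2},
        {"Microsoft Windows XP", "GenuineIntel", "Intel(R) Pentium(R) D CPU 2.80GHz", 44.8},
    };

    std::map<std::string, int> active_per_class;
    for (const Host& h : hosts)
        if (h.expavg_credit >= active_cutoff)
            active_per_class[h.os_name + " / " + hr_cpu_class(h)]++;

    for (const auto& kv : active_per_class)
        printf("%-30s %d active host(s)\n", kv.first.c_str(), kv.second);
}

Whatever number stands in for "active" is a judgment call, which is exactly why I hesitated.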