Thread Status: Active
Total posts in this thread: 60
Pages: 6
This topic has been viewed 24362 times and has 59 replies.
wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Re: PV comedy of error (and no replies)

knreed, thank you for spending the time on a Friday night (why is it almost always 4:00 PM on a Friday when everything goes south?) to put these changes/workarounds in place.

As this set of problems goes... early Christians = hungry lions, I guess.
----------------------------------------
Bill P

[Nov 17, 2012 4:49:31 AM]
pirogue
Veteran Cruncher
USA
Joined: Dec 8, 2008
Post Count: 685

This should return life to normal for now.
Yea!
----------------------------------------

[Nov 17, 2012 5:31:05 AM]
pirogue
Veteran Cruncher
USA
Joined: Dec 8, 2008
Post Count: 685

I just found several PVs (Pending Validation results) where 2 wingmen have returned, but there's no Try Validation link. A couple of examples:

Project Name: Help Conquer Cancer
Created: 11/02/2012 16:55:10
Name: X0900073110755200608212043
Minimum Quorum: 2
Replication: 2


Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
X0900073110755200608212043_6-- | 656 | Pending Validation | 11/16/12 23:35:50 | 11/17/12 03:32:18 | 0.02 | 27.5 / 0.0
X0900073110755200608212043_5-- | 656 | Error | 11/16/12 23:34:10 | 11/16/12 23:35:24 | 0.00 | 32.0 / 0.0
X0900073110755200608212043_4-- | - | No Reply | 11/15/12 23:02:25 | 11/16/12 23:25:31 | 0.00 | 0.0 / 0.0
X0900073110755200608212043_3-- | - | No Reply | 11/13/12 03:50:19 | 11/15/12 23:02:19 | 0.00 | 0.0 / 0.0
X0900073110755200608212043_2-- | - | No Reply | 11/10/12 08:38:07 | 11/13/12 03:50:07 | 0.00 | 0.0 / 0.0
X0900073110755200608212043_0-- | 656 | Pending Validation | 11/3/12 08:37:53 | 11/3/12 09:08:41 | 0.03 | 34.2 / 0.0
X0900073110755200608212043_1-- | 656 | Error | 11/3/12 08:37:53 | 11/10/12 15:56:17 | 0.00 | 0.0 / 0.0


Project Name: Help Conquer Cancer
Created: 11/03/2012 04:50:42
Name: X0900073660464200608171115
Minimum Quorum: 2
Replication: 2


Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
X0900073660464200608171115_5-- | 656 | Pending Validation | 11/16/12 06:24:53 | 11/17/12 05:20:24 | 0.02 | 27.5 / 0.0
X0900073660464200608171115_4-- | - | No Reply | 11/13/12 11:03:11 | 11/16/12 06:15:45 | 0.00 | 0.0 / 0.0
X0900073660464200608171115_3-- | 656 | Error | 11/13/12 10:57:25 | 11/13/12 11:03:03 | 0.00 | 32.2 / 0.0
X0900073660464200608171115_2-- | - | No Reply | 11/10/12 15:45:18 | 11/13/12 10:55:17 | 0.00 | 0.0 / 0.0
X0900073660464200608171115_0-- | 656 | Pending Validation | 11/3/12 15:45:08 | 11/3/12 16:20:03 | 0.03 | 34.4 / 0.0
X0900073660464200608171115_1-- | - | No Reply | 11/3/12 15:45:08 | 11/10/12 15:45:08 | 0.00 | 0.0 / 0.0
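The symptom above can be checked mechanically: with Minimum Quorum 2, two results in Pending Validation should be enough for validation to be attempted. A minimal sketch, assuming a simplified row record; `ResultRow` and `ready_for_validation` are hypothetical names, not WCG or BOINC code, and the rule that only "Pending Validation" rows count is an assumption:

```python
from dataclasses import dataclass


@dataclass
class ResultRow:
    # Hypothetical, simplified view of one row of the workunit status table.
    name: str
    status: str  # e.g. "Pending Validation", "Error", "No Reply"


def ready_for_validation(rows, min_quorum):
    """Return True when enough results came back successfully to meet the quorum.

    Assumption: only rows in "Pending Validation" count as successful returns;
    "Error" and "No Reply" rows cannot take part in validation.
    """
    returned = sum(1 for r in rows if r.status == "Pending Validation")
    return returned >= min_quorum


# Second workunit from the post above (X0900073660464200608171115).
rows = [
    ResultRow("X0900073660464200608171115_5--", "Pending Validation"),
    ResultRow("X0900073660464200608171115_4--", "No Reply"),
    ResultRow("X0900073660464200608171115_3--", "Error"),
    ResultRow("X0900073660464200608171115_2--", "No Reply"),
    ResultRow("X0900073660464200608171115_0--", "Pending Validation"),
    ResultRow("X0900073660464200608171115_1--", "No Reply"),
]
print(ready_for_validation(rows, min_quorum=2))  # True
```

By this simple count both workunits meet their quorum, which is why the missing Try Validation link stands out.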
----------------------------------------

[Nov 17, 2012 5:50:13 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

We are going to have to modify this section of (the code) so that it works correctly.
The use of an app_info.xml file exposed a weakness in WCG's scheduling server. Thanks to Ingleside for pointing out the weakness, and thanks to knreed for acknowledging the need to fix it. As with mistakes, the only thing one cannot be proud of when it comes to weaknesses is the absence of efforts to correct them.
[Nov 17, 2012 6:40:13 AM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976

11/17/2012 2:41:40 AM | World Community Grid | Sending scheduler request: To fetch work.
11/17/2012 2:41:40 AM | World Community Grid | Reporting 30 completed tasks, requesting new tasks for ATI
11/17/2012 2:41:43 AM | World Community Grid | Scheduler request completed: got 0 new tasks
11/17/2012 2:41:43 AM | World Community Grid | Project is temporarily shut down for maintenance
----------------------------------------
[Nov 17, 2012 7:43:49 AM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976

11/17/2012 3:03:16 AM | World Community Grid | Sending scheduler request: To fetch work.
11/17/2012 3:03:16 AM | World Community Grid | Reporting 37 completed tasks, requesting new tasks for ATI
11/17/2012 3:03:33 AM | World Community Grid | Scheduler request completed: got 3 new tasks
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076970829200611141915_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076970835200611141916_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076971287200611141908_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480784200610171342_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480755200610171343_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480777200610171343_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480779200610171343_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480787200610171342_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480792200610171342_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480793200610171342_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480795200610171342_0 (expired)
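The "Didn't resend lost task ... (expired)" lines relate to the resend lost results feature mentioned later in the thread. In the simplest reading of this log, a task the server assigned to the host but that the client no longer reports is treated as lost, and a lost task whose deadline has already passed is skipped as expired instead of being sent again. A simplified sketch of that decision; `ServerTask`, `plan_resends` and the task names are hypothetical, and the real BOINC scheduler applies more conditions than a plain deadline comparison:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ServerTask:
    # Hypothetical server-side record of a task assigned to one host.
    name: str
    deadline: datetime


def plan_resends(assigned, reported_names, now):
    """Split lost tasks into those to resend and those skipped as expired.

    A task is "lost" if the server assigned it to the host but the client did
    not list it in its scheduler request. Simplifying assumption: a lost task
    is resent only if its deadline is still in the future.
    """
    resend, expired = [], []
    for task in assigned:
        if task.name in reported_names:
            continue  # the client still has this task, so nothing is lost
        if task.deadline > now:
            resend.append(task.name)
        else:
            expired.append(task.name)
    return resend, expired


now = datetime(2012, 11, 17, 3, 3, 33)
assigned = [
    ServerTask("task_a", datetime(2012, 11, 20)),  # lost, deadline not yet reached
    ServerTask("task_b", datetime(2012, 11, 10)),  # lost, deadline already passed
    ServerTask("task_c", datetime(2012, 11, 20)),  # still reported by the client
]
resend, expired = plan_resends(assigned, {"task_c"}, now)
for name in expired:
    print(f"Didn't resend lost task {name} (expired)")
```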
----------------------------------------
[Nov 17, 2012 8:08:59 AM]
Ingleside
Veteran Cruncher
Norway
Joined: Nov 19, 2005
Post Count: 974

Quote: "We are the only ones using homogenous_app_version - and I believe we are the first. And after reviewing the BOINC code today, I've discovered that it has a complete lack of support for the app_info.xml/anonymous mechanism. Additionally, not all projects use the resend lost results feature."

"Homogeneous redundancy" was added by Predictor@home, so WCG is definitely not the first, and some other projects have also used, and possibly still use, it.

WCG being the first project with large-scale use of both HR and anonymous platform is probably correct. While normal scheduling and Einstein@home's special scheduling have the necessary code to handle anonymous platform, having overlooked it in HR is understandable, since I remember many instances where a code change broke anonymous platform and bug fixes were added later.
----------------------------------------


"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
[Nov 17, 2012 2:26:08 PM]
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504

Ingleside - there are two types of homogeneous redundancy. There is the basic version 'hr_class' that is based on operating system version and processor version. This is what Predictor@home created.
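As a rough illustration of basic HR, here is a sketch that buckets hosts by (operating system, processor) and lets a workunit's later replicas go only to hosts in the same bucket as the first one. `Host`, `hr_class` and `can_receive` are hypothetical names, and BOINC's real hr_class categories are defined by BOINC itself, not by this simple pair:

```python
from typing import NamedTuple, Optional, Tuple


class Host(NamedTuple):
    name: str
    os_name: str     # e.g. "Windows", "Linux"
    cpu_vendor: str  # e.g. "Intel", "AMD"


HrClass = Tuple[str, str]


def hr_class(host: Host) -> HrClass:
    # Hypothetical classification: one class per (operating system, processor) pair.
    return (host.os_name, host.cpu_vendor)


def can_receive(host: Host, wu_class: Optional[HrClass]) -> bool:
    # A workunit without a class yet may go to any host; once the first replica
    # is sent, later replicas may only go to hosts of the same class.
    return wu_class is None or hr_class(host) == wu_class


first = Host("host1", "Windows", "Intel")
wu_class = hr_class(first) if can_receive(first, None) else None
print(can_receive(Host("host2", "Windows", "Intel"), wu_class))  # True
print(can_receive(Host("host3", "Linux", "AMD"), wu_class))      # False
```

Note that a CPU, ATI and NVIDIA build running on the same Windows/Intel host would all share one class here, which is the limitation described next.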

However, with GPUs, we can now have three application versions that will match a single hr_class, and the only differentiating factor between them is whether the job ran on the CPU, an ATI GPU or an NVIDIA GPU. Help Conquer Cancer does not compare results across these three versions. Extending the basic HR mechanism to handle this level of difference had the potential to create a massive set of hr_classes, and David (BOINC) didn't want to go that route. As a result, he created a new type of homogeneous redundancy called homogeneous app version. This works by assigning all jobs in a workunit to be run using the same app version. It is this new mechanism that we are the first to use, and we are therefore finding the gaps in its capabilities.
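A minimal sketch of the homogeneous app version idea, assuming the scheduler knows which app version (CPU, ATI GPU or NVIDIA GPU build) it would use on a given host. `Workunit`, `try_assign` and the version labels are hypothetical, not BOINC's scheduler code, and anonymous-platform (app_info.xml) hosts, where the gap discussed in this thread lies, are deliberately not modelled:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workunit:
    name: str
    app_version: Optional[str] = None  # pinned when the first replica is sent


def try_assign(wu: Workunit, host_app_version: str) -> bool:
    """Decide whether a replica of wu may be sent to a host.

    host_app_version is the version the scheduler would use on that host,
    e.g. "7.05 cpu", "7.05 ati" or "7.05 nvidia" (illustrative labels only).
    """
    if wu.app_version is None:
        wu.app_version = host_app_version  # the first replica pins the version
        return True
    # Later replicas only go to hosts that would run the pinned version.
    return host_app_version == wu.app_version


wu = Workunit("example_wu")
print(try_assign(wu, "7.05 ati"))     # True: first replica pins "7.05 ati"
print(try_assign(wu, "7.05 cpu"))     # False: different app version
print(try_assign(wu, "7.05 nvidia"))  # False: different app version
print(try_assign(wu, "7.05 ati"))     # True: same app version
```

Pinning one version per workunit avoids multiplying hr_classes: the class check stays coarse, and only the app version has to match.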

I have started conversing with David about the fix for this and we will see where it goes.
[Nov 17, 2012 3:51:37 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

As far as I remember, the original "classic" HR functionality has been in use at WCG since about the time BOINC was expanded beyond running only on Linux, i.e. it was used to send wingmen to the same platform that the original copy of a work unit went to. That would be ca. Q1 2006. Wiki: http://boinc.berkeley.edu/trac/wiki/HomogeneousRedundancy (referenced in our Start Here FAQ index).
[Nov 17, 2012 4:22:34 PM]
branjo
Master Cruncher
Slovakia
Joined: Jun 29, 2012
Post Count: 1892

Got (at least) a few 7.05 WUs with the same length as 6.56 ones,

e.g. X0960075010237200609191443_2-- branjo-PC Valid 17.11.2012 19:25:40 17.11.2012 19:38:28 0.05 / 0.20 33.5 / 33.2 (the elapsed time of a normal 7.05 is ca. 0.40 hours)
----------------------------------------

Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006

[Nov 17, 2012 7:55:08 PM]