World Community Grid Forums
Category: Support Forum: GPU Support Forum Thread: PV comedy of error (and no replies) |
Thread Status: Active Total posts in this thread: 60
wplachy
Senior Cruncher Joined: Sep 4, 2007 Post Count: 423 Status: Offline |
knreed, thank you for spending the time on a Friday night (why is it almost always 4:00 PM Friday when everything goes south?) to put these changes/workarounds in place.
As this set of problems goes... early Christians = hungry lions, I guess.
Bill P
|
||
|
pirogue
Veteran Cruncher USA Joined: Dec 8, 2008 Post Count: 685 Status: Offline Project Badges: |
This should return life to normal for now. Yea! |
||
|
pirogue
Veteran Cruncher USA Joined: Dec 8, 2008 Post Count: 685 Status: Offline Project Badges: |
I just found several PVs where 2 wingmen have returned, but there's no Try Validation link. A couple examples:
Project Name: Help Conquer Cancer
Created: 11/02/2012 16:55:10
Name: X0900073110755200608212043
Minimum Quorum: 2
Replication: 2
Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
X0900073110755200608212043_ 6-- | 656 | Pending Validation | 11/16/12 23:35:50 | 11/17/12 03:32:18 | 0.02 | 27.5 / 0.0
X0900073110755200608212043_ 5-- | 656 | Error | 11/16/12 23:34:10 | 11/16/12 23:35:24 | 0.00 | 32.0 / 0.0
X0900073110755200608212043_ 4-- | - | No Reply | 11/15/12 23:02:25 | 11/16/12 23:25:31 | 0.00 | 0.0 / 0.0
X0900073110755200608212043_ 3-- | - | No Reply | 11/13/12 03:50:19 | 11/15/12 23:02:19 | 0.00 | 0.0 / 0.0
X0900073110755200608212043_ 2-- | - | No Reply | 11/10/12 08:38:07 | 11/13/12 03:50:07 | 0.00 | 0.0 / 0.0
X0900073110755200608212043_ 0-- | 656 | Pending Validation | 11/3/12 08:37:53 | 11/3/12 09:08:41 | 0.03 | 34.2 / 0.0
X0900073110755200608212043_ 1-- | 656 | Error | 11/3/12 08:37:53 | 11/10/12 15:56:17 | 0.00 | 0.0 / 0.0

Project Name: Help Conquer Cancer
Created: 11/03/2012 04:50:42
Name: X0900073660464200608171115
Minimum Quorum: 2
Replication: 2
Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
X0900073660464200608171115_ 5-- | 656 | Pending Validation | 11/16/12 06:24:53 | 11/17/12 05:20:24 | 0.02 | 27.5 / 0.0
X0900073660464200608171115_ 4-- | - | No Reply | 11/13/12 11:03:11 | 11/16/12 06:15:45 | 0.00 | 0.0 / 0.0
X0900073660464200608171115_ 3-- | 656 | Error | 11/13/12 10:57:25 | 11/13/12 11:03:03 | 0.00 | 32.2 / 0.0
X0900073660464200608171115_ 2-- | - | No Reply | 11/10/12 15:45:18 | 11/13/12 10:55:17 | 0.00 | 0.0 / 0.0
X0900073660464200608171115_ 0-- | 656 | Pending Validation | 11/3/12 15:45:08 | 11/3/12 16:20:03 | 0.03 | 34.4 / 0.0
X0900073660464200608171115_ 1-- | - | No Reply | 11/3/12 15:45:08 | 11/10/12 15:45:08 | 0.00 | 0.0 / 0.0
|
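For anyone who wants to check their own Result Status pages for the same symptom, here is a minimal sketch in Python. It only counts statuses and compares the number of returned results against the Minimum Quorum. The function name `summarize_results` is made up for illustration, and treating "Pending Validation" plus "Valid" as the returned count is an assumption; how the WCG validator actually decides when to run is not described in this thread.

```python
from collections import Counter


def summarize_results(statuses, min_quorum):
    """Count result statuses for one workunit and compare against the quorum.

    Hypothetical helper: a result is treated as 'returned' when its status is
    'Pending Validation' or 'Valid'. This is an assumption for illustration,
    not the server's real rule.
    """
    counts = Counter(statuses)
    returned = counts["Pending Validation"] + counts["Valid"]
    return {
        "counts": dict(counts),
        "returned": returned,
        "quorum_met": returned >= min_quorum,
    }


# Statuses copied from the first workunit listed above (Minimum Quorum: 2).
wu_statuses = [
    "Pending Validation", "Error", "No Reply", "No Reply",
    "No Reply", "Pending Validation", "Error",
]
summary = summarize_results(wu_statuses, min_quorum=2)
print(summary["returned"], summary["quorum_met"])
```

On the statuses listed above, two results are returned against a quorum of 2, which matches the observation that validation would be expected to run.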
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
We are going to have to modify this section of (the code) so that it works correctly.
The use of an app_info.xml file exposed a weakness in WCG's scheduling server. Thanks to Ingleside for pointing out the weakness, and thanks to knreed for acknowledging the need to fix it. As with mistakes, the only thing one cannot be proud of in a weakness is the absence of efforts at correction. |
||
|
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline Project Badges: |
11/17/2012 2:41:40 AM | World Community Grid | Sending scheduler request: To fetch work.
11/17/2012 2:41:40 AM | World Community Grid | Reporting 30 completed tasks, requesting new tasks for ATI
11/17/2012 2:41:43 AM | World Community Grid | Scheduler request completed: got 0 new tasks
11/17/2012 2:41:43 AM | World Community Grid | Project is temporarily shut down for maintenance |
||
|
BladeD
Ace Cruncher USA Joined: Nov 17, 2004 Post Count: 28976 Status: Offline Project Badges: |
11/17/2012 3:03:16 AM | World Community Grid | Sending scheduler request: To fetch work.
11/17/2012 3:03:16 AM | World Community Grid | Reporting 37 completed tasks, requesting new tasks for ATI
11/17/2012 3:03:33 AM | World Community Grid | Scheduler request completed: got 3 new tasks
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076970829200611141915_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076970835200611141916_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076971287200611141908_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480784200610171342_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480755200610171343_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480777200610171343_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480779200610171343_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480787200610171342_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480792200610171342_0 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480793200610171342_1 (expired)
11/17/2012 3:03:33 AM | World Community Grid | Didn't resend lost task X0930076480795200610171342_0 (expired) |
||
|
Ingleside
Veteran Cruncher Norway Joined: Nov 19, 2005 Post Count: 974 Status: Offline Project Badges: |
We are the only ones using homogenous_app_version - and I believe we are the first. And after reviewing the BOINC code today, I've discovered that it has a complete lack of support for the app_info.xml/anonymous mechanism. Additionally, not all projects use the resend lost results feature.
"Homogeneous redundancy" was added by Predictor@home, so WCG is definitely not the first, and some other projects have also used it and possibly still use it. WCG being the first project with large-scale usage of both HR and anonymous platform is probably correct. So while normal scheduling and Einstein@home's special scheduling have the necessary code to handle anonymous platform, having overlooked HR is understandable, since I remember many instances where a code change has broken anonymous platform and bug fixes have been added later.
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
||
|
knreed
Former World Community Grid Tech Joined: Nov 8, 2004 Post Count: 4504 Status: Offline Project Badges: |
Ingleside - there are two types of homogeneous redundancy. There is the basic version 'hr_class' that is based on operating system version and processor version. This is what Predictor@home created.
However, with GPU, we can now have three application versions that will match a single hr_class, and the only differentiating factor between them is whether the job ran on the CPU, an ATI GPU, or an NVIDIA GPU. Help Conquer Cancer does not compare between these three versions. Extending the basic HR mechanism to handle this level of difference had the potential to create a massive set of hr_classes, and David (BOINC) didn't want to go that route. As a result, he created a new type of homogeneous redundancy called homogeneous app version. This works by assigning all jobs in a workunit to be run using the same app version. It is this new mechanism that we are the first to use, and we are therefore finding the gaps in its capabilities. I have started conversing with David about the fix for this and we will see where it goes. |
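To make the difference between the two mechanisms concrete, here is a rough sketch in Python. It is not BOINC's scheduler code: the `Host` and `Workunit` classes, the `pick_app_version` function, and the preference order are all hypothetical, chosen only to illustrate the idea described above (classic HR pins a workunit to an hr_class of operating system and processor, and homogeneous app version additionally pins every job in the workunit to one app version).

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Host:
    os_family: str
    cpu_family: str
    app_versions: Set[str]  # hypothetical labels: "cpu", "ati", "nvidia"


@dataclass
class Workunit:
    name: str
    hr_class: Optional[Tuple[str, str]] = None  # classic HR: (OS, processor)
    app_version: Optional[str] = None           # homogeneous app version


def hr_class_of(host: Host) -> Tuple[str, str]:
    """Classic HR class, assumed here to be just (OS family, CPU family)."""
    return (host.os_family, host.cpu_family)


def pick_app_version(wu: Workunit, host: Host,
                     preferred=("nvidia", "ati", "cpu")) -> Optional[str]:
    """Return the app version to send, or None if this host must be skipped."""
    host_class = hr_class_of(host)
    # Classic HR: once committed, only hosts in the same class qualify.
    if wu.hr_class is not None and wu.hr_class != host_class:
        return None
    # Homogeneous app version: once committed, every job uses that version.
    if wu.app_version is not None:
        return wu.app_version if wu.app_version in host.app_versions else None
    # First job of the workunit: commit to this host's class and best version.
    for version in preferred:
        if version in host.app_versions:
            wu.hr_class = host_class
            wu.app_version = version
            return version
    return None


wu = Workunit("example_wu")
first = Host("windows", "intel", {"cpu", "ati"})
cpu_only = Host("windows", "intel", {"cpu"})
print(pick_app_version(wu, first))     # first host commits the workunit
print(pick_app_version(wu, cpu_only))  # skipped: lacks the committed version
```

In this sketch the second host is skipped even though it shares the hr_class, because the workunit is already committed to an app version that host cannot run. A host using app_info.xml (anonymous platform) would need its own handling at the point where the committed version is matched, which is the kind of gap being discussed here.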
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
As far as I remember, the original "classic" HR functionality has been in use at WCG since about the days that BOINC was expanded beyond just running on Linux, i.e. it was used to send the wingman's copy to the same platform as the original copy of a work unit went to. That would be ca. Q1 2006. Wiki: http://boinc.berkeley.edu/trac/wiki/HomogeneousRedundancy (referenced in our Start Here FAQ index).
|
||
|
branjo
Master Cruncher Slovakia Joined: Jun 29, 2012 Post Count: 1892 Status: Offline Project Badges: |
Got (at least) a few 7.05 WUs with the run length of 6.56 WUs, e.g.
X0960075010237200609191443_ 2-- branjo-PC Valid 17.11.2012 19:25:40 17.11.2012 19:38:28 0.05 / 0.20 33.5 / 33.2
(elapsed time of normal 7.05's is ca. 0.40)
Crunching@Home since January 13 2000. Shrubbing@Home since January 5 2006 |
||
|
|