Thread Status: Active
Total posts in this thread: 38
Posts: 38   Pages: 4
This topic has been viewed 32310 times and has 37 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
DDDT2 Type C -- Possible bad batches

I didn't see a separate thread for potentially bad Type C WUs, so I started this one.

All my Type C "pr" WUs errored out on 3 different machines:

ts06_ b014_ pr34b0_ 0-- starbase2.command Error 4/10/10 16:42:54 4/10/10 16:54:12 0.00 0.0 / 0.0
ts06_ b014_ pr23b0_ 0-- starbase2.command Error 4/10/10 16:42:54 4/10/10 16:54:12 0.00 0.0 / 0.0
ts06_ b013_ pr78b1_ 0-- starbase2.command Error 4/10/10 16:42:35 4/10/10 16:54:12 0.00 0.0 / 0.0
ts06_ b011_ pr45a0_ 0-- starbase4.command Error 4/10/10 16:39:33 4/10/10 16:47:27 0.00 0.0 / 0.0
ts06_ b008_ pr45a0_ 0-- starbase4.command Error 4/10/10 16:36:44 4/10/10 16:47:27 0.00 0.0 / 0.0
ts06_ b006_ pr02b1_ 1-- starbase2.command Error 4/10/10 16:34:51 4/10/10 16:54:12 0.00 0.0 / 0.0
ts06_ b006_ pr02a1_ 0-- starbase2.command Error 4/10/10 16:34:49 4/10/10 16:54:12 0.00 0.0 / 0.0
ts06_ b005_ pr02a0_ 0-- starbase2.command Error 4/10/10 16:34:11 4/10/10 16:54:12 0.00 0.0 / 0.0
ts06_ b004_ pr91b1_ 0-- starbase2.command Error 4/10/10 16:34:10 4/10/10 16:54:12 0.00 0.0 / 0.0
ts06_ b003_ pr91b1_ 1-- starbase4.command Error 4/10/10 16:33:28 4/10/10 16:47:27 0.00 0.0 / 0.0

Make-up WUs:

ts06_ b008_ pr23a1_ 3-- starbase1.command Error 4/10/10 17:22:58 4/10/10 17:30:36 0.00 0.0 / 0.0
ts06_ a013_ pr89a0_ 2-- starbase1.command Error 4/10/10 17:22:58 4/10/10 17:30:36 0.00 0.0 / 0.0

Same error log for each WU, including the wingmen's results on the make-up WUs:

<core_client_version>6.10.17</core_client_version>
<![CDATA[
<message>
process exited with code 2 (0x2, -254)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
At line 10923 of file pbeq2.f
Fortran runtime error: End of file

</stderr_txt>
]]>

Athlon64x2, 2GB, RHEL5.4, 6.10.17
Phenom 9600, 4GB, Fedora 12, 6.10.17
Phenom II 945, 4GB, Fedora 11, 6.10.17
[Apr 10, 2010 5:45:44 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

I posted here on the 10 or so that failed for me on all OS platforms.
[Apr 10, 2010 5:58:40 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

All of the "pr" WUs that I received in this last shower crashed, too.
However, they behaved differently on different machines. Sorry, but I am not certain which machine behaved in which way, because things happened quickly and I did not make written notes.
All machines that got these WUs are Intel Yorkfield quads (Q9650, QX9650).
2 run XP-32, 2 run XP-64, all run BOINC 6.2.19 32-bit.
All machines are overclocked.

On the 2 XP-64 machines, the WUs crashed immediately, with error 29, and some diagnostics in the error logs. See descriptions below for devices RJB-Q9650A and rjb-q9650c.
On the 2 XP-32 machines, the WUs ran for about 1-2 minutes, with the BOINC CPU Time field either empty or showing 00:00:00. Then CHARMM crashed, and Windows popped up a crash report window. On one machine, the window invited me to report the error to Microsoft, but on the other, it didn't. ([Edit]: Is this an XP system setting?).
I think now that some Windows Error Report files (WER*) would have been available until I clicked "Don't Send" :(
The WCG error logs say "exit code 1282 (0x502)".
([Edit]: The initial time spent "running" may have been while Windows was dumping the crashed process memory image to disc.)

More info for each machine:
--------------------------------
Device RJB-Q9650A (XP-64): CHARMM crash popup windows probably did not occur.
The error logs all show Error 0x1d, plus 12 lines of diagnostic parameters:
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
forrtl: severe (29): file not found, unit 30, file D:\BOINC_Data\slots\4\fort.30
Image PC Routine Line Source
wcg_dddt2_charmm_ 00B15D6E Unknown Unknown Unknown
wcg_dddt2_charmm_ 00B13028 Unknown Unknown Unknown
wcg_dddt2_charmm_ 00ABCFB2 Unknown Unknown Unknown
wcg_dddt2_charmm_ 00ABCBCF Unknown Unknown Unknown
wcg_dddt2_charmm_ 00AAE8E1 Unknown Unknown Unknown
wcg_dddt2_charmm_ 008EE66E Unknown Unknown Unknown
wcg_dddt2_charmm_ 0056892F Unknown Unknown Unknown
wcg_dddt2_charmm_ 004469FF Unknown Unknown Unknown
wcg_dddt2_charmm_ 00445640 Unknown Unknown Unknown
wcg_dddt2_charmm_ 0042D6E2 Unknown Unknown Unknown
wcg_dddt2_charmm_ 00B03052 Unknown Unknown Unknown
kernel32.dll 7D4E7D42 Unknown Unknown Unknown
-------------
Device rjb-q9650c (XP-64): Same error (0x1d) as for RJB-Q9650A,
and same diagnostics in the WCG error log.
I think that rjb-q9650c threw popup windows, but there was no invitation to send error reports to MS. (Unsure of this).
-------------
Device Rjb-q9650b (XP-32): Error log contained only the header, footer, and error code line
>> - exit code 1282 (0x502)
There may have been a popup window due to CHARMM crashing, but without an invitation to send an error report to Microsoft. A screen dump of such a popup is at http://i293.photobucket.com/albums/mm57/BlindFreddie/CHARMMcrash1.gif
-----------
Device Rjb-q9650d (XP-32):
The same short error logs as for Rjb-q9650b, with exit code 1282 (0x502).
This one definitely threw popup windows, and a screen dump is at http://i293.photobucket.com/albums/mm57/BlindFreddie/CHARMMcrash4.gif
-------------
The discussion of this batch of error WUs is now split between this thread and the thread "It's raining Dengue, Hallelujah!". Check there before posting here.
----------------------------------------
[Edit 2 times, last edit by Rickjb at Apr 11, 2010 5:17:43 AM]
[Apr 10, 2010 6:09:03 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

Here is another candidate for a server abort:

ts01_ a189_ pe0000_ 6-- - In Progress 10.04.10 13:09:46 13.04.10 08:21:46 0.00 0.0 / 0.0
ts01_ a189_ pe0000_ 5-- 617 Error 10.04.10 12:15:04 10.04.10 17:08:02 0.63 9.6 / 0.0
ts01_ a189_ pe0000_ 4-- 617 Error 10.04.10 08:32:22 10.04.10 13:09:44 1.53 9.6 / 0.0
ts01_ a189_ pe0000_ 3-- 617 Error 10.04.10 07:38:21 10.04.10 12:14:58 0.54 6.2 / 0.0
ts01_ a189_ pe0000_ 2-- 617 Error 10.04.10 05:49:30 10.04.10 07:38:18 0.39 5.8 / 0.0
ts01_ a189_ pe0000_ 1-- 617 Error 09.04.10 05:58:02 10.04.10 08:32:20 0.32 6.7 / 0.0
ts01_ a189_ pe0000_ 0-- 617 Error 09.04.10 05:58:01 10.04.10 05:49:28 0.48 6.6 / 0.0
[Apr 10, 2010 6:39:47 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

I only have 1 WU that errored out: ts01_ b028_ se0000_ 2--

The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
ENERGY CHANGE TOLERANCE EXCEEDED
Encountered error. Exiting.
[Apr 10, 2010 6:51:52 PM]
I need a bath
Senior Cruncher
USA
Joined: Apr 12, 2007
Post Count: 347
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

I'm getting some pr retreads. Hopefully they will be server-aborted before I get to them, but I will not interfere with them otherwise.
[Apr 10, 2010 7:11:30 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

"I posted here on the 10 or so that failed for me on all OS platforms."

Sorry Brink, I didn't see your post earlier. Maybe one of the CAs can combine things into a single Type C problem thread.
[Apr 10, 2010 7:26:15 PM]
Somervillejudson@netscape.net
Veteran Cruncher
USA
Joined: May 16, 2008
Post Count: 1065
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

Yes, 3 errors on the latest batch. Hopefully that is all!
[Apr 10, 2010 7:34:24 PM]
Trotador
Senior Cruncher
Joined: Mar 26, 2009
Post Count: 154
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

Same here: all the "pr" WUs errored within the first second, on both XP32 and Ubuntu 64.

Edit: total of 21 WUs
----------------------------------------
[Edit 1 times, last edit by Trotador at Apr 10, 2010 7:48:15 PM]
[Apr 10, 2010 7:44:14 PM]
wplachy
Senior Cruncher
Joined: Sep 4, 2007
Post Count: 423
Status: Offline
Re: DDDT2 Type C -- Possible bad batches

20 so far (originally 11).

For the first 11, all my results look the same apart from the \slots\x directory. The next 9 show a different error code and output.

Wingman and repair WUs are either In Progress or errored, with most of the errors the same as mine.

All errors occur at the start and report 0.0 CPU hrs.

10 Vista 64 & 1 Win 7

Result Name: ts06_ a010_ pr56a0_ 1--
Result Name: ts06_ b003_ pr56a1_ 0--
Result Name: ts06_ b004_ pr23b1_ 0--
Result Name: ts06_ b005_ pr02b0_ 0--
Result Name: ts06_ b005_ pr91a1_ 1--
Result Name: ts06_ b009_ pr56a0_ 1--
Result Name: ts06_ b009_ pr56a1_ 1--
Result Name: ts06_ b010_ pr23a0_ 1--
Result Name: ts06_ b012_ pr34b1_ 0--
Result Name: ts06_ b012_ pr67b1_ 1--
Result Name: ts06_ b013_ pr89a1_ 1--

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
forrtl: severe (29): file not found, unit 30, file C:\WCG\BOINC Data\slots\5\fort.30
Image PC Routine Line Source
wcg_dddt2_charmm_ 00B15D6E Unknown Unknown Unknown

Stack trace terminated abnormally.

</stderr_txt>

9 - XP32
Result Name: ts06_ b008_ pr67a0_ 0--
Result Name: ts06_ b008_ pr56b0_ 1--
Result Name: ts06_ b008_ pr56b1_ 1--
Result Name: ts06_ b003_ pr89b0_ 0--
Result Name: ts06_ b003_ pr78a0_ 0--
Result Name: ts06_ b003_ pr02a1_ 1--
Result Name: ts06_ b002_ pr34b1_ 0--
Result Name: ts06_ b002_ pr34a1_ 0--
Result Name: ts06_ b002_ pr23a1_ 1--

<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
- exit code 1282 (0x502)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.

</stderr_txt>

Edit: added 9 new WUs
----------------------------------------
Bill P

----------------------------------------
[Edit 1 times, last edit by wplachy at Apr 11, 2010 12:34:04 AM]
[Apr 10, 2010 9:05:21 PM]
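
The reports above fall into a handful of distinct error signatures: the Fortran "End of file" failure, the missing fort.30 file (exit code 29), the bare exit code 1282 (0x502), and the single "ENERGY CHANGE TOLERANCE EXCEEDED" result. For anyone collecting logs from several machines, here is a minimal Python sketch of how they might be bucketed. The patterns are copied from the log excerpts quoted in this thread; the labels and the function names `classify` and `tally` are made up for illustration and are not part of BOINC or WCG.

```python
import re
from collections import Counter

# Patterns come from the error logs quoted in this thread.
# The labels are hypothetical, chosen only for this sketch.
SIGNATURES = [
    ("missing fort.30 (exit code 29)",
     re.compile(r"forrtl: severe \(29\): file not found, unit 30")),
    ("Fortran end of file (exit code 2)",
     re.compile(r"Fortran runtime error: End of file")),
    ("energy change tolerance exceeded",
     re.compile(r"ENERGY CHANGE TOLERANCE EXCEEDED")),
    ("exit code 1282 (0x502), no stderr detail",
     re.compile(r"exit code 1282 \(0x502\)")),
]


def classify(log_text: str) -> str:
    """Return the label of the first signature found in one result's error log."""
    for label, pattern in SIGNATURES:
        if pattern.search(log_text):
            return label
    return "unrecognised"


def tally(logs) -> Counter:
    """Count how many logs fall under each signature label."""
    return Counter(classify(text) for text in logs)


if __name__ == "__main__":
    sample_logs = [
        "At line 10923 of file pbeq2.f\nFortran runtime error: End of file",
        "- exit code 1282 (0x502)",
    ]
    for label, count in tally(sample_logs).items():
        print(count, label)
```

The order of the list matters only if one log could match several patterns; the energy tolerance result also reports exit code 29, which is why the fort.30 pattern matches on the forrtl line rather than on the exit code.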