Thread Status: Active
Total posts in this thread: 86
Posts: 86   Pages: 9
This topic has been viewed 54080 times and has 85 replies.
shanen0
Cruncher
Joined: Feb 4, 2021
Post Count: 20
Neverending MCM tasks

I have three of them now. Normal working time is around 4 hours, but two of them have been running for close to three days and are one day past their deadline, and the third has been running for over a day already. Eventually they apparently do get killed off, but I think no credit is granted, and it isn't my fault that the code is buggy. I noticed another one a few days ago. So far it's only been on my largest machine, but I don't watch the others as closely.
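To spot stuck tasks on machines you don't watch closely, one rough heuristic is to compare each task's elapsed time with the normal runtime (about 4 hours here) and flag anything far beyond it or already past its deadline. The Python sketch below is hypothetical: `TaskStatus`, `flag_suspect_tasks`, the 3x factor and the sample numbers are illustrative assumptions, not part of BOINC or WCG, and the elapsed-time and deadline values would have to be read from your client by other means.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    """Hypothetical snapshot of one running task (not a BOINC data structure)."""
    name: str
    elapsed_hours: float
    hours_past_deadline: float  # negative while the deadline is still ahead


def flag_suspect_tasks(tasks, typical_hours=4.0, factor=3.0):
    """Return names of tasks that look hung.

    A task is flagged if it has run longer than `factor` times the typical
    runtime, or if it is already past its deadline.
    """
    limit = typical_hours * factor
    return [
        t.name
        for t in tasks
        if t.elapsed_hours > limit or t.hours_past_deadline > 0
    ]


if __name__ == "__main__":
    # Hypothetical values loosely modelled on the post: about 4 h is normal.
    tasks = [
        TaskStatus("mcm_task_a", 70.0, 24.0),    # ~3 days, 1 day past deadline
        TaskStatus("mcm_task_b", 26.0, -48.0),   # over a day, deadline ahead
        TaskStatus("mcm_task_c", 3.5, -100.0),   # normal
    ]
    print(flag_suspect_tasks(tasks))
```

A fixed multiple of the typical runtime is crude, but it errs on the side of flagging: a slow machine may trip it, while a genuinely hung task always will eventually.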

Anyone else seeing this sort of thing?
[Sep 5, 2021 11:48:10 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7242
Re: Neverending MCM tasks

I am going to hazard a guess that you are running some version of Windows. I have seen this before, and the easiest way to correct the problem (if you notice it) is to reboot. Then the work units should come to a normal conclusion. If that does not solve the problem (when it occurs), please post some log entries and perhaps there will be a clue in there. I have not seen nor heard of this problem on Linux. If anyone has, please post and let us know of any proposed solutions. I don't think it is faulty code from MCM, but faulty memory management from the OS.
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Sep 6, 2021 1:26:35 AM]
[Sep 6, 2021 1:24:32 AM]
shanen0
Cruncher
Joined: Feb 4, 2021
Post Count: 20
Re: Neverending MCM tasks

Yes, that was from a Windows machine and perhaps I should have noted that I am aware that rebooting often fixes the hung-task problem. But that's a machine that I prefer to avoid rebooting. All of my machines have primary uses and WCG runs on unused cycles. (Kind of reminds me of how IBM managed WCG, actually.)

My general concern is that buggy software produces unreliable results. Bugs include tasks that hang or that fail to checkpoint. Perhaps the new "management" will do a better job. I also hope they will stop with the short-deadline tasks. They are annoying, and I generally nuke them on sight, even on the machines that will probably be able to complete the tasks within their short deadlines.

I never understood the point of the deadlines, except to waste donated cycles when some machines can't meet some arbitrary deadline. If the deadline encourages people to run machines they otherwise wouldn't run, then I tend to see the deadlines as counterproductive as well as wasteful.
[Sep 21, 2021 11:06:38 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7242
Re: Neverending MCM tasks

Just curious: how often do you get a hung MCM task? Do you have hung tasks on any other projects? Does any other software ever hang your system, or is it just a BOINC problem?
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Sep 22, 2021 1:07:26 AM]
ca05065
Senior Cruncher
Joined: Dec 4, 2007
Post Count: 325
Re: Neverending MCM tasks

If the problem is with one task, an alternative to rebooting is:
1. Set 'leave application in memory' to off.
2. Suspend the task.
3. Wait a couple of minutes to ensure it is removed from memory.
4. Resume the task.
5. Set 'leave application in memory' to on.
It should then resume from the last saved checkpoint.
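The steps above can be scripted. This is a minimal sketch, assuming BOINC's `boinccmd` command-line tool is on the PATH, that the 'leave application in memory' preference is stored as `<leave_apps_in_memory>` in `global_prefs_override.xml` (root element `<global_preferences>`) in the BOINC data directory, and that you can write to that file. If the override file does not exist yet, create it with an empty `<global_preferences></global_preferences>` element first. The function names are hypothetical and the project URL and task name are placeholders; by default it is a dry run that only prints the planned steps.

```python
import subprocess
import time
import xml.etree.ElementTree as ET

# Assumed preference tag for "leave application in memory" (check your BOINC version).
PREF_TAG = "leave_apps_in_memory"


def set_leave_apps_in_memory(xml_text: str, enabled: bool) -> str:
    """Return the override XML with the in-memory preference set to 1 or 0.

    Assumes the root element is <global_preferences>; adds the tag if absent.
    """
    root = ET.fromstring(xml_text)
    node = root.find(PREF_TAG)
    if node is None:
        node = ET.SubElement(root, PREF_TAG)
    node.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


def plan_steps(project_url: str, task_name: str, wait_seconds: int = 120):
    """The forum procedure as a list of (kind, argument) steps."""
    reread = ["boinccmd", "--read_global_prefs_override"]
    return [
        ("set_pref", False),  # 1. 'leave application in memory' off
        ("run", reread),
        ("run", ["boinccmd", "--task", project_url, task_name, "suspend"]),  # 2.
        ("sleep", wait_seconds),  # 3. give the app time to leave memory
        ("run", ["boinccmd", "--task", project_url, task_name, "resume"]),  # 4.
        ("set_pref", True),  # 5. 'leave application in memory' back on
        ("run", reread),
    ]


def execute(steps, prefs_path: str, dry_run: bool = True) -> None:
    """Carry out the steps; with dry_run=True only print them."""
    for kind, arg in steps:
        if dry_run:
            print(kind, arg)
        elif kind == "set_pref":
            with open(prefs_path, encoding="utf-8") as f:
                text = f.read()
            with open(prefs_path, "w", encoding="utf-8") as f:
                f.write(set_leave_apps_in_memory(text, arg))
        elif kind == "run":
            subprocess.run(arg, check=True)
        elif kind == "sleep":
            time.sleep(arg)


if __name__ == "__main__":
    # PROJECT_URL and TASK_NAME are placeholders, not real values.
    execute(plan_steps("PROJECT_URL", "TASK_NAME"), "global_prefs_override.xml")
```

Run it once as a dry run to check the commands, then call `execute(..., dry_run=False)`. Keeping the plan as data separate from execution makes it easy to review exactly what will be sent to the client before anything changes.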
[Sep 22, 2021 7:42:09 AM]
shanen0
Cruncher
Joined: Feb 4, 2021
Post Count: 20
Re: Neverending MCM tasks

I'm not monitoring WCG that closely these days, but I've only noticed those stuck tasks on my main machine, which rarely gets rebooted. It doesn't have so much RAM that I want to encourage any apps to remain in RAM, though I haven't noticed any problems that seem symptomatic of bad memory.

Basically I just want WCG to run with fewer problems. The most common reason I move from one project to another is persistent intrusions.

However, as I remember from my days working with researchers, buggy software reduces confidence in the results.
[Sep 23, 2021 10:55:52 PM]
sam6861
Advanced Cruncher
Joined: Mar 31, 2020
Post Count: 107
Re: Neverending MCM tasks

MCM1 tasks work fine for me, with an amazing 106 days of uptime on Windows 10.
For those with hung MCM1 tasks where the wingman's computer completed the same work unit, I guess this is probably a hardware problem with RAM, CPU, motherboard, or something similar.

On non-ECC RAM, I have seen hung MCM1 tasks, invalid ARP1 results, random MIP1 computation errors, BSODs/crashes, and freezes. The worst was file system corruption, after which Windows 10 could no longer start. I switched to ECC unbuffered DIMMs (UDIMMs) on most of my computers whose CPU and motherboard support them. They work fine, with nice uptime.

I have never seen a hung MCM1 task with ECC RAM, but ECC RAM can fail differently from non-ECC RAM. I had a failing ECC DDR4 module for which the Windows log showed hundreds of WHEA corrected errors, along with frequent resets and reboots. With that one faulty ECC module, just idling on Windows 10 at less than 1% CPU usage got me 6 reboots in 1 hour. After I removed the faulty memory, there were no more random reboots, no invalids, no computation errors, and no file system corruption.

My computers (there was a power outage in my area 107 days ago):
Ryzen 3900x, Asus B550-E, Win10, 32GB (2x16) DDR4-3200 ECC, Uptime 2 days (changed ECC RAM)
Ryzen 2700x, Asus Prime B350 Plus, Win10, 32GB (2x16) DDR4-3200 ECC, Uptime 106 days
AMD FX-4100, Asus M5A97 R2.0, Linux Debian, 32GB (4x8) DDR3-1600 ECC, Uptime 106 days, edac-util 1 corrected
Intel Atom N270, HP Mini 100-1000, Linux Debian 32-bit, 2GB DDR2 non-ECC, Uptime 42 days
Intel i7-2600, Asus P8H77-M, Linux Debian, 16GB (2x8) DDR3-1333 non-ECC, Uptime 3 days
- Unused: Ryzen 2400G (non-Pro) doesn't support ECC; MSI Tomahawk B450 doesn't support ECC.
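On Linux, `edac-util` reads the kernel's EDAC error counters, which are exposed per memory controller in sysfs as `/sys/devices/system/edac/mc/mcN/ce_count` (corrected errors) and `ue_count` (uncorrected errors). A steadily rising corrected count is the Linux counterpart of the WHEA corrected errors described for Windows. The sketch below is a minimal check, assuming the EDAC driver for your memory controller is loaded; `edac_error_counts` is a hypothetical helper, not a replacement for `edac-util` or a proper memory test.

```python
from pathlib import Path
from typing import Optional, Tuple

EDAC_ROOT = "/sys/devices/system/edac/mc"


def edac_error_counts(root: str = EDAC_ROOT) -> Optional[Tuple[int, int]]:
    """Sum corrected and uncorrected memory errors over all EDAC memory controllers.

    Returns None when the EDAC sysfs directory is missing (driver not loaded,
    no ECC support, or not Linux).
    """
    base = Path(root)
    if not base.is_dir():
        return None
    corrected = uncorrected = 0
    for mc in sorted(base.glob("mc[0-9]*")):
        ce_file = mc / "ce_count"  # errors detected and corrected by ECC
        ue_file = mc / "ue_count"  # errors ECC could not correct
        if ce_file.is_file():
            corrected += int(ce_file.read_text().strip())
        if ue_file.is_file():
            uncorrected += int(ue_file.read_text().strip())
    return corrected, uncorrected


if __name__ == "__main__":
    counts = edac_error_counts()
    if counts is None:
        print("No EDAC data found")
    else:
        print(f"corrected={counts[0]} uncorrected={counts[1]}")
```

Running it periodically (for example from cron) and comparing counts over time says more than a single reading: one corrected error, as in the list above, is not alarming by itself, but a count that keeps climbing points at a failing module.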
----------------------------------------
[Edit 3 times, last edit by sam6861 at Sep 25, 2021 12:08:41 AM]
[Sep 24, 2021 11:59:38 PM]
rcthardcore
Cruncher
United States
Joined: Jan 29, 2009
Post Count: 13
Re: Neverending MCM tasks

I have verified that it IS faulty MCM code. No other stuck/forever running tasks on any of my other BOINC projects. It ONLY happens on MCM.
----------------------------------------
AMD Ryzen 9 5950x
NVIDIA RTX 3090 FE
128 GB DDR4-3200
Windows 10 64-bit 21H1
[Sep 30, 2021 8:16:40 PM]
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7242
Re: Neverending MCM tasks

rcthardcore wrote: "I have verified that it IS faulty MCM code. No other stuck/forever running tasks on any of my other BOINC projects. It ONLY happens on MCM."

If you have verified that the code used for MCM is faulty, did you actually look at the code and find the offending bug? If you found it, did you notify the project that you found a bug?
I have run over 250,000 MCM units on both Windows and Linux and have never seen a stuck unit. That doesn't mean there is no bug somewhere in the code, but I would speculate that the code is not the problem and that there is a hardware issue on the machine in question. In addition, if a large number of users were seeing this problem, it would be more likely to be a code problem, but there do not seem to be many complaints about this issue. Also, if there were a software bug, why would a simple reboot fix the problem for a specific work unit? If it were a code problem, it would probably keep occurring in the same work unit even after a reboot. Ergo, back to a probable hardware issue. Good luck.
Cheers
Edit: spelling
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
----------------------------------------
[Edit 1 times, last edit by Sgt.Joe at Sep 30, 2021 8:47:23 PM]
[Sep 30, 2021 8:46:22 PM]
Grumpy Swede
Master Cruncher
Svíþjóð
Joined: Apr 10, 2020
Post Count: 1884
Re: Neverending MCM tasks

11,191 MCM units crunched here. Not a single stuck unit.

Correlation does not imply causation. Cum hoc ergo propter hoc.
[Edit 2 times, last edit by Grumpy Swede at Sep 30, 2021 8:57:25 PM]
[Sep 30, 2021 8:55:52 PM]