World Community Grid Forums
Category: Completed Research Forum: The Clean Energy Project - Phase 2 Forum Thread: Tasks are not checkpointing proporly |
Thread Status: Active Total posts in this thread: 54
I need a bath
Senior Cruncher USA Joined: Apr 12, 2007 Post Count: 347 Status: Offline Project Badges: |
Is there any chance that some change that was made in the Windows beta version of CEP2 was also ported to the Linux version? It seems as if a lot of us who've been crunching CEP2 for quite a while on Linux are now experiencing problems that we didn't used to have. Hmmm, I have 1 year 132 days on this project with a computer that has the exact same setup. Only NOW do I have issues. NOTHING has changed except possibly an Ubuntu update. |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Is there any chance that some change that was made in the Windows beta version of CEP2 was also ported to the Linux version? It seems as if a lot of us who've been crunching CEP2 for quite a while on Linux are now experiencing problems that we didn't used to have. That, I think, would require a version control failure... how likely is that? Nope, it's still the same 6.19 for Linux. I've stuck to the 2.6.32 kernel for Linux, which is what 10.04.1 LTS (Long Term Support) uses. The ultra minor update is now on something like 2.6.32.25. CEP2 keeps on churning fine, max 2 of 4 cores, with HPF2, HCMD2, C4CW on the side. I flip-flop the device profile once a day, incrementing the cache slowly to get the right mix, then set the cache back to 1 day. It's reasonably manageable as the run times are now in general 8.5+ hours (still 30-40 minutes overhead per CEP2 task). I don't need to buffer too many to get through a couple of days, and since my client is set to jump the repairs ahead on top, I seem to be staying with average return times below 2 days... to get more repair work. I have the impression there are not very many R++ devices on Linux, as the ratio of < 4 day deadline tasks is nearing 20%, a mix of selected sciences, but mostly CEP2, which then run fine... old Q6600, stock speed. edit: both the currently running CEP2 jobs are repairs ;O)
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All! |
||
|
NightBlade
Advanced Cruncher Joined: Jun 10, 2008 Post Count: 89 Status: Offline Project Badges: |
What are repair WUs?
---------------------------------------- |
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2955 Status: Offline Project Badges: |
Repair WUs are simply replacement WUs that are sent out to a known reliable host (see the FAQs for details), to replace WUs that either haven't been returned in time, errored out/aborted, or are verification WUs.
---------------------------------------- |
||
|
I need a bath
Senior Cruncher USA Joined: Apr 12, 2007 Post Count: 347 Status: Offline Project Badges: |
Do the techs know about this problem? How do I find a log of what's going on to send to them?
---------------------------------------- |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The techs look at
Maybe entirely unrelated: I saw a bug report on NVidia driver 260.19.06 and all GPU tasks failing. My host got them, but all GPU crunching functions are disabled in the cc_config.xml.
----------------------------------------
[Edit 1 times, last edit by Sekerob at Oct 5, 2010 5:35:12 PM] |
||
|
I need a bath
Senior Cruncher USA Joined: Apr 12, 2007 Post Count: 347 Status: Offline Project Badges: |
What should stand out is that a lot of jobs are reporting finished in about a third of the CPU time that they should be. I think maybe the checkpoints are OK, but for some reason the CPU time for some of the intervals is reset back to the last checkpoint. It can't be that the project keeps starting over, or it would never finish. There is just a huge differential (and I mean hours and hours) between elapsed and reported CPU time. If folks don't mind, or don't notice, that they are only getting credit for 3-4 hours for jobs that ran over 8 hours, this might not be noticed. But the folks who noticed are complaining. I don't know what is causing it. I didn't change BOINC versions or anything.
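To put a number on that gap, a minimal Python sketch is shown here. It assumes you copy the elapsed and CPU hours by hand from the Result Status page; the `TaskTimes` class, the task names, the sample figures, and the 0.8 threshold are all hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    """Times for one result, copied by hand (hypothetical structure)."""
    name: str
    elapsed_hours: float
    cpu_hours: float


def cpu_ratio(task):
    """Fraction of the elapsed (wall-clock) time that was reported as CPU time."""
    return task.cpu_hours / task.elapsed_hours


def flag_lost_time(tasks, threshold=0.8):
    """Return tasks whose reported CPU time is well below their elapsed time.

    The threshold is an arbitrary illustrative cut-off, not a project value.
    """
    return [t for t in tasks if t.elapsed_hours > 0 and cpu_ratio(t) < threshold]


# Illustrative numbers only, not real results.
tasks = [
    TaskTimes("example_task_a", elapsed_hours=8.5, cpu_hours=3.0),
    TaskTimes("example_task_b", elapsed_hours=8.5, cpu_hours=8.2),
]

for t in flag_lost_time(tasks):
    print(f"{t.name}: CPU time is {cpu_ratio(t):.0%} of elapsed time")
```

A low ratio on its own does not prove a checkpoint problem (a throttled or heavily shared machine would also show one), so it is only a way to pick which results are worth posting.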
---------------------------------------- |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Post a full Result log of such a result with the respective CPU and Elapsed times annotated. If time gets lost (something that was reported early in the project, and the techs are aware of it), then the actual CPU times, best monitored from BOINCTasks or an old BOINC Manager (6.2.28 or earlier), should show the CPU time skipping back and the % progress retreating, with checkpoint_debug logging switched on.
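If you note down the CPU time and % progress of a task every so often while watching it, a small Python sketch like this one can pick out the points where either value went backwards. The sample format, a list of `(cpu_seconds, progress_percent)` pairs in the order you observed them, is an assumption made for this example and not something BOINC or BOINCTasks exports in that shape.

```python
def find_rollbacks(samples):
    """Find observations where CPU time or progress dropped.

    samples: list of (cpu_seconds, progress_percent) tuples in the order
    they were observed (hypothetical, hand-recorded format).
    Returns a list of (index, cpu_seconds_lost, progress_lost) tuples.
    """
    rollbacks = []
    for i in range(1, len(samples)):
        prev_cpu, prev_pct = samples[i - 1]
        cpu, pct = samples[i]
        if cpu < prev_cpu or pct < prev_pct:
            rollbacks.append((i, prev_cpu - cpu, prev_pct - pct))
    return rollbacks


# Illustrative observations only: the third sample has gone backwards.
samples = [(600, 2.0), (1200, 4.1), (900, 3.0), (1500, 5.2)]

for index, cpu_lost, pct_lost in find_rollbacks(samples):
    print(f"sample {index}: CPU time dropped by {cpu_lost} s, "
          f"progress dropped by {pct_lost:.1f} points")
```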
----------------------------------------
||
|
I need a bath
Senior Cruncher USA Joined: Apr 12, 2007 Post Count: 347 Status: Offline Project Badges: |
OK, so where do I find the Result Log? And how do I switch on checkpoint_debug logging?
----------------------------------------
Thanks |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
You find Result logs by clicking on the links in the Status column of the Result Status page.
The <checkpoint_debug> logflag has to be added to the cc_config.xml file. By default that file already exists on Linux in the /var/lib/boinc-client directory for Lucid. Don't know where it goes for other distros. How to FAQ: http://boinc.berkeley.edu/wiki/Cc_config.xml
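As a rough illustration of what "adding the logflag" means, here is a small Python sketch that takes the text of a cc_config.xml and sets `<checkpoint_debug>` to 1 inside the `<log_flags>` section, creating that section if it is missing. The `enable_log_flag` helper and the sample file content are made up for this example. It works on a string only, so it never touches the real file; editing the file by hand in a text editor does the same job.

```python
import xml.etree.ElementTree as ET


def enable_log_flag(config_xml, flag="checkpoint_debug"):
    """Return config_xml with <flag>1</flag> set inside <log_flags>.

    Hypothetical helper for illustration; it only transforms text.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    # Create the <log_flags> section if the file does not have one yet.
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")

    # Add the flag if missing, then switch it on.
    element = log_flags.find(flag)
    if element is None:
        element = ET.SubElement(log_flags, flag)
    element.text = "1"

    return ET.tostring(root, encoding="unicode")


# Sample content only; your own cc_config.xml may hold other settings.
existing = "<cc_config><log_flags><task>1</task></log_flags></cc_config>"
print(enable_log_flag(existing))
```

After changing the real file, the client has to re-read its configuration (or be restarted) before the extra checkpoint messages show up in the log.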