Cyclops
Senior Cruncher
Joined: Jun 13, 2022
Post Count: 295
2022-08-26 Update (increased WU output & WCG backend changes)

Dear volunteers,

We have taken additional measures to increase the quantity of WUs we can send out, and we have been able to increase the quantity of WUs in flight at any given time. Volunteers should see this reflected on their devices now, and perhaps even over this past week.

We are also relieved to share that the hosting data centre has assigned additional personnel on site to resolve our networking issues, meaning a fix is imminent. We will share with you any further updates we receive from the data centre. The network fix will allow us to bring our remaining servers online, stabilizing and further increasing the WU supply.

Until we are able to deploy all dedicated servers, we must continuously adjust and monitor the tasks scheduled in Aurora/Mesos to keep them balanced and the workunits flowing. So far this process has been unduly labour-intensive and sporadic. For example, a recurring job may saturate the scheduler by creating a large number of downstream jobs. This flood of new jobs might then throttle the processing rate of other waiting jobs and thereby interrupt the supply of work. To fix the problem, we need to temporarily deschedule the parent job, decrease its frequency, or decrease the priority of its children in a way that does not starve other stages of the pipeline.
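
As a rough illustration of the priority adjustment described above, here is a minimal Python sketch. It is not Aurora/Mesos configuration: the job names, the priority values, the slot count, and the `drain` helper are all hypothetical, chosen only to show how lowering the priority of a parent job's children lets other pipeline stages run first while the children still receive the leftover slots.

```python
import heapq
from itertools import count


def drain(jobs, slots):
    """Run up to `slots` jobs; lower priority number runs first, FIFO within a priority."""
    order = count()  # arrival counter used as a tie-breaker
    heap = [(prio, next(order), name) for name, prio in jobs]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(min(slots, len(heap)))]


def queue_after_flood(child_priority, n_children=8):
    """Hypothetical queue: a recurring parent's children arrive before two other stages."""
    jobs = [(f"child-{i}", child_priority) for i in range(n_children)]
    jobs += [("make-workunits", 5), ("stock-feeder", 5)]
    return jobs


# Children share the priority of the other stages: they fill every slot.
flooded = drain(queue_after_flood(child_priority=5), slots=4)

# Children demoted (larger number = lower priority): other stages run first,
# and the children still get the remaining slots, so they are not starved.
rebalanced = drain(queue_after_flood(child_priority=9), slots=4)

print(flooded)
print(rebalanced)
```

In this toy model the waiting stages are only served once the children are demoted. A real scheduler would also need to weigh the other options mentioned above, such as descheduling the parent job or reducing its frequency.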

Last week, we mentioned that we have begun to investigate concerns over statistics, credit, streaks, and database dumps raised by volunteers. We will have an update on some of these issues next week. We also plan to release a more structured breakdown from the tech team similar to a CHANGELOG starting next week or the week after so that we can increase the frequency and clarity of updates.

Future Plans for Aurora/Mesos Replacement by SLURM at the WCG
With the above in mind, although we should be able to immediately deploy additional server resources for Aurora/Mesos job scheduling once networking issues are resolved, our team has greater familiarity and experience with the SLURM scheduler, an alternative to Aurora/Mesos. SLURM is a mature technology currently in use at many of the world’s foremost supercomputing centres, and we intend a full transition to SLURM soon after the full WCG restart.

Pending some investigation, we may also expand our message-passing layer and implement a publisher/subscriber model, with some notion of back-pressure, to govern the chain of downloading data from researchers and creating workunits with which to stock the feeder. From what we have observed, we expect the move to SLURM to distribute our internal server resources more efficiently than Aurora/Mesos currently does, while losing no functionality. The port should be relatively straightforward since it overlaps with the existing skill-set of the team.
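
To make the back-pressure idea concrete, here is a minimal Python sketch with one publisher and one subscriber joined by a bounded queue. It is an assumption-laden toy, not the WCG message-passing layer: `run_pipeline`, the batch names, and the `max_pending` limit are hypothetical. The point is only that a full buffer makes the downloader wait, so downloads cannot outrun workunit creation.

```python
import queue
import threading


def run_pipeline(batches, max_pending=2):
    """Download batches and turn them into workunits through a bounded buffer."""
    pending = queue.Queue(maxsize=max_pending)  # bounded buffer provides back-pressure
    workunits = []
    done = object()  # sentinel telling the subscriber to stop

    def downloader():
        # Publisher: put() blocks while the buffer is full, which slows
        # the download stage to the pace of the stage behind it.
        for batch in batches:
            pending.put(batch)
        pending.put(done)

    def workunit_creator():
        # Subscriber: consumes batches in arrival order.
        while True:
            batch = pending.get()
            if batch is done:
                break
            workunits.append(f"wu-from-{batch}")

    threads = [
        threading.Thread(target=downloader),
        threading.Thread(target=workunit_creator),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return workunits


result = run_pipeline(["batch-a", "batch-b", "batch-c", "batch-d"])
print(result)
```

A full publisher/subscriber layer would typically support several subscribers per topic; this sketch keeps a single consumer so that the blocking behaviour is easy to follow.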

However, this work is not a higher priority than addressing long-standing concerns of volunteers, which we are finally carving out the bandwidth to address.

Thanks for your patience and have a great weekend!
-WCG Tech Team
[Aug 27, 2022 2:45:58 AM]
dough boy
Cruncher
Joined: May 22, 2012
Post Count: 8
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

I have been able to download several days' worth of data for almost the last week.
[Aug 27, 2022 2:54:59 AM]
bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 275
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thank you for the detailed status report.

Many volunteers, myself included, will find the additional details of what has been done, what will be done and the MORE DETAILED timeline provided by this update refreshing and welcome, as well as the CHANGELOG info you will be implementing.

Thanks to the WCG Tech Team for contributing to this update! I have commented elsewhere on the WUs provided and the ongoing but lessening HTTP errors.

Thanks again,
Bruce
[Aug 27, 2022 4:17:56 AM]
danwat1234
Cruncher
Joined: Apr 18, 2020
Post Count: 22
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thank you for the update! All of my machines seem to have been getting regular WCG work this past week or so. I'm surprised how smooth it was once the day-or-two server hiccup was through. Keep it up!
----------------------------------------
[Edit 1 times, last edit by danwat1234 at Aug 27, 2022 4:23:57 AM]
[Aug 27, 2022 4:23:20 AM]
Foxus
Cruncher
Joined: Oct 22, 2008
Post Count: 2
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thanks for the update. I can confirm that I am getting some WUs; it is nice to see the new environment coming to life.

Great job by your team. We are keeping our fingers crossed that the remaining problems will soon vanish and that you get some sleep after everything you have been through these last weeks.

Good luck and may a light shine on all your ways.
[Aug 27, 2022 7:02:15 AM]
phillipspencer
Advanced Cruncher
France
Joined: Apr 9, 2015
Post Count: 71
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Appreciate the detailed update and the indication of future priorities. Good to understand your allocation of resources too.
[Aug 27, 2022 8:39:47 AM]
mdparkhill
Advanced Cruncher
Joined: May 2, 2007
Post Count: 60
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Good news to hear that someone is finally taking the networking issue seriously. It was nice to learn more about the backend for the scheduling. I had not even considered that as an issue. I do mainframes, and sometimes our schedulers go ape and it's all my fault even when it's user error. Thanks again for the updates. I just got 60-60 tasks; downloads are still a little slow, but it appears to be working better (I hope, fingers crossed).
[Aug 27, 2022 11:00:53 AM]
nivrip
Senior Cruncher
North Yorkshire
Joined: Sep 13, 2007
Post Count: 258
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Thanks for the info. Getting plenty of WUs now but still occasional hiccups with some of them stuck in Transfers. Using the Retry button always does the trick within a minute or two.
----------------------------------------
ЮРКШИР КРУНЧЕР
[Aug 27, 2022 11:49:16 AM]
spRocket
Senior Cruncher
Joined: Mar 25, 2020
Post Count: 234
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Still having to babysit my zoo, but the work is coming in steadily. Thanks for the update!
[Aug 27, 2022 2:17:26 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 743
Re: 2022-08-26 Update (increased WU output & WCG backend changes)

Love the detailed update. I'm getting WUs with some minor hiccups, but it is great to know my machine is useful again.
[Aug 27, 2022 2:22:22 PM]