Thread Status: Active
Total posts in this thread: 89
This topic has been viewed 35223 times and has 88 replies
knreed
Former World Community Grid Tech
Joined: Nov 8, 2004
Post Count: 4504
Screen Scrapers - Please Discuss

We have rolled out the first of the website changes that we will be making frequently over the next 3-6 months. The HTML structure will change a fair amount during this rework, so screen scraping will not be a reliable way to access data while this work is under way. I'd like to hear from those of you who are screen scraping: let us know what you are doing and what data you are going after, and we will see what we can do to help your tools remain stable during these changes.
[Nov 8, 2013 10:56:49 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Re: Screen Scrapers - Please Discuss

Not directly related to this website change, since it happens periodically and fixes itself as mysteriously as it appears:

It would be nice if pages could pick the correct number format from the system settings or some other source:
- currently for French users the format is wrong: xxx.xxx.xxx,xx
- the correct format for France is: xxx xxx xxx,xx

With the wrong format, numbers scraped from the stats pages into spreadsheets appear as character strings and need manual correction.
Not a big deal for me since I cut/paste only a small number of fields every day, but still.
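
For anyone who scripts this step instead of correcting cells by hand, here is a minimal Python sketch that turns either of the two formats above into a number. It assumes a decimal comma, with dots or spaces (including no-break spaces) as thousands separators; the function name is made up for illustration.

```python
import re


def parse_localized_number(text: str) -> float:
    """Convert '123.456.789,12' or '123 456 789,12' to a float.

    Assumption: the comma is the decimal mark, and dots, spaces, and
    no-break spaces are thousands separators.
    """
    # Drop thousands separators: dots and any kind of space.
    cleaned = re.sub(r"[.\s\u00a0\u202f]", "", text.strip())
    # Swap the decimal comma for a decimal point so float() accepts it.
    return float(cleaned.replace(",", "."))


if __name__ == "__main__":
    print(parse_localized_number("123.456.789,12"))
    print(parse_localized_number("123 456 789,12"))
```

Because the dot is always treated as a thousands separator, this sketch would misread a value written with a decimal point (for example an en_US page), so it only suits pages known to use one of the two formats above.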

If it can help you trace the problem, the latest change to the current wrong format occurred about one or two weeks ago.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Nov 9, 2013 12:34:43 AM]
widdershins
Veteran Cruncher
Scotland
Joined: Apr 30, 2007
Post Count: 673
Re: Screen Scrapers - Please Discuss

Perhaps it would have been better to shut this stable door before the horse had bolted? That is, to discuss how the data could be delivered by alternative means before the work started and stats-gathering packages stopped working.
[Nov 9, 2013 1:32:37 AM]
GIBA
Ace Cruncher
Joined: Apr 25, 2005
Post Count: 5374
Re: Screen Scrapers - Please Discuss

"We have rolled out the first of the website changes that we will be making frequently over the next 3-6 months. The HTML structure will change a fair amount during this rework, so screen scraping will not be a reliable way to access data while this work is under way. I'd like to hear from those of you who are screen scraping: let us know what you are doing and what data you are going after, and we will see what we can do to help your tools remain stable during these changes."


There are a lot of comments from other people about many changes. I will contribute a small one, but an important one for me (of course!).

I noticed on the initial page (the old My Grid page, now the My Statistics page...) that it is difficult to access the Device Manager and the device profiles, since there is no longer a link to them...

I think it's a very helpful link for all crunchers and it needs to be there.

Now, to access it, we have to go through many steps...

Please think about it. ;)
----------------------------------------
Cheers! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy !
http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1

[Nov 9, 2013 8:41:20 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Re: Screen Scrapers - Please Discuss

Meanwhile, you can bookmark the link to go straight to the Device Manager. :)
Anyway, these topics belong in threads such as the following, not in the present one:
Website redesign
New Design - Observations and Notes
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edit 1 times, last edit by JmBoullier at Nov 9, 2013 9:51:39 AM]
[Nov 9, 2013 9:47:48 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Re: Screen Scrapers - Please Discuss

"It would be nice if pages could pick the correct number format from the system settings or some other source"

OK, I have found the trick, and a mistake in the language selection of the stats pages of the new website format.

The trick is to append &language=fr_FR to the links of the pages I am using. :)

The mistake is that this parameter is misspelled &langauge when you use the language selection in the Stats section. :(

PS1: Of course replace the & with a ? if &language is the first (or only) parameter of the link.

PS2: In fr_FR fr is the language and FR is the country for the formatting rules. For example Quebec users would use fr_CA.
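
For those building these links in a script, the "& or ?" rule from PS1 can be left to the standard library. The following is only a sketch: the `language` parameter and the `fr_FR` value come from the post above, while the example.org address and the function name are hypothetical placeholders.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, name: str, value: str) -> str:
    """Return url with name=value set, using '?' or '&' as appropriate."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any earlier copy of `name`.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != name]
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Hypothetical addresses, used only to show both cases.
    print(with_query_param("http://example.org/stats.do", "language", "fr_FR"))
    print(with_query_param("http://example.org/stats.do?page=1", "language", "fr_FR"))
```

Note that `urlencode` re-encodes the parameters that were already in the link, so unusual characters in an existing query string may come out escaped differently from how they went in.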
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edit 1 times, last edit by JmBoullier at Nov 9, 2013 10:38:55 AM]
[Nov 9, 2013 10:10:10 AM]
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4835
Re: Screen Scrapers - Please Discuss

I'd like to hear from those people who are doing screen scraping and let us know what you are doing and what data you are going after....

I only scrape from the My Statistics page. The only change I have had to make to my spreadsheet was caused by the usual rearranging of Statistics by Project that happens when a project ends or a new one starts. It would be better for me if the order of projects did not change. However, it only takes a few minutes to reformat things to make it right again.
----------------------------------------

[Nov 9, 2013 1:11:58 PM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Re: Screen Scrapers - Please Discuss

I have three principal instances of screen scraping:

My Statistics and My Team, to capture daily Project Stats for myself and the UK Team (Project order is not a concern).

Capture of All Time Stats and Last Result Returned for individual Team Members via Multiple Member Comparison.

I also occasionally use a screen scrape to capture data for members identified as having a Great Britain location in Statistics by Geography. (Data by country does not appear to be available in XML format.)

Is it anticipated that the data currently available in XML format will be affected by the ongoing website redesign?
----------------------------------------

To Join follow this link: Join the UK Team All Welcome! UK Team thread
[Nov 10, 2013 11:33:18 AM]
OldChap
Veteran Cruncher
UK
Joined: Jun 5, 2009
Post Count: 978
Re: Screen Scrapers - Please Discuss

Screen scraping? What exactly is that?

I regularly pull results from Results Status into a spreadsheet in order to see a machine's average ppd, and therefore what to expect from it each day, as this no longer seems to settle but fluctuates wildly even on a single project.

It would be so much better for me if I could choose the number of results displayed, in a similar manner to that found on the My Statistics member statistics history.
----------------------------------------

[Nov 10, 2013 12:02:27 PM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Re: Screen Scrapers - Please Discuss

Screen scraping? What exactly is that?


Everybody probably has a slightly different definition of screen scraping and how they implement it. This seems a particularly apt definition:

Parsing the HTML in generated web pages with programs designed to mine out particular patterns of content. In either guise screen-scraping is an ugly, ad-hoc, last-resort technique that is very likely to break on even minor changes to the format of the data
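
To make that definition concrete, here is a minimal Python sketch of the technique: it walks the HTML and collects the text of every table cell. The sample markup is invented for illustration and is not taken from any WCG page; the class name is hypothetical too.

```python
from html.parser import HTMLParser


class TableCellScraper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row being built, or None outside a <tr>
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


# Invented sample markup, standing in for a stats page.
SAMPLE_HTML = (
    "<table><tr><th>Project</th><th>Points</th></tr>"
    "<tr><td>Example Project</td><td>1,234</td></tr></table>"
)

if __name__ == "__main__":
    scraper = TableCellScraper()
    scraper.feed(SAMPLE_HTML)
    print(scraper.rows)
```

The fragility is easy to see: if the page moved from a table to nested div elements, this scraper would return no rows at all without raising any error.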

A lot of the WCG data is available in XML format which is easier to handle and (hopefully) resilient to changes in website design. So for example if you are interested in the AllTime Runtime stats of XtremeSystems team members shown at http://www.worldcommunitygrid.org/team/viewTe...&numRecordsPerPage=10

adding "&xml=true" will provide the same data in XML format which is easily imported into a spreadsheet (in Excel using "from Web" on the Data tab).
http://www.worldcommunitygrid.org/team/viewTe...dsPerPage=10&xml=true
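
Outside a spreadsheet, the same XML can be read with a short script. The sketch below uses only the Python standard library. The thread does not show what the XML actually looks like, so the element names in the sample (`Member`, `Name`, `RunTime`) are made up, and the row tag has to be adjusted after looking at the real output. The full address is elided above, so none is filled in here.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Union


def fetch_bytes(url: str) -> bytes:
    """Download a page; pass the full address with xml=true appended."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def rows_from_xml(xml_data: Union[str, bytes], row_tag: str) -> List[Dict[str, str]]:
    """Turn every <row_tag> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_data)
    rows = []
    for element in root.iter(row_tag):
        rows.append({child.tag: (child.text or "").strip() for child in element})
    return rows


# Invented sample; the real element names are an assumption to verify.
SAMPLE_XML = """<Members>
  <Member><Name>alice</Name><RunTime>12345</RunTime></Member>
  <Member><Name>bob</Name><RunTime>678</RunTime></Member>
</Members>"""

if __name__ == "__main__":
    for row in rows_from_xml(SAMPLE_XML, "Member"):
        print(row)
```

If the real feed declares an XML namespace, the tag passed as `row_tag` and the keys in each row would carry the namespace in braces, so the tag name would need adjusting accordingly.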

Unfortunately, in your example, Results Status is not available in XML format. I would have suggested that you use pirogue's utility programme WCGDAWS (World Community Grid Device and Workunit Stats), see thread, but it's broken until updated for Friday's changes.
----------------------------------------

To Join follow this link: Join the UK Team All Welcome! UK Team thread
[Nov 10, 2013 12:44:10 PM]