- From: Deirdre Lee <deirdre@derilinx.com>
- Date: Fri, 27 Mar 2015 15:36:45 +0000
- To: public-dwbp-wg@w3.org
- Message-ID: <5515790D.7080203@derilinx.com>
Hi Steve,

This is a good example of current practice. But current practice doesn't necessarily make best practice. While Chris could scrape the 1000 files, how much effort was involved? I'm sure many of these files were different and required building custom scrapers. Would it have been easier for him to reuse the data if it were in Excel? Would it have been even easier if it were all in CSV? Would it have been even even easier if the data were described in a standard way?

The scope of the BP document includes BPs that:

  * are specifically relevant to data published on the Web;
  * *encourage publication or re-use of data on the Web;*
  * can be tested by machines, humans or a combination of the two

Chris might go to the effort to scrape 1000 PDF files, but would everyone? By promoting data publication in formats such as CSV, JSON, RDF and via APIs, are we not encouraging wider potential re-use of data on the Web?

Maybe Makx is right and we should be clearer with our objectives, e.g. in the scope, 're-use' is defined as data processing / analysis / integration, etc.

Cheers,
Deirdre

On 27/03/2015 14:09, Steven Adler wrote:
>
> You may recall I submitted a use case about this example from NYC last
> year. The developer, Chris Wong, who works for Socrata, wrote a Ruby
> routine to scrape 1000 PDF files for property tax data to fill out
> this map app:
>
> http://www.w3.org/2013/dwbp/track/issues/56
>
> Chris is a self-taught developer, by no means a pro. I think this
> story well demonstrates that Data on the Web today is quite innovative
> and PDF, JPG, AVI, MP3, and MP4 are commonly machine readable.
>
> Restricting our recommendations to file formats that conform only
> those covered by W3C WG's (JSON, CSV, RDF, etc) ignores the reality of
> how Open Data is published and used.
>
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>

-- 
--------------------------------------
Deirdre Lee, Director
Derilinx - Linked & Open Data Solutions

Web: www.derilinx.com
Email: deirdre@derilinx.com
Tel: +353 (0)1 254 4316
Mob: +353 (0)87 417 2318
Linkedin: ie.linkedin.com/in/leedeirdre/
Twitter: @deirdrelee
Received on Friday, 27 March 2015 15:37:20 UTC