
Re: NY Property Tax Explorer

From: Deirdre Lee <deirdre@derilinx.com>
Date: Fri, 27 Mar 2015 15:36:45 +0000
Message-ID: <5515790D.7080203@derilinx.com>
To: public-dwbp-wg@w3.org

Hi Steve,

This is a good example of current practice. But current practice doesn't 
necessarily make best practice. While Chris could scrape the 1000 files, 
how much effort was involved? I'm sure many of these files were 
different and required building custom scrapers.

Would it have been easier for him to reuse the data if it were in Excel?
Would it have been even easier if it were all in CSV? Would it have been
easier still if the data were described in a standard way?
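
To make the comparison concrete, here is a minimal sketch of what reuse could look like if the data were in CSV. It is only an illustration: the column names (parcel_id, tax_year, amount_due) and the sample rows are hypothetical and are not taken from the NYC property tax data. It uses Python's standard csv module; no custom scraper is needed.

```python
import csv
import io

# Hypothetical sample: column names and values are invented for illustration
# and are not taken from the NYC property tax data.
SAMPLE_CSV = """parcel_id,tax_year,amount_due
P-001,2014,1200.50
P-002,2014,845.00
P-003,2014,2310.25
"""


def total_amount_due(csv_text):
    """Sum the amount_due column of a CSV document given as text."""
    # DictReader uses the header row to name each field in every record.
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["amount_due"]) for row in reader)


if __name__ == "__main__":
    print(total_amount_due(SAMPLE_CSV))
```

The same few lines would apply to every file that shares the header row, which is the point of the question above: a common, described format replaces one scraper per layout.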

The scope of the BP document includes BPs that:

  * are specifically relevant to data published on the Web;
  * *encourage publication or re-use of data on the Web;*
  * can be tested by machines, humans or a combination of the two

Chris might go to the effort of scraping 1000 PDF files, but would
everyone? By promoting data publication in formats such as CSV, JSON
and RDF, and via APIs, are we not encouraging wider potential re-use of
data on the Web?

Maybe Makx is right and we should be clearer about our objectives, e.g.
by defining 're-use' in the scope as data processing / analysis /
integration, etc.

Cheers,
Deirdre



On 27/03/2015 14:09, Steven Adler wrote:
>
> You may recall I submitted a use case about this example from NYC last 
> year.  The developer, Chris Wong, who works for Socrata, wrote a Ruby 
> routine to scrape 1000 PDF files for property tax data to fill out 
> this map app:
>
> http://www.w3.org/2013/dwbp/track/issues/56
>
> Chris is a self-taught developer, by no means a pro.  I think this 
> story well demonstrates that Data on the Web today is quite innovative 
> and PDF, JPG, AVI, MP3, and MP4 are commonly machine readable.
>
> Restricting our recommendations to file formats that conform only to
> those covered by W3C WGs (JSON, CSV, RDF, etc.) ignores the reality of
> how Open Data is published and used.
>
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>

-- 
--------------------------------------
Deirdre Lee, Director
Derilinx - Linked & Open Data Solutions
  
Web:      www.derilinx.com
Email:    deirdre@derilinx.com
Tel:      +353 (0)1 254 4316
Mob:      +353 (0)87 417 2318
Linkedin: ie.linkedin.com/in/leedeirdre/
Twitter:  @deirdrelee
Received on Friday, 27 March 2015 15:37:20 UTC