- From: Steven Adler <adler1@us.ibm.com>
- Date: Fri, 27 Mar 2015 12:05:07 -0400
- To: Phil Archer <phila@w3.org>
- Cc: DWBP WG <public-dwbp-wg@w3.org>
- Message-ID: <OFE9E4A02C.EAC9F2BD-ON85257E15.0057E433-85257E15.00585C45@us.ibm.com>
yes, the world is messy. It will never be clean.

Best Regards,

Steve

Motto: "Do First, Think, Do it Again"

From: Phil Archer <phila@w3.org>
To: Steven Adler/Somers/IBM@IBMUS
Cc: DWBP WG <public-dwbp-wg@w3.org>
Date: 03/27/2015 11:47 AM
Subject: Re: NY Property Tax Explorer

On 27/03/2015 14:41, Steven Adler wrote:
>
> Bart,
>
> A PDF might not conform to your definition of a best practice,

It does not.

> but NYC is
> publishing tens of thousands of PDF's that describe property taxes,
> hospitals, crime reports, and housing inspections.

All of which were derived from actual data somewhere, data that may have been buried and obfuscated along the way. The NYC tax office does not use PDFs to do their calculations, the NYPD doesn't use them to record their crime stats, etc. We want to encourage NYC to publish that stuff as close to the original format as they can, preferably as part of business as usual, part of the everyday workflow.

>
> My point is that if we restrict our recommendations of best practices to
> only conform to what we define as the best file types, we are deliberately
> limiting the relevance of our work in the real world.

PDF, JPGs or whatever, are one star data (assuming it's openly licensed). Yes, it's there. Yes, you can access it - but you have to work hard to do so, essentially reverse-engineering the document to get what you want out of it. Structured data like spreadsheets are better; non-proprietary and structured data, like CSV, is better still (because you're not locked into a vendor's tools).

Publishing data in PDF is not best practice. It's lazy practice. It's "I really don't care about this but my boss says I have to do it" practice, or it's "I can't be bothered" practice, or it's "if I do this will they leave me alone?" practice, it's "how can I present the story I want to tell" practice.

It's better than not doing it at all, but any document that calls itself a Best Practice doc won't be taken seriously if it encourages data publication in PDF, or videos, or images of graphs. That's what you get *after* you've done something with the data so that humans can understand it.
As discussed in the 5 star thread, I don't think we should push everyone into publishing 5 star Linked Data, but I *do* think we should encourage people to publish data that can easily be transformed into it, or any other format. And, again, PDF fails that test. CSV+ (i.e. the output of the CSV on the Web WG) is an example that passes it.

Taking Makx's words:

"... if you want to do A, then if you publish data as X you will have the following advantages and disadvantages, and you should really consider format Y to increase usefulness of your data."

If you want to present a report, PDF is fine since you're publishing information for a human to read and understand. The disadvantage of PDF is that it is more difficult to extract data from it. HTML is better. But, what you should consider doing is publishing your report in PDF, OK, but also publishing the underlying data in CSV (plus metadata) so other people can manipulate the data for themselves.

Phil.

>
> From: Bart van Leeuwen <bart_van_leeuwen@netage.nl>
> To: Steven Adler/Somers/IBM@IBMUS
> Cc: "DWBP WG" <public-dwbp-wg@w3.org>
> Date: 03/27/2015 10:35 AM
> Subject: Re: NY Property Tax Explorer
>
> I think we try to assemble a 'best practice' with this working group.
> I sincerely hope you don't consider data published in a PDF to conform to
> this best practice.
>
> I'm not arguing that it is possible to get usable data from these formats,
> but they were not intended to carry data in a machine readable way.
>
> Bart
>
> Steven Adler <adler1@us.ibm.com> wrote on 27-03-2015 15:09:32:
>
>> From: Steven Adler <adler1@us.ibm.com>
>> To: "DWBP WG" <public-dwbp-wg@w3.org>
>> Date: 27-03-2015 15:10
>> Subject: NY Property Tax Explorer
>>
>> You may recall I submitted a use case about this example from NYC
>> last year. The developer, Chris Wong, who works for Socrata, wrote
>> a Ruby routine to scrape 1000 PDF files for property tax data to
>> fill out this map app:
>>
>> http://www.w3.org/2013/dwbp/track/issues/56
>>
>> Chris is a self-taught developer, by no means a pro. I think this
>> story well demonstrates that Data on the Web today is quite
>> innovative and PDF, JPG, AVI, MP3, and MP4 are commonly machine readable.
>>
>> Restricting our recommendations to file formats that conform only
>> those covered by W3C WG's (JSON, CSV, RDF, etc) ignores the reality
>> of how Open Data is published and used.
>>
>>
>> Best Regards,
>>
>> Steve
>>
>> Motto: "Do First, Think, Do it Again"
>

--
Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Attachments
- image/gif attachment: graycol.gif
- image/gif attachment: ecblank.gif
Received on Friday, 27 March 2015 16:05:52 UTC