Re: NY Property Tax Explorer

Yes, the world is messy. It will never be clean.


Best Regards,

Steve

Motto: "Do First, Think, Do it Again"


From: Phil Archer <phila@w3.org>
To: Steven Adler/Somers/IBM@IBMUS
Cc: DWBP WG <public-dwbp-wg@w3.org>
Date: 03/27/2015 11:47 AM
Subject: Re: NY Property Tax Explorer

On 27/03/2015 14:41, Steven Adler wrote:
>
> Bart,
>
> A PDF might not conform to your definition of a best practice,

It does not.

> but NYC is publishing tens of thousands of PDFs that describe property
> taxes, hospitals, crime reports, and housing inspections.

All of which were derived from actual data somewhere, data that may have
been buried and obfuscated along the way. The NYC tax office does not
use PDFs to do its calculations, and the NYPD doesn't use them to record
its crime stats, etc.

We want to encourage NYC to publish that stuff as close to the original
format as they can, preferably as part of business as usual, part of the
everyday workflow.

>
> My point is that if we restrict our recommendations of best practices to
> only conform to what we define as the best file types, we are deliberately
> limiting the relevance of our work in the real world.

PDFs, JPGs or whatever are one-star data (assuming they're openly
licensed). Yes, it's there. Yes, you can access it, but you have to work
hard to do so, essentially reverse-engineering the document to get what
you want out of it. Structured data, like spreadsheets, is better;
non-proprietary structured data, like CSV, is better still (because
you're not locked into a vendor's tools).

Publishing data in PDF is not best practice. It's lazy practice. It's "I
really don't care about this but my boss says I have to do it" practice,
or it's "I can't be bothered" practice, or it's "if I do this, will they
leave me alone?" practice, or it's "how can I present the story I want to
tell?" practice. It's better than not doing it at all, but any document
that calls itself a Best Practice doc won't be taken seriously if it
encourages data publication in PDF, or videos, or images of graphs.
That's what you get *after* you've done something with the data so that
humans can understand it.

As discussed in the 5 star thread, I don't think we should push everyone
into publishing 5 star Linked Data, but I *do* think we should encourage
people to publish data that can easily be transformed into it, or any
other format. And, again, PDF fails that test. CSV+ (i.e. the output of
the CSV on the Web WG) is an example that passes it.

Taking Makx's words:

"... if you want to do A, then if you publish data as X you will have
the following advantages and disadvantages, and you should really
consider format Y to increase usefulness of your data."

If you want to present a report, PDF is fine since you're publishing
information for a human to read and understand. The disadvantage of PDF
is that it is more difficult to extract data from it. HTML is better.

But what you should consider is publishing your report in PDF, fine,
but also publishing the underlying data in CSV (plus metadata) so that
other people can manipulate the data for themselves.
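As a rough sketch of what "CSV (plus metadata)" might look like in practice, the Python below writes a table as CSV and builds a small JSON metadata document describing its columns. It assumes the metadata vocabulary from the CSV on the Web WG (`@context`, `url`, `tableSchema`, `columns`, `name`, `titles`, `datatype`); the column names, the file name and the sample row are hypothetical placeholders, not real NYC property tax data.

```python
import csv
import io
import json

# Hypothetical column descriptions for a property tax table. The keys
# (name, titles, datatype) follow the CSV on the Web metadata vocabulary;
# the specific columns are illustrative only.
COLUMNS = [
    {"name": "parcel_id", "titles": "Parcel ID", "datatype": "string"},
    {"name": "borough", "titles": "Borough", "datatype": "string"},
    {"name": "assessed_value", "titles": "Assessed Value (USD)", "datatype": "integer"},
]


def write_csv(rows, columns):
    """Return CSV text: a header row of column titles, then one line per row dict."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col["titles"] for col in columns])
    for row in rows:
        # Emit values in the same order as the column descriptions.
        writer.writerow([row[col["name"]] for col in columns])
    return buf.getvalue()


def build_metadata(csv_url, columns):
    """Return a minimal CSVW-style metadata document for the CSV at csv_url."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {"columns": columns},
    }


if __name__ == "__main__":
    # Placeholder row for illustration only; not real NYC figures.
    sample_rows = [{"parcel_id": "P-0001", "borough": "Example", "assessed_value": 100000}]
    print(write_csv(sample_rows, COLUMNS))
    print(json.dumps(build_metadata("property-tax.csv", COLUMNS), indent=2))
```

The metadata would be published alongside the CSV, so a reuser can read column names and types directly instead of reverse-engineering a rendered document.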

Phil.

>
>
> From: Bart van Leeuwen <bart_van_leeuwen@netage.nl>
> To: Steven Adler/Somers/IBM@IBMUS
> Cc: "DWBP WG" <public-dwbp-wg@w3.org>
> Date: 03/27/2015 10:35 AM
> Subject: Re: NY Property Tax Explorer
>
> I think we are trying to assemble a 'best practice' with this working group.
> I sincerely hope you don't consider data published in a PDF to conform to
> this best practice.
>
> I'm not disputing that it is possible to get usable data from these
> formats, but they were not intended to carry data in a machine-readable way.
>
> Bart
>
> Steven Adler <adler1@us.ibm.com> wrote on 27-03-2015 15:09:32:
>
>> From: Steven Adler <adler1@us.ibm.com>
>> To: "DWBP WG" <public-dwbp-wg@w3.org>
>> Date: 27-03-2015 15:10
>> Subject: NY Property Tax Explorer
>>
>> You may recall I submitted a use case about this example from NYC
>> last year.  The developer, Chris Wong, who works for Socrata, wrote
>> a Ruby routine to scrape 1000 PDF files for property tax data to
>> fill out this map app:
>>
>> http://www.w3.org/2013/dwbp/track/issues/56
>>
>> Chris is a self-taught developer, by no means a pro. I think this
>> story demonstrates well that Data on the Web today is quite
>> innovative and that PDF, JPG, AVI, MP3, and MP4 are commonly
>> machine readable.
>>
>> Restricting our recommendations to only those file formats covered
>> by W3C WGs (JSON, CSV, RDF, etc.) ignores the reality of how Open
>> Data is published and used.
>>
>>
>> Best Regards,
>>
>> Steve
>>
>> Motto: "Do First, Think, Do it Again"
>

--


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Friday, 27 March 2015 16:05:52 UTC