
Re: NY Property Tax Explorer

From: Laufer <laufer@globo.com>
Date: Fri, 27 Mar 2015 15:55:31 -0300
Message-ID: <CA+pXJihj0QnnRMsvgrvV8XchYQ5cjxUgPodK7gZeiZdum4LSww@mail.gmail.com>
To: Steven Adler <adler1@us.ibm.com>
Cc: Bart van Leeuwen <bart_van_leeuwen@netage.nl>, Christophe Guéret <christophe.gueret@dans.knaw.nl>, Makx Dekkers <mail@makxdekkers.com>, DWBP WG <public-dwbp-wg@w3.org>
When I say that it is a best practice to provide metadata, I am saying that
this applies to all kind of data and formats.

We do not have any best practice saying to publish data in a specific
format, have we?

Best, Laufer

2015-03-27 15:50 GMT-03:00 Steven Adler <adler1@us.ibm.com>:

> I mean that a best practice applies even when you are doing things that
> are less than perfect.  For example:
>
> We recommend that published Open Data uses DCAT+ metadata.  This should
> apply to JSON, RDF, CSV *and* PDF, JPEG, AVI, or even to "ancient"
> WordPerfect documents from the 1980s.
>
> I would not want us to say that our best practices only apply to W3C
> blessed file types, because:
>
> 1.  It ignores the reality of the way the rest of the world publishes data
> (which, btw, is exactly the issue the CSV WG is designed to address, because
> before CSV the W3C was rightly criticized for advocating only its own
> standards)
>
> 2.  It limits the audience who will care about what we write
>
>
>
>
> Best Regards,
>
> Steve
>
> Motto: "Do First, Think, Do it Again"
>
> From: Laufer <laufer@globo.com>
> To: Steven Adler/Somers/IBM@IBMUS
> Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Bart van Leeuwen <bart_van_leeuwen@netage.nl>, Makx Dekkers <mail@makxdekkers.com>, DWBP WG <public-dwbp-wg@w3.org>
> Date: 03/27/2015 02:40 PM
> Subject: Re: NY Property Tax Explorer
> ------------------------------
>
>
>
> Steve,
>
> I understand your concerns. For me, when we say that there are some best
> practices, we are not telling people not to publish if they cannot follow
> them. If they don't have a choice, well, it is better to publish in PDF.
> But it is not a best practice. It is a practice better than no practice.
>
> As I was discussing in the thread on 5-star LOD (a scale of quality that
> is often understood as the absolute scale of quality for Data Published
> on the Web), the LOD scale is not the absolute scale of quality; it is one
> of several. Besides this scale, there are other quality axes that could be
> improved, even when using PDFs: for example, good metadata (about licenses,
> SLAs, versions, update periods, etc.), good data, etc.
>
> So, IMHO, what we can say to someone who publishes in PDF and has no
> other choice is that the quality of the publication can be improved in
> different ways, for example by adding good metadata. And when the PDF can
> be replaced by another format, do it.
>
> Regards,
> Laufer
>
> 2015-03-27 13:07 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
>
>    So, does our BP document only apply to data published in the future in
>    the file types we bless?
>
>
>    Best Regards,
>
>    Steve
>
>    Motto: "Do First, Think, Do it Again"
>
>    From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
>    To: Makx Dekkers <mail@makxdekkers.com>
>    Cc: Steven Adler/Somers/IBM@IBMUS, DWBP WG <public-dwbp-wg@w3.org>, Bart van Leeuwen <bart_van_leeuwen@netage.nl>
>    Date: 03/27/2015 11:40 AM
>    Subject: RE: NY Property Tax Explorer
>    ------------------------------
>
>
>
>    Hi,
>
>    We are not writing a document that describes how people publish and
>    consume open data, we are writing guidelines on how they can best do it.
>
>    The concept of "best" is obviously subjective but I hope we can at
>    least agree on some points.
>
>    I was recently sitting with people who deal with crises. They need a
>    lot of data, and when they ask for it they sometimes get a PDF containing
>    a picture of a handwritten table. According to the publisher this is good
>    open data. Is it really so? The consumers spent a lot of time extracting
>    the data from it...
>
>    Our document could help there by giving consumers something to
>    support their argument with the publisher and, hopefully, get something
>    more usable.
>
>    As with all best practices, there is no guarantee ours will be
>    followed, but having an officially endorsed way of publishing good
>    open data will surely be welcomed by many data publishers and consumers.
>
>    Cheers,
>    Christophe
>
>    --
>    Sent with difficulties. Sorry for the brevity and typos...
>
>    On 27 Mar 2015 at 16:19, "Makx Dekkers" <mail@makxdekkers.com> wrote:
>
>       Apologies for missing the call, again, today.
>
>
>
>       In my mind, we really need to say what we mean by ‘best
>       practice’. Do we really think we can define one best practice implying that
>       all the rest is ‘bad practice’? I don’t think so. What I would like to see
>       is ‘practice related to objectives’ and then try to determine what kinds of
>       behaviour make sense for what kinds of objectives.
>
>
>
>       For example, certain forms of PDF are really good if you want to
>       enable out-loud reading of documents for the blind, but not so good for
>       extracting tabular information. If you want to make your tabular data useful
>       for applications, there are better ways to publish the data than PDF.
>
>
>
>
>       As I earlier argued for metadata best practices, I think the most
>       useful kind of advice should be something like: if you want to do A, then
>       if you publish data as X you will have the following advantages and
>       disadvantages, and you should really consider format Y to increase
>       usefulness of your data.
>
>
>
>       Makx.
>
>
>
>
>
>       *From:* Steven Adler [mailto:adler1@us.ibm.com]
>       *Sent:* 27 March 2015 15:41
>       *To:* Bart van Leeuwen
>       *Cc:* DWBP WG
>       *Subject:* Re: NY Property Tax Explorer
>
>
>
>       Bart,
>
>       A PDF might not conform to your definition of a best practice, but
>       NYC is publishing tens of thousands of PDFs that describe property taxes,
>       hospitals, crime reports, and housing inspections.
>
>       My point is that if we restrict our recommendations of best
>       practices to only conform to what we define as the best file types, we are
>       deliberately limiting the relevance of our work in the real world.
>
>
>
>
>
>       Best Regards,
>
>       Steve
>
>       Motto: "Do First, Think, Do it Again"
>
>       From: Bart van Leeuwen <bart_van_leeuwen@netage.nl>
>       To: Steven Adler/Somers/IBM@IBMUS
>       Cc: "DWBP WG" <public-dwbp-wg@w3.org>
>       Date: 03/27/2015 10:35 AM
>       Subject: Re: NY Property Tax Explorer
>       ------------------------------
>
>
>
>
>       I think we are trying to assemble a 'best practice' with this working
>       group.
>       I sincerely hope you don't consider data published in a PDF to
>       conform to this best practice.
>
>       I'm not disputing that it is possible to get usable data from these
>       formats, but they were not intended to carry data in a machine-readable way.
>
>
>       Bart
>
>       Steven Adler <adler1@us.ibm.com> wrote on
>       27-03-2015 15:09:32:
>
>       > From: Steven Adler <adler1@us.ibm.com>
>       > To: "DWBP WG" <public-dwbp-wg@w3.org>
>       > Date: 27-03-2015 15:10
>       > Subject: NY Property Tax Explorer
>       >
>       > You may recall I submitted a use case about this example from NYC
>       > last year.  The developer, Chris Wong, who works for Socrata, wrote
>       > a Ruby routine to scrape 1000 PDF files for property tax data to
>       > fill out this map app:
>       >
>       > http://www.w3.org/2013/dwbp/track/issues/56
>       >
>       > Chris is a self-taught developer, by no means a pro.  I think this
>       > story well demonstrates that Data on the Web today is quite
>       > innovative and PDF, JPG, AVI, MP3, and MP4 are commonly machine
>       > readable.
>       >
>       > Restricting our recommendations to file formats that conform only
>       > to those covered by W3C WGs (JSON, CSV, RDF, etc.) ignores the
>       > reality of how Open Data is published and used.
>       >
>       >
>       > Best Regards,
>       >
>       > Steve
>       >
>       > Motto: "Do First, Think, Do it Again"
>
>
>
>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>
>


-- 
.  .  .  .. .  .
.        .   . ..
.     ..       .


Received on Friday, 27 March 2015 18:56:01 UTC

This archive was generated by hypermail 2.3.1 : Friday, 27 March 2015 18:56:02 UTC