
Re: NY Property Tax Explorer

From: Phil Archer <phila@w3.org>
Date: Sat, 28 Mar 2015 00:17:17 -0000
Message-ID: <755c4cdda044aef1d5229411aa6f40a5.squirrel@webmail.sophia.w3.org>
To: "Laufer" <laufer@globo.com>
Cc: "Annette Greiner" <amgreiner@lbl.gov>, "Steven Adler" <adler1@us.ibm.com>, "DWBP WG" <public-dwbp-wg@w3.org>
+1 to Laufer, Annette etc.

When we're talking metadata, we don't need to concern ourselves with the
format of the thing we're describing (except to say what that format is,
almost certainly using dcterms:format).

For actual data publishing, images from satellites and microscopes make
sense. If the WG wants to include documents in the classic sense of
something that can be printed out then we should explain why HTML is better
than ODT or PDF and why PDF is better than .doc which is better than
.docx. But if those docs include tabular data as data or as graphs, then
that data should be published separately alongside the doc. This is what
researchers are increasingly encouraged, and in some cases forced, to do.

Anyone publishing tabular data in a PDF really needs to have a word with
themselves.

Phil




> I agree on not talking about formats of data.
>
> With respect to metadata, we will talk about metadata for humans and
> metadata for machines.
>
> For metadata for machines, I think we must talk about formats and
> vocabularies.
>
> If we start radicalizing the idea that the most important thing is to
> publish, and that this is the real best practice, it would not make sense
> to publish the DWBP document.
>
> Best Regards,
> Laufer
>
> On Friday, 27 March 2015, Annette Greiner <amgreiner@lbl.gov>
> wrote:
>
>> Steve, I agree that we don't want BPs like the ones about metadata to be
>> taken as applying only to data in certain file formats. I think the choice
>> of file type for the data itself is orthogonal to our recommendations about
>> metadata. For that reason, I would avoid mention of file types in those BPs
>> entirely.
>>
>> On Mar 27, 2015, at 11:50 AM, Steven Adler <adler1@us.ibm.com> wrote:
>>
>> > I mean that a best practice applies even when you are doing things that
>> > are less than perfect.  For example:
>> >
>> > We recommend that published Open Data uses DCAT+ metadata.  This should
>> > apply to JSON, RDF, CSV and PDF, JPEG, AVI, or even to "ancient"
>> > WordPerfect documents from the 1980s.
>> >
>> > I would not want us to say that our best practices only apply to W3C
>> > blessed file types, because:
>> >
>> > 1.  It ignores the reality of the way the rest of the world publishes
>> > data (which, btw, is exactly the issue the CSV WG is designed to address,
>> > because W3C was rightly criticized before CSV for only advocating for
>> > its own standards)
>> >
>> > 2.  It limits the audience who will care about what we write
>> >
>> >
>> >
>> >
>> > Best Regards,
>> >
>> > Steve
>> >
>> > Motto: "Do First, Think, Do it Again"
>> >
>> > Laufer ---03/27/2015 02:40:15 PM---Steve, I understand your concerns and, for me, I think that when we say that there
>> >
>> > From: Laufer <laufer@globo.com>
>> > To: Steven Adler/Somers/IBM@IBMUS
>> > Cc: Christophe Guéret <christophe.gueret@dans.knaw.nl>, Bart van Leeuwen <bart_van_leeuwen@netage.nl>, Makx Dekkers <mail@makxdekkers.com>, DWBP WG <public-dwbp-wg@w3.org>
>> > Date: 03/27/2015 02:40 PM
>> > Subject: Re: NY Property Tax Explorer
>> >
>> >
>> >
>> > Steve,
>> >
>> > I understand your concerns and, for me, when we say that there are some
>> > best practices, we are not telling people not to publish if they cannot
>> > follow the best practices. If they don't have a choice, well, it is
>> > better to publish in PDF. But it is not a best practice. It is a
>> > practice better than no practice.
>> >
>> > As I was discussing in the thread on 5-star LOD (a scale of quality
>> > that is often understood as the absolute scale of quality of Data
>> > Published on the Web), the LOD scale is not the absolute scale of
>> > quality, but it is one of them. Besides this scale, there are other
>> > quality axes that could be enhanced, even using PDFs: for example, good
>> > metadata (about licenses, SLAs, versions, update periods, etc.), good
>> > data, etc.
>> >
>> > So, IMHO, what we can say to someone who publishes in PDF, and has no
>> > other choice, is that the quality of the publication could be enhanced
>> > in different ways, by adding good metadata, for example. And when the
>> > PDF can be replaced by another format, do it.
>> >
>> > Abraços,
>> > Laufer
>> >
>> > 2015-03-27 13:07 GMT-03:00 Steven Adler <adler1@us.ibm.com>:
>> > So, does our BP document only apply to data published in the future in
>> > the file types we bless?
>> >
>> >
>> > Best Regards,
>> >
>> > Steve
>> >
>> > Motto: "Do First, Think, Do it Again"
>> >
>> > Christophe Guéret ---03/27/2015 11:40:10 AM---Hoi, We are not writing a document that describes how people publish and consume
>> >
>> > From: Christophe Guéret <christophe.gueret@dans.knaw.nl>
>> > To: Makx Dekkers <mail@makxdekkers.com>
>> > Cc: Steven Adler/Somers/IBM@IBMUS, DWBP WG <public-dwbp-wg@w3.org>, Bart van Leeuwen <bart_van_leeuwen@netage.nl>
>> > Date: 03/27/2015 11:40 AM
>> > Subject: RE: NY Property Tax Explorer
>> >
>> >
>> >
>> > Hoi,
>> > We are not writing a document that describes how people publish and
>> > consume open data, we are writing guidelines on how they can best do it.
>> >
>> > The concept of "best" is obviously subjective but I hope we can at
>> > least agree on some points.
>> >
>> > I was recently sitting with people dealing with crisis. They need a lot
>> > of data, and when asking for it they sometimes get a PDF with a picture
>> > of a hand-written table in it. According to the publisher this is good
>> > open data. Is it really so? The consumers spent a lot of time extracting
>> > the data from it...
>> >
>> > Our document could help there by giving consumers something to help
>> > them argue with the publisher and hopefully get something more usable.
>> >
>> > As with all best practices, there is no guarantee ours will be followed,
>> > but having somewhere an officially endorsed way of publishing good open
>> > data will surely be welcomed by many data publishers and consumers.
>> >
>> > Cheers,
>> > Christophe
>> >
>> > --
>> > Sent with difficulties. Sorry for the brievety and typos...
>> >
>> > On 27 Mar 2015 16:19, "Makx Dekkers" <mail@makxdekkers.com> wrote:
>> >
>> >
>> > Apologies for missing the call, again, today.
>> >
>> > In my mind, we really need to say what we mean with 'best practice'. Do
>> > we really think we can define one best practice implying that all the
>> > rest is 'bad practice'? I don't think so. What I would like to see is
>> > 'practice related to objectives' and then try to determine what kinds of
>> > behaviour make sense for what kinds of objectives.
>> >
>> > For example, certain forms of PDF are really good if you want to enable
>> > out-loud reading of documents for the blind, but not so good to extract
>> > tabular information. If you want to make your tabular data useful for
>> > applications, there are better ways to publish the data than PDF.
>> >
>> > As I earlier argued for metadata best practices, I think the most useful
>> > kind of advice should be something like: if you want to do A, then if
>> > you publish data as X you will have the following advantages and
>> > disadvantages, and you should really consider format Y to increase
>> > usefulness of your data.
>> >
>> >
>> > Makx.
>> >
>> >
>> >
>> >
>> > From: Steven Adler [mailto:adler1@us.ibm.com]
>> > Sent: 27 March 2015 15:41
>> > To: Bart van Leeuwen
>> > CC: DWBP WG
>> > Subject: Re: NY Property Tax Explorer
>> >
>> >
>> > Bart,
>> >
>> > A PDF might not conform to your definition of a best practice, but NYC
>> > is publishing tens of thousands of PDFs that describe property taxes,
>> > hospitals, crime reports, and housing inspections.
>> >
>> > My point is that if we restrict our recommendations of best practices to
>> > only conform to what we define as the best file types, we are
>> > deliberately limiting the relevance of our work in the real world.
>> >
>> >
>> >
>> >
>> >
>> > Best Regards,
>> >
>> > Steve
>> >
>> > Motto: "Do First, Think, Do it Again"
>> >
>> > Bart van Leeuwen ---03/27/2015 10:35:44 AM---I think we try to assemble a 'best practice' with this working group. I sincerely hope you don't con
>> >
>> > From: Bart van Leeuwen <bart_van_leeuwen@netage.nl>
>> > To: Steven Adler/Somers/IBM@IBMUS
>> > Cc: "DWBP WG" <public-dwbp-wg@w3.org>
>> > Date: 03/27/2015 10:35 AM
>> > Subject: Re: NY Property Tax Explorer
>> >
>> >
>> >
>> > I think we try to assemble a 'best practice' with this working group.
>> > I sincerely hope you don't consider data published in a PDF to conform
>> > to this best practice.
>> >
>> > I'm not disputing that it is possible to get usable data from these
>> > formats, but they were not intended to carry data in a machine-readable
>> > way.
>> >
>> > Bart
>> >
>> > Steven Adler <adler1@us.ibm.com> wrote on 27-03-2015 15:09:32:
>> >
>> > > From: Steven Adler <adler1@us.ibm.com>
>> > > To: "DWBP WG" <public-dwbp-wg@w3.org>
>> > > Date: 27-03-2015 15:10
>> > > Subject: NY Property Tax Explorer
>> > >
>> > > You may recall I submitted a use case about this example from NYC
>> > > last year.  The developer, Chris Wong, who works for Socrata, wrote
>> > > a Ruby routine to scrape 1000 PDF files for property tax data to
>> > > fill out this map app:
>> > >
>> > > http://www.w3.org/2013/dwbp/track/issues/56
>> > >
>> > > Chris is a self-taught developer, by no means a pro.  I think this
>> > > story well demonstrates that Data on the Web today is quite
>> > > innovative and PDF, JPG, AVI, MP3, and MP4 are commonly machine
>> > > readable.
>> > >
>> > > Restricting our recommendations to file formats that conform only
>> > > to those covered by W3C WGs (JSON, CSV, RDF, etc) ignores the reality
>> > > of how Open Data is published and used.
>> > >
>> > >
>> > > Best Regards,
>> > >
>> > > Steve
>> > >
>> > > Motto: "Do First, Think, Do it Again"
>> >
>> >
>> >
>> >
>> > --
>> > .  .  .  .. .  .
>> > .        .   . ..
>> > .     ..       .
>> >
>>
>>
>>
>
> --
> .  .  .  .. .  .
> .        .   . ..
> .     ..       .
>


-- 

Sent from my phone. Please excuse typos.
Received on Saturday, 28 March 2015 00:17:34 UTC
