Re: Use machine-readable standardized data formats / Use non-proprietary data formats from Annette Greiner on 2015-08-12 (public-dwbp-wg@w3.org from August 2015)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Wed, 12 Aug 2015 11:01:28 -0700
To: Phil Archer <phila@w3.org>
Cc: Public DWBP WG <public-dwbp-wg@w3.org>
Message-Id: <BF323A2A-8F72-40B5-BB21-0C4B19712632@lbl.gov>

I agree with the idea of making the two points into one, though I see we also have a proposal to split the first one into two. I think it’s just a question of whether we want to be lumpers or splitters.  Whatevs.

When it comes to specifying which formats to use, I do think that the best practice is to consider the probable context of use. That is fundamental to any intelligent management of data. Of course we should suggest open formats in particular, but the idea of considering how something will or may be used in the future is important, too, and it helps one decide among the possible open formats. Perhaps just rewording would make that clearer. For the test, it seems to me that conforming to a format in use by anticipated users of the data is a minimum (something that already doesn’t work for your users certainly isn’t going to be future-proof), but it should also say something about checking that the format conforms to an open machine-readable standard. That BP has a sentence fragment right now as well, and I don’t think it should mention vocabularies.
-Annette
--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Aug 12, 2015, at 5:00 AM, Phil Archer <phila@w3.org> wrote:

> Looking at issue-138 and the BPs on Use machine-readable standardized data formats and Use non-proprietary data formats - I can't see that they need to be separate.
> 
> We want to say that things like CSV, XML, RDF and JSON are good and that PDF, Excel etc. are bad. It's not that they're not machine readable, they are, but they're just much more difficult to process.
> 
> Splitting up machine readable standardised and non-proprietary suggests we'd need to come up with a proprietary format that's machine readable that's OK in one BP and then in the next say that, oh no, hang on, don't use that, use this non-proprietary one instead.
> 
> And, Microsoft and Adobe have both made their respective formats available as ISO standards so we can't refer to formal standards as a differentiator.
> 
> There's also text in there that I have problems with. The "How to Test" section of BP: Use machine-readable standardized data formats says: "Check that the data format conforms to a known machine-readable data format specification in current use among anticipated data users."
> 
> I believe the point of sharing data on the Web is that the publisher shouldn't anticipate what someone else will do with the data.
> 
> So... I'd like to propose to merge those two BPs and amend the text to talk about the value of open standards in making data available with no preconceived ideas of what it might be used for.
> 
> WDYT?
> 
> Phil.
> 
> 
> -- 
> 
> 
> Phil Archer
> W3C Data Activity Lead
> http://www.w3.org/2013/data/
> 
> http://philarcher.org
> +44 (0)7887 767755
> @philarcher1
>

Received on Wednesday, 12 August 2015 18:02:18 UTC