Re: Use machine-readable standardized data formats / Use non-proprietary data formats

Your own doc says "Documents should use standardized data metamodels such as CSV, XML, RDF, or JSON". Why would you not want the DWBP doc to do the same?
-AG
--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Aug 12, 2015, at 11:11 AM, Erik Wilde <dret@berkeley.edu> wrote:

> hello annette.
> 
> On 2015-08-12 11:01, Annette Greiner wrote:
>> When it comes to specifying which formats to use, I do think that the best practice is to consider the probable context of use. That is fundamental to any intelligent management of data. Of course we should suggest open formats in particular, but the idea of considering how something will/may be used in the future is important, too, and it helps one decide among the possible open formats. Perhaps just rewording would make that clearer. For the test, it seems to me that conforming to a format in use by anticipated users of the data is a minimum (something that already doesn't work for your users certainly isn't going to be future-proof), but it should also say something about checking that the format conforms to an open machine-readable standard. That BP has a sentence fragment right now as well, and I don't think it should mention vocabularies.
> 
> i would be in favor of not recommending any specific models or metamodels. that's up for the domain specialists to decide, and depends on what their goals and constraints are.
> 
> what matters is:
> 
> - the format should be easily parseable, and thus it is better to reuse some metamodel than to invent your own or simply invent a proprietary format that needs proprietary parsing.
> 
> - after parsing, the model should be documented so that it is well-defined what the parsed data model means and most importantly, how it has to be processed (what to do with unknown parts, for example: safely ignore or stop processing?). therefore a clearly defined processing model is essential at this level.
> 
> - the model should be hypermedia, so that clients can follow meaningful links for accomplishing application goals. those links may simply link to related data, or they may be RESTful and link to web services and not just web data. again, that's for the domain to decide what kind of hypermedia semantics they need.
> 
> because these issues are essential, https://github.com/dret/webdata has these three points as its very core, as 2, 3, and 4 stars. these issues are at the core of being webby data, and not just some data that happens to be accessible via HTTP (i.e., web data has to be "of the web" and not just "on the web").
> 
> cheers,
> 
> dret.
> 
> -- 
> erik wilde | mailto:dret@berkeley.edu  -  tel:+1-510-2061079 |
>           | UC Berkeley  -  School of Information (ISchool) |
>           | http://dret.net/netdret http://twitter.com/dret |

Received on Wednesday, 12 August 2015 18:17:42 UTC