RE: Use machine-readable standardized data formats / Use non-proprietary data formats

Yes, I am: the original data should always be made available, in addition to formats more appropriate for further processing, such as XML or plain text. These are just variants of one resource.
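Treating the WordPerfect original, the plain-text content and the TIFF rendering as variants of one resource corresponds to HTTP server-driven content negotiation ("negotiate and give me the best" in the quoted message). The following is a minimal Python sketch, not taken from this thread. The file names come from the quoted variant list. The media types (application/vnd.wordperfect, text/plain, image/tiff), the tie-breaking order and the function names are assumptions for illustration. The sketch picks the variant with the highest q-value in the client's Accept header, letting the most specific matching media range win.

```python
# Hypothetical variants of one resource "foo", keyed by an assumed media type.
VARIANTS = {
    "application/vnd.wordperfect": "foo.wp",  # original
    "text/plain": "foo.txt",                  # content
    "image/tiff": "foo.tif",                  # presentation
}

# Assumed server preference, used only to break ties between equal q-values.
SERVER_ORDER = ["text/plain", "image/tiff", "application/vnd.wordperfect"]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """q-value the client gives media_type; the most specific matching range wins."""
    type_, _, _ = media_type.partition("/")
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        r_type, _, r_sub = media_range.partition("/")
        if media_range == media_type:
            specificity = 2  # exact match, e.g. text/plain
        elif r_type == type_ and r_sub == "*":
            specificity = 1  # type wildcard, e.g. text/*
        elif media_range == "*/*":
            specificity = 0  # full wildcard
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept_header):
    """Return the variant file to serve, or None (HTTP 406 Not Acceptable)."""
    ranges = parse_accept(accept_header or "*/*")
    best, best_q = None, 0.0
    for media_type in SERVER_ORDER:
        q = quality(media_type, ranges)
        if q > best_q:  # strict > keeps the earlier SERVER_ORDER entry on ties
            best, best_q = media_type, q
    return VARIANTS[best] if best else None
```

For example, negotiate("text/plain") selects foo.txt, negotiate("image/*") selects foo.tif, and a client that can still process WordPerfect can send "application/vnd.wordperfect, */*;q=0.1" to get foo.wp. A real server would also send a Vary: Accept response header so that caches keep the variants apart.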


From: Annette Greiner []
Sent: 12 August 2015 19:31
Subject: Re: Use machine-readable standardized data formats / Use non-proprietary data formats

You’re not seriously suggesting people should make data available in WordPerfect format, are you?
This discussion seems to be wandering into the realm of publishing documents.

Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory

On Aug 12, 2015, at 7:28 AM, wrote:

> One should have at least the following variants of the resource:
> - Original     : foo.wp  - WordPerfect 3.0 ~1982, perhaps still processable
> - Content      : foo.txt - textual, hopefully processable in 100 years
> - Presentation : foo.tif - TIFF ~1986, perhaps still viewable, might be
> So:
>  -     - negotiate and give me the best
>  -  - I can still process WP
>  - - I want to process the text, no presentation
>  - - I really want to see how the doc looks
> Regards
> Tomas
>> Perhaps the way we can formulate this is to say that some document
>> formats (such as PDF, .doc / .docx and even .xls / .xlsx) are
>> concerned with presentation of information in a particular format or
>> layout and therefore carry a significant amount of typesetting /
>> formatting information overhead in addition to the underlying data.
>> Furthermore, at the time those document-centric formats were
>> developed, ease of access to the underlying data and the unambiguous
>> meaning of specific data fields might not have been the main priority
>> in their design.
>> When the main priority is to ensure that the underlying data is
>> available on the web so that others can re-use it, we recommend using
>> simpler data formats such as CSV, TSV, JSON (or better still JSON-LD),
>> RDF or XML.

Received on Thursday, 13 August 2015 08:53:01 UTC