Re: Use machine-readable standardized data formats / Use non-proprietary data formats

The original question was about what formats to suggest. I objected to suggesting that people use WordPerfect because we are not giving guidance on how to post word-processed documents. My concern is that people with tabular data will conclude that a word-processor format is a good way to publish it.
-Annette
--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Aug 14, 2015, at 6:48 PM, Erik Wilde <dret@berkeley.edu> wrote:

> hello all.
> 
> On 2015-08-14 20:50, Laufer wrote:
>> If we have BPs that orient publishers to provide metadata about
>> structure, license, etc., to provide version information, and a lot of
>> other BPs, why do we have to explain what data is ruled out or ruled in?
>> Why do we have to forbid some publishers from following our BPs?
> 
> thanks for this, laufer, that was exactly what i was thinking when reading annette's email. what's the problem with legislation documents, if all the BP talks about is how to represent them well in a webby way as part of a legislative dataset? all the BP should talk about are the webby parts, so it can safely stay away from any issues that pertain to a specific aspect of the data that's not specifically about being webby.
> 
> starting from https://github.com/dret/webdata, let's see how you could talk about webby legislative documents:
> 
> 1: Linkable
> 
> publish all your documents at stable URIs, so that they can be referenced. at the very least, give them unique and stable URIs, if you don't want to make them directly accessible.
> 
> for legislation, fragments (any news about this from the group, btw?) would be essential, so that references can point not just to documents, but to all relevant parts of them.
> 
> again, reference culture in legislation is complex and hard, but publishers should think about the things they want to reference (as resources and sub-resources), and make sure all of those get stable identifiers. the BP would simply tell them *to do it*, not *how to do it* for their particular scenario.
> 
> 2: Parseable
> 
> probably use XML, which is a good foundation for document-ish content. if you prefer SGML or whatever floats your boat, that's fine, too.
> 
> 3: Understandable
> 
> use or define a documented format for your legal documents. use whatever schema language makes you happy (DTD, XSD, RNG, ...), but define and document the schema so that people accessing your data know what the XML represents.
> 
> 4: Linked
> 
> when cross-referencing legislation (such as a ruling citing a law), use the URI of the referenced resource so that references are established at the web level.
> 
> 5: Usable
> 
> label your document with a license, so that others know how they can use it. there are many licenses to choose from, and picking any one of those is better than not picking one at all.
> 
> so, what's the difficulty in making these recommendations, and maybe adding more that's in the BP but not (currently) in web data? the BP wouldn't have to go out on a limb and try to explain how to design an XML schema; that's a different issue. so why exclude this scenario?
> 
> cheers,
> 
> dret.
> 
> -- 
> erik wilde | mailto:dret@berkeley.edu  -  tel:+1-510-2061079 |
>           | UC Berkeley  -  School of Information (ISchool) |
>           | http://dret.net/netdret http://twitter.com/dret |
> 

Received on Saturday, 15 August 2015 02:15:34 UTC