Re: Updated version of the BP document

Minor comments from me too:

The BP template subheading says:
"Short description of the BP, including the relevant RFC2119 keyword(s) 
from the Intended Outcome"

I think it should just be

"Short description of the BP, including the relevant RFC2119 keyword(s)"

(and I know we may well drop MUST and SHOULD from the BPs altogether in 
future, which is ISSUE-146, but I think we can close ISSUE-135 now).

If we keep RFC2119 keywords then IMO the first two BPs (Document 
metadata and Use machine-readable formats to provide metadata) should 
be MUST, not SHOULD. This would improve consistency with, for example, BP: 
Provide discovery metadata (currently no. 4), which says MUST.

Consider making the first one explicitly about human readable metadata 
to complement the second which is about machine readability.

BP: Provide data license information (currently 6) ends by saying:

How to Test

Check for the presence of one or more of:

the presence of an RDF predicate;
an HTML Link element;
an HTTP Link header;

that links the dataset, or the description of the dataset, to a license 
document.

In the next iteration I think this will need more work to look at 
linking to rights statements etc. but for now I think that last line 
should be:

"that links the dataset to a license and/or rights information."
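
As a rough illustration of how two of those checks (the HTML Link 
element and the HTTP Link header) might be automated, here is a minimal 
Python sketch. The function and class names are hypothetical, the header 
pattern is a deliberate simplification rather than a full Link header 
parser, and the RDF check is left out.

```python
import re
from html.parser import HTMLParser


class LicenseLinkFinder(HTMLParser):
    """Collect href values of <link> elements whose rel includes "license"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens; a missing rel is None.
        rels = (attrs.get("rel") or "").lower().split()
        if "license" in rels and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def html_license_links(html_text):
    """Return license targets found in HTML Link elements."""
    finder = LicenseLinkFinder()
    finder.feed(html_text)
    return finder.hrefs


# Simplified pattern for '<target>; rel="..."'. An assumption made for
# illustration only; it does not cover every legal Link header.
_LINK_RE = re.compile(r'<([^>]*)>\s*;[^,]*?\brel\s*=\s*"?([^";,]*)"?', re.I)


def header_license_links(link_header):
    """Return license targets found in an HTTP Link header value."""
    return [target for target, rel in _LINK_RE.findall(link_header)
            if "license" in rel.lower().split()]
```

A test harness could call both functions on the dataset's landing page 
and response headers and pass if either returns a non-empty list.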

Getting the subject of the triple is important, otherwise you end up 
saying that people have a right to use the metadata and say nothing 
about being able to use the data itself!
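
To make the point about the subject concrete, here is a small Python 
sketch. Triples are modelled as plain (subject, predicate, object) 
tuples, and the dataset and catalogue record URIs are made up for 
illustration; only the dct:license predicate URI is a real identifier.

```python
# Hypothetical URIs, used only for illustration.
DATASET = "http://example.org/dataset/1"
RECORD = "http://example.org/catalog/record/1"  # the description of the dataset
LICENSE = "http://creativecommons.org/licenses/by/4.0/"

# Dublin Core Terms license predicate.
DCT_LICENSE = "http://purl.org/dc/terms/license"


def licensed_subjects(triples):
    """Return the set of subjects that carry a dct:license statement."""
    return {s for (s, p, o) in triples if p == DCT_LICENSE}


def dataset_is_licensed(triples, dataset_uri):
    """True only if the license statement is about the dataset itself."""
    return dataset_uri in licensed_subjects(triples)


# License attached to the metadata record: says nothing about the data.
about_metadata = [(RECORD, DCT_LICENSE, LICENSE)]

# License attached to the dataset: this is what a consumer needs.
about_dataset = [(DATASET, DCT_LICENSE, LICENSE)]
```

With these two graphs, dataset_is_licensed(about_dataset, DATASET) holds 
while dataset_is_licensed(about_metadata, DATASET) does not, even though 
both contain a license triple.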

Those are the main issues for me and I know I may have missed some 
things that I'll wish I hadn't. There will be more detailed discussion 
as we move forward but for now, I'm done.


Phil.


On 12/02/2015 22:34, Annette Greiner wrote:
> I have a few notes about the versioning and formats sections.
>
> The first data versioning BP links to a brief schema.org discussion about how a schema will be versioned. I think this is intended as evidence that versioning is the subject of debate, but it doesn’t seem to me a very relevant example. It’s not about versioning data, and it’s not much of a debate. Even if we feel that versioning data is somehow highly debatable, I don’t think it helps the reader to be told that, at least not in a vague way that suggests versioning may not be a good idea. I suggest we remove that sentence. There is also a reference in that same section to the Vocabulary Versioning BP for more on assigning stable URIs, but that BP doesn’t say anything about that topic that isn’t already said in the data versioning BP, so I would suggest removing that reference as well.
>
> In the version history BP, we say “It should be possible for data consumers to understand how the data typically changes from version to version.” I would like to add “and how any two specific versions differ.”
>
> In the introduction to Data Access, we say “For all data on the Web, APIs should be available…” I definitely want to encourage use of APIs, but I don’t think we can say that all data should be made available that way. Many datasets are too small or of interest to too few people to make setting up an API worthwhile.
>
> The first Data Formats BP still has old text in the Intended Outcome that should have been removed. “A machine must be able to :” and the three numbered items below it should be deleted.
>
> For the BP about providing data in multiple formats, I’d like to add the word “consumer” in the Why section, so that it reads "Providing data in more than one format reduces consumer costs incurred in data transformation."
>
> The introduction to the Data Formats section doesn’t match the BPs in that section very well anymore. I think what is there now is just leftover from placeholder text. How about if we replace it with something like this?
>
> “The formats in which data is made available to consumers are a key aspect of making that data usable. The best, most flexible access mechanism in the world is pointless unless it serves data in formats that enable use and reuse. Below we detail best practices in selecting formats for your data, both at the level of files and that of individual fields. W3C encourages use of formats that can be used by the widest possible audience and processed most readily by computing systems. Source formats, such as database dumps or spreadsheets, used to generate the final published format, are out of scope. This document is concerned with what is actually published rather than internal systems used to generate the published data.”
>
> -Annette
>
> --
> Annette Greiner
> NERSC Data and Analytics Services
> Lawrence Berkeley National Laboratory
> 510-495-2935
>
> On Feb 12, 2015, at 9:45 AM, Bernadette Farias Lóscio <bfl@cin.ufpe.br> wrote:
>
>> Hi all,
>>
>> In the last few weeks we've been working on the BP document and we have an updated version available at [1].
>>
>> We made a lot of changes in the metadata section, trying to solve some of the problems identified during the reviewing process: some sections (Data Provenance, Data Quality, Data License and Data Versioning) were merged with the metadata section and some metadata best practices were removed.
>>
>> We also removed the Data lifecycle section and made changes to the Provide Unique Identifiers BP, but more improvements are needed. Other minor changes were made throughout the whole document.
>>
>> Looking forward to having your feedback!
>>
>> cheers,
>> Bernadette, Caroline and Newton
>>
>> [1] https://github.com/bernafarias/dwbp/blob/gh-pages/bp.html
>>
>> --
>> Bernadette Farias Lóscio
>> Centro de Informática
>> Universidade Federal de Pernambuco - UFPE, Brazil
>> ----------------------------------------------------------------------------
>
>

-- 


Phil Archer
W3C Data Activity Lead
http://www.w3.org/2013/data/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Friday, 13 February 2015 11:43:13 UTC